regex-2016.01.10/
regex-2016.01.10/PKG-INFO

Metadata-Version: 1.1
Name: regex
Version: 2016.01.10
Summary: Alternative regular expression module, to replace re.
Home-page: https://bitbucket.org/mrabarnett/mrab-regex
Author: Matthew Barnett
Author-email: regex@mrabarnett.plus.com
License: Python Software Foundation License
Description:

Introduction
------------

This new regex implementation is intended eventually to replace Python's current re module implementation.

For testing and comparison with the current 're' module the new implementation is in the form of a module called 'regex'.

Old vs new behaviour
--------------------

This module has 2 behaviours:

* **Version 0** behaviour (old behaviour, compatible with the current re module):

  * Indicated by the ``VERSION0`` or ``V0`` flag, or ``(?V0)`` in the pattern.

  * Zero-width matches are handled like in the re module:

    * ``.split`` won't split a string at a zero-width match.

    * ``.sub`` will advance by one character after a zero-width match.

  * Inline flags apply to the entire pattern, and they can't be turned off.

  * Only simple sets are supported.

  * Case-insensitive matches in Unicode use simple case-folding by default.

* **Version 1** behaviour (new behaviour, different from the current re module):

  * Indicated by the ``VERSION1`` or ``V1`` flag, or ``(?V1)`` in the pattern.

  * Zero-width matches are handled like in Perl and PCRE:

    * ``.split`` will split a string at a zero-width match.

    * ``.sub`` will handle zero-width matches correctly.

  * Inline flags apply to the end of the group or pattern, and they can be turned off.

  * Nested sets and set operations are supported.

  * Case-insensitive matches in Unicode use full case-folding by default.

If no version is specified, the regex module will default to ``regex.DEFAULT_VERSION``.
In the short term this will be ``VERSION0``, but in the longer term it will be ``VERSION1``. Case-insensitive matches in Unicode ----------------------------------- The regex module supports both simple and full case-folding for case-insensitive matches in Unicode. Use of full case-folding can be turned on using the ``FULLCASE`` or ``F`` flag, or ``(?f)`` in the pattern. Please note that this flag affects how the ``IGNORECASE`` flag works; the ``FULLCASE`` flag itself does not turn on case-insensitive matching. In the version 0 behaviour, the flag is off by default. In the version 1 behaviour, the flag is on by default. Nested sets and set operations ------------------------------ It's not possible to support both simple sets, as used in the re module, and nested sets at the same time because of a difference in the meaning of an unescaped ``"["`` in a set. For example, the pattern ``[[a-z]--[aeiou]]`` is treated in the version 0 behaviour (simple sets, compatible with the re module) as: * Set containing "[" and the letters "a" to "z" * Literal "--" * Set containing letters "a", "e", "i", "o", "u" but in the version 1 behaviour (nested sets, enhanced behaviour) as: * Set which is: * Set containing the letters "a" to "z" * but excluding: * Set containing the letters "a", "e", "i", "o", "u" Version 0 behaviour: only simple sets are supported. Version 1 behaviour: nested sets and set operations are supported. Flags ----- There are 2 kinds of flag: scoped and global. Scoped flags can apply to only part of a pattern and can be turned on or off; global flags apply to the entire pattern and can only be turned on. The scoped flags are: ``FULLCASE``, ``IGNORECASE``, ``MULTILINE``, ``DOTALL``, ``VERBOSE``, ``WORD``. The global flags are: ``ASCII``, ``BESTMATCH``, ``ENHANCEMATCH``, ``LOCALE``, ``POSIX``, ``REVERSE``, ``UNICODE``, ``VERSION0``, ``VERSION1``. 
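The difference between scoped and global flags can be illustrated with a small sketch. It assumes the ``regex`` module is installed; the sample patterns and variable names are made up for illustration and are not taken from the package.

```python
import regex

# Scoped flag: (?i:...) makes only the enclosed subpattern case-insensitive.
scoped_hit = regex.fullmatch(r"(?i:abc)def", "ABCdef")
scoped_miss = regex.fullmatch(r"(?i:abc)def", "ABCDEF")  # "DEF" is still case-sensitive

# Global flag: (?V1) selects version 1 behaviour for the whole pattern,
# which is what allows the nested set difference below (a-z minus the vowels).
consonant = regex.fullmatch(r"(?V1)[[a-z]--[aeiou]]", "b")
vowel = regex.fullmatch(r"(?V1)[[a-z]--[aeiou]]", "a")

print(scoped_hit is not None, scoped_miss is None)
print(consonant is not None, vowel is None)
```

The same pattern without ``(?V1)`` would be read as simple sets followed by a literal ``--``, as described in the section on nested sets.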
If neither the ``ASCII``, ``LOCALE`` nor ``UNICODE`` flag is specified, it will default to ``UNICODE`` if the regex pattern is a Unicode string and ``ASCII`` if it's a bytestring.

The ``ENHANCEMATCH`` flag makes fuzzy matching attempt to improve the fit of the next match that it finds.

The ``BESTMATCH`` flag makes fuzzy matching search for the best match instead of the next match.

Notes on named capture groups
-----------------------------

All capture groups have a group number, starting from 1.

Groups with the same group name will have the same group number, and groups with a different group name will have a different group number.

The same name can be used by more than one group, with later captures 'overwriting' earlier captures. All of the captures of the group will be available from the ``captures`` method of the match object.

Group numbers will be reused across different branches of a branch reset, e.g. ``(?|(first)|(second))`` has only group 1. If capture groups have different group names then they will, of course, have different group numbers, e.g. ``(?|(?P<foo>first)|(?P<bar>second))`` has group 1 ("foo") and group 2 ("bar").

In the regex ``(\s+)(?|(?P<foo>[A-Z]+)|(\w+) (?P<foo>[0-9]+)`` there are 2 groups:

* ``(\s+)`` is group 1.

* ``(?P<foo>[A-Z]+)`` is group 2, also called "foo".

* ``(\w+)`` is group 2 because of the branch reset.

* ``(?P<foo>[0-9]+)`` is group 2 because it's called "foo".

If you want to prevent ``(\w+)`` from being group 2, you need to name it (different name, different group number).

Multithreading
--------------

The regex module releases the GIL during matching on instances of the built-in (immutable) string classes, enabling other Python threads to run concurrently. It is also possible to force the regex module to release the GIL during matching by calling the matching methods with the keyword argument ``concurrent=True``. The behaviour is undefined if the string changes during matching, so use it *only* when it is guaranteed that this won't happen.
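As a rough sketch of the ``concurrent=True`` keyword described in the multithreading section, the following runs a compiled pattern from several threads. The helper ``count_words``, the sample texts and the pool size are hypothetical, chosen only for illustration; the strings are ordinary immutable ``str`` objects, so they cannot change during matching.

```python
import regex
from concurrent.futures import ThreadPoolExecutor

WORD = regex.compile(r"\w+")

def count_words(text):
    # concurrent=True asks the regex module to release the GIL while matching.
    # This is safe here because str objects are immutable.
    return len(WORD.findall(text, concurrent=True))

texts = ["one two three", "four five", "six"]

with ThreadPoolExecutor(max_workers=3) as pool:
    counts = list(pool.map(count_words, texts))

print(counts)
```

For built-in string types the keyword is not strictly needed, since the module already releases the GIL for them; it matters mainly for other buffer-like objects that you know will not be modified.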
Building for 64-bits -------------------- If the source files are built for a 64-bit target then the string positions will also be 64-bit. Unicode ------- This module supports Unicode 8.0. Full Unicode case-folding is supported. Additional features ------------------- The issue numbers relate to the Python bug tracker, except where listed as "Hg issue". * Added support for lookaround in conditional pattern (Hg issue 163) The test of a conditional pattern can now be a lookaround. Examples: .. sourcecode:: python >>> regex.match(r'(?(?=\d)\d+|\w+)', '123abc') >>> regex.match(r'(?(?=\d)\d+|\w+)', 'abc123') This is not quite the same as putting a lookaround in the first branch of a pair of alternatives. Examples: .. sourcecode:: python >>> print(regex.match(r'(?:(?=\d)\d+\b|\w+)', '123abc')) >>> print(regex.match(r'(?(?=\d)\d+\b|\w+)', '123abc')) None In the first example, the lookaround matched, but the remainder of the first branch failed to match, and so the second branch was attempted, whereas in the second example, the lookaround matched, and the first branch failed to match, but the second branch was **not** attempted. * Added POSIX matching (leftmost longest) (Hg issue 150) The POSIX standard for regex is to return the leftmost longest match. This can be turned on using the ``POSIX`` flag (``(?p)``). Examples: .. sourcecode:: python >>> # Normal matching. >>> regex.search(r'Mr|Mrs', 'Mrs') >>> regex.search(r'one(self)?(selfsufficient)?', 'oneselfsufficient') >>> # POSIX matching. >>> regex.search(r'(?p)Mr|Mrs', 'Mrs') >>> regex.search(r'(?p)one(self)?(selfsufficient)?', 'oneselfsufficient') Note that it will take longer to find matches because when it finds a match at a certain position, it won't return that immediately, but will keep looking to see if there's another longer match there. * Added ``(?(DEFINE)...)`` (Hg issue 152) If there's no group called "DEFINE", then ... will be ignored, but any group definitions within it will be available. Examples: .. 
sourcecode:: python >>> regex.search(r'(?(DEFINE)(?P\d+)(?P\w+))(?&quant) (?&item)', '5 elephants') * Added ``(*PRUNE)``, ``(*SKIP)`` and ``(*FAIL)`` (Hg issue 153) ``(*PRUNE)`` discards the backtracking info up to that point. When used in an atomic group or a lookaround, it won't affect the enclosing pattern. ``(*SKIP)`` is similar to ``(*PRUNE)``, except that it also sets where in the text the next attempt to match will start. When used in an atomic group or a lookaround, it won't affect the enclosing pattern. ``(*FAIL)`` causes immediate backtracking. ``(*F)`` is a permitted abbreviation. * Added ``\K`` (Hg issue 151) Keeps the part of the entire match after the position where ``\K`` occurred; the part before it is discarded. It does not affect what capture groups return. Examples: .. sourcecode:: python >>> m = regex.search(r'(\w\w\K\w\w\w)', 'abcdef') >>> m[0] 'cde' >>> m[1] 'abcde' >>> >>> m = regex.search(r'(?r)(\w\w\K\w\w\w)', 'abcdef') >>> m[0] 'bc' >>> m[1] 'bcdef' * Added capture subscripting for ``expandf`` and ``subf``/``subfn`` (Hg issue 133) **(Python 2.6 and above)** You can now use subscripting to get the captures of a repeated capture group. Examples: .. sourcecode:: python >>> m = regex.match(r"(\w)+", "abc") >>> m.expandf("{1}") 'c' >>> m.expandf("{1[0]} {1[1]} {1[2]}") 'a b c' >>> m.expandf("{1[-1]} {1[-2]} {1[-3]}") 'c b a' >>> >>> m = regex.match(r"(?P\w)+", "abc") >>> m.expandf("{letter}") 'c' >>> m.expandf("{letter[0]} {letter[1]} {letter[2]}") 'a b c' >>> m.expandf("{letter[-1]} {letter[-2]} {letter[-3]}") 'c b a' * Added support for referring to a group by number using ``(?P=...)``. This is in addition to the existing ``\g<...>``. * Fixed the handling of locale-sensitive regexes. The ``LOCALE`` flag is intended for legacy code and has limited support. You're still recommended to use Unicode instead. 
* Added partial matches (Hg issue 102) A partial match is one that matches up to the end of string, but that string has been truncated and you want to know whether a complete match could be possible if the string had not been truncated. Partial matches are supported by ``match``, ``search``, ``fullmatch`` and ``finditer`` with the ``partial`` keyword argument. Match objects have a ``partial`` attribute, which is ``True`` if it's a partial match. For example, if you wanted a user to enter a 4-digit number and check it character by character as it was being entered: .. sourcecode:: python >>> pattern = regex.compile(r'\d{4}') >>> # Initially, nothing has been entered: >>> print(pattern.fullmatch('', partial=True)) >>> # An empty string is OK, but it's only a partial match. >>> # The user enters a letter: >>> print(pattern.fullmatch('a', partial=True)) None >>> # It'll never match. >>> # The user deletes that and enters a digit: >>> print(pattern.fullmatch('1', partial=True)) >>> # It matches this far, but it's only a partial match. >>> # The user enters 2 more digits: >>> print(pattern.fullmatch('123', partial=True)) >>> # It matches this far, but it's only a partial match. >>> # The user enters another digit: >>> print(pattern.fullmatch('1234', partial=True)) >>> # It's a complete match. >>> # If the user enters another digit: >>> print(pattern.fullmatch('12345', partial=True)) None >>> # It's no longer a match. >>> # This is a partial match: >>> pattern.match('123', partial=True).partial True >>> # This is a complete match: >>> pattern.match('1233', partial=True).partial False * ``*`` operator not working correctly with sub() (Hg issue 106) Sometimes it's not clear how zero-width matches should be handled. For example, should ``.*`` match 0 characters directly after matching >0 characters? Most regex implementations follow the lead of Perl (PCRE), but the re module sometimes doesn't. 
The Perl behaviour appears to be the most common (and the re module is sometimes definitely wrong), so in version 1 the regex module follows the Perl behaviour, whereas in version 0 it follows the legacy re behaviour. Examples: .. sourcecode:: python >>> # Version 0 behaviour (like re) >>> regex.sub('(?V0).*', 'x', 'test') 'x' >>> regex.sub('(?V0).*?', '|', 'test') '|t|e|s|t|' >>> # Version 1 behaviour (like Perl) >>> regex.sub('(?V1).*', 'x', 'test') 'xx' >>> regex.sub('(?V1).*?', '|', 'test') '|||||||||' * re.group() should never return a bytearray (issue #18468) For compatibility with the re module, the regex module returns all matching bytestrings as ``bytes``, starting from Python 3.4. Examples: .. sourcecode:: python >>> # Python 3.4 and later >>> regex.match(b'.', bytearray(b'a')).group() b'a' >>> # Python 3.1-3.3 >>> regex.match(b'.', bytearray(b'a')).group() bytearray(b'a') * Added ``capturesdict`` (Hg issue 86) ``capturesdict`` is a combination of ``groupdict`` and ``captures``: ``groupdict`` returns a dict of the named groups and the last capture of those groups. ``captures`` returns a list of all the captures of a group ``capturesdict`` returns a dict of the named groups and lists of all the captures of those groups. Examples: .. sourcecode:: python >>> m = regex.match(r"(?:(?P\w+) (?P\d+)\n)+", "one 1\ntwo 2\nthree 3\n") >>> m.groupdict() {'word': 'three', 'digits': '3'} >>> m.captures("word") ['one', 'two', 'three'] >>> m.captures("digits") ['1', '2', '3'] >>> m.capturesdict() {'word': ['one', 'two', 'three'], 'digits': ['1', '2', '3']} * Allow duplicate names of groups (Hg issue 87) Group names can now be duplicated. Examples: .. sourcecode:: python >>> # With optional groups: >>> >>> # Both groups capture, the second capture 'overwriting' the first. >>> m = regex.match(r"(?P\w+)? or (?P\w+)?", "first or second") >>> m.group("item") 'second' >>> m.captures("item") ['first', 'second'] >>> # Only the second group captures. 
>>> m = regex.match(r"(?P\w+)? or (?P\w+)?", " or second") >>> m.group("item") 'second' >>> m.captures("item") ['second'] >>> # Only the first group captures. >>> m = regex.match(r"(?P\w+)? or (?P\w+)?", "first or ") >>> m.group("item") 'first' >>> m.captures("item") ['first'] >>> >>> # With mandatory groups: >>> >>> # Both groups capture, the second capture 'overwriting' the first. >>> m = regex.match(r"(?P\w*) or (?P\w*)?", "first or second") >>> m.group("item") 'second' >>> m.captures("item") ['first', 'second'] >>> # Again, both groups capture, the second capture 'overwriting' the first. >>> m = regex.match(r"(?P\w*) or (?P\w*)", " or second") >>> m.group("item") 'second' >>> m.captures("item") ['', 'second'] >>> # And yet again, both groups capture, the second capture 'overwriting' the first. >>> m = regex.match(r"(?P\w*) or (?P\w*)", "first or ") >>> m.group("item") '' >>> m.captures("item") ['first', ''] * Added ``fullmatch`` (issue #16203) ``fullmatch`` behaves like ``match``, except that it must match all of the string. Examples: .. sourcecode:: python >>> print(regex.fullmatch(r"abc", "abc").span()) (0, 3) >>> print(regex.fullmatch(r"abc", "abcx")) None >>> print(regex.fullmatch(r"abc", "abcx", endpos=3).span()) (0, 3) >>> print(regex.fullmatch(r"abc", "xabcy", pos=1, endpos=4).span()) (1, 4) >>> >>> regex.match(r"a.*?", "abcd").group(0) 'a' >>> regex.fullmatch(r"a.*?", "abcd").group(0) 'abcd' * Added ``subf`` and ``subfn`` **(Python 2.6 and above)** ``subf`` and ``subfn`` are alternatives to ``sub`` and ``subn`` respectively. When passed a replacement string, they treat it as a format string. Examples: .. sourcecode:: python >>> regex.subf(r"(\w+) (\w+)", "{0} => {2} {1}", "foo bar") 'foo bar => bar foo' >>> regex.subf(r"(?P\w+) (?P\w+)", "{word2} {word1}", "foo bar") 'bar foo' * Added ``expandf`` to match object **(Python 2.6 and above)** ``expandf`` is an alternative to ``expand``. When passed a replacement string, it treats it as a format string. 
Examples: .. sourcecode:: python >>> m = regex.match(r"(\w+) (\w+)", "foo bar") >>> m.expandf("{0} => {2} {1}") 'foo bar => bar foo' >>> >>> m = regex.match(r"(?P\w+) (?P\w+)", "foo bar") >>> m.expandf("{word2} {word1}") 'bar foo' * Detach searched string A match object contains a reference to the string that was searched, via its ``string`` attribute. The match object now has a ``detach_string`` method that will 'detach' that string, making it available for garbage collection (this might save valuable memory if that string is very large). Example: .. sourcecode:: python >>> m = regex.search(r"\w+", "Hello world") >>> print(m.group()) Hello >>> print(m.string) Hello world >>> m.detach_string() >>> print(m.group()) Hello >>> print(m.string) None * Characters in a group name (issue #14462) A group name can now contain the same characters as an identifier. These are different in Python 2 and Python 3. * Recursive patterns (Hg issue 27) Recursive and repeated patterns are supported. ``(?R)`` or ``(?0)`` tries to match the entire regex recursively. ``(?1)``, ``(?2)``, etc, try to match the relevant capture group. ``(?&name)`` tries to match the named capture group. Examples: .. sourcecode:: python >>> regex.match(r"(Tarzan|Jane) loves (?1)", "Tarzan loves Jane").groups() ('Tarzan',) >>> regex.match(r"(Tarzan|Jane) loves (?1)", "Jane loves Tarzan").groups() ('Jane',) >>> m = regex.search(r"(\w)(?:(?R)|(\w?))\1", "kayak") >>> m.group(0, 1, 2) ('kayak', 'k', None) The first two examples show how the subpattern within the capture group is reused, but is _not_ itself a capture group. In other words, ``"(Tarzan|Jane) loves (?1)"`` is equivalent to ``"(Tarzan|Jane) loves (?:Tarzan|Jane)"``. It's possible to backtrack into a recursed or repeated group. You can't call a group if there is more than one group with that group name or group number (``"ambiguous group reference"``). 
  For example, ``(?P<foo>\w+) (?P<foo>\w+) (?&foo)?`` has 2 groups called "foo" (both group 1) and ``(?|([A-Z]+)|([0-9]+)) (?1)?`` has 2 groups with group number 1.

  The alternative forms ``(?P>name)`` and ``(?P&name)`` are also supported.

* repr(regex) doesn't include actual regex (issue #13592)

  The repr of a compiled regex is now in the form of an eval-able string. For example:

  .. sourcecode:: python

    >>> r = regex.compile("foo", regex.I)
    >>> repr(r)
    "regex.Regex('foo', flags=regex.I | regex.V0)"
    >>> r
    regex.Regex('foo', flags=regex.I | regex.V0)

  The regex module has Regex as an alias for the 'compile' function.

* Improve the repr for regular expression match objects (issue #17087)

  The repr of a match object is now in a more useful form. For example:

  .. sourcecode:: python

    >>> regex.search(r"\d+", "abc012def")
    <regex.Match object; span=(3, 6), match='012'>

* Python lib re cannot handle Unicode properly due to narrow/wide bug (issue #12729)

  The source code of the regex module has been updated to support PEP 393 ("Flexible String Representation"), which is new in Python 3.3.

* Full Unicode case-folding is supported.

  In version 1 behaviour, the regex module uses full case-folding when performing case-insensitive matches in Unicode.

  Examples (in Python 3):

  .. sourcecode:: python

    >>> regex.match(r"(?iV1)strasse", "stra\N{LATIN SMALL LETTER SHARP S}e").span()
    (0, 6)
    >>> regex.match(r"(?iV1)stra\N{LATIN SMALL LETTER SHARP S}e", "STRASSE").span()
    (0, 7)

  In version 0 behaviour, it uses simple case-folding for backward compatibility with the re module.

* Approximate "fuzzy" matching (Hg issue 12, Hg issue 41, Hg issue 109)

  Regex usually attempts an exact match, but sometimes an approximate, or "fuzzy", match is needed, for those cases where the text being searched may contain errors in the form of inserted, deleted or substituted characters.

  A fuzzy regex specifies which types of errors are permitted, and, optionally, either the minimum and maximum or only the maximum permitted number of each type. (You cannot specify only a minimum.)
The 3 types of error are: * Insertion, indicated by "i" * Deletion, indicated by "d" * Substitution, indicated by "s" In addition, "e" indicates any type of error. The fuzziness of a regex item is specified between "{" and "}" after the item. Examples: * ``foo`` match "foo" exactly * ``(?:foo){i}`` match "foo", permitting insertions * ``(?:foo){d}`` match "foo", permitting deletions * ``(?:foo){s}`` match "foo", permitting substitutions * ``(?:foo){i,s}`` match "foo", permitting insertions and substitutions * ``(?:foo){e}`` match "foo", permitting errors If a certain type of error is specified, then any type not specified will **not** be permitted. In the following examples I'll omit the item and write only the fuzziness: * ``{i<=3}`` permit at most 3 insertions, but no other types * ``{d<=3}`` permit at most 3 deletions, but no other types * ``{s<=3}`` permit at most 3 substitutions, but no other types * ``{i<=1,s<=2}`` permit at most 1 insertion and at most 2 substitutions, but no deletions * ``{e<=3}`` permit at most 3 errors * ``{1<=e<=3}`` permit at least 1 and at most 3 errors * ``{i<=2,d<=2,e<=3}`` permit at most 2 insertions, at most 2 deletions, at most 3 errors in total, but no substitutions It's also possible to state the costs of each type of error and the maximum permitted total cost. Examples: * ``{2i+2d+1s<=4}`` each insertion costs 2, each deletion costs 2, each substitution costs 1, the total cost must not exceed 4 * ``{i<=1,d<=1,s<=1,2i+2d+1s<=4}`` at most 1 insertion, at most 1 deletion, at most 1 substitution; each insertion costs 2, each deletion costs 2, each substitution costs 1, the total cost must not exceed 4 You can also use "<" instead of "<=" if you want an exclusive minimum or maximum: * ``{e<=3}`` permit up to 3 errors * ``{e<4}`` permit fewer than 4 errors * ``{0>> # A 'raw' fuzzy match: >>> regex.fullmatch(r"(?:cats|cat){e<=1}", "cat").fuzzy_counts (0, 0, 1) >>> # 0 substitutions, 0 insertions, 1 deletion. 
    >>> # A better match might be possible if the ENHANCEMATCH flag is used:
    >>> regex.fullmatch(r"(?e)(?:cats|cat){e<=1}", "cat").fuzzy_counts
    (0, 0, 0)
    >>> # 0 substitutions, 0 insertions, 0 deletions.

* Named lists (Hg issue 11)

  ``\L<name>``

  There are occasions where you may want to include a list (actually, a set) of options in a regex.

  One way is to build the pattern like this:

  .. sourcecode:: python

    >>> p = regex.compile(r"first|second|third|fourth|fifth")

  but if the list is large, parsing the resulting regex can take considerable time. Care must also be taken that the strings are properly escaped if they contain any character that has a special meaning in a regex, and that, if a shorter string occurs at the start of a longer string, the longer string is listed before the shorter one, for example, "cats" before "cat".

  The new alternative is to use a named list:

  .. sourcecode:: python

    >>> option_set = ["first", "second", "third", "fourth", "fifth"]
    >>> p = regex.compile(r"\L<options>", options=option_set)

  The order of the items is irrelevant; they are treated as a set. The named lists are available as the ``.named_lists`` attribute of the pattern object:

  .. sourcecode:: python

    >>> print(p.named_lists)
    {'options': frozenset({'second', 'fifth', 'fourth', 'third', 'first'})}

* Start and end of word

  ``\m`` matches at the start of a word.

  ``\M`` matches at the end of a word.

  Compare with ``\b``, which matches at the start or end of a word.

* Unicode line separators

  Normally the only line separator is ``\n`` (``\x0A``), but if the ``WORD`` flag is turned on then the line separators are the pair ``\x0D\x0A``, and ``\x0A``, ``\x0B``, ``\x0C`` and ``\x0D``, plus ``\x85``, ``\u2028`` and ``\u2029`` when working with Unicode.

  This affects the regex dot ``"."``, which, with the ``DOTALL`` flag turned off, matches any character except a line separator. It also affects the line anchors ``^`` and ``$`` (in multiline mode).
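The named-list and word-anchor entries above can be tried out with a short sketch like the one below. It assumes the ``regex`` module is installed; the option strings and the sample text are illustrative only.

```python
import regex

# Named list: the keyword argument name ("options") must match the name in \L<options>.
option_set = ["first", "second", "third"]
p = regex.compile(r"\L<options>", options=option_set)
found = p.findall("second, fifth, first")

# The list is stored as a frozenset, so item order does not matter.
stored = p.named_lists["options"]

# \m anchors at the start of a word, \M at the end of a word.
first_letters = regex.findall(r"\m\w", "cat dog")
last_letters = regex.findall(r"\w\M", "cat dog")

print(found, sorted(stored), first_letters, last_letters)
```

Passing the strings as a named list avoids having to escape them or order them by length yourself, which is the motivation given in the named-lists entry.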
* Set operators **Version 1 behaviour only** Set operators have been added, and a set ``[...]`` can include nested sets. The operators, in order of increasing precedence, are: * ``||`` for union ("x||y" means "x or y") * ``~~`` (double tilde) for symmetric difference ("x~~y" means "x or y, but not both") * ``&&`` for intersection ("x&&y" means "x and y") * ``--`` (double dash) for difference ("x--y" means "x but not y") Implicit union, ie, simple juxtaposition like in ``[ab]``, has the highest precedence. Thus, ``[ab&&cd]`` is the same as ``[[a||b]&&[c||d]]``. Examples: * ``[ab]`` # Set containing 'a' and 'b' * ``[a-z]`` # Set containing 'a' .. 'z' * ``[[a-z]--[qw]]`` # Set containing 'a' .. 'z', but not 'q' or 'w' * ``[a-z--qw]`` # Same as above * ``[\p{L}--QW]`` # Set containing all letters except 'Q' and 'W' * ``[\p{N}--[0-9]]`` # Set containing all numbers except '0' .. '9' * ``[\p{ASCII}&&\p{Letter}]`` # Set containing all characters which are ASCII and letter * regex.escape (issue #2650) regex.escape has an additional keyword parameter ``special_only``. When True, only 'special' regex characters, such as '?', are escaped. Examples: .. sourcecode:: python >>> regex.escape("foo!?") 'foo\\!\\?' >>> regex.escape("foo!?", special_only=True) 'foo!\\?' * Repeated captures (issue #7132) A match object has additional methods which return information on all the successful matches of a repeated capture group. These methods are: * ``matchobject.captures([group1, ...])`` * Returns a list of the strings matched in a group or groups. Compare with ``matchobject.group([group1, ...])``. * ``matchobject.starts([group])`` * Returns a list of the start positions. Compare with ``matchobject.start([group])``. * ``matchobject.ends([group])`` * Returns a list of the end positions. Compare with ``matchobject.end([group])``. * ``matchobject.spans([group])`` * Returns a list of the spans. Compare with ``matchobject.span([group])``. Examples: .. 
sourcecode:: python >>> m = regex.search(r"(\w{3})+", "123456789") >>> m.group(1) '789' >>> m.captures(1) ['123', '456', '789'] >>> m.start(1) 6 >>> m.starts(1) [0, 3, 6] >>> m.end(1) 9 >>> m.ends(1) [3, 6, 9] >>> m.span(1) (6, 9) >>> m.spans(1) [(0, 3), (3, 6), (6, 9)] * Atomic grouping (issue #433030) ``(?>...)`` If the following pattern subsequently fails, then the subpattern as a whole will fail. * Possessive quantifiers. ``(?:...)?+`` ; ``(?:...)*+`` ; ``(?:...)++`` ; ``(?:...){min,max}+`` The subpattern is matched up to 'max' times. If the following pattern subsequently fails, then all of the repeated subpatterns will fail as a whole. For example, ``(?:...)++`` is equivalent to ``(?>(?:...)+)``. * Scoped flags (issue #433028) ``(?flags-flags:...)`` The flags will apply only to the subpattern. Flags can be turned on or off. * Inline flags (issue #433024, issue #433027) ``(?flags-flags)`` Version 0 behaviour: the flags apply to the entire pattern, and they can't be turned off. Version 1 behaviour: the flags apply to the end of the group or pattern, and they can be turned on or off. * Repeated repeats (issue #2537) A regex like ``((x|y+)*)*`` will be accepted and will work correctly, but should complete more quickly. * Definition of 'word' character (issue #1693050) The definition of a 'word' character has been expanded for Unicode. It now conforms to the Unicode specification at ``http://www.unicode.org/reports/tr29/``. This applies to ``\w``, ``\W``, ``\b`` and ``\B``. * Groups in lookahead and lookbehind (issue #814253) Groups and group references are permitted in both lookahead and lookbehind. * Variable-length lookbehind A lookbehind can match a variable-length string. * Correct handling of charset with ignore case flag (issue #3511) Ranges within charsets are handled correctly when the ignore-case flag is turned on. * Unmatched group in replacement (issue #1519638) An unmatched group is treated as an empty string in a replacement template. 
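Two of the entries above, possessive quantifiers (with their atomic-group equivalent) and unmatched groups in replacement templates, can be checked with a minimal sketch. It assumes the ``regex`` module is installed; the patterns are small made-up cases rather than examples from the package.

```python
import regex

# A greedy quantifier can give back a character so that the final "a" matches.
greedy = regex.match(r"a+a", "aaa")

# A possessive quantifier keeps everything it matched, so the final "a" cannot match.
possessive = regex.match(r"a++a", "aaa")

# The atomic group form behaves the same way as the possessive quantifier.
atomic = regex.match(r"(?>a+)a", "aaa")

# Only one of the two groups takes part in each match; the unmatched one
# is treated as an empty string in the replacement template.
replaced = regex.sub(r"(a)|(b)", r"[\1\2]", "ab")

print(greedy.group(), possessive, atomic, replaced)
```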
* 'Pathological' patterns (issue #1566086, issue #1662581, issue #1448325, issue #1721518, issue #1297193)

  'Pathological' patterns should complete more quickly.

* Flags argument for regex.split, regex.sub and regex.subn (issue #3482)

  ``regex.split``, ``regex.sub`` and ``regex.subn`` support a 'flags' argument.

* Pos and endpos arguments for regex.sub and regex.subn

  ``regex.sub`` and ``regex.subn`` support 'pos' and 'endpos' arguments.

* 'Overlapped' argument for regex.findall and regex.finditer

  ``regex.findall`` and ``regex.finditer`` support an 'overlapped' flag which permits overlapped matches.

* Unicode escapes (issue #3665)

  The Unicode escapes ``\uxxxx`` and ``\Uxxxxxxxx`` are supported.

* Large patterns (issue #1160)

  Patterns can be much larger.

* Zero-width match with regex.finditer (issue #1647489)

  ``regex.finditer`` behaves correctly when it splits at a zero-width match.

* Zero-width split with regex.split (issue #3262)

  Version 0 behaviour: a string won't be split at a zero-width match.

  Version 1 behaviour: a string will be split at a zero-width match.

* Splititer

  ``regex.splititer`` has been added. It's a generator equivalent of ``regex.split``.

* Subscripting for groups

  A match object accepts access to the captured groups via subscripting and slicing:

  .. sourcecode:: python

    >>> m = regex.search(r"(?P<before>.*?)(?P<num>\d+)(?P<after>.*)", "pqr123stu")
    >>> print(m["before"])
    pqr
    >>> print(m["num"])
    123
    >>> print(m["after"])
    stu
    >>> print(len(m))
    4
    >>> print(m[:])
    ('pqr123stu', 'pqr', '123', 'stu')

* Named groups

  Groups can be named with ``(?<name>...)`` as well as the current ``(?P<name>...)``.

* Group references

  Groups can be referenced within a pattern with ``\g<name>``. This also allows there to be more than 99 groups.

* Named characters

  ``\N{name}``

  Named characters are supported. (Note: only those known by Python's Unicode database are supported.)
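The last few entries (subscripting, the ``(?<name>...)`` form, ``\g<name>`` references and ``\N{name}``) can be combined in one short sketch. It assumes the ``regex`` module is installed; the group names and test strings are illustrative only.

```python
import regex

# Named groups using the (?<name>...) form, read back by subscripting the match.
m = regex.search(r"(?<before>.*?)(?<num>\d+)(?<after>.*)", "pqr123stu")
parts = (m["before"], m["num"], m["after"])

# \g<ch> refers back to the group named "ch", so this finds a doubled letter.
doubled = regex.search(r"(?<ch>\w)\g<ch>", "hello")

# \N{...} inside the pattern names a character from Python's Unicode database.
sharp_s = regex.fullmatch(r"stra\N{LATIN SMALL LETTER SHARP S}e", "stra\u00dfe")

print(parts, doubled.group(), sharp_s is not None)
```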
* Unicode codepoint properties, including scripts and blocks ``\p{property=value}``; ``\P{property=value}``; ``\p{value}`` ; ``\P{value}`` Many Unicode properties are supported, including blocks and scripts. ``\p{property=value}`` or ``\p{property:value}`` matches a character whose property ``property`` has value ``value``. The inverse of ``\p{property=value}`` is ``\P{property=value}`` or ``\p{^property=value}``. If the short form ``\p{value}`` is used, the properties are checked in the order: ``General_Category``, ``Script``, ``Block``, binary property: * ``Latin``, the 'Latin' script (``Script=Latin``). * ``Cyrillic``, the 'Cyrillic' script (``Script=Cyrillic``). * ``BasicLatin``, the 'BasicLatin' block (``Block=BasicLatin``). * ``Alphabetic``, the 'Alphabetic' binary property (``Alphabetic=Yes``). A short form starting with ``Is`` indicates a script or binary property: * ``IsLatin``, the 'Latin' script (``Script=Latin``). * ``IsCyrillic``, the 'Cyrillic' script (``Script=Cyrillic``). * ``IsAlphabetic``, the 'Alphabetic' binary property (``Alphabetic=Yes``). A short form starting with ``In`` indicates a block property: * ``InBasicLatin``, the 'BasicLatin' block (``Block=BasicLatin``). * ``InCyrillic``, the 'Cyrillic' block (``Block=Cyrillic``). * POSIX character classes ``[[:alpha:]]``; ``[[:^alpha:]]`` POSIX character classes are supported. These are normally treated as an alternative form of ``\p{...}``. The exceptions are ``alnum``, ``digit``, ``punct`` and ``xdigit``, whose definitions are different from those of Unicode. ``[[:alnum:]]`` is equivalent to ``\p{posix_alnum}``. ``[[:digit:]]`` is equivalent to ``\p{posix_digit}``. ``[[:punct:]]`` is equivalent to ``\p{posix_punct}``. ``[[:xdigit:]]`` is equivalent to ``\p{posix_xdigit}``. * Search anchor ``\G`` A search anchor has been added. 
It matches at the position where each search started/continued and can be used for contiguous matches or in negative variable-length lookbehinds to limit how far back the lookbehind goes: .. sourcecode:: python >>> regex.findall(r"\w{2}", "abcd ef") ['ab', 'cd', 'ef'] >>> regex.findall(r"\G\w{2}", "abcd ef") ['ab', 'cd'] * The search starts at position 0 and matches 2 letters 'ab'. * The search continues at position 2 and matches 2 letters 'cd'. * The search continues at position 4 and fails to match any letters. * The anchor stops the search start position from being advanced, so there are no more results. * Reverse searching Searches can now work backwards: .. sourcecode:: python >>> regex.findall(r".", "abc") ['a', 'b', 'c'] >>> regex.findall(r"(?r).", "abc") ['c', 'b', 'a'] Note: the result of a reverse search is not necessarily the reverse of a forward search: .. sourcecode:: python >>> regex.findall(r"..", "abcde") ['ab', 'cd'] >>> regex.findall(r"(?r)..", "abcde") ['de', 'bc'] * Matching a single grapheme ``\X`` The grapheme matcher is supported. It now conforms to the Unicode specification at ``http://www.unicode.org/reports/tr29/``. * Branch reset ``(?|...|...)`` Capture group numbers will be reused across the alternatives, but groups with different names will have different group numbers. Examples: .. sourcecode:: python >>> regex.match(r"(?|(first)|(second))", "first").groups() ('first',) >>> regex.match(r"(?|(first)|(second))", "second").groups() ('second',) Note that there is only one group. * Default Unicode word boundary The ``WORD`` flag changes the definition of a 'word boundary' to that of a default Unicode word boundary. This applies to ``\b`` and ``\B``. * SRE engine do not release the GIL (issue #1366311) The regex module can release the GIL during matching (see the above section on multithreading). Iterators can be safely shared across threads. 
Platform: UNKNOWN Classifier: Development Status :: 5 - Production/Stable Classifier: Intended Audience :: Developers Classifier: License :: OSI Approved :: Python Software Foundation License Classifier: Operating System :: OS Independent Classifier: Programming Language :: Python :: 2.5 Classifier: Programming Language :: Python :: 2.6 Classifier: Programming Language :: Python :: 2.7 Classifier: Programming Language :: Python :: 3.1 Classifier: Programming Language :: Python :: 3.2 Classifier: Programming Language :: Python :: 3.3 Classifier: Programming Language :: Python :: 3.4 Classifier: Programming Language :: Python :: 3.5 Classifier: Topic :: Scientific/Engineering :: Information Analysis Classifier: Topic :: Software Development :: Libraries :: Python Modules Classifier: Topic :: Text Processing Classifier: Topic :: Text Processing :: General regex-2016.01.10/README0000666000000000000000000000076112540615514012235 0ustar 00000000000000regex is an alternative to the re package in the Python standard library. It is intended to act as a drop in replacement, and one day to replace re. regex is supported on Python v2.5 to v2.7 and v3.1 to v3.5. For a full list of features see Python2/Features.rst or Python3/Features.rst. To build and install regex for your default Python run python setup.py install To install regex for a specific version run setup.py with that interpreter, e.g. 
python3.4 setup.py install regex-2016.01.10/setup.py0000666000000000000000000000352612644552200013066 0ustar 00000000000000#!/usr/bin/env python import os import sys from distutils.core import setup, Extension MAJOR, MINOR = sys.version_info[:2] BASE_DIR = os.path.dirname(os.path.abspath(__file__)) PKG_BASE = 'Python%i' % MAJOR DOCS_DIR = os.path.join(BASE_DIR, 'docs') setup( name='regex', version='2016.01.10', description='Alternative regular expression module, to replace re.', long_description=open(os.path.join(DOCS_DIR, 'Features.rst')).read(), # PyPI does spam protection on email addresses, no need to do it here author='Matthew Barnett', author_email='regex@mrabarnett.plus.com', maintainer='Matthew Barnett', maintainer_email='regex@mrabarnett.plus.com', url='https://bitbucket.org/mrabarnett/mrab-regex', classifiers=[ 'Development Status :: 5 - Production/Stable', 'Intended Audience :: Developers', 'License :: OSI Approved :: Python Software Foundation License', 'Operating System :: OS Independent', 'Programming Language :: Python :: 2.5', 'Programming Language :: Python :: 2.6', 'Programming Language :: Python :: 2.7', 'Programming Language :: Python :: 3.1', 'Programming Language :: Python :: 3.2', 'Programming Language :: Python :: 3.3', 'Programming Language :: Python :: 3.4', 'Programming Language :: Python :: 3.5', 'Topic :: Scientific/Engineering :: Information Analysis', 'Topic :: Software Development :: Libraries :: Python Modules', 'Topic :: Text Processing', 'Topic :: Text Processing :: General', ], license='Python Software Foundation License', py_modules = ['regex', '_regex_core', 'test_regex'], package_dir={'': PKG_BASE}, ext_modules=[Extension('_regex', [os.path.join(PKG_BASE, '_regex.c'), os.path.join(PKG_BASE, '_regex_unicode.c')])], ) regex-2016.01.10/docs/0000777000000000000000000000000012644552200012276 5ustar 00000000000000regex-2016.01.10/docs/Features.html0000666000000000000000000034717612624411313014761 0ustar 00000000000000

Introduction

This new regex implementation is intended eventually to replace Python's current re module implementation.

For testing and comparison with the current 're' module the new implementation is in the form of a module called 'regex'.

Old vs new behaviour

This module has 2 behaviours:

  • Version 0 behaviour (old behaviour, compatible with the current re module):
    • Indicated by the VERSION0 or V0 flag, or (?V0) in the pattern.
    • Zero-width matches are handled like in the re module:
      • .split won't split a string at a zero-width match.
      • .sub will advance by one character after a zero-width match.
    • Inline flags apply to the entire pattern, and they can't be turned off.
    • Only simple sets are supported.
    • Case-insensitive matches in Unicode use simple case-folding by default.
  • Version 1 behaviour (new behaviour, different from the current re module):
    • Indicated by the VERSION1 or V1 flag, or (?V1) in the pattern.
    • Zero-width matches are handled like in Perl and PCRE:
      • .split will split a string at a zero-width match.
      • .sub will handle zero-width matches correctly.
    • Inline flags apply to the end of the group or pattern, and they can be turned off.
    • Nested sets and set operations are supported.
    • Case-insensitive matches in Unicode use full case-folding by default.

If no version is specified, the regex module will default to regex.DEFAULT_VERSION. In the short term this will be VERSION0, but in the longer term it will be VERSION1.

Case-insensitive matches in Unicode

The regex module supports both simple and full case-folding for case-insensitive matches in Unicode. Use of full case-folding can be turned on using the FULLCASE or F flag, or (?f) in the pattern. Please note that this flag affects how the IGNORECASE flag works; the FULLCASE flag itself does not turn on case-insensitive matching.

In the version 0 behaviour, the flag is off by default.

In the version 1 behaviour, the flag is on by default.
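
The effect of these defaults can be checked with a short sketch. It assumes Python 3 and that the regex module is installed; the variable names are only illustrative.

```python
import regex

text = "stra\u00dfe"  # "strasse" spelled with LATIN SMALL LETTER SHARP S

# Version 0 with IGNORECASE only: simple case-folding, so the sharp s
# is not treated as equivalent to "ss".
simple = regex.fullmatch(r"strasse", text, flags=regex.I | regex.V0)

# Version 0 with FULLCASE added: full case-folding is used.
full_v0 = regex.fullmatch(r"strasse", text, flags=regex.I | regex.F | regex.V0)

# Version 1 with IGNORECASE only: full case-folding is on by default.
full_v1 = regex.fullmatch(r"strasse", text, flags=regex.I | regex.V1)

print(simple, full_v0, full_v1)
```

Note that FULLCASE is combined with IGNORECASE here; on its own it would not make the match case-insensitive.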

Nested sets and set operations

It's not possible to support both simple sets, as used in the re module, and nested sets at the same time because of a difference in the meaning of an unescaped "[" in a set.

For example, the pattern [[a-z]--[aeiou]] is treated in the version 0 behaviour (simple sets, compatible with the re module) as:

  • Set containing "[" and the letters "a" to "z"
  • Literal "--"
  • Set containing letters "a", "e", "i", "o", "u"

but in the version 1 behaviour (nested sets, enhanced behaviour) as:

  • Set which is:
    • Set containing the letters "a" to "z"
  • but excluding:
    • Set containing the letters "a", "e", "i", "o", "u"

Version 0 behaviour: only simple sets are supported.

Version 1 behaviour: nested sets and set operations are supported.
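
As a small illustration of the version 1 reading of that pattern (a sketch, assuming the regex module is installed; the sample text is arbitrary):

```python
import regex

# Version 1: a nested set with the difference operator,
# i.e. the lowercase letters minus the vowels.
consonants = regex.findall(r"(?V1)[[a-z]--[aeiou]]", "regex")
print(consonants)
```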

Flags

There are 2 kinds of flag: scoped and global. Scoped flags can apply to only part of a pattern and can be turned on or off; global flags apply to the entire pattern and can only be turned on.

The scoped flags are: FULLCASE, IGNORECASE, MULTILINE, DOTALL, VERBOSE, WORD.

The global flags are: ASCII, BESTMATCH, ENHANCEMATCH, LOCALE, POSIX, REVERSE, UNICODE, VERSION0, VERSION1.

If neither the ASCII, LOCALE nor UNICODE flag is specified, it will default to UNICODE if the regex pattern is a Unicode string and ASCII if it's a bytestring.
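
A minimal sketch of that default, assuming Python 3 (where a str pattern is a Unicode string); the sample text is arbitrary:

```python
import regex

text = "na\u00efve caf\u00e9"  # two words containing accented letters

# A str pattern defaults to UNICODE, so \w also matches accented letters.
unicode_words = regex.findall(r"\w+", text)

# With ASCII turned on, \w is limited to ASCII letters, digits and "_".
ascii_words = regex.findall(r"\w+", text, flags=regex.ASCII)

print(unicode_words)
print(ascii_words)
```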

The ENHANCEMATCH flag makes fuzzy matching attempt to improve the fit of the next match that it finds.

The BESTMATCH flag makes fuzzy matching search for the best match instead of the next match.

Notes on named capture groups

All capture groups have a group number, starting from 1.

Groups with the same group name will have the same group number, and groups with a different group name will have a different group number.

The same name can be used by more than one group, with later captures 'overwriting' earlier captures. All of the captures of the group will be available from the captures method of the match object.

Group numbers will be reused across different branches of a branch reset, e.g. (?|(first)|(second)) has only group 1. If capture groups have different group names then they will, of course, have different group numbers, e.g. (?|(?P<foo>first)|(?P<bar>second)) has group 1 ("foo") and group 2 ("bar").

In the regex (\s+)(?|(?P<foo>[A-Z]+)|(\w+) (?P<foo>[0-9]+)) there are 2 groups:

  • (\s+) is group 1.
  • (?P<foo>[A-Z]+) is group 2, also called "foo".
  • (\w+) is group 2 because of the branch reset.
  • (?P<foo>[0-9]+) is group 2 because it's called "foo".

If you want to prevent (\w+) from being group 2, you need to name it (different name, different group number).
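
The numbering rules for a branch reset can be inspected on the compiled pattern. This is a sketch that assumes the regex module is installed and uses the pattern attributes groups and groupindex:

```python
import regex

unnamed = regex.compile(r"(?|(first)|(second))")
named = regex.compile(r"(?|(?P<foo>first)|(?P<bar>second))")

# Unnamed groups in different branches share group number 1.
print(unnamed.groups)

# Differently named groups get different numbers, even in a branch reset.
name_to_number = dict(named.groupindex)
print(name_to_number)
```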

Multithreading

The regex module releases the GIL during matching on instances of the built-in (immutable) string classes, enabling other Python threads to run concurrently. It is also possible to force the regex module to release the GIL during matching by calling the matching methods with the keyword argument concurrent=True. The behaviour is undefined if the string changes during matching, so use it only when it is guaranteed that that won't happen.
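
A minimal sketch of passing concurrent=True from worker threads follows. It assumes Python 3 (for concurrent.futures); the helper first_number and the sample texts are hypothetical.

```python
import regex
from concurrent.futures import ThreadPoolExecutor

pattern = regex.compile(r"\d+")
texts = ["order 123", "no digits here", "id 4567"]

def first_number(text):
    # concurrent=True asks the module to release the GIL while matching.
    # str objects are immutable, so the text cannot change during the match.
    m = pattern.search(text, concurrent=True)
    return m.group() if m else None

with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(first_number, texts))

print(results)
```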

Building for 64-bits

If the source files are built for a 64-bit target then the string positions will also be 64-bit.

Unicode

This module supports Unicode 8.0.

Full Unicode case-folding is supported.

Additional features

The issue numbers relate to the Python bug tracker, except where listed as "Hg issue".

  • Added support for lookaround in conditional pattern (Hg issue 163)

    The test of a conditional pattern can now be a lookaround.

    Examples:

    >>> regex.match(r'(?(?=\d)\d+|\w+)', '123abc')
    <regex.Match object; span=(0, 3), match='123'>
    >>> regex.match(r'(?(?=\d)\d+|\w+)', 'abc123')
    <regex.Match object; span=(0, 6), match='abc123'>
    

    This is not quite the same as putting a lookaround in the first branch of a pair of alternatives.

    Examples:

    >>> print(regex.match(r'(?:(?=\d)\d+\b|\w+)', '123abc'))
    <regex.Match object; span=(0, 6), match='123abc'>
    >>> print(regex.match(r'(?(?=\d)\d+\b|\w+)', '123abc'))
    None
    

    In the first example, the lookaround matched but the remainder of the first branch failed to match, so the second branch was attempted. In the second example, the lookaround matched and the first branch failed to match, but the second branch was not attempted, because a conditional commits to the branch selected by its test.

  • Added POSIX matching (leftmost longest) (Hg issue 150)

    The POSIX standard for regex is to return the leftmost longest match. This can be turned on using the POSIX flag ((?p)).

    Examples:

    >>> # Normal matching.
    >>> regex.search(r'Mr|Mrs', 'Mrs')
    <regex.Match object; span=(0, 2), match='Mr'>
    >>> regex.search(r'one(self)?(selfsufficient)?', 'oneselfsufficient')
    <regex.Match object; span=(0, 7), match='oneself'>
    >>> # POSIX matching.
    >>> regex.search(r'(?p)Mr|Mrs', 'Mrs')
    <regex.Match object; span=(0, 3), match='Mrs'>
    >>> regex.search(r'(?p)one(self)?(selfsufficient)?', 'oneselfsufficient')
    <regex.Match object; span=(0, 17), match='oneselfsufficient'>
    

    Note that it will take longer to find matches because when it finds a match at a certain position, it won't return that immediately, but will keep looking to see if there's another longer match there.

  • Added (?(DEFINE)...) (Hg issue 152)

    If there's no group called "DEFINE", then ... will be ignored, but any group definitions within it will be available.

    Examples:

    >>> regex.search(r'(?(DEFINE)(?P<quant>\d+)(?P<item>\w+))(?&quant) (?&item)', '5 elephants')
    <regex.Match object; span=(0, 11), match='5 elephants'>
    
  • Added (*PRUNE), (*SKIP) and (*FAIL) (Hg issue 153)

    (*PRUNE) discards the backtracking info up to that point. When used in an atomic group or a lookaround, it won't affect the enclosing pattern.

    (*SKIP) is similar to (*PRUNE), except that it also sets where in the text the next attempt to match will start. When used in an atomic group or a lookaround, it won't affect the enclosing pattern.

    (*FAIL) causes immediate backtracking. (*F) is a permitted abbreviation.
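
A small sketch of (*FAIL) and its abbreviation, assuming the regex module is installed. The pattern is only illustrative: the first alternative can never succeed, so the match has to come from the second one.

```python
import regex

# The first alternative always backtracks at (*FAIL), so only "b" can match.
m = regex.search(r"a(*FAIL)|b", "ab")

# (*F) is the abbreviated form.
m_short = regex.search(r"a(*F)|b", "ab")

print(m.span(), m_short.group())
```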

  • Added \K (Hg issue 151)

    Keeps the part of the entire match after the position where \K occurred; the part before it is discarded.

    It does not affect what capture groups return.

    Examples:

    >>> m = regex.search(r'(\w\w\K\w\w\w)', 'abcdef')
    >>> m[0]
    'cde'
    >>> m[1]
    'abcde'
    >>>
    >>> m = regex.search(r'(?r)(\w\w\K\w\w\w)', 'abcdef')
    >>> m[0]
    'bc'
    >>> m[1]
    'bcdef'
    
  • Added capture subscripting for expandf and subf/subfn (Hg issue 133) (Python 2.6 and above)

    You can now use subscripting to get the captures of a repeated capture group.

    Examples:

    >>> m = regex.match(r"(\w)+", "abc")
    >>> m.expandf("{1}")
    'c'
    >>> m.expandf("{1[0]} {1[1]} {1[2]}")
    'a b c'
    >>> m.expandf("{1[-1]} {1[-2]} {1[-3]}")
    'c b a'
    >>>
    >>> m = regex.match(r"(?P<letter>\w)+", "abc")
    >>> m.expandf("{letter}")
    'c'
    >>> m.expandf("{letter[0]} {letter[1]} {letter[2]}")
    'a b c'
    >>> m.expandf("{letter[-1]} {letter[-2]} {letter[-3]}")
    'c b a'
    
  • Added support for referring to a group by number using (?P=...).

    This is in addition to the existing \g<...>.
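
A sketch of both forms used as backreferences to group 1, assuming the regex module is installed; the sample text is arbitrary:

```python
import regex

# Find a doubled letter, referring to group 1 by number in two ways.
by_p_equals = regex.search(r"(\w)(?P=1)", "hello")
by_g = regex.search(r"(\w)\g<1>", "hello")

print(by_p_equals.group(), by_g.group())
```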

  • Fixed the handling of locale-sensitive regexes.

    The LOCALE flag is intended for legacy code and has limited support. Using Unicode instead is still recommended.

  • Added partial matches (Hg issue 102)

    A partial match is one that matches up to the end of the string, where the string has been truncated and you want to know whether a complete match would be possible if it had not been truncated.

    Partial matches are supported by match, search, fullmatch and finditer with the partial keyword argument.

    Match objects have a partial attribute, which is True if it's a partial match.

    For example, if you wanted a user to enter a 4-digit number and check it character by character as it was being entered:

    >>> pattern = regex.compile(r'\d{4}')
    
    >>> # Initially, nothing has been entered:
    >>> print(pattern.fullmatch('', partial=True))
    <regex.Match object; span=(0, 0), match='', partial=True>
    
    >>> # An empty string is OK, but it's only a partial match.
    >>> # The user enters a letter:
    >>> print(pattern.fullmatch('a', partial=True))
    None
    >>> # It'll never match.
    
    >>> # The user deletes that and enters a digit:
    >>> print(pattern.fullmatch('1', partial=True))
    <regex.Match object; span=(0, 1), match='1', partial=True>
    >>> # It matches this far, but it's only a partial match.
    
    >>> # The user enters 2 more digits:
    >>> print(pattern.fullmatch('123', partial=True))
    <regex.Match object; span=(0, 3), match='123', partial=True>
    >>> # It matches this far, but it's only a partial match.
    
    >>> # The user enters another digit:
    >>> print(pattern.fullmatch('1234', partial=True))
    <regex.Match object; span=(0, 4), match='1234'>
    >>> # It's a complete match.
    
    >>> # If the user enters another digit:
    >>> print(pattern.fullmatch('12345', partial=True))
    None
    >>> # It's no longer a match.
    
    >>> # This is a partial match:
    >>> pattern.match('123', partial=True).partial
    True
    
    >>> # This is a complete match:
    >>> pattern.match('1233', partial=True).partial
    False
    
  • * operator not working correctly with sub() (Hg issue 106)

    Sometimes it's not clear how zero-width matches should be handled. For example, should .* match 0 characters directly after matching >0 characters?

    Most regex implementations follow the lead of Perl (PCRE), but the re module sometimes doesn't. The Perl behaviour appears to be the most common (and the re module is sometimes definitely wrong), so in version 1 the regex module follows the Perl behaviour, whereas in version 0 it follows the legacy re behaviour.

    Examples:

    >>> # Version 0 behaviour (like re)
    >>> regex.sub('(?V0).*', 'x', 'test')
    'x'
    >>> regex.sub('(?V0).*?', '|', 'test')
    '|t|e|s|t|'
    
    >>> # Version 1 behaviour (like Perl)
    >>> regex.sub('(?V1).*', 'x', 'test')
    'xx'
    >>> regex.sub('(?V1).*?', '|', 'test')
    '|||||||||'
    
  • re.group() should never return a bytearray (issue #18468)

    For compatibility with the re module, the regex module returns all matching bytestrings as bytes, starting from Python 3.4.

    Examples:

    >>> # Python 3.4 and later
    >>> regex.match(b'.', bytearray(b'a')).group()
    b'a'
    
    >>> # Python 3.1-3.3
    >>> regex.match(b'.', bytearray(b'a')).group()
    bytearray(b'a')
    
  • Added capturesdict (Hg issue 86)

    capturesdict is a combination of groupdict and captures:

    groupdict returns a dict of the named groups and the last capture of those groups.

    captures returns a list of all the captures of a group.

    capturesdict returns a dict of the named groups and lists of all the captures of those groups.

    Examples:

    >>> m = regex.match(r"(?:(?P<word>\w+) (?P<digits>\d+)\n)+", "one 1\ntwo 2\nthree 3\n")
    >>> m.groupdict()
    {'word': 'three', 'digits': '3'}
    >>> m.captures("word")
    ['one', 'two', 'three']
    >>> m.captures("digits")
    ['1', '2', '3']
    >>> m.capturesdict()
    {'word': ['one', 'two', 'three'], 'digits': ['1', '2', '3']}
    
  • Allow duplicate names of groups (Hg issue 87)

    Group names can now be duplicated.

    Examples:

    >>> # With optional groups:
    >>>
    >>> # Both groups capture, the second capture 'overwriting' the first.
    >>> m = regex.match(r"(?P<item>\w+)? or (?P<item>\w+)?", "first or second")
    >>> m.group("item")
    'second'
    >>> m.captures("item")
    ['first', 'second']
    >>> # Only the second group captures.
    >>> m = regex.match(r"(?P<item>\w+)? or (?P<item>\w+)?", " or second")
    >>> m.group("item")
    'second'
    >>> m.captures("item")
    ['second']
    >>> # Only the first group captures.
    >>> m = regex.match(r"(?P<item>\w+)? or (?P<item>\w+)?", "first or ")
    >>> m.group("item")
    'first'
    >>> m.captures("item")
    ['first']
    >>>
    >>> # With mandatory groups:
    >>>
    >>> # Both groups capture, the second capture 'overwriting' the first.
    >>> m = regex.match(r"(?P<item>\w*) or (?P<item>\w*)", "first or second")
    >>> m.group("item")
    'second'
    >>> m.captures("item")
    ['first', 'second']
    >>> # Again, both groups capture, the second capture 'overwriting' the first.
    >>> m = regex.match(r"(?P<item>\w*) or (?P<item>\w*)", " or second")
    >>> m.group("item")
    'second'
    >>> m.captures("item")
    ['', 'second']
    >>> # And yet again, both groups capture, the second capture 'overwriting' the first.
    >>> m = regex.match(r"(?P<item>\w*) or (?P<item>\w*)", "first or ")
    >>> m.group("item")
    ''
    >>> m.captures("item")
    ['first', '']
    
  • Added fullmatch (issue #16203)

    fullmatch behaves like match, except that it must match all of the string.

    Examples:

    >>> print(regex.fullmatch(r"abc", "abc").span())
    (0, 3)
    >>> print(regex.fullmatch(r"abc", "abcx"))
    None
    >>> print(regex.fullmatch(r"abc", "abcx", endpos=3).span())
    (0, 3)
    >>> print(regex.fullmatch(r"abc", "xabcy", pos=1, endpos=4).span())
    (1, 4)
    >>>
    >>> regex.match(r"a.*?", "abcd").group(0)
    'a'
    >>> regex.fullmatch(r"a.*?", "abcd").group(0)
    'abcd'
    
  • Added subf and subfn (Python 2.6 and above)

    subf and subfn are alternatives to sub and subn respectively. When passed a replacement string, they treat it as a format string.

    Examples:

    >>> regex.subf(r"(\w+) (\w+)", "{0} => {2} {1}", "foo bar")
    'foo bar => bar foo'
    >>> regex.subf(r"(?P<word1>\w+) (?P<word2>\w+)", "{word2} {word1}", "foo bar")
    'bar foo'
    
  • Added expandf to match object (Python 2.6 and above)

    expandf is an alternative to expand. When passed a replacement string, it treats it as a format string.

    Examples:

    >>> m = regex.match(r"(\w+) (\w+)", "foo bar")
    >>> m.expandf("{0} => {2} {1}")
    'foo bar => bar foo'
    >>>
    >>> m = regex.match(r"(?P<word1>\w+) (?P<word2>\w+)", "foo bar")
    >>> m.expandf("{word2} {word1}")
    'bar foo'
    
  • Detach searched string

    A match object contains a reference to the string that was searched, via its string attribute. The match object now has a detach_string method that will 'detach' that string, making it available for garbage collection (this might save valuable memory if that string is very large).

    Example:

    >>> m = regex.search(r"\w+", "Hello world")
    >>> print(m.group())
    Hello
    >>> print(m.string)
    Hello world
    >>> m.detach_string()
    >>> print(m.group())
    Hello
    >>> print(m.string)
    None
    
  • Characters in a group name (issue #14462)

    A group name can now contain the same characters as an identifier. These are different in Python 2 and Python 3.

  • Recursive patterns (Hg issue 27)

    Recursive and repeated patterns are supported.

    (?R) or (?0) tries to match the entire regex recursively. (?1), (?2), etc, try to match the relevant capture group.

    (?&name) tries to match the named capture group.

    Examples:

    >>> regex.match(r"(Tarzan|Jane) loves (?1)", "Tarzan loves Jane").groups()
    ('Tarzan',)
    >>> regex.match(r"(Tarzan|Jane) loves (?1)", "Jane loves Tarzan").groups()
    ('Jane',)
    
    >>> m = regex.search(r"(\w)(?:(?R)|(\w?))\1", "kayak")
    >>> m.group(0, 1, 2)
    ('kayak', 'k', None)
    

    The first two examples show how the subpattern within the capture group is reused, but is _not_ itself a capture group. In other words, "(Tarzan|Jane) loves (?1)" is equivalent to "(Tarzan|Jane) loves (?:Tarzan|Jane)".

    It's possible to backtrack into a recursed or repeated group.

    You can't call a group if there is more than one group with that group name or group number ("ambiguous group reference"). For example, (?P<foo>\w+) (?P<foo>\w+) (?&foo)? has 2 groups called "foo" (both group 1) and (?|([A-Z]+)|([0-9]+)) (?1)? has 2 groups with group number 1.

    The alternative forms (?P>name) and (?P&name) are also supported.

  • repr(regex) doesn't include actual regex (issue #13592)

    The repr of a compiled regex is now in the form of an eval-able string. For example:

    >>> r = regex.compile("foo", regex.I)
    >>> repr(r)
    "regex.Regex('foo', flags=regex.I | regex.V0)"
    >>> r
    regex.Regex('foo', flags=regex.I | regex.V0)
    

    The regex module has Regex as an alias for the 'compile' function.

  • Improve the repr for regular expression match objects (issue #17087)

    The repr of a match object is now in a more useful form. For example:

    >>> regex.search(r"\d+", "abc012def")
    <regex.Match object; span=(3, 6), match='012'>
    
  • Python lib re cannot handle Unicode properly due to narrow/wide bug (issue #12729)

    The source code of the regex module has been updated to support PEP 393 ("Flexible String Representation"), which is new in Python 3.3.

  • Full Unicode case-folding is supported.

    In version 1 behaviour, the regex module uses full case-folding when performing case-insensitive matches in Unicode.

    Examples (in Python 3):

    >>> regex.match(r"(?iV1)strasse", "stra\N{LATIN SMALL LETTER SHARP S}e").span()
    (0, 6)
    >>> regex.match(r"(?iV1)stra\N{LATIN SMALL LETTER SHARP S}e", "STRASSE").span()
    (0, 7)
    

    In version 0 behaviour, it uses simple case-folding for backward compatibility with the re module.

  • Approximate "fuzzy" matching (Hg issue 12, Hg issue 41, Hg issue 109)

    Regex usually attempts an exact match, but sometimes an approximate, or "fuzzy", match is needed, for those cases where the text being searched may contain errors in the form of inserted, deleted or substituted characters.

    A fuzzy regex specifies which types of errors are permitted, and, optionally, either the minimum and maximum or only the maximum permitted number of each type. (You cannot specify only a minimum.)

    The 3 types of error are:

    • Insertion, indicated by "i"
    • Deletion, indicated by "d"
    • Substitution, indicated by "s"

    In addition, "e" indicates any type of error.

    The fuzziness of a regex item is specified between "{" and "}" after the item.

    Examples:

    • foo match "foo" exactly
    • (?:foo){i} match "foo", permitting insertions
    • (?:foo){d} match "foo", permitting deletions
    • (?:foo){s} match "foo", permitting substitutions
    • (?:foo){i,s} match "foo", permitting insertions and substitutions
    • (?:foo){e} match "foo", permitting errors

    If a certain type of error is specified, then any type not specified will not be permitted.

    In the following examples I'll omit the item and write only the fuzziness:

    • {i<=3} permit at most 3 insertions, but no other types
    • {d<=3} permit at most 3 deletions, but no other types
    • {s<=3} permit at most 3 substitutions, but no other types
    • {i<=1,s<=2} permit at most 1 insertion and at most 2 substitutions, but no deletions
    • {e<=3} permit at most 3 errors
    • {1<=e<=3} permit at least 1 and at most 3 errors
    • {i<=2,d<=2,e<=3} permit at most 2 insertions, at most 2 deletions, at most 3 errors in total, but no substitutions

    It's also possible to state the costs of each type of error and the maximum permitted total cost.

    Examples:

    • {2i+2d+1s<=4} each insertion costs 2, each deletion costs 2, each substitution costs 1, the total cost must not exceed 4
    • {i<=1,d<=1,s<=1,2i+2d+1s<=4} at most 1 insertion, at most 1 deletion, at most 1 substitution; each insertion costs 2, each deletion costs 2, each substitution costs 1, the total cost must not exceed 4

    You can also use "<" instead of "<=" if you want an exclusive minimum or maximum:

    • {e<=3} permit up to 3 errors
    • {e<4} permit fewer than 4 errors
    • {0<e<4} permit more than 0 but fewer than 4 errors

    By default, fuzzy matching searches for the first match that meets the given constraints. The ENHANCEMATCH flag will cause it to attempt to improve the fit (i.e. reduce the number of errors) of the match that it has found.

    The BESTMATCH flag will make it search for the best match instead.

    Further examples to note:

    • regex.search("(dog){e}", "cat and dog")[1] returns "cat" because that matches "dog" with 3 errors, which is within the limit (an unlimited number of errors is permitted).
    • regex.search("(dog){e<=1}", "cat and dog")[1] returns " dog" (with a leading space) because that matches "dog" with 1 error, which is within the limit (1 error is permitted).
    • regex.search("(?e)(dog){e<=1}", "cat and dog")[1] returns "dog" (without a leading space) because the fuzzy search matches " dog" with 1 error, which is within the limit (1 error is permitted), and the (?e) then makes it attempt a better fit.

    In the first two examples there are perfect matches later in the string, but in neither case is it the first possible match.

    The match object has an attribute fuzzy_counts which gives the total number of substitutions, insertions and deletions.

    >>> # A 'raw' fuzzy match:
    >>> regex.fullmatch(r"(?:cats|cat){e<=1}", "cat").fuzzy_counts
    (0, 0, 1)
    >>> # 0 substitutions, 0 insertions, 1 deletion.
    
    >>> # A better match might be possible if the ENHANCEMATCH flag is used:
    >>> regex.fullmatch(r"(?e)(?:cats|cat){e<=1}", "cat").fuzzy_counts
    (0, 0, 0)
    >>> # 0 substitutions, 0 insertions, 0 deletions.
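
The error limits and fuzzy_counts can also be explored with a sketch like the following. It assumes the regex module is installed; the word "hello" and the misspellings are arbitrary.

```python
import regex

# One substitution ("e" -> "a") is within the e<=1 limit.
one_sub = regex.fullmatch(r"(?:hello){e<=1}", "hallo")

# Two substitutions exceed the limit, so there is no match.
two_subs = regex.fullmatch(r"(?:hello){e<=1}", "hxllx")

# Only substitutions are permitted here, so a missing letter cannot match.
subs_only = regex.fullmatch(r"(?:hello){s<=1}", "helo")

# fuzzy_counts is (substitutions, insertions, deletions).
print(one_sub.fuzzy_counts, two_subs, subs_only)
```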
    
  • Named lists (Hg issue 11)

    \L<name>

    There are occasions where you may want to include a list (actually, a set) of options in a regex.

    One way is to build the pattern like this:

    >>> p = regex.compile(r"first|second|third|fourth|fifth")
    

    but if the list is large, parsing the resulting regex can take considerable time. Care must also be taken that the strings are properly escaped if they contain any character that has a special meaning in a regex, and that a longer string is listed before any shorter string that occurs at its start, for example, "cats" before "cat".

    The new alternative is to use a named list:

    >>> option_set = ["first", "second", "third", "fourth", "fifth"]
    >>> p = regex.compile(r"\L<options>", options=option_set)
    

    The order of the items is irrelevant; they are treated as a set. The named lists are available as the .named_lists attribute of the pattern object:

    >>> print(p.named_lists)
    {'options': frozenset({'second', 'fifth', 'fourth', 'third', 'first'})}
    
  • Start and end of word

    \m matches at the start of a word.

    \M matches at the end of a word.

    Compare with \b, which matches at the start or end of a word.
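
A short sketch of the two anchors, assuming the regex module is installed; the sample text is arbitrary:

```python
import regex

text = "hello big world"

# \m anchors at the start of a word: collect each word's first letter.
first_letters = regex.findall(r"\m\w", text)

# \M anchors at the end of a word: collect each word's last letter.
last_letters = regex.findall(r"\w\M", text)

print(first_letters, last_letters)
```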

  • Unicode line separators

    Normally the only line separator is \n (\x0A). If the WORD flag is turned on, the line separators are the pair \x0D\x0A and the single characters \x0A, \x0B, \x0C and \x0D, plus \x85, \u2028 and \u2029 when working with Unicode.

    This affects the regex dot ".", which, with the DOTALL flag turned off, matches any character except a line separator. It also affects the line anchors ^ and $ (in multiline mode).

  • Set operators

    Version 1 behaviour only

    Set operators have been added, and a set [...] can include nested sets.

    The operators, in order of increasing precedence, are:

    • || for union ("x||y" means "x or y")
    • ~~ (double tilde) for symmetric difference ("x~~y" means "x or y, but not both")
    • && for intersection ("x&&y" means "x and y")
    • -- (double dash) for difference ("x--y" means "x but not y")

    Implicit union, i.e. simple juxtaposition like in [ab], has the highest precedence. Thus, [ab&&cd] is the same as [[a||b]&&[c||d]].

    Examples:

    • [ab] # Set containing 'a' and 'b'
    • [a-z] # Set containing 'a' .. 'z'
    • [[a-z]--[qw]] # Set containing 'a' .. 'z', but not 'q' or 'w'
    • [a-z--qw] # Same as above
    • [\p{L}--QW] # Set containing all letters except 'Q' and 'W'
    • [\p{N}--[0-9]] # Set containing all numbers except '0' .. '9'
    • [\p{ASCII}&&\p{Letter}] # Set containing all characters which are ASCII and letter
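
Some of these sets can be tried out directly. The following is a sketch that assumes the regex module is installed and selects version 1 behaviour with (?V1); the sample strings are arbitrary.

```python
import regex

# Difference: lowercase letters except 'q' and 'w'.
diff = regex.findall(r"(?V1)[a-z--qw]", "qwerty")

# Intersection: characters that are both ASCII and letters.
ascii_letters = regex.findall(r"(?V1)[\p{ASCII}&&\p{Letter}]", "a\u00e91B")

# Symmetric difference: in exactly one of the two ranges.
sym_diff = regex.findall(r"(?V1)[[a-c]~~[b-d]]", "abcd")

print(diff, ascii_letters, sym_diff)
```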
  • regex.escape (issue #2650)

    regex.escape has an additional keyword parameter special_only. When True, only 'special' regex characters, such as '?', are escaped.

    Examples:

    >>> regex.escape("foo!?")
    'foo\\!\\?'
    >>> regex.escape("foo!?", special_only=True)
    'foo!\\?'
    
  • Repeated captures (issue #7132)

    A match object has additional methods which return information on all the successful matches of a repeated capture group. These methods are:

    • matchobject.captures([group1, ...])
      • Returns a list of the strings matched in a group or groups. Compare with matchobject.group([group1, ...]).
    • matchobject.starts([group])
      • Returns a list of the start positions. Compare with matchobject.start([group]).
    • matchobject.ends([group])
      • Returns a list of the end positions. Compare with matchobject.end([group]).
    • matchobject.spans([group])
      • Returns a list of the spans. Compare with matchobject.span([group]).

    Examples:

    >>> m = regex.search(r"(\w{3})+", "123456789")
    >>> m.group(1)
    '789'
    >>> m.captures(1)
    ['123', '456', '789']
    >>> m.start(1)
    6
    >>> m.starts(1)
    [0, 3, 6]
    >>> m.end(1)
    9
    >>> m.ends(1)
    [3, 6, 9]
    >>> m.span(1)
    (6, 9)
    >>> m.spans(1)
    [(0, 3), (3, 6), (6, 9)]
    
  • Atomic grouping (issue #433030)

    (?>...)

    If the following pattern subsequently fails, then the subpattern as a whole will fail.

  • Possessive quantifiers.

    (?:...)?+ ; (?:...)*+ ; (?:...)++ ; (?:...){min,max}+

    The subpattern is matched up to 'max' times. If the following pattern subsequently fails, then all of the repeated subpatterns will fail as a whole. For example, (?:...)++ is equivalent to (?>(?:...)+).
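
The difference from ordinary backtracking shows up in a sketch like this one (assuming the regex module is installed). The pattern is only illustrative: the trailing "ab" can match only if the repeated "a" gives back one character.

```python
import regex

# Ordinary greedy repeat: a+ gives back one "a" so that "ab" can match.
backtracking = regex.match(r"a+ab", "aaab")

# Atomic group: a+ keeps all three "a"s, so the following "ab" fails.
atomic = regex.match(r"(?>a+)ab", "aaab")

# Possessive quantifier: same effect as the atomic group.
possessive = regex.match(r"a++ab", "aaab")

print(backtracking.group(), atomic, possessive)
```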

  • Scoped flags (issue #433028)

    (?flags-flags:...)

    The flags will apply only to the subpattern. Flags can be turned on or off.
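
A minimal sketch, assuming the regex module is installed: IGNORECASE is applied to the "a" only, so the "b" must still be lowercase.

```python
import regex

# (?i:a) is case-insensitive; the following "b" is matched case-sensitively.
hits = regex.findall(r"(?i:a)b", "ab Ab aB AB")
print(hits)
```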

  • Inline flags (issue #433024, issue #433027)

    (?flags-flags)

    Version 0 behaviour: the flags apply to the entire pattern, and they can't be turned off.

    Version 1 behaviour: the flags apply to the end of the group or pattern, and they can be turned on or off.
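
The version 1 behaviour can be sketched as follows, assuming the regex module is installed. Only version 1 is shown, and the patterns are illustrative.

```python
import regex

# The flag is turned on part-way through: only "b" is case-insensitive.
on_midway = regex.compile(r"(?V1)a(?i)b")
print(on_midway.fullmatch("aB"), on_midway.fullmatch("AB"))

# The flag is turned on and then off again: only "a" is case-insensitive.
on_then_off = regex.compile(r"(?V1)(?i)a(?-i)b")
print(on_then_off.fullmatch("Ab"), on_then_off.fullmatch("AB"))
```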

  • Repeated repeats (issue #2537)

    A regex like ((x|y+)*)* will be accepted and will work correctly, and it should complete more quickly.

  • Definition of 'word' character (issue #1693050)

    The definition of a 'word' character has been expanded for Unicode. It now conforms to the Unicode specification at http://www.unicode.org/reports/tr29/. This applies to \w, \W, \b and \B.

  • Groups in lookahead and lookbehind (issue #814253)

    Groups and group references are permitted in both lookahead and lookbehind.

  • Variable-length lookbehind

    A lookbehind can match a variable-length string.
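
    For instance, a lookbehind containing an unbounded repeat is accepted. This is a sketch only, assuming the regex module is installed; the currency text is invented for the example:

```python
import regex

text = "USD 100, USD200, EUR 300"

# The lookbehind "USD" followed by any amount of whitespace has no fixed length.
amounts = regex.findall(r"(?<=USD\s*)\d+", text)
print(amounts)  # ['100', '200']
```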

  • Correct handling of charset with ignore case flag (issue #3511)

    Ranges within charsets are handled correctly when the ignore-case flag is turned on.

  • Unmatched group in replacement (issue #1519638)

    An unmatched group is treated as an empty string in a replacement template.
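
    A minimal sketch (assuming the regex module is installed; the pattern is illustrative), in which only one of the two groups takes part in each match:

```python
import regex

# Each match fills either group 1 or group 2, never both; the group that
# did not take part contributes an empty string to the replacement.
result = regex.sub(r"(a)|(b)", r"[\1|\2]", "ab")
print(result)  # [a|][|b]
```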

  • 'Pathological' patterns (issue #1566086, issue #1662581, issue #1448325, issue #1721518, issue #1297193)

    'Pathological' patterns should complete more quickly.

  • Flags argument for regex.split, regex.sub and regex.subn (issue #3482)

    regex.split, regex.sub and regex.subn support a 'flags' argument.
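
    For example, the flag can be passed by keyword (a sketch, assuming the regex module is installed):

```python
import regex

text = "aXbxc"

parts = regex.split(r"x", text, flags=regex.IGNORECASE)
replaced = regex.sub(r"x", "-", text, flags=regex.IGNORECASE)
replaced_n, count = regex.subn(r"x", "-", text, flags=regex.IGNORECASE)

print(parts)              # ['a', 'b', 'c']
print(replaced)           # a-b-c
print(replaced_n, count)  # a-b-c 2
```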

  • Pos and endpos arguments for regex.sub and regex.subn

    regex.sub and regex.subn support 'pos' and 'endpos' arguments.
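
    A sketch of restricting the replacements to a slice of the string (assuming the regex module is installed; the text is illustrative). Only the digits at indexes 2, 3 and 4 are candidates for replacement:

```python
import regex

text = "1234567"

# Matching is attempted only between pos (inclusive) and endpos (exclusive).
result, count = regex.subn(r"\d", "#", text, pos=2, endpos=5)

print(result)
print(count)  # 3
```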

  • 'Overlapped' argument for regex.findall and regex.finditer

    regex.findall and regex.finditer support an 'overlapped' argument which permits overlapped matches.
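
    The difference is easiest to see side by side (a sketch, assuming the regex module is installed):

```python
import regex

text = "abcde"

# By default the next search starts where the previous match ended.
normal = regex.findall(r"..", text)

# With overlapped=True the next search starts one character after the
# start of the previous match.
overlapped = regex.findall(r"..", text, overlapped=True)

print(normal)      # ['ab', 'cd']
print(overlapped)  # ['ab', 'bc', 'cd', 'de']
```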

  • Unicode escapes (issue #3665)

    The Unicode escapes \uxxxx and \Uxxxxxxxx are supported.

  • Large patterns (issue #1160)

    Patterns can be much larger.

  • Zero-width match with regex.finditer (issue #1647489)

    regex.finditer behaves correctly when it encounters a zero-width match.

  • Zero-width split with regex.split (issue #3262)

    Version 0 behaviour: a string won't be split at a zero-width match.

    Version 1 behaviour: a string will be split at a zero-width match.

  • Splititer

    regex.splititer has been added. It's a generator equivalent of regex.split.
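
    A brief sketch of lazy splitting (assuming the regex module is installed; the separator pattern is illustrative):

```python
import regex

# The pieces are produced one at a time instead of being built into a list.
pieces = regex.splititer(r",\s*", "a, b,c")

first = next(pieces)
rest = list(pieces)

print(first)  # a
print(rest)   # ['b', 'c']
```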

  • Subscripting for groups

    A match object supports access to the captured groups via subscripting and slicing:

    >>> m = regex.search(r"(?P<before>.*?)(?P<num>\d+)(?P<after>.*)", "pqr123stu")
    >>> print(m["before"])
    pqr
    >>> print(m["num"])
    123
    >>> print(m["after"])
    stu
    >>> print(len(m))
    4
    >>> print(m[:])
    ('pqr123stu', 'pqr', '123', 'stu')
    
  • Named groups

    Groups can be named with (?<name>...) as well as the current (?P<name>...).
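
    Both spellings give the same result, as this sketch shows (assuming the regex module is installed; the group name is illustrative):

```python
import regex

short_form = regex.match(r"(?<year>\d{4})", "2016")
python_form = regex.match(r"(?P<year>\d{4})", "2016")

print(short_form.group("year"))   # 2016
print(python_form.group("year"))  # 2016
```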

  • Group references

    Groups can be referenced within a pattern with \g<name>. This also allows there to be more than 99 groups.
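
    A sketch of a reference by name and by number inside a pattern (assuming the regex module is installed; the text is illustrative):

```python
import regex

text = "say hello hello again"

# \g<word> must match the same text that the group "word" captured.
by_name = regex.search(r"(?P<word>\w+) \g<word>", text)

# The same reference written with the group number.
by_number = regex.search(r"(\w+) \g<1>", text)

print(by_name.group())    # hello hello
print(by_number.group())  # hello hello
```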

  • Named characters

    \N{name}

    Named characters are supported. (Note: only those known by Python's Unicode database are supported.)
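
    For example (a sketch, assuming the regex module is installed and Python 3 string literals):

```python
import regex

# In a raw string the \N{...} escape is interpreted by the regex module itself.
found = regex.findall(r"\N{GREEK SMALL LETTER ALPHA}", "alpha: \u03b1")
print(found == ["\u03b1"])  # True
```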

  • Unicode codepoint properties, including scripts and blocks

    \p{property=value}; \P{property=value}; \p{value} ; \P{value}

    Many Unicode properties are supported, including blocks and scripts. \p{property=value} or \p{property:value} matches a character whose property 'property' has the value 'value'. The inverse of \p{property=value} is \P{property=value} or \p{^property=value}.

    If the short form \p{value} is used, the properties are checked in the order: General_Category, Script, Block, binary property:

    • Latin, the 'Latin' script (Script=Latin).
    • Cyrillic, the 'Cyrillic' script (Script=Cyrillic).
    • BasicLatin, the 'BasicLatin' block (Block=BasicLatin).
    • Alphabetic, the 'Alphabetic' binary property (Alphabetic=Yes).

    A short form starting with Is indicates a script or binary property:

    • IsLatin, the 'Latin' script (Script=Latin).
    • IsCyrillic, the 'Cyrillic' script (Script=Cyrillic).
    • IsAlphabetic, the 'Alphabetic' binary property (Alphabetic=Yes).

    A short form starting with In indicates a block property:

    • InBasicLatin, the 'BasicLatin' block (Block=BasicLatin).
    • InCyrillic, the 'Cyrillic' block (Block=Cyrillic).
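
    The long form and the short forms can be compared in a small sketch (assuming the regex module is installed and Python 3 strings; the sample text is made up):

```python
import regex

# "abc", then three Cyrillic letters, then "xyz".
text = "abc \u0433\u0434\u0435 xyz"

by_script = regex.findall(r"\p{Script=Cyrillic}+", text)  # long form
by_short = regex.findall(r"\p{Latin}+", text)             # short form
by_is = regex.findall(r"\p{IsCyrillic}+", text)           # 'Is' prefix: script
by_in = regex.findall(r"\p{InCyrillic}+", text)           # 'In' prefix: block
not_latin = regex.findall(r"\P{Latin}+", text)            # inverse

print(by_short)            # ['abc', 'xyz']
print(by_script == by_is)  # True
```
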
  • POSIX character classes

    [[:alpha:]]; [[:^alpha:]]

    POSIX character classes are supported. These are normally treated as an alternative form of \p{...}.

    The exceptions are alnum, digit, punct and xdigit, whose definitions are different from those of Unicode.

    [[:alnum:]] is equivalent to \p{posix_alnum}.

    [[:digit:]] is equivalent to \p{posix_digit}.

    [[:punct:]] is equivalent to \p{posix_punct}.

    [[:xdigit:]] is equivalent to \p{posix_xdigit}.
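
    A short sketch of the class syntax and its negated form (assuming the regex module is installed; the text is illustrative):

```python
import regex

text = "abc 123 def"

letters = regex.findall(r"[[:alpha:]]+", text)
non_letters = regex.findall(r"[[:^alpha:]]+", text)
digits = regex.findall(r"[[:digit:]]+", text)

print(letters)      # ['abc', 'def']
print(non_letters)  # [' 123 ']
print(digits)       # ['123']
```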

  • Search anchor

    \G

    A search anchor has been added. It matches at the position where each search started/continued and can be used for contiguous matches or in negative variable-length lookbehinds to limit how far back the lookbehind goes:

    >>> regex.findall(r"\w{2}", "abcd ef")
    ['ab', 'cd', 'ef']
    >>> regex.findall(r"\G\w{2}", "abcd ef")
    ['ab', 'cd']
    
    • The search starts at position 0 and matches 2 letters 'ab'.
    • The search continues at position 2 and matches 2 letters 'cd'.
    • The search continues at position 4 and fails to match any letters.
    • The anchor stops the search start position from being advanced, so there are no more results.

  • Reverse searching

    Searches can now work backwards:

    >>> regex.findall(r".", "abc")
    ['a', 'b', 'c']
    >>> regex.findall(r"(?r).", "abc")
    ['c', 'b', 'a']
    

    Note: the result of a reverse search is not necessarily the reverse of a forward search:

    >>> regex.findall(r"..", "abcde")
    ['ab', 'cd']
    >>> regex.findall(r"(?r)..", "abcde")
    ['de', 'bc']
    
  • Matching a single grapheme

    \X

    The grapheme matcher is supported. It now conforms to the Unicode specification at http://www.unicode.org/reports/tr29/.
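
    As a sketch of the difference from the dot (assuming the regex module is installed and Python 3 strings), a base letter followed by a combining mark is one grapheme but two codepoints:

```python
import regex

# 'e' followed by U+0301 COMBINING ACUTE ACCENT, then 'a'.
text = "e\u0301a"

codepoints = regex.findall(r".", text)
graphemes = regex.findall(r"\X", text)

print(len(codepoints))  # 3
print(len(graphemes))   # 2
```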

  • Branch reset

    (?|...|...)

    Capture group numbers will be reused across the alternatives, but groups with different names will have different group numbers.

    Examples:

    >>> regex.match(r"(?|(first)|(second))", "first").groups()
    ('first',)
    >>> regex.match(r"(?|(first)|(second))", "second").groups()
    ('second',)
    

    Note that there is only one group.

  • Default Unicode word boundary

    The WORD flag changes the definition of a 'word boundary' to that of a default Unicode word boundary. This applies to \b and \B.

  • SRE engine does not release the GIL (issue #1366311)

    The regex module can release the GIL during matching (see the above section on multithreading).

    Iterators can be safely shared across threads.

regex-2016.01.10/docs/Features.rst0000666000000000000000000010727012624411066014617 0ustar 00000000000000Introduction ------------ This new regex implementation is intended eventually to replace Python's current re module implementation. For testing and comparison with the current 're' module the new implementation is in the form of a module called 'regex'. Old vs new behaviour -------------------- This module has 2 behaviours: * **Version 0** behaviour (old behaviour, compatible with the current re module): * Indicated by the ``VERSION0`` or ``V0`` flag, or ``(?V0)`` in the pattern. * Zero-width matches are handled like in the re module: * ``.split`` won't split a string at a zero-width match. * ``.sub`` will advance by one character after a zero-width match. * Inline flags apply to the entire pattern, and they can't be turned off. * Only simple sets are supported. * Case-insensitive matches in Unicode use simple case-folding by default. * **Version 1** behaviour (new behaviour, different from the current re module): * Indicated by the ``VERSION1`` or ``V1`` flag, or ``(?V1)`` in the pattern. * Zero-width matches are handled like in Perl and PCRE: * ``.split`` will split a string at a zero-width match. * ``.sub`` will handle zero-width matches correctly. * Inline flags apply to the end of the group or pattern, and they can be turned off. * Nested sets and set operations are supported. * Case-insensitive matches in Unicode use full case-folding by default. If no version is specified, the regex module will default to ``regex.DEFAULT_VERSION``. In the short term this will be ``VERSION0``, but in the longer term it will be ``VERSION1``. Case-insensitive matches in Unicode ----------------------------------- The regex module supports both simple and full case-folding for case-insensitive matches in Unicode. Use of full case-folding can be turned on using the ``FULLCASE`` or ``F`` flag, or ``(?f)`` in the pattern. 
Please note that this flag affects how the ``IGNORECASE`` flag works; the ``FULLCASE`` flag itself does not turn on case-insensitive matching. In the version 0 behaviour, the flag is off by default. In the version 1 behaviour, the flag is on by default. Nested sets and set operations ------------------------------ It's not possible to support both simple sets, as used in the re module, and nested sets at the same time because of a difference in the meaning of an unescaped ``"["`` in a set. For example, the pattern ``[[a-z]--[aeiou]]`` is treated in the version 0 behaviour (simple sets, compatible with the re module) as: * Set containing "[" and the letters "a" to "z" * Literal "--" * Set containing letters "a", "e", "i", "o", "u" but in the version 1 behaviour (nested sets, enhanced behaviour) as: * Set which is: * Set containing the letters "a" to "z" * but excluding: * Set containing the letters "a", "e", "i", "o", "u" Version 0 behaviour: only simple sets are supported. Version 1 behaviour: nested sets and set operations are supported. Flags ----- There are 2 kinds of flag: scoped and global. Scoped flags can apply to only part of a pattern and can be turned on or off; global flags apply to the entire pattern and can only be turned on. The scoped flags are: ``FULLCASE``, ``IGNORECASE``, ``MULTILINE``, ``DOTALL``, ``VERBOSE``, ``WORD``. The global flags are: ``ASCII``, ``BESTMATCH``, ``ENHANCEMATCH``, ``LOCALE``, ``POSIX``, ``REVERSE``, ``UNICODE``, ``VERSION0``, ``VERSION1``. If neither the ``ASCII``, ``LOCALE`` nor ``UNICODE`` flag is specified, it will default to ``UNICODE`` if the regex pattern is a Unicode string and ``ASCII`` if it's a bytestring. The ``ENHANCEMATCH`` flag makes fuzzy matching attempt to improve the fit of the next match that it finds. The ``BESTMATCH`` flag makes fuzzy matching search for the best match instead of the next match. 
Notes on named capture groups ----------------------------- All capture groups have a group number, starting from 1. Groups with the same group name will have the same group number, and groups with a different group name will have a different group number. The same name can be used by more than one group, with later captures 'overwriting' earlier captures. All of the captures of the group will be available from the ``captures`` method of the match object. Group numbers will be reused across different branches of a branch reset, eg. ``(?|(first)|(second))`` has only group 1. If capture groups have different group names then they will, of course, have different group numbers, eg. ``(?|(?Pfirst)|(?Psecond))`` has group 1 ("foo") and group 2 ("bar"). In the regex ``(\s+)(?|(?P[A-Z]+)|(\w+) (?P[0-9]+)`` there are 2 groups: * ``(\s+)`` is group 1. * ``(?P[A-Z]+)`` is group 2, also called "foo". * ``(\w+)`` is group 2 because of the branch reset. * ``(?P[0-9]+)`` is group 2 because it's called "foo". If you want to prevent ``(\w+)`` from being group 2, you need to name it (different name, different group number). Multithreading -------------- The regex module releases the GIL during matching on instances of the built-in (immutable) string classes, enabling other Python threads to run concurrently. It is also possible to force the regex module to release the GIL during matching by calling the matching methods with the keyword argument ``concurrent=True``. The behaviour is undefined if the string changes during matching, so use it *only* when it is guaranteed that that won't happen. Building for 64-bits -------------------- If the source files are built for a 64-bit target then the string positions will also be 64-bit. Unicode ------- This module supports Unicode 8.0. Full Unicode case-folding is supported. Additional features ------------------- The issue numbers relate to the Python bug tracker, except where listed as "Hg issue". 
* Added support for lookaround in conditional pattern (Hg issue 163) The test of a conditional pattern can now be a lookaround. Examples: .. sourcecode:: python >>> regex.match(r'(?(?=\d)\d+|\w+)', '123abc') >>> regex.match(r'(?(?=\d)\d+|\w+)', 'abc123') This is not quite the same as putting a lookaround in the first branch of a pair of alternatives. Examples: .. sourcecode:: python >>> print(regex.match(r'(?:(?=\d)\d+\b|\w+)', '123abc')) >>> print(regex.match(r'(?(?=\d)\d+\b|\w+)', '123abc')) None In the first example, the lookaround matched, but the remainder of the first branch failed to match, and so the second branch was attempted, whereas in the second example, the lookaround matched, and the first branch failed to match, but the second branch was **not** attempted. * Added POSIX matching (leftmost longest) (Hg issue 150) The POSIX standard for regex is to return the leftmost longest match. This can be turned on using the ``POSIX`` flag (``(?p)``). Examples: .. sourcecode:: python >>> # Normal matching. >>> regex.search(r'Mr|Mrs', 'Mrs') >>> regex.search(r'one(self)?(selfsufficient)?', 'oneselfsufficient') >>> # POSIX matching. >>> regex.search(r'(?p)Mr|Mrs', 'Mrs') >>> regex.search(r'(?p)one(self)?(selfsufficient)?', 'oneselfsufficient') Note that it will take longer to find matches because when it finds a match at a certain position, it won't return that immediately, but will keep looking to see if there's another longer match there. * Added ``(?(DEFINE)...)`` (Hg issue 152) If there's no group called "DEFINE", then ... will be ignored, but any group definitions within it will be available. Examples: .. sourcecode:: python >>> regex.search(r'(?(DEFINE)(?P\d+)(?P\w+))(?&quant) (?&item)', '5 elephants') * Added ``(*PRUNE)``, ``(*SKIP)`` and ``(*FAIL)`` (Hg issue 153) ``(*PRUNE)`` discards the backtracking info up to that point. When used in an atomic group or a lookaround, it won't affect the enclosing pattern. 
``(*SKIP)`` is similar to ``(*PRUNE)``, except that it also sets where in the text the next attempt to match will start. When used in an atomic group or a lookaround, it won't affect the enclosing pattern. ``(*FAIL)`` causes immediate backtracking. ``(*F)`` is a permitted abbreviation. * Added ``\K`` (Hg issue 151) Keeps the part of the entire match after the position where ``\K`` occurred; the part before it is discarded. It does not affect what capture groups return. Examples: .. sourcecode:: python >>> m = regex.search(r'(\w\w\K\w\w\w)', 'abcdef') >>> m[0] 'cde' >>> m[1] 'abcde' >>> >>> m = regex.search(r'(?r)(\w\w\K\w\w\w)', 'abcdef') >>> m[0] 'bc' >>> m[1] 'bcdef' * Added capture subscripting for ``expandf`` and ``subf``/``subfn`` (Hg issue 133) **(Python 2.6 and above)** You can now use subscripting to get the captures of a repeated capture group. Examples: .. sourcecode:: python >>> m = regex.match(r"(\w)+", "abc") >>> m.expandf("{1}") 'c' >>> m.expandf("{1[0]} {1[1]} {1[2]}") 'a b c' >>> m.expandf("{1[-1]} {1[-2]} {1[-3]}") 'c b a' >>> >>> m = regex.match(r"(?P\w)+", "abc") >>> m.expandf("{letter}") 'c' >>> m.expandf("{letter[0]} {letter[1]} {letter[2]}") 'a b c' >>> m.expandf("{letter[-1]} {letter[-2]} {letter[-3]}") 'c b a' * Added support for referring to a group by number using ``(?P=...)``. This is in addition to the existing ``\g<...>``. * Fixed the handling of locale-sensitive regexes. The ``LOCALE`` flag is intended for legacy code and has limited support. You're still recommended to use Unicode instead. * Added partial matches (Hg issue 102) A partial match is one that matches up to the end of string, but that string has been truncated and you want to know whether a complete match could be possible if the string had not been truncated. Partial matches are supported by ``match``, ``search``, ``fullmatch`` and ``finditer`` with the ``partial`` keyword argument. Match objects have a ``partial`` attribute, which is ``True`` if it's a partial match. 
For example, if you wanted a user to enter a 4-digit number and check it character by character as it was being entered: .. sourcecode:: python >>> pattern = regex.compile(r'\d{4}') >>> # Initially, nothing has been entered: >>> print(pattern.fullmatch('', partial=True)) >>> # An empty string is OK, but it's only a partial match. >>> # The user enters a letter: >>> print(pattern.fullmatch('a', partial=True)) None >>> # It'll never match. >>> # The user deletes that and enters a digit: >>> print(pattern.fullmatch('1', partial=True)) >>> # It matches this far, but it's only a partial match. >>> # The user enters 2 more digits: >>> print(pattern.fullmatch('123', partial=True)) >>> # It matches this far, but it's only a partial match. >>> # The user enters another digit: >>> print(pattern.fullmatch('1234', partial=True)) >>> # It's a complete match. >>> # If the user enters another digit: >>> print(pattern.fullmatch('12345', partial=True)) None >>> # It's no longer a match. >>> # This is a partial match: >>> pattern.match('123', partial=True).partial True >>> # This is a complete match: >>> pattern.match('1233', partial=True).partial False * ``*`` operator not working correctly with sub() (Hg issue 106) Sometimes it's not clear how zero-width matches should be handled. For example, should ``.*`` match 0 characters directly after matching >0 characters? Most regex implementations follow the lead of Perl (PCRE), but the re module sometimes doesn't. The Perl behaviour appears to be the most common (and the re module is sometimes definitely wrong), so in version 1 the regex module follows the Perl behaviour, whereas in version 0 it follows the legacy re behaviour. Examples: .. 
sourcecode:: python >>> # Version 0 behaviour (like re) >>> regex.sub('(?V0).*', 'x', 'test') 'x' >>> regex.sub('(?V0).*?', '|', 'test') '|t|e|s|t|' >>> # Version 1 behaviour (like Perl) >>> regex.sub('(?V1).*', 'x', 'test') 'xx' >>> regex.sub('(?V1).*?', '|', 'test') '|||||||||' * re.group() should never return a bytearray (issue #18468) For compatibility with the re module, the regex module returns all matching bytestrings as ``bytes``, starting from Python 3.4. Examples: .. sourcecode:: python >>> # Python 3.4 and later >>> regex.match(b'.', bytearray(b'a')).group() b'a' >>> # Python 3.1-3.3 >>> regex.match(b'.', bytearray(b'a')).group() bytearray(b'a') * Added ``capturesdict`` (Hg issue 86) ``capturesdict`` is a combination of ``groupdict`` and ``captures``: ``groupdict`` returns a dict of the named groups and the last capture of those groups. ``captures`` returns a list of all the captures of a group ``capturesdict`` returns a dict of the named groups and lists of all the captures of those groups. Examples: .. sourcecode:: python >>> m = regex.match(r"(?:(?P\w+) (?P\d+)\n)+", "one 1\ntwo 2\nthree 3\n") >>> m.groupdict() {'word': 'three', 'digits': '3'} >>> m.captures("word") ['one', 'two', 'three'] >>> m.captures("digits") ['1', '2', '3'] >>> m.capturesdict() {'word': ['one', 'two', 'three'], 'digits': ['1', '2', '3']} * Allow duplicate names of groups (Hg issue 87) Group names can now be duplicated. Examples: .. sourcecode:: python >>> # With optional groups: >>> >>> # Both groups capture, the second capture 'overwriting' the first. >>> m = regex.match(r"(?P\w+)? or (?P\w+)?", "first or second") >>> m.group("item") 'second' >>> m.captures("item") ['first', 'second'] >>> # Only the second group captures. >>> m = regex.match(r"(?P\w+)? or (?P\w+)?", " or second") >>> m.group("item") 'second' >>> m.captures("item") ['second'] >>> # Only the first group captures. >>> m = regex.match(r"(?P\w+)? 
or (?P\w+)?", "first or ") >>> m.group("item") 'first' >>> m.captures("item") ['first'] >>> >>> # With mandatory groups: >>> >>> # Both groups capture, the second capture 'overwriting' the first. >>> m = regex.match(r"(?P\w*) or (?P\w*)?", "first or second") >>> m.group("item") 'second' >>> m.captures("item") ['first', 'second'] >>> # Again, both groups capture, the second capture 'overwriting' the first. >>> m = regex.match(r"(?P\w*) or (?P\w*)", " or second") >>> m.group("item") 'second' >>> m.captures("item") ['', 'second'] >>> # And yet again, both groups capture, the second capture 'overwriting' the first. >>> m = regex.match(r"(?P\w*) or (?P\w*)", "first or ") >>> m.group("item") '' >>> m.captures("item") ['first', ''] * Added ``fullmatch`` (issue #16203) ``fullmatch`` behaves like ``match``, except that it must match all of the string. Examples: .. sourcecode:: python >>> print(regex.fullmatch(r"abc", "abc").span()) (0, 3) >>> print(regex.fullmatch(r"abc", "abcx")) None >>> print(regex.fullmatch(r"abc", "abcx", endpos=3).span()) (0, 3) >>> print(regex.fullmatch(r"abc", "xabcy", pos=1, endpos=4).span()) (1, 4) >>> >>> regex.match(r"a.*?", "abcd").group(0) 'a' >>> regex.fullmatch(r"a.*?", "abcd").group(0) 'abcd' * Added ``subf`` and ``subfn`` **(Python 2.6 and above)** ``subf`` and ``subfn`` are alternatives to ``sub`` and ``subn`` respectively. When passed a replacement string, they treat it as a format string. Examples: .. sourcecode:: python >>> regex.subf(r"(\w+) (\w+)", "{0} => {2} {1}", "foo bar") 'foo bar => bar foo' >>> regex.subf(r"(?P\w+) (?P\w+)", "{word2} {word1}", "foo bar") 'bar foo' * Added ``expandf`` to match object **(Python 2.6 and above)** ``expandf`` is an alternative to ``expand``. When passed a replacement string, it treats it as a format string. Examples: .. 
sourcecode:: python >>> m = regex.match(r"(\w+) (\w+)", "foo bar") >>> m.expandf("{0} => {2} {1}") 'foo bar => bar foo' >>> >>> m = regex.match(r"(?P\w+) (?P\w+)", "foo bar") >>> m.expandf("{word2} {word1}") 'bar foo' * Detach searched string A match object contains a reference to the string that was searched, via its ``string`` attribute. The match object now has a ``detach_string`` method that will 'detach' that string, making it available for garbage collection (this might save valuable memory if that string is very large). Example: .. sourcecode:: python >>> m = regex.search(r"\w+", "Hello world") >>> print(m.group()) Hello >>> print(m.string) Hello world >>> m.detach_string() >>> print(m.group()) Hello >>> print(m.string) None * Characters in a group name (issue #14462) A group name can now contain the same characters as an identifier. These are different in Python 2 and Python 3. * Recursive patterns (Hg issue 27) Recursive and repeated patterns are supported. ``(?R)`` or ``(?0)`` tries to match the entire regex recursively. ``(?1)``, ``(?2)``, etc, try to match the relevant capture group. ``(?&name)`` tries to match the named capture group. Examples: .. sourcecode:: python >>> regex.match(r"(Tarzan|Jane) loves (?1)", "Tarzan loves Jane").groups() ('Tarzan',) >>> regex.match(r"(Tarzan|Jane) loves (?1)", "Jane loves Tarzan").groups() ('Jane',) >>> m = regex.search(r"(\w)(?:(?R)|(\w?))\1", "kayak") >>> m.group(0, 1, 2) ('kayak', 'k', None) The first two examples show how the subpattern within the capture group is reused, but is _not_ itself a capture group. In other words, ``"(Tarzan|Jane) loves (?1)"`` is equivalent to ``"(Tarzan|Jane) loves (?:Tarzan|Jane)"``. It's possible to backtrack into a recursed or repeated group. You can't call a group if there is more than one group with that group name or group number (``"ambiguous group reference"``). 
For example, ``(?P\w+) (?P\w+) (?&foo)?`` has 2 groups called "foo" (both group 1) and ``(?|([A-Z]+)|([0-9]+)) (?1)?`` has 2 groups with group number 1. The alternative forms ``(?P>name)`` and ``(?P&name)`` are also supported. * repr(regex) doesn't include actual regex (issue #13592) The repr of a compiled regex is now in the form of a eval-able string. For example: .. sourcecode:: python >>> r = regex.compile("foo", regex.I) >>> repr(r) "regex.Regex('foo', flags=regex.I | regex.V0)" >>> r regex.Regex('foo', flags=regex.I | regex.V0) The regex module has Regex as an alias for the 'compile' function. * Improve the repr for regular expression match objects (issue #17087) The repr of a match object is now a more useful form. For example: .. sourcecode:: python >>> regex.search(r"\d+", "abc012def") * Python lib re cannot handle Unicode properly due to narrow/wide bug (issue #12729) The source code of the regex module has been updated to support PEP 393 ("Flexible String Representation"), which is new in Python 3.3. * Full Unicode case-folding is supported. In version 1 behaviour, the regex module uses full case-folding when performing case-insensitive matches in Unicode. Examples (in Python 3): .. sourcecode:: python >>> regex.match(r"(?iV1)strasse", "stra\N{LATIN SMALL LETTER SHARP S}e").span() (0, 6) >>> regex.match(r"(?iV1)stra\N{LATIN SMALL LETTER SHARP S}e", "STRASSE").span() (0, 7) In version 0 behaviour, it uses simple case-folding for backward compatibility with the re module. * Approximate "fuzzy" matching (Hg issue 12, Hg issue 41, Hg issue 109) Regex usually attempts an exact match, but sometimes an approximate, or "fuzzy", match is needed, for those cases where the text being searched may contain errors in the form of inserted, deleted or substituted characters. A fuzzy regex specifies which types of errors are permitted, and, optionally, either the minimum and maximum or only the maximum permitted number of each type. (You cannot specify only a minimum.) 
The 3 types of error are: * Insertion, indicated by "i" * Deletion, indicated by "d" * Substitution, indicated by "s" In addition, "e" indicates any type of error. The fuzziness of a regex item is specified between "{" and "}" after the item. Examples: * ``foo`` match "foo" exactly * ``(?:foo){i}`` match "foo", permitting insertions * ``(?:foo){d}`` match "foo", permitting deletions * ``(?:foo){s}`` match "foo", permitting substitutions * ``(?:foo){i,s}`` match "foo", permitting insertions and substitutions * ``(?:foo){e}`` match "foo", permitting errors If a certain type of error is specified, then any type not specified will **not** be permitted. In the following examples I'll omit the item and write only the fuzziness: * ``{i<=3}`` permit at most 3 insertions, but no other types * ``{d<=3}`` permit at most 3 deletions, but no other types * ``{s<=3}`` permit at most 3 substitutions, but no other types * ``{i<=1,s<=2}`` permit at most 1 insertion and at most 2 substitutions, but no deletions * ``{e<=3}`` permit at most 3 errors * ``{1<=e<=3}`` permit at least 1 and at most 3 errors * ``{i<=2,d<=2,e<=3}`` permit at most 2 insertions, at most 2 deletions, at most 3 errors in total, but no substitutions It's also possible to state the costs of each type of error and the maximum permitted total cost. Examples: * ``{2i+2d+1s<=4}`` each insertion costs 2, each deletion costs 2, each substitution costs 1, the total cost must not exceed 4 * ``{i<=1,d<=1,s<=1,2i+2d+1s<=4}`` at most 1 insertion, at most 1 deletion, at most 1 substitution; each insertion costs 2, each deletion costs 2, each substitution costs 1, the total cost must not exceed 4 You can also use "<" instead of "<=" if you want an exclusive minimum or maximum: * ``{e<=3}`` permit up to 3 errors * ``{e<4}`` permit fewer than 4 errors * ``{0>> # A 'raw' fuzzy match: >>> regex.fullmatch(r"(?:cats|cat){e<=1}", "cat").fuzzy_counts (0, 0, 1) >>> # 0 substitutions, 0 insertions, 1 deletion. 
>>> # A better match might be possible if the ENHANCEMATCH flag used: >>> regex.fullmatch(r"(?e)(?:cats|cat){e<=1}", "cat").fuzzy_counts (0, 0, 0) >>> # 0 substitutions, 0 insertions, 0 deletions. * Named lists (Hg issue 11) ``\L`` There are occasions where you may want to include a list (actually, a set) of options in a regex. One way is to build the pattern like this: .. sourcecode:: python >>> p = regex.compile(r"first|second|third|fourth|fifth") but if the list is large, parsing the resulting regex can take considerable time, and care must also be taken that the strings are properly escaped if they contain any character that has a special meaning in a regex, and that if there is a shorter string that occurs initially in a longer string that the longer string is listed before the shorter one, for example, "cats" before "cat". The new alternative is to use a named list: .. sourcecode:: python >>> option_set = ["first", "second", "third", "fourth", "fifth"] >>> p = regex.compile(r"\L", options=option_set) The order of the items is irrelevant, they are treated as a set. The named lists are available as the ``.named_lists`` attribute of the pattern object : .. sourcecode:: python >>> print(p.named_lists) {'options': frozenset({'second', 'fifth', 'fourth', 'third', 'first'})} * Start and end of word ``\m`` matches at the start of a word. ``\M`` matches at the end of a word. Compare with ``\b``, which matches at the start or end of a word. * Unicode line separators Normally the only line separator is ``\n`` (``\x0A``), but if the ``WORD`` flag is turned on then the line separators are the pair ``\x0D\x0A``, and ``\x0A``, ``\x0B``, ``\x0C`` and ``\x0D``, plus ``\x85``, ``\u2028`` and ``\u2029`` when working with Unicode. This affects the regex dot ``"."``, which, with the ``DOTALL`` flag turned off, matches any character except a line separator. It also affects the line anchors ``^`` and ``$`` (in multiline mode). 
* Set operators **Version 1 behaviour only** Set operators have been added, and a set ``[...]`` can include nested sets. The operators, in order of increasing precedence, are: * ``||`` for union ("x||y" means "x or y") * ``~~`` (double tilde) for symmetric difference ("x~~y" means "x or y, but not both") * ``&&`` for intersection ("x&&y" means "x and y") * ``--`` (double dash) for difference ("x--y" means "x but not y") Implicit union, ie, simple juxtaposition like in ``[ab]``, has the highest precedence. Thus, ``[ab&&cd]`` is the same as ``[[a||b]&&[c||d]]``. Examples: * ``[ab]`` # Set containing 'a' and 'b' * ``[a-z]`` # Set containing 'a' .. 'z' * ``[[a-z]--[qw]]`` # Set containing 'a' .. 'z', but not 'q' or 'w' * ``[a-z--qw]`` # Same as above * ``[\p{L}--QW]`` # Set containing all letters except 'Q' and 'W' * ``[\p{N}--[0-9]]`` # Set containing all numbers except '0' .. '9' * ``[\p{ASCII}&&\p{Letter}]`` # Set containing all characters which are ASCII and letter * regex.escape (issue #2650) regex.escape has an additional keyword parameter ``special_only``. When True, only 'special' regex characters, such as '?', are escaped. Examples: .. sourcecode:: python >>> regex.escape("foo!?") 'foo\\!\\?' >>> regex.escape("foo!?", special_only=True) 'foo!\\?' * Repeated captures (issue #7132) A match object has additional methods which return information on all the successful matches of a repeated capture group. These methods are: * ``matchobject.captures([group1, ...])`` * Returns a list of the strings matched in a group or groups. Compare with ``matchobject.group([group1, ...])``. * ``matchobject.starts([group])`` * Returns a list of the start positions. Compare with ``matchobject.start([group])``. * ``matchobject.ends([group])`` * Returns a list of the end positions. Compare with ``matchobject.end([group])``. * ``matchobject.spans([group])`` * Returns a list of the spans. Compare with ``matchobject.span([group])``. Examples: .. 
sourcecode:: python >>> m = regex.search(r"(\w{3})+", "123456789") >>> m.group(1) '789' >>> m.captures(1) ['123', '456', '789'] >>> m.start(1) 6 >>> m.starts(1) [0, 3, 6] >>> m.end(1) 9 >>> m.ends(1) [3, 6, 9] >>> m.span(1) (6, 9) >>> m.spans(1) [(0, 3), (3, 6), (6, 9)] * Atomic grouping (issue #433030) ``(?>...)`` If the following pattern subsequently fails, then the subpattern as a whole will fail. * Possessive quantifiers. ``(?:...)?+`` ; ``(?:...)*+`` ; ``(?:...)++`` ; ``(?:...){min,max}+`` The subpattern is matched up to 'max' times. If the following pattern subsequently fails, then all of the repeated subpatterns will fail as a whole. For example, ``(?:...)++`` is equivalent to ``(?>(?:...)+)``. * Scoped flags (issue #433028) ``(?flags-flags:...)`` The flags will apply only to the subpattern. Flags can be turned on or off. * Inline flags (issue #433024, issue #433027) ``(?flags-flags)`` Version 0 behaviour: the flags apply to the entire pattern, and they can't be turned off. Version 1 behaviour: the flags apply to the end of the group or pattern, and they can be turned on or off. * Repeated repeats (issue #2537) A regex like ``((x|y+)*)*`` will be accepted and will work correctly, but should complete more quickly. * Definition of 'word' character (issue #1693050) The definition of a 'word' character has been expanded for Unicode. It now conforms to the Unicode specification at ``http://www.unicode.org/reports/tr29/``. This applies to ``\w``, ``\W``, ``\b`` and ``\B``. * Groups in lookahead and lookbehind (issue #814253) Groups and group references are permitted in both lookahead and lookbehind. * Variable-length lookbehind A lookbehind can match a variable-length string. * Correct handling of charset with ignore case flag (issue #3511) Ranges within charsets are handled correctly when the ignore-case flag is turned on. * Unmatched group in replacement (issue #1519638) An unmatched group is treated as an empty string in a replacement template. 
* 'Pathological' patterns (issue #1566086, issue #1662581, issue #1448325, issue #1721518, issue #1297193)

  'Pathological' patterns should complete more quickly.

* Flags argument for regex.split, regex.sub and regex.subn (issue #3482)

  ``regex.split``, ``regex.sub`` and ``regex.subn`` support a 'flags' argument.

* Pos and endpos arguments for regex.sub and regex.subn

  ``regex.sub`` and ``regex.subn`` support 'pos' and 'endpos' arguments.

* 'Overlapped' argument for regex.findall and regex.finditer

  ``regex.findall`` and ``regex.finditer`` support an 'overlapped' flag which permits overlapped matches.

* Unicode escapes (issue #3665)

  The Unicode escapes ``\uxxxx`` and ``\Uxxxxxxxx`` are supported.

* Large patterns (issue #1160)

  Patterns can be much larger.

* Zero-width match with regex.finditer (issue #1647489)

  ``regex.finditer`` behaves correctly when it splits at a zero-width match.

* Zero-width split with regex.split (issue #3262)

  Version 0 behaviour: a string won't be split at a zero-width match.

  Version 1 behaviour: a string will be split at a zero-width match.

* Splititer

  ``regex.splititer`` has been added. It's a generator equivalent of ``regex.split``.

* Subscripting for groups

  A match object accepts access to the captured groups via subscripting and slicing:

  .. sourcecode:: python

    >>> m = regex.search(r"(?P<before>.*?)(?P<num>\d+)(?P<after>.*)", "pqr123stu")
    >>> print m["before"]
    pqr
    >>> print m["num"]
    123
    >>> print m["after"]
    stu
    >>> print len(m)
    4
    >>> print m[:]
    ('pqr123stu', 'pqr', '123', 'stu')

* Named groups

  Groups can be named with ``(?<name>...)`` as well as the current ``(?P<name>...)``.

* Group references

  Groups can be referenced within a pattern with ``\g<name>``. This also allows there to be more than 99 groups.

* Named characters

  ``\N{name}``

  Named characters are supported. (Note: only those known by Python's Unicode database are supported.)
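The note on named characters can be checked with the standard library alone. The sketch below is only an illustration: it uses ``unicodedata.lookup`` to see which names Python's Unicode database knows, and it uses the standard ``re`` module as a stand-in so that it runs without regex installed. The helper ``named_char_pattern`` is hypothetical and is not part of the regex module.

```python
import re
import unicodedata


def named_char_pattern(name):
    """Return a compiled pattern matching the single character called `name`.

    unicodedata.lookup raises KeyError when the name is not in Python's
    Unicode database, which is the limitation mentioned in the note above.
    """
    char = unicodedata.lookup(name)
    return re.compile(re.escape(char))


pattern = named_char_pattern("GREEK SMALL LETTER ALPHA")
found = pattern.findall("alpha: \u03b1, beta: \u03b2, alpha again: \u03b1")
print(found)

try:
    named_char_pattern("NO SUCH CHARACTER NAME")
    unknown_name_rejected = False
except KeyError:
    unknown_name_rejected = True
print(unknown_name_rejected)
```

With regex installed, the same character would normally be written directly in the pattern as ``\N{GREEK SMALL LETTER ALPHA}``.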
* Unicode codepoint properties, including scripts and blocks

  ``\p{property=value}``; ``\P{property=value}``; ``\p{value}`` ; ``\P{value}``

  Many Unicode properties are supported, including blocks and scripts. ``\p{property=value}`` or ``\p{property:value}`` matches a character whose property ``property`` has value ``value``. The inverse of ``\p{property=value}`` is ``\P{property=value}`` or ``\p{^property=value}``.

  If the short form ``\p{value}`` is used, the properties are checked in the order: ``General_Category``, ``Script``, ``Block``, binary property:

  * ``Latin``, the 'Latin' script (``Script=Latin``).

  * ``Cyrillic``, the 'Cyrillic' script (``Script=Cyrillic``).

  * ``BasicLatin``, the 'BasicLatin' block (``Block=BasicLatin``).

  * ``Alphabetic``, the 'Alphabetic' binary property (``Alphabetic=Yes``).

  A short form starting with ``Is`` indicates a script or binary property:

  * ``IsLatin``, the 'Latin' script (``Script=Latin``).

  * ``IsCyrillic``, the 'Cyrillic' script (``Script=Cyrillic``).

  * ``IsAlphabetic``, the 'Alphabetic' binary property (``Alphabetic=Yes``).

  A short form starting with ``In`` indicates a block property:

  * ``InBasicLatin``, the 'BasicLatin' block (``Block=BasicLatin``).

  * ``InCyrillic``, the 'Cyrillic' block (``Block=Cyrillic``).

* POSIX character classes

  ``[[:alpha:]]``; ``[[:^alpha:]]``

  POSIX character classes are supported. These are normally treated as an alternative form of ``\p{...}``.

  The exceptions are ``alnum``, ``digit``, ``punct`` and ``xdigit``, whose definitions are different from those of Unicode.

  ``[[:alnum:]]`` is equivalent to ``\p{posix_alnum}``.

  ``[[:digit:]]`` is equivalent to ``\p{posix_digit}``.

  ``[[:punct:]]`` is equivalent to ``\p{posix_punct}``.

  ``[[:xdigit:]]`` is equivalent to ``\p{posix_xdigit}``.

* Search anchor

  ``\G``

  A search anchor has been added.
  It matches at the position where each search started/continued and can be used for contiguous matches or in negative variable-length lookbehinds to limit how far back the lookbehind goes:

  .. sourcecode:: python

    >>> regex.findall(r"\w{2}", "abcd ef")
    ['ab', 'cd', 'ef']
    >>> regex.findall(r"\G\w{2}", "abcd ef")
    ['ab', 'cd']

  * The search starts at position 0 and matches 2 letters 'ab'.

  * The search continues at position 2 and matches 2 letters 'cd'.

  * The search continues at position 4 and fails to match any letters.

  * The anchor stops the search start position from being advanced, so there are no more results.

* Reverse searching

  Searches can now work backwards:

  .. sourcecode:: python

    >>> regex.findall(r".", "abc")
    ['a', 'b', 'c']
    >>> regex.findall(r"(?r).", "abc")
    ['c', 'b', 'a']

  Note: the result of a reverse search is not necessarily the reverse of a forward search:

  .. sourcecode:: python

    >>> regex.findall(r"..", "abcde")
    ['ab', 'cd']
    >>> regex.findall(r"(?r)..", "abcde")
    ['de', 'bc']

* Matching a single grapheme

  ``\X``

  The grapheme matcher is supported. It now conforms to the Unicode specification at ``http://www.unicode.org/reports/tr29/``.

* Branch reset

  ``(?|...|...)``

  Capture group numbers will be reused across the alternatives, but groups with different names will have different group numbers.

  Examples:

  .. sourcecode:: python

    >>> regex.match(r"(?|(first)|(second))", "first").groups()
    ('first',)
    >>> regex.match(r"(?|(first)|(second))", "second").groups()
    ('second',)

  Note that there is only one group.

* Default Unicode word boundary

  The ``WORD`` flag changes the definition of a 'word boundary' to that of a default Unicode word boundary. This applies to ``\b`` and ``\B``.

* SRE engine do not release the GIL (issue #1366311)

  The regex module can release the GIL during matching (see the above section on multithreading).

  Iterators can be safely shared across threads.
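As a cross-check on the set operators described in the feature list above, the difference ``[a-z--qw]`` can be modelled in the standard ``re`` module with a negative lookahead. This is a minimal sketch, not how the regex module implements sets. The helper ``members`` and the names ``EMULATED_DIFFERENCE``, ``matched`` and ``expected`` are hypothetical, and the final comparison against regex runs only if that module happens to be installed.

```python
import re
import string

# The standard re module has no set operators, so the version 1 set
# [a-z--qw] is emulated here as "a lowercase letter that is not q or w".
EMULATED_DIFFERENCE = re.compile(r"(?![qw])[a-z]")


def members(pattern, alphabet=string.ascii_lowercase):
    """Return the characters of `alphabet` that `pattern` matches in full."""
    return {ch for ch in alphabet if pattern.fullmatch(ch)}


matched = members(EMULATED_DIFFERENCE)
expected = set(string.ascii_lowercase) - set("qw")
print(matched == expected)

try:
    import regex  # optional: only used to compare against the native set syntax
except ImportError:
    regex = None

if regex is not None:
    native = members(regex.compile(r"(?V1)[a-z--qw]"))
    print(native == expected)
```

The lookahead form is handy when a pattern must also run under ``re``; the nested-set form is shorter and composes with ``&&``, ``||`` and ``~~``.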
regex-2016.01.10/docs/UnicodeProperties.txt

The following is a list of the 81 properties which are supported by this module:

Alphabetic [Alpha] No [F, False, N] Yes [T, True, Y] Alphanumeric [AlNum] No [F, False, N] Yes [T, True, Y] Any No [F, False, N] Yes [T, True, Y] ASCII_Hex_Digit [AHex] No [F, False, N] Yes [T, True, Y] Bidi_Class [bc] Arabic_Letter [AL] Arabic_Number [AN] Boundary_Neutral [BN] Common_Separator [CS] European_Number [EN] European_Separator [ES] European_Terminator [ET] First_Strong_Isolate [FSI] Left_To_Right [L] Left_To_Right_Embedding [LRE] Left_To_Right_Isolate [LRI] Left_To_Right_Override [LRO] Nonspacing_Mark [NSM] Other_Neutral [ON] Paragraph_Separator [B] Pop_Directional_Format [PDF] Pop_Directional_Isolate [PDI] Right_To_Left [R] Right_To_Left_Embedding [RLE] Right_To_Left_Isolate [RLI] Right_To_Left_Override [RLO] Segment_Separator [S] White_Space [WS] Bidi_Control [Bidi_C] No [F, False, N] Yes [T, True, Y] Bidi_Mirrored [Bidi_M] No [F, False, N] Yes [T, True, Y] Blank No [F, False, N] Yes [T, True, Y] Block [blk] Aegean_Numbers Ahom Alchemical_Symbols [Alchemical] Alphabetic_Presentation_Forms [Alphabetic_PF] Anatolian_Hieroglyphs Ancient_Greek_Musical_Notation [Ancient_Greek_Music] Ancient_Greek_Numbers Ancient_Symbols Arabic Arabic_Extended_A [Arabic_Ext_A] Arabic_Mathematical_Alphabetic_Symbols [Arabic_Math] Arabic_Presentation_Forms_A [Arabic_PF_A] Arabic_Presentation_Forms_B [Arabic_PF_B] Arabic_Supplement [Arabic_Sup] Armenian Arrows Avestan Balinese Bamum Bamum_Supplement [Bamum_Sup] Basic_Latin [ASCII] Bassa_Vah Batak Bengali Block_Elements Bopomofo Bopomofo_Extended [Bopomofo_Ext] Box_Drawing Brahmi Braille_Patterns [Braille] Buginese Buhid Byzantine_Musical_Symbols [Byzantine_Music] Carian Caucasian_Albanian Chakma Cham Cherokee Cherokee_Supplement [Cherokee_Sup] CJK_Compatibility [CJK_Compat] CJK_Compatibility_Forms [CJK_Compat_Forms]
CJK_Compatibility_Ideographs [CJK_Compat_Ideographs] CJK_Compatibility_Ideographs_Supplement [CJK_Compat_Ideographs_Sup] CJK_Radicals_Supplement [CJK_Radicals_Sup] CJK_Strokes CJK_Symbols_And_Punctuation [CJK_Symbols] CJK_Unified_Ideographs [CJK] CJK_Unified_Ideographs_Extension_A [CJK_Ext_A] CJK_Unified_Ideographs_Extension_B [CJK_Ext_B] CJK_Unified_Ideographs_Extension_C [CJK_Ext_C] CJK_Unified_Ideographs_Extension_D [CJK_Ext_D] CJK_Unified_Ideographs_Extension_E [CJK_Ext_E] Combining_Diacritical_Marks [Diacriticals] Combining_Diacritical_Marks_Extended [Diacriticals_Ext] Combining_Diacritical_Marks_For_Symbols [Combining_Marks_For_Symbols, Diacriticals_For_Symbols] Combining_Diacritical_Marks_Supplement [Diacriticals_Sup] Combining_Half_Marks [Half_Marks] Common_Indic_Number_Forms [Indic_Number_Forms] Control_Pictures Coptic Coptic_Epact_Numbers Counting_Rod_Numerals [Counting_Rod] Cuneiform Cuneiform_Numbers_And_Punctuation [Cuneiform_Numbers] Currency_Symbols Cypriot_Syllabary Cyrillic Cyrillic_Extended_A [Cyrillic_Ext_A] Cyrillic_Extended_B [Cyrillic_Ext_B] Cyrillic_Supplement [Cyrillic_Sup, Cyrillic_Supplementary] Deseret Devanagari Devanagari_Extended [Devanagari_Ext] Dingbats Domino_Tiles [Domino] Duployan Early_Dynastic_Cuneiform Egyptian_Hieroglyphs Elbasan Emoticons Enclosed_Alphanumerics [Enclosed_Alphanum] Enclosed_Alphanumeric_Supplement [Enclosed_Alphanum_Sup] Enclosed_CJK_Letters_And_Months [Enclosed_CJK] Enclosed_Ideographic_Supplement [Enclosed_Ideographic_Sup] Ethiopic Ethiopic_Extended [Ethiopic_Ext] Ethiopic_Extended_A [Ethiopic_Ext_A] Ethiopic_Supplement [Ethiopic_Sup] General_Punctuation [Punctuation] Geometric_Shapes Geometric_Shapes_Extended [Geometric_Shapes_Ext] Georgian Georgian_Supplement [Georgian_Sup] Glagolitic Gothic Grantha Greek_And_Coptic [Greek] Greek_Extended [Greek_Ext] Gujarati Gurmukhi Halfwidth_And_Fullwidth_Forms [Half_And_Full_Forms] Hangul_Compatibility_Jamo [Compat_Jamo] Hangul_Jamo [Jamo] Hangul_Jamo_Extended_A 
[Jamo_Ext_A] Hangul_Jamo_Extended_B [Jamo_Ext_B] Hangul_Syllables [Hangul] Hanunoo Hatran Hebrew High_Private_Use_Surrogates [High_PU_Surrogates] High_Surrogates Hiragana Ideographic_Description_Characters [IDC] Imperial_Aramaic Inscriptional_Pahlavi Inscriptional_Parthian IPA_Extensions [IPA_Ext] Javanese Kaithi Kana_Supplement [Kana_Sup] Kanbun Kangxi_Radicals [Kangxi] Kannada Katakana Katakana_Phonetic_Extensions [Katakana_Ext] Kayah_Li Kharoshthi Khmer Khmer_Symbols Khojki Khudawadi Lao Latin_1_Supplement [Latin_1, Latin_1_Sup] Latin_Extended_A [Latin_Ext_A] Latin_Extended_Additional [Latin_Ext_Additional] Latin_Extended_B [Latin_Ext_B] Latin_Extended_C [Latin_Ext_C] Latin_Extended_D [Latin_Ext_D] Latin_Extended_E [Latin_Ext_E] Lepcha Letterlike_Symbols Limbu Linear_A Linear_B_Ideograms Linear_B_Syllabary Lisu Low_Surrogates Lycian Lydian Mahajani Mahjong_Tiles [Mahjong] Malayalam Mandaic Manichaean Mathematical_Alphanumeric_Symbols [Math_Alphanum] Mathematical_Operators [Math_Operators] Meetei_Mayek Meetei_Mayek_Extensions [Meetei_Mayek_Ext] Mende_Kikakui Meroitic_Cursive Meroitic_Hieroglyphs Miao Miscellaneous_Mathematical_Symbols_A [Misc_Math_Symbols_A] Miscellaneous_Mathematical_Symbols_B [Misc_Math_Symbols_B] Miscellaneous_Symbols [Misc_Symbols] Miscellaneous_Symbols_And_Arrows [Misc_Arrows] Miscellaneous_Symbols_And_Pictographs [Misc_Pictographs] Miscellaneous_Technical [Misc_Technical] Modi Modifier_Tone_Letters Mongolian Mro Multani Musical_Symbols [Music] Myanmar Myanmar_Extended_A [Myanmar_Ext_A] Myanmar_Extended_B [Myanmar_Ext_B] Nabataean New_Tai_Lue NKo No_Block [NB] Number_Forms Ogham Old_Hungarian Old_Italic Old_North_Arabian Old_Permic Old_Persian Old_South_Arabian Old_Turkic Ol_Chiki Optical_Character_Recognition [OCR] Oriya Ornamental_Dingbats Osmanya Pahawh_Hmong Palmyrene Pau_Cin_Hau Phags_Pa Phaistos_Disc [Phaistos] Phoenician Phonetic_Extensions [Phonetic_Ext] Phonetic_Extensions_Supplement [Phonetic_Ext_Sup] Playing_Cards Private_Use_Area 
[Private_Use, PUA] Psalter_Pahlavi Rejang Rumi_Numeral_Symbols [Rumi] Runic Samaritan Saurashtra Sharada Shavian Shorthand_Format_Controls Siddham Sinhala Sinhala_Archaic_Numbers Small_Form_Variants [Small_Forms] Sora_Sompeng Spacing_Modifier_Letters [Modifier_Letters] Specials Sundanese Sundanese_Supplement [Sundanese_Sup] Superscripts_And_Subscripts [Super_And_Sub] Supplemental_Arrows_A [Sup_Arrows_A] Supplemental_Arrows_B [Sup_Arrows_B] Supplemental_Arrows_C [Sup_Arrows_C] Supplemental_Mathematical_Operators [Sup_Math_Operators] Supplemental_Punctuation [Sup_Punctuation] Supplemental_Symbols_And_Pictographs [Sup_Symbols_And_Pictographs] Supplementary_Private_Use_Area_A [Sup_PUA_A] Supplementary_Private_Use_Area_B [Sup_PUA_B] Sutton_SignWriting Syloti_Nagri Syriac Tagalog Tagbanwa Tags Tai_Le Tai_Tham Tai_Viet Tai_Xuan_Jing_Symbols [Tai_Xuan_Jing] Takri Tamil Telugu Thaana Thai Tibetan Tifinagh Tirhuta Transport_And_Map_Symbols [Transport_And_Map] Ugaritic Unified_Canadian_Aboriginal_Syllabics [Canadian_Syllabics, UCAS] Unified_Canadian_Aboriginal_Syllabics_Extended [UCAS_Ext] Vai Variation_Selectors [VS] Variation_Selectors_Supplement [VS_Sup] Vedic_Extensions [Vedic_Ext] Vertical_Forms Warang_Citi Yijing_Hexagram_Symbols [Yijing] Yi_Radicals Yi_Syllables Canonical_Combining_Class [ccc] Above [230, A] Above_Left [228, AL] Above_Right [232, AR] Attached_Above [214, ATA] Attached_Above_Right [216, ATAR] Attached_Below [202, ATB] Attached_Below_Left [200, ATBL] Below [220, B] Below_Left [218, BL] Below_Right [222, BR] CCC10 [10] CCC103 [103] CCC107 [107] CCC11 [11] CCC118 [118] CCC12 [12] CCC122 [122] CCC129 [129] CCC13 [13] CCC130 [130] CCC132 [132] CCC133 [133] CCC14 [14] CCC15 [15] CCC16 [16] CCC17 [17] CCC18 [18] CCC19 [19] CCC20 [20] CCC21 [21] CCC22 [22] CCC23 [23] CCC24 [24] CCC25 [25] CCC26 [26] CCC27 [27] CCC28 [28] CCC29 [29] CCC30 [30] CCC31 [31] CCC32 [32] CCC33 [33] CCC34 [34] CCC35 [35] CCC36 [36] CCC84 [84] CCC91 [91] Double_Above [234, DA] 
Double_Below [233, DB] Iota_Subscript [240, IS] Kana_Voicing [8, KV] Left [224, L] Not_Reordered [0, NR] Nukta [7, NK] Overlay [1, OV] Right [226, R] Virama [9, VR] Cased No [F, False, N] Yes [T, True, Y] Case_Ignorable [CI] No [F, False, N] Yes [T, True, Y] Changes_When_Casefolded [CWCF] No [F, False, N] Yes [T, True, Y] Changes_When_Casemapped [CWCM] No [F, False, N] Yes [T, True, Y] Changes_When_Lowercased [CWL] No [F, False, N] Yes [T, True, Y] Changes_When_Titlecased [CWT] No [F, False, N] Yes [T, True, Y] Changes_When_Uppercased [CWU] No [F, False, N] Yes [T, True, Y] Dash No [F, False, N] Yes [T, True, Y] Decomposition_Type [dt] Canonical [Can] Circle [Enc] Compat [Com] Final [Fin] Font Fraction [Fra] Initial [Init] Isolated [Iso] Medial [Med] Narrow [Nar] Nobreak [Nb] None Small [Sml] Square [Sqr] Sub Super [Sup] Vertical [Vert] Wide Default_Ignorable_Code_Point [DI] No [F, False, N] Yes [T, True, Y] Deprecated [Dep] No [F, False, N] Yes [T, True, Y] Diacritic [Dia] No [F, False, N] Yes [T, True, Y] East_Asian_Width [ea] Ambiguous [A] Fullwidth [F] Halfwidth [H] Narrow [Na] Neutral [N] Wide [W] Extender [Ext] No [F, False, N] Yes [T, True, Y] General_Category [gc] Assigned Cased_Letter [LC] Close_Punctuation [Pe] Connector_Punctuation [Pc] Control [Cc, cntrl] Currency_Symbol [Sc] Dash_Punctuation [Pd] Decimal_Number [digit, Nd] Enclosing_Mark [Me] Final_Punctuation [Pf] Format [Cf] Initial_Punctuation [Pi] Letter [L, L&] Letter_Number [Nl] Line_Separator [Zl] Lowercase_Letter [Ll] Mark [Combining_Mark, M, M&] Math_Symbol [Sm] Modifier_Letter [Lm] Modifier_Symbol [Sk] Nonspacing_Mark [Mn] Number [N, N&] Open_Punctuation [Ps] Other [C, C&] Other_Letter [Lo] Other_Number [No] Other_Punctuation [Po] Other_Symbol [So] Paragraph_Separator [Zp] Private_Use [Co] Punctuation [P, P&, punct] Separator [Z, Z&] Space_Separator [Zs] Spacing_Mark [Mc] Surrogate [Cs] Symbol [S, S&] Titlecase_Letter [Lt] Unassigned [Cn] Uppercase_Letter [Lu] Graph No [F, False, N] Yes [T, 
True, Y] Grapheme_Base [Gr_Base] No [F, False, N] Yes [T, True, Y] Grapheme_Cluster_Break [GCB] Control [CN] CR Extend [EX] L LF LV LVT Other [XX] Prepend [PP] Regional_Indicator [RI] SpacingMark [SM] T V Grapheme_Extend [Gr_Ext] No [F, False, N] Yes [T, True, Y] Grapheme_Link [Gr_Link] No [F, False, N] Yes [T, True, Y] Hangul_Syllable_Type [hst] Leading_Jamo [L] LVT_Syllable [LVT] LV_Syllable [LV] Not_Applicable [NA] Trailing_Jamo [T] Vowel_Jamo [V] Hex_Digit [Hex] No [F, False, N] Yes [T, True, Y] Hyphen No [F, False, N] Yes [T, True, Y] Ideographic [Ideo] No [F, False, N] Yes [T, True, Y] IDS_Binary_Operator [IDSB] No [F, False, N] Yes [T, True, Y] IDS_Trinary_Operator [IDST] No [F, False, N] Yes [T, True, Y] ID_Continue [IDC] No [F, False, N] Yes [T, True, Y] ID_Start [IDS] No [F, False, N] Yes [T, True, Y] Indic_Positional_Category [InPC] Bottom Bottom_And_Right Left Left_And_Right NA Overstruck Right Top Top_And_Bottom Top_And_Bottom_And_Right Top_And_Left Top_And_Left_And_Right Top_And_Right Visual_Order_Left Indic_Syllabic_Category [InSC] Avagraha Bindu Brahmi_Joining_Number Cantillation_Mark Consonant Consonant_Dead Consonant_Final Consonant_Head_Letter Consonant_Killer Consonant_Medial Consonant_Placeholder Consonant_Preceding_Repha Consonant_Prefixed Consonant_Subjoined Consonant_Succeeding_Repha Consonant_With_Stacker Gemination_Mark Invisible_Stacker Joiner Modifying_Letter Non_Joiner Nukta Number Number_Joiner Other Pure_Killer Register_Shifter Syllable_Modifier Tone_Letter Tone_Mark Virama Visarga Vowel Vowel_Dependent Vowel_Independent Joining_Group [jg] Ain Alaph Alef Beh Beth Burushaski_Yeh_Barree Dal Dalath_Rish E Farsi_Yeh Fe Feh Final_Semkath Gaf Gamal Hah Hamza_On_Heh_Goal [Teh_Marbuta_Goal] He Heh Heh_Goal Heth Kaf Kaph Khaph Knotted_Heh Lam Lamadh Manichaean_Aleph Manichaean_Ayin Manichaean_Beth Manichaean_Daleth Manichaean_Dhamedh Manichaean_Five Manichaean_Gimel Manichaean_Heth Manichaean_Hundred Manichaean_Kaph Manichaean_Lamedh 
Manichaean_Mem Manichaean_Nun Manichaean_One Manichaean_Pe Manichaean_Qoph Manichaean_Resh Manichaean_Sadhe Manichaean_Samekh Manichaean_Taw Manichaean_Ten Manichaean_Teth Manichaean_Thamedh Manichaean_Twenty Manichaean_Waw Manichaean_Yodh Manichaean_Zayin Meem Mim Noon No_Joining_Group Nun Nya Pe Qaf Qaph Reh Reversed_Pe Rohingya_Yeh Sad Sadhe Seen Semkath Shin Straight_Waw Swash_Kaf Syriac_Waw Tah Taw Teh_Marbuta Teth Waw Yeh Yeh_Barree Yeh_With_Tail Yudh Yudh_He Zain Zhain Joining_Type [jt] Dual_Joining [D] Join_Causing [C] Left_Joining [L] Non_Joining [U] Right_Joining [R] Transparent [T] Join_Control [Join_C] No [F, False, N] Yes [T, True, Y] Line_Break [lb] Alphabetic [AL] Ambiguous [AI] Break_After [BA] Break_Before [BB] Break_Both [B2] Break_Symbols [SY] Carriage_Return [CR] Close_Parenthesis [CP] Close_Punctuation [CL] Combining_Mark [CM] Complex_Context [SA] Conditional_Japanese_Starter [CJ] Contingent_Break [CB] Exclamation [EX] Glue [GL] H2 H3 Hebrew_Letter [HL] Hyphen [HY] Ideographic [ID] Infix_Numeric [IS] Inseparable [IN, Inseperable] JL JT JV Line_Feed [LF] Mandatory_Break [BK] Next_Line [NL] Nonstarter [NS] Numeric [NU] Open_Punctuation [OP] Postfix_Numeric [PO] Prefix_Numeric [PR] Quotation [QU] Regional_Indicator [RI] Space [SP] Surrogate [SG] Unknown [XX] Word_Joiner [WJ] ZWSpace [ZW] Logical_Order_Exception [LOE] No [F, False, N] Yes [T, True, Y] Lowercase [Lower] No [F, False, N] Yes [T, True, Y] Math No [F, False, N] Yes [T, True, Y] Noncharacter_Code_Point [NChar] No [F, False, N] Yes [T, True, Y] Numeric_Type [nt] Decimal [De] Digit [Di] None Numeric [Nu] Numeric_Value [nv] -1/2 0 1 1/10 1/12 1/16 1/2 1/3 1/4 1/5 1/6 1/7 1/8 1/9 10 100 1000 10000 100000 1000000 100000000 10000000000 1000000000000 11 11/12 11/2 12 13 13/2 14 15 15/2 16 17 17/2 18 19 2 2/3 2/5 20 200 2000 20000 200000 21 216000 22 23 24 25 26 27 28 29 3 3/16 3/2 3/4 3/5 3/8 30 300 3000 30000 300000 31 32 33 34 35 36 37 38 39 4 4/5 40 400 4000 40000 400000 41 42 43 432000 44 
45 46 47 48 49 5 5/12 5/2 5/6 5/8 50 500 5000 50000 500000 6 60 600 6000 60000 600000 7 7/12 7/2 7/8 70 700 7000 70000 700000 8 80 800 8000 80000 800000 9 9/2 90 900 9000 90000 900000 NaN Other_Alphabetic [OAlpha] No [F, False, N] Yes [T, True, Y] Other_Default_Ignorable_Code_Point [ODI] No [F, False, N] Yes [T, True, Y] Other_Grapheme_Extend [OGr_Ext] No [F, False, N] Yes [T, True, Y] Other_ID_Continue [OIDC] No [F, False, N] Yes [T, True, Y] Other_ID_Start [OIDS] No [F, False, N] Yes [T, True, Y] Other_Lowercase [OLower] No [F, False, N] Yes [T, True, Y] Other_Math [OMath] No [F, False, N] Yes [T, True, Y] Other_Uppercase [OUpper] No [F, False, N] Yes [T, True, Y] Pattern_Syntax [Pat_Syn] No [F, False, N] Yes [T, True, Y] Pattern_White_Space [Pat_WS] No [F, False, N] Yes [T, True, Y] Posix_AlNum No [F, False, N] Yes [T, True, Y] Posix_Digit No [F, False, N] Yes [T, True, Y] Posix_Punct No [F, False, N] Yes [T, True, Y] Posix_XDigit No [F, False, N] Yes [T, True, Y] Print No [F, False, N] Yes [T, True, Y] Quotation_Mark [QMark] No [F, False, N] Yes [T, True, Y] Radical No [F, False, N] Yes [T, True, Y] Script [sc] Ahom Anatolian_Hieroglyphs [Hluw] Arabic [Arab] Armenian [Armn] Avestan [Avst] Balinese [Bali] Bamum [Bamu] Bassa_Vah [Bass] Batak [Batk] Bengali [Beng] Bopomofo [Bopo] Brahmi [Brah] Braille [Brai] Buginese [Bugi] Buhid [Buhd] Canadian_Aboriginal [Cans] Carian [Cari] Caucasian_Albanian [Aghb] Chakma [Cakm] Cham Cherokee [Cher] Common [Zyyy] Coptic [Copt, Qaac] Cuneiform [Xsux] Cypriot [Cprt] Cyrillic [Cyrl] Deseret [Dsrt] Devanagari [Deva] Duployan [Dupl] Egyptian_Hieroglyphs [Egyp] Elbasan [Elba] Ethiopic [Ethi] Georgian [Geor] Glagolitic [Glag] Gothic [Goth] Grantha [Gran] Greek [Grek] Gujarati [Gujr] Gurmukhi [Guru] Han [Hani] Hangul [Hang] Hanunoo [Hano] Hatran [Hatr] Hebrew [Hebr] Hiragana [Hira] Imperial_Aramaic [Armi] Inherited [Qaai, Zinh] Inscriptional_Pahlavi [Phli] Inscriptional_Parthian [Prti] Javanese [Java] Kaithi [Kthi] Kannada [Knda] 
Katakana [Kana] Katakana_Or_Hiragana [Hrkt] Kayah_Li [Kali] Kharoshthi [Khar] Khmer [Khmr] Khojki [Khoj] Khudawadi [Sind] Lao [Laoo] Latin [Latn] Lepcha [Lepc] Limbu [Limb] Linear_A [Lina] Linear_B [Linb] Lisu Lycian [Lyci] Lydian [Lydi] Mahajani [Mahj] Malayalam [Mlym] Mandaic [Mand] Manichaean [Mani] Meetei_Mayek [Mtei] Mende_Kikakui [Mend] Meroitic_Cursive [Merc] Meroitic_Hieroglyphs [Mero] Miao [Plrd] Modi Mongolian [Mong] Mro [Mroo] Multani [Mult] Myanmar [Mymr] Nabataean [Nbat] New_Tai_Lue [Talu] Nko [Nkoo] Ogham [Ogam] Old_Hungarian [Hung] Old_Italic [Ital] Old_North_Arabian [Narb] Old_Permic [Perm] Old_Persian [Xpeo] Old_South_Arabian [Sarb] Old_Turkic [Orkh] Ol_Chiki [Olck] Oriya [Orya] Osmanya [Osma] Pahawh_Hmong [Hmng] Palmyrene [Palm] Pau_Cin_Hau [Pauc] Phags_Pa [Phag] Phoenician [Phnx] Psalter_Pahlavi [Phlp] Rejang [Rjng] Runic [Runr] Samaritan [Samr] Saurashtra [Saur] Sharada [Shrd] Shavian [Shaw] Siddham [Sidd] SignWriting [Sgnw] Sinhala [Sinh] Sora_Sompeng [Sora] Sundanese [Sund] Syloti_Nagri [Sylo] Syriac [Syrc] Tagalog [Tglg] Tagbanwa [Tagb] Tai_Le [Tale] Tai_Tham [Lana] Tai_Viet [Tavt] Takri [Takr] Tamil [Taml] Telugu [Telu] Thaana [Thaa] Thai Tibetan [Tibt] Tifinagh [Tfng] Tirhuta [Tirh] Ugaritic [Ugar] Unknown [Zzzz] Vai [Vaii] Warang_Citi [Wara] Yi [Yiii] Sentence_Break [SB] ATerm [AT] Close [CL] CR Extend [EX] Format [FO] LF Lower [LO] Numeric [NU] OLetter [LE] Other [XX] SContinue [SC] Sep [SE] Sp STerm [ST] Upper [UP] Soft_Dotted [SD] No [F, False, N] Yes [T, True, Y] STerm No [F, False, N] Yes [T, True, Y] Terminal_Punctuation [Term] No [F, False, N] Yes [T, True, Y] Unified_Ideograph [UIdeo] No [F, False, N] Yes [T, True, Y] Uppercase [Upper] No [F, False, N] Yes [T, True, Y] Variation_Selector [VS] No [F, False, N] Yes [T, True, Y] White_Space [space, WSpace] No [F, False, N] Yes [T, True, Y] Word No [F, False, N] Yes [T, True, Y] Word_Break [WB] ALetter [LE] CR Double_Quote [DQ] Extend ExtendNumLet [EX] Format [FO] Hebrew_Letter [HL] 
Katakana [KA] LF MidLetter [ML] MidNum [MN] MidNumLet [MB] Newline [NL] Numeric [NU] Other [XX] Regional_Indicator [RI] Single_Quote [SQ] XDigit No [F, False, N] Yes [T, True, Y] XID_Continue [XIDC] No [F, False, N] Yes [T, True, Y] XID_Start [XIDS] No [F, False, N] Yes [T, True, Y]

regex-2016.01.10/Python2/regex.py

#
# Secret Labs' Regular Expression Engine
#
# Copyright (c) 1998-2001 by Secret Labs AB. All rights reserved.
#
# This version of the SRE library can be redistributed under CNRI's
# Python 1.6 license. For any other use, please contact Secret Labs
# AB (info@pythonware.com).
#
# Portions of this engine have been developed in cooperation with
# CNRI. Hewlett-Packard provided funding for 1.6 integration and
# other compatibility work.
#
# 2010-01-16 mrab Python front-end re-written and extended

r"""Support for regular expressions (RE).

This module provides regular expression matching operations similar to those
found in Perl. It supports both 8-bit and Unicode strings; both the pattern and
the strings being processed can contain null bytes and characters outside the
US ASCII range.

Regular expressions can contain both special and ordinary characters. Most
ordinary characters, like "A", "a", or "0", are the simplest regular
expressions; they simply match themselves. You can concatenate ordinary
characters, so last matches the string 'last'.

There are a few differences between the old (legacy) behaviour and the new
(enhanced) behaviour, which are indicated by VERSION0 or VERSION1.

The special characters are:
    "."                 Matches any character except a newline.
    "^"                 Matches the start of the string.
    "$"                 Matches the end of the string or just before the
                        newline at the end of the string.
    "*"                 Matches 0 or more (greedy) repetitions of the preceding
                        RE.
                        Greedy means that it will match as many repetitions as
                        possible.
    "+"                 Matches 1 or more (greedy) repetitions of the preceding
                        RE.
    "?"                 Matches 0 or 1 (greedy) of the preceding RE.
    *?,+?,??            Non-greedy versions of the previous three special
                        characters.
    *+,++,?+            Possessive versions of the previous three special
                        characters.
    {m,n}               Matches from m to n repetitions of the preceding RE.
    {m,n}?              Non-greedy version of the above.
    {m,n}+              Possessive version of the above.
    {...}               Fuzzy matching constraints.
    "\\"                Either escapes special characters or signals a special
                        sequence.
    [...]               Indicates a set of characters. A "^" as the first
                        character indicates a complementing set.
    "|"                 A|B, creates an RE that will match either A or B.
    (...)               Matches the RE inside the parentheses. The contents are
                        captured and can be retrieved or matched later in the
                        string.
    (?flags-flags)      VERSION1: Sets/clears the flags for the remainder of
                        the group or pattern; VERSION0: Sets the flags for the
                        entire pattern.
    (?:...)             Non-capturing version of regular parentheses.
    (?>...)             Atomic non-capturing version of regular parentheses.
    (?flags-flags:...)  Non-capturing version of regular parentheses with local
                        flags.
    (?P<name>...)       The substring matched by the group is accessible by
                        name.
    (?<name>...)        The substring matched by the group is accessible by
                        name.
    (?P=name)           Matches the text matched earlier by the group named
                        name.
    (?#...)             A comment; ignored.
    (?=...)             Matches if ... matches next, but doesn't consume the
                        string.
    (?!...)             Matches if ... doesn't match next.
    (?<=...)            Matches if preceded by ....
    (?<!...)            Matches if not preceded by ....
    \g<name>            Matches the text matched by the group named name.
    \G                  Matches the empty string, but only at the position
                        where the search started.
    \K                  Keeps only what follows for the entire match.
    \L<name>            Named list. The list is provided as a keyword argument.
    \m                  Matches the empty string, but only at the start of a
                        word.
    \M                  Matches the empty string, but only at the end of a
                        word.
    \n                  Matches the newline character.
    \N{name}            Matches the named character.
    \p{name=value}      Matches the character if its property has the specified
                        value.
    \P{name=value}      Matches the character if its property hasn't the
                        specified value.
    \r                  Matches the carriage-return character.
    \s                  Matches any whitespace character; equivalent to
                        [ \t\n\r\f\v].
    \S                  Matches any non-whitespace character; equivalent to
                        [^\s].
    \t                  Matches the tab character.
    \uXXXX              Matches the Unicode codepoint with 4-digit hex code
                        XXXX.
    \UXXXXXXXX          Matches the Unicode codepoint with 8-digit hex code
                        XXXXXXXX.
    \v                  Matches the vertical tab character.
    \w                  Matches any alphanumeric character; equivalent to
                        [a-zA-Z0-9_] when matching a bytestring or a Unicode
                        string with the ASCII flag, or the whole range of
                        Unicode alphanumeric characters (letters plus digits
                        plus underscore) when matching a Unicode string. With
                        LOCALE, it will match the set [0-9_] plus characters
                        defined as letters for the current locale.
    \W                  Matches the complement of \w; equivalent to [^\w].
    \xXX                Matches the character with 2-digit hex code XX.
    \X                  Matches a grapheme.
    \Z                  Matches only at the end of the string.
    \\                  Matches a literal backslash.

This module exports the following functions:
    match      Match a regular expression pattern at the beginning of a string.
    fullmatch  Match a regular expression pattern against all of a string.
    search     Search a string for the presence of a pattern.
    sub        Substitute occurrences of a pattern found in a string using a
               template string.
    subf       Substitute occurrences of a pattern found in a string using a
               format string.
    subn       Same as sub, but also return the number of substitutions made.
    subfn      Same as subf, but also return the number of substitutions made.
    split      Split a string by the occurrences of a pattern. VERSION1: will
               split at zero-width match; VERSION0: won't split at zero-width
               match.
    splititer  Return an iterator yielding the parts of a split string.
    findall    Find all occurrences of a pattern in a string.
    finditer   Return an iterator yielding a match object for each match.
    compile    Compile a pattern into a Pattern object.
    purge      Clear the regular expression cache.
    escape     Backslash all non-alphanumerics or special characters in a
               string.

Most of the functions support a concurrent parameter: if True, the GIL will be
released during matching, allowing other Python threads to run concurrently. If
the string changes during matching, the behaviour is undefined. This parameter
is not needed when working on the builtin (immutable) string classes.

Some of the functions in this module take flags as optional parameters. Most of
these flags can also be set within an RE:
    A   a   ASCII         Make \w, \W, \b, \B, \d, and \D match the
                          corresponding ASCII character categories. Default
                          when matching a bytestring.
    B   b   BESTMATCH     Find the best fuzzy match (default is first).
    D       DEBUG         Print the parsed pattern.
    E   e   ENHANCEMATCH  Attempt to improve the fit after finding the first
                          fuzzy match.
    F   f   FULLCASE      Use full case-folding when performing
                          case-insensitive matching in Unicode.
    I   i   IGNORECASE    Perform case-insensitive matching.
    L   L   LOCALE        Make \w, \W, \b, \B, \d, and \D dependent on the
                          current locale. (One byte per character only.)
    M   m   MULTILINE     "^" matches the beginning of lines (after a newline)
                          as well as the string. "$" matches the end of lines
                          (before a newline) as well as the end of the string.
    P   p   POSIX         Perform POSIX-standard matching (leftmost longest).
    R   r   REVERSE       Searches backwards.
    S   s   DOTALL        "." matches any character at all, including the
                          newline.
    U   u   UNICODE       Make \w, \W, \b, \B, \d, and \D dependent on the
                          Unicode locale. Default when matching a Unicode
                          string.
    V0  V0  VERSION0      Turn on the old legacy behaviour.
    V1  V1  VERSION1      Turn on the new enhanced behaviour. This flag
                          includes the FULLCASE flag.
    W   w   WORD          Make \b and \B work with default Unicode word breaks
                          and make ".", "^" and "$" work with Unicode line
                          breaks.
    X   x   VERBOSE       Ignore whitespace and comments for nicer looking REs.

This module also defines an exception 'error'.

"""

# Public symbols.
__all__ = ["compile", "escape", "findall", "finditer", "fullmatch", "match",
  "purge", "search", "split", "splititer", "sub", "subf", "subfn", "subn",
  "template", "Scanner", "A", "ASCII", "B", "BESTMATCH", "D", "DEBUG", "E",
  "ENHANCEMATCH", "S", "DOTALL", "F", "FULLCASE", "I", "IGNORECASE", "L",
  "LOCALE", "M", "MULTILINE", "P", "POSIX", "R", "REVERSE", "T", "TEMPLATE",
  "U", "UNICODE", "V0", "VERSION0", "V1", "VERSION1", "X", "VERBOSE", "W",
  "WORD", "error", "Regex"]

__version__ = "2.4.85"

# --------------------------------------------------------------------
# Public interface.

def match(pattern, string, flags=0, pos=None, endpos=None, partial=False,
  concurrent=None, **kwargs):
    """Try to apply the pattern at the start of the string, returning a match
    object, or None if no match was found."""
    return _compile(pattern, flags, kwargs).match(string, pos, endpos,
      concurrent, partial)

def fullmatch(pattern, string, flags=0, pos=None, endpos=None, partial=False,
  concurrent=None, **kwargs):
    """Try to apply the pattern against all of the string, returning a match
    object, or None if no match was found."""
    return _compile(pattern, flags, kwargs).fullmatch(string, pos, endpos,
      concurrent, partial)

def search(pattern, string, flags=0, pos=None, endpos=None, partial=False,
  concurrent=None, **kwargs):
    """Search through string looking for a match to the pattern, returning a
    match object, or None if no match was found."""
    return _compile(pattern, flags, kwargs).search(string, pos, endpos,
      concurrent, partial)

def sub(pattern, repl, string, count=0, flags=0, pos=None, endpos=None,
  concurrent=None, **kwargs):
    """Return the string obtained by replacing the leftmost (or rightmost with a
    reverse pattern) non-overlapping occurrences of the pattern in string by the
    replacement repl.
repl can be either a string or a callable; if a string, backslash escapes in it are processed; if a callable, it's passed the match object and must return a replacement string to be used.""" return _compile(pattern, flags, kwargs).sub(repl, string, count, pos, endpos, concurrent) def subf(pattern, format, string, count=0, flags=0, pos=None, endpos=None, concurrent=None, **kwargs): """Return the string obtained by replacing the leftmost (or rightmost with a reverse pattern) non-overlapping occurrences of the pattern in string by the replacement format. format can be either a string or a callable; if a string, it's treated as a format string; if a callable, it's passed the match object and must return a replacement string to be used.""" return _compile(pattern, flags, kwargs).subf(format, string, count, pos, endpos, concurrent) def subn(pattern, repl, string, count=0, flags=0, pos=None, endpos=None, concurrent=None, **kwargs): """Return a 2-tuple containing (new_string, number). new_string is the string obtained by replacing the leftmost (or rightmost with a reverse pattern) non-overlapping occurrences of the pattern in the source string by the replacement repl. number is the number of substitutions that were made. repl can be either a string or a callable; if a string, backslash escapes in it are processed; if a callable, it's passed the match object and must return a replacement string to be used.""" return _compile(pattern, flags, kwargs).subn(repl, string, count, pos, endpos, concurrent) def subfn(pattern, format, string, count=0, flags=0, pos=None, endpos=None, concurrent=None, **kwargs): """Return a 2-tuple containing (new_string, number). new_string is the string obtained by replacing the leftmost (or rightmost with a reverse pattern) non-overlapping occurrences of the pattern in the source string by the replacement format. number is the number of substitutions that were made. 
format can be either a string or a callable; if a string, it's treated as a format string; if a callable, it's passed the match object and must return a replacement string to be used.""" return _compile(pattern, flags, kwargs).subfn(format, string, count, pos, endpos, concurrent) def split(pattern, string, maxsplit=0, flags=0, concurrent=None, **kwargs): """Split the source string by the occurrences of the pattern, returning a list containing the resulting substrings. If capturing parentheses are used in pattern, then the text of all groups in the pattern are also returned as part of the resulting list. If maxsplit is nonzero, at most maxsplit splits occur, and the remainder of the string is returned as the final element of the list.""" return _compile(pattern, flags, kwargs).split(string, maxsplit, concurrent) def splititer(pattern, string, maxsplit=0, flags=0, concurrent=None, **kwargs): "Return an iterator yielding the parts of a split string." return _compile(pattern, flags, kwargs).splititer(string, maxsplit, concurrent) def findall(pattern, string, flags=0, pos=None, endpos=None, overlapped=False, concurrent=None, **kwargs): """Return a list of all matches in the string. The matches may be overlapped if overlapped is True. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result.""" return _compile(pattern, flags, kwargs).findall(string, pos, endpos, overlapped, concurrent) def finditer(pattern, string, flags=0, pos=None, endpos=None, overlapped=False, partial=False, concurrent=None, **kwargs): """Return an iterator over all matches in the string. The matches may be overlapped if overlapped is True. For each match, the iterator returns a match object. 
    Empty matches are included in the result."""
    return _compile(pattern, flags, kwargs).finditer(string, pos, endpos,
      overlapped, concurrent, partial)

def compile(pattern, flags=0, **kwargs):
    "Compile a regular expression pattern, returning a pattern object."
    return _compile(pattern, flags, kwargs)

def purge():
    "Clear the regular expression cache"
    _cache.clear()
    _locale_sensitive.clear()

def template(pattern, flags=0):
    "Compile a template pattern, returning a pattern object."
    return _compile(pattern, flags | TEMPLATE)

def escape(pattern, special_only=False):
    "Escape all non-alphanumeric characters or special characters in pattern."
    s = []
    if special_only:
        for c in pattern:
            if c in _METACHARS:
                s.append("\\")
                s.append(c)
            elif c == "\x00":
                s.append("\\000")
            else:
                s.append(c)
    else:
        for c in pattern:
            if c in _ALNUM:
                s.append(c)
            elif c == "\x00":
                s.append("\\000")
            else:
                s.append("\\")
                s.append(c)

    return pattern[ : 0].join(s)

# --------------------------------------------------------------------
# Internals.

import _regex_core
import _regex
from threading import RLock as _RLock
from locale import getlocale as _getlocale
from _regex_core import *
from _regex_core import (_ALL_VERSIONS, _ALL_ENCODINGS, _FirstSetError,
  _UnscopedFlagSet, _check_group_features, _compile_firstset,
  _compile_replacement, _flatten_code, _fold_case, _get_required_string,
  _parse_pattern, _shrink_cache)
from _regex_core import (ALNUM as _ALNUM, Info as _Info, OP as _OP, Source as
  _Source, Fuzzy as _Fuzzy)

# Version 0 is the old behaviour, compatible with the original 're' module.
# Version 1 is the new behaviour, which differs slightly.

DEFAULT_VERSION = VERSION0

_METACHARS = frozenset("()[]{}?*+|^$\\.")

_regex_core.DEFAULT_VERSION = DEFAULT_VERSION

# Caches for the patterns and replacements.
_cache = {}
_cache_lock = _RLock()
_named_args = {}
_replacement_cache = {}
_locale_sensitive = {}

# Maximum size of the cache.
_MAXCACHE = 500
_MAXREPCACHE = 500

def _compile(pattern, flags=0, kwargs={}):
    "Compiles a regular expression to a PatternObject."

    # We won't bother to cache the pattern if we're debugging.
    debugging = (flags & DEBUG) != 0

    # What locale is this pattern using?
    locale_key = (type(pattern), pattern)
    if _locale_sensitive.get(locale_key, True) or (flags & LOCALE) != 0:
        # This pattern is, or might be, locale-sensitive.
        pattern_locale = _getlocale()[1]
    else:
        # This pattern is definitely not locale-sensitive.
        pattern_locale = None

    if not debugging:
        try:
            # Do we know what keyword arguments are needed?
            args_key = pattern, type(pattern), flags
            args_needed = _named_args[args_key]

            # Are we being provided with its required keyword arguments?
            args_supplied = set()
            if args_needed:
                for k, v in args_needed:
                    try:
                        args_supplied.add((k, frozenset(kwargs[k])))
                    except KeyError:
                        raise error("missing named list: {!r}".format(k))

            args_supplied = frozenset(args_supplied)

            # Have we already seen this regular expression and named list?
            pattern_key = (pattern, type(pattern), flags, args_supplied,
              DEFAULT_VERSION, pattern_locale)
            return _cache[pattern_key]
        except KeyError:
            # It's a new pattern, or new named list for a known pattern.
            pass

    # Guess the encoding from the class of the pattern string.
    if isinstance(pattern, unicode):
        guess_encoding = UNICODE
    elif isinstance(pattern, str):
        guess_encoding = ASCII
    elif isinstance(pattern, _pattern_type):
        if flags:
            raise ValueError("cannot process flags argument with a compiled pattern")

        return pattern
    else:
        raise TypeError("first argument must be a string or compiled pattern")

    # Set the default version in the core code in case it has been changed.
    _regex_core.DEFAULT_VERSION = DEFAULT_VERSION

    global_flags = flags

    while True:
        caught_exception = None
        try:
            source = _Source(pattern)
            info = _Info(global_flags, source.char_type, kwargs)
            info.guess_encoding = guess_encoding
            source.ignore_space = bool(info.flags & VERBOSE)
            parsed = _parse_pattern(source, info)
            break
        except _UnscopedFlagSet:
            # Remember the global flags for the next attempt.
            global_flags = info.global_flags
        except error, e:
            caught_exception = e

        if caught_exception:
            raise error(caught_exception.msg, caught_exception.pattern,
              caught_exception.pos)

    if not source.at_end():
        raise error("unbalanced parenthesis", pattern, source.pos)

    # Check the global flags for conflicts.
    version = (info.flags & _ALL_VERSIONS) or DEFAULT_VERSION
    if version not in (0, VERSION0, VERSION1):
        raise ValueError("VERSION0 and VERSION1 flags are mutually incompatible")

    if (info.flags & _ALL_ENCODINGS) not in (0, ASCII, LOCALE, UNICODE):
        raise ValueError("ASCII, LOCALE and UNICODE flags are mutually incompatible")

    if not (info.flags & _ALL_ENCODINGS):
        if isinstance(pattern, unicode):
            info.flags |= UNICODE
        else:
            info.flags |= ASCII

    reverse = bool(info.flags & REVERSE)
    fuzzy = isinstance(parsed, _Fuzzy)

    # Remember whether this pattern has an inline locale flag.
    _locale_sensitive[locale_key] = info.inline_locale

    # Fix the group references.
    caught_exception = None
    try:
        parsed.fix_groups(pattern, reverse, False)
    except error, e:
        caught_exception = e

    if caught_exception:
        raise error(caught_exception.msg, caught_exception.pattern,
          caught_exception.pos)

    # Should we print the parsed pattern?
    if flags & DEBUG:
        parsed.dump(indent=0, reverse=reverse)

    # Optimise the parsed pattern.
    parsed = parsed.optimise(info)
    parsed = parsed.pack_characters(info)

    # Get the required string.
    req_offset, req_chars, req_flags = _get_required_string(parsed, info.flags)

    # Build the named lists.
    named_lists = {}
    named_list_indexes = [None] * len(info.named_lists_used)
    args_needed = set()
    for key, index in info.named_lists_used.items():
        name, case_flags = key
        values = frozenset(kwargs[name])
        if case_flags:
            items = frozenset(_fold_case(info, v) for v in values)
        else:
            items = values
        named_lists[name] = values
        named_list_indexes[index] = items
        args_needed.add((name, values))

    # Check the features of the groups.
    _check_group_features(info, parsed)

    # Compile the parsed pattern. The result is a list of tuples.
    code = parsed.compile(reverse)

    # Is there a group call to the pattern as a whole?
    key = (0, reverse, fuzzy)
    ref = info.call_refs.get(key)
    if ref is not None:
        code = [(_OP.CALL_REF, ref)] + code + [(_OP.END, )]

    # Add the final 'success' opcode.
    code += [(_OP.SUCCESS, )]

    # Compile the additional copies of the groups that we need.
    for group, rev, fuz in info.additional_groups:
        code += group.compile(rev, fuz)

    # Flatten the code into a list of ints.
    code = _flatten_code(code)

    if not parsed.has_simple_start():
        # Get the first set, if possible.
        try:
            fs_code = _compile_firstset(info, parsed.get_firstset(reverse))
            fs_code = _flatten_code(fs_code)
            code = fs_code + code
        except _FirstSetError:
            pass

    # The named capture groups.
    index_group = dict((v, n) for n, v in info.group_index.items())

    # Create the PatternObject.
    #
    # Local flags like IGNORECASE affect the code generation, but aren't needed
    # by the PatternObject itself. Conversely, global flags like LOCALE _don't_
    # affect the code generation but _are_ needed by the PatternObject.
    compiled_pattern = _regex.compile(pattern, info.flags | version, code,
      info.group_index, index_group, named_lists, named_list_indexes,
      req_offset, req_chars, req_flags, info.group_count)

    # Do we need to reduce the size of the cache?
    if len(_cache) >= _MAXCACHE:
        _cache_lock.acquire()
        try:
            _shrink_cache(_cache, _named_args, _locale_sensitive, _MAXCACHE)
        finally:
            _cache_lock.release()

    if not debugging:
        if (info.flags & LOCALE) == 0:
            pattern_locale = None

        args_needed = frozenset(args_needed)

        # Store this regular expression and named list.
        pattern_key = (pattern, type(pattern), flags, args_needed,
          DEFAULT_VERSION, pattern_locale)
        _cache[pattern_key] = compiled_pattern

        # Store what keyword arguments are needed.
        _named_args[args_key] = args_needed

    return compiled_pattern

def _compile_replacement_helper(pattern, template):
    "Compiles a replacement template."
    # This function is called by the _regex module.

    # Have we seen this before?
    key = pattern.pattern, pattern.flags, template
    compiled = _replacement_cache.get(key)
    if compiled is not None:
        return compiled

    if len(_replacement_cache) >= _MAXREPCACHE:
        _replacement_cache.clear()

    is_unicode = isinstance(template, unicode)
    source = _Source(template)
    if is_unicode:
        def make_string(char_codes):
            return u"".join(unichr(c) for c in char_codes)
    else:
        def make_string(char_codes):
            return "".join(chr(c) for c in char_codes)

    compiled = []
    literal = []
    while True:
        ch = source.get()
        if not ch:
            break
        if ch == "\\":
            # '_compile_replacement' will return either an int group reference
            # or a string literal. It returns items (plural) in order to handle
            # a 2-character literal (an invalid escape sequence).
            is_group, items = _compile_replacement(source, pattern, is_unicode)
            if is_group:
                # It's a group, so first flush the literal.
                if literal:
                    compiled.append(make_string(literal))
                    literal = []
                compiled.extend(items)
            else:
                literal.extend(items)
        else:
            literal.append(ord(ch))

    # Flush the literal.
    if literal:
        compiled.append(make_string(literal))

    _replacement_cache[key] = compiled

    return compiled

# We define _pattern_type here after all the support objects have been defined.
_pattern_type = type(_compile("", 0, {}))

# We'll define an alias for the 'compile' function so that the repr of a
# pattern object is eval-able.
Regex = compile

# Register myself for pickling.

import copy_reg as _copy_reg

def _pickle(p):
    return _compile, (p.pattern, p.flags)

_copy_reg.pickle(_pattern_type, _pickle, _compile)

if not hasattr(str, "format"):
    # Strings don't have the .format method (below Python 2.6).
    while True:
        _start = __doc__.find(" subf")
        if _start < 0:
            break

        _end = __doc__.find("\n", _start) + 1
        while __doc__.startswith(" ", _end):
            _end = __doc__.find("\n", _end) + 1

        __doc__ = __doc__[ : _start] + __doc__[_end : ]

    __all__ = [_name for _name in __all__ if not _name.startswith("subf")]

    del _start, _end

    del subf, subfn
regex-2016.01.10/Python2/test_regex.py0000666000000000000000000051472412624412455015456 0ustar 00000000000000
from __future__ import with_statement
import regex
import string
from weakref import proxy
import unittest
import copy
from test.test_support import run_unittest
import re

# _AssertRaisesContext is defined here because the class doesn't exist before
# Python 2.7.
class _AssertRaisesContext(object): """A context manager used to implement TestCase.assertRaises* methods.""" def __init__(self, expected, test_case, expected_regexp=None): self.expected = expected self.failureException = test_case.failureException self.expected_regexp = expected_regexp def __enter__(self): return self def __exit__(self, exc_type, exc_value, tb): if exc_type is None: try: exc_name = self.expected.__name__ except AttributeError: exc_name = str(self.expected) raise self.failureException( "%s not raised" % exc_name) if not issubclass(exc_type, self.expected): # let unexpected exceptions pass through return False self.exception = exc_value # store for later retrieval if self.expected_regexp is None: return True expected_regexp = self.expected_regexp if isinstance(expected_regexp, basestring): expected_regexp = re.compile(expected_regexp) if not expected_regexp.search(str(exc_value)): raise self.failureException('"%s" does not match "%s"' % (expected_regexp.pattern, str(exc_value))) return True class RegexTests(unittest.TestCase): PATTERN_CLASS = "" FLAGS_WITH_COMPILED_PAT = "cannot process flags argument with a compiled pattern" INVALID_GROUP_REF = "invalid group reference" MISSING_GT = "missing >" BAD_GROUP_NAME = "bad character in group name" MISSING_GROUP_NAME = "missing group name" MISSING_LT = "missing <" UNKNOWN_GROUP_I = "unknown group" UNKNOWN_GROUP = "unknown group" BAD_ESCAPE = r"bad escape \(end of pattern\)" BAD_OCTAL_ESCAPE = r"bad escape \\" BAD_SET = "unterminated character set" STR_PAT_ON_BYTES = "cannot use a string pattern on a bytes-like object" BYTES_PAT_ON_STR = "cannot use a bytes pattern on a string-like object" STR_PAT_BYTES_TEMPL = "expected str instance, bytes found" BYTES_PAT_STR_TEMPL = "expected a bytes-like object, str found" BYTES_PAT_UNI_FLAG = "cannot use UNICODE flag with a bytes pattern" MIXED_FLAGS = "ASCII, LOCALE and UNICODE flags are mutually incompatible" MISSING_RPAREN = "missing \\)" TRAILING_CHARS = 
"unbalanced parenthesis" BAD_CHAR_RANGE = "bad character range" NOTHING_TO_REPEAT = "nothing to repeat" MULTIPLE_REPEAT = "multiple repeat" OPEN_GROUP = "cannot refer to an open group" DUPLICATE_GROUP = "duplicate group" CANT_TURN_OFF = "bad inline flags: cannot turn flags off" UNDEF_CHAR_NAME = "undefined character name" # assertRaisesRegex is defined here because the method isn't in the # superclass before Python 2.7. def assertRaisesRegex(self, expected_exception, expected_regexp, callable_obj=None, *args, **kwargs): """Asserts that the message in a raised exception matches a regexp. Args: expected_exception: Exception class expected to be raised. expected_regexp: Regexp (re pattern object or string) expected to be found in error message. callable_obj: Function to be called. args: Extra args. kwargs: Extra kwargs. """ context = _AssertRaisesContext(expected_exception, self, expected_regexp) if callable_obj is None: return context with context: callable_obj(*args, **kwargs) def assertTypedEqual(self, actual, expect, msg=None): self.assertEqual(actual, expect, msg) def recurse(actual, expect): if isinstance(expect, (tuple, list)): for x, y in zip(actual, expect): recurse(x, y) else: self.assertIs(type(actual), type(expect), msg) recurse(actual, expect) def test_weakref(self): s = 'QabbbcR' x = regex.compile('ab+c') y = proxy(x) if x.findall('QabbbcR') != y.findall('QabbbcR'): self.fail() def test_search_star_plus(self): self.assertEqual(regex.search('a*', 'xxx').span(0), (0, 0)) self.assertEqual(regex.search('x*', 'axx').span(), (0, 0)) self.assertEqual(regex.search('x+', 'axx').span(0), (1, 3)) self.assertEqual(regex.search('x+', 'axx').span(), (1, 3)) self.assertEqual(regex.search('x', 'aaa'), None) self.assertEqual(regex.match('a*', 'xxx').span(0), (0, 0)) self.assertEqual(regex.match('a*', 'xxx').span(), (0, 0)) self.assertEqual(regex.match('x*', 'xxxa').span(0), (0, 3)) self.assertEqual(regex.match('x*', 'xxxa').span(), (0, 3)) 
self.assertEqual(regex.match('a+', 'xxx'), None) def bump_num(self, matchobj): int_value = int(matchobj[0]) return str(int_value + 1) def test_basic_regex_sub(self): self.assertEqual(regex.sub("(?i)b+", "x", "bbbb BBBB"), 'x x') self.assertEqual(regex.sub(r'\d+', self.bump_num, '08.2 -2 23x99y'), '9.3 -3 24x100y') self.assertEqual(regex.sub(r'\d+', self.bump_num, '08.2 -2 23x99y', 3), '9.3 -3 23x99y') self.assertEqual(regex.sub('.', lambda m: r"\n", 'x'), "\\n") self.assertEqual(regex.sub('.', r"\n", 'x'), "\n") self.assertEqual(regex.sub('(?Px)', r'\g\g', 'xx'), 'xxxx') self.assertEqual(regex.sub('(?Px)', r'\g\g<1>', 'xx'), 'xxxx') self.assertEqual(regex.sub('(?Px)', r'\g\g', 'xx'), 'xxxx') self.assertEqual(regex.sub('(?Px)', r'\g<1>\g<1>', 'xx'), 'xxxx') self.assertEqual(regex.sub('a', r'\t\n\v\r\f\a\b\B\Z\a\A\w\W\s\S\d\D', 'a'), "\t\n\v\r\f\a\b\\B\\Z\a\\A\\w\\W\\s\\S\\d\\D") self.assertEqual(regex.sub('a', '\t\n\v\r\f\a', 'a'), "\t\n\v\r\f\a") self.assertEqual(regex.sub('a', '\t\n\v\r\f\a', 'a'), chr(9) + chr(10) + chr(11) + chr(13) + chr(12) + chr(7)) self.assertEqual(regex.sub(r'^\s*', 'X', 'test'), 'Xtest') self.assertEqual(regex.sub(ur"x", ur"\x0A", u"x"), u"\n") self.assertEqual(regex.sub(ur"x", ur"\u000A", u"x"), u"\n") self.assertEqual(regex.sub(ur"x", ur"\U0000000A", u"x"), u"\n") self.assertEqual(regex.sub(ur"x", ur"\N{LATIN CAPITAL LETTER A}", u"x"), u"A") self.assertEqual(regex.sub(r"x", r"\x0A", "x"), "\n") self.assertEqual(regex.sub(r"x", r"\u000A", "x"), "\\u000A") self.assertEqual(regex.sub(r"x", r"\U0000000A", "x"), "\\U0000000A") self.assertEqual(regex.sub(r"x", r"\N{LATIN CAPITAL LETTER A}", "x"), "\\N{LATIN CAPITAL LETTER A}") def test_bug_449964(self): # Fails for group followed by other escape. self.assertEqual(regex.sub(r'(?Px)', r'\g<1>\g<1>\b', 'xx'), "xx\bxx\b") def test_bug_449000(self): # Test for sub() on escaped characters. 
self.assertEqual(regex.sub(r'\r\n', r'\n', 'abc\r\ndef\r\n'), "abc\ndef\n") self.assertEqual(regex.sub('\r\n', r'\n', 'abc\r\ndef\r\n'), "abc\ndef\n") self.assertEqual(regex.sub(r'\r\n', '\n', 'abc\r\ndef\r\n'), "abc\ndef\n") self.assertEqual(regex.sub('\r\n', '\n', 'abc\r\ndef\r\n'), "abc\ndef\n") def test_bug_1140(self): # regex.sub(x, y, u'') should return u'', not '', and # regex.sub(x, y, '') should return '', not u''. # Also: # regex.sub(x, y, unicode(x)) should return unicode(y), and # regex.sub(x, y, str(x)) should return # str(y) if isinstance(y, str) else unicode(y). for x in 'x', u'x': for y in 'y', u'y': z = regex.sub(x, y, u'') self.assertEqual((type(z), z), (unicode, u'')) z = regex.sub(x, y, '') self.assertEqual((type(z), z), (str, '')) z = regex.sub(x, y, unicode(x)) self.assertEqual((type(z), z), (unicode, unicode(y))) z = regex.sub(x, y, str(x)) self.assertEqual((type(z), z), (type(y), y)) def test_bug_1661(self): # Verify that flags do not get silently ignored with compiled patterns pattern = regex.compile('.') self.assertRaisesRegex(ValueError, self.FLAGS_WITH_COMPILED_PAT, lambda: regex.match(pattern, 'A', regex.I)) self.assertRaisesRegex(ValueError, self.FLAGS_WITH_COMPILED_PAT, lambda: regex.search(pattern, 'A', regex.I)) self.assertRaisesRegex(ValueError, self.FLAGS_WITH_COMPILED_PAT, lambda: regex.findall(pattern, 'A', regex.I)) self.assertRaisesRegex(ValueError, self.FLAGS_WITH_COMPILED_PAT, lambda: regex.compile(pattern, regex.I)) def test_bug_3629(self): # A regex that triggered a bug in the sre-code validator self.assertEqual(repr(type(regex.compile("(?P)(?(quote))"))), self.PATTERN_CLASS) def test_sub_template_numeric_escape(self): # Bug 776311 and friends. 
self.assertEqual(regex.sub('x', r'\0', 'x'), "\0") self.assertEqual(regex.sub('x', r'\000', 'x'), "\000") self.assertEqual(regex.sub('x', r'\001', 'x'), "\001") self.assertEqual(regex.sub('x', r'\008', 'x'), "\0" + "8") self.assertEqual(regex.sub('x', r'\009', 'x'), "\0" + "9") self.assertEqual(regex.sub('x', r'\111', 'x'), "\111") self.assertEqual(regex.sub('x', r'\117', 'x'), "\117") self.assertEqual(regex.sub('x', r'\1111', 'x'), "\1111") self.assertEqual(regex.sub('x', r'\1111', 'x'), "\111" + "1") self.assertEqual(regex.sub('x', r'\00', 'x'), '\x00') self.assertEqual(regex.sub('x', r'\07', 'x'), '\x07') self.assertEqual(regex.sub('x', r'\08', 'x'), "\0" + "8") self.assertEqual(regex.sub('x', r'\09', 'x'), "\0" + "9") self.assertEqual(regex.sub('x', r'\0a', 'x'), "\0" + "a") self.assertEqual(regex.sub(u'x', ur'\400', u'x'), u"\u0100") self.assertEqual(regex.sub(u'x', ur'\777', u'x'), u"\u01FF") self.assertEqual(regex.sub('x', r'\400', 'x'), "\x00") self.assertEqual(regex.sub('x', r'\777', 'x'), "\xFF") self.assertRaisesRegex(regex.error, self.INVALID_GROUP_REF, lambda: regex.sub('x', r'\1', 'x')) self.assertRaisesRegex(regex.error, self.INVALID_GROUP_REF, lambda: regex.sub('x', r'\8', 'x')) self.assertRaisesRegex(regex.error, self.INVALID_GROUP_REF, lambda: regex.sub('x', r'\9', 'x')) self.assertRaisesRegex(regex.error, self.INVALID_GROUP_REF, lambda: regex.sub('x', r'\11', 'x')) self.assertRaisesRegex(regex.error, self.INVALID_GROUP_REF, lambda: regex.sub('x', r'\18', 'x')) self.assertRaisesRegex(regex.error, self.INVALID_GROUP_REF, lambda: regex.sub('x', r'\1a', 'x')) self.assertRaisesRegex(regex.error, self.INVALID_GROUP_REF, lambda: regex.sub('x', r'\90', 'x')) self.assertRaisesRegex(regex.error, self.INVALID_GROUP_REF, lambda: regex.sub('x', r'\99', 'x')) self.assertRaisesRegex(regex.error, self.INVALID_GROUP_REF, lambda: regex.sub('x', r'\118', 'x')) # r'\11' + '8' self.assertRaisesRegex(regex.error, self.INVALID_GROUP_REF, lambda: regex.sub('x', r'\11a', 
'x')) self.assertRaisesRegex(regex.error, self.INVALID_GROUP_REF, lambda: regex.sub('x', r'\181', 'x')) # r'\18' + '1' self.assertRaisesRegex(regex.error, self.INVALID_GROUP_REF, lambda: regex.sub('x', r'\800', 'x')) # r'\80' + '0' # In Python 2.3 (etc), these loop endlessly in sre_parser.py. self.assertEqual(regex.sub('(((((((((((x)))))))))))', r'\11', 'x'), 'x') self.assertEqual(regex.sub('((((((((((y))))))))))(.)', r'\118', 'xyz'), 'xz8') self.assertEqual(regex.sub('((((((((((y))))))))))(.)', r'\11a', 'xyz'), 'xza') def test_qualified_re_sub(self): self.assertEqual(regex.sub('a', 'b', 'aaaaa'), 'bbbbb') self.assertEqual(regex.sub('a', 'b', 'aaaaa', 1), 'baaaa') def test_bug_114660(self): self.assertEqual(regex.sub(r'(\S)\s+(\S)', r'\1 \2', 'hello there'), 'hello there') def test_bug_462270(self): # Test for empty sub() behaviour, see SF bug #462270 self.assertEqual(regex.sub('(?V0)x*', '-', 'abxd'), '-a-b-d-') self.assertEqual(regex.sub('(?V1)x*', '-', 'abxd'), '-a-b--d-') self.assertEqual(regex.sub('x+', '-', 'abxd'), 'ab-d') def test_bug_14462(self): # chr(255) is not a valid identifier in Python 2. group_name = u'\xFF' self.assertRaisesRegex(regex.error, self.BAD_GROUP_NAME, lambda: regex.search(ur'(?P<' + group_name + '>a)', u'a')) def test_symbolic_refs(self): self.assertRaisesRegex(regex.error, self.MISSING_GT, lambda: regex.sub('(?Px)', r'\gx)', r'\g<', 'xx')) self.assertRaisesRegex(regex.error, self.MISSING_LT, lambda: regex.sub('(?Px)', r'\g', 'xx')) self.assertRaisesRegex(regex.error, self.BAD_GROUP_NAME, lambda: regex.sub('(?Px)', r'\g', 'xx')) self.assertRaisesRegex(regex.error, self.BAD_GROUP_NAME, lambda: regex.sub('(?Px)', r'\g<1a1>', 'xx')) self.assertRaisesRegex(IndexError, self.UNKNOWN_GROUP_I, lambda: regex.sub('(?Px)', r'\g', 'xx')) # The new behaviour of unmatched but valid groups is to treat them like # empty matches in the replacement template, like in Perl. 
self.assertEqual(regex.sub('(?Px)|(?Py)', r'\g', 'xx'), '') self.assertEqual(regex.sub('(?Px)|(?Py)', r'\2', 'xx'), '') # The old behaviour was to raise it as an IndexError. self.assertRaisesRegex(regex.error, self.BAD_GROUP_NAME, lambda: regex.sub('(?Px)', r'\g<-1>', 'xx')) def test_re_subn(self): self.assertEqual(regex.subn("(?i)b+", "x", "bbbb BBBB"), ('x x', 2)) self.assertEqual(regex.subn("b+", "x", "bbbb BBBB"), ('x BBBB', 1)) self.assertEqual(regex.subn("b+", "x", "xyz"), ('xyz', 0)) self.assertEqual(regex.subn("b*", "x", "xyz"), ('xxxyxzx', 4)) self.assertEqual(regex.subn("b*", "x", "xyz", 2), ('xxxyz', 2)) def test_re_split(self): self.assertEqual(regex.split(":", ":a:b::c"), ['', 'a', 'b', '', 'c']) self.assertEqual(regex.split(":*", ":a:b::c"), ['', 'a', 'b', 'c']) self.assertEqual(regex.split("(:*)", ":a:b::c"), ['', ':', 'a', ':', 'b', '::', 'c']) self.assertEqual(regex.split("(?::*)", ":a:b::c"), ['', 'a', 'b', 'c']) self.assertEqual(regex.split("(:)*", ":a:b::c"), ['', ':', 'a', ':', 'b', ':', 'c']) self.assertEqual(regex.split("([b:]+)", ":a:b::c"), ['', ':', 'a', ':b::', 'c']) self.assertEqual(regex.split("(b)|(:+)", ":a:b::c"), ['', None, ':', 'a', None, ':', '', 'b', None, '', None, '::', 'c']) self.assertEqual(regex.split("(?:b)|(?::+)", ":a:b::c"), ['', 'a', '', '', 'c']) self.assertEqual(regex.split("x", "xaxbxc"), ['', 'a', 'b', 'c']) self.assertEqual([m for m in regex.splititer("x", "xaxbxc")], ['', 'a', 'b', 'c']) self.assertEqual(regex.split("(?r)x", "xaxbxc"), ['c', 'b', 'a', '']) self.assertEqual([m for m in regex.splititer("(?r)x", "xaxbxc")], ['c', 'b', 'a', '']) self.assertEqual(regex.split("(x)|(y)", "xaxbxc"), ['', 'x', None, 'a', 'x', None, 'b', 'x', None, 'c']) self.assertEqual([m for m in regex.splititer("(x)|(y)", "xaxbxc")], ['', 'x', None, 'a', 'x', None, 'b', 'x', None, 'c']) self.assertEqual(regex.split("(?r)(x)|(y)", "xaxbxc"), ['c', 'x', None, 'b', 'x', None, 'a', 'x', None, '']) self.assertEqual([m for m in 
regex.splititer("(?r)(x)|(y)", "xaxbxc")], ['c', 'x', None, 'b', 'x', None, 'a', 'x', None, '']) self.assertEqual(regex.split(r"(?V1)\b", "a b c"), ['', 'a', ' ', 'b', ' ', 'c', '']) self.assertEqual(regex.split(r"(?V1)\m", "a b c"), ['', 'a ', 'b ', 'c']) self.assertEqual(regex.split(r"(?V1)\M", "a b c"), ['a', ' b', ' c', '']) def test_qualified_re_split(self): self.assertEqual(regex.split(":", ":a:b::c", 2), ['', 'a', 'b::c']) self.assertEqual(regex.split(':', 'a:b:c:d', 2), ['a', 'b', 'c:d']) self.assertEqual(regex.split("(:)", ":a:b::c", 2), ['', ':', 'a', ':', 'b::c']) self.assertEqual(regex.split("(:*)", ":a:b::c", 2), ['', ':', 'a', ':', 'b::c']) def test_re_findall(self): self.assertEqual(regex.findall(":+", "abc"), []) self.assertEqual(regex.findall(":+", "a:b::c:::d"), [':', '::', ':::']) self.assertEqual(regex.findall("(:+)", "a:b::c:::d"), [':', '::', ':::']) self.assertEqual(regex.findall("(:)(:*)", "a:b::c:::d"), [(':', ''), (':', ':'), (':', '::')]) self.assertEqual(regex.findall(r"\((?P.{0,5}?TEST)\)", "(MY TEST)"), ["MY TEST"]) self.assertEqual(regex.findall(r"\((?P.{0,3}?TEST)\)", "(MY TEST)"), ["MY TEST"]) self.assertEqual(regex.findall(r"\((?P.{0,3}?T)\)", "(MY T)"), ["MY T"]) self.assertEqual(regex.findall(r"[^a]{2}[A-Z]", "\n S"), [' S']) self.assertEqual(regex.findall(r"[^a]{2,3}[A-Z]", "\n S"), ['\n S']) self.assertEqual(regex.findall(r"[^a]{2,3}[A-Z]", "\n S"), [' S']) self.assertEqual(regex.findall(r"X(Y[^Y]+?){1,2}( |Q)+DEF", "XYABCYPPQ\nQ DEF"), [('YPPQ\n', ' ')]) self.assertEqual(regex.findall(r"(\nTest(\n+.+?){0,2}?)?\n+End", "\nTest\nxyz\nxyz\nEnd"), [('\nTest\nxyz\nxyz', '\nxyz')]) def test_bug_117612(self): self.assertEqual(regex.findall(r"(a|(b))", "aba"), [('a', ''), ('b', 'b'), ('a', '')]) def test_re_match(self): self.assertEqual(regex.match('a', 'a')[:], ('a',)) self.assertEqual(regex.match('(a)', 'a')[:], ('a', 'a')) self.assertEqual(regex.match(r'(a)', 'a')[0], 'a') self.assertEqual(regex.match(r'(a)', 'a')[1], 'a') 
self.assertEqual(regex.match(r'(a)', 'a').group(1, 1), ('a', 'a')) pat = regex.compile('((a)|(b))(c)?') self.assertEqual(pat.match('a')[:], ('a', 'a', 'a', None, None)) self.assertEqual(pat.match('b')[:], ('b', 'b', None, 'b', None)) self.assertEqual(pat.match('ac')[:], ('ac', 'a', 'a', None, 'c')) self.assertEqual(pat.match('bc')[:], ('bc', 'b', None, 'b', 'c')) self.assertEqual(pat.match('bc')[:], ('bc', 'b', None, 'b', 'c')) # A single group. m = regex.match('(a)', 'a') self.assertEqual(m.group(), 'a') self.assertEqual(m.group(0), 'a') self.assertEqual(m.group(1), 'a') self.assertEqual(m.group(1, 1), ('a', 'a')) pat = regex.compile('(?:(?Pa)|(?Pb))(?Pc)?') self.assertEqual(pat.match('a').group(1, 2, 3), ('a', None, None)) self.assertEqual(pat.match('b').group('a1', 'b2', 'c3'), (None, 'b', None)) self.assertEqual(pat.match('ac').group(1, 'b2', 3), ('a', None, 'c')) def test_re_groupref_exists(self): self.assertEqual(regex.match(r'^(\()?([^()]+)(?(1)\))$', '(a)')[:], ('(a)', '(', 'a')) self.assertEqual(regex.match(r'^(\()?([^()]+)(?(1)\))$', 'a')[:], ('a', None, 'a')) self.assertEqual(regex.match(r'^(\()?([^()]+)(?(1)\))$', 'a)'), None) self.assertEqual(regex.match(r'^(\()?([^()]+)(?(1)\))$', '(a'), None) self.assertEqual(regex.match('^(?:(a)|c)((?(1)b|d))$', 'ab')[:], ('ab', 'a', 'b')) self.assertEqual(regex.match('^(?:(a)|c)((?(1)b|d))$', 'cd')[:], ('cd', None, 'd')) self.assertEqual(regex.match('^(?:(a)|c)((?(1)|d))$', 'cd')[:], ('cd', None, 'd')) self.assertEqual(regex.match('^(?:(a)|c)((?(1)|d))$', 'a')[:], ('a', 'a', '')) # Tests for bug #1177831: exercise groups other than the first group. 
p = regex.compile('(?Pa)(?Pb)?((?(g2)c|d))') self.assertEqual(p.match('abc')[:], ('abc', 'a', 'b', 'c')) self.assertEqual(p.match('ad')[:], ('ad', 'a', None, 'd')) self.assertEqual(p.match('abd'), None) self.assertEqual(p.match('ac'), None) def test_re_groupref(self): self.assertEqual(regex.match(r'^(\|)?([^()]+)\1$', '|a|')[:], ('|a|', '|', 'a')) self.assertEqual(regex.match(r'^(\|)?([^()]+)\1?$', 'a')[:], ('a', None, 'a')) self.assertEqual(regex.match(r'^(\|)?([^()]+)\1$', 'a|'), None) self.assertEqual(regex.match(r'^(\|)?([^()]+)\1$', '|a'), None) self.assertEqual(regex.match(r'^(?:(a)|c)(\1)$', 'aa')[:], ('aa', 'a', 'a')) self.assertEqual(regex.match(r'^(?:(a)|c)(\1)?$', 'c')[:], ('c', None, None)) self.assertEqual(regex.findall("(?i)(.{1,40}?),(.{1,40}?)(?:;)+(.{1,80}).{1,40}?\\3(\ |;)+(.{1,80}?)\\1", "TEST, BEST; LEST ; Lest 123 Test, Best"), [('TEST', ' BEST', ' LEST', ' ', '123 ')]) def test_groupdict(self): self.assertEqual(regex.match('(?Pfirst) (?Psecond)', 'first second').groupdict(), {'first': 'first', 'second': 'second'}) def test_expand(self): self.assertEqual(regex.match("(?Pfirst) (?Psecond)", "first second").expand(r"\2 \1 \g \g"), 'second first second first') def test_repeat_minmax(self): self.assertEqual(regex.match(r"^(\w){1}$", "abc"), None) self.assertEqual(regex.match(r"^(\w){1}?$", "abc"), None) self.assertEqual(regex.match(r"^(\w){1,2}$", "abc"), None) self.assertEqual(regex.match(r"^(\w){1,2}?$", "abc"), None) self.assertEqual(regex.match(r"^(\w){3}$", "abc")[1], 'c') self.assertEqual(regex.match(r"^(\w){1,3}$", "abc")[1], 'c') self.assertEqual(regex.match(r"^(\w){1,4}$", "abc")[1], 'c') self.assertEqual(regex.match(r"^(\w){3,4}?$", "abc")[1], 'c') self.assertEqual(regex.match(r"^(\w){3}?$", "abc")[1], 'c') self.assertEqual(regex.match(r"^(\w){1,3}?$", "abc")[1], 'c') self.assertEqual(regex.match(r"^(\w){1,4}?$", "abc")[1], 'c') self.assertEqual(regex.match(r"^(\w){3,4}?$", "abc")[1], 'c') self.assertEqual(regex.match("^x{1}$", "xxx"), 
None) self.assertEqual(regex.match("^x{1}?$", "xxx"), None) self.assertEqual(regex.match("^x{1,2}$", "xxx"), None) self.assertEqual(regex.match("^x{1,2}?$", "xxx"), None) self.assertEqual(regex.match("^x{1}", "xxx")[0], 'x') self.assertEqual(regex.match("^x{1}?", "xxx")[0], 'x') self.assertEqual(regex.match("^x{0,1}", "xxx")[0], 'x') self.assertEqual(regex.match("^x{0,1}?", "xxx")[0], '') self.assertEqual(bool(regex.match("^x{3}$", "xxx")), True) self.assertEqual(bool(regex.match("^x{1,3}$", "xxx")), True) self.assertEqual(bool(regex.match("^x{1,4}$", "xxx")), True) self.assertEqual(bool(regex.match("^x{3,4}?$", "xxx")), True) self.assertEqual(bool(regex.match("^x{3}?$", "xxx")), True) self.assertEqual(bool(regex.match("^x{1,3}?$", "xxx")), True) self.assertEqual(bool(regex.match("^x{1,4}?$", "xxx")), True) self.assertEqual(bool(regex.match("^x{3,4}?$", "xxx")), True) self.assertEqual(regex.match("^x{}$", "xxx"), None) self.assertEqual(bool(regex.match("^x{}$", "x{}")), True) def test_getattr(self): self.assertEqual(regex.compile("(?i)(a)(b)").pattern, '(?i)(a)(b)') self.assertEqual(regex.compile("(?i)(a)(b)").flags, regex.A | regex.I | regex.DEFAULT_VERSION) self.assertEqual(regex.compile(u"(?i)(a)(b)").flags, regex.I | regex.U | regex.DEFAULT_VERSION) self.assertEqual(regex.compile("(?i)(a)(b)").groups, 2) self.assertEqual(regex.compile("(?i)(a)(b)").groupindex, {}) self.assertEqual(regex.compile("(?i)(?P<first>a)(?P<other>b)").groupindex, {'first': 1, 'other': 2}) self.assertEqual(regex.match("(a)", "a").pos, 0) self.assertEqual(regex.match("(a)", "a").endpos, 1) self.assertEqual(regex.search("b(c)", "abcdef").pos, 0) self.assertEqual(regex.search("b(c)", "abcdef").endpos, 6) self.assertEqual(regex.search("b(c)", "abcdef").span(), (1, 3)) self.assertEqual(regex.search("b(c)", "abcdef").span(1), (2, 3)) self.assertEqual(regex.match("(a)", "a").string, 'a') self.assertEqual(regex.match("(a)", "a").regs, ((0, 1), (0, 1))) self.assertEqual(repr(type(regex.match("(a)", "a").re)),
self.PATTERN_CLASS) # Issue 14260. p = regex.compile(r'abc(?P<n>def)') p.groupindex["n"] = 0 self.assertEqual(p.groupindex["n"], 1) def test_special_escapes(self): self.assertEqual(regex.search(r"\b(b.)\b", "abcd abc bcd bx")[1], 'bx') self.assertEqual(regex.search(r"\B(b.)\B", "abc bcd bc abxd")[1], 'bx') self.assertEqual(regex.search(r"\b(b.)\b", "abcd abc bcd bx", regex.LOCALE)[1], 'bx') self.assertEqual(regex.search(r"\B(b.)\B", "abc bcd bc abxd", regex.LOCALE)[1], 'bx') self.assertEqual(regex.search(ur"\b(b.)\b", u"abcd abc bcd bx", regex.UNICODE)[1], u'bx') self.assertEqual(regex.search(ur"\B(b.)\B", u"abc bcd bc abxd", regex.UNICODE)[1], u'bx') self.assertEqual(regex.search(r"^abc$", "\nabc\n", regex.M)[0], 'abc') self.assertEqual(regex.search(r"^\Aabc\Z$", "abc", regex.M)[0], 'abc') self.assertEqual(regex.search(r"^\Aabc\Z$", "\nabc\n", regex.M), None) self.assertEqual(regex.search(ur"\b(b.)\b", u"abcd abc bcd bx")[1], u'bx') self.assertEqual(regex.search(ur"\B(b.)\B", u"abc bcd bc abxd")[1], u'bx') self.assertEqual(regex.search(ur"^abc$", u"\nabc\n", regex.M)[0], u'abc') self.assertEqual(regex.search(ur"^\Aabc\Z$", u"abc", regex.M)[0], u'abc') self.assertEqual(regex.search(ur"^\Aabc\Z$", u"\nabc\n", regex.M), None) self.assertEqual(regex.search(r"\d\D\w\W\s\S", "1aa! a")[0], '1aa! a') self.assertEqual(regex.search(r"\d\D\w\W\s\S", "1aa! a", regex.LOCALE)[0], '1aa! a') self.assertEqual(regex.search(ur"\d\D\w\W\s\S", u"1aa! a", regex.UNICODE)[0], u'1aa! 
a') def test_bigcharset(self): self.assertEqual(regex.match(ur"(?u)([\u2222\u2223])", u"\u2222")[1], u'\u2222') self.assertEqual(regex.match(ur"(?u)([\u2222\u2223])", u"\u2222", regex.UNICODE)[1], u'\u2222') self.assertEqual(u"".join(regex.findall(u".", u"e\xe8\xe9\xea\xeb\u0113\u011b\u0117", flags=regex.UNICODE)), u'e\xe8\xe9\xea\xeb\u0113\u011b\u0117') self.assertEqual(u"".join(regex.findall(ur"[e\xe8\xe9\xea\xeb\u0113\u011b\u0117]", u"e\xe8\xe9\xea\xeb\u0113\u011b\u0117", flags=regex.UNICODE)), u'e\xe8\xe9\xea\xeb\u0113\u011b\u0117') self.assertEqual(u"".join(regex.findall(ur"e|\xe8|\xe9|\xea|\xeb|\u0113|\u011b|\u0117", u"e\xe8\xe9\xea\xeb\u0113\u011b\u0117", flags=regex.UNICODE)), u'e\xe8\xe9\xea\xeb\u0113\u011b\u0117') def test_anyall(self): self.assertEqual(regex.match("a.b", "a\nb", regex.DOTALL)[0], "a\nb") self.assertEqual(regex.match("a.*b", "a\n\nb", regex.DOTALL)[0], "a\n\nb") def test_non_consuming(self): self.assertEqual(regex.match(r"(a(?=\s[^a]))", "a b")[1], 'a') self.assertEqual(regex.match(r"(a(?=\s[^a]*))", "a b")[1], 'a') self.assertEqual(regex.match(r"(a(?=\s[abc]))", "a b")[1], 'a') self.assertEqual(regex.match(r"(a(?=\s[abc]*))", "a bc")[1], 'a') self.assertEqual(regex.match(r"(a)(?=\s\1)", "a a")[1], 'a') self.assertEqual(regex.match(r"(a)(?=\s\1*)", "a aa")[1], 'a') self.assertEqual(regex.match(r"(a)(?=\s(abc|a))", "a a")[1], 'a') self.assertEqual(regex.match(r"(a(?!\s[^a]))", "a a")[1], 'a') self.assertEqual(regex.match(r"(a(?!\s[abc]))", "a d")[1], 'a') self.assertEqual(regex.match(r"(a)(?!\s\1)", "a b")[1], 'a') self.assertEqual(regex.match(r"(a)(?!\s(abc|a))", "a b")[1], 'a') def test_ignore_case(self): self.assertEqual(regex.match("abc", "ABC", regex.I)[0], 'ABC') self.assertEqual(regex.match(u"abc", u"ABC", regex.I)[0], u'ABC') self.assertEqual(regex.match(r"(a\s[^a]*)", "a bb", regex.I)[1], 'a bb') self.assertEqual(regex.match(r"(a\s[abc])", "a b", regex.I)[1], 'a b') self.assertEqual(regex.match(r"(a\s[abc]*)", "a bb", regex.I)[1], 
'a bb') self.assertEqual(regex.match(r"((a)\s\2)", "a a", regex.I)[1], 'a a') self.assertEqual(regex.match(r"((a)\s\2*)", "a aa", regex.I)[1], 'a aa') self.assertEqual(regex.match(r"((a)\s(abc|a))", "a a", regex.I)[1], 'a a') self.assertEqual(regex.match(r"((a)\s(abc|a)*)", "a aa", regex.I)[1], 'a aa') # Issue 3511. self.assertEqual(regex.match(r"[Z-a]", "_").span(), (0, 1)) self.assertEqual(regex.match(r"(?i)[Z-a]", "_").span(), (0, 1)) self.assertEqual(bool(regex.match(ur"(?iu)nao", u"nAo")), True) self.assertEqual(bool(regex.match(ur"(?iu)n\xE3o", u"n\xC3o")), True) self.assertEqual(bool(regex.match(ur"(?iu)n\xE3o", u"N\xC3O")), True) self.assertEqual(bool(regex.match(ur"(?iu)s", u"\u017F")), True) def test_case_folding(self): self.assertEqual(regex.search(ur"(?fiu)ss", u"SS").span(), (0, 2)) self.assertEqual(regex.search(ur"(?fiu)SS", u"ss").span(), (0, 2)) self.assertEqual(regex.search(ur"(?fiu)SS", u"\N{LATIN SMALL LETTER SHARP S}").span(), (0, 1)) self.assertEqual(regex.search(ur"(?fi)\N{LATIN SMALL LETTER SHARP S}", u"SS").span(), (0, 2)) self.assertEqual(regex.search(ur"(?fiu)\N{LATIN SMALL LIGATURE ST}", u"ST").span(), (0, 2)) self.assertEqual(regex.search(ur"(?fiu)ST", u"\N{LATIN SMALL LIGATURE ST}").span(), (0, 1)) self.assertEqual(regex.search(ur"(?fiu)ST", u"\N{LATIN SMALL LIGATURE LONG S T}").span(), (0, 1)) self.assertEqual(regex.search(ur"(?fiu)SST", u"\N{LATIN SMALL LETTER SHARP S}t").span(), (0, 2)) self.assertEqual(regex.search(ur"(?fiu)SST", u"s\N{LATIN SMALL LIGATURE LONG S T}").span(), (0, 2)) self.assertEqual(regex.search(ur"(?fiu)SST", u"s\N{LATIN SMALL LIGATURE ST}").span(), (0, 2)) self.assertEqual(regex.search(ur"(?fiu)\N{LATIN SMALL LIGATURE ST}", u"SST").span(), (1, 3)) self.assertEqual(regex.search(ur"(?fiu)SST", u"s\N{LATIN SMALL LIGATURE ST}").span(), (0, 2)) self.assertEqual(regex.search(ur"(?fiu)FFI", u"\N{LATIN SMALL LIGATURE FFI}").span(), (0, 1)) self.assertEqual(regex.search(ur"(?fiu)FFI", u"\N{LATIN SMALL LIGATURE 
FF}i").span(), (0, 2)) self.assertEqual(regex.search(ur"(?fiu)FFI", u"f\N{LATIN SMALL LIGATURE FI}").span(), (0, 2)) self.assertEqual(regex.search(ur"(?fiu)\N{LATIN SMALL LIGATURE FFI}", u"FFI").span(), (0, 3)) self.assertEqual(regex.search(ur"(?fiu)\N{LATIN SMALL LIGATURE FF}i", u"FFI").span(), (0, 3)) self.assertEqual(regex.search(ur"(?fiu)f\N{LATIN SMALL LIGATURE FI}", u"FFI").span(), (0, 3)) sigma = u"\u03A3\u03C3\u03C2" for ch1 in sigma: for ch2 in sigma: if not regex.match(ur"(?fiu)" + ch1, ch2): self.fail() self.assertEqual(bool(regex.search(ur"(?iuV1)ff", u"\uFB00\uFB01")), True) self.assertEqual(bool(regex.search(ur"(?iuV1)ff", u"\uFB01\uFB00")), True) self.assertEqual(bool(regex.search(ur"(?iuV1)fi", u"\uFB00\uFB01")), True) self.assertEqual(bool(regex.search(ur"(?iuV1)fi", u"\uFB01\uFB00")), True) self.assertEqual(bool(regex.search(ur"(?iuV1)fffi", u"\uFB00\uFB01")), True) self.assertEqual(bool(regex.search(ur"(?iuV1)f\uFB03", u"\uFB00\uFB01")), True) self.assertEqual(bool(regex.search(ur"(?iuV1)ff", u"\uFB00\uFB01")), True) self.assertEqual(bool(regex.search(ur"(?iuV1)fi", u"\uFB00\uFB01")), True) self.assertEqual(bool(regex.search(ur"(?iuV1)fffi", u"\uFB00\uFB01")), True) self.assertEqual(bool(regex.search(ur"(?iuV1)f\uFB03", u"\uFB00\uFB01")), True) self.assertEqual(bool(regex.search(ur"(?iuV1)f\uFB01", u"\uFB00i")), True) self.assertEqual(bool(regex.search(ur"(?iuV1)f\uFB01", u"\uFB00i")), True) self.assertEqual(regex.findall(ur"(?iuV0)\m(?:word){e<=3}\M(?ne", u"affine", options=[u"\N{LATIN SMALL LIGATURE FFI}"]).span(), (0, 6)) self.assertEqual(regex.search(ur"(?fi)a\Lne", u"a\N{LATIN SMALL LIGATURE FFI}ne", options=[u"ffi"]).span(), (0, 4)) def test_category(self): self.assertEqual(regex.match(r"(\s)", " ")[1], ' ') def test_not_literal(self): self.assertEqual(regex.search(r"\s([^a])", " b")[1], 'b') self.assertEqual(regex.search(r"\s([^a]*)", " bb")[1], 'bb') def test_search_coverage(self): self.assertEqual(regex.search(r"\s(b)", " b")[1], 'b') 
self.assertEqual(regex.search(r"a\s", "a ")[0], 'a ') def test_re_escape(self): p = "" self.assertEqual(regex.escape(p), p) for i in range(0, 256): p += chr(i) self.assertEqual(bool(regex.match(regex.escape(chr(i)), chr(i))), True) self.assertEqual(regex.match(regex.escape(chr(i)), chr(i)).span(), (0, 1)) pat = regex.compile(regex.escape(p)) self.assertEqual(pat.match(p).span(), (0, 256)) def test_constants(self): if regex.I != regex.IGNORECASE: self.fail() if regex.L != regex.LOCALE: self.fail() if regex.M != regex.MULTILINE: self.fail() if regex.S != regex.DOTALL: self.fail() if regex.X != regex.VERBOSE: self.fail() def test_flags(self): for flag in [regex.I, regex.M, regex.X, regex.S, regex.L]: self.assertEqual(repr(type(regex.compile('^pattern$', flag))), self.PATTERN_CLASS) def test_sre_character_literals(self): for i in [0, 8, 16, 32, 64, 127, 128, 255]: self.assertEqual(bool(regex.match(r"\%03o" % i, chr(i))), True) self.assertEqual(bool(regex.match(r"\%03o0" % i, chr(i) + "0")), True) self.assertEqual(bool(regex.match(r"\%03o8" % i, chr(i) + "8")), True) self.assertEqual(bool(regex.match(r"\x%02x" % i, chr(i))), True) self.assertEqual(bool(regex.match(r"\x%02x0" % i, chr(i) + "0")), True) self.assertEqual(bool(regex.match(r"\x%02xz" % i, chr(i) + "z")), True) self.assertRaisesRegex(regex.error, self.INVALID_GROUP_REF, lambda: regex.match(r"\911", "")) def test_sre_character_class_literals(self): for i in [0, 8, 16, 32, 64, 127, 128, 255]: self.assertEqual(bool(regex.match(r"[\%03o]" % i, chr(i))), True) self.assertEqual(bool(regex.match(r"[\%03o0]" % i, chr(i))), True) self.assertEqual(bool(regex.match(r"[\%03o8]" % i, chr(i))), True) self.assertEqual(bool(regex.match(r"[\x%02x]" % i, chr(i))), True) self.assertEqual(bool(regex.match(r"[\x%02x0]" % i, chr(i))), True) self.assertEqual(bool(regex.match(r"[\x%02xz]" % i, chr(i))), True) self.assertRaisesRegex(regex.error, self.BAD_OCTAL_ESCAPE, lambda: regex.match(r"[\911]", "")) def test_bug_113254(self): 
self.assertEqual(regex.match(r'(a)|(b)', 'b').start(1), -1) self.assertEqual(regex.match(r'(a)|(b)', 'b').end(1), -1) self.assertEqual(regex.match(r'(a)|(b)', 'b').span(1), (-1, -1)) def test_bug_527371(self): # Bug described in patches 527371/672491. self.assertEqual(regex.match(r'(a)?a','a').lastindex, None) self.assertEqual(regex.match(r'(a)(b)?b','ab').lastindex, 1) self.assertEqual(regex.match(r'(?P<a>a)(?P<b>b)?b','ab').lastgroup, 'a') self.assertEqual(regex.match("(?P<a>a(b))", "ab").lastgroup, 'a') self.assertEqual(regex.match("((a))", "a").lastindex, 1) def test_bug_545855(self): # Bug 545855 -- This pattern failed to cause a compile error as it # should, instead provoking a TypeError. self.assertRaisesRegex(regex.error, self.BAD_SET, lambda: regex.compile('foo[a-')) def test_bug_418626(self): # Bugs 418626 et al. -- Testing Greg Chapman's addition of op code # SRE_OP_MIN_REPEAT_ONE for eliminating recursion on simple uses of # pattern '*?' on a long string. self.assertEqual(regex.match('.*?c', 10000 * 'ab' + 'cd').end(0), 20001) self.assertEqual(regex.match('.*?cd', 5000 * 'ab' + 'c' + 5000 * 'ab' + 'cde').end(0), 20003) self.assertEqual(regex.match('.*?cd', 20000 * 'abc' + 'de').end(0), 60001) # Non-simple '*?' still used to hit the recursion limit, before the # non-recursive scheme was implemented. self.assertEqual(regex.search('(a|b)*?c', 10000 * 'ab' + 'cd').end(0), 20001) def test_bug_612074(self): pat = u"[" + regex.escape(u"\u2039") + u"]" self.assertEqual(regex.compile(pat) and 1, 1) def test_stack_overflow(self): # Nasty cases that used to overflow the straightforward recursive # implementation of repeated groups.
self.assertEqual(regex.match('(x)*', 50000 * 'x')[1], 'x') self.assertEqual(regex.match('(x)*y', 50000 * 'x' + 'y')[1], 'x') self.assertEqual(regex.match('(x)*?y', 50000 * 'x' + 'y')[1], 'x') def test_scanner(self): def s_ident(scanner, token): return token def s_operator(scanner, token): return "op%s" % token def s_float(scanner, token): return float(token) def s_int(scanner, token): return int(token) scanner = regex.Scanner([(r"[a-zA-Z_]\w*", s_ident), (r"\d+\.\d*", s_float), (r"\d+", s_int), (r"=|\+|-|\*|/", s_operator), (r"\s+", None), ]) self.assertEqual(repr(type(scanner.scanner.scanner("").pattern)), self.PATTERN_CLASS) self.assertEqual(scanner.scan("sum = 3*foo + 312.50 + bar"), (['sum', 'op=', 3, 'op*', 'foo', 'op+', 312.5, 'op+', 'bar'], '')) def test_bug_448951(self): # Bug 448951 (similar to 429357, but with single char match). # (Also test greedy matches.) for op in '', '?', '*': self.assertEqual(regex.match(r'((.%s):)?z' % op, 'z')[:], ('z', None, None)) self.assertEqual(regex.match(r'((.%s):)?z' % op, 'a:z')[:], ('a:z', 'a:', 'a')) def test_bug_725106(self): # Capturing groups in alternatives in repeats. self.assertEqual(regex.match('^((a)|b)*', 'abc')[:], ('ab', 'b', 'a')) self.assertEqual(regex.match('^(([ab])|c)*', 'abc')[:], ('abc', 'c', 'b')) self.assertEqual(regex.match('^((d)|[ab])*', 'abc')[:], ('ab', 'b', None)) self.assertEqual(regex.match('^((a)c|[ab])*', 'abc')[:], ('ab', 'b', None)) self.assertEqual(regex.match('^((a)|b)*?c', 'abc')[:], ('abc', 'b', 'a')) self.assertEqual(regex.match('^(([ab])|c)*?d', 'abcd')[:], ('abcd', 'c', 'b')) self.assertEqual(regex.match('^((d)|[ab])*?c', 'abc')[:], ('abc', 'b', None)) self.assertEqual(regex.match('^((a)c|[ab])*?c', 'abc')[:], ('abc', 'b', None)) def test_bug_725149(self): # Mark_stack_base restoring before restoring marks. 
self.assertEqual(regex.match('(a)(?:(?=(b)*)c)*', 'abb')[:], ('a', 'a', None)) self.assertEqual(regex.match('(a)((?!(b)*))*', 'abb')[:], ('a', 'a', None, None)) def test_bug_764548(self): # Bug 764548, regex.compile() barfs on str/unicode subclasses. class my_unicode(str): pass pat = regex.compile(my_unicode("abc")) self.assertEqual(pat.match("xyz"), None) def test_finditer(self): it = regex.finditer(r":+", "a:b::c:::d") self.assertEqual([item[0] for item in it], [':', '::', ':::']) def test_bug_926075(self): if regex.compile('bug_926075') is regex.compile(u'bug_926075'): self.fail() def test_bug_931848(self): pattern = u"[\u002E\u3002\uFF0E\uFF61]" self.assertEqual(regex.compile(pattern).split("a.b.c"), ['a', 'b', 'c']) def test_bug_581080(self): it = regex.finditer(r"\s", "a b") self.assertEqual(it.next().span(), (1, 2)) self.assertRaises(StopIteration, lambda: it.next()) scanner = regex.compile(r"\s").scanner("a b") self.assertEqual(scanner.search().span(), (1, 2)) self.assertEqual(scanner.search(), None) def test_bug_817234(self): it = regex.finditer(r".*", "asdf") self.assertEqual(it.next().span(), (0, 4)) self.assertEqual(it.next().span(), (4, 4)) self.assertRaises(StopIteration, lambda: it.next()) def test_empty_array(self): # SF bug 1647541. import array for typecode in 'cbBuhHiIlLfd': a = array.array(typecode) self.assertEqual(regex.compile("bla").match(a), None) self.assertEqual(regex.compile("").match(a)[1 : ], ()) def test_inline_flags(self): # Bug #1700.
upper_char = unichr(0x1ea0) # Latin Capital Letter A with Dot Below lower_char = unichr(0x1ea1) # Latin Small Letter A with Dot Below p = regex.compile(upper_char, regex.I | regex.U) self.assertEqual(bool(p.match(lower_char)), True) p = regex.compile(lower_char, regex.I | regex.U) self.assertEqual(bool(p.match(upper_char)), True) p = regex.compile('(?i)' + upper_char, regex.U) self.assertEqual(bool(p.match(lower_char)), True) p = regex.compile('(?i)' + lower_char, regex.U) self.assertEqual(bool(p.match(upper_char)), True) p = regex.compile('(?iu)' + upper_char) self.assertEqual(bool(p.match(lower_char)), True) p = regex.compile('(?iu)' + lower_char) self.assertEqual(bool(p.match(upper_char)), True) self.assertEqual(bool(regex.match(r"(?i)a", "A")), True) self.assertEqual(bool(regex.match(r"a(?i)", "A")), True) self.assertEqual(bool(regex.match(r"(?iV1)a", "A")), True) self.assertEqual(regex.match(r"a(?iV1)", "A"), None) def test_dollar_matches_twice(self): # $ matches the end of string, and just before the terminating \n. pattern = regex.compile('$') self.assertEqual(pattern.sub('#', 'a\nb\n'), 'a\nb#\n#') self.assertEqual(pattern.sub('#', 'a\nb\nc'), 'a\nb\nc#') self.assertEqual(pattern.sub('#', '\n'), '#\n#') pattern = regex.compile('$', regex.MULTILINE) self.assertEqual(pattern.sub('#', 'a\nb\n' ), 'a#\nb#\n#') self.assertEqual(pattern.sub('#', 'a\nb\nc'), 'a#\nb#\nc#') self.assertEqual(pattern.sub('#', '\n'), '#\n#') def test_ascii_and_unicode_flag(self): # Unicode patterns. 
for flags in (0, regex.UNICODE): pat = regex.compile(u'\xc0', flags | regex.IGNORECASE) self.assertEqual(bool(pat.match(u'\xe0')), True) pat = regex.compile(u'\w', flags) self.assertEqual(bool(pat.match(u'\xe0')), True) pat = regex.compile(u'\xc0', regex.ASCII | regex.IGNORECASE) self.assertEqual(pat.match(u'\xe0'), None) pat = regex.compile(u'(?a)\xc0', regex.IGNORECASE) self.assertEqual(pat.match(u'\xe0'), None) pat = regex.compile(u'\w', regex.ASCII) self.assertEqual(pat.match(u'\xe0'), None) pat = regex.compile(u'(?a)\w') self.assertEqual(pat.match(u'\xe0'), None) # String patterns. for flags in (0, regex.ASCII): pat = regex.compile('\xc0', flags | regex.IGNORECASE) self.assertEqual(pat.match('\xe0'), None) pat = regex.compile('\w') self.assertEqual(pat.match('\xe0'), None) self.assertRaisesRegex(ValueError, self.MIXED_FLAGS, lambda: regex.compile('(?au)\w')) def test_subscripting_match(self): m = regex.match(r'(?<a>\w)', 'xy') if not m: self.fail("Failed: expected match but returned None") elif not m or m[0] != m.group(0) or m[1] != m.group(1): self.fail("Failed") if not m: self.fail("Failed: expected match but returned None") elif m[:] != ('x', 'x'): self.fail("Failed: expected \"('x', 'x')\" but got %s instead" % repr(m[:])) def test_new_named_groups(self): m0 = regex.match(r'(?P<a>\w)', 'x') m1 = regex.match(r'(?<a>\w)', 'x') if not (m0 and m1 and m0[:] == m1[:]): self.fail("Failed") def test_properties(self): self.assertEqual(regex.match('(?i)\xC0', '\xE0'), None) self.assertEqual(regex.match(r'(?i)\xC0', '\xE0'), None) self.assertEqual(regex.match(r'\w', '\xE0'), None) self.assertEqual(bool(regex.match(ur'(?u)\w', u'\xE0')), True) # Dropped the following test. It's not possible to determine what the # correct result should be in the general case.
# self.assertEqual(bool(regex.match(r'(?L)\w', '\xE0')), # '\xE0'.isalnum()) self.assertEqual(bool(regex.match(r'(?L)\d', '0')), True) self.assertEqual(bool(regex.match(r'(?L)\s', ' ')), True) self.assertEqual(bool(regex.match(r'(?L)\w', 'a')), True) self.assertEqual(regex.match(r'(?L)\d', '?'), None) self.assertEqual(regex.match(r'(?L)\s', '?'), None) self.assertEqual(regex.match(r'(?L)\w', '?'), None) self.assertEqual(regex.match(r'(?L)\D', '0'), None) self.assertEqual(regex.match(r'(?L)\S', ' '), None) self.assertEqual(regex.match(r'(?L)\W', 'a'), None) self.assertEqual(bool(regex.match(r'(?L)\D', '?')), True) self.assertEqual(bool(regex.match(r'(?L)\S', '?')), True) self.assertEqual(bool(regex.match(r'(?L)\W', '?')), True) self.assertEqual(bool(regex.match(ur'(?u)\p{Cyrillic}', u'\N{CYRILLIC CAPITAL LETTER A}')), True) self.assertEqual(bool(regex.match(ur'(?u)(?iu)\p{Cyrillic}', u'\N{CYRILLIC CAPITAL LETTER A}')), True) self.assertEqual(bool(regex.match(ur'(?u)\p{IsCyrillic}', u'\N{CYRILLIC CAPITAL LETTER A}')), True) self.assertEqual(bool(regex.match(ur'(?u)\p{Script=Cyrillic}', u'\N{CYRILLIC CAPITAL LETTER A}')), True) self.assertEqual(bool(regex.match(ur'(?u)\p{InCyrillic}', u'\N{CYRILLIC CAPITAL LETTER A}')), True) self.assertEqual(bool(regex.match(ur'(?u)\p{Block=Cyrillic}', u'\N{CYRILLIC CAPITAL LETTER A}')), True) self.assertEqual(bool(regex.match(ur'(?u)[[:Cyrillic:]]', u'\N{CYRILLIC CAPITAL LETTER A}')), True) self.assertEqual(bool(regex.match(ur'(?u)[[:IsCyrillic:]]', u'\N{CYRILLIC CAPITAL LETTER A}')), True) self.assertEqual(bool(regex.match(ur'(?u)[[:Script=Cyrillic:]]', u'\N{CYRILLIC CAPITAL LETTER A}')), True) self.assertEqual(bool(regex.match(ur'(?u)[[:InCyrillic:]]', u'\N{CYRILLIC CAPITAL LETTER A}')), True) self.assertEqual(bool(regex.match(ur'(?u)[[:Block=Cyrillic:]]', u'\N{CYRILLIC CAPITAL LETTER A}')), True) self.assertEqual(bool(regex.match(ur'(?u)\P{Cyrillic}', u'\N{LATIN CAPITAL LETTER A}')), True) 
self.assertEqual(bool(regex.match(ur'(?u)\P{IsCyrillic}', u'\N{LATIN CAPITAL LETTER A}')), True) self.assertEqual(bool(regex.match(ur'(?u)\P{Script=Cyrillic}', u'\N{LATIN CAPITAL LETTER A}')), True) self.assertEqual(bool(regex.match(ur'(?u)\P{InCyrillic}', u'\N{LATIN CAPITAL LETTER A}')), True) self.assertEqual(bool(regex.match(ur'(?u)\P{Block=Cyrillic}', u'\N{LATIN CAPITAL LETTER A}')), True) self.assertEqual(bool(regex.match(ur'(?u)\p{^Cyrillic}', u'\N{LATIN CAPITAL LETTER A}')), True) self.assertEqual(bool(regex.match(ur'(?u)\p{^IsCyrillic}', u'\N{LATIN CAPITAL LETTER A}')), True) self.assertEqual(bool(regex.match(ur'(?u)\p{^Script=Cyrillic}', u'\N{LATIN CAPITAL LETTER A}')), True) self.assertEqual(bool(regex.match(ur'(?u)\p{^InCyrillic}', u'\N{LATIN CAPITAL LETTER A}')), True) self.assertEqual(bool(regex.match(ur'(?u)\p{^Block=Cyrillic}', u'\N{LATIN CAPITAL LETTER A}')), True) self.assertEqual(bool(regex.match(ur'(?u)[[:^Cyrillic:]]', u'\N{LATIN CAPITAL LETTER A}')), True) self.assertEqual(bool(regex.match(ur'(?u)[[:^IsCyrillic:]]', u'\N{LATIN CAPITAL LETTER A}')), True) self.assertEqual(bool(regex.match(ur'(?u)[[:^Script=Cyrillic:]]', u'\N{LATIN CAPITAL LETTER A}')), True) self.assertEqual(bool(regex.match(ur'(?u)[[:^InCyrillic:]]', u'\N{LATIN CAPITAL LETTER A}')), True) self.assertEqual(bool(regex.match(ur'(?u)[[:^Block=Cyrillic:]]', u'\N{LATIN CAPITAL LETTER A}')), True) self.assertEqual(bool(regex.match(ur'(?u)\d', u'0')), True) self.assertEqual(bool(regex.match(ur'(?u)\s', u' ')), True) self.assertEqual(bool(regex.match(ur'(?u)\w', u'A')), True) self.assertEqual(regex.match(ur"(?u)\d", u"?"), None) self.assertEqual(regex.match(ur"(?u)\s", u"?"), None) self.assertEqual(regex.match(ur"(?u)\w", u"?"), None) self.assertEqual(regex.match(ur"(?u)\D", u"0"), None) self.assertEqual(regex.match(ur"(?u)\S", u" "), None) self.assertEqual(regex.match(ur"(?u)\W", u"A"), None) self.assertEqual(bool(regex.match(ur'(?u)\D', u'?')), True) 
self.assertEqual(bool(regex.match(ur'(?u)\S', u'?')), True) self.assertEqual(bool(regex.match(ur'(?u)\W', u'?')), True) self.assertEqual(bool(regex.match(ur'(?u)\p{L}', u'A')), True) self.assertEqual(bool(regex.match(ur'(?u)\p{L}', u'a')), True) self.assertEqual(bool(regex.match(ur'(?u)\p{Lu}', u'A')), True) self.assertEqual(bool(regex.match(ur'(?u)\p{Ll}', u'a')), True) self.assertEqual(bool(regex.match(ur'(?u)(?i)a', u'a')), True) self.assertEqual(bool(regex.match(ur'(?u)(?i)a', u'A')), True) self.assertEqual(bool(regex.match(ur'(?u)\w', u'0')), True) self.assertEqual(bool(regex.match(ur'(?u)\w', u'a')), True) self.assertEqual(bool(regex.match(ur'(?u)\w', u'_')), True) self.assertEqual(regex.match(ur"(?u)\X", u"\xE0").span(), (0, 1)) self.assertEqual(regex.match(ur"(?u)\X", u"a\u0300").span(), (0, 2)) self.assertEqual(regex.findall(ur"(?u)\X", u"a\xE0a\u0300e\xE9e\u0301"), [u'a', u'\xe0', u'a\u0300', u'e', u'\xe9', u'e\u0301']) self.assertEqual(regex.findall(ur"(?u)\X{3}", u"a\xE0a\u0300e\xE9e\u0301"), [u'a\xe0a\u0300', u'e\xe9e\u0301']) self.assertEqual(regex.findall(ur"(?u)\X", u"\r\r\n\u0301A\u0301"), [u'\r', u'\r\n', u'\u0301', u'A\u0301']) self.assertEqual(bool(regex.match(ur'(?u)\p{Ll}', u'a')), True) chars_u = u"-09AZaz_\u0393\u03b3" chars_b = "-09AZaz_" word_set = set("Ll Lm Lo Lt Lu Mc Me Mn Nd Nl No Pc".split()) tests = [ (ur"(?u)\w", chars_u, u"09AZaz_\u0393\u03b3"), (ur"(?u)[[:word:]]", chars_u, u"09AZaz_\u0393\u03b3"), (ur"(?u)\W", chars_u, u"-"), (ur"(?u)[[:^word:]]", chars_u, u"-"), (ur"(?u)\d", chars_u, u"09"), (ur"(?u)[[:digit:]]", chars_u, u"09"), (ur"(?u)\D", chars_u, u"-AZaz_\u0393\u03b3"), (ur"(?u)[[:^digit:]]", chars_u, u"-AZaz_\u0393\u03b3"), (ur"(?u)[[:alpha:]]", chars_u, u"AZaz\u0393\u03b3"), (ur"(?u)[[:^alpha:]]", chars_u, u"-09_"), (ur"(?u)[[:alnum:]]", chars_u, u"09AZaz\u0393\u03b3"), (ur"(?u)[[:^alnum:]]", chars_u, u"-_"), (ur"(?u)[[:xdigit:]]", chars_u, u"09Aa"), (ur"(?u)[[:^xdigit:]]", chars_u, u"-Zz_\u0393\u03b3"), 
(ur"(?u)\p{InBasicLatin}", u"a\xE1", u"a"), (ur"(?u)\P{InBasicLatin}", u"a\xE1", u"\xE1"), (ur"(?iu)\p{InBasicLatin}", u"a\xE1", u"a"), (ur"(?iu)\P{InBasicLatin}", u"a\xE1", u"\xE1"), (r"(?L)\w", chars_b, "09AZaz_"), (r"(?L)[[:word:]]", chars_b, "09AZaz_"), (r"(?L)\W", chars_b, "-"), (r"(?L)[[:^word:]]", chars_b, "-"), (r"(?L)\d", chars_b, "09"), (r"(?L)[[:digit:]]", chars_b, "09"), (r"(?L)\D", chars_b, "-AZaz_"), (r"(?L)[[:^digit:]]", chars_b, "-AZaz_"), (r"(?L)[[:alpha:]]", chars_b, "AZaz"), (r"(?L)[[:^alpha:]]", chars_b, "-09_"), (r"(?L)[[:alnum:]]", chars_b, "09AZaz"), (r"(?L)[[:^alnum:]]", chars_b, "-_"), (r"(?L)[[:xdigit:]]", chars_b, "09Aa"), (r"(?L)[[:^xdigit:]]", chars_b, "-Zz_"), (r"\w", chars_b, "09AZaz_"), (r"[[:word:]]", chars_b, "09AZaz_"), (r"\W", chars_b, "-"), (r"[[:^word:]]", chars_b, "-"), (r"\d", chars_b, "09"), (r"[[:digit:]]", chars_b, "09"), (r"\D", chars_b, "-AZaz_"), (r"[[:^digit:]]", chars_b, "-AZaz_"), (r"[[:alpha:]]", chars_b, "AZaz"), (r"[[:^alpha:]]", chars_b, "-09_"), (r"[[:alnum:]]", chars_b, "09AZaz"), (r"[[:^alnum:]]", chars_b, "-_"), (r"[[:xdigit:]]", chars_b, "09Aa"), (r"[[:^xdigit:]]", chars_b, "-Zz_"), ] for pattern, chars, expected in tests: try: if chars[ : 0].join(regex.findall(pattern, chars)) != expected: self.fail("Failed: %s" % pattern) except Exception, e: self.fail("Failed: %s raised %s" % (pattern, repr(e))) self.assertEqual(bool(regex.match(ur"(?u)\p{NumericValue=0}", u"0")), True) self.assertEqual(bool(regex.match(ur"(?u)\p{NumericValue=1/2}", u"\N{VULGAR FRACTION ONE HALF}")), True) self.assertEqual(bool(regex.match(ur"(?u)\p{NumericValue=0.5}", u"\N{VULGAR FRACTION ONE HALF}")), True) def test_word_class(self): self.assertEqual(regex.findall(ur"(?u)\w+", u" \u0939\u093f\u0928\u094d\u0926\u0940,"), [u'\u0939\u093f\u0928\u094d\u0926\u0940']) self.assertEqual(regex.findall(ur"(?u)\W+", u" \u0939\u093f\u0928\u094d\u0926\u0940,"), [u' ', u',']) self.assertEqual(regex.split(ur"(?uV1)\b", u" 
\u0939\u093f\u0928\u094d\u0926\u0940,"), [u' ', u'\u0939\u093f\u0928\u094d\u0926\u0940', u',']) self.assertEqual(regex.split(ur"(?uV1)\B", u" \u0939\u093f\u0928\u094d\u0926\u0940,"), [u'', u' \u0939', u'\u093f', u'\u0928', u'\u094d', u'\u0926', u'\u0940,', u'']) def test_search_anchor(self): self.assertEqual(regex.findall(r"\G\w{2}", "abcd ef"), ['ab', 'cd']) def test_search_reverse(self): self.assertEqual(regex.findall(r"(?r).", "abc"), ['c', 'b', 'a']) self.assertEqual(regex.findall(r"(?r).", "abc", overlapped=True), ['c', 'b', 'a']) self.assertEqual(regex.findall(r"(?r)..", "abcde"), ['de', 'bc']) self.assertEqual(regex.findall(r"(?r)..", "abcde", overlapped=True), ['de', 'cd', 'bc', 'ab']) self.assertEqual(regex.findall(r"(?r)(.)(-)(.)", "a-b-c", overlapped=True), [("b", "-", "c"), ("a", "-", "b")]) self.assertEqual([m[0] for m in regex.finditer(r"(?r).", "abc")], ['c', 'b', 'a']) self.assertEqual([m[0] for m in regex.finditer(r"(?r)..", "abcde", overlapped=True)], ['de', 'cd', 'bc', 'ab']) self.assertEqual([m[0] for m in regex.finditer(r"(?r).", "abc")], ['c', 'b', 'a']) self.assertEqual([m[0] for m in regex.finditer(r"(?r)..", "abcde", overlapped=True)], ['de', 'cd', 'bc', 'ab']) self.assertEqual(regex.findall(r"^|\w+", "foo bar"), ['', 'foo', 'bar']) self.assertEqual(regex.findall(r"(?V1)^|\w+", "foo bar"), ['', 'foo', 'bar']) self.assertEqual(regex.findall(r"(?r)^|\w+", "foo bar"), ['bar', 'foo', '']) self.assertEqual(regex.findall(r"(?rV1)^|\w+", "foo bar"), ['bar', 'foo', '']) self.assertEqual([m[0] for m in regex.finditer(r"^|\w+", "foo bar")], ['', 'foo', 'bar']) self.assertEqual([m[0] for m in regex.finditer(r"(?V1)^|\w+", "foo bar")], ['', 'foo', 'bar']) self.assertEqual([m[0] for m in regex.finditer(r"(?r)^|\w+", "foo bar")], ['bar', 'foo', '']) self.assertEqual([m[0] for m in regex.finditer(r"(?rV1)^|\w+", "foo bar")], ['bar', 'foo', '']) self.assertEqual(regex.findall(r"\G\w{2}", "abcd ef"), ['ab', 'cd']) 
self.assertEqual(regex.findall(r".{2}(?<=\G.*)", "abcd"), ['ab', 'cd']) self.assertEqual(regex.findall(r"(?r)\G\w{2}", "abcd ef"), []) self.assertEqual(regex.findall(r"(?r)\w{2}\G", "abcd ef"), ['ef']) self.assertEqual(regex.findall(r"q*", "qqwe"), ['qq', '', '', '']) self.assertEqual(regex.findall(r"(?V1)q*", "qqwe"), ['qq', '', '', '']) self.assertEqual(regex.findall(r"(?r)q*", "qqwe"), ['', '', 'qq', '']) self.assertEqual(regex.findall(r"(?rV1)q*", "qqwe"), ['', '', 'qq', '']) self.assertEqual(regex.findall(".", "abcd", pos=1, endpos=3), ['b', 'c']) self.assertEqual(regex.findall(".", "abcd", pos=1, endpos=-1), ['b', 'c']) self.assertEqual([m[0] for m in regex.finditer(".", "abcd", pos=1, endpos=3)], ['b', 'c']) self.assertEqual([m[0] for m in regex.finditer(".", "abcd", pos=1, endpos=-1)], ['b', 'c']) self.assertEqual([m[0] for m in regex.finditer("(?r).", "abcd", pos=1, endpos=3)], ['c', 'b']) self.assertEqual([m[0] for m in regex.finditer("(?r).", "abcd", pos=1, endpos=-1)], ['c', 'b']) self.assertEqual(regex.findall("(?r).", "abcd", pos=1, endpos=3), ['c', 'b']) self.assertEqual(regex.findall("(?r).", "abcd", pos=1, endpos=-1), ['c', 'b']) self.assertEqual(regex.findall(r"[ab]", "aB", regex.I), ['a', 'B']) self.assertEqual(regex.findall(r"(?r)[ab]", "aB", regex.I), ['B', 'a']) self.assertEqual(regex.findall(r"(?r).{2}", "abc"), ['bc']) self.assertEqual(regex.findall(r"(?r).{2}", "abc", overlapped=True), ['bc', 'ab']) self.assertEqual(regex.findall(r"(\w+) (\w+)", "first second third fourth fifth"), [('first', 'second'), ('third', 'fourth')]) self.assertEqual(regex.findall(r"(?r)(\w+) (\w+)", "first second third fourth fifth"), [('fourth', 'fifth'), ('second', 'third')]) self.assertEqual([m[0] for m in regex.finditer(r"(?r).{2}", "abc")], ['bc']) self.assertEqual([m[0] for m in regex.finditer(r"(?r).{2}", "abc", overlapped=True)], ['bc', 'ab']) self.assertEqual([m[0] for m in regex.finditer(r"(\w+) (\w+)", "first second third fourth fifth")], ['first second', 
'third fourth']) self.assertEqual([m[0] for m in regex.finditer(r"(?r)(\w+) (\w+)", "first second third fourth fifth")], ['fourth fifth', 'second third']) self.assertEqual(regex.search("abcdef", "abcdef").span(), (0, 6)) self.assertEqual(regex.search("(?r)abcdef", "abcdef").span(), (0, 6)) self.assertEqual(regex.search("(?i)abcdef", "ABCDEF").span(), (0, 6)) self.assertEqual(regex.search("(?ir)abcdef", "ABCDEF").span(), (0, 6)) self.assertEqual(regex.sub(r"(.)", r"\1", "abc"), 'abc') self.assertEqual(regex.sub(r"(?r)(.)", r"\1", "abc"), 'abc') def test_atomic(self): # Issue 433030. self.assertEqual(regex.search(r"(?>a*)a", "aa"), None) def test_possessive(self): # Single-character non-possessive. self.assertEqual(regex.search(r"a?a", "a").span(), (0, 1)) self.assertEqual(regex.search(r"a*a", "aaa").span(), (0, 3)) self.assertEqual(regex.search(r"a+a", "aaa").span(), (0, 3)) self.assertEqual(regex.search(r"a{1,3}a", "aaa").span(), (0, 3)) # Multiple-character non-possessive. self.assertEqual(regex.search(r"(?:ab)?ab", "ab").span(), (0, 2)) self.assertEqual(regex.search(r"(?:ab)*ab", "ababab").span(), (0, 6)) self.assertEqual(regex.search(r"(?:ab)+ab", "ababab").span(), (0, 6)) self.assertEqual(regex.search(r"(?:ab){1,3}ab", "ababab").span(), (0, 6)) # Single-character possessive. self.assertEqual(regex.search(r"a?+a", "a"), None) self.assertEqual(regex.search(r"a*+a", "aaa"), None) self.assertEqual(regex.search(r"a++a", "aaa"), None) self.assertEqual(regex.search(r"a{1,3}+a", "aaa"), None) # Multiple-character possessive. self.assertEqual(regex.search(r"(?:ab)?+ab", "ab"), None) self.assertEqual(regex.search(r"(?:ab)*+ab", "ababab"), None) self.assertEqual(regex.search(r"(?:ab)++ab", "ababab"), None) self.assertEqual(regex.search(r"(?:ab){1,3}+ab", "ababab"), None) def test_zerowidth(self): # Issue 3262. self.assertEqual(regex.split(r"\b", "a b"), ['a b']) self.assertEqual(regex.split(r"(?V1)\b", "a b"), ['', 'a', ' ', 'b', '']) # Issue 1647489. 
        self.assertEqual(regex.findall(r"^|\w+", "foo bar"), ['', 'foo', 'bar'])
        self.assertEqual([m[0] for m in regex.finditer(r"^|\w+", "foo bar")], ['', 'foo', 'bar'])
        self.assertEqual(regex.findall(r"(?r)^|\w+", "foo bar"), ['bar', 'foo', ''])
        self.assertEqual([m[0] for m in regex.finditer(r"(?r)^|\w+", "foo bar")], ['bar', 'foo', ''])
        self.assertEqual(regex.findall(r"(?V1)^|\w+", "foo bar"), ['', 'foo', 'bar'])
        self.assertEqual([m[0] for m in regex.finditer(r"(?V1)^|\w+", "foo bar")], ['', 'foo', 'bar'])
        self.assertEqual(regex.findall(r"(?rV1)^|\w+", "foo bar"), ['bar', 'foo', ''])
        self.assertEqual([m[0] for m in regex.finditer(r"(?rV1)^|\w+", "foo bar")], ['bar', 'foo', ''])

        self.assertEqual(regex.split("", "xaxbxc"), ['xaxbxc'])
        self.assertEqual([m for m in regex.splititer("", "xaxbxc")], ['xaxbxc'])
        self.assertEqual(regex.split("(?r)", "xaxbxc"), ['xaxbxc'])
        self.assertEqual([m for m in regex.splititer("(?r)", "xaxbxc")], ['xaxbxc'])
        self.assertEqual(regex.split("(?V1)", "xaxbxc"), ['', 'x', 'a', 'x', 'b', 'x', 'c', ''])
        self.assertEqual([m for m in regex.splititer("(?V1)", "xaxbxc")], ['', 'x', 'a', 'x', 'b', 'x', 'c', ''])
        self.assertEqual(regex.split("(?rV1)", "xaxbxc"), ['', 'c', 'x', 'b', 'x', 'a', 'x', ''])
        self.assertEqual([m for m in regex.splititer("(?rV1)", "xaxbxc")], ['', 'c', 'x', 'b', 'x', 'a', 'x', ''])

    def test_scoped_and_inline_flags(self):
        # Issues 433028, 433024, 433027.
        self.assertEqual(regex.search(r"(?i)Ab", "ab").span(), (0, 2))
        self.assertEqual(regex.search(r"(?i:A)b", "ab").span(), (0, 2))
        self.assertEqual(regex.search(r"A(?i)b", "ab").span(), (0, 2))
        self.assertEqual(regex.search(r"A(?iV1)b", "ab"), None)
        self.assertRaisesRegex(regex.error, self.CANT_TURN_OFF, lambda: regex.search(r"(?V0-i)Ab", "ab", flags=regex.I))
        self.assertEqual(regex.search(r"(?V0)Ab", "ab"), None)
        self.assertEqual(regex.search(r"(?V1)Ab", "ab"), None)
        self.assertEqual(regex.search(r"(?V1-i)Ab", "ab", flags=regex.I), None)
        self.assertEqual(regex.search(r"(?-i:A)b", "ab", flags=regex.I), None)
        self.assertEqual(regex.search(r"A(?V1-i)b", "ab", flags=regex.I).span(), (0, 2))

    def test_repeated_repeats(self):
        # Issue 2537.
        self.assertEqual(regex.search(r"(?:a+)+", "aaa").span(), (0, 3))
        self.assertEqual(regex.search(r"(?:(?:ab)+c)+", "abcabc").span(), (0, 6))

    def test_lookbehind(self):
        self.assertEqual(regex.search(r"123(?<=a\d+)", "a123").span(), (1, 4))
        self.assertEqual(regex.search(r"123(?<=a\d+)", "b123"), None)
        self.assertEqual(regex.search(r"123(?[ \t]+\r*$)|(?P(?<=[^\n])\Z)')
        self.assertEqual(pat.subn(lambda m: '<' + m.lastgroup + '>', 'foobar '), ('foobar', 1))
        self.assertEqual([m.group() for m in pat.finditer('foobar ')], [' ', ''])
        pat = regex.compile(r'(?mV1)(?P[ \t]+\r*$)|(?P(?<=[^\n])\Z)')
        self.assertEqual(pat.subn(lambda m: '<' + m.lastgroup + '>', 'foobar '), ('foobar', 2))
        self.assertEqual([m.group() for m in pat.finditer('foobar ')], [' ', ''])

    def test_overlapped(self):
        self.assertEqual(regex.findall(r"..", "abcde"), ['ab', 'cd'])
        self.assertEqual(regex.findall(r"..", "abcde", overlapped=True), ['ab', 'bc', 'cd', 'de'])
        self.assertEqual(regex.findall(r"(?r)..", "abcde"), ['de', 'bc'])
        self.assertEqual(regex.findall(r"(?r)..", "abcde", overlapped=True), ['de', 'cd', 'bc', 'ab'])
        self.assertEqual(regex.findall(r"(.)(-)(.)", "a-b-c", overlapped=True), [("a", "-", "b"), ("b", "-", "c")])
        self.assertEqual([m[0] for m in regex.finditer(r"..", "abcde")], ['ab', 'cd'])
        self.assertEqual([m[0] for m in regex.finditer(r"..", "abcde", overlapped=True)], ['ab', 'bc', 'cd', 'de'])
        self.assertEqual([m[0] for m in regex.finditer(r"(?r)..", "abcde")], ['de', 'bc'])
        self.assertEqual([m[0] for m in regex.finditer(r"(?r)..", "abcde", overlapped=True)], ['de', 'cd', 'bc', 'ab'])
        self.assertEqual([m.groups() for m in regex.finditer(r"(.)(-)(.)", "a-b-c", overlapped=True)], [("a", "-", "b"), ("b", "-", "c")])
        self.assertEqual([m.groups() for m in regex.finditer(r"(?r)(.)(-)(.)", "a-b-c", overlapped=True)], [("b", "-", "c"), ("a", "-", "b")])

    def test_splititer(self):
        self.assertEqual(regex.split(r",", "a,b,,c,"), ['a', 'b', '', 'c', ''])
        self.assertEqual([m for m in regex.splititer(r",", "a,b,,c,")], ['a', 'b', '', 'c', ''])

    def test_grapheme(self):
        self.assertEqual(regex.match(ur"(?u)\X", u"\xE0").span(), (0, 1))
        self.assertEqual(regex.match(ur"(?u)\X", u"a\u0300").span(), (0, 2))
        self.assertEqual(regex.findall(ur"(?u)\X", u"a\xE0a\u0300e\xE9e\u0301"), [u'a', u'\xe0', u'a\u0300', u'e', u'\xe9', u'e\u0301'])
        self.assertEqual(regex.findall(ur"(?u)\X{3}", u"a\xE0a\u0300e\xE9e\u0301"), [u'a\xe0a\u0300', u'e\xe9e\u0301'])
        self.assertEqual(regex.findall(ur"(?u)\X", u"\r\r\n\u0301A\u0301"), [u'\r', u'\r\n', u'\u0301', u'A\u0301'])

    def test_word_boundary(self):
        text = u'The quick ("brown") fox can\'t jump 32.3 feet, right?'
        self.assertEqual(regex.split(ur'(?V1)\b', text), [u'', u'The', u' ', u'quick', u' ("', u'brown', u'") ', u'fox', u' ', u'can', u"'", u't', u' ', u'jump', u' ', u'32', u'.', u'3', u' ', u'feet', u', ', u'right', u'?'])
        self.assertEqual(regex.split(ur'(?V1w)\b', text), [u'', u'The', u' ', u'quick', u' ', u'(', u'"', u'brown', u'"', u')', u' ', u'fox', u' ', u"can't", u' ', u'jump', u' ', u'32.3', u' ', u'feet', u',', u' ', u'right', u'?', u''])

        text = u"The  fox"
        self.assertEqual(regex.split(ur'(?V1)\b', text), [u'', u'The', u'  ', u'fox', u''])
        self.assertEqual(regex.split(ur'(?V1w)\b', text), [u'', u'The', u' ', u' ', u'fox', u''])

        text = u"can't aujourd'hui l'objectif"
        self.assertEqual(regex.split(ur'(?V1)\b', text), [u'', u'can', u"'", u't', u' ', u'aujourd', u"'", u'hui', u' ', u'l', u"'", u'objectif', u''])
        self.assertEqual(regex.split(ur'(?V1w)\b', text), [u'', u"can't", u' ', u"aujourd'hui", u' ', u"l'", u'objectif', u''])

    def test_line_boundary(self):
        self.assertEqual(regex.findall(r".+", "Line 1\nLine 2\n"), ["Line 1", "Line 2"])
        self.assertEqual(regex.findall(r".+", "Line 1\rLine 2\r"), ["Line 1\rLine 2\r"])
        self.assertEqual(regex.findall(r".+", "Line 1\r\nLine 2\r\n"), ["Line 1\r", "Line 2\r"])
        self.assertEqual(regex.findall(r"(?w).+", "Line 1\nLine 2\n"), ["Line 1", "Line 2"])
        self.assertEqual(regex.findall(r"(?w).+", "Line 1\rLine 2\r"), ["Line 1", "Line 2"])
        self.assertEqual(regex.findall(r"(?w).+", "Line 1\r\nLine 2\r\n"), ["Line 1", "Line 2"])

        self.assertEqual(regex.search(r"^abc", "abc").start(), 0)
        self.assertEqual(regex.search(r"^abc", "\nabc"), None)
        self.assertEqual(regex.search(r"^abc", "\rabc"), None)
        self.assertEqual(regex.search(r"(?w)^abc", "abc").start(), 0)
        self.assertEqual(regex.search(r"(?w)^abc", "\nabc"), None)
        self.assertEqual(regex.search(r"(?w)^abc", "\rabc"), None)

        self.assertEqual(regex.search(r"abc$", "abc").start(), 0)
        self.assertEqual(regex.search(r"abc$", "abc\n").start(), 0)
        self.assertEqual(regex.search(r"abc$", "abc\r"),
None) self.assertEqual(regex.search(r"(?w)abc$", "abc").start(), 0) self.assertEqual(regex.search(r"(?w)abc$", "abc\n").start(), 0) self.assertEqual(regex.search(r"(?w)abc$", "abc\r").start(), 0) self.assertEqual(regex.search(r"(?m)^abc", "abc").start(), 0) self.assertEqual(regex.search(r"(?m)^abc", "\nabc").start(), 1) self.assertEqual(regex.search(r"(?m)^abc", "\rabc"), None) self.assertEqual(regex.search(r"(?mw)^abc", "abc").start(), 0) self.assertEqual(regex.search(r"(?mw)^abc", "\nabc").start(), 1) self.assertEqual(regex.search(r"(?mw)^abc", "\rabc").start(), 1) self.assertEqual(regex.search(r"(?m)abc$", "abc").start(), 0) self.assertEqual(regex.search(r"(?m)abc$", "abc\n").start(), 0) self.assertEqual(regex.search(r"(?m)abc$", "abc\r"), None) self.assertEqual(regex.search(r"(?mw)abc$", "abc").start(), 0) self.assertEqual(regex.search(r"(?mw)abc$", "abc\n").start(), 0) self.assertEqual(regex.search(r"(?mw)abc$", "abc\r").start(), 0) def test_branch_reset(self): self.assertEqual(regex.match(r"(?:(a)|(b))(c)", "ac").groups(), ('a', None, 'c')) self.assertEqual(regex.match(r"(?:(a)|(b))(c)", "bc").groups(), (None, 'b', 'c')) self.assertEqual(regex.match(r"(?:(?a)|(?b))(?c)", "ac").groups(), ('a', None, 'c')) self.assertEqual(regex.match(r"(?:(?a)|(?b))(?c)", "bc").groups(), (None, 'b', 'c')) self.assertEqual(regex.match(r"(?a)(?:(?b)|(?c))(?d)", "abd").groups(), ('a', 'b', None, 'd')) self.assertEqual(regex.match(r"(?a)(?:(?b)|(?c))(?d)", "acd").groups(), ('a', None, 'c', 'd')) self.assertEqual(regex.match(r"(a)(?:(b)|(c))(d)", "abd").groups(), ('a', 'b', None, 'd')) self.assertEqual(regex.match(r"(a)(?:(b)|(c))(d)", "acd").groups(), ('a', None, 'c', 'd')) self.assertEqual(regex.match(r"(a)(?|(b)|(b))(d)", "abd").groups(), ('a', 'b', 'd')) self.assertEqual(regex.match(r"(?|(?a)|(?b))(c)", "ac").groups(), ('a', None, 'c')) self.assertEqual(regex.match(r"(?|(?a)|(?b))(c)", "bc").groups(), (None, 'b', 'c')) self.assertEqual(regex.match(r"(?|(?a)|(?b))(c)", 
"ac").groups(), ('a', 'c')) self.assertEqual(regex.match(r"(?|(?a)|(?b))(c)", "bc").groups(), ('b', 'c')) self.assertEqual(regex.match(r"(?|(?a)(?b)|(?c)(?d))(e)", "abe").groups(), ('a', 'b', 'e')) self.assertEqual(regex.match(r"(?|(?a)(?b)|(?c)(?d))(e)", "cde").groups(), ('d', 'c', 'e')) self.assertEqual(regex.match(r"(?|(?a)(?b)|(?c)(d))(e)", "abe").groups(), ('a', 'b', 'e')) self.assertEqual(regex.match(r"(?|(?a)(?b)|(?c)(d))(e)", "cde").groups(), ('d', 'c', 'e')) self.assertEqual(regex.match(r"(?|(?a)(?b)|(c)(d))(e)", "abe").groups(), ('a', 'b', 'e')) self.assertEqual(regex.match(r"(?|(?a)(?b)|(c)(d))(e)", "cde").groups(), ('c', 'd', 'e')) # Hg issue 87. self.assertEqual(regex.match(r"(?|(?a)(?b)|(c)(?d))(e)", "abe").groups(), ("a", "b", "e")) self.assertEqual(regex.match(r"(?|(?a)(?b)|(c)(?d))(e)", "abe").capturesdict(), {"a": ["a"], "b": ["b"]}) self.assertEqual(regex.match(r"(?|(?a)(?b)|(c)(?d))(e)", "cde").groups(), ("d", None, "e")) self.assertEqual(regex.match(r"(?|(?a)(?b)|(c)(?d))(e)", "cde").capturesdict(), {"a": ["c", "d"], "b": []}) def test_set(self): self.assertEqual(regex.match(r"[a]", "a").span(), (0, 1)) self.assertEqual(regex.match(r"(?i)[a]", "A").span(), (0, 1)) self.assertEqual(regex.match(r"[a-b]", r"a").span(), (0, 1)) self.assertEqual(regex.match(r"(?i)[a-b]", r"A").span(), (0, 1)) self.assertEqual(regex.sub(r"(?V0)([][])", r"-", "a[b]c"), "a-b-c") self.assertEqual(regex.findall(ur"[\p{Alpha}]", u"a0"), [u"a"]) self.assertEqual(regex.findall(ur"(?i)[\p{Alpha}]", u"A0"), [u"A"]) self.assertEqual(regex.findall(ur"[a\p{Alpha}]", u"ab0"), [u"a", u"b"]) self.assertEqual(regex.findall(ur"[a\P{Alpha}]", u"ab0"), [u"a", u"0"]) self.assertEqual(regex.findall(ur"(?i)[a\p{Alpha}]", u"ab0"), [u"a", u"b"]) self.assertEqual(regex.findall(ur"(?i)[a\P{Alpha}]", u"ab0"), [u"a", u"0"]) self.assertEqual(regex.findall(ur"[a-b\p{Alpha}]", u"abC0"), [u"a", u"b", u"C"]) self.assertEqual(regex.findall(ur"(?i)[a-b\p{Alpha}]", u"AbC0"), [u"A", u"b", u"C"]) 
        self.assertEqual(regex.findall(ur"[\p{Alpha}]", u"a0"), [u"a"])
        self.assertEqual(regex.findall(ur"[\P{Alpha}]", u"a0"), [u"0"])
        self.assertEqual(regex.findall(ur"[^\p{Alpha}]", u"a0"), [u"0"])
        self.assertEqual(regex.findall(ur"[^\P{Alpha}]", u"a0"), [u"a"])

        self.assertEqual("".join(regex.findall(r"[^\d-h]", "a^b12c-h")), 'a^bc')
        self.assertEqual("".join(regex.findall(r"[^\dh]", "a^b12c-h")), 'a^bc-')
        self.assertEqual("".join(regex.findall(r"[^h\s\db]", "a^b 12c-h")), 'a^c-')
        self.assertEqual("".join(regex.findall(r"[^b\w]", "a b")), ' ')
        self.assertEqual("".join(regex.findall(r"[^b\S]", "a b")), ' ')
        self.assertEqual("".join(regex.findall(r"[^8\d]", "a 1b2")), 'a b')

        all_chars = u"".join(unichr(c) for c in range(0x100))
        self.assertEqual(len(regex.findall(ur"(?u)\p{ASCII}", all_chars)), 128)
        self.assertEqual(len(regex.findall(ur"(?u)\p{Letter}", all_chars)), 117)
        self.assertEqual(len(regex.findall(ur"(?u)\p{Digit}", all_chars)), 10)

        # Set operators
        self.assertEqual(len(regex.findall(ur"(?uV1)[\p{ASCII}&&\p{Letter}]", all_chars)), 52)
        self.assertEqual(len(regex.findall(ur"(?uV1)[\p{ASCII}&&\p{Alnum}&&\p{Letter}]", all_chars)), 52)
        self.assertEqual(len(regex.findall(ur"(?uV1)[\p{ASCII}&&\p{Alnum}&&\p{Digit}]", all_chars)), 10)
        self.assertEqual(len(regex.findall(ur"(?uV1)[\p{ASCII}&&\p{Cc}]", all_chars)), 33)
        self.assertEqual(len(regex.findall(ur"(?uV1)[\p{ASCII}&&\p{Graph}]", all_chars)), 94)
        self.assertEqual(len(regex.findall(ur"(?uV1)[\p{ASCII}--\p{Cc}]", all_chars)), 95)
        self.assertEqual(len(regex.findall(ur"(?u)[\p{Letter}\p{Digit}]", all_chars)), 127)
        self.assertEqual(len(regex.findall(ur"(?uV1)[\p{Letter}||\p{Digit}]", all_chars)), 127)
        self.assertEqual(len(regex.findall(ur"(?u)\p{HexDigit}", all_chars)), 22)
        self.assertEqual(len(regex.findall(ur"(?uV1)[\p{HexDigit}~~\p{Digit}]", all_chars)), 12)
        self.assertEqual(len(regex.findall(ur"(?uV1)[\p{Digit}~~\p{HexDigit}]", all_chars)), 12)

        self.assertEqual(repr(type(regex.compile(r"(?V0)([][-])"))),
self.PATTERN_CLASS) self.assertEqual(regex.findall(r"(?V1)[[a-z]--[aei]]", "abc"), ["b", "c"]) self.assertEqual(regex.findall(r"(?iV1)[[a-z]--[aei]]", "abc"), ["b", "c"]) self.assertEqual(regex.findall("(?V1)[\w--a]","abc"), ["b", "c"]) self.assertEqual(regex.findall("(?iV1)[\w--a]","abc"), ["b", "c"]) def test_various(self): tests = [ # Test ?P< and ?P= extensions. ('(?Pa)', '', '', regex.error, self.BAD_GROUP_NAME), # Begins with a digit. ('(?Pa)', '', '', regex.error, self.BAD_GROUP_NAME), # Begins with an illegal char. ('(?Pa)', '', '', regex.error, self.BAD_GROUP_NAME), # Begins with an illegal char. # Same tests, for the ?P= form. ('(?Pa)(?P=foo_123', 'aa', '', regex.error, self.MISSING_RPAREN), ('(?Pa)(?P=1)', 'aa', '1', repr('a')), ('(?Pa)(?P=0)', 'aa', '', regex.error, self.BAD_GROUP_NAME), ('(?Pa)(?P=-1)', 'aa', '', regex.error, self.BAD_GROUP_NAME), ('(?Pa)(?P=!)', 'aa', '', regex.error, self.BAD_GROUP_NAME), ('(?Pa)(?P=foo_124)', 'aa', '', regex.error, self.UNKNOWN_GROUP), # Backref to undefined group. ('(?Pa)', 'a', '1', repr('a')), ('(?Pa)(?P=foo_123)', 'aa', '1', repr('a')), # Mal-formed \g in pattern treated as literal for compatibility. (r'(?a)\ga)\g<1>', 'aa', '1', repr('a')), (r'(?a)\g', 'aa', '', repr(None)), (r'(?a)\g', 'aa', '', regex.error, self.UNKNOWN_GROUP), # Backref to undefined group. ('(?a)', 'a', '1', repr('a')), (r'(?a)\g', 'aa', '1', repr('a')), # Test octal escapes. ('\\1', 'a', '', regex.error, self.INVALID_GROUP_REF), # Backreference. ('[\\1]', '\1', '0', "'\\x01'"), # Character. ('\\09', chr(0) + '9', '0', repr(chr(0) + '9')), ('\\141', 'a', '0', repr('a')), ('(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)(k)(l)\\119', 'abcdefghijklk9', '0,11', repr(('abcdefghijklk9', 'k'))), # Test \0 is handled everywhere. (r'\0', '\0', '0', repr('\0')), (r'[\0a]', '\0', '0', repr('\0')), (r'[a\0]', '\0', '0', repr('\0')), (r'[^a\0]', '\0', '', repr(None)), # Test various letter escapes. 
(r'\a[\b]\f\n\r\t\v', '\a\b\f\n\r\t\v', '0', repr('\a\b\f\n\r\t\v')), (r'[\a][\b][\f][\n][\r][\t][\v]', '\a\b\f\n\r\t\v', '0', repr('\a\b\f\n\r\t\v')), (r'\c\e\g\h\i\j\k\o\p\q\y\z', 'ceghijkopqyz', '0', repr('ceghijkopqyz')), (r'\xff', '\377', '0', repr(chr(255))), # New \x semantics. (r'\x00ffffffffffffff', '\377', '', repr(None)), (r'\x00f', '\017', '', repr(None)), (r'\x00fe', '\376', '', repr(None)), (r'\x00ff', '\377', '', repr(None)), (r'\t\n\v\r\f\a\g', '\t\n\v\r\f\ag', '0', repr('\t\n\v\r\f\ag')), ('\t\n\v\r\f\a\g', '\t\n\v\r\f\ag', '0', repr('\t\n\v\r\f\ag')), (r'\t\n\v\r\f\a', '\t\n\v\r\f\a', '0', repr(chr(9) + chr(10) + chr(11) + chr(13) + chr(12) + chr(7))), (r'[\t][\n][\v][\r][\f][\b]', '\t\n\v\r\f\b', '0', repr('\t\n\v\r\f\b')), (r"^\w+=(\\[\000-\277]|[^\n\\])*", "SRC=eval.c g.c blah blah blah \\\\\n\tapes.c", '0', repr("SRC=eval.c g.c blah blah blah \\\\")), # Test that . only matches \n in DOTALL mode. ('a.b', 'acb', '0', repr('acb')), ('a.b', 'a\nb', '', repr(None)), ('a.*b', 'acc\nccb', '', repr(None)), ('a.{4,5}b', 'acc\nccb', '', repr(None)), ('a.b', 'a\rb', '0', repr('a\rb')), # The new behaviour is that the inline flag affects only what follows. ('a.b(?s)', 'a\nb', '0', repr('a\nb')), ('a.b(?sV1)', 'a\nb', '', repr(None)), ('(?s)a.b', 'a\nb', '0', repr('a\nb')), ('a.*(?s)b', 'acc\nccb', '0', repr('acc\nccb')), ('a.*(?sV1)b', 'acc\nccb', '', repr(None)), ('(?s)a.*b', 'acc\nccb', '0', repr('acc\nccb')), ('(?s)a.{4,5}b', 'acc\nccb', '0', repr('acc\nccb')), (')', '', '', regex.error, self.TRAILING_CHARS), # Unmatched right bracket. ('', '', '0', "''"), # Empty pattern. 
('abc', 'abc', '0', repr('abc')), ('abc', 'xbc', '', repr(None)), ('abc', 'axc', '', repr(None)), ('abc', 'abx', '', repr(None)), ('abc', 'xabcy', '0', repr('abc')), ('abc', 'ababc', '0', repr('abc')), ('ab*c', 'abc', '0', repr('abc')), ('ab*bc', 'abc', '0', repr('abc')), ('ab*bc', 'abbc', '0', repr('abbc')), ('ab*bc', 'abbbbc', '0', repr('abbbbc')), ('ab+bc', 'abbc', '0', repr('abbc')), ('ab+bc', 'abc', '', repr(None)), ('ab+bc', 'abq', '', repr(None)), ('ab+bc', 'abbbbc', '0', repr('abbbbc')), ('ab?bc', 'abbc', '0', repr('abbc')), ('ab?bc', 'abc', '0', repr('abc')), ('ab?bc', 'abbbbc', '', repr(None)), ('ab?c', 'abc', '0', repr('abc')), ('^abc$', 'abc', '0', repr('abc')), ('^abc$', 'abcc', '', repr(None)), ('^abc', 'abcc', '0', repr('abc')), ('^abc$', 'aabc', '', repr(None)), ('abc$', 'aabc', '0', repr('abc')), ('^', 'abc', '0', repr('')), ('$', 'abc', '0', repr('')), ('a.c', 'abc', '0', repr('abc')), ('a.c', 'axc', '0', repr('axc')), ('a.*c', 'axyzc', '0', repr('axyzc')), ('a.*c', 'axyzd', '', repr(None)), ('a[bc]d', 'abc', '', repr(None)), ('a[bc]d', 'abd', '0', repr('abd')), ('a[b-d]e', 'abd', '', repr(None)), ('a[b-d]e', 'ace', '0', repr('ace')), ('a[b-d]', 'aac', '0', repr('ac')), ('a[-b]', 'a-', '0', repr('a-')), ('a[\\-b]', 'a-', '0', repr('a-')), ('a[b-]', 'a-', '0', repr('a-')), ('a[]b', '-', '', regex.error, self.BAD_SET), ('a[', '-', '', regex.error, self.BAD_SET), ('a\\', '-', '', regex.error, self.BAD_ESCAPE), ('abc)', '-', '', regex.error, self.TRAILING_CHARS), ('(abc', '-', '', regex.error, self.MISSING_RPAREN), ('a]', 'a]', '0', repr('a]')), ('a[]]b', 'a]b', '0', repr('a]b')), ('a[]]b', 'a]b', '0', repr('a]b')), ('a[^bc]d', 'aed', '0', repr('aed')), ('a[^bc]d', 'abd', '', repr(None)), ('a[^-b]c', 'adc', '0', repr('adc')), ('a[^-b]c', 'a-c', '', repr(None)), ('a[^]b]c', 'a]c', '', repr(None)), ('a[^]b]c', 'adc', '0', repr('adc')), ('\\ba\\b', 'a-', '0', repr('a')), ('\\ba\\b', '-a', '0', repr('a')), ('\\ba\\b', '-a-', '0', repr('a')), ('\\by\\b', 
'xy', '', repr(None)), ('\\by\\b', 'yz', '', repr(None)), ('\\by\\b', 'xyz', '', repr(None)), ('x\\b', 'xyz', '', repr(None)), ('x\\B', 'xyz', '0', repr('x')), ('\\Bz', 'xyz', '0', repr('z')), ('z\\B', 'xyz', '', repr(None)), ('\\Bx', 'xyz', '', repr(None)), ('\\Ba\\B', 'a-', '', repr(None)), ('\\Ba\\B', '-a', '', repr(None)), ('\\Ba\\B', '-a-', '', repr(None)), ('\\By\\B', 'xy', '', repr(None)), ('\\By\\B', 'yz', '', repr(None)), ('\\By\\b', 'xy', '0', repr('y')), ('\\by\\B', 'yz', '0', repr('y')), ('\\By\\B', 'xyz', '0', repr('y')), ('ab|cd', 'abc', '0', repr('ab')), ('ab|cd', 'abcd', '0', repr('ab')), ('()ef', 'def', '0,1', repr(('ef', ''))), ('$b', 'b', '', repr(None)), ('a\\(b', 'a(b', '', repr(('a(b',))), ('a\\(*b', 'ab', '0', repr('ab')), ('a\\(*b', 'a((b', '0', repr('a((b')), ('a\\\\b', 'a\\b', '0', repr('a\\b')), ('((a))', 'abc', '0,1,2', repr(('a', 'a', 'a'))), ('(a)b(c)', 'abc', '0,1,2', repr(('abc', 'a', 'c'))), ('a+b+c', 'aabbabc', '0', repr('abc')), ('(a+|b)*', 'ab', '0,1', repr(('ab', 'b'))), ('(a+|b)+', 'ab', '0,1', repr(('ab', 'b'))), ('(a+|b)?', 'ab', '0,1', repr(('a', 'a'))), (')(', '-', '', regex.error, self.TRAILING_CHARS), ('[^ab]*', 'cde', '0', repr('cde')), ('abc', '', '', repr(None)), ('a*', '', '0', repr('')), ('a|b|c|d|e', 'e', '0', repr('e')), ('(a|b|c|d|e)f', 'ef', '0,1', repr(('ef', 'e'))), ('abcd*efg', 'abcdefg', '0', repr('abcdefg')), ('ab*', 'xabyabbbz', '0', repr('ab')), ('ab*', 'xayabbbz', '0', repr('a')), ('(ab|cd)e', 'abcde', '0,1', repr(('cde', 'cd'))), ('[abhgefdc]ij', 'hij', '0', repr('hij')), ('^(ab|cd)e', 'abcde', '', repr(None)), ('(abc|)ef', 'abcdef', '0,1', repr(('ef', ''))), ('(a|b)c*d', 'abcd', '0,1', repr(('bcd', 'b'))), ('(ab|ab*)bc', 'abc', '0,1', repr(('abc', 'a'))), ('a([bc]*)c*', 'abc', '0,1', repr(('abc', 'bc'))), ('a([bc]*)(c*d)', 'abcd', '0,1,2', repr(('abcd', 'bc', 'd'))), ('a([bc]+)(c*d)', 'abcd', '0,1,2', repr(('abcd', 'bc', 'd'))), ('a([bc]*)(c+d)', 'abcd', '0,1,2', repr(('abcd', 'b', 'cd'))), 
('a[bcd]*dcdcde', 'adcdcde', '0', repr('adcdcde')), ('a[bcd]+dcdcde', 'adcdcde', '', repr(None)), ('(ab|a)b*c', 'abc', '0,1', repr(('abc', 'ab'))), ('((a)(b)c)(d)', 'abcd', '1,2,3,4', repr(('abc', 'a', 'b', 'd'))), ('[a-zA-Z_][a-zA-Z0-9_]*', 'alpha', '0', repr('alpha')), ('^a(bc+|b[eh])g|.h$', 'abh', '0,1', repr(('bh', None))), ('(bc+d$|ef*g.|h?i(j|k))', 'effgz', '0,1,2', repr(('effgz', 'effgz', None))), ('(bc+d$|ef*g.|h?i(j|k))', 'ij', '0,1,2', repr(('ij', 'ij', 'j'))), ('(bc+d$|ef*g.|h?i(j|k))', 'effg', '', repr(None)), ('(bc+d$|ef*g.|h?i(j|k))', 'bcdd', '', repr(None)), ('(bc+d$|ef*g.|h?i(j|k))', 'reffgz', '0,1,2', repr(('effgz', 'effgz', None))), ('(((((((((a)))))))))', 'a', '0', repr('a')), ('multiple words of text', 'uh-uh', '', repr(None)), ('multiple words', 'multiple words, yeah', '0', repr('multiple words')), ('(.*)c(.*)', 'abcde', '0,1,2', repr(('abcde', 'ab', 'de'))), ('\\((.*), (.*)\\)', '(a, b)', '2,1', repr(('b', 'a'))), ('[k]', 'ab', '', repr(None)), ('a[-]?c', 'ac', '0', repr('ac')), ('(abc)\\1', 'abcabc', '1', repr('abc')), ('([a-c]*)\\1', 'abcabc', '1', repr('abc')), ('^(.+)?B', 'AB', '1', repr('A')), ('(a+).\\1$', 'aaaaa', '0,1', repr(('aaaaa', 'aa'))), ('^(a+).\\1$', 'aaaa', '', repr(None)), ('(abc)\\1', 'abcabc', '0,1', repr(('abcabc', 'abc'))), ('([a-c]+)\\1', 'abcabc', '0,1', repr(('abcabc', 'abc'))), ('(a)\\1', 'aa', '0,1', repr(('aa', 'a'))), ('(a+)\\1', 'aa', '0,1', repr(('aa', 'a'))), ('(a+)+\\1', 'aa', '0,1', repr(('aa', 'a'))), ('(a).+\\1', 'aba', '0,1', repr(('aba', 'a'))), ('(a)ba*\\1', 'aba', '0,1', repr(('aba', 'a'))), ('(aa|a)a\\1$', 'aaa', '0,1', repr(('aaa', 'a'))), ('(a|aa)a\\1$', 'aaa', '0,1', repr(('aaa', 'a'))), ('(a+)a\\1$', 'aaa', '0,1', repr(('aaa', 'a'))), ('([abc]*)\\1', 'abcabc', '0,1', repr(('abcabc', 'abc'))), ('(a)(b)c|ab', 'ab', '0,1,2', repr(('ab', None, None))), ('(a)+x', 'aaax', '0,1', repr(('aaax', 'a'))), ('([ac])+x', 'aacx', '0,1', repr(('aacx', 'c'))), ('([^/]*/)*sub1/', 'd:msgs/tdir/sub1/trial/away.cpp', 
'0,1', repr(('d:msgs/tdir/sub1/', 'tdir/'))), ('([^.]*)\\.([^:]*):[T ]+(.*)', 'track1.title:TBlah blah blah', '0,1,2,3', repr(('track1.title:TBlah blah blah', 'track1', 'title', 'Blah blah blah'))), ('([^N]*N)+', 'abNNxyzN', '0,1', repr(('abNNxyzN', 'xyzN'))), ('([^N]*N)+', 'abNNxyz', '0,1', repr(('abNN', 'N'))), ('([abc]*)x', 'abcx', '0,1', repr(('abcx', 'abc'))), ('([abc]*)x', 'abc', '', repr(None)), ('([xyz]*)x', 'abcx', '0,1', repr(('x', ''))), ('(a)+b|aac', 'aac', '0,1', repr(('aac', None))), # Test symbolic groups. ('(?Paaa)a', 'aaaa', '', regex.error, self.BAD_GROUP_NAME), ('(?Paaa)a', 'aaaa', '0,id', repr(('aaaa', 'aaa'))), ('(?Paa)(?P=id)', 'aaaa', '0,id', repr(('aaaa', 'aa'))), ('(?Paa)(?P=xd)', 'aaaa', '', regex.error, self.UNKNOWN_GROUP), # Character properties. (ur"\g", u"g", '0', repr(u'g')), (ur"\g<1>", u"g", '', regex.error, self.INVALID_GROUP_REF), (ur"(.)\g<1>", u"gg", '0', repr(u'gg')), (ur"(.)\g<1>", u"gg", '', repr((u'gg', u'g'))), (ur"\N", u"N", '0', repr(u'N')), (ur"\N{LATIN SMALL LETTER A}", u"a", '0', repr(u'a')), (ur"\p", u"p", '0', repr(u'p')), (ur"\p{Ll}", u"a", '0', repr(u'a')), (ur"\P", u"P", '0', repr(u'P')), (ur"\P{Lu}", u"p", '0', repr(u'p')), # All tests from Perl. 
('abc', 'abc', '0', repr('abc')), ('abc', 'xbc', '', repr(None)), ('abc', 'axc', '', repr(None)), ('abc', 'abx', '', repr(None)), ('abc', 'xabcy', '0', repr('abc')), ('abc', 'ababc', '0', repr('abc')), ('ab*c', 'abc', '0', repr('abc')), ('ab*bc', 'abc', '0', repr('abc')), ('ab*bc', 'abbc', '0', repr('abbc')), ('ab*bc', 'abbbbc', '0', repr('abbbbc')), ('ab{0,}bc', 'abbbbc', '0', repr('abbbbc')), ('ab+bc', 'abbc', '0', repr('abbc')), ('ab+bc', 'abc', '', repr(None)), ('ab+bc', 'abq', '', repr(None)), ('ab{1,}bc', 'abq', '', repr(None)), ('ab+bc', 'abbbbc', '0', repr('abbbbc')), ('ab{1,}bc', 'abbbbc', '0', repr('abbbbc')), ('ab{1,3}bc', 'abbbbc', '0', repr('abbbbc')), ('ab{3,4}bc', 'abbbbc', '0', repr('abbbbc')), ('ab{4,5}bc', 'abbbbc', '', repr(None)), ('ab?bc', 'abbc', '0', repr('abbc')), ('ab?bc', 'abc', '0', repr('abc')), ('ab{0,1}bc', 'abc', '0', repr('abc')), ('ab?bc', 'abbbbc', '', repr(None)), ('ab?c', 'abc', '0', repr('abc')), ('ab{0,1}c', 'abc', '0', repr('abc')), ('^abc$', 'abc', '0', repr('abc')), ('^abc$', 'abcc', '', repr(None)), ('^abc', 'abcc', '0', repr('abc')), ('^abc$', 'aabc', '', repr(None)), ('abc$', 'aabc', '0', repr('abc')), ('^', 'abc', '0', repr('')), ('$', 'abc', '0', repr('')), ('a.c', 'abc', '0', repr('abc')), ('a.c', 'axc', '0', repr('axc')), ('a.*c', 'axyzc', '0', repr('axyzc')), ('a.*c', 'axyzd', '', repr(None)), ('a[bc]d', 'abc', '', repr(None)), ('a[bc]d', 'abd', '0', repr('abd')), ('a[b-d]e', 'abd', '', repr(None)), ('a[b-d]e', 'ace', '0', repr('ace')), ('a[b-d]', 'aac', '0', repr('ac')), ('a[-b]', 'a-', '0', repr('a-')), ('a[b-]', 'a-', '0', repr('a-')), ('a[b-a]', '-', '', regex.error, self.BAD_CHAR_RANGE), ('a[]b', '-', '', regex.error, self.BAD_SET), ('a[', '-', '', regex.error, self.BAD_SET), ('a]', 'a]', '0', repr('a]')), ('a[]]b', 'a]b', '0', repr('a]b')), ('a[^bc]d', 'aed', '0', repr('aed')), ('a[^bc]d', 'abd', '', repr(None)), ('a[^-b]c', 'adc', '0', repr('adc')), ('a[^-b]c', 'a-c', '', repr(None)), ('a[^]b]c', 'a]c', '', 
repr(None)), ('a[^]b]c', 'adc', '0', repr('adc')), ('ab|cd', 'abc', '0', repr('ab')), ('ab|cd', 'abcd', '0', repr('ab')), ('()ef', 'def', '0,1', repr(('ef', ''))), ('*a', '-', '', regex.error, self.NOTHING_TO_REPEAT), ('(*)b', '-', '', regex.error, self.NOTHING_TO_REPEAT), ('$b', 'b', '', repr(None)), ('a\\', '-', '', regex.error, self.BAD_ESCAPE), ('a\\(b', 'a(b', '', repr(('a(b',))), ('a\\(*b', 'ab', '0', repr('ab')), ('a\\(*b', 'a((b', '0', repr('a((b')), ('a\\\\b', 'a\\b', '0', repr('a\\b')), ('abc)', '-', '', regex.error, self.TRAILING_CHARS), ('(abc', '-', '', regex.error, self.MISSING_RPAREN), ('((a))', 'abc', '0,1,2', repr(('a', 'a', 'a'))), ('(a)b(c)', 'abc', '0,1,2', repr(('abc', 'a', 'c'))), ('a+b+c', 'aabbabc', '0', repr('abc')), ('a{1,}b{1,}c', 'aabbabc', '0', repr('abc')), ('a**', '-', '', regex.error, self.MULTIPLE_REPEAT), ('a.+?c', 'abcabc', '0', repr('abc')), ('(a+|b)*', 'ab', '0,1', repr(('ab', 'b'))), ('(a+|b){0,}', 'ab', '0,1', repr(('ab', 'b'))), ('(a+|b)+', 'ab', '0,1', repr(('ab', 'b'))), ('(a+|b){1,}', 'ab', '0,1', repr(('ab', 'b'))), ('(a+|b)?', 'ab', '0,1', repr(('a', 'a'))), ('(a+|b){0,1}', 'ab', '0,1', repr(('a', 'a'))), (')(', '-', '', regex.error, self.TRAILING_CHARS), ('[^ab]*', 'cde', '0', repr('cde')), ('abc', '', '', repr(None)), ('a*', '', '0', repr('')), ('([abc])*d', 'abbbcd', '0,1', repr(('abbbcd', 'c'))), ('([abc])*bcd', 'abcd', '0,1', repr(('abcd', 'a'))), ('a|b|c|d|e', 'e', '0', repr('e')), ('(a|b|c|d|e)f', 'ef', '0,1', repr(('ef', 'e'))), ('abcd*efg', 'abcdefg', '0', repr('abcdefg')), ('ab*', 'xabyabbbz', '0', repr('ab')), ('ab*', 'xayabbbz', '0', repr('a')), ('(ab|cd)e', 'abcde', '0,1', repr(('cde', 'cd'))), ('[abhgefdc]ij', 'hij', '0', repr('hij')), ('^(ab|cd)e', 'abcde', '', repr(None)), ('(abc|)ef', 'abcdef', '0,1', repr(('ef', ''))), ('(a|b)c*d', 'abcd', '0,1', repr(('bcd', 'b'))), ('(ab|ab*)bc', 'abc', '0,1', repr(('abc', 'a'))), ('a([bc]*)c*', 'abc', '0,1', repr(('abc', 'bc'))), ('a([bc]*)(c*d)', 'abcd', '0,1,2', 
repr(('abcd', 'bc', 'd'))), ('a([bc]+)(c*d)', 'abcd', '0,1,2', repr(('abcd', 'bc', 'd'))), ('a([bc]*)(c+d)', 'abcd', '0,1,2', repr(('abcd', 'b', 'cd'))), ('a[bcd]*dcdcde', 'adcdcde', '0', repr('adcdcde')), ('a[bcd]+dcdcde', 'adcdcde', '', repr(None)), ('(ab|a)b*c', 'abc', '0,1', repr(('abc', 'ab'))), ('((a)(b)c)(d)', 'abcd', '1,2,3,4', repr(('abc', 'a', 'b', 'd'))), ('[a-zA-Z_][a-zA-Z0-9_]*', 'alpha', '0', repr('alpha')), ('^a(bc+|b[eh])g|.h$', 'abh', '0,1', repr(('bh', None))), ('(bc+d$|ef*g.|h?i(j|k))', 'effgz', '0,1,2', repr(('effgz', 'effgz', None))), ('(bc+d$|ef*g.|h?i(j|k))', 'ij', '0,1,2', repr(('ij', 'ij', 'j'))), ('(bc+d$|ef*g.|h?i(j|k))', 'effg', '', repr(None)), ('(bc+d$|ef*g.|h?i(j|k))', 'bcdd', '', repr(None)), ('(bc+d$|ef*g.|h?i(j|k))', 'reffgz', '0,1,2', repr(('effgz', 'effgz', None))), ('((((((((((a))))))))))', 'a', '10', repr('a')), ('((((((((((a))))))))))\\10', 'aa', '0', repr('aa')), # Python does not have the same rules for \\41 so this is a syntax error # ('((((((((((a))))))))))\\41', 'aa', '', repr(None)), # ('((((((((((a))))))))))\\41', 'a!', '0', repr('a!')), ('((((((((((a))))))))))\\41', '', '', regex.error, self.INVALID_GROUP_REF), ('(?i)((((((((((a))))))))))\\41', '', '', regex.error, self.INVALID_GROUP_REF), ('(((((((((a)))))))))', 'a', '0', repr('a')), ('multiple words of text', 'uh-uh', '', repr(None)), ('multiple words', 'multiple words, yeah', '0', repr('multiple words')), ('(.*)c(.*)', 'abcde', '0,1,2', repr(('abcde', 'ab', 'de'))), ('\\((.*), (.*)\\)', '(a, b)', '2,1', repr(('b', 'a'))), ('[k]', 'ab', '', repr(None)), ('a[-]?c', 'ac', '0', repr('ac')), ('(abc)\\1', 'abcabc', '1', repr('abc')), ('([a-c]*)\\1', 'abcabc', '1', repr('abc')), ('(?i)abc', 'ABC', '0', repr('ABC')), ('(?i)abc', 'XBC', '', repr(None)), ('(?i)abc', 'AXC', '', repr(None)), ('(?i)abc', 'ABX', '', repr(None)), ('(?i)abc', 'XABCY', '0', repr('ABC')), ('(?i)abc', 'ABABC', '0', repr('ABC')), ('(?i)ab*c', 'ABC', '0', repr('ABC')), ('(?i)ab*bc', 'ABC', '0', 
repr('ABC')), ('(?i)ab*bc', 'ABBC', '0', repr('ABBC')), ('(?i)ab*?bc', 'ABBBBC', '0', repr('ABBBBC')), ('(?i)ab{0,}?bc', 'ABBBBC', '0', repr('ABBBBC')), ('(?i)ab+?bc', 'ABBC', '0', repr('ABBC')), ('(?i)ab+bc', 'ABC', '', repr(None)), ('(?i)ab+bc', 'ABQ', '', repr(None)), ('(?i)ab{1,}bc', 'ABQ', '', repr(None)), ('(?i)ab+bc', 'ABBBBC', '0', repr('ABBBBC')), ('(?i)ab{1,}?bc', 'ABBBBC', '0', repr('ABBBBC')), ('(?i)ab{1,3}?bc', 'ABBBBC', '0', repr('ABBBBC')), ('(?i)ab{3,4}?bc', 'ABBBBC', '0', repr('ABBBBC')), ('(?i)ab{4,5}?bc', 'ABBBBC', '', repr(None)), ('(?i)ab??bc', 'ABBC', '0', repr('ABBC')), ('(?i)ab??bc', 'ABC', '0', repr('ABC')), ('(?i)ab{0,1}?bc', 'ABC', '0', repr('ABC')), ('(?i)ab??bc', 'ABBBBC', '', repr(None)), ('(?i)ab??c', 'ABC', '0', repr('ABC')), ('(?i)ab{0,1}?c', 'ABC', '0', repr('ABC')), ('(?i)^abc$', 'ABC', '0', repr('ABC')), ('(?i)^abc$', 'ABCC', '', repr(None)), ('(?i)^abc', 'ABCC', '0', repr('ABC')), ('(?i)^abc$', 'AABC', '', repr(None)), ('(?i)abc$', 'AABC', '0', repr('ABC')), ('(?i)^', 'ABC', '0', repr('')), ('(?i)$', 'ABC', '0', repr('')), ('(?i)a.c', 'ABC', '0', repr('ABC')), ('(?i)a.c', 'AXC', '0', repr('AXC')), ('(?i)a.*?c', 'AXYZC', '0', repr('AXYZC')), ('(?i)a.*c', 'AXYZD', '', repr(None)), ('(?i)a[bc]d', 'ABC', '', repr(None)), ('(?i)a[bc]d', 'ABD', '0', repr('ABD')), ('(?i)a[b-d]e', 'ABD', '', repr(None)), ('(?i)a[b-d]e', 'ACE', '0', repr('ACE')), ('(?i)a[b-d]', 'AAC', '0', repr('AC')), ('(?i)a[-b]', 'A-', '0', repr('A-')), ('(?i)a[b-]', 'A-', '0', repr('A-')), ('(?i)a[b-a]', '-', '', regex.error, self.BAD_CHAR_RANGE), ('(?i)a[]b', '-', '', regex.error, self.BAD_SET), ('(?i)a[', '-', '', regex.error, self.BAD_SET), ('(?i)a]', 'A]', '0', repr('A]')), ('(?i)a[]]b', 'A]B', '0', repr('A]B')), ('(?i)a[^bc]d', 'AED', '0', repr('AED')), ('(?i)a[^bc]d', 'ABD', '', repr(None)), ('(?i)a[^-b]c', 'ADC', '0', repr('ADC')), ('(?i)a[^-b]c', 'A-C', '', repr(None)), ('(?i)a[^]b]c', 'A]C', '', repr(None)), ('(?i)a[^]b]c', 'ADC', '0', repr('ADC')), 
('(?i)ab|cd', 'ABC', '0', repr('AB')), ('(?i)ab|cd', 'ABCD', '0', repr('AB')), ('(?i)()ef', 'DEF', '0,1', repr(('EF', ''))), ('(?i)*a', '-', '', regex.error, self.NOTHING_TO_REPEAT), ('(?i)(*)b', '-', '', regex.error, self.NOTHING_TO_REPEAT), ('(?i)$b', 'B', '', repr(None)), ('(?i)a\\', '-', '', regex.error, self.BAD_ESCAPE), ('(?i)a\\(b', 'A(B', '', repr(('A(B',))), ('(?i)a\\(*b', 'AB', '0', repr('AB')), ('(?i)a\\(*b', 'A((B', '0', repr('A((B')), ('(?i)a\\\\b', 'A\\B', '0', repr('A\\B')), ('(?i)abc)', '-', '', regex.error, self.TRAILING_CHARS), ('(?i)(abc', '-', '', regex.error, self.MISSING_RPAREN), ('(?i)((a))', 'ABC', '0,1,2', repr(('A', 'A', 'A'))), ('(?i)(a)b(c)', 'ABC', '0,1,2', repr(('ABC', 'A', 'C'))), ('(?i)a+b+c', 'AABBABC', '0', repr('ABC')), ('(?i)a{1,}b{1,}c', 'AABBABC', '0', repr('ABC')), ('(?i)a**', '-', '', regex.error, self.MULTIPLE_REPEAT), ('(?i)a.+?c', 'ABCABC', '0', repr('ABC')), ('(?i)a.*?c', 'ABCABC', '0', repr('ABC')), ('(?i)a.{0,5}?c', 'ABCABC', '0', repr('ABC')), ('(?i)(a+|b)*', 'AB', '0,1', repr(('AB', 'B'))), ('(?i)(a+|b){0,}', 'AB', '0,1', repr(('AB', 'B'))), ('(?i)(a+|b)+', 'AB', '0,1', repr(('AB', 'B'))), ('(?i)(a+|b){1,}', 'AB', '0,1', repr(('AB', 'B'))), ('(?i)(a+|b)?', 'AB', '0,1', repr(('A', 'A'))), ('(?i)(a+|b){0,1}', 'AB', '0,1', repr(('A', 'A'))), ('(?i)(a+|b){0,1}?', 'AB', '0,1', repr(('', None))), ('(?i))(', '-', '', regex.error, self.TRAILING_CHARS), ('(?i)[^ab]*', 'CDE', '0', repr('CDE')), ('(?i)abc', '', '', repr(None)), ('(?i)a*', '', '0', repr('')), ('(?i)([abc])*d', 'ABBBCD', '0,1', repr(('ABBBCD', 'C'))), ('(?i)([abc])*bcd', 'ABCD', '0,1', repr(('ABCD', 'A'))), ('(?i)a|b|c|d|e', 'E', '0', repr('E')), ('(?i)(a|b|c|d|e)f', 'EF', '0,1', repr(('EF', 'E'))), ('(?i)abcd*efg', 'ABCDEFG', '0', repr('ABCDEFG')), ('(?i)ab*', 'XABYABBBZ', '0', repr('AB')), ('(?i)ab*', 'XAYABBBZ', '0', repr('A')), ('(?i)(ab|cd)e', 'ABCDE', '0,1', repr(('CDE', 'CD'))), ('(?i)[abhgefdc]ij', 'HIJ', '0', repr('HIJ')), ('(?i)^(ab|cd)e', 'ABCDE', '', 
repr(None)), ('(?i)(abc|)ef', 'ABCDEF', '0,1', repr(('EF', ''))), ('(?i)(a|b)c*d', 'ABCD', '0,1', repr(('BCD', 'B'))), ('(?i)(ab|ab*)bc', 'ABC', '0,1', repr(('ABC', 'A'))), ('(?i)a([bc]*)c*', 'ABC', '0,1', repr(('ABC', 'BC'))), ('(?i)a([bc]*)(c*d)', 'ABCD', '0,1,2', repr(('ABCD', 'BC', 'D'))), ('(?i)a([bc]+)(c*d)', 'ABCD', '0,1,2', repr(('ABCD', 'BC', 'D'))), ('(?i)a([bc]*)(c+d)', 'ABCD', '0,1,2', repr(('ABCD', 'B', 'CD'))), ('(?i)a[bcd]*dcdcde', 'ADCDCDE', '0', repr('ADCDCDE')), ('(?i)a[bcd]+dcdcde', 'ADCDCDE', '', repr(None)), ('(?i)(ab|a)b*c', 'ABC', '0,1', repr(('ABC', 'AB'))), ('(?i)((a)(b)c)(d)', 'ABCD', '1,2,3,4', repr(('ABC', 'A', 'B', 'D'))), ('(?i)[a-zA-Z_][a-zA-Z0-9_]*', 'ALPHA', '0', repr('ALPHA')), ('(?i)^a(bc+|b[eh])g|.h$', 'ABH', '0,1', repr(('BH', None))), ('(?i)(bc+d$|ef*g.|h?i(j|k))', 'EFFGZ', '0,1,2', repr(('EFFGZ', 'EFFGZ', None))), ('(?i)(bc+d$|ef*g.|h?i(j|k))', 'IJ', '0,1,2', repr(('IJ', 'IJ', 'J'))), ('(?i)(bc+d$|ef*g.|h?i(j|k))', 'EFFG', '', repr(None)), ('(?i)(bc+d$|ef*g.|h?i(j|k))', 'BCDD', '', repr(None)), ('(?i)(bc+d$|ef*g.|h?i(j|k))', 'REFFGZ', '0,1,2', repr(('EFFGZ', 'EFFGZ', None))), ('(?i)((((((((((a))))))))))', 'A', '10', repr('A')), ('(?i)((((((((((a))))))))))\\10', 'AA', '0', repr('AA')), #('(?i)((((((((((a))))))))))\\41', 'AA', '', repr(None)), #('(?i)((((((((((a))))))))))\\41', 'A!', '0', repr('A!')), ('(?i)(((((((((a)))))))))', 'A', '0', repr('A')), ('(?i)(?:(?:(?:(?:(?:(?:(?:(?:(?:(a))))))))))', 'A', '1', repr('A')), ('(?i)(?:(?:(?:(?:(?:(?:(?:(?:(?:(a|b|c))))))))))', 'C', '1', repr('C')), ('(?i)multiple words of text', 'UH-UH', '', repr(None)), ('(?i)multiple words', 'MULTIPLE WORDS, YEAH', '0', repr('MULTIPLE WORDS')), ('(?i)(.*)c(.*)', 'ABCDE', '0,1,2', repr(('ABCDE', 'AB', 'DE'))), ('(?i)\\((.*), (.*)\\)', '(A, B)', '2,1', repr(('B', 'A'))), ('(?i)[k]', 'AB', '', repr(None)), # ('(?i)abcd', 'ABCD', SUCCEED, 'found+"-"+\\found+"-"+\\\\found', repr(ABCD-$&-\\ABCD)), # ('(?i)a(bc)d', 'ABCD', SUCCEED, 'g1+"-"+\\g1+"-"+\\\\g1', 
repr(BC-$1-\\BC)), ('(?i)a[-]?c', 'AC', '0', repr('AC')), ('(?i)(abc)\\1', 'ABCABC', '1', repr('ABC')), ('(?i)([a-c]*)\\1', 'ABCABC', '1', repr('ABC')), ('a(?!b).', 'abad', '0', repr('ad')), ('a(?=d).', 'abad', '0', repr('ad')), ('a(?=c|d).', 'abad', '0', repr('ad')), ('a(?:b|c|d)(.)', 'ace', '1', repr('e')), ('a(?:b|c|d)*(.)', 'ace', '1', repr('e')), ('a(?:b|c|d)+?(.)', 'ace', '1', repr('e')), ('a(?:b|(c|e){1,2}?|d)+?(.)', 'ace', '1,2', repr(('c', 'e'))), # Lookbehind: split by : but not if it is escaped by -. ('(?]*?b', 'a>b', '', repr(None)), # Bug 490573: minimizing repeat problem. (r'^a*?$', 'foo', '', repr(None)), # Bug 470582: nested groups problem. (r'^((a)c)?(ab)$', 'ab', '1,2,3', repr((None, None, 'ab'))), # Another minimizing repeat problem (capturing groups in assertions). ('^([ab]*?)(?=(b)?)c', 'abc', '1,2', repr(('ab', None))), ('^([ab]*?)(?!(b))c', 'abc', '1,2', repr(('ab', None))), ('^([ab]*?)(?(.){0,2})d", "abcd").captures(1), ['b', 'c']) self.assertEqual(regex.search(r"(.)+", "a").captures(1), ['a']) def test_guards(self): m = regex.search(r"(X.*?Y\s*){3}(X\s*)+AB:", "XY\nX Y\nX Y\nXY\nXX AB:") self.assertEqual(m.span(0, 1, 2), ((3, 21), (12, 15), (16, 18))) m = regex.search(r"(X.*?Y\s*){3,}(X\s*)+AB:", "XY\nX Y\nX Y\nXY\nXX AB:") self.assertEqual(m.span(0, 1, 2), ((0, 21), (12, 15), (16, 18))) m = regex.search(r'\d{4}(\s*\w)?\W*((?!\d)\w){2}', "9999XX") self.assertEqual(m.span(0, 1, 2), ((0, 6), (-1, -1), (5, 6))) m = regex.search(r'A\s*?.*?(\n+.*?\s*?){0,2}\(X', 'A\n1\nS\n1 (X') self.assertEqual(m.span(0, 1), ((0, 10), (5, 8))) m = regex.search('Derde\s*:', 'aaaaaa:\nDerde:') self.assertEqual(m.span(), (8, 14)) m = regex.search('Derde\s*:', 'aaaaa:\nDerde:') self.assertEqual(m.span(), (7, 13)) def test_turkic(self): # Turkish has dotted and dotless I/i. 
        pairs = u"I=i;I=\u0131;i=\u0130"

        all_chars = set()
        matching = set()
        for pair in pairs.split(";"):
            ch1, ch2 = pair.split("=")
            all_chars.update((ch1, ch2))
            matching.add((ch1, ch1))
            matching.add((ch1, ch2))
            matching.add((ch2, ch1))
            matching.add((ch2, ch2))

        for ch1 in all_chars:
            for ch2 in all_chars:
                m = regex.match(ur"(?iu)\A" + ch1 + ur"\Z", ch2)
                if m:
                    if (ch1, ch2) not in matching:
                        self.fail("%s matching %s" % (repr(ch1), repr(ch2)))
                else:
                    if (ch1, ch2) in matching:
                        self.fail("%s not matching %s" % (repr(ch1),
                          repr(ch2)))

    def test_named_lists(self):
        options = [u"one", u"two", u"three"]
        self.assertEqual(regex.match(ur"333\L<bar>444", u"333one444",
          bar=options).group(), u"333one444")
        self.assertEqual(regex.match(ur"(?i)333\L<bar>444", u"333TWO444",
          bar=options).group(), u"333TWO444")
        self.assertEqual(regex.match(ur"333\L<bar>444", u"333four444",
          bar=options), None)

        options = ["one", "two", "three"]
        self.assertEqual(regex.match(r"333\L<bar>444", "333one444",
          bar=options).group(), "333one444")
        self.assertEqual(regex.match(r"(?i)333\L<bar>444", "333TWO444",
          bar=options).group(), "333TWO444")
        self.assertEqual(regex.match(r"333\L<bar>444", "333four444",
          bar=options), None)

        self.assertEqual(repr(type(regex.compile(r"3\L<bar>4\L<bar>+5",
          bar=["one", "two", "three"]))), self.PATTERN_CLASS)

        self.assertEqual(regex.findall(r"^\L<options>", "solid QWERT",
          options=set(['good', 'brilliant', '+s\\ol[i}d'])), [])
        self.assertEqual(regex.findall(r"^\L<options>", "+solid QWERT",
          options=set(['good', 'brilliant', '+solid'])), ['+solid'])

        options = [u"STRASSE"]
        self.assertEqual(regex.match(ur"(?fiu)\L<words>",
          u"stra\N{LATIN SMALL LETTER SHARP S}e", words=options).span(),
          (0, 6))

        options = [u"STRASSE", u"stress"]
        self.assertEqual(regex.match(ur"(?fiu)\L<words>",
          u"stra\N{LATIN SMALL LETTER SHARP S}e", words=options).span(),
          (0, 6))

        options = [u"stra\N{LATIN SMALL LETTER SHARP S}e"]
        self.assertEqual(regex.match(ur"(?fiu)\L<words>", u"STRASSE",
          words=options).span(), (0, 7))

        options = ["kit"]
        self.assertEqual(regex.search(ur"(?iu)\L<words>", u"SKITS",
          words=options).span(),
          (1, 4))
        self.assertEqual(regex.search(ur"(?iu)\L<words>",
          u"SK\N{LATIN CAPITAL LETTER I WITH DOT ABOVE}TS",
          words=options).span(), (1, 4))

        self.assertEqual(regex.search(ur"(?fiu)\b(\w+) +\1\b",
          u" stra\N{LATIN SMALL LETTER SHARP S}e STRASSE ").span(), (1, 15))
        self.assertEqual(regex.search(ur"(?fiu)\b(\w+) +\1\b",
          u" STRASSE stra\N{LATIN SMALL LETTER SHARP S}e ").span(), (1, 15))

        self.assertEqual(regex.search(r"^\L<options>$", "",
          options=[]).span(), (0, 0))

    def test_fuzzy(self):
        # Some tests borrowed from TRE library tests.
        self.assertEqual(repr(type(regex.compile('(fou){s,e<=1}'))),
          self.PATTERN_CLASS)
        self.assertEqual(repr(type(regex.compile('(fuu){s}'))),
          self.PATTERN_CLASS)
        self.assertEqual(repr(type(regex.compile('(fuu){s,e}'))),
          self.PATTERN_CLASS)
        self.assertEqual(repr(type(regex.compile('(anaconda){1i+1d<1,s<=1}'))),
          self.PATTERN_CLASS)
        self.assertEqual(repr(type(regex.compile('(anaconda){1i+1d<1,s<=1,e<=10}'))),
          self.PATTERN_CLASS)
        self.assertEqual(repr(type(regex.compile('(anaconda){s<=1,e<=1,1i+1d<1}'))),
          self.PATTERN_CLASS)

        text = 'molasses anaconda foo bar baz smith anderson '
        self.assertEqual(regex.search('(znacnda){s<=1,e<=3,1i+1d<1}', text),
          None)
        self.assertEqual(regex.search('(znacnda){s<=1,e<=3,1i+1d<2}',
          text).span(0, 1), ((9, 17), (9, 17)))
        self.assertEqual(regex.search('(ananda){1i+1d<2}', text), None)
        self.assertEqual(regex.search(r"(?:\bznacnda){e<=2}", text)[0],
          "anaconda")
        self.assertEqual(regex.search(r"(?:\bnacnda){e<=2}", text)[0],
          "anaconda")

        text = 'anaconda foo bar baz smith anderson'
        self.assertEqual(regex.search('(fuu){i<=3,d<=3,e<=5}', text).span(0,
          1), ((0, 0), (0, 0)))
        self.assertEqual(regex.search('(?b)(fuu){i<=3,d<=3,e<=5}',
          text).span(0, 1), ((9, 10), (9, 10)))
        self.assertEqual(regex.search('(fuu){i<=2,d<=2,e<=5}', text).span(0,
          1), ((7, 10), (7, 10)))
        self.assertEqual(regex.search('(?e)(fuu){i<=2,d<=2,e<=5}',
          text).span(0, 1), ((9, 10), (9, 10)))
        self.assertEqual(regex.search('(fuu){i<=3,d<=3,e}', text).span(0, 1),
          ((0, 0), (0, 0)))
        self.assertEqual(regex.search('(?b)(fuu){i<=3,d<=3,e}', text).span(0,
          1), ((9, 10), (9, 10)))

        self.assertEqual(repr(type(regex.compile('(approximate){s<=3,1i+1d<3}'))),
          self.PATTERN_CLASS)

        # No cost limit.
        self.assertEqual(regex.search('(foobar){e}',
          'xirefoabralfobarxie').span(0, 1), ((0, 6), (0, 6)))
        self.assertEqual(regex.search('(?e)(foobar){e}',
          'xirefoabralfobarxie').span(0, 1), ((0, 3), (0, 3)))
        self.assertEqual(regex.search('(?b)(foobar){e}',
          'xirefoabralfobarxie').span(0, 1), ((11, 16), (11, 16)))

        # At most two errors.
        self.assertEqual(regex.search('(foobar){e<=2}',
          'xirefoabrzlfd').span(0, 1), ((4, 9), (4, 9)))
        self.assertEqual(regex.search('(foobar){e<=2}', 'xirefoabzlfd'), None)

        # At most two inserts or substitutions and max two errors total.
        self.assertEqual(regex.search('(foobar){i<=2,s<=2,e<=2}',
          'oobargoobaploowap').span(0, 1), ((5, 11), (5, 11)))

        # Find best whole word match for "foobar".
        self.assertEqual(regex.search('\\b(foobar){e}\\b', 'zfoobarz').span(0,
          1), ((0, 8), (0, 8)))
        self.assertEqual(regex.search('\\b(foobar){e}\\b',
          'boing zfoobarz goobar woop').span(0, 1), ((0, 6), (0, 6)))
        self.assertEqual(regex.search('(?b)\\b(foobar){e}\\b',
          'boing zfoobarz goobar woop').span(0, 1), ((15, 21), (15, 21)))

        # Match whole string, allow only 1 error.
        self.assertEqual(regex.search('^(foobar){e<=1}$', 'foobar').span(0, 1),
          ((0, 6), (0, 6)))
        self.assertEqual(regex.search('^(foobar){e<=1}$', 'xfoobar').span(0,
          1), ((0, 7), (0, 7)))
        self.assertEqual(regex.search('^(foobar){e<=1}$', 'foobarx').span(0,
          1), ((0, 7), (0, 7)))
        self.assertEqual(regex.search('^(foobar){e<=1}$', 'fooxbar').span(0,
          1), ((0, 7), (0, 7)))
        self.assertEqual(regex.search('^(foobar){e<=1}$', 'foxbar').span(0, 1),
          ((0, 6), (0, 6)))
        self.assertEqual(regex.search('^(foobar){e<=1}$', 'xoobar').span(0, 1),
          ((0, 6), (0, 6)))
        self.assertEqual(regex.search('^(foobar){e<=1}$', 'foobax').span(0, 1),
          ((0, 6), (0, 6)))
        self.assertEqual(regex.search('^(foobar){e<=1}$', 'oobar').span(0, 1),
          ((0, 5), (0, 5)))
        self.assertEqual(regex.search('^(foobar){e<=1}$', 'fobar').span(0, 1),
          ((0, 5), (0, 5)))
        self.assertEqual(regex.search('^(foobar){e<=1}$', 'fooba').span(0, 1),
          ((0, 5), (0, 5)))
        self.assertEqual(regex.search('^(foobar){e<=1}$', 'xfoobarx'), None)
        self.assertEqual(regex.search('^(foobar){e<=1}$', 'foobarxx'), None)
        self.assertEqual(regex.search('^(foobar){e<=1}$', 'xxfoobar'), None)
        self.assertEqual(regex.search('^(foobar){e<=1}$', 'xfoxbar'), None)
        self.assertEqual(regex.search('^(foobar){e<=1}$', 'foxbarx'), None)

        # At most one insert, two deletes, and three substitutions.
        # Additionally, deletes cost two and substitutes one, and total
        # cost must be less than 4.
        self.assertEqual(regex.search('(foobar){i<=1,d<=2,s<=3,2d+1s<4}',
          '3oifaowefbaoraofuiebofasebfaobfaorfeoaro').span(0, 1), ((6, 13), (6,
          13)))
        self.assertEqual(regex.search('(?b)(foobar){i<=1,d<=2,s<=3,2d+1s<4}',
          '3oifaowefbaoraofuiebofasebfaobfaorfeoaro').span(0, 1), ((34, 39),
          (34, 39)))

        # Partially fuzzy matches.
        self.assertEqual(regex.search('foo(bar){e<=1}zap', 'foobarzap').span(0,
          1), ((0, 9), (3, 6)))
        self.assertEqual(regex.search('foo(bar){e<=1}zap', 'fobarzap'), None)
        self.assertEqual(regex.search('foo(bar){e<=1}zap', 'foobrzap').span(0,
          1), ((0, 8), (3, 5)))

        text = ('www.cnn.com 64.236.16.20\nwww.slashdot.org 66.35.250.150\n'
          'For useful information, use www.slashdot.org\nthis is demo data!\n')
        self.assertEqual(regex.search(r'(?s)^.*(dot.org){e}.*$', text).span(0,
          1), ((0, 120), (120, 120)))
        self.assertEqual(regex.search(r'(?es)^.*(dot.org){e}.*$', text).span(0,
          1), ((0, 120), (93, 100)))
        self.assertEqual(regex.search(r'^.*(dot.org){e}.*$', text).span(0, 1),
          ((0, 119), (24, 101)))

        # Behaviour is unexpected, but arguably not wrong. It first finds the
        # best match, then the best in what follows, etc.
        self.assertEqual(regex.findall(r"\b\L<words>{e<=1}\b",
          " book cot dog desk ", words="cat dog".split()), ["cot", "dog"])
        self.assertEqual(regex.findall(r"\b\L<words>{e<=1}\b",
          " book dog cot desk ", words="cat dog".split()), [" dog", "cot"])
        self.assertEqual(regex.findall(r"(?e)\b\L<words>{e<=1}\b",
          " book dog cot desk ", words="cat dog".split()), ["dog", "cot"])
        self.assertEqual(regex.findall(r"(?r)\b\L<words>{e<=1}\b",
          " book cot dog desk ", words="cat dog".split()), ["dog ", "cot"])
        self.assertEqual(regex.findall(r"(?er)\b\L<words>{e<=1}\b",
          " book cot dog desk ", words="cat dog".split()), ["dog", "cot"])
        self.assertEqual(regex.findall(r"(?r)\b\L<words>{e<=1}\b",
          " book dog cot desk ", words="cat dog".split()), ["cot", "dog"])
        self.assertEqual(regex.findall(ur"\b\L<words>{e<=1}\b",
          u" book cot dog desk ", words=u"cat dog".split()), [u"cot", u"dog"])
        self.assertEqual(regex.findall(ur"\b\L<words>{e<=1}\b",
          u" book dog cot desk ", words=u"cat dog".split()), [u" dog", u"cot"])
        self.assertEqual(regex.findall(ur"(?e)\b\L<words>{e<=1}\b",
          u" book dog cot desk ", words=u"cat dog".split()), [u"dog", u"cot"])
        self.assertEqual(regex.findall(ur"(?r)\b\L<words>{e<=1}\b",
          u" book cot dog desk ", words=u"cat dog".split()), [u"dog ", u"cot"])
        self.assertEqual(regex.findall(ur"(?er)\b\L<words>{e<=1}\b",
          u" book cot dog desk ", words=u"cat dog".split()), [u"dog", u"cot"])
        self.assertEqual(regex.findall(ur"(?r)\b\L<words>{e<=1}\b",
          u" book dog cot desk ", words=u"cat dog".split()), [u"cot", u"dog"])

        self.assertEqual(regex.search(r"(\w+) (\1{e<=1})", "foo fou").groups(),
          ("foo", "fou"))
        self.assertEqual(regex.search(r"(?r)(\2{e<=1}) (\w+)",
          "foo fou").groups(), ("foo", "fou"))
        self.assertEqual(regex.search(ur"(\w+) (\1{e<=1})",
          u"foo fou").groups(), (u"foo", u"fou"))

        self.assertEqual(regex.findall(r"(?:(?:QR)+){e}","abcde"), ["abcde",
          ""])
        self.assertEqual(regex.findall(r"(?:Q+){e}","abc"), ["abc", ""])

        # Hg issue 41.
        self.assertEqual(regex.match(r"(?:service detection){0[^()]+)|(?R))*\)", "(ab(cd)ef)")[ : ], ("(ab(cd)ef)", "ef"))
        self.assertEqual(regex.search(r"\(((?>[^()]+)|(?R))*\)",
          "(ab(cd)ef)").captures(1), ["ab", "cd", "(cd)", "ef"])

        self.assertEqual(regex.search(r"(?r)\(((?R)|(?>[^()]+))*\)",
          "(ab(cd)ef)")[ : ], ("(ab(cd)ef)", "ab"))
        self.assertEqual(regex.search(r"(?r)\(((?R)|(?>[^()]+))*\)",
          "(ab(cd)ef)").captures(1), ["ef", "cd", "(cd)", "ab"])

        self.assertEqual(regex.search(r"\(([^()]+|(?R))*\)",
          "some text (a(b(c)d)e) more text")[ : ], ("(a(b(c)d)e)", "e"))

        self.assertEqual(regex.search(r"(?r)\(((?R)|[^()]+)*\)",
          "some text (a(b(c)d)e) more text")[ : ], ("(a(b(c)d)e)", "a"))

        self.assertEqual(regex.search(r"(foo(\(((?:(?>[^()]+)|(?2))*)\)))",
          "foo(bar(baz)+baz(bop))")[ : ], ("foo(bar(baz)+baz(bop))",
          "foo(bar(baz)+baz(bop))", "(bar(baz)+baz(bop))",
          "bar(baz)+baz(bop)"))

        self.assertEqual(regex.search(r"(?r)(foo(\(((?:(?2)|(?>[^()]+))*)\)))",
          "foo(bar(baz)+baz(bop))")[ : ], ("foo(bar(baz)+baz(bop))",
          "foo(bar(baz)+baz(bop))", "(bar(baz)+baz(bop))",
          "bar(baz)+baz(bop)"))

        rgx = regex.compile(r"""^\s*(<\s*([a-zA-Z:]+)(?:\s*[a-zA-Z:]*\s*=\s*(?:'[^']*'|"[^"]*"))*\s*(/\s*)?>(?:[^<>]*|(?1))*(?(3)|<\s*/\s*\2\s*>))\s*$""")
        self.assertEqual(bool(rgx.search('')), True)
        self.assertEqual(bool(rgx.search('')), False)
        self.assertEqual(bool(rgx.search('')), True)
        self.assertEqual(bool(rgx.search('')), False)
        self.assertEqual(bool(rgx.search('')), False)
        self.assertEqual(bool(rgx.search('')), False)
        self.assertEqual(bool(rgx.search('')), True)
        self.assertEqual(bool(rgx.search('< fooo / >')), True)
        # The next regex should and does match. Perl 5.14 agrees.
        #self.assertEqual(bool(rgx.search('foo')), False)
        self.assertEqual(bool(rgx.search('foo')), False)
        self.assertEqual(bool(rgx.search('foo')), True)
        self.assertEqual(bool(rgx.search('foo')), True)
        self.assertEqual(bool(rgx.search('')), True)

    def test_copy(self):
        # PatternObjects are immutable, therefore there's no need to clone them.
        r = regex.compile("a")
        self.assert_(copy.copy(r) is r)
        self.assert_(copy.deepcopy(r) is r)

        # MatchObjects are normally mutable because the target string can be
        # detached. However, after the target string has been detached, a
        # MatchObject becomes immutable, so there's no need to clone it.
        m = r.match("a")
        self.assert_(copy.copy(m) is not m)
        self.assert_(copy.deepcopy(m) is not m)

        self.assert_(m.string is not None)
        m2 = copy.copy(m)
        m2.detach_string()
        self.assert_(m.string is not None)
        self.assert_(m2.string is None)

        # The following behaviour matches that of the re module.
        it = regex.finditer(".", "ab")
        it2 = copy.copy(it)
        self.assertEqual(it.next().group(), "a")
        self.assertEqual(it2.next().group(), "b")

        # The following behaviour matches that of the re module.
        it = regex.finditer(".", "ab")
        it2 = copy.deepcopy(it)
        self.assertEqual(it.next().group(), "a")
        self.assertEqual(it2.next().group(), "b")

        # The following behaviour is designed to match that of copying 'finditer'.
        it = regex.splititer(" ", "a b")
        it2 = copy.copy(it)
        self.assertEqual(it.next(), "a")
        self.assertEqual(it2.next(), "b")

        # The following behaviour is designed to match that of copying 'finditer'.
        it = regex.splititer(" ", "a b")
        it2 = copy.deepcopy(it)
        self.assertEqual(it.next(), "a")
        self.assertEqual(it2.next(), "b")

    def test_format(self):
        self.assertEqual(regex.subf(r"(\w+) (\w+)", "{0} => {2} {1}",
          "foo bar"), "foo bar => bar foo")
        self.assertEqual(regex.subf(r"(?<word1>\w+) (?<word2>\w+)",
          "{word2} {word1}", "foo bar"), "bar foo")

        self.assertEqual(regex.subfn(r"(\w+) (\w+)", "{0} => {2} {1}",
          "foo bar"), ("foo bar => bar foo", 1))
        self.assertEqual(regex.subfn(r"(?<word1>\w+) (?<word2>\w+)",
          "{word2} {word1}", "foo bar"), ("bar foo", 1))

        self.assertEqual(regex.match(r"(\w+) (\w+)",
          "foo bar").expandf("{0} => {2} {1}"), "foo bar => bar foo")

    def test_fullmatch(self):
        self.assertEqual(bool(regex.fullmatch(r"abc", "abc")), True)
        self.assertEqual(bool(regex.fullmatch(r"abc", "abcx")), False)
        self.assertEqual(bool(regex.fullmatch(r"abc", "abcx", endpos=3)), True)

        self.assertEqual(bool(regex.fullmatch(r"abc", "xabc", pos=1)), True)
        self.assertEqual(bool(regex.fullmatch(r"abc", "xabcy", pos=1)), False)
        self.assertEqual(bool(regex.fullmatch(r"abc", "xabcy", pos=1,
          endpos=4)), True)

        self.assertEqual(bool(regex.fullmatch(r"(?r)abc", "abc")), True)
        self.assertEqual(bool(regex.fullmatch(r"(?r)abc", "abcx")), False)
        self.assertEqual(bool(regex.fullmatch(r"(?r)abc", "abcx", endpos=3)),
          True)

        self.assertEqual(bool(regex.fullmatch(r"(?r)abc", "xabc", pos=1)),
          True)
        self.assertEqual(bool(regex.fullmatch(r"(?r)abc", "xabcy", pos=1)),
          False)
        self.assertEqual(bool(regex.fullmatch(r"(?r)abc", "xabcy", pos=1,
          endpos=4)), True)

    def test_hg_bugs(self):
        # Hg issue 28.
        self.assertEqual(bool(regex.compile("(?>b)", flags=regex.V1)), True)

        # Hg issue 29.
        self.assertEqual(bool(regex.compile(r"^((?>\w+)|(?>\s+))*$",
          flags=regex.V1)), True)

        # Hg issue 31.
        self.assertEqual(regex.findall(r"\((?:(?>[^()]+)|(?R))*\)",
          "a(bcd(e)f)g(h)"), ['(bcd(e)f)', '(h)'])
        self.assertEqual(regex.findall(r"\((?:(?:[^()]+)|(?R))*\)",
          "a(bcd(e)f)g(h)"), ['(bcd(e)f)', '(h)'])
        self.assertEqual(regex.findall(r"\((?:(?>[^()]+)|(?R))*\)",
          "a(b(cd)e)f)g)h"), ['(b(cd)e)'])
        self.assertEqual(regex.findall(r"\((?:(?>[^()]+)|(?R))*\)",
          "a(bc(d(e)f)gh"), ['(d(e)f)'])
        self.assertEqual(regex.findall(r"(?r)\((?:(?>[^()]+)|(?R))*\)",
          "a(bc(d(e)f)gh"), ['(d(e)f)'])
        self.assertEqual([m.group() for m in
          regex.finditer(r"\((?:[^()]*+|(?0))*\)", "a(b(c(de)fg)h")],
          ['(c(de)fg)'])

        # Hg issue 32.
        self.assertEqual(regex.search("a(bc)d", "abcd", regex.I |
          regex.V1).group(0), "abcd")

        # Hg issue 33.
        self.assertEqual(regex.search("([\da-f:]+)$", "E", regex.I |
          regex.V1).group(0), "E")
        self.assertEqual(regex.search("([\da-f:]+)$", "e", regex.I |
          regex.V1).group(0), "e")

        # Hg issue 34.
        self.assertEqual(regex.search("^(?=ab(de))(abd)(e)", "abde").groups(),
          ('de', 'abd', 'e'))

        # Hg issue 35.
        self.assertEqual(bool(regex.match(r"\ ", " ", flags=regex.X)), True)

        # Hg issue 36.
        self.assertEqual(regex.search(r"^(a|)\1{2}b", "b").group(0, 1), ('b',
          ''))

        # Hg issue 37.
        self.assertEqual(regex.search("^(a){0,0}", "abc").group(0, 1), ('',
          None))

        # Hg issue 38.
        self.assertEqual(regex.search("(?>.*/)b", "a/b").group(0), "a/b")

        # Hg issue 39.
        self.assertEqual(regex.search(r"(?V0)((?i)blah)\s+\1",
          "blah BLAH").group(0, 1), ("blah BLAH", "blah"))
        self.assertEqual(regex.search(r"(?V1)((?i)blah)\s+\1", "blah BLAH"),
          None)

        # Hg issue 40.
        self.assertEqual(regex.search(r"(\()?[^()]+(?(1)\)|)",
          "(abcd").group(0), "abcd")

        # Hg issue 42.
        self.assertEqual(regex.search("(a*)*", "a").span(1), (1, 1))
        self.assertEqual(regex.search("(a*)*", "aa").span(1), (2, 2))
        self.assertEqual(regex.search("(a*)*", "aaa").span(1), (3, 3))

        # Hg issue 43.
        self.assertEqual(regex.search("a(?#xxx)*", "aaa").group(), "aaa")

        # Hg issue 44.
        self.assertEqual(regex.search("(?=abc){3}abc", "abcabcabc").span(), (0,
          3))

        # Hg issue 45.
        self.assertEqual(regex.search("^(?:a(?:(?:))+)+", "a").span(), (0, 1))
        self.assertEqual(regex.search("^(?:a(?:(?:))+)+", "aa").span(), (0, 2))

        # Hg issue 46.
        self.assertEqual(regex.search("a(?x: b c )d", "abcd").group(0), "abcd")

        # Hg issue 47.
        self.assertEqual(regex.search("a#comment\n*", "aaa",
          flags=regex.X).group(0), "aaa")

        # Hg issue 48.
        self.assertEqual(regex.search(r"(?V1)(a(?(1)\1)){1}",
          "aaaaaaaaaa").span(0, 1), ((0, 1), (0, 1)))
        self.assertEqual(regex.search(r"(?V1)(a(?(1)\1)){2}",
          "aaaaaaaaaa").span(0, 1), ((0, 3), (1, 3)))
        self.assertEqual(regex.search(r"(?V1)(a(?(1)\1)){3}",
          "aaaaaaaaaa").span(0, 1), ((0, 6), (3, 6)))
        self.assertEqual(regex.search(r"(?V1)(a(?(1)\1)){4}",
          "aaaaaaaaaa").span(0, 1), ((0, 10), (6, 10)))

        # Hg issue 49.
        self.assertEqual(regex.search("(?V1)(a)(?<=b(?1))", "baz").group(0),
          "a")

        # Hg issue 50.
        self.assertEqual(regex.findall(ur'(?fi)\L<keywords>',
          u'POST, Post, post, po\u017Ft, po\uFB06, and po\uFB05',
          keywords=['post','pos']), [u'POST', u'Post', u'post', u'po\u017Ft',
          u'po\uFB06', u'po\uFB05'])
        self.assertEqual(regex.findall(ur'(?fi)pos|post',
          u'POST, Post, post, po\u017Ft, po\uFB06, and po\uFB05'), [u'POS',
          u'Pos', u'pos', u'po\u017F', u'po\uFB06', u'po\uFB05'])
        self.assertEqual(regex.findall(ur'(?fi)post|pos',
          u'POST, Post, post, po\u017Ft, po\uFB06, and po\uFB05'), [u'POST',
          u'Post', u'post', u'po\u017Ft', u'po\uFB06', u'po\uFB05'])
        self.assertEqual(regex.findall(ur'(?fi)post|another',
          u'POST, Post, post, po\u017Ft, po\uFB06, and po\uFB05'), [u'POST',
          u'Post', u'post', u'po\u017Ft', u'po\uFB06', u'po\uFB05'])

        # Hg issue 51.
        self.assertEqual(regex.search("(?V1)((a)(?1)|(?2))", "a").group(0, 1,
          2), ('a', 'a', None))

        # Hg issue 52.
        self.assertEqual(regex.search(r"(?V1)(\1xx|){6}", "xx").span(0, 1),
          ((0, 2), (2, 2)))

        # Hg issue 53.
        self.assertEqual(regex.search("(a|)+", "a").group(0, 1), ("a", ""))

        # Hg issue 54.
        self.assertEqual(regex.search(r"(a|)*\d", "a" * 80), None)

        # Hg issue 55.
        self.assertEqual(regex.search("^(?:a?b?)*$", "ac"), None)

        # Hg issue 58.
        self.assertRaisesRegex(regex.error, self.UNDEF_CHAR_NAME, lambda:
          regex.compile("\\N{1}"))

        # Hg issue 59.
        self.assertEqual(regex.search("\\Z", "a\na\n").span(0), (4, 4))

        # Hg issue 60.
        self.assertEqual(regex.search("(q1|.)*(q2|.)*(x(a|bc)*y){2,}",
          "xayxay").group(0), "xayxay")

        # Hg issue 61.
        self.assertEqual(regex.search("(?i)[^a]", "A"), None)

        # Hg issue 63.
        self.assertEqual(regex.search(u"(?iu)[[:ascii:]]", u"\N{KELVIN SIGN}"),
          None)

        # Hg issue 66.
        self.assertEqual(regex.search("((a|b(?1)c){3,5})", "baaaaca").group(0,
          1, 2), ('aaaa', 'aaaa', 'a'))

        # Hg issue 71.
        self.assertEqual(regex.findall(r"(?<=:\S+ )\w+", ":9 abc :10 def"),
          ['abc', 'def'])
        self.assertEqual(regex.findall(r"(?<=:\S* )\w+", ":9 abc :10 def"),
          ['abc', 'def'])
        self.assertEqual(regex.findall(r"(?<=:\S+? )\w+", ":9 abc :10 def"),
          ['abc', 'def'])
        self.assertEqual(regex.findall(r"(?<=:\S*? )\w+", ":9 abc :10 def"),
          ['abc', 'def'])

        # Hg issue 73.
        self.assertEqual(regex.search(r"(?:fe)?male", "female").group(),
          "female")
        self.assertEqual([m.group() for m in
          regex.finditer(r"(fe)?male: h(?(1)(er)|(is)) (\w+)",
          "female: her dog; male: his cat. asdsasda")], ['female: her dog',
          'male: his cat'])

        # Hg issue 78.
        self.assertEqual(regex.search(r'(?<rec>\((?:[^()]++|(?&rec))*\))',
          'aaa(((1+0)+1)+1)bbb').captures('rec'), ['(1+0)', '((1+0)+1)',
          '(((1+0)+1)+1)'])

        # Hg issue 80.
        self.assertRaisesRegex(regex.error, self.BAD_ESCAPE, lambda:
          regex.sub('x', '\\', 'x'), )

        # Hg issue 82.
        fz = "(CAGCCTCCCATTTCAGAATATACATCC){1a(?b))', "ab").spans("x"), [(1, 2), (0, 2)])

        # Hg issue 91.
        # Check that the replacement cache works.
        self.assertEqual(regex.sub(r'(-)', lambda m: m.expand(r'x'), 'a-b-c'),
          'axbxc')

        # Hg issue 94.
        rx = regex.compile(r'\bt(est){i<2}', flags=regex.V1)
        self.assertEqual(rx.search("Some text"), None)
        self.assertEqual(rx.findall("Some text"), [])

        # Hg issue 95.
self.assertRaisesRegex(regex.error, self.MULTIPLE_REPEAT, lambda: regex.compile(r'.???')) # Hg issue 97. self.assertEqual(regex.escape(u'foo!?'), u'foo\\!\\?') self.assertEqual(regex.escape(u'foo!?', special_only=True), u'foo!\\?') self.assertEqual(regex.escape('foo!?'), 'foo\\!\\?') self.assertEqual(regex.escape('foo!?', special_only=True), 'foo!\\?') # Hg issue 100. self.assertEqual(regex.search('^([^z]*(?:WWWi|W))?$', 'WWWi').groups(), ('WWWi', )) self.assertEqual(regex.search('^([^z]*(?:WWWi|w))?$', 'WWWi').groups(), ('WWWi', )) self.assertEqual(regex.search('^([^z]*?(?:WWWi|W))?$', 'WWWi').groups(), ('WWWi', )) # Hg issue 101. pat = regex.compile(r'xxx', flags=regex.FULLCASE | regex.UNICODE) self.assertEqual([x.group() for x in pat.finditer('yxxx')], ['xxx']) self.assertEqual(pat.findall('yxxx'), ['xxx']) raw = 'yxxx' self.assertEqual([x.group() for x in pat.finditer(raw)], ['xxx']) self.assertEqual(pat.findall(raw), ['xxx']) pat = regex.compile(r'xxx', flags=regex.FULLCASE | regex.IGNORECASE | regex.UNICODE) self.assertEqual([x.group() for x in pat.finditer('yxxx')], ['xxx']) self.assertEqual(pat.findall('yxxx'), ['xxx']) raw = 'yxxx' self.assertEqual([x.group() for x in pat.finditer(raw)], ['xxx']) self.assertEqual(pat.findall(raw), ['xxx']) # Hg issue 106. self.assertEqual(regex.sub('(?V0).*', 'x', 'test'), 'x') self.assertEqual(regex.sub('(?V1).*', 'x', 'test'), 'xx') self.assertEqual(regex.sub('(?V0).*?', '|', 'test'), '|t|e|s|t|') self.assertEqual(regex.sub('(?V1).*?', '|', 'test'), '|||||||||') # Hg issue 112. self.assertEqual(regex.sub(r'^(@)\n(?!.*?@)(.*)', r'\1\n==========\n\2', '@\n', flags=regex.DOTALL), '@\n==========\n') # Hg issue 109. 
self.assertEqual(regex.match(r'(?:cats|cat){e<=1}', 'caz').fuzzy_counts, (1, 0, 0)) self.assertEqual(regex.match(r'(?e)(?:cats|cat){e<=1}', 'caz').fuzzy_counts, (1, 0, 0)) self.assertEqual(regex.match(r'(?b)(?:cats|cat){e<=1}', 'caz').fuzzy_counts, (1, 0, 0)) self.assertEqual(regex.match(r'(?:cat){e<=1}', 'caz').fuzzy_counts, (1, 0, 0)) self.assertEqual(regex.match(r'(?e)(?:cat){e<=1}', 'caz').fuzzy_counts, (1, 0, 0)) self.assertEqual(regex.match(r'(?b)(?:cat){e<=1}', 'caz').fuzzy_counts, (1, 0, 0)) self.assertEqual(regex.match(r'(?:cats){e<=2}', 'c ats').fuzzy_counts, (1, 1, 0)) self.assertEqual(regex.match(r'(?e)(?:cats){e<=2}', 'c ats').fuzzy_counts, (0, 1, 0)) self.assertEqual(regex.match(r'(?b)(?:cats){e<=2}', 'c ats').fuzzy_counts, (0, 1, 0)) self.assertEqual(regex.match(r'(?:cats){e<=2}', 'c a ts').fuzzy_counts, (0, 2, 0)) self.assertEqual(regex.match(r'(?e)(?:cats){e<=2}', 'c a ts').fuzzy_counts, (0, 2, 0)) self.assertEqual(regex.match(r'(?b)(?:cats){e<=2}', 'c a ts').fuzzy_counts, (0, 2, 0)) self.assertEqual(regex.match(r'(?:cats){e<=1}', 'c ats').fuzzy_counts, (0, 1, 0)) self.assertEqual(regex.match(r'(?e)(?:cats){e<=1}', 'c ats').fuzzy_counts, (0, 1, 0)) self.assertEqual(regex.match(r'(?b)(?:cats){e<=1}', 'c ats').fuzzy_counts, (0, 1, 0)) # Hg issue 115. self.assertEqual(regex.findall(r'\bof ([a-z]+) of \1\b', 'To make use of one of these modules'), []) # Hg issue 125. self.assertEqual(regex.sub(r'x', r'\g<0>', 'x'), 'x') # Unreported issue: no such builtin as 'ascii' in Python 2. self.assertEqual(bool(regex.match(r'a', 'a', regex.DEBUG)), True) # Hg issue 131. self.assertEqual(regex.findall(r'(?V1)[[b-e]--cd]', 'abcdef'), ['b', 'e']) self.assertEqual(regex.findall(r'(?V1)[b-e--cd]', 'abcdef'), ['b', 'e']) self.assertEqual(regex.findall(r'(?V1)[[bcde]--cd]', 'abcdef'), ['b', 'e']) self.assertEqual(regex.findall(r'(?V1)[bcde--cd]', 'abcdef'), ['b', 'e']) # Hg issue 132. 
self.assertRaisesRegex(regex.error, '^unknown property at position 4$', lambda: regex.compile(ur'\p{}')) # Issue 23692. self.assertEqual(regex.match('(?:()|(?(1)()|z)){2}(?(2)a|z)', 'a').group(0, 1, 2), ('a', '', '')) self.assertEqual(regex.match('(?:()|(?(1)()|z)){0,2}(?(2)a|z)', 'a').group(0, 1, 2), ('a', '', '')) # Hg issue 137: Posix character class :punct: does not seem to be # supported. # Posix compatibility as recommended here: # http://www.unicode.org/reports/tr18/#Compatibility_Properties # Posix in Unicode. chars = u''.join(unichr(c) for c in range(0x10000)) self.assertEqual(repr(u''.join(regex.findall(ur'''(?u)[[:alnum:]]+''', chars))), repr(u''.join(regex.findall(ur'''(?u)[\p{Alpha}\p{PosixDigit}]+''', chars)))) self.assertEqual(repr(u''.join(regex.findall(ur'''(?u)[[:alpha:]]+''', chars))), repr(u''.join(regex.findall(ur'''(?u)\p{Alpha}+''', chars)))) self.assertEqual(repr(u''.join(regex.findall(ur'''(?u)[[:ascii:]]+''', chars))), repr(u''.join(regex.findall(ur'''(?u)[\p{InBasicLatin}]+''', chars)))) self.assertEqual(repr(u''.join(regex.findall(ur'''(?u)[[:blank:]]+''', chars))), repr(u''.join(regex.findall(ur'''(?u)[\p{gc=Space_Separator}\t]+''', chars)))) self.assertEqual(repr(u''.join(regex.findall(ur'''(?u)[[:cntrl:]]+''', chars))), repr(u''.join(regex.findall(ur'''(?u)\p{gc=Control}+''', chars)))) self.assertEqual(repr(u''.join(regex.findall(ur'''(?u)[[:digit:]]+''', chars))), repr(u''.join(regex.findall(ur'''(?u)[0-9]+''', chars)))) self.assertEqual(repr(u''.join(regex.findall(ur'''(?u)[[:graph:]]+''', chars))), repr(u''.join(regex.findall(ur'''(?u)[^\p{Space}\p{gc=Control}\p{gc=Surrogate}\p{gc=Unassigned}]+''', chars)))) self.assertEqual(repr(u''.join(regex.findall(ur'''(?u)[[:lower:]]+''', chars))), repr(u''.join(regex.findall(ur'''(?u)\p{Lower}+''', chars)))) self.assertEqual(repr(u''.join(regex.findall(ur'''(?u)[[:print:]]+''', chars))), repr(u''.join(regex.findall(ur'''(?uV1)[\p{Graph}\p{Blank}--\p{Cntrl}]+''', chars)))) 
self.assertEqual(repr(u''.join(regex.findall(ur'''(?u)[[:punct:]]+''', chars))), repr(u''.join(regex.findall(ur'''(?uV1)[\p{gc=Punctuation}\p{gc=Symbol}--\p{Alpha}]+''', chars)))) self.assertEqual(repr(u''.join(regex.findall(ur'''(?u)[[:space:]]+''', chars))), repr(u''.join(regex.findall(ur'''(?u)\p{Whitespace}+''', chars)))) self.assertEqual(repr(u''.join(regex.findall(ur'''(?u)[[:upper:]]+''', chars))), repr(u''.join(regex.findall(ur'''(?u)\p{Upper}+''', chars)))) self.assertEqual(repr(u''.join(regex.findall(ur'''(?u)[[:word:]]+''', chars))), repr(u''.join(regex.findall(ur'''(?u)[\p{Alpha}\p{gc=Mark}\p{Digit}\p{gc=Connector_Punctuation}\p{Join_Control}]+''', chars)))) self.assertEqual(repr(u''.join(regex.findall(ur'''(?u)[[:xdigit:]]+''', chars))), repr(u''.join(regex.findall(ur'''(?u)[0-9A-Fa-f]+''', chars)))) # Posix in ASCII. chars = ''.join(chr(c) for c in range(0x100)) self.assertEqual(repr(''.join(regex.findall(r'''[[:alnum:]]+''', chars))), repr(''.join(regex.findall(r'''[\p{Alpha}\p{PosixDigit}]+''', chars)))) self.assertEqual(repr(''.join(regex.findall(r'''[[:alpha:]]+''', chars))), repr(''.join(regex.findall(r'''\p{Alpha}+''', chars)))) self.assertEqual(repr(''.join(regex.findall(r'''[[:ascii:]]+''', chars))), repr(''.join(regex.findall(r'''[\x00-\x7F]+''', chars)))) self.assertEqual(repr(''.join(regex.findall(r'''[[:blank:]]+''', chars))), repr(''.join(regex.findall(r'''[\p{gc=Space_Separator}\t]+''', chars)))) self.assertEqual(repr(''.join(regex.findall(r'''[[:cntrl:]]+''', chars))), repr(''.join(regex.findall(r'''\p{gc=Control}+''', chars)))) self.assertEqual(repr(''.join(regex.findall(r'''[[:digit:]]+''', chars))), repr(''.join(regex.findall(r'''[0-9]+''', chars)))) self.assertEqual(repr(''.join(regex.findall(r'''[[:graph:]]+''', chars))), repr(''.join(regex.findall(r'''[^\p{Space}\p{gc=Control}\p{gc=Surrogate}\p{gc=Unassigned}]+''', chars)))) self.assertEqual(repr(''.join(regex.findall(r'''[[:lower:]]+''', chars))), 
repr(''.join(regex.findall(r'''\p{Lower}+''', chars)))) self.assertEqual(repr(''.join(regex.findall(r'''[[:print:]]+''', chars))), repr(''.join(regex.findall(r'''(?V1)[\p{Graph}\p{Blank}--\p{Cntrl}]+''', chars)))) self.assertEqual(repr(''.join(regex.findall(r'''[[:punct:]]+''', chars))), repr(''.join(regex.findall(r'''(?V1)[\p{gc=Punctuation}\p{gc=Symbol}--\p{Alpha}]+''', chars)))) self.assertEqual(repr(''.join(regex.findall(r'''[[:space:]]+''', chars))), repr(''.join(regex.findall(r'''\p{Whitespace}+''', chars)))) self.assertEqual(repr(''.join(regex.findall(r'''[[:upper:]]+''', chars))), repr(''.join(regex.findall(r'''\p{Upper}+''', chars)))) self.assertEqual(repr(''.join(regex.findall(r'''[[:word:]]+''', chars))), repr(''.join(regex.findall(r'''[\p{Alpha}\p{gc=Mark}\p{Digit}\p{gc=Connector_Punctuation}\p{Join_Control}]+''', chars)))) self.assertEqual(repr(''.join(regex.findall(r'''[[:xdigit:]]+''', chars))), repr(''.join(regex.findall(r'''[0-9A-Fa-f]+''', chars)))) # Hg issue 138: grapheme anchored search not working properly. self.assertEqual(repr(regex.search(ur'(?u)\X$', u'ab\u2103').group()), repr(u'\u2103')) # Hg issue 139: Regular expression with multiple wildcards where first # should match empty string does not always work. self.assertEqual(regex.search("([^L]*)([^R]*R)", "LtR").groups(), ('', 'LtR')) # Hg issue 140: Replace with REVERSE and groups has unexpected # behavior. self.assertEqual(regex.sub(r'(.)', r'x\1y', 'ab'), 'xayxby') self.assertEqual(regex.sub(r'(?r)(.)', r'x\1y', 'ab'), 'xayxby') # Hg issue 141: Crash on a certain partial match. self.assertEqual(regex.fullmatch('(a)*abc', 'ab', partial=True).span(), (0, 2)) self.assertEqual(regex.fullmatch('(a)*abc', 'ab', partial=True).partial, True) # Hg Issue #143: Partial matches have incorrect span if prefix is '.' # wildcard. 
self.assertEqual(regex.search('OXRG', 'OOGOX', partial=True).span(), (3, 5)) self.assertEqual(regex.search('.XRG', 'OOGOX', partial=True).span(), (3, 5)) self.assertEqual(regex.search('.{1,3}XRG', 'OOGOX', partial=True).span(), (1, 5)) # Hg issue 144: Latest version problem with matching 'R|R'. self.assertEqual(regex.match('R|R', 'R').span(), (0, 1)) # Hg issue 146: Forced-fail (?!) works improperly in conditional. self.assertEqual(regex.match(r'(.)(?(1)(?!))', 'xy'), None) # Groups cleared after failure. self.assertEqual(regex.findall(r'(y)?(\d)(?(1)\b\B)', 'ax1y2z3b'), [('', '1'), ('', '2'), ('', '3')]) self.assertEqual(regex.findall(r'(y)?+(\d)(?(1)\b\B)', 'ax1y2z3b'), [('', '1'), ('', '2'), ('', '3')]) # Hg issue 147: Fuzzy match can return match points beyond buffer end. self.assertEqual([m.span() for m in regex.finditer(r'(?i)(?:error){e}', 'regex failure')], [(0, 5), (5, 10), (10, 13), (13, 13)]) self.assertEqual([m.span() for m in regex.finditer(r'(?fi)(?:error){e}', 'regex failure')], [(0, 5), (5, 10), (10, 13), (13, 13)]) # Hg issue 151: Request: \K. self.assertEqual(regex.search(r'(ab\Kcd)', 'abcd').group(0, 1), ('cd', 'abcd')) self.assertEqual(regex.findall(r'\w\w\K\w\w', 'abcdefgh'), ['cd', 'gh']) self.assertEqual(regex.findall(r'(\w\w\K\w\w)', 'abcdefgh'), ['abcd', 'efgh']) self.assertEqual(regex.search(r'(?r)(ab\Kcd)', 'abcd').group(0, 1), ('ab', 'abcd')) self.assertEqual(regex.findall(r'(?r)\w\w\K\w\w', 'abcdefgh'), ['ef', 'ab']) self.assertEqual(regex.findall(r'(?r)(\w\w\K\w\w)', 'abcdefgh'), ['efgh', 'abcd']) # Hg issue 153: Request: (*SKIP). 
self.assertEqual(regex.search(r'12(*FAIL)|3', '123')[0], '3') self.assertEqual(regex.search(r'(?r)12(*FAIL)|3', '123')[0], '3') self.assertEqual(regex.search(r'\d+(*PRUNE)\d', '123'), None) self.assertEqual(regex.search(r'\d+(?=(*PRUNE))\d', '123')[0], '123') self.assertEqual(regex.search(r'\d+(*PRUNE)bcd|[3d]', '123bcd')[0], '123bcd') self.assertEqual(regex.search(r'\d+(*PRUNE)bcd|[3d]', '123zzd')[0], 'd') self.assertEqual(regex.search(r'\d+?(*PRUNE)bcd|[3d]', '123bcd')[0], '3bcd') self.assertEqual(regex.search(r'\d+?(*PRUNE)bcd|[3d]', '123zzd')[0], 'd') self.assertEqual(regex.search(r'\d++(?<=3(*PRUNE))zzd|[4d]$', '123zzd')[0], '123zzd') self.assertEqual(regex.search(r'\d++(?<=3(*PRUNE))zzd|[4d]$', '124zzd')[0], 'd') self.assertEqual(regex.search(r'\d++(?<=(*PRUNE)3)zzd|[4d]$', '124zzd')[0], 'd') self.assertEqual(regex.search(r'\d++(?<=2(*PRUNE)3)zzd|[3d]$', '124zzd')[0], 'd') self.assertEqual(regex.search(r'(?r)\d(*PRUNE)\d+', '123'), None) self.assertEqual(regex.search(r'(?r)\d(?<=(*PRUNE))\d+', '123')[0], '123') self.assertEqual(regex.search(r'(?r)\d+(*PRUNE)bcd|[3d]', '123bcd')[0], '123bcd') self.assertEqual(regex.search(r'(?r)\d+(*PRUNE)bcd|[3d]', '123zzd')[0], 'd') self.assertEqual(regex.search(r'(?r)\d++(?<=3(*PRUNE))zzd|[4d]$', '123zzd')[0], '123zzd') self.assertEqual(regex.search(r'(?r)\d++(?<=3(*PRUNE))zzd|[4d]$', '124zzd')[0], 'd') self.assertEqual(regex.search(r'(?r)\d++(?<=(*PRUNE)3)zzd|[4d]$', '124zzd')[0], 'd') self.assertEqual(regex.search(r'(?r)\d++(?<=2(*PRUNE)3)zzd|[3d]$', '124zzd')[0], 'd') self.assertEqual(regex.search(r'\d+(*SKIP)bcd|[3d]', '123bcd')[0], '123bcd') self.assertEqual(regex.search(r'\d+(*SKIP)bcd|[3d]', '123zzd')[0], 'd') self.assertEqual(regex.search(r'\d+?(*SKIP)bcd|[3d]', '123bcd')[0], '3bcd') self.assertEqual(regex.search(r'\d+?(*SKIP)bcd|[3d]', '123zzd')[0], 'd') self.assertEqual(regex.search(r'\d++(?<=3(*SKIP))zzd|[4d]$', '123zzd')[0], '123zzd') self.assertEqual(regex.search(r'\d++(?<=3(*SKIP))zzd|[4d]$', '124zzd')[0], 
          'd')
        self.assertEqual(regex.search(r'\d++(?<=(*SKIP)3)zzd|[4d]$', '124zzd')[0], 'd')
        self.assertEqual(regex.search(r'\d++(?<=2(*SKIP)3)zzd|[3d]$', '124zzd')[0], 'd')
        self.assertEqual(regex.search(r'(?r)\d+(*SKIP)bcd|[3d]', '123bcd')[0], '123bcd')
        self.assertEqual(regex.search(r'(?r)\d+(*SKIP)bcd|[3d]', '123zzd')[0], 'd')
        self.assertEqual(regex.search(r'(?r)\d++(?<=3(*SKIP))zzd|[4d]$', '123zzd')[0], '123zzd')
        self.assertEqual(regex.search(r'(?r)\d++(?<=3(*SKIP))zzd|[4d]$', '124zzd')[0], 'd')
        self.assertEqual(regex.search(r'(?r)\d++(?<=(*SKIP)3)zzd|[4d]$', '124zzd')[0], 'd')
        self.assertEqual(regex.search(r'(?r)\d++(?<=2(*SKIP)3)zzd|[3d]$', '124zzd')[0], 'd')

        # Hg issue 152: Request: (?(DEFINE)...).
        self.assertEqual(regex.search(r'(?(DEFINE)(?<quant>\d+)(?<item>\w+))(?&quant) (?&item)', '5 elephants')[0], '5 elephants')

        # Hg issue 150: Have an option for POSIX-compatible longest match of
        # alternates.
        self.assertEqual(regex.search(r'(?p)\d+(\w(\d*)?|[eE]([+-]\d+))', '10b12')[0], '10b12')
        self.assertEqual(regex.search(r'(?p)\d+(\w(\d*)?|[eE]([+-]\d+))', '10E+12')[0], '10E+12')
        self.assertEqual(regex.search(r'(?p)(\w|ae|oe|ue|ss)', 'ae')[0], 'ae')
        self.assertEqual(regex.search(r'(?p)one(self)?(selfsufficient)?', 'oneselfsufficient')[0], 'oneselfsufficient')

        # Hg issue 156: regression on atomic grouping
        self.assertEqual(regex.match('1(?>2)', '12').span(), (0, 2))

        # Hg issue 157: regression: segfault on complex lookaround
        self.assertEqual(regex.match(r'(?V1w)(?=(?=[^A-Z]*+[A-Z])(?=[^a-z]*+[a-z]))(?=\D*+\d)(?=\p{Alphanumeric}*+\P{Alphanumeric})\A(?s:.){8,255}+\Z', 'AAaa11!!')[0], 'AAaa11!!')

        # Hg issue 158: Group issue with (?(DEFINE)...)
        TEST_REGEX = regex.compile(r'''(?smx)
(?(DEFINE)
  (?<subcat>
     ^,[^,]+,
   )
)

# Group 2 is defined on this line
^,([^,]+),

(?:(?!(?&subcat)[\r\n]+(?&subcat)).)+
''')

        TEST_DATA = '''
,Cat 1,
,Brand 1,
some
thing
,Brand 2,
other
things
,Cat 2,
,Brand,
Some
thing
'''

        self.assertEqual([m.span(1, 2) for m in TEST_REGEX.finditer(TEST_DATA)], [((-1, -1), (2, 7)), ((-1, -1), (54, 59))])

        # Hg issue 161: Unexpected fuzzy match results
        self.assertEqual(regex.search('(abcdefgh){e}', '******abcdefghijklmnopqrtuvwxyz', regex.BESTMATCH).span(), (6, 14))
        self.assertEqual(regex.search('(abcdefghi){e}', '******abcdefghijklmnopqrtuvwxyz', regex.BESTMATCH).span(), (6, 15))

        # Hg issue 163: allow lookarounds in conditionals.
        self.assertEqual(regex.match(r'(?:(?=\d)\d+\b|\w+)', '123abc').span(), (0, 6))
        self.assertEqual(regex.match(r'(?(?=\d)\d+\b|\w+)', '123abc'), None)
        self.assertEqual(regex.search(r'(?(?<=love\s)you|(?<=hate\s)her)', "I love you").span(), (7, 10))
        self.assertEqual(regex.findall(r'(?(?<=love\s)you|(?<=hate\s)her)', "I love you but I don't hate her either"), ['you', 'her'])

        # Hg issue #180: bug of POSIX matching.
        self.assertEqual(regex.search(r'(?p)a*(.*?)', 'aaabbb').group(0, 1), ('aaabbb', 'bbb'))
        self.assertEqual(regex.search(r'(?p)a*(.*)', 'aaabbb').group(0, 1), ('aaabbb', 'bbb'))
        self.assertEqual(regex.sub(r'(?p)a*(.*?)', r'\1', 'aaabbb'), 'bbb')
        self.assertEqual(regex.sub(r'(?p)a*(.*)', r'\1', 'aaabbb'), 'bbb')

    def test_subscripted_captures(self):
        self.assertEqual(regex.match(r'(?P<x>.)+', 'abc').expandf('{0} {0[0]} {0[-1]}'), 'abc abc abc')
        self.assertEqual(regex.match(r'(?P<x>.)+', 'abc').expandf('{1} {1[0]} {1[1]} {1[2]} {1[-1]} {1[-2]} {1[-3]}'), 'c a b c c b a')
        self.assertEqual(regex.match(r'(?P<x>.)+', 'abc').expandf('{x} {x[0]} {x[1]} {x[2]} {x[-1]} {x[-2]} {x[-3]}'), 'c a b c c b a')

        self.assertEqual(regex.subf(r'(?P<x>.)+', r'{0} {0[0]} {0[-1]}', 'abc'), 'abc abc abc')
        self.assertEqual(regex.subf(r'(?P<x>.)+', '{1} {1[0]} {1[1]} {1[2]} {1[-1]} {1[-2]} {1[-3]}', 'abc'), 'c a b c c b a')
        self.assertEqual(regex.subf(r'(?P<x>.)+', '{x} {x[0]} {x[1]} {x[2]} {x[-1]} {x[-2]} {x[-3]}', 'abc'), 'c a b c c b a')

if not hasattr(str, "format"):
    # Strings don't have the .format method (below Python 2.6).
    del RegexTests.test_format
    del RegexTests.test_subscripted_captures

def test_main():
    run_unittest(RegexTests)

if __name__ == "__main__":
    test_main()
regex-2016.01.10/Python2/_regex.c
/* Secret Labs' Regular Expression Engine
 *
 * regular expression matching engine
 *
 * partial history:
 * 1999-10-24 fl   created (based on existing template matcher code)
 * 2000-03-06 fl   first alpha, sort of
 * 2000-08-01 fl   fixes for 1.6b1
 * 2000-08-07 fl   use PyOS_CheckStack() if available
 * 2000-09-20 fl   added expand method
 * 2001-03-20 fl   lots of fixes for 2.1b2
 * 2001-04-15 fl   export copyright as Python attribute, not global
 * 2001-04-28 fl   added __copy__ methods (work in progress)
 * 2001-05-14 fl   fixes for 1.5.2 compatibility
 * 2001-07-01 fl   added BIGCHARSET support (from Martin von Loewis)
 * 2001-10-18 fl   fixed group reset issue (from Matthew Mueller)
 * 2001-10-20 fl   added split primitive; reenable unicode for 1.6/2.0/2.1
 * 2001-10-21 fl   added sub/subn primitive
 * 2001-10-24 fl   added finditer primitive (for 2.2 only)
 * 2001-12-07 fl   fixed memory leak in sub/subn (Guido van Rossum)
 * 2002-11-09 fl   fixed empty sub/subn return type
 * 2003-04-18 mvl  fully support 4-byte codes
 * 2003-10-17 gn   implemented non recursive scheme
 * 2009-07-26 mrab completely re-designed matcher code
 * 2011-11-18 mrab added support for PEP 393 strings
 *
 * Copyright (c) 1997-2001 by Secret Labs AB. All rights reserved.
 *
 * This version of the SRE library can be redistributed under CNRI's
 * Python 1.6 license. For any other use, please contact Secret Labs
 * AB (info@pythonware.com).
 *
 * Portions of this engine have been developed in cooperation with
 * CNRI. Hewlett-Packard provided funding for 1.6 integration and
 * other compatibility work.
*/ /* #define VERBOSE */ #if defined(VERBOSE) #define TRACE(X) printf X; #else #define TRACE(X) #endif #include "Python.h" #include "structmember.h" /* offsetof */ #include #include "_regex.h" #include "pyport.h" #include "pythread.h" #if PY_VERSION_HEX < 0x02060000 #if SIZEOF_SIZE_T == SIZEOF_LONG_LONG #define T_PYSSIZET T_LONGLONG #elif SIZEOF_SIZE_T == SIZEOF_LONG #define T_PYSSIZET T_LONG #else #error size_t is the same size as neither LONG nor LONGLONG #endif #endif typedef unsigned char Py_UCS1; typedef unsigned short Py_UCS2; typedef RE_UINT32 RE_CODE; /* Properties in the General Category. */ #define RE_PROP_GC_CN ((RE_PROP_GC << 16) | RE_PROP_CN) #define RE_PROP_GC_LU ((RE_PROP_GC << 16) | RE_PROP_LU) #define RE_PROP_GC_LL ((RE_PROP_GC << 16) | RE_PROP_LL) #define RE_PROP_GC_LT ((RE_PROP_GC << 16) | RE_PROP_LT) #define RE_PROP_GC_P ((RE_PROP_GC << 16) | RE_PROP_P) /* Unlimited repeat count. */ #define RE_UNLIMITED (~(RE_CODE)0) /* The status of a . */ typedef RE_UINT32 RE_STATUS_T; /* Whether to match concurrently, i.e. release the GIL while matching. */ #define RE_CONC_NO 0 #define RE_CONC_YES 1 #define RE_CONC_DEFAULT 2 /* The side that could truncate in a partial match. * * The values RE_PARTIAL_LEFT and RE_PARTIAL_RIGHT are also used as array * indexes, so they need to be 0 and 1. */ #define RE_PARTIAL_NONE -1 #define RE_PARTIAL_LEFT 0 #define RE_PARTIAL_RIGHT 1 /* Flags for the kind of 'sub' call: 'sub', 'subn', 'subf', 'subfn'. */ #define RE_SUB 0x0 #define RE_SUBN 0x1 #if PY_VERSION_HEX >= 0x02060000 #define RE_SUBF 0x2 #endif /* The name of this module, minus the leading underscore. */ #define RE_MODULE "regex" /* Error codes. */ #define RE_ERROR_SUCCESS 1 /* Successful match. */ #define RE_ERROR_FAILURE 0 /* Unsuccessful match. */ #define RE_ERROR_ILLEGAL -1 /* Illegal code. */ #define RE_ERROR_INTERNAL -2 /* Internal error. */ #define RE_ERROR_CONCURRENT -3 /* "concurrent" invalid. */ #define RE_ERROR_MEMORY -4 /* Out of memory. 
*/ #define RE_ERROR_INTERRUPTED -5 /* Signal handler raised exception. */ #define RE_ERROR_REPLACEMENT -6 /* Invalid replacement string. */ #define RE_ERROR_INVALID_GROUP_REF -7 /* Invalid group reference. */ #define RE_ERROR_GROUP_INDEX_TYPE -8 /* Group index type error. */ #define RE_ERROR_NO_SUCH_GROUP -9 /* No such group. */ #define RE_ERROR_INDEX -10 /* String index. */ #define RE_ERROR_BACKTRACKING -11 /* Too much backtracking. */ #define RE_ERROR_NOT_STRING -12 /* Not a string. */ #define RE_ERROR_NOT_UNICODE -13 /* Not a Unicode string. */ #define RE_ERROR_PARTIAL -15 /* Partial match. */ /* The number of backtrack entries per allocated block. */ #define RE_BACKTRACK_BLOCK_SIZE 64 /* The maximum number of backtrack entries to allocate. */ #define RE_MAX_BACKTRACK_ALLOC (1024 * 1024) /* The number of atomic entries per allocated block. */ #define RE_ATOMIC_BLOCK_SIZE 64 /* The initial maximum capacity of the guard block. */ #define RE_INIT_GUARDS_BLOCK_SIZE 16 /* The initial maximum capacity of the node list. */ #define RE_INIT_NODE_LIST_SIZE 16 /* The size increment for various allocation lists. */ #define RE_LIST_SIZE_INC 16 /* The initial maximum capacity of the capture groups. */ #define RE_INIT_CAPTURE_SIZE 16 /* Node bitflags. */ #define RE_POSITIVE_OP 0x1 #define RE_ZEROWIDTH_OP 0x2 #define RE_FUZZY_OP 0x4 #define RE_REVERSE_OP 0x8 #define RE_REQUIRED_OP 0x10 /* Guards against further matching can occur at the start of the body and the * tail of a repeat containing a repeat. */ #define RE_STATUS_BODY 0x1 #define RE_STATUS_TAIL 0x2 /* Whether a guard is added depends on whether there's a repeat in the body of * the repeat or a group reference in the body or tail of the repeat. */ #define RE_STATUS_NEITHER 0x0 #define RE_STATUS_REPEAT 0x4 #define RE_STATUS_LIMITED 0x8 #define RE_STATUS_REF 0x10 #define RE_STATUS_VISITED_AG 0x20 #define RE_STATUS_VISITED_REP 0x40 /* Whether a string node has been initialised for fast searching. 
 */
#define RE_STATUS_FAST_INIT 0x80

/* Whether a node is being used. (Additional nodes may be created while the
 * pattern is being built.)
 */
#define RE_STATUS_USED 0x100

/* Whether a node is a string node. */
#define RE_STATUS_STRING 0x200

/* Whether a repeat node is within another repeat. */
#define RE_STATUS_INNER 0x400

/* Various flags stored in a node status member. */
#define RE_STATUS_SHIFT 11

#define RE_STATUS_FUZZY (RE_FUZZY_OP << RE_STATUS_SHIFT)
#define RE_STATUS_REVERSE (RE_REVERSE_OP << RE_STATUS_SHIFT)
#define RE_STATUS_REQUIRED (RE_REQUIRED_OP << RE_STATUS_SHIFT)
#define RE_STATUS_HAS_GROUPS 0x10000
#define RE_STATUS_HAS_REPEATS 0x20000

/* The different error types for fuzzy matching. */
#define RE_FUZZY_SUB 0
#define RE_FUZZY_INS 1
#define RE_FUZZY_DEL 2
#define RE_FUZZY_ERR 3
#define RE_FUZZY_COUNT 3

/* The various values in a FUZZY node. */
#define RE_FUZZY_VAL_MAX_BASE 1
#define RE_FUZZY_VAL_MAX_SUB (RE_FUZZY_VAL_MAX_BASE + RE_FUZZY_SUB)
#define RE_FUZZY_VAL_MAX_INS (RE_FUZZY_VAL_MAX_BASE + RE_FUZZY_INS)
#define RE_FUZZY_VAL_MAX_DEL (RE_FUZZY_VAL_MAX_BASE + RE_FUZZY_DEL)
#define RE_FUZZY_VAL_MAX_ERR (RE_FUZZY_VAL_MAX_BASE + RE_FUZZY_ERR)

#define RE_FUZZY_VAL_COST_BASE 5
#define RE_FUZZY_VAL_SUB_COST (RE_FUZZY_VAL_COST_BASE + RE_FUZZY_SUB)
#define RE_FUZZY_VAL_INS_COST (RE_FUZZY_VAL_COST_BASE + RE_FUZZY_INS)
#define RE_FUZZY_VAL_DEL_COST (RE_FUZZY_VAL_COST_BASE + RE_FUZZY_DEL)
#define RE_FUZZY_VAL_MAX_COST (RE_FUZZY_VAL_COST_BASE + RE_FUZZY_ERR)

/* The various values in an END_FUZZY node. */
#define RE_FUZZY_VAL_MIN_BASE 1
#define RE_FUZZY_VAL_MIN_SUB (RE_FUZZY_VAL_MIN_BASE + RE_FUZZY_SUB)
#define RE_FUZZY_VAL_MIN_INS (RE_FUZZY_VAL_MIN_BASE + RE_FUZZY_INS)
#define RE_FUZZY_VAL_MIN_DEL (RE_FUZZY_VAL_MIN_BASE + RE_FUZZY_DEL)
#define RE_FUZZY_VAL_MIN_ERR (RE_FUZZY_VAL_MIN_BASE + RE_FUZZY_ERR)

/* The maximum number of errors when trying to improve a fuzzy match. */
#define RE_MAX_ERRORS 10

/* The flags which will be set for full Unicode case folding.
*/ #define RE_FULL_CASE_FOLDING (RE_FLAG_UNICODE | RE_FLAG_FULLCASE | RE_FLAG_IGNORECASE) /* The shortest string prefix for which we'll use a fast string search. */ #define RE_MIN_FAST_LENGTH 5 static char copyright[] = " RE 2.3.0 Copyright (c) 1997-2002 by Secret Labs AB "; /* The exception to raise on error. */ static PyObject* error_exception; /* The dictionary of Unicode properties. */ static PyObject* property_dict; typedef struct RE_State* RE_StatePtr; /* Bit-flags for the common character properties supported by locale-sensitive * matching. */ #define RE_LOCALE_ALNUM 0x001 #define RE_LOCALE_ALPHA 0x002 #define RE_LOCALE_CNTRL 0x004 #define RE_LOCALE_DIGIT 0x008 #define RE_LOCALE_GRAPH 0x010 #define RE_LOCALE_LOWER 0x020 #define RE_LOCALE_PRINT 0x040 #define RE_LOCALE_PUNCT 0x080 #define RE_LOCALE_SPACE 0x100 #define RE_LOCALE_UPPER 0x200 /* Info about the current locale. * * Used by patterns that are locale-sensitive. */ typedef struct RE_LocaleInfo { unsigned short properties[0x100]; unsigned char uppercase[0x100]; unsigned char lowercase[0x100]; } RE_LocaleInfo; /* Handlers for ASCII, locale and Unicode. 
*/ typedef struct RE_EncodingTable { BOOL (*has_property)(RE_LocaleInfo* locale_info, RE_CODE property, Py_UCS4 ch); BOOL (*at_boundary)(RE_StatePtr state, Py_ssize_t text_pos); BOOL (*at_word_start)(RE_StatePtr state, Py_ssize_t text_pos); BOOL (*at_word_end)(RE_StatePtr state, Py_ssize_t text_pos); BOOL (*at_default_boundary)(RE_StatePtr state, Py_ssize_t text_pos); BOOL (*at_default_word_start)(RE_StatePtr state, Py_ssize_t text_pos); BOOL (*at_default_word_end)(RE_StatePtr state, Py_ssize_t text_pos); BOOL (*at_grapheme_boundary)(RE_StatePtr state, Py_ssize_t text_pos); BOOL (*is_line_sep)(Py_UCS4 ch); BOOL (*at_line_start)(RE_StatePtr state, Py_ssize_t text_pos); BOOL (*at_line_end)(RE_StatePtr state, Py_ssize_t text_pos); BOOL (*possible_turkic)(RE_LocaleInfo* locale_info, Py_UCS4 ch); int (*all_cases)(RE_LocaleInfo* locale_info, Py_UCS4 ch, Py_UCS4* codepoints); Py_UCS4 (*simple_case_fold)(RE_LocaleInfo* locale_info, Py_UCS4 ch); int (*full_case_fold)(RE_LocaleInfo* locale_info, Py_UCS4 ch, Py_UCS4* folded); int (*all_turkic_i)(RE_LocaleInfo* locale_info, Py_UCS4 ch, Py_UCS4* cases); } RE_EncodingTable; /* Position within the regex and text. */ typedef struct RE_Position { struct RE_Node* node; Py_ssize_t text_pos; } RE_Position; /* Info about fuzzy matching. */ typedef struct RE_FuzzyInfo { struct RE_Node* node; size_t counts[RE_FUZZY_COUNT + 1]; /* Add 1 for total errors. */ size_t total_cost; } RE_FuzzyInfo; /* Storage for backtrack data. 
*/ typedef struct RE_BacktrackData { union { struct { size_t capture_change; BOOL too_few_errors; } atomic; struct { RE_Position position; } branch; struct { RE_FuzzyInfo fuzzy_info; Py_ssize_t text_pos; RE_CODE index; } fuzzy; struct { RE_Position position; size_t count; struct RE_Node* fuzzy_node; BOOL too_few_errors; } fuzzy_insert; struct { RE_Position position; RE_INT8 fuzzy_type; RE_INT8 step; } fuzzy_item; struct { RE_Position position; Py_ssize_t string_pos; RE_INT8 fuzzy_type; RE_INT8 folded_pos; RE_INT8 folded_len; RE_INT8 gfolded_pos; RE_INT8 gfolded_len; RE_INT8 step; } fuzzy_string; struct { Py_ssize_t text_pos; Py_ssize_t current_capture; RE_CODE private_index; RE_CODE public_index; BOOL capture; } group; struct { struct RE_Node* node; size_t capture_change; } group_call; struct { Py_ssize_t match_pos; } keep; struct { struct RE_Node* node; size_t capture_change; BOOL too_few_errors; BOOL inside; } lookaround; struct { RE_Position position; Py_ssize_t text_pos; size_t count; Py_ssize_t start; size_t capture_change; RE_CODE index; } repeat; }; RE_UINT8 op; } RE_BacktrackData; /* Storage for backtrack data is allocated in blocks for speed. */ typedef struct RE_BacktrackBlock { RE_BacktrackData items[RE_BACKTRACK_BLOCK_SIZE]; struct RE_BacktrackBlock* previous; struct RE_BacktrackBlock* next; size_t capacity; size_t count; } RE_BacktrackBlock; /* Storage for atomic data. */ typedef struct RE_AtomicData { RE_BacktrackBlock* current_backtrack_block; size_t backtrack_count; struct RE_Node* node; RE_BacktrackData* backtrack; struct RE_SavedGroups* saved_groups; struct RE_SavedRepeats* saved_repeats; Py_ssize_t slice_start; Py_ssize_t slice_end; Py_ssize_t text_pos; BOOL is_lookaround; BOOL has_groups; BOOL has_repeats; } RE_AtomicData; /* Storage for atomic data is allocated in blocks for speed. 
 */
typedef struct RE_AtomicBlock {
    RE_AtomicData items[RE_ATOMIC_BLOCK_SIZE];
    struct RE_AtomicBlock* previous;
    struct RE_AtomicBlock* next;
    size_t capacity;
    size_t count;
} RE_AtomicBlock;

/* Storage for saved groups. */
typedef struct RE_SavedGroups {
    struct RE_SavedGroups* previous;
    struct RE_SavedGroups* next;
    struct RE_GroupSpan* spans;
    size_t* counts;
} RE_SavedGroups;

/* Storage for info around a recursive call by 'basic_match'. */
typedef struct RE_Info {
    RE_BacktrackBlock* current_backtrack_block;
    size_t backtrack_count;
    RE_SavedGroups* current_saved_groups;
    struct RE_GroupCallFrame* current_group_call_frame;
    BOOL must_advance;
} RE_Info;

/* Storage for the next node. */
typedef struct RE_NextNode {
    struct RE_Node* node;
    struct RE_Node* test;
    struct RE_Node* match_next;
    Py_ssize_t match_step;
} RE_NextNode;

/* A pattern node. */
typedef struct RE_Node {
    RE_NextNode next_1;
    union {
        struct {
            RE_NextNode next_2;
        } nonstring;
        struct {
            /* Used only if (node->status & RE_STATUS_STRING) is true. */
            Py_ssize_t* bad_character_offset;
            Py_ssize_t* good_suffix_offset;
        } string;
    };
    Py_ssize_t step;
    size_t value_count;
    RE_CODE* values;
    RE_STATUS_T status;
    RE_UINT8 op;
    BOOL match;
} RE_Node;

/* Info about a group's span. */
typedef struct RE_GroupSpan {
    Py_ssize_t start;
    Py_ssize_t end;
} RE_GroupSpan;

/* Span of a guard (inclusive range). */
typedef struct RE_GuardSpan {
    Py_ssize_t low;
    Py_ssize_t high;
    BOOL protect;
} RE_GuardSpan;

/* Spans guarded against further matching. */
typedef struct RE_GuardList {
    size_t capacity;
    size_t count;
    RE_GuardSpan* spans;
    Py_ssize_t last_text_pos;
    size_t last_low;
} RE_GuardList;

/* Info about a group. */
typedef struct RE_GroupData {
    RE_GroupSpan span;
    size_t capture_count;
    size_t capture_capacity;
    Py_ssize_t current_capture;
    RE_GroupSpan* captures;
} RE_GroupData;

/* Info about a repeat.
*/ typedef struct RE_RepeatData { RE_GuardList body_guard_list; RE_GuardList tail_guard_list; size_t count; Py_ssize_t start; size_t capture_change; } RE_RepeatData; /* Storage for saved repeats. */ typedef struct RE_SavedRepeats { struct RE_SavedRepeats* previous; struct RE_SavedRepeats* next; RE_RepeatData* repeats; } RE_SavedRepeats; /* Guards for fuzzy sections. */ typedef struct RE_FuzzyGuards { RE_GuardList body_guard_list; RE_GuardList tail_guard_list; } RE_FuzzyGuards; /* Info about a capture group. */ typedef struct RE_GroupInfo { Py_ssize_t end_index; RE_Node* node; BOOL referenced; BOOL has_name; } RE_GroupInfo; /* Info about a call_ref. */ typedef struct RE_CallRefInfo { RE_Node* node; BOOL defined; BOOL used; } RE_CallRefInfo; /* Info about a repeat. */ typedef struct RE_RepeatInfo { RE_STATUS_T status; } RE_RepeatInfo; /* Stack frame for a group call. */ typedef struct RE_GroupCallFrame { struct RE_GroupCallFrame* previous; struct RE_GroupCallFrame* next; RE_Node* node; RE_GroupData* groups; RE_RepeatData* repeats; } RE_GroupCallFrame; /* Info about a string argument. */ typedef struct RE_StringInfo { #if PY_VERSION_HEX >= 0x02060000 Py_buffer view; /* View of the string if it's a buffer object. */ #endif void* characters; /* Pointer to the characters of the string. */ Py_ssize_t length; /* Length of the string. */ Py_ssize_t charsize; /* Size of the characters in the string. */ BOOL is_unicode; /* Whether the string is Unicode. */ BOOL should_release; /* Whether the buffer should be released. */ } RE_StringInfo; /* Info about where the next match was found, starting from a certain search * position. This is used when a pattern starts with a BRANCH. */ #define MAX_SEARCH_POSITIONS 7 /* Info about a search position. */ typedef struct { Py_ssize_t start_pos; Py_ssize_t match_pos; } RE_SearchPosition; /* The state object used during matching. */ typedef struct RE_State { struct PatternObject* pattern; /* Parent PatternObject. 
*/ /* Info about the string being matched. */ PyObject* string; #if PY_VERSION_HEX >= 0x02060000 Py_buffer view; /* View of the string if it's a buffer object. */ #endif Py_ssize_t charsize; void* text; Py_ssize_t text_length; /* The slice of the string being searched. */ Py_ssize_t slice_start; Py_ssize_t slice_end; /* Info about the capture groups. */ RE_GroupData* groups; Py_ssize_t lastindex; Py_ssize_t lastgroup; /* Info about the repeats. */ RE_RepeatData* repeats; Py_ssize_t search_anchor; /* Where the last match finished. */ Py_ssize_t match_pos; /* The start position of the match. */ Py_ssize_t text_pos; /* The current position of the match. */ Py_ssize_t final_newline; /* The index of newline at end of string, or -1. */ Py_ssize_t final_line_sep; /* The index of line separator at end of string, or -1. */ /* Storage for backtrack info. */ RE_BacktrackBlock backtrack_block; RE_BacktrackBlock* current_backtrack_block; Py_ssize_t backtrack_allocated; RE_BacktrackData* backtrack; RE_AtomicBlock* current_atomic_block; /* Storage for saved capture groups. */ RE_SavedGroups* first_saved_groups; RE_SavedGroups* current_saved_groups; RE_SavedRepeats* first_saved_repeats; RE_SavedRepeats* current_saved_repeats; /* Info about the best POSIX match (leftmost longest). */ Py_ssize_t best_match_pos; Py_ssize_t best_text_pos; RE_GroupData* best_match_groups; /* Miscellaneous. */ Py_ssize_t min_width; /* The minimum width of the string to match (assuming it's not a fuzzy pattern). */ RE_EncodingTable* encoding; /* The 'encoding' of the string being searched. */ RE_LocaleInfo* locale_info; /* Info about the locale, if needed. */ Py_UCS4 (*char_at)(void* text, Py_ssize_t pos); void (*set_char_at)(void* text, Py_ssize_t pos, Py_UCS4 ch); void* (*point_to)(void* text, Py_ssize_t pos); PyThread_type_lock lock; /* A lock for accessing the state across threads. */ RE_FuzzyInfo fuzzy_info; /* Info about fuzzy matching. 
 */
    size_t total_fuzzy_counts[RE_FUZZY_COUNT]; /* Totals for fuzzy matching. */
    size_t best_fuzzy_counts[RE_FUZZY_COUNT]; /* Best totals for fuzzy matching. */
    RE_FuzzyGuards* fuzzy_guards; /* The guards for a fuzzy match. */
    size_t total_errors; /* The total number of errors of a fuzzy match. */
    size_t max_errors; /* The maximum permitted number of errors. */
    size_t fewest_errors; /* The fewest errors so far of an enhanced fuzzy match. */
    /* The group call stack. */
    RE_GroupCallFrame* first_group_call_frame;
    RE_GroupCallFrame* current_group_call_frame;
    RE_GuardList* group_call_guard_list;
    RE_SearchPosition search_positions[MAX_SEARCH_POSITIONS]; /* Where the search matches next. */
    size_t capture_change; /* Incremented every time a capture group changes. */
    Py_ssize_t req_pos; /* The position where the required string matched. */
    Py_ssize_t req_end; /* The end position where the required string matched. */
    int partial_side; /* The side that could truncate in a partial match. */
    RE_UINT16 iterations; /* The number of iterations the matching engine has performed since checking for KeyboardInterrupt. */
    BOOL is_unicode; /* Whether the string to be matched is Unicode. */
    BOOL should_release; /* Whether the buffer should be released. */
    BOOL overlapped; /* Whether the matches can be overlapped. */
    BOOL reverse; /* Whether it's a reverse pattern. */
    BOOL visible_captures; /* Whether the 'captures' method will be visible. */
    BOOL version_0; /* Whether to perform version_0 behaviour (same as re module). */
    BOOL must_advance; /* Whether the end of the match must advance past its start. */
    BOOL is_multithreaded; /* Whether to release the GIL while matching. */
    BOOL too_few_errors; /* Whether there were too few fuzzy errors. */
    BOOL match_all; /* Whether to match all of the string ('fullmatch'). */
    BOOL found_match; /* Whether a POSIX match has been found. */
} RE_State;

/* Storage for the regex state and thread state.
* * Scanner objects can sometimes be shared across threads, which means that * their RE_State structs are also shared. This isn't safe when the GIL is * released, so in such instances we have a lock (mutex) in the RE_State struct * to protect it during matching. We also need a thread-safe place to store the * thread state when releasing the GIL. */ typedef struct RE_SafeState { RE_State* re_state; PyThreadState* thread_state; } RE_SafeState; /* The PatternObject created from a regular expression. */ typedef struct PatternObject { PyObject_HEAD PyObject* pattern; /* Pattern source (or None). */ Py_ssize_t flags; /* Flags used when compiling pattern source. */ PyObject* weakreflist; /* List of weak references */ /* Nodes into which the regular expression is compiled. */ RE_Node* start_node; RE_Node* start_test; size_t true_group_count; /* The true number of capture groups. */ size_t public_group_count; /* The number of public capture groups. */ size_t repeat_count; /* The number of repeats. */ Py_ssize_t group_end_index; /* The number of group closures. */ PyObject* groupindex; PyObject* indexgroup; PyObject* named_lists; size_t named_lists_count; PyObject** partial_named_lists[2]; PyObject* named_list_indexes; /* Storage for the pattern nodes. */ size_t node_capacity; size_t node_count; RE_Node** node_list; /* Info about the capture groups. */ size_t group_info_capacity; RE_GroupInfo* group_info; /* Info about the call_refs. */ size_t call_ref_info_capacity; size_t call_ref_info_count; RE_CallRefInfo* call_ref_info; Py_ssize_t pattern_call_ref; /* Info about the repeats. */ size_t repeat_info_capacity; RE_RepeatInfo* repeat_info; Py_ssize_t min_width; /* The minimum width of the string to match (assuming it isn't a fuzzy pattern). */ RE_EncodingTable* encoding; /* Encoding handlers. */ RE_LocaleInfo* locale_info; /* Info about the locale, if needed. */ RE_GroupData* groups_storage; RE_RepeatData* repeats_storage; size_t fuzzy_count; /* The number of fuzzy sections. 
*/ Py_ssize_t req_offset; /* The offset to the required string. */ RE_Node* req_string; /* The required string. */ BOOL is_fuzzy; /* Whether it's a fuzzy pattern. */ BOOL do_search_start; /* Whether to do an initial search. */ BOOL recursive; /* Whether the entire pattern is recursive. */ } PatternObject; /* The MatchObject created when a match is found. */ typedef struct MatchObject { PyObject_HEAD PyObject* string; /* Link to the target string or NULL if detached. */ PyObject* substring; /* Link to (a substring of) the target string. */ Py_ssize_t substring_offset; /* Offset into the target string. */ PatternObject* pattern; /* Link to the regex (pattern) object. */ Py_ssize_t pos; /* Start of current slice. */ Py_ssize_t endpos; /* End of current slice. */ Py_ssize_t match_start; /* Start of matched slice. */ Py_ssize_t match_end; /* End of matched slice. */ Py_ssize_t lastindex; /* Last group seen by the engine (-1 if none). */ Py_ssize_t lastgroup; /* Last named group seen by the engine (-1 if none). */ size_t group_count; /* The number of groups. */ RE_GroupData* groups; /* The capture groups. */ PyObject* regs; size_t fuzzy_counts[RE_FUZZY_COUNT]; BOOL partial; /* Whether it's a partial match. */ } MatchObject; /* The ScannerObject. */ typedef struct ScannerObject { PyObject_HEAD PatternObject* pattern; RE_State state; int status; } ScannerObject; /* The SplitterObject. */ typedef struct SplitterObject { PyObject_HEAD PatternObject* pattern; RE_State state; Py_ssize_t maxsplit; Py_ssize_t last_pos; Py_ssize_t split_count; Py_ssize_t index; int status; } SplitterObject; #if PY_VERSION_HEX >= 0x02060000 /* The CaptureObject. */ typedef struct CaptureObject { PyObject_HEAD Py_ssize_t group_index; MatchObject** match_indirect; } CaptureObject; #endif /* Info used when compiling a pattern to nodes. */ typedef struct RE_CompileArgs { RE_CODE* code; /* The start of the compiled pattern. */ RE_CODE* end_code; /* The end of the compiled pattern. 
 */
    PatternObject* pattern; /* The pattern object. */
    Py_ssize_t min_width; /* The minimum width of the string to match (assuming it isn't a fuzzy pattern). */
    RE_Node* start; /* The start node. */
    RE_Node* end; /* The end node. */
    size_t repeat_depth; /* The nesting depth of the repeat. */
    BOOL forward; /* Whether it's a forward (not reverse) pattern. */
    BOOL visible_captures; /* Whether all of the captures will be visible. */
    BOOL has_captures; /* Whether the pattern has capture groups. */
    BOOL is_fuzzy; /* Whether the pattern (or some part of it) is fuzzy. */
    BOOL within_fuzzy; /* Whether the subpattern is within a fuzzy section. */
    BOOL has_groups; /* Whether the subpattern contains captures. */
    BOOL has_repeats; /* Whether the subpattern contains repeats. */
} RE_CompileArgs;

/* The string slices which will be concatenated to make the result string of
 * the 'sub' method.
 *
 * This allows us to avoid creating a list of slices if there are fewer than 2
 * of them. Empty strings aren't recorded, so if 'list' and 'item' are both
 * NULL then the result is an empty string.
 */
typedef struct JoinInfo {
    PyObject* list; /* The list of slices if there are 2 or more of them. */
    PyObject* item; /* The slice if there is only 1 of them. */
    BOOL reversed; /* Whether the slices have been found in reverse order. */
    BOOL is_unicode; /* Whether the string is Unicode. */
} JoinInfo;

/* Info about fuzzy matching. */
typedef struct {
    RE_Node* new_node;
    Py_ssize_t new_text_pos;
    Py_ssize_t limit;
    Py_ssize_t new_string_pos;
    int step;
    int new_folded_pos;
    int folded_len;
    int new_gfolded_pos;
    int new_group_pos;
    int fuzzy_type;
    BOOL permit_insertion;
} RE_FuzzyData;

typedef struct RE_BestEntry {
    Py_ssize_t match_pos;
    Py_ssize_t text_pos;
} RE_BestEntry;

typedef struct RE_BestList {
    size_t capacity;
    size_t count;
    RE_BestEntry* entries;
} RE_BestList;

/* Function types for getting info from a MatchObject.
*/ typedef PyObject* (*RE_GetByIndexFunc)(MatchObject* self, Py_ssize_t index); /* Returns the magnitude of a 'Py_ssize_t' value. */ Py_LOCAL_INLINE(Py_ssize_t) abs_ssize_t(Py_ssize_t x) { return x >= 0 ? x : -x; } /* Returns the minimum of 2 'Py_ssize_t' values. */ Py_LOCAL_INLINE(Py_ssize_t) min_ssize_t(Py_ssize_t x, Py_ssize_t y) { return x <= y ? x : y; } /* Returns the maximum of 2 'Py_ssize_t' values. */ Py_LOCAL_INLINE(Py_ssize_t) max_ssize_t(Py_ssize_t x, Py_ssize_t y) { return x >= y ? x : y; } /* Returns the minimum of 2 'size_t' values. */ Py_LOCAL_INLINE(size_t) min_size_t(size_t x, size_t y) { return x <= y ? x : y; } /* Returns the maximum of 2 'size_t' values. */ Py_LOCAL_INLINE(size_t) max_size_t(size_t x, size_t y) { return x >= y ? x : y; } /* Returns the 'maximum' of 2 RE_STATUS_T values. */ Py_LOCAL_INLINE(RE_STATUS_T) max_status_2(RE_STATUS_T x, RE_STATUS_T y) { return x >= y ? x : y; } /* Returns the 'maximum' of 3 RE_STATUS_T values. */ Py_LOCAL_INLINE(RE_STATUS_T) max_status_3(RE_STATUS_T x, RE_STATUS_T y, RE_STATUS_T z) { return max_status_2(x, max_status_2(y, z)); } /* Returns the 'maximum' of 4 RE_STATUS_T values. */ Py_LOCAL_INLINE(RE_STATUS_T) max_status_4(RE_STATUS_T w, RE_STATUS_T x, RE_STATUS_T y, RE_STATUS_T z) { return max_status_2(max_status_2(w, x), max_status_2(y, z)); } /* Gets a character at a position assuming 1 byte per character. */ static Py_UCS4 bytes1_char_at(void* text, Py_ssize_t pos) { return *((Py_UCS1*)text + pos); } /* Sets a character at a position assuming 1 byte per character. */ static void bytes1_set_char_at(void* text, Py_ssize_t pos, Py_UCS4 ch) { *((Py_UCS1*)text + pos) = (Py_UCS1)ch; } /* Gets a pointer to a position assuming 1 byte per character. */ static void* bytes1_point_to(void* text, Py_ssize_t pos) { return (Py_UCS1*)text + pos; } /* Gets a character at a position assuming 2 bytes per character. 
*/ static Py_UCS4 bytes2_char_at(void* text, Py_ssize_t pos) { return *((Py_UCS2*)text + pos); } /* Sets a character at a position assuming 2 bytes per character. */ static void bytes2_set_char_at(void* text, Py_ssize_t pos, Py_UCS4 ch) { *((Py_UCS2*)text + pos) = (Py_UCS2)ch; } /* Gets a pointer to a position assuming 2 bytes per character. */ static void* bytes2_point_to(void* text, Py_ssize_t pos) { return (Py_UCS2*)text + pos; } /* Gets a character at a position assuming 4 bytes per character. */ static Py_UCS4 bytes4_char_at(void* text, Py_ssize_t pos) { return *((Py_UCS4*)text + pos); } /* Sets a character at a position assuming 4 bytes per character. */ static void bytes4_set_char_at(void* text, Py_ssize_t pos, Py_UCS4 ch) { *((Py_UCS4*)text + pos) = (Py_UCS4)ch; } /* Gets a pointer to a position assuming 4 bytes per character. */ static void* bytes4_point_to(void* text, Py_ssize_t pos) { return (Py_UCS4*)text + pos; } /* Default for whether a position is on a word boundary. */ static BOOL at_boundary_always(RE_State* state, Py_ssize_t text_pos) { return TRUE; } /* Converts a BOOL to success/failure. */ Py_LOCAL_INLINE(int) bool_as_status(BOOL value) { return value ? RE_ERROR_SUCCESS : RE_ERROR_FAILURE; } /* ASCII-specific. */ Py_LOCAL_INLINE(BOOL) unicode_has_property(RE_CODE property, Py_UCS4 ch); /* Checks whether a character has a property. */ Py_LOCAL_INLINE(BOOL) ascii_has_property(RE_CODE property, Py_UCS4 ch) { if (ch > RE_ASCII_MAX) { /* Outside the ASCII range. */ RE_UINT32 value; value = property & 0xFFFF; return value == 0; } return unicode_has_property(property, ch); } /* Wrapper for calling 'ascii_has_property' via a pointer. */ static BOOL ascii_has_property_wrapper(RE_LocaleInfo* locale_info, RE_CODE property, Py_UCS4 ch) { return ascii_has_property(property, ch); } /* Checks whether there's a word character to the left. 
*/ Py_LOCAL_INLINE(BOOL) ascii_word_left(RE_State* state, Py_ssize_t text_pos) { return text_pos > 0 && ascii_has_property(RE_PROP_WORD, state->char_at(state->text, text_pos - 1)); } /* Checks whether there's a word character to the right. */ Py_LOCAL_INLINE(BOOL) ascii_word_right(RE_State* state, Py_ssize_t text_pos) { return text_pos < state->text_length && ascii_has_property(RE_PROP_WORD, state->char_at(state->text, text_pos)); } /* Checks whether a position is on a word boundary. */ static BOOL ascii_at_boundary(RE_State* state, Py_ssize_t text_pos) { BOOL left; BOOL right; left = ascii_word_left(state, text_pos); right = ascii_word_right(state, text_pos); return left != right; } /* Checks whether a position is at the start of a word. */ static BOOL ascii_at_word_start(RE_State* state, Py_ssize_t text_pos) { BOOL left; BOOL right; left = ascii_word_left(state, text_pos); right = ascii_word_right(state, text_pos); return !left && right; } /* Checks whether a position is at the end of a word. */ static BOOL ascii_at_word_end(RE_State* state, Py_ssize_t text_pos) { BOOL left; BOOL right; left = ascii_word_left(state, text_pos); right = ascii_word_right(state, text_pos); return left && !right; } /* Checks whether a character is a line separator. */ static BOOL ascii_is_line_sep(Py_UCS4 ch) { return 0x0A <= ch && ch <= 0x0D; } /* Checks whether a position is at the start of a line. */ static BOOL ascii_at_line_start(RE_State* state, Py_ssize_t text_pos) { Py_UCS4 ch; if (text_pos <= 0) return TRUE; ch = state->char_at(state->text, text_pos - 1); if (ch == 0x0D) { if (text_pos >= state->text_length) return TRUE; /* No line break inside CRLF. */ return state->char_at(state->text, text_pos) != 0x0A; } return 0x0A <= ch && ch <= 0x0D; } /* Checks whether a position is at the end of a line. 
*/ static BOOL ascii_at_line_end(RE_State* state, Py_ssize_t text_pos) { Py_UCS4 ch; if (text_pos >= state->text_length) return TRUE; ch = state->char_at(state->text, text_pos); if (ch == 0x0A) { if (text_pos <= 0) return TRUE; /* No line break inside CRLF. */ return state->char_at(state->text, text_pos - 1) != 0x0D; } return 0x0A <= ch && ch <= 0x0D; } /* Checks whether a character could be Turkic (variants of I/i). For ASCII, it * won't be. */ static BOOL ascii_possible_turkic(RE_LocaleInfo* locale_info, Py_UCS4 ch) { return FALSE; } /* Gets all the cases of a character. */ static int ascii_all_cases(RE_LocaleInfo* locale_info, Py_UCS4 ch, Py_UCS4* codepoints) { int count; count = 0; codepoints[count++] = ch; if (('A' <= ch && ch <= 'Z') || ('a' <= ch && ch <= 'z')) /* It's a letter, so add the other case. */ codepoints[count++] = ch ^ 0x20; return count; } /* Returns a character with its case folded. */ static Py_UCS4 ascii_simple_case_fold(RE_LocaleInfo* locale_info, Py_UCS4 ch) { if ('A' <= ch && ch <= 'Z') /* Uppercase folds to lowercase. */ return ch ^ 0x20; return ch; } /* Returns a character with its case folded. */ static int ascii_full_case_fold(RE_LocaleInfo* locale_info, Py_UCS4 ch, Py_UCS4* folded) { if ('A' <= ch && ch <= 'Z') /* Uppercase folds to lowercase. */ folded[0] = ch ^ 0x20; else folded[0] = ch; return 1; } /* Gets all the case variants of Turkic 'I'. The given character will be listed * first. */ static int ascii_all_turkic_i(RE_LocaleInfo* locale_info, Py_UCS4 ch, Py_UCS4* cases) { int count; count = 0; cases[count++] = ch; if (ch != 'I') cases[count++] = 'I'; if (ch != 'i') cases[count++] = 'i'; return count; } /* The handlers for ASCII characters. */ static RE_EncodingTable ascii_encoding = { ascii_has_property_wrapper, ascii_at_boundary, ascii_at_word_start, ascii_at_word_end, ascii_at_boundary, /* No special "default word boundary" for ASCII. */ ascii_at_word_start, /* No special "default start of word" for ASCII. 
*/ ascii_at_word_end, /* No special "default end of a word" for ASCII. */ at_boundary_always, /* No special "grapheme boundary" for ASCII. */ ascii_is_line_sep, ascii_at_line_start, ascii_at_line_end, ascii_possible_turkic, ascii_all_cases, ascii_simple_case_fold, ascii_full_case_fold, ascii_all_turkic_i, }; /* Locale-specific. */ /* Checks whether a character has the 'alnum' property in the given locale. */ Py_LOCAL_INLINE(BOOL) locale_isalnum(RE_LocaleInfo* locale_info, Py_UCS4 ch) { return ch <= RE_LOCALE_MAX && (locale_info->properties[ch] & RE_LOCALE_ALNUM) != 0; } /* Checks whether a character has the 'alpha' property in the given locale. */ Py_LOCAL_INLINE(BOOL) locale_isalpha(RE_LocaleInfo* locale_info, Py_UCS4 ch) { return ch <= RE_LOCALE_MAX && (locale_info->properties[ch] & RE_LOCALE_ALPHA) != 0; } /* Checks whether a character has the 'cntrl' property in the given locale. */ Py_LOCAL_INLINE(BOOL) locale_iscntrl(RE_LocaleInfo* locale_info, Py_UCS4 ch) { return ch <= RE_LOCALE_MAX && (locale_info->properties[ch] & RE_LOCALE_CNTRL) != 0; } /* Checks whether a character has the 'digit' property in the given locale. */ Py_LOCAL_INLINE(BOOL) locale_isdigit(RE_LocaleInfo* locale_info, Py_UCS4 ch) { return ch <= RE_LOCALE_MAX && (locale_info->properties[ch] & RE_LOCALE_DIGIT) != 0; } /* Checks whether a character has the 'graph' property in the given locale. */ Py_LOCAL_INLINE(BOOL) locale_isgraph(RE_LocaleInfo* locale_info, Py_UCS4 ch) { return ch <= RE_LOCALE_MAX && (locale_info->properties[ch] & RE_LOCALE_GRAPH) != 0; } /* Checks whether a character has the 'lower' property in the given locale. */ Py_LOCAL_INLINE(BOOL) locale_islower(RE_LocaleInfo* locale_info, Py_UCS4 ch) { return ch <= RE_LOCALE_MAX && (locale_info->properties[ch] & RE_LOCALE_LOWER) != 0; } /* Checks whether a character has the 'print' property in the given locale. 
*/ Py_LOCAL_INLINE(BOOL) locale_isprint(RE_LocaleInfo* locale_info, Py_UCS4 ch) { return ch <= RE_LOCALE_MAX && (locale_info->properties[ch] & RE_LOCALE_PRINT) != 0; } /* Checks whether a character has the 'punct' property in the given locale. */ Py_LOCAL_INLINE(BOOL) locale_ispunct(RE_LocaleInfo* locale_info, Py_UCS4 ch) { return ch <= RE_LOCALE_MAX && (locale_info->properties[ch] & RE_LOCALE_PUNCT) != 0; } /* Checks whether a character has the 'space' property in the given locale. */ Py_LOCAL_INLINE(BOOL) locale_isspace(RE_LocaleInfo* locale_info, Py_UCS4 ch) { return ch <= RE_LOCALE_MAX && (locale_info->properties[ch] & RE_LOCALE_SPACE) != 0; } /* Checks whether a character has the 'upper' property in the given locale. */ Py_LOCAL_INLINE(BOOL) locale_isupper(RE_LocaleInfo* locale_info, Py_UCS4 ch) { return ch <= RE_LOCALE_MAX && (locale_info->properties[ch] & RE_LOCALE_UPPER) != 0; } /* Converts a character to lowercase in the given locale. */ Py_LOCAL_INLINE(Py_UCS4) locale_tolower(RE_LocaleInfo* locale_info, Py_UCS4 ch) { return ch <= RE_LOCALE_MAX ? locale_info->lowercase[ch] : ch; } /* Converts a character to uppercase in the given locale. */ Py_LOCAL_INLINE(Py_UCS4) locale_toupper(RE_LocaleInfo* locale_info, Py_UCS4 ch) { return ch <= RE_LOCALE_MAX ? locale_info->uppercase[ch] : ch; } /* Checks whether a character has a property. */ Py_LOCAL_INLINE(BOOL) locale_has_property(RE_LocaleInfo* locale_info, RE_CODE property, Py_UCS4 ch) { RE_UINT32 value; RE_UINT32 v; value = property & 0xFFFF; if (ch > RE_LOCALE_MAX) /* Outside the locale range. 
*/ return value == 0; switch (property >> 16) { case RE_PROP_ALNUM >> 16: v = locale_isalnum(locale_info, ch) != 0; break; case RE_PROP_ALPHA >> 16: v = locale_isalpha(locale_info, ch) != 0; break; case RE_PROP_ANY >> 16: v = 1; break; case RE_PROP_ASCII >> 16: v = ch <= RE_ASCII_MAX; break; case RE_PROP_BLANK >> 16: v = ch == '\t' || ch == ' '; break; case RE_PROP_GC: switch (property) { case RE_PROP_ASSIGNED: v = ch <= RE_LOCALE_MAX; break; case RE_PROP_CASEDLETTER: v = locale_isalpha(locale_info, ch) ? value : 0xFFFF; break; case RE_PROP_CNTRL: v = locale_iscntrl(locale_info, ch) ? value : 0xFFFF; break; case RE_PROP_DIGIT: v = locale_isdigit(locale_info, ch) ? value : 0xFFFF; break; case RE_PROP_GC_CN: v = ch > RE_LOCALE_MAX; break; case RE_PROP_GC_LL: v = locale_islower(locale_info, ch) ? value : 0xFFFF; break; case RE_PROP_GC_LU: v = locale_isupper(locale_info, ch) ? value : 0xFFFF; break; case RE_PROP_GC_P: v = locale_ispunct(locale_info, ch) ? value : 0xFFFF; break; default: v = 0xFFFF; break; } break; case RE_PROP_GRAPH >> 16: v = locale_isgraph(locale_info, ch) != 0; break; case RE_PROP_LOWER >> 16: v = locale_islower(locale_info, ch) != 0; break; case RE_PROP_POSIX_ALNUM >> 16: v = re_get_posix_alnum(ch) != 0; break; case RE_PROP_POSIX_DIGIT >> 16: v = re_get_posix_digit(ch) != 0; break; case RE_PROP_POSIX_PUNCT >> 16: v = re_get_posix_punct(ch) != 0; break; case RE_PROP_POSIX_XDIGIT >> 16: v = re_get_posix_xdigit(ch) != 0; break; case RE_PROP_PRINT >> 16: v = locale_isprint(locale_info, ch) != 0; break; case RE_PROP_SPACE >> 16: v = locale_isspace(locale_info, ch) != 0; break; case RE_PROP_UPPER >> 16: v = locale_isupper(locale_info, ch) != 0; break; case RE_PROP_WORD >> 16: v = ch == '_' || locale_isalnum(locale_info, ch) != 0; break; case RE_PROP_XDIGIT >> 16: v = re_get_hex_digit(ch) != 0; break; default: v = 0; break; } return v == value; } /* Wrapper for calling 'locale_has_property' via a pointer. 
*/ static BOOL locale_has_property_wrapper(RE_LocaleInfo* locale_info, RE_CODE property, Py_UCS4 ch) { return locale_has_property(locale_info, property, ch); } /* Checks whether there's a word character to the left. */ Py_LOCAL_INLINE(BOOL) locale_word_left(RE_State* state, Py_ssize_t text_pos) { return text_pos > 0 && locale_has_property(state->locale_info, RE_PROP_WORD, state->char_at(state->text, text_pos - 1)); } /* Checks whether there's a word character to the right. */ Py_LOCAL_INLINE(BOOL) locale_word_right(RE_State* state, Py_ssize_t text_pos) { return text_pos < state->text_length && locale_has_property(state->locale_info, RE_PROP_WORD, state->char_at(state->text, text_pos)); } /* Checks whether a position is on a word boundary. */ static BOOL locale_at_boundary(RE_State* state, Py_ssize_t text_pos) { BOOL left; BOOL right; left = locale_word_left(state, text_pos); right = locale_word_right(state, text_pos); return left != right; } /* Checks whether a position is at the start of a word. */ static BOOL locale_at_word_start(RE_State* state, Py_ssize_t text_pos) { BOOL left; BOOL right; left = locale_word_left(state, text_pos); right = locale_word_right(state, text_pos); return !left && right; } /* Checks whether a position is at the end of a word. */ static BOOL locale_at_word_end(RE_State* state, Py_ssize_t text_pos) { BOOL left; BOOL right; left = locale_word_left(state, text_pos); right = locale_word_right(state, text_pos); return left && !right; } /* Checks whether a character could be Turkic (variants of I/i). */ static BOOL locale_possible_turkic(RE_LocaleInfo* locale_info, Py_UCS4 ch) { return locale_toupper(locale_info, ch) == 'I' || locale_tolower(locale_info, ch) == 'i'; } /* Gets all the cases of a character. 
*/ static int locale_all_cases(RE_LocaleInfo* locale_info, Py_UCS4 ch, Py_UCS4* codepoints) { int count; Py_UCS4 other; count = 0; codepoints[count++] = ch; other = locale_toupper(locale_info, ch); if (other != ch) codepoints[count++] = other; other = locale_tolower(locale_info, ch); if (other != ch) codepoints[count++] = other; return count; } /* Returns a character with its case folded. */ static Py_UCS4 locale_simple_case_fold(RE_LocaleInfo* locale_info, Py_UCS4 ch) { return locale_tolower(locale_info, ch); } /* Returns a character with its case folded. */ static int locale_full_case_fold(RE_LocaleInfo* locale_info, Py_UCS4 ch, Py_UCS4* folded) { folded[0] = locale_tolower(locale_info, ch); return 1; } /* Gets all the case variants of Turkic 'I'. The given character will be listed * first. */ static int locale_all_turkic_i(RE_LocaleInfo* locale_info, Py_UCS4 ch, Py_UCS4* cases) { int count; Py_UCS4 other; count = 0; cases[count++] = ch; if (ch != 'I') cases[count++] = 'I'; if (ch != 'i') cases[count++] = 'i'; /* Uppercase 'i' will be either dotted (Turkic) or dotless (non-Turkic). */ other = locale_toupper(locale_info, 'i'); if (other != ch && other != 'I') cases[count++] = other; /* Lowercase 'I' will be either dotless (Turkic) or dotted (non-Turkic). */ other = locale_tolower(locale_info, 'I'); if (other != ch && other != 'i') cases[count++] = other; return count; } /* The handlers for locale characters. */ static RE_EncodingTable locale_encoding = { locale_has_property_wrapper, locale_at_boundary, locale_at_word_start, locale_at_word_end, locale_at_boundary, /* No special "default word boundary" for locale. */ locale_at_word_start, /* No special "default start of a word" for locale. */ locale_at_word_end, /* No special "default end of a word" for locale. */ at_boundary_always, /* No special "grapheme boundary" for locale. */ ascii_is_line_sep, /* Assume locale line separators are same as ASCII. 
*/ ascii_at_line_start, /* Assume locale line separators are same as ASCII. */ ascii_at_line_end, /* Assume locale line separators are same as ASCII. */ locale_possible_turkic, locale_all_cases, locale_simple_case_fold, locale_full_case_fold, locale_all_turkic_i, }; /* Unicode-specific. */ /* Checks whether a Unicode character has a property. */ Py_LOCAL_INLINE(BOOL) unicode_has_property(RE_CODE property, Py_UCS4 ch) { RE_UINT32 prop; RE_UINT32 value; RE_UINT32 v; prop = property >> 16; if (prop >= sizeof(re_get_property) / sizeof(re_get_property[0])) return FALSE; value = property & 0xFFFF; v = re_get_property[prop](ch); if (v == value) return TRUE; if (prop == RE_PROP_GC) { switch (value) { case RE_PROP_ASSIGNED: return v != RE_PROP_CN; case RE_PROP_C: return (RE_PROP_C_MASK & (1 << v)) != 0; case RE_PROP_CASEDLETTER: return v == RE_PROP_LU || v == RE_PROP_LL || v == RE_PROP_LT; case RE_PROP_L: return (RE_PROP_L_MASK & (1 << v)) != 0; case RE_PROP_M: return (RE_PROP_M_MASK & (1 << v)) != 0; case RE_PROP_N: return (RE_PROP_N_MASK & (1 << v)) != 0; case RE_PROP_P: return (RE_PROP_P_MASK & (1 << v)) != 0; case RE_PROP_S: return (RE_PROP_S_MASK & (1 << v)) != 0; case RE_PROP_Z: return (RE_PROP_Z_MASK & (1 << v)) != 0; } } return FALSE; } /* Wrapper for calling 'unicode_has_property' via a pointer. */ static BOOL unicode_has_property_wrapper(RE_LocaleInfo* locale_info, RE_CODE property, Py_UCS4 ch) { return unicode_has_property(property, ch); } /* Checks whether there's a word character to the left. */ Py_LOCAL_INLINE(BOOL) unicode_word_left(RE_State* state, Py_ssize_t text_pos) { return text_pos > 0 && unicode_has_property(RE_PROP_WORD, state->char_at(state->text, text_pos - 1)); } /* Checks whether there's a word character to the right. 
*/ Py_LOCAL_INLINE(BOOL) unicode_word_right(RE_State* state, Py_ssize_t text_pos) { return text_pos < state->text_length && unicode_has_property(RE_PROP_WORD, state->char_at(state->text, text_pos)); } /* Checks whether a position is on a word boundary. */ static BOOL unicode_at_boundary(RE_State* state, Py_ssize_t text_pos) { BOOL left; BOOL right; left = unicode_word_left(state, text_pos); right = unicode_word_right(state, text_pos); return left != right; } /* Checks whether a position is at the start of a word. */ static BOOL unicode_at_word_start(RE_State* state, Py_ssize_t text_pos) { BOOL left; BOOL right; left = unicode_word_left(state, text_pos); right = unicode_word_right(state, text_pos); return !left && right; } /* Checks whether a position is at the end of a word. */ static BOOL unicode_at_word_end(RE_State* state, Py_ssize_t text_pos) { BOOL left; BOOL right; left = unicode_word_left(state, text_pos); right = unicode_word_right(state, text_pos); return left && !right; } /* Checks whether a character is a Unicode vowel. * * Only a limited number are treated as vowels. */ Py_LOCAL_INLINE(BOOL) is_unicode_vowel(Py_UCS4 ch) { switch (Py_UNICODE_TOLOWER((Py_UNICODE)ch)) { case 'a': case 0xE0: case 0xE1: case 0xE2: case 'e': case 0xE8: case 0xE9: case 0xEA: case 'i': case 0xEC: case 0xED: case 0xEE: case 'o': case 0xF2: case 0xF3: case 0xF4: case 'u': case 0xF9: case 0xFA: case 0xFB: return TRUE; default: return FALSE; } } /* Checks whether a position is on a default word boundary. * * The rules are defined here: * http://www.unicode.org/reports/tr29/#Default_Word_Boundaries */ static BOOL unicode_at_default_boundary(RE_State* state, Py_ssize_t text_pos) { Py_UCS4 (*char_at)(void* text, Py_ssize_t pos); int prop; int prop_m1; Py_ssize_t pos_m1; Py_ssize_t pos_m2; int prop_m2; Py_ssize_t pos_p0; int prop_p0; Py_ssize_t pos_p1; int prop_p1; /* Break at the start and end of the text. 
*/ /* WB1 */ if (text_pos <= 0) return TRUE; /* WB2 */ if (text_pos >= state->text_length) return TRUE; char_at = state->char_at; prop = (int)re_get_word_break(char_at(state->text, text_pos)); prop_m1 = (int)re_get_word_break(char_at(state->text, text_pos - 1)); /* Don't break within CRLF. */ /* WB3 */ if (prop_m1 == RE_BREAK_CR && prop == RE_BREAK_LF) return FALSE; /* Otherwise break before and after Newlines (including CR and LF). */ /* WB3a and WB3b */ if (prop_m1 == RE_BREAK_NEWLINE || prop_m1 == RE_BREAK_CR || prop_m1 == RE_BREAK_LF || prop == RE_BREAK_NEWLINE || prop == RE_BREAK_CR || prop == RE_BREAK_LF) return TRUE; /* WB4 */ /* Get the property of the previous character, ignoring Format and Extend * characters. */ pos_m1 = text_pos - 1; prop_m1 = RE_BREAK_OTHER; while (pos_m1 >= 0) { prop_m1 = (int)re_get_word_break(char_at(state->text, pos_m1)); if (prop_m1 != RE_BREAK_EXTEND && prop_m1 != RE_BREAK_FORMAT) break; --pos_m1; } /* Get the property of the preceding character, ignoring Format and Extend * characters. */ pos_m2 = pos_m1 - 1; prop_m2 = RE_BREAK_OTHER; while (pos_m2 >= 0) { prop_m2 = (int)re_get_word_break(char_at(state->text, pos_m2)); if (prop_m2 != RE_BREAK_EXTEND && prop_m2 != RE_BREAK_FORMAT) break; --pos_m2; } /* Get the property of the next character, ignoring Format and Extend * characters. */ pos_p0 = text_pos; prop_p0 = prop; while (pos_p0 < state->text_length) { prop_p0 = (int)re_get_word_break(char_at(state->text, pos_p0)); if (prop_p0 != RE_BREAK_EXTEND && prop_p0 != RE_BREAK_FORMAT) break; ++pos_p0; } /* Get the property of the following character, ignoring Format and Extend * characters. */ pos_p1 = pos_p0 + 1; prop_p1 = RE_BREAK_OTHER; while (pos_p1 < state->text_length) { prop_p1 = (int)re_get_word_break(char_at(state->text, pos_p1)); if (prop_p1 != RE_BREAK_EXTEND && prop_p1 != RE_BREAK_FORMAT) break; ++pos_p1; } /* Don't break between most letters. 
*/ /* WB5 */ if ((prop_m1 == RE_BREAK_ALETTER || prop_m1 == RE_BREAK_HEBREWLETTER) && (prop_p0 == RE_BREAK_ALETTER || prop_p0 == RE_BREAK_HEBREWLETTER)) return FALSE; /* Break between apostrophe and vowels (French, Italian). */ /* WB5a */ if (pos_m1 >= 0 && char_at(state->text, pos_m1) == '\'' && is_unicode_vowel(char_at(state->text, text_pos))) return TRUE; /* Don't break letters across certain punctuation. */ /* WB6 */ if ((prop_m1 == RE_BREAK_ALETTER || prop_m1 == RE_BREAK_HEBREWLETTER) && (prop_p0 == RE_BREAK_MIDLETTER || prop_p0 == RE_BREAK_MIDNUMLET || prop_p0 == RE_BREAK_SINGLEQUOTE) && (prop_p1 == RE_BREAK_ALETTER || prop_p1 == RE_BREAK_HEBREWLETTER)) return FALSE; /* WB7 */ if ((prop_m2 == RE_BREAK_ALETTER || prop_m2 == RE_BREAK_HEBREWLETTER) && (prop_m1 == RE_BREAK_MIDLETTER || prop_m1 == RE_BREAK_MIDNUMLET || prop_m1 == RE_BREAK_SINGLEQUOTE) && (prop_p0 == RE_BREAK_ALETTER || prop_p0 == RE_BREAK_HEBREWLETTER)) return FALSE; /* WB7a */ if (prop_m1 == RE_BREAK_HEBREWLETTER && prop_p0 == RE_BREAK_SINGLEQUOTE) return FALSE; /* WB7b */ if (prop_m1 == RE_BREAK_HEBREWLETTER && prop_p0 == RE_BREAK_DOUBLEQUOTE && prop_p1 == RE_BREAK_HEBREWLETTER) return FALSE; /* WB7c */ if (prop_m2 == RE_BREAK_HEBREWLETTER && prop_m1 == RE_BREAK_DOUBLEQUOTE && prop_p0 == RE_BREAK_HEBREWLETTER) return FALSE; /* Don't break within sequences of digits, or digits adjacent to letters * ("3a", or "A3"). */ /* WB8 */ if (prop_m1 == RE_BREAK_NUMERIC && prop_p0 == RE_BREAK_NUMERIC) return FALSE; /* WB9 */ if ((prop_m1 == RE_BREAK_ALETTER || prop_m1 == RE_BREAK_HEBREWLETTER) && prop_p0 == RE_BREAK_NUMERIC) return FALSE; /* WB10 */ if (prop_m1 == RE_BREAK_NUMERIC && (prop_p0 == RE_BREAK_ALETTER || prop_p0 == RE_BREAK_HEBREWLETTER)) return FALSE; /* Don't break within sequences, such as "3.2" or "3,456.789". 
*/ /* WB11 */ if (prop_m2 == RE_BREAK_NUMERIC && (prop_m1 == RE_BREAK_MIDNUM || prop_m1 == RE_BREAK_MIDNUMLET || prop_m1 == RE_BREAK_SINGLEQUOTE) && prop_p0 == RE_BREAK_NUMERIC) return FALSE; /* WB12 */ if (prop_m1 == RE_BREAK_NUMERIC && (prop_p0 == RE_BREAK_MIDNUM || prop_p0 == RE_BREAK_MIDNUMLET || prop_p0 == RE_BREAK_SINGLEQUOTE) && prop_p1 == RE_BREAK_NUMERIC) return FALSE; /* Don't break between Katakana. */ /* WB13 */ if (prop_m1 == RE_BREAK_KATAKANA && prop_p0 == RE_BREAK_KATAKANA) return FALSE; /* Don't break from extenders. */ /* WB13a */ if ((prop_m1 == RE_BREAK_ALETTER || prop_m1 == RE_BREAK_HEBREWLETTER || prop_m1 == RE_BREAK_NUMERIC || prop_m1 == RE_BREAK_KATAKANA || prop_m1 == RE_BREAK_EXTENDNUMLET) && prop_p0 == RE_BREAK_EXTENDNUMLET) return FALSE; /* WB13b */ if (prop_m1 == RE_BREAK_EXTENDNUMLET && (prop_p0 == RE_BREAK_ALETTER || prop_p0 == RE_BREAK_HEBREWLETTER || prop_p0 == RE_BREAK_NUMERIC || prop_p0 == RE_BREAK_KATAKANA)) return FALSE; /* Don't break between regional indicator symbols. */ /* WB13c */ if (prop_m1 == RE_BREAK_REGIONALINDICATOR && prop_p0 == RE_BREAK_REGIONALINDICATOR) return FALSE; /* Otherwise, break everywhere (including around ideographs). */ /* WB14 */ return TRUE; } /* Checks whether a position is at the start/end of a word. */ Py_LOCAL_INLINE(BOOL) unicode_at_default_word_start_or_end(RE_State* state, Py_ssize_t text_pos, BOOL at_start) { Py_UCS4 (*char_at)(void* text, Py_ssize_t pos); BOOL before; BOOL after; Py_UCS4 char_0; Py_UCS4 char_m1; int prop; int prop_m1; Py_ssize_t pos_m1; Py_ssize_t pos_p1; int prop_p1; Py_UCS4 char_p1; Py_ssize_t pos_m2; int prop_m2; Py_UCS4 char_m2; char_at = state->char_at; /* At the start or end of the text. 
*/ if (text_pos <= 0 || text_pos >= state->text_length) { before = unicode_word_left(state, text_pos); after = unicode_word_right(state, text_pos); return before != at_start && after == at_start; } char_0 = char_at(state->text, text_pos); char_m1 = char_at(state->text, text_pos - 1); prop = (int)re_get_word_break(char_0); prop_m1 = (int)re_get_word_break(char_m1); /* No break within CRLF. */ if (prop_m1 == RE_BREAK_CR && prop == RE_BREAK_LF) return FALSE; /* Break before and after Newlines (including CR and LF). */ if (prop_m1 == RE_BREAK_NEWLINE || prop_m1 == RE_BREAK_CR || prop_m1 == RE_BREAK_LF || prop == RE_BREAK_NEWLINE || prop == RE_BREAK_CR || prop == RE_BREAK_LF) { before = unicode_has_property(RE_PROP_WORD, char_m1); after = unicode_has_property(RE_PROP_WORD, char_0); return before != at_start && after == at_start; } /* No break just before Format or Extend characters. */ if (prop == RE_BREAK_EXTEND || prop == RE_BREAK_FORMAT) return FALSE; /* Get the property of the previous character. */ pos_m1 = text_pos - 1; prop_m1 = RE_BREAK_OTHER; while (pos_m1 >= 0) { char_m1 = char_at(state->text, pos_m1); prop_m1 = (int)re_get_word_break(char_m1); if (prop_m1 != RE_BREAK_EXTEND && prop_m1 != RE_BREAK_FORMAT) break; --pos_m1; } /* No break between most letters. */ if (prop_m1 == RE_BREAK_ALETTER && prop == RE_BREAK_ALETTER) return FALSE; if (pos_m1 >= 0 && char_m1 == '\'' && is_unicode_vowel(char_0)) return TRUE; pos_p1 = text_pos + 1; prop_p1 = RE_BREAK_OTHER; while (pos_p1 < state->text_length) { char_p1 = char_at(state->text, pos_p1); prop_p1 = (int)re_get_word_break(char_p1); if (prop_p1 != RE_BREAK_EXTEND && prop_p1 != RE_BREAK_FORMAT) break; ++pos_p1; } /* No break letters across certain punctuation. 
*/
    if (prop_m1 == RE_BREAK_ALETTER && (prop == RE_BREAK_MIDLETTER || prop ==
      RE_BREAK_MIDNUMLET) && prop_p1 == RE_BREAK_ALETTER)
        return FALSE;

    /* Get the property of the character before the previous one, ignoring
     * Format and Extend characters.
     */
    pos_m2 = pos_m1 - 1;
    prop_m2 = RE_BREAK_OTHER;
    while (pos_m2 >= 0) {
        char_m2 = char_at(state->text, pos_m2);
        prop_m2 = (int)re_get_word_break(char_m2);
        if (prop_m2 != RE_BREAK_EXTEND && prop_m2 != RE_BREAK_FORMAT)
            break;

        --pos_m2;
    }

    if (prop_m2 == RE_BREAK_ALETTER && (prop_m1 == RE_BREAK_MIDLETTER ||
      prop_m1 == RE_BREAK_MIDNUMLET) && prop == RE_BREAK_ALETTER)
        return FALSE;

    /* No break within sequences of digits, or digits adjacent to letters
     * ("3a", or "A3").
     */
    if ((prop_m1 == RE_BREAK_NUMERIC || prop_m1 == RE_BREAK_ALETTER) && prop ==
      RE_BREAK_NUMERIC)
        return FALSE;

    if (prop_m1 == RE_BREAK_NUMERIC && prop == RE_BREAK_ALETTER)
        return FALSE;

    /* No break within sequences, such as "3.2" or "3,456.789". */
    if (prop_m2 == RE_BREAK_NUMERIC && (prop_m1 == RE_BREAK_MIDNUM || prop_m1
      == RE_BREAK_MIDNUMLET) && prop == RE_BREAK_NUMERIC)
        return FALSE;

    if (prop_m1 == RE_BREAK_NUMERIC && (prop == RE_BREAK_MIDNUM || prop ==
      RE_BREAK_MIDNUMLET) && prop_p1 == RE_BREAK_NUMERIC)
        return FALSE;

    /* No break between Katakana. */
    if (prop_m1 == RE_BREAK_KATAKANA && prop == RE_BREAK_KATAKANA)
        return FALSE;

    /* No break from extenders. */
    if ((prop_m1 == RE_BREAK_ALETTER || prop_m1 == RE_BREAK_NUMERIC || prop_m1
      == RE_BREAK_KATAKANA || prop_m1 == RE_BREAK_EXTENDNUMLET) && prop ==
      RE_BREAK_EXTENDNUMLET)
        return FALSE;

    if (prop_m1 == RE_BREAK_EXTENDNUMLET && (prop == RE_BREAK_ALETTER || prop
      == RE_BREAK_NUMERIC || prop == RE_BREAK_KATAKANA))
        return FALSE;

    /* Otherwise, break everywhere (including around ideographs). */
    before = unicode_has_property(RE_PROP_WORD, char_m1);
    after = unicode_has_property(RE_PROP_WORD, char_0);

    return before != at_start && after == at_start;
}

/* Checks whether a position is at the start of a word.
*/ static BOOL unicode_at_default_word_start(RE_State* state, Py_ssize_t text_pos) { return unicode_at_default_word_start_or_end(state, text_pos, TRUE); } /* Checks whether a position is at the end of a word. */ static BOOL unicode_at_default_word_end(RE_State* state, Py_ssize_t text_pos) { return unicode_at_default_word_start_or_end(state, text_pos, FALSE); } /* Checks whether a position is on a grapheme boundary. * * The rules are defined here: * http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries */ static BOOL unicode_at_grapheme_boundary(RE_State* state, Py_ssize_t text_pos) { Py_UCS4 (*char_at)(void* text, Py_ssize_t pos); int prop; int prop_m1; /* Break at the start and end of the text. */ /* GB1 */ if (text_pos <= 0) return TRUE; /* GB2 */ if (text_pos >= state->text_length) return TRUE; char_at = state->char_at; prop = (int)re_get_grapheme_cluster_break(char_at(state->text, text_pos)); prop_m1 = (int)re_get_grapheme_cluster_break(char_at(state->text, text_pos - 1)); /* Don't break within CRLF. */ /* GB3 */ if (prop_m1 == RE_GBREAK_CR && prop == RE_GBREAK_LF) return FALSE; /* Otherwise break before and after controls (including CR and LF). */ /* GB4 and GB5 */ if (prop_m1 == RE_GBREAK_CONTROL || prop_m1 == RE_GBREAK_CR || prop_m1 == RE_GBREAK_LF || prop == RE_GBREAK_CONTROL || prop == RE_GBREAK_CR || prop == RE_GBREAK_LF) return TRUE; /* Don't break Hangul syllable sequences. */ /* GB6 */ if (prop_m1 == RE_GBREAK_L && (prop == RE_GBREAK_L || prop == RE_GBREAK_V || prop == RE_GBREAK_LV || prop == RE_GBREAK_LVT)) return FALSE; /* GB7 */ if ((prop_m1 == RE_GBREAK_LV || prop_m1 == RE_GBREAK_V) && (prop == RE_GBREAK_V || prop == RE_GBREAK_T)) return FALSE; /* GB8 */ if ((prop_m1 == RE_GBREAK_LVT || prop_m1 == RE_GBREAK_T) && (prop == RE_GBREAK_T)) return FALSE; /* Don't break between regional indicator symbols. 
*/ /* GB8a */ if (prop_m1 == RE_GBREAK_REGIONALINDICATOR && prop == RE_GBREAK_REGIONALINDICATOR) return FALSE; /* Don't break just before Extend characters. */ /* GB9 */ if (prop == RE_GBREAK_EXTEND) return FALSE; /* Don't break before SpacingMarks, or after Prepend characters. */ /* GB9a */ if (prop == RE_GBREAK_SPACINGMARK) return FALSE; /* GB9b */ if (prop_m1 == RE_GBREAK_PREPEND) return FALSE; /* Otherwise, break everywhere. */ /* GB10 */ return TRUE; } /* Checks whether a character is a line separator. */ static BOOL unicode_is_line_sep(Py_UCS4 ch) { return (0x0A <= ch && ch <= 0x0D) || ch == 0x85 || ch == 0x2028 || ch == 0x2029; } /* Checks whether a position is at the start of a line. */ static BOOL unicode_at_line_start(RE_State* state, Py_ssize_t text_pos) { Py_UCS4 ch; if (text_pos <= 0) return TRUE; ch = state->char_at(state->text, text_pos - 1); if (ch == 0x0D) { if (text_pos >= state->text_length) return TRUE; /* No line break inside CRLF. */ return state->char_at(state->text, text_pos) != 0x0A; } return (0x0A <= ch && ch <= 0x0D) || ch == 0x85 || ch == 0x2028 || ch == 0x2029; } /* Checks whether a position is at the end of a line. */ static BOOL unicode_at_line_end(RE_State* state, Py_ssize_t text_pos) { Py_UCS4 ch; if (text_pos >= state->text_length) return TRUE; ch = state->char_at(state->text, text_pos); if (ch == 0x0A) { if (text_pos <= 0) return TRUE; /* No line break inside CRLF. */ return state->char_at(state->text, text_pos - 1) != 0x0D; } return (0x0A <= ch && ch <= 0x0D) || ch == 0x85 || ch == 0x2028 || ch == 0x2029; } /* Checks whether a character could be Turkic (variants of I/i). */ static BOOL unicode_possible_turkic(RE_LocaleInfo* locale_info, Py_UCS4 ch) { return ch == 'I' || ch == 'i' || ch == 0x0130 || ch == 0x0131; } /* Gets all the cases of a character. 
*/ static int unicode_all_cases(RE_LocaleInfo* locale_info, Py_UCS4 ch, Py_UCS4* codepoints) { return re_get_all_cases(ch, codepoints); } /* Returns a character with its case folded, unless it could be Turkic * (variants of I/i). */ static Py_UCS4 unicode_simple_case_fold(RE_LocaleInfo* locale_info, Py_UCS4 ch) { /* Is it a possible Turkic character? If so, pass it through unchanged. */ if (ch == 'I' || ch == 'i' || ch == 0x0130 || ch == 0x0131) return ch; return (Py_UCS4)re_get_simple_case_folding(ch); } /* Returns a character with its case folded, unless it could be Turkic * (variants of I/i). */ static int unicode_full_case_fold(RE_LocaleInfo* locale_info, Py_UCS4 ch, Py_UCS4* folded) { /* Is it a possible Turkic character? If so, pass it through unchanged. */ if (ch == 'I' || ch == 'i' || ch == 0x0130 || ch == 0x0131) { folded[0] = ch; return 1; } return re_get_full_case_folding(ch, folded); } /* Gets all the case variants of Turkic 'I'. */ static int unicode_all_turkic_i(RE_LocaleInfo* locale_info, Py_UCS4 ch, Py_UCS4* cases) { int count; count = 0; cases[count++] = ch; if (ch != 'I') cases[count++] = 'I'; if (ch != 'i') cases[count++] = 'i'; if (ch != 0x130) cases[count++] = 0x130; if (ch != 0x131) cases[count++] = 0x131; return count; } /* The handlers for Unicode characters. */ static RE_EncodingTable unicode_encoding = { unicode_has_property_wrapper, unicode_at_boundary, unicode_at_word_start, unicode_at_word_end, unicode_at_default_boundary, unicode_at_default_word_start, unicode_at_default_word_end, unicode_at_grapheme_boundary, unicode_is_line_sep, unicode_at_line_start, unicode_at_line_end, unicode_possible_turkic, unicode_all_cases, unicode_simple_case_fold, unicode_full_case_fold, unicode_all_turkic_i, }; Py_LOCAL_INLINE(PyObject*) get_object(char* module_name, char* object_name); /* Sets the error message. 
*/ Py_LOCAL_INLINE(void) set_error(int status, PyObject* object) { TRACE(("<<set_error>>\n")) if (!error_exception) error_exception = get_object("_" RE_MODULE "_core", "error"); switch (status) { case RE_ERROR_BACKTRACKING: PyErr_SetString(error_exception, "too much backtracking"); break; case RE_ERROR_CONCURRENT: PyErr_SetString(PyExc_ValueError, "concurrent not int or None"); break; case RE_ERROR_GROUP_INDEX_TYPE: if (object) PyErr_Format(PyExc_TypeError, "group indices must be integers or strings, not %.200s", object->ob_type->tp_name); else PyErr_Format(PyExc_TypeError, "group indices must be integers or strings"); break; case RE_ERROR_ILLEGAL: PyErr_SetString(PyExc_RuntimeError, "invalid RE code"); break; case RE_ERROR_INDEX: PyErr_SetString(PyExc_TypeError, "string indices must be integers"); break; case RE_ERROR_INTERRUPTED: /* An exception has already been raised, so let it fly. */ break; case RE_ERROR_INVALID_GROUP_REF: PyErr_SetString(error_exception, "invalid group reference"); break; case RE_ERROR_MEMORY: PyErr_NoMemory(); break; case RE_ERROR_NOT_STRING: PyErr_Format(PyExc_TypeError, "expected string instance, %.200s found", object->ob_type->tp_name); break; case RE_ERROR_NOT_UNICODE: PyErr_Format(PyExc_TypeError, "expected unicode instance, not %.200s", object->ob_type->tp_name); break; case RE_ERROR_NO_SUCH_GROUP: PyErr_SetString(PyExc_IndexError, "no such group"); break; case RE_ERROR_REPLACEMENT: PyErr_SetString(error_exception, "invalid replacement"); break; default: /* Other error codes indicate compiler/engine bugs. */ PyErr_SetString(PyExc_RuntimeError, "internal error in regular expression engine"); break; } } /* Allocates memory. * * Sets the Python error handler and returns NULL if the allocation fails. */ Py_LOCAL_INLINE(void*) re_alloc(size_t size) { void* new_ptr; new_ptr = PyMem_Malloc(size); if (!new_ptr) set_error(RE_ERROR_MEMORY, NULL); return new_ptr; } /* Reallocates memory.
* * Sets the Python error handler and returns NULL if the reallocation fails. */ Py_LOCAL_INLINE(void*) re_realloc(void* ptr, size_t size) { void* new_ptr; new_ptr = PyMem_Realloc(ptr, size); if (!new_ptr) set_error(RE_ERROR_MEMORY, NULL); return new_ptr; } /* Deallocates memory. */ Py_LOCAL_INLINE(void) re_dealloc(void* ptr) { PyMem_Free(ptr); } /* Releases the GIL if multithreading is enabled. */ Py_LOCAL_INLINE(void) release_GIL(RE_SafeState* safe_state) { if (safe_state->re_state->is_multithreaded) safe_state->thread_state = PyEval_SaveThread(); } /* Acquires the GIL if multithreading is enabled. */ Py_LOCAL_INLINE(void) acquire_GIL(RE_SafeState* safe_state) { if (safe_state->re_state->is_multithreaded) PyEval_RestoreThread(safe_state->thread_state); } /* Allocates memory, holding the GIL during the allocation. * * Sets the Python error handler and returns NULL if the allocation fails. */ Py_LOCAL_INLINE(void*) safe_alloc(RE_SafeState* safe_state, size_t size) { void* new_ptr; acquire_GIL(safe_state); new_ptr = re_alloc(size); release_GIL(safe_state); return new_ptr; } /* Reallocates memory, holding the GIL during the reallocation. * * Sets the Python error handler and returns NULL if the reallocation fails. */ Py_LOCAL_INLINE(void*) safe_realloc(RE_SafeState* safe_state, void* ptr, size_t size) { void* new_ptr; acquire_GIL(safe_state); new_ptr = re_realloc(ptr, size); release_GIL(safe_state); return new_ptr; } /* Deallocates memory, holding the GIL during the deallocation. */ Py_LOCAL_INLINE(void) safe_dealloc(RE_SafeState* safe_state, void* ptr) { acquire_GIL(safe_state); re_dealloc(ptr); release_GIL(safe_state); } /* Checks for KeyboardInterrupt, holding the GIL during the check. */ Py_LOCAL_INLINE(BOOL) safe_check_signals(RE_SafeState* safe_state) { BOOL result; acquire_GIL(safe_state); result = (BOOL)PyErr_CheckSignals(); release_GIL(safe_state); return result; } /* Checks whether a character is in a range. 
*/ Py_LOCAL_INLINE(BOOL) in_range(Py_UCS4 lower, Py_UCS4 upper, Py_UCS4 ch) { return lower <= ch && ch <= upper; } /* Checks whether a character is in a range, ignoring case. */ Py_LOCAL_INLINE(BOOL) in_range_ign(RE_EncodingTable* encoding, RE_LocaleInfo* locale_info, Py_UCS4 lower, Py_UCS4 upper, Py_UCS4 ch) { int count; Py_UCS4 cases[RE_MAX_CASES]; int i; count = encoding->all_cases(locale_info, ch, cases); for (i = 0; i < count; i++) { if (in_range(lower, upper, cases[i])) return TRUE; } return FALSE; } /* Checks whether 2 characters are the same. */ Py_LOCAL_INLINE(BOOL) same_char(Py_UCS4 ch1, Py_UCS4 ch2) { return ch1 == ch2; } /* Wrapper for calling 'same_char' via a pointer. */ static BOOL same_char_wrapper(RE_EncodingTable* encoding, RE_LocaleInfo* locale_info, Py_UCS4 ch1, Py_UCS4 ch2) { return same_char(ch1, ch2); } /* Checks whether 2 characters are the same, ignoring case. */ Py_LOCAL_INLINE(BOOL) same_char_ign(RE_EncodingTable* encoding, RE_LocaleInfo* locale_info, Py_UCS4 ch1, Py_UCS4 ch2) { int count; Py_UCS4 cases[RE_MAX_CASES]; int i; if (ch1 == ch2) return TRUE; count = encoding->all_cases(locale_info, ch1, cases); for (i = 1; i < count; i++) { if (cases[i] == ch2) return TRUE; } return FALSE; } /* Wrapper for calling 'same_char_ign' via a pointer. */ static BOOL same_char_ign_wrapper(RE_EncodingTable* encoding, RE_LocaleInfo* locale_info, Py_UCS4 ch1, Py_UCS4 ch2) { return same_char_ign(encoding, locale_info, ch1, ch2); } /* Checks whether a character is anything except a newline. */ Py_LOCAL_INLINE(BOOL) matches_ANY(RE_EncodingTable* encoding, RE_Node* node, Py_UCS4 ch) { return ch != '\n'; } /* Checks whether a character is anything except a line separator. */ Py_LOCAL_INLINE(BOOL) matches_ANY_U(RE_EncodingTable* encoding, RE_Node* node, Py_UCS4 ch) { return !encoding->is_line_sep(ch); } /* Checks whether 2 characters are the same.
*/ Py_LOCAL_INLINE(BOOL) matches_CHARACTER(RE_EncodingTable* encoding, RE_LocaleInfo* locale_info, RE_Node* node, Py_UCS4 ch) { return same_char(node->values[0], ch); } /* Checks whether 2 characters are the same, ignoring case. */ Py_LOCAL_INLINE(BOOL) matches_CHARACTER_IGN(RE_EncodingTable* encoding, RE_LocaleInfo* locale_info, RE_Node* node, Py_UCS4 ch) { return same_char_ign(encoding, locale_info, node->values[0], ch); } /* Checks whether a character has a property. */ Py_LOCAL_INLINE(BOOL) matches_PROPERTY(RE_EncodingTable* encoding, RE_LocaleInfo* locale_info, RE_Node* node, Py_UCS4 ch) { return encoding->has_property(locale_info, node->values[0], ch); } /* Checks whether a character has a property, ignoring case. */ Py_LOCAL_INLINE(BOOL) matches_PROPERTY_IGN(RE_EncodingTable* encoding, RE_LocaleInfo* locale_info, RE_Node* node, Py_UCS4 ch) { RE_UINT32 property; RE_UINT32 prop; property = node->values[0]; prop = property >> 16; /* We need to do special handling of case-sensitive properties according to * the 'encoding'. */ if (encoding == &unicode_encoding) { /* We are working with Unicode. */ if (property == RE_PROP_GC_LU || property == RE_PROP_GC_LL || property == RE_PROP_GC_LT) { RE_UINT32 value; value = re_get_general_category(ch); return value == RE_PROP_LU || value == RE_PROP_LL || value == RE_PROP_LT; } else if (prop == RE_PROP_UPPERCASE || prop == RE_PROP_LOWERCASE) return (BOOL)re_get_cased(ch); /* The property is case-insensitive. */ return unicode_has_property(property, ch); } else if (encoding == &ascii_encoding) { /* We are working with ASCII. */ if (property == RE_PROP_GC_LU || property == RE_PROP_GC_LL || property == RE_PROP_GC_LT) { RE_UINT32 value; value = re_get_general_category(ch); return value == RE_PROP_LU || value == RE_PROP_LL || value == RE_PROP_LT; } else if (prop == RE_PROP_UPPERCASE || prop == RE_PROP_LOWERCASE) return (BOOL)re_get_cased(ch); /* The property is case-insensitive. 
*/ return ascii_has_property(property, ch); } else { /* We are working with Locale. */ if (property == RE_PROP_GC_LU || property == RE_PROP_GC_LL || property == RE_PROP_GC_LT) return locale_isupper(locale_info, ch) || locale_islower(locale_info, ch); else if (prop == RE_PROP_UPPERCASE || prop == RE_PROP_LOWERCASE) return locale_isupper(locale_info, ch) || locale_islower(locale_info, ch); /* The property is case-insensitive. */ return locale_has_property(locale_info, property, ch); } } /* Checks whether a character is in a range. */ Py_LOCAL_INLINE(BOOL) matches_RANGE(RE_EncodingTable* encoding, RE_LocaleInfo* locale_info, RE_Node* node, Py_UCS4 ch) { return in_range(node->values[0], node->values[1], ch); } /* Checks whether a character is in a range, ignoring case. */ Py_LOCAL_INLINE(BOOL) matches_RANGE_IGN(RE_EncodingTable* encoding, RE_LocaleInfo* locale_info, RE_Node* node, Py_UCS4 ch) { return in_range_ign(encoding, locale_info, node->values[0], node->values[1], ch); } Py_LOCAL_INLINE(BOOL) in_set_diff(RE_EncodingTable* encoding, RE_LocaleInfo* locale_info, RE_Node* node, Py_UCS4 ch); Py_LOCAL_INLINE(BOOL) in_set_inter(RE_EncodingTable* encoding, RE_LocaleInfo* locale_info, RE_Node* node, Py_UCS4 ch); Py_LOCAL_INLINE(BOOL) in_set_sym_diff(RE_EncodingTable* encoding, RE_LocaleInfo* locale_info, RE_Node* node, Py_UCS4 ch); Py_LOCAL_INLINE(BOOL) in_set_union(RE_EncodingTable* encoding, RE_LocaleInfo* locale_info, RE_Node* node, Py_UCS4 ch); /* Checks whether a character matches a set member. 
*/ Py_LOCAL_INLINE(BOOL) matches_member(RE_EncodingTable* encoding, RE_LocaleInfo* locale_info, RE_Node* member, Py_UCS4 ch) { switch (member->op) { case RE_OP_CHARACTER: /* values are: char_code */ TRACE(("%s %d %d\n", re_op_text[member->op], member->match, member->values[0])) return ch == member->values[0]; case RE_OP_PROPERTY: /* values are: property */ TRACE(("%s %d %d\n", re_op_text[member->op], member->match, member->values[0])) return encoding->has_property(locale_info, member->values[0], ch); case RE_OP_RANGE: /* values are: lower, upper */ TRACE(("%s %d %d %d\n", re_op_text[member->op], member->match, member->values[0], member->values[1])) return in_range(member->values[0], member->values[1], ch); case RE_OP_SET_DIFF: TRACE(("%s\n", re_op_text[member->op])) return in_set_diff(encoding, locale_info, member, ch); case RE_OP_SET_INTER: TRACE(("%s\n", re_op_text[member->op])) return in_set_inter(encoding, locale_info, member, ch); case RE_OP_SET_SYM_DIFF: TRACE(("%s\n", re_op_text[member->op])) return in_set_sym_diff(encoding, locale_info, member, ch); case RE_OP_SET_UNION: TRACE(("%s\n", re_op_text[member->op])) return in_set_union(encoding, locale_info, member, ch); case RE_OP_STRING: { /* values are: char_code, char_code, ... */ size_t i; TRACE(("%s %d %d\n", re_op_text[member->op], member->match, member->value_count)) for (i = 0; i < member->value_count; i++) { if (ch == member->values[i]) return TRUE; } return FALSE; } default: return FALSE; } } /* Checks whether a character matches a set member, ignoring case. 
*/ Py_LOCAL_INLINE(BOOL) matches_member_ign(RE_EncodingTable* encoding, RE_LocaleInfo* locale_info, RE_Node* member, int case_count, Py_UCS4* cases) { int i; for (i = 0; i < case_count; i++) { switch (member->op) { case RE_OP_CHARACTER: /* values are: char_code */ TRACE(("%s %d %d\n", re_op_text[member->op], member->match, member->values[0])) if (cases[i] == member->values[0]) return TRUE; break; case RE_OP_PROPERTY: /* values are: property */ TRACE(("%s %d %d\n", re_op_text[member->op], member->match, member->values[0])) if (encoding->has_property(locale_info, member->values[0], cases[i])) return TRUE; break; case RE_OP_RANGE: /* values are: lower, upper */ TRACE(("%s %d %d %d\n", re_op_text[member->op], member->match, member->values[0], member->values[1])) if (in_range(member->values[0], member->values[1], cases[i])) return TRUE; break; case RE_OP_SET_DIFF: TRACE(("%s\n", re_op_text[member->op])) if (in_set_diff(encoding, locale_info, member, cases[i])) return TRUE; break; case RE_OP_SET_INTER: TRACE(("%s\n", re_op_text[member->op])) if (in_set_inter(encoding, locale_info, member, cases[i])) return TRUE; break; case RE_OP_SET_SYM_DIFF: TRACE(("%s\n", re_op_text[member->op])) if (in_set_sym_diff(encoding, locale_info, member, cases[i])) return TRUE; break; case RE_OP_SET_UNION: TRACE(("%s\n", re_op_text[member->op])) if (in_set_union(encoding, locale_info, member, cases[i])) return TRUE; break; case RE_OP_STRING: { size_t j; TRACE(("%s %d %d\n", re_op_text[member->op], member->match, member->value_count)) for (j = 0; j < member->value_count; j++) { if (cases[i] == member->values[j]) return TRUE; } break; } default: return TRUE; } } return FALSE; } /* Checks whether a character is in a set difference. 
*/ Py_LOCAL_INLINE(BOOL) in_set_diff(RE_EncodingTable* encoding, RE_LocaleInfo* locale_info, RE_Node* node, Py_UCS4 ch) { RE_Node* member; member = node->nonstring.next_2.node; if (matches_member(encoding, locale_info, member, ch) != member->match) return FALSE; member = member->next_1.node; while (member) { if (matches_member(encoding, locale_info, member, ch) == member->match) return FALSE; member = member->next_1.node; } return TRUE; } /* Checks whether a character is in a set difference, ignoring case. */ Py_LOCAL_INLINE(BOOL) in_set_diff_ign(RE_EncodingTable* encoding, RE_LocaleInfo* locale_info, RE_Node* node, int case_count, Py_UCS4* cases) { RE_Node* member; member = node->nonstring.next_2.node; if (matches_member_ign(encoding, locale_info, member, case_count, cases) != member->match) return FALSE; member = member->next_1.node; while (member) { if (matches_member_ign(encoding, locale_info, member, case_count, cases) == member->match) return FALSE; member = member->next_1.node; } return TRUE; } /* Checks whether a character is in a set intersection. */ Py_LOCAL_INLINE(BOOL) in_set_inter(RE_EncodingTable* encoding, RE_LocaleInfo* locale_info, RE_Node* node, Py_UCS4 ch) { RE_Node* member; member = node->nonstring.next_2.node; while (member) { if (matches_member(encoding, locale_info, member, ch) != member->match) return FALSE; member = member->next_1.node; } return TRUE; } /* Checks whether a character is in a set intersection, ignoring case. */ Py_LOCAL_INLINE(BOOL) in_set_inter_ign(RE_EncodingTable* encoding, RE_LocaleInfo* locale_info, RE_Node* node, int case_count, Py_UCS4* cases) { RE_Node* member; member = node->nonstring.next_2.node; while (member) { if (matches_member_ign(encoding, locale_info, member, case_count, cases) != member->match) return FALSE; member = member->next_1.node; } return TRUE; } /* Checks whether a character is in a set symmetric difference. 
*/ Py_LOCAL_INLINE(BOOL) in_set_sym_diff(RE_EncodingTable* encoding, RE_LocaleInfo* locale_info, RE_Node* node, Py_UCS4 ch) { RE_Node* member; BOOL result; member = node->nonstring.next_2.node; result = FALSE; while (member) { if (matches_member(encoding, locale_info, member, ch) == member->match) result = !result; member = member->next_1.node; } return result; } /* Checks whether a character is in a set symmetric difference, ignoring case. */ Py_LOCAL_INLINE(BOOL) in_set_sym_diff_ign(RE_EncodingTable* encoding, RE_LocaleInfo* locale_info, RE_Node* node, int case_count, Py_UCS4* cases) { RE_Node* member; BOOL result; member = node->nonstring.next_2.node; result = FALSE; while (member) { if (matches_member_ign(encoding, locale_info, member, case_count, cases) == member->match) result = !result; member = member->next_1.node; } return result; } /* Checks whether a character is in a set union. */ Py_LOCAL_INLINE(BOOL) in_set_union(RE_EncodingTable* encoding, RE_LocaleInfo* locale_info, RE_Node* node, Py_UCS4 ch) { RE_Node* member; member = node->nonstring.next_2.node; while (member) { if (matches_member(encoding, locale_info, member, ch) == member->match) return TRUE; member = member->next_1.node; } return FALSE; } /* Checks whether a character is in a set union, ignoring case. */ Py_LOCAL_INLINE(BOOL) in_set_union_ign(RE_EncodingTable* encoding, RE_LocaleInfo* locale_info, RE_Node* node, int case_count, Py_UCS4* cases) { RE_Node* member; member = node->nonstring.next_2.node; while (member) { if (matches_member_ign(encoding, locale_info, member, case_count, cases) == member->match) return TRUE; member = member->next_1.node; } return FALSE; } /* Checks whether a character is in a set. 
*/ Py_LOCAL_INLINE(BOOL) matches_SET(RE_EncodingTable* encoding, RE_LocaleInfo* locale_info, RE_Node* node, Py_UCS4 ch) { switch (node->op) { case RE_OP_SET_DIFF: case RE_OP_SET_DIFF_REV: return in_set_diff(encoding, locale_info, node, ch); case RE_OP_SET_INTER: case RE_OP_SET_INTER_REV: return in_set_inter(encoding, locale_info, node, ch); case RE_OP_SET_SYM_DIFF: case RE_OP_SET_SYM_DIFF_REV: return in_set_sym_diff(encoding, locale_info, node, ch); case RE_OP_SET_UNION: case RE_OP_SET_UNION_REV: return in_set_union(encoding, locale_info, node, ch); } return FALSE; } /* Checks whether a character is in a set, ignoring case. */ Py_LOCAL_INLINE(BOOL) matches_SET_IGN(RE_EncodingTable* encoding, RE_LocaleInfo* locale_info, RE_Node* node, Py_UCS4 ch) { Py_UCS4 cases[RE_MAX_CASES]; int case_count; case_count = encoding->all_cases(locale_info, ch, cases); switch (node->op) { case RE_OP_SET_DIFF_IGN: case RE_OP_SET_DIFF_IGN_REV: return in_set_diff_ign(encoding, locale_info, node, case_count, cases); case RE_OP_SET_INTER_IGN: case RE_OP_SET_INTER_IGN_REV: return in_set_inter_ign(encoding, locale_info, node, case_count, cases); case RE_OP_SET_SYM_DIFF_IGN: case RE_OP_SET_SYM_DIFF_IGN_REV: return in_set_sym_diff_ign(encoding, locale_info, node, case_count, cases); case RE_OP_SET_UNION_IGN: case RE_OP_SET_UNION_IGN_REV: return in_set_union_ign(encoding, locale_info, node, case_count, cases); } return FALSE; } /* Resets a guard list. */ Py_LOCAL_INLINE(void) reset_guard_list(RE_GuardList* guard_list) { guard_list->count = 0; guard_list->last_text_pos = -1; } /* Clears the groups. */ Py_LOCAL_INLINE(void) clear_groups(RE_State* state) { size_t i; for (i = 0; i < state->pattern->true_group_count; i++) { RE_GroupData* group; group = &state->groups[i]; group->span.start = -1; group->span.end = -1; group->capture_count = 0; group->current_capture = -1; } } /* Initialises the state for a match. 
*/ Py_LOCAL_INLINE(void) init_match(RE_State* state) { RE_AtomicBlock* current; size_t i; /* Reset the backtrack. */ state->current_backtrack_block = &state->backtrack_block; state->current_backtrack_block->count = 0; state->current_saved_groups = state->first_saved_groups; state->backtrack = NULL; state->search_anchor = state->text_pos; state->match_pos = state->text_pos; /* Reset the atomic stack. */ current = state->current_atomic_block; if (current) { while (current->previous) current = current->previous; state->current_atomic_block = current; state->current_atomic_block->count = 0; } /* Reset the guards for the repeats. */ for (i = 0; i < state->pattern->repeat_count; i++) { reset_guard_list(&state->repeats[i].body_guard_list); reset_guard_list(&state->repeats[i].tail_guard_list); } /* Reset the guards for the fuzzy sections. */ for (i = 0; i < state->pattern->fuzzy_count; i++) { reset_guard_list(&state->fuzzy_guards[i].body_guard_list); reset_guard_list(&state->fuzzy_guards[i].tail_guard_list); } /* Clear the groups. */ clear_groups(state); /* Reset the guards for the group calls. */ for (i = 0; i < state->pattern->call_ref_info_count; i++) reset_guard_list(&state->group_call_guard_list[i]); /* Clear the counts and cost for matching. */ if (state->pattern->is_fuzzy) { memset(state->fuzzy_info.counts, 0, sizeof(state->fuzzy_info.counts)); memset(state->total_fuzzy_counts, 0, sizeof(state->total_fuzzy_counts)); } state->fuzzy_info.total_cost = 0; state->total_errors = 0; state->too_few_errors = FALSE; state->found_match = FALSE; state->capture_change = 0; state->iterations = 0; } /* Adds a new backtrack entry. */ Py_LOCAL_INLINE(BOOL) add_backtrack(RE_SafeState* safe_state, RE_UINT8 op) { RE_State* state; RE_BacktrackBlock* current; state = safe_state->re_state; current = state->current_backtrack_block; if (current->count >= current->capacity) { if (!current->next) { RE_BacktrackBlock* next; /* Is there too much backtracking? 
*/ if (state->backtrack_allocated >= RE_MAX_BACKTRACK_ALLOC) return FALSE; next = (RE_BacktrackBlock*)safe_alloc(safe_state, sizeof(RE_BacktrackBlock)); if (!next) return FALSE; next->previous = current; next->next = NULL; next->capacity = RE_BACKTRACK_BLOCK_SIZE; current->next = next; state->backtrack_allocated += RE_BACKTRACK_BLOCK_SIZE; } current = current->next; current->count = 0; state->current_backtrack_block = current; } state->backtrack = &current->items[current->count++]; state->backtrack->op = op; return TRUE; } /* Gets the last backtrack entry. * * It'll never be called when there are _no_ entries. */ Py_LOCAL_INLINE(RE_BacktrackData*) last_backtrack(RE_State* state) { RE_BacktrackBlock* current; current = state->current_backtrack_block; state->backtrack = &current->items[current->count - 1]; return state->backtrack; } /* Discards the last backtrack entry. * * It'll never be called to discard the _only_ entry. */ Py_LOCAL_INLINE(void) discard_backtrack(RE_State* state) { RE_BacktrackBlock* current; current = state->current_backtrack_block; --current->count; if (current->count == 0 && current->previous) state->current_backtrack_block = current->previous; } /* Pushes a new empty entry onto the atomic stack. */ Py_LOCAL_INLINE(RE_AtomicData*) push_atomic(RE_SafeState* safe_state) { RE_State* state; RE_AtomicBlock* current; state = safe_state->re_state; current = state->current_atomic_block; if (!current || current->count >= current->capacity) { /* The current block is full. */ if (current && current->next) /* Advance to the next block. */ current = current->next; else { /* Add a new block. */ RE_AtomicBlock* next; next = (RE_AtomicBlock*)safe_alloc(safe_state, sizeof(RE_AtomicBlock)); if (!next) return NULL; next->previous = current; next->next = NULL; next->capacity = RE_ATOMIC_BLOCK_SIZE; if (current) /* The current block is the last one. */ current->next = next; else /* The new block is the first one.
*/ state->current_atomic_block = next; current = next; } current->count = 0; } return &current->items[current->count++]; } /* Pops the top entry from the atomic stack. */ Py_LOCAL_INLINE(RE_AtomicData*) pop_atomic(RE_SafeState* safe_state) { RE_State* state; RE_AtomicBlock* current; RE_AtomicData* atomic; state = safe_state->re_state; current = state->current_atomic_block; atomic = &current->items[--current->count]; if (current->count == 0 && current->previous) state->current_atomic_block = current->previous; return atomic; } /* Gets the top entry from the atomic stack. */ Py_LOCAL_INLINE(RE_AtomicData*) top_atomic(RE_SafeState* safe_state) { RE_State* state; RE_AtomicBlock* current; state = safe_state->re_state; current = state->current_atomic_block; return &current->items[current->count - 1]; } /* Copies a repeat guard list. */ Py_LOCAL_INLINE(BOOL) copy_guard_data(RE_SafeState* safe_state, RE_GuardList* dst, RE_GuardList* src) { if (dst->capacity < src->count) { RE_GuardSpan* new_spans; if (!safe_state) return FALSE; dst->capacity = src->count; new_spans = (RE_GuardSpan*)safe_realloc(safe_state, dst->spans, dst->capacity * sizeof(RE_GuardSpan)); if (!new_spans) return FALSE; dst->spans = new_spans; } dst->count = src->count; memmove(dst->spans, src->spans, dst->count * sizeof(RE_GuardSpan)); dst->last_text_pos = -1; return TRUE; } /* Copies a repeat. */ Py_LOCAL_INLINE(BOOL) copy_repeat_data(RE_SafeState* safe_state, RE_RepeatData* dst, RE_RepeatData* src) { if (!copy_guard_data(safe_state, &dst->body_guard_list, &src->body_guard_list) || !copy_guard_data(safe_state, &dst->tail_guard_list, &src->tail_guard_list)) { safe_dealloc(safe_state, dst->body_guard_list.spans); safe_dealloc(safe_state, dst->tail_guard_list.spans); return FALSE; } dst->count = src->count; dst->start = src->start; dst->capture_change = src->capture_change; return TRUE; } /* Pushes a return node onto the group call stack.
*/ Py_LOCAL_INLINE(BOOL) push_group_return(RE_SafeState* safe_state, RE_Node* return_node) { RE_State* state; PatternObject* pattern; RE_GroupCallFrame* frame; state = safe_state->re_state; pattern = state->pattern; if (state->current_group_call_frame && state->current_group_call_frame->next) /* Advance to the next allocated frame. */ frame = state->current_group_call_frame->next; else if (!state->current_group_call_frame && state->first_group_call_frame) /* Advance to the first allocated frame. */ frame = state->first_group_call_frame; else { /* Create a new frame. */ frame = (RE_GroupCallFrame*)safe_alloc(safe_state, sizeof(RE_GroupCallFrame)); if (!frame) return FALSE; frame->groups = (RE_GroupData*)safe_alloc(safe_state, pattern->true_group_count * sizeof(RE_GroupData)); frame->repeats = (RE_RepeatData*)safe_alloc(safe_state, pattern->repeat_count * sizeof(RE_RepeatData)); if (!frame->groups || !frame->repeats) { safe_dealloc(safe_state, frame->groups); safe_dealloc(safe_state, frame->repeats); safe_dealloc(safe_state, frame); return FALSE; } memset(frame->groups, 0, pattern->true_group_count * sizeof(RE_GroupData)); memset(frame->repeats, 0, pattern->repeat_count * sizeof(RE_RepeatData)); frame->previous = state->current_group_call_frame; frame->next = NULL; if (frame->previous) frame->previous->next = frame; else state->first_group_call_frame = frame; } frame->node = return_node; /* Push the groups and guards. */ if (return_node) { size_t g; size_t r; for (g = 0; g < pattern->true_group_count; g++) { frame->groups[g].span = state->groups[g].span; frame->groups[g].current_capture = state->groups[g].current_capture; } for (r = 0; r < pattern->repeat_count; r++) { if (!copy_repeat_data(safe_state, &frame->repeats[r], &state->repeats[r])) return FALSE; } } state->current_group_call_frame = frame; return TRUE; } /* Pops a return node from the group call stack. 
*/ Py_LOCAL_INLINE(RE_Node*) pop_group_return(RE_State* state) { RE_GroupCallFrame* frame; frame = state->current_group_call_frame; /* Pop the groups and repeats. */ if (frame->node) { PatternObject* pattern; size_t g; size_t r; pattern = state->pattern; for (g = 0; g < pattern->true_group_count; g++) { state->groups[g].span = frame->groups[g].span; state->groups[g].current_capture = frame->groups[g].current_capture; } for (r = 0; r < pattern->repeat_count; r++) copy_repeat_data(NULL, &state->repeats[r], &frame->repeats[r]); } /* Withdraw to previous frame. */ state->current_group_call_frame = frame->previous; return frame->node; } /* Returns the return node from the top of the group call stack. */ Py_LOCAL_INLINE(RE_Node*) top_group_return(RE_State* state) { RE_GroupCallFrame* frame; frame = state->current_group_call_frame; return frame->node; } /* Checks whether a node matches only 1 character. */ Py_LOCAL_INLINE(BOOL) node_matches_one_character(RE_Node* node) { switch (node->op) { case RE_OP_ANY: case RE_OP_ANY_ALL: case RE_OP_ANY_ALL_REV: case RE_OP_ANY_REV: case RE_OP_ANY_U: case RE_OP_ANY_U_REV: case RE_OP_CHARACTER: case RE_OP_CHARACTER_IGN: case RE_OP_CHARACTER_IGN_REV: case RE_OP_CHARACTER_REV: case RE_OP_PROPERTY: case RE_OP_PROPERTY_IGN: case RE_OP_PROPERTY_IGN_REV: case RE_OP_PROPERTY_REV: case RE_OP_RANGE: case RE_OP_RANGE_IGN: case RE_OP_RANGE_IGN_REV: case RE_OP_RANGE_REV: case RE_OP_SET_DIFF: case RE_OP_SET_DIFF_IGN: case RE_OP_SET_DIFF_IGN_REV: case RE_OP_SET_DIFF_REV: case RE_OP_SET_INTER: case RE_OP_SET_INTER_IGN: case RE_OP_SET_INTER_IGN_REV: case RE_OP_SET_INTER_REV: case RE_OP_SET_SYM_DIFF: case RE_OP_SET_SYM_DIFF_IGN: case RE_OP_SET_SYM_DIFF_IGN_REV: case RE_OP_SET_SYM_DIFF_REV: case RE_OP_SET_UNION: case RE_OP_SET_UNION_IGN: case RE_OP_SET_UNION_IGN_REV: case RE_OP_SET_UNION_REV: return TRUE; default: return FALSE; } } /* Checks whether the node is a firstset. 
*/ Py_LOCAL_INLINE(BOOL) is_firstset(RE_Node* node) { if (node->step != 0) return FALSE; return node_matches_one_character(node); } /* Locates the start node for testing ahead. */ Py_LOCAL_INLINE(RE_Node*) locate_test_start(RE_Node* node) { for (;;) { switch (node->op) { case RE_OP_BOUNDARY: switch (node->next_1.node->op) { case RE_OP_STRING: case RE_OP_STRING_FLD: case RE_OP_STRING_FLD_REV: case RE_OP_STRING_IGN: case RE_OP_STRING_IGN_REV: case RE_OP_STRING_REV: return node->next_1.node; default: return node; } case RE_OP_CALL_REF: case RE_OP_END_GROUP: case RE_OP_START_GROUP: node = node->next_1.node; break; case RE_OP_GREEDY_REPEAT: case RE_OP_LAZY_REPEAT: if (node->values[1] == 0) return node; node = node->next_1.node; break; case RE_OP_GREEDY_REPEAT_ONE: case RE_OP_LAZY_REPEAT_ONE: if (node->values[1] == 0) return node; return node->nonstring.next_2.node; case RE_OP_LOOKAROUND: node = node->nonstring.next_2.node; break; default: if (is_firstset(node)) { switch (node->next_1.node->op) { case RE_OP_END_OF_STRING: case RE_OP_START_OF_STRING: return node->next_1.node; } } return node; } } } /* Checks whether a character matches any of a set of case characters. */ Py_LOCAL_INLINE(BOOL) any_case(Py_UCS4 ch, int case_count, Py_UCS4* cases) { int i; for (i = 0; i < case_count; i++) { if (ch == cases[i]) return TRUE; } return FALSE; } /* Matches many ANYs, up to a limit. 
*/ Py_LOCAL_INLINE(Py_ssize_t) match_many_ANY(RE_State* state, RE_Node* node, Py_ssize_t text_pos, Py_ssize_t limit, BOOL match) { void* text; RE_EncodingTable* encoding; text = state->text; encoding = state->encoding; switch (state->charsize) { case 1: { Py_UCS1* text_ptr; Py_UCS1* limit_ptr; text_ptr = (Py_UCS1*)text + text_pos; limit_ptr = (Py_UCS1*)text + limit; while (text_ptr < limit_ptr && matches_ANY(encoding, node, text_ptr[0]) == match) ++text_ptr; text_pos = text_ptr - (Py_UCS1*)text; break; } case 2: { Py_UCS2* text_ptr; Py_UCS2* limit_ptr; text_ptr = (Py_UCS2*)text + text_pos; limit_ptr = (Py_UCS2*)text + limit; while (text_ptr < limit_ptr && matches_ANY(encoding, node, text_ptr[0]) == match) ++text_ptr; text_pos = text_ptr - (Py_UCS2*)text; break; } case 4: { Py_UCS4* text_ptr; Py_UCS4* limit_ptr; text_ptr = (Py_UCS4*)text + text_pos; limit_ptr = (Py_UCS4*)text + limit; while (text_ptr < limit_ptr && matches_ANY(encoding, node, text_ptr[0]) == match) ++text_ptr; text_pos = text_ptr - (Py_UCS4*)text; break; } } return text_pos; } /* Matches many ANYs, up to a limit, backwards. 
*/
Py_LOCAL_INLINE(Py_ssize_t) match_many_ANY_REV(RE_State* state, RE_Node* node,
  Py_ssize_t text_pos, Py_ssize_t limit, BOOL match) {
    void* text;
    RE_EncodingTable* encoding;

    text = state->text;
    encoding = state->encoding;

    switch (state->charsize) {
    case 1:
    {
        Py_UCS1* text_ptr;
        Py_UCS1* limit_ptr;

        text_ptr = (Py_UCS1*)text + text_pos;
        limit_ptr = (Py_UCS1*)text + limit;

        while (text_ptr > limit_ptr && matches_ANY(encoding, node,
          text_ptr[-1]) == match)
            --text_ptr;

        text_pos = text_ptr - (Py_UCS1*)text;
        break;
    }
    case 2:
    {
        Py_UCS2* text_ptr;
        Py_UCS2* limit_ptr;

        text_ptr = (Py_UCS2*)text + text_pos;
        limit_ptr = (Py_UCS2*)text + limit;

        while (text_ptr > limit_ptr && matches_ANY(encoding, node,
          text_ptr[-1]) == match)
            --text_ptr;

        text_pos = text_ptr - (Py_UCS2*)text;
        break;
    }
    case 4:
    {
        Py_UCS4* text_ptr;
        Py_UCS4* limit_ptr;

        text_ptr = (Py_UCS4*)text + text_pos;
        limit_ptr = (Py_UCS4*)text + limit;

        while (text_ptr > limit_ptr && matches_ANY(encoding, node,
          text_ptr[-1]) == match)
            --text_ptr;

        text_pos = text_ptr - (Py_UCS4*)text;
        break;
    }
    }

    return text_pos;
}

/* Matches many ANY_Us, up to a limit.
*/
Py_LOCAL_INLINE(Py_ssize_t) match_many_ANY_U(RE_State* state, RE_Node* node,
  Py_ssize_t text_pos, Py_ssize_t limit, BOOL match) {
    void* text;
    RE_EncodingTable* encoding;

    text = state->text;
    encoding = state->encoding;

    switch (state->charsize) {
    case 1:
    {
        Py_UCS1* text_ptr;
        Py_UCS1* limit_ptr;

        text_ptr = (Py_UCS1*)text + text_pos;
        limit_ptr = (Py_UCS1*)text + limit;

        while (text_ptr < limit_ptr && matches_ANY_U(encoding, node,
          text_ptr[0]) == match)
            ++text_ptr;

        text_pos = text_ptr - (Py_UCS1*)text;
        break;
    }
    case 2:
    {
        Py_UCS2* text_ptr;
        Py_UCS2* limit_ptr;

        text_ptr = (Py_UCS2*)text + text_pos;
        limit_ptr = (Py_UCS2*)text + limit;

        while (text_ptr < limit_ptr && matches_ANY_U(encoding, node,
          text_ptr[0]) == match)
            ++text_ptr;

        text_pos = text_ptr - (Py_UCS2*)text;
        break;
    }
    case 4:
    {
        Py_UCS4* text_ptr;
        Py_UCS4* limit_ptr;

        text_ptr = (Py_UCS4*)text + text_pos;
        limit_ptr = (Py_UCS4*)text + limit;

        while (text_ptr < limit_ptr && matches_ANY_U(encoding, node,
          text_ptr[0]) == match)
            ++text_ptr;

        text_pos = text_ptr - (Py_UCS4*)text;
        break;
    }
    }

    return text_pos;
}

/* Matches many ANY_Us, up to a limit, backwards.
*/
Py_LOCAL_INLINE(Py_ssize_t) match_many_ANY_U_REV(RE_State* state, RE_Node*
  node, Py_ssize_t text_pos, Py_ssize_t limit, BOOL match) {
    void* text;
    RE_EncodingTable* encoding;

    text = state->text;
    encoding = state->encoding;

    switch (state->charsize) {
    case 1:
    {
        Py_UCS1* text_ptr;
        Py_UCS1* limit_ptr;

        text_ptr = (Py_UCS1*)text + text_pos;
        limit_ptr = (Py_UCS1*)text + limit;

        while (text_ptr > limit_ptr && matches_ANY_U(encoding, node,
          text_ptr[-1]) == match)
            --text_ptr;

        text_pos = text_ptr - (Py_UCS1*)text;
        break;
    }
    case 2:
    {
        Py_UCS2* text_ptr;
        Py_UCS2* limit_ptr;

        text_ptr = (Py_UCS2*)text + text_pos;
        limit_ptr = (Py_UCS2*)text + limit;

        while (text_ptr > limit_ptr && matches_ANY_U(encoding, node,
          text_ptr[-1]) == match)
            --text_ptr;

        text_pos = text_ptr - (Py_UCS2*)text;
        break;
    }
    case 4:
    {
        Py_UCS4* text_ptr;
        Py_UCS4* limit_ptr;

        text_ptr = (Py_UCS4*)text + text_pos;
        limit_ptr = (Py_UCS4*)text + limit;

        while (text_ptr > limit_ptr && matches_ANY_U(encoding, node,
          text_ptr[-1]) == match)
            --text_ptr;

        text_pos = text_ptr - (Py_UCS4*)text;
        break;
    }
    }

    return text_pos;
}

/* Matches many CHARACTERs, up to a limit.
*/ Py_LOCAL_INLINE(Py_ssize_t) match_many_CHARACTER(RE_State* state, RE_Node* node, Py_ssize_t text_pos, Py_ssize_t limit, BOOL match) { void* text; Py_UCS4 ch; text = state->text; match = node->match == match; ch = node->values[0]; switch (state->charsize) { case 1: { Py_UCS1* text_ptr; Py_UCS1* limit_ptr; text_ptr = (Py_UCS1*)text + text_pos; limit_ptr = (Py_UCS1*)text + limit; while (text_ptr < limit_ptr && (text_ptr[0] == ch) == match) ++text_ptr; text_pos = text_ptr - (Py_UCS1*)text; break; } case 2: { Py_UCS2* text_ptr; Py_UCS2* limit_ptr; text_ptr = (Py_UCS2*)text + text_pos; limit_ptr = (Py_UCS2*)text + limit; while (text_ptr < limit_ptr && (text_ptr[0] == ch) == match) ++text_ptr; text_pos = text_ptr - (Py_UCS2*)text; break; } case 4: { Py_UCS4* text_ptr; Py_UCS4* limit_ptr; text_ptr = (Py_UCS4*)text + text_pos; limit_ptr = (Py_UCS4*)text + limit; while (text_ptr < limit_ptr && (text_ptr[0] == ch) == match) ++text_ptr; text_pos = text_ptr - (Py_UCS4*)text; break; } } return text_pos; } /* Matches many CHARACTERs, up to a limit, ignoring case. 
*/ Py_LOCAL_INLINE(Py_ssize_t) match_many_CHARACTER_IGN(RE_State* state, RE_Node* node, Py_ssize_t text_pos, Py_ssize_t limit, BOOL match) { void* text; Py_UCS4 cases[RE_MAX_CASES]; int case_count; text = state->text; match = node->match == match; case_count = state->encoding->all_cases(state->locale_info, node->values[0], cases); switch (state->charsize) { case 1: { Py_UCS1* text_ptr; Py_UCS1* limit_ptr; text_ptr = (Py_UCS1*)text + text_pos; limit_ptr = (Py_UCS1*)text + limit; while (text_ptr < limit_ptr && any_case(text_ptr[0], case_count, cases) == match) ++text_ptr; text_pos = text_ptr - (Py_UCS1*)text; break; } case 2: { Py_UCS2* text_ptr; Py_UCS2* limit_ptr; text_ptr = (Py_UCS2*)text + text_pos; limit_ptr = (Py_UCS2*)text + limit; while (text_ptr < limit_ptr && any_case(text_ptr[0], case_count, cases) == match) ++text_ptr; text_pos = text_ptr - (Py_UCS2*)text; break; } case 4: { Py_UCS4* text_ptr; Py_UCS4* limit_ptr; text_ptr = (Py_UCS4*)text + text_pos; limit_ptr = (Py_UCS4*)text + limit; while (text_ptr < limit_ptr && any_case(text_ptr[0], case_count, cases) == match) ++text_ptr; text_pos = text_ptr - (Py_UCS4*)text; break; } } return text_pos; } /* Matches many CHARACTERs, up to a limit, backwards, ignoring case. 
*/ Py_LOCAL_INLINE(Py_ssize_t) match_many_CHARACTER_IGN_REV(RE_State* state, RE_Node* node, Py_ssize_t text_pos, Py_ssize_t limit, BOOL match) { void* text; Py_UCS4 cases[RE_MAX_CASES]; int case_count; text = state->text; match = node->match == match; case_count = state->encoding->all_cases(state->locale_info, node->values[0], cases); switch (state->charsize) { case 1: { Py_UCS1* text_ptr; Py_UCS1* limit_ptr; text_ptr = (Py_UCS1*)text + text_pos; limit_ptr = (Py_UCS1*)text + limit; while (text_ptr > limit_ptr && any_case(text_ptr[-1], case_count, cases) == match) --text_ptr; text_pos = text_ptr - (Py_UCS1*)text; break; } case 2: { Py_UCS2* text_ptr; Py_UCS2* limit_ptr; text_ptr = (Py_UCS2*)text + text_pos; limit_ptr = (Py_UCS2*)text + limit; while (text_ptr > limit_ptr && any_case(text_ptr[-1], case_count, cases) == match) --text_ptr; text_pos = text_ptr - (Py_UCS2*)text; break; } case 4: { Py_UCS4* text_ptr; Py_UCS4* limit_ptr; text_ptr = (Py_UCS4*)text + text_pos; limit_ptr = (Py_UCS4*)text + limit; while (text_ptr > limit_ptr && any_case(text_ptr[-1], case_count, cases) == match) --text_ptr; text_pos = text_ptr - (Py_UCS4*)text; break; } } return text_pos; } /* Matches many CHARACTERs, up to a limit, backwards. 
*/ Py_LOCAL_INLINE(Py_ssize_t) match_many_CHARACTER_REV(RE_State* state, RE_Node* node, Py_ssize_t text_pos, Py_ssize_t limit, BOOL match) { void* text; Py_UCS4 ch; text = state->text; match = node->match == match; ch = node->values[0]; switch (state->charsize) { case 1: { Py_UCS1* text_ptr; Py_UCS1* limit_ptr; text_ptr = (Py_UCS1*)text + text_pos; limit_ptr = (Py_UCS1*)text + limit; while (text_ptr > limit_ptr && (text_ptr[-1] == ch) == match) --text_ptr; text_pos = text_ptr - (Py_UCS1*)text; break; } case 2: { Py_UCS2* text_ptr; Py_UCS2* limit_ptr; text_ptr = (Py_UCS2*)text + text_pos; limit_ptr = (Py_UCS2*)text + limit; while (text_ptr > limit_ptr && (text_ptr[-1] == ch) == match) --text_ptr; text_pos = text_ptr - (Py_UCS2*)text; break; } case 4: { Py_UCS4* text_ptr; Py_UCS4* limit_ptr; text_ptr = (Py_UCS4*)text + text_pos; limit_ptr = (Py_UCS4*)text + limit; while (text_ptr > limit_ptr && (text_ptr[-1] == ch) == match) --text_ptr; text_pos = text_ptr - (Py_UCS4*)text; break; } } return text_pos; } /* Matches many PROPERTYs, up to a limit. 
*/ Py_LOCAL_INLINE(Py_ssize_t) match_many_PROPERTY(RE_State* state, RE_Node* node, Py_ssize_t text_pos, Py_ssize_t limit, BOOL match) { void* text; RE_EncodingTable* encoding; RE_LocaleInfo* locale_info; text = state->text; match = node->match == match; encoding = state->encoding; locale_info = state->locale_info; switch (state->charsize) { case 1: { Py_UCS1* text_ptr; Py_UCS1* limit_ptr; text_ptr = (Py_UCS1*)text + text_pos; limit_ptr = (Py_UCS1*)text + limit; while (text_ptr < limit_ptr && matches_PROPERTY(encoding, locale_info, node, text_ptr[0]) == match) ++text_ptr; text_pos = text_ptr - (Py_UCS1*)text; break; } case 2: { Py_UCS2* text_ptr; Py_UCS2* limit_ptr; text_ptr = (Py_UCS2*)text + text_pos; limit_ptr = (Py_UCS2*)text + limit; while (text_ptr < limit_ptr && matches_PROPERTY(encoding, locale_info, node, text_ptr[0]) == match) ++text_ptr; text_pos = text_ptr - (Py_UCS2*)text; break; } case 4: { Py_UCS4* text_ptr; Py_UCS4* limit_ptr; text_ptr = (Py_UCS4*)text + text_pos; limit_ptr = (Py_UCS4*)text + limit; while (text_ptr < limit_ptr && matches_PROPERTY(encoding, locale_info, node, text_ptr[0]) == match) ++text_ptr; text_pos = text_ptr - (Py_UCS4*)text; break; } } return text_pos; } /* Matches many PROPERTYs, up to a limit, ignoring case. 
*/ Py_LOCAL_INLINE(Py_ssize_t) match_many_PROPERTY_IGN(RE_State* state, RE_Node* node, Py_ssize_t text_pos, Py_ssize_t limit, BOOL match) { void* text; RE_EncodingTable* encoding; RE_LocaleInfo* locale_info; text = state->text; match = node->match == match; encoding = state->encoding; locale_info = state->locale_info; switch (state->charsize) { case 1: { Py_UCS1* text_ptr; Py_UCS1* limit_ptr; text_ptr = (Py_UCS1*)text + text_pos; limit_ptr = (Py_UCS1*)text + limit; while (text_ptr < limit_ptr && matches_PROPERTY_IGN(encoding, locale_info, node, text_ptr[0]) == match) ++text_ptr; text_pos = text_ptr - (Py_UCS1*)text; break; } case 2: { Py_UCS2* text_ptr; Py_UCS2* limit_ptr; text_ptr = (Py_UCS2*)text + text_pos; limit_ptr = (Py_UCS2*)text + limit; while (text_ptr < limit_ptr && matches_PROPERTY_IGN(encoding, locale_info, node, text_ptr[0]) == match) ++text_ptr; text_pos = text_ptr - (Py_UCS2*)text; break; } case 4: { Py_UCS4* text_ptr; Py_UCS4* limit_ptr; text_ptr = (Py_UCS4*)text + text_pos; limit_ptr = (Py_UCS4*)text + limit; while (text_ptr < limit_ptr && matches_PROPERTY_IGN(encoding, locale_info, node, text_ptr[0]) == match) ++text_ptr; text_pos = text_ptr - (Py_UCS4*)text; break; } } return text_pos; } /* Matches many PROPERTYs, up to a limit, backwards, ignoring case. 
*/ Py_LOCAL_INLINE(Py_ssize_t) match_many_PROPERTY_IGN_REV(RE_State* state, RE_Node* node, Py_ssize_t text_pos, Py_ssize_t limit, BOOL match) { void* text; RE_EncodingTable* encoding; RE_LocaleInfo* locale_info; text = state->text; match = node->match == match; encoding = state->encoding; locale_info = state->locale_info; switch (state->charsize) { case 1: { Py_UCS1* text_ptr; Py_UCS1* limit_ptr; text_ptr = (Py_UCS1*)text + text_pos; limit_ptr = (Py_UCS1*)text + limit; while (text_ptr > limit_ptr && matches_PROPERTY_IGN(encoding, locale_info, node, text_ptr[-1]) == match) --text_ptr; text_pos = text_ptr - (Py_UCS1*)text; break; } case 2: { Py_UCS2* text_ptr; Py_UCS2* limit_ptr; text_ptr = (Py_UCS2*)text + text_pos; limit_ptr = (Py_UCS2*)text + limit; while (text_ptr > limit_ptr && matches_PROPERTY_IGN(encoding, locale_info, node, text_ptr[-1]) == match) --text_ptr; text_pos = text_ptr - (Py_UCS2*)text; break; } case 4: { Py_UCS4* text_ptr; Py_UCS4* limit_ptr; text_ptr = (Py_UCS4*)text + text_pos; limit_ptr = (Py_UCS4*)text + limit; while (text_ptr > limit_ptr && matches_PROPERTY_IGN(encoding, locale_info, node, text_ptr[-1]) == match) --text_ptr; text_pos = text_ptr - (Py_UCS4*)text; break; } } return text_pos; } /* Matches many PROPERTYs, up to a limit, backwards. 
*/ Py_LOCAL_INLINE(Py_ssize_t) match_many_PROPERTY_REV(RE_State* state, RE_Node* node, Py_ssize_t text_pos, Py_ssize_t limit, BOOL match) { void* text; RE_EncodingTable* encoding; RE_LocaleInfo* locale_info; text = state->text; match = node->match == match; encoding = state->encoding; locale_info = state->locale_info; switch (state->charsize) { case 1: { Py_UCS1* text_ptr; Py_UCS1* limit_ptr; text_ptr = (Py_UCS1*)text + text_pos; limit_ptr = (Py_UCS1*)text + limit; while (text_ptr > limit_ptr && matches_PROPERTY(encoding, locale_info, node, text_ptr[-1]) == match) --text_ptr; text_pos = text_ptr - (Py_UCS1*)text; break; } case 2: { Py_UCS2* text_ptr; Py_UCS2* limit_ptr; text_ptr = (Py_UCS2*)text + text_pos; limit_ptr = (Py_UCS2*)text + limit; while (text_ptr > limit_ptr && matches_PROPERTY(encoding, locale_info, node, text_ptr[-1]) == match) --text_ptr; text_pos = text_ptr - (Py_UCS2*)text; break; } case 4: { Py_UCS4* text_ptr; Py_UCS4* limit_ptr; text_ptr = (Py_UCS4*)text + text_pos; limit_ptr = (Py_UCS4*)text + limit; while (text_ptr > limit_ptr && matches_PROPERTY(encoding, locale_info, node, text_ptr[-1]) == match) --text_ptr; text_pos = text_ptr - (Py_UCS4*)text; break; } } return text_pos; } /* Matches many RANGEs, up to a limit. 
*/ Py_LOCAL_INLINE(Py_ssize_t) match_many_RANGE(RE_State* state, RE_Node* node, Py_ssize_t text_pos, Py_ssize_t limit, BOOL match) { void* text; RE_EncodingTable* encoding; RE_LocaleInfo* locale_info; text = state->text; match = node->match == match; encoding = state->encoding; locale_info = state->locale_info; switch (state->charsize) { case 1: { Py_UCS1* text_ptr; Py_UCS1* limit_ptr; text_ptr = (Py_UCS1*)text + text_pos; limit_ptr = (Py_UCS1*)text + limit; while (text_ptr < limit_ptr && matches_RANGE(encoding, locale_info, node, text_ptr[0]) == match) ++text_ptr; text_pos = text_ptr - (Py_UCS1*)text; break; } case 2: { Py_UCS2* text_ptr; Py_UCS2* limit_ptr; text_ptr = (Py_UCS2*)text + text_pos; limit_ptr = (Py_UCS2*)text + limit; while (text_ptr < limit_ptr && matches_RANGE(encoding, locale_info, node, text_ptr[0]) == match) ++text_ptr; text_pos = text_ptr - (Py_UCS2*)text; break; } case 4: { Py_UCS4* text_ptr; Py_UCS4* limit_ptr; text_ptr = (Py_UCS4*)text + text_pos; limit_ptr = (Py_UCS4*)text + limit; while (text_ptr < limit_ptr && matches_RANGE(encoding, locale_info, node, text_ptr[0]) == match) ++text_ptr; text_pos = text_ptr - (Py_UCS4*)text; break; } } return text_pos; } /* Matches many RANGEs, up to a limit, ignoring case. 
*/ Py_LOCAL_INLINE(Py_ssize_t) match_many_RANGE_IGN(RE_State* state, RE_Node* node, Py_ssize_t text_pos, Py_ssize_t limit, BOOL match) { void* text; RE_EncodingTable* encoding; RE_LocaleInfo* locale_info; text = state->text; match = node->match == match; encoding = state->encoding; locale_info = state->locale_info; switch (state->charsize) { case 1: { Py_UCS1* text_ptr; Py_UCS1* limit_ptr; text_ptr = (Py_UCS1*)text + text_pos; limit_ptr = (Py_UCS1*)text + limit; while (text_ptr < limit_ptr && matches_RANGE_IGN(encoding, locale_info, node, text_ptr[0]) == match) ++text_ptr; text_pos = text_ptr - (Py_UCS1*)text; break; } case 2: { Py_UCS2* text_ptr; Py_UCS2* limit_ptr; text_ptr = (Py_UCS2*)text + text_pos; limit_ptr = (Py_UCS2*)text + limit; while (text_ptr < limit_ptr && matches_RANGE_IGN(encoding, locale_info, node, text_ptr[0]) == match) ++text_ptr; text_pos = text_ptr - (Py_UCS2*)text; break; } case 4: { Py_UCS4* text_ptr; Py_UCS4* limit_ptr; text_ptr = (Py_UCS4*)text + text_pos; limit_ptr = (Py_UCS4*)text + limit; while (text_ptr < limit_ptr && matches_RANGE_IGN(encoding, locale_info, node, text_ptr[0]) == match) ++text_ptr; text_pos = text_ptr - (Py_UCS4*)text; break; } } return text_pos; } /* Matches many RANGEs, up to a limit, backwards, ignoring case. 
*/ Py_LOCAL_INLINE(Py_ssize_t) match_many_RANGE_IGN_REV(RE_State* state, RE_Node* node, Py_ssize_t text_pos, Py_ssize_t limit, BOOL match) { void* text; RE_EncodingTable* encoding; RE_LocaleInfo* locale_info; text = state->text; match = node->match == match; encoding = state->encoding; locale_info = state->locale_info; switch (state->charsize) { case 1: { Py_UCS1* text_ptr; Py_UCS1* limit_ptr; text_ptr = (Py_UCS1*)text + text_pos; limit_ptr = (Py_UCS1*)text + limit; while (text_ptr > limit_ptr && matches_RANGE_IGN(encoding, locale_info, node, text_ptr[-1]) == match) --text_ptr; text_pos = text_ptr - (Py_UCS1*)text; break; } case 2: { Py_UCS2* text_ptr; Py_UCS2* limit_ptr; text_ptr = (Py_UCS2*)text + text_pos; limit_ptr = (Py_UCS2*)text + limit; while (text_ptr > limit_ptr && matches_RANGE_IGN(encoding, locale_info, node, text_ptr[-1]) == match) --text_ptr; text_pos = text_ptr - (Py_UCS2*)text; break; } case 4: { Py_UCS4* text_ptr; Py_UCS4* limit_ptr; text_ptr = (Py_UCS4*)text + text_pos; limit_ptr = (Py_UCS4*)text + limit; while (text_ptr > limit_ptr && matches_RANGE_IGN(encoding, locale_info, node, text_ptr[-1]) == match) --text_ptr; text_pos = text_ptr - (Py_UCS4*)text; break; } } return text_pos; } /* Matches many RANGEs, up to a limit, backwards. 
*/ Py_LOCAL_INLINE(Py_ssize_t) match_many_RANGE_REV(RE_State* state, RE_Node* node, Py_ssize_t text_pos, Py_ssize_t limit, BOOL match) { void* text; RE_EncodingTable* encoding; RE_LocaleInfo* locale_info; text = state->text; match = node->match == match; encoding = state->encoding; locale_info = state->locale_info; switch (state->charsize) { case 1: { Py_UCS1* text_ptr; Py_UCS1* limit_ptr; text_ptr = (Py_UCS1*)text + text_pos; limit_ptr = (Py_UCS1*)text + limit; while (text_ptr > limit_ptr && matches_RANGE(encoding, locale_info, node, text_ptr[-1]) == match) --text_ptr; text_pos = text_ptr - (Py_UCS1*)text; break; } case 2: { Py_UCS2* text_ptr; Py_UCS2* limit_ptr; text_ptr = (Py_UCS2*)text + text_pos; limit_ptr = (Py_UCS2*)text + limit; while (text_ptr > limit_ptr && matches_RANGE(encoding, locale_info, node, text_ptr[-1]) == match) --text_ptr; text_pos = text_ptr - (Py_UCS2*)text; break; } case 4: { Py_UCS4* text_ptr; Py_UCS4* limit_ptr; text_ptr = (Py_UCS4*)text + text_pos; limit_ptr = (Py_UCS4*)text + limit; while (text_ptr > limit_ptr && matches_RANGE(encoding, locale_info, node, text_ptr[-1]) == match) --text_ptr; text_pos = text_ptr - (Py_UCS4*)text; break; } } return text_pos; } /* Matches many SETs, up to a limit. 
*/ Py_LOCAL_INLINE(Py_ssize_t) match_many_SET(RE_State* state, RE_Node* node, Py_ssize_t text_pos, Py_ssize_t limit, BOOL match) { void* text; RE_EncodingTable* encoding; RE_LocaleInfo* locale_info; text = state->text; match = node->match == match; encoding = state->encoding; locale_info = state->locale_info; switch (state->charsize) { case 1: { Py_UCS1* text_ptr; Py_UCS1* limit_ptr; text_ptr = (Py_UCS1*)text + text_pos; limit_ptr = (Py_UCS1*)text + limit; while (text_ptr < limit_ptr && matches_SET(encoding, locale_info, node, text_ptr[0]) == match) ++text_ptr; text_pos = text_ptr - (Py_UCS1*)text; break; } case 2: { Py_UCS2* text_ptr; Py_UCS2* limit_ptr; text_ptr = (Py_UCS2*)text + text_pos; limit_ptr = (Py_UCS2*)text + limit; while (text_ptr < limit_ptr && matches_SET(encoding, locale_info, node, text_ptr[0]) == match) ++text_ptr; text_pos = text_ptr - (Py_UCS2*)text; break; } case 4: { Py_UCS4* text_ptr; Py_UCS4* limit_ptr; text_ptr = (Py_UCS4*)text + text_pos; limit_ptr = (Py_UCS4*)text + limit; while (text_ptr < limit_ptr && matches_SET(encoding, locale_info, node, text_ptr[0]) == match) ++text_ptr; text_pos = text_ptr - (Py_UCS4*)text; break; } } return text_pos; } /* Matches many SETs, up to a limit, ignoring case. 
*/ Py_LOCAL_INLINE(Py_ssize_t) match_many_SET_IGN(RE_State* state, RE_Node* node, Py_ssize_t text_pos, Py_ssize_t limit, BOOL match) { void* text; RE_EncodingTable* encoding; RE_LocaleInfo* locale_info; text = state->text; match = node->match == match; encoding = state->encoding; locale_info = state->locale_info; switch (state->charsize) { case 1: { Py_UCS1* text_ptr; Py_UCS1* limit_ptr; text_ptr = (Py_UCS1*)text + text_pos; limit_ptr = (Py_UCS1*)text + limit; while (text_ptr < limit_ptr && matches_SET_IGN(encoding, locale_info, node, text_ptr[0]) == match) ++text_ptr; text_pos = text_ptr - (Py_UCS1*)text; break; } case 2: { Py_UCS2* text_ptr; Py_UCS2* limit_ptr; text_ptr = (Py_UCS2*)text + text_pos; limit_ptr = (Py_UCS2*)text + limit; while (text_ptr < limit_ptr && matches_SET_IGN(encoding, locale_info, node, text_ptr[0]) == match) ++text_ptr; text_pos = text_ptr - (Py_UCS2*)text; break; } case 4: { Py_UCS4* text_ptr; Py_UCS4* limit_ptr; text_ptr = (Py_UCS4*)text + text_pos; limit_ptr = (Py_UCS4*)text + limit; while (text_ptr < limit_ptr && matches_SET_IGN(encoding, locale_info, node, text_ptr[0]) == match) ++text_ptr; text_pos = text_ptr - (Py_UCS4*)text; break; } } return text_pos; } /* Matches many SETs, up to a limit, backwards, ignoring case. 
*/ Py_LOCAL_INLINE(Py_ssize_t) match_many_SET_IGN_REV(RE_State* state, RE_Node* node, Py_ssize_t text_pos, Py_ssize_t limit, BOOL match) { void* text; RE_EncodingTable* encoding; RE_LocaleInfo* locale_info; text = state->text; match = node->match == match; encoding = state->encoding; locale_info = state->locale_info; switch (state->charsize) { case 1: { Py_UCS1* text_ptr; Py_UCS1* limit_ptr; text_ptr = (Py_UCS1*)text + text_pos; limit_ptr = (Py_UCS1*)text + limit; while (text_ptr > limit_ptr && matches_SET_IGN(encoding, locale_info, node, text_ptr[-1]) == match) --text_ptr; text_pos = text_ptr - (Py_UCS1*)text; break; } case 2: { Py_UCS2* text_ptr; Py_UCS2* limit_ptr; text_ptr = (Py_UCS2*)text + text_pos; limit_ptr = (Py_UCS2*)text + limit; while (text_ptr > limit_ptr && matches_SET_IGN(encoding, locale_info, node, text_ptr[-1]) == match) --text_ptr; text_pos = text_ptr - (Py_UCS2*)text; break; } case 4: { Py_UCS4* text_ptr; Py_UCS4* limit_ptr; text_ptr = (Py_UCS4*)text + text_pos; limit_ptr = (Py_UCS4*)text + limit; while (text_ptr > limit_ptr && matches_SET_IGN(encoding, locale_info, node, text_ptr[-1]) == match) --text_ptr; text_pos = text_ptr - (Py_UCS4*)text; break; } } return text_pos; } /* Matches many SETs, up to a limit, backwards. 
*/ Py_LOCAL_INLINE(Py_ssize_t) match_many_SET_REV(RE_State* state, RE_Node* node, Py_ssize_t text_pos, Py_ssize_t limit, BOOL match) { void* text; RE_EncodingTable* encoding; RE_LocaleInfo* locale_info; text = state->text; match = node->match == match; encoding = state->encoding; locale_info = state->locale_info; switch (state->charsize) { case 1: { Py_UCS1* text_ptr; Py_UCS1* limit_ptr; text_ptr = (Py_UCS1*)text + text_pos; limit_ptr = (Py_UCS1*)text + limit; while (text_ptr > limit_ptr && matches_SET(encoding, locale_info, node, text_ptr[-1]) == match) --text_ptr; text_pos = text_ptr - (Py_UCS1*)text; break; } case 2: { Py_UCS2* text_ptr; Py_UCS2* limit_ptr; text_ptr = (Py_UCS2*)text + text_pos; limit_ptr = (Py_UCS2*)text + limit; while (text_ptr > limit_ptr && matches_SET(encoding, locale_info, node, text_ptr[-1]) == match) --text_ptr; text_pos = text_ptr - (Py_UCS2*)text; break; } case 4: { Py_UCS4* text_ptr; Py_UCS4* limit_ptr; text_ptr = (Py_UCS4*)text + text_pos; limit_ptr = (Py_UCS4*)text + limit; while (text_ptr > limit_ptr && matches_SET(encoding, locale_info, node, text_ptr[-1]) == match) --text_ptr; text_pos = text_ptr - (Py_UCS4*)text; break; } } return text_pos; } /* Counts a repeated character pattern. 
*/ Py_LOCAL_INLINE(size_t) count_one(RE_State* state, RE_Node* node, Py_ssize_t text_pos, size_t max_count, BOOL* is_partial) { size_t count; *is_partial = FALSE; if (max_count < 1) return 0; switch (node->op) { case RE_OP_ANY: count = min_size_t((size_t)(state->slice_end - text_pos), max_count); count = (size_t)(match_many_ANY(state, node, text_pos, text_pos + (Py_ssize_t)count, TRUE) - text_pos); *is_partial = count == (size_t)(state->text_length - text_pos) && count < max_count && state->partial_side == RE_PARTIAL_RIGHT; return count; case RE_OP_ANY_ALL: count = min_size_t((size_t)(state->slice_end - text_pos), max_count); *is_partial = count == (size_t)(state->text_length - text_pos) && count < max_count && state->partial_side == RE_PARTIAL_RIGHT; return count; case RE_OP_ANY_ALL_REV: count = min_size_t((size_t)(text_pos - state->slice_start), max_count); *is_partial = count == (size_t)(text_pos) && count < max_count && state->partial_side == RE_PARTIAL_LEFT; return count; case RE_OP_ANY_REV: count = min_size_t((size_t)(text_pos - state->slice_start), max_count); count = (size_t)(text_pos - match_many_ANY_REV(state, node, text_pos, text_pos - (Py_ssize_t)count, TRUE)); *is_partial = count == (size_t)(text_pos) && count < max_count && state->partial_side == RE_PARTIAL_LEFT; return count; case RE_OP_ANY_U: count = min_size_t((size_t)(state->slice_end - text_pos), max_count); count = (size_t)(match_many_ANY_U(state, node, text_pos, text_pos + (Py_ssize_t)count, TRUE) - text_pos); *is_partial = count == (size_t)(state->text_length - text_pos) && count < max_count && state->partial_side == RE_PARTIAL_RIGHT; return count; case RE_OP_ANY_U_REV: count = min_size_t((size_t)(text_pos - state->slice_start), max_count); count = (size_t)(text_pos - match_many_ANY_U_REV(state, node, text_pos, text_pos - (Py_ssize_t)count, TRUE)); *is_partial = count == (size_t)(text_pos) && count < max_count && state->partial_side == RE_PARTIAL_LEFT; return count; case RE_OP_CHARACTER: count 
= min_size_t((size_t)(state->slice_end - text_pos), max_count); count = (size_t)(match_many_CHARACTER(state, node, text_pos, text_pos + (Py_ssize_t)count, TRUE) - text_pos); *is_partial = count == (size_t)(state->text_length - text_pos) && count < max_count && state->partial_side == RE_PARTIAL_RIGHT; return count; case RE_OP_CHARACTER_IGN: count = min_size_t((size_t)(state->slice_end - text_pos), max_count); count = (size_t)(match_many_CHARACTER_IGN(state, node, text_pos, text_pos + (Py_ssize_t)count, TRUE) - text_pos); *is_partial = count == (size_t)(state->text_length - text_pos) && count < max_count && state->partial_side == RE_PARTIAL_RIGHT; return count; case RE_OP_CHARACTER_IGN_REV: count = min_size_t((size_t)(text_pos - state->slice_start), max_count); count = (size_t)(text_pos - match_many_CHARACTER_IGN_REV(state, node, text_pos, text_pos - (Py_ssize_t)count, TRUE)); *is_partial = count == (size_t)(text_pos) && count < max_count && state->partial_side == RE_PARTIAL_LEFT; return count; case RE_OP_CHARACTER_REV: count = min_size_t((size_t)(text_pos - state->slice_start), max_count); count = (size_t)(text_pos - match_many_CHARACTER_REV(state, node, text_pos, text_pos - (Py_ssize_t)count, TRUE)); *is_partial = count == (size_t)(text_pos) && count < max_count && state->partial_side == RE_PARTIAL_LEFT; return count; case RE_OP_PROPERTY: count = min_size_t((size_t)(state->slice_end - text_pos), max_count); count = (size_t)(match_many_PROPERTY(state, node, text_pos, text_pos + (Py_ssize_t)count, TRUE) - text_pos); *is_partial = count == (size_t)(state->text_length - text_pos) && count < max_count && state->partial_side == RE_PARTIAL_RIGHT; return count; case RE_OP_PROPERTY_IGN: count = min_size_t((size_t)(state->slice_end - text_pos), max_count); count = (size_t)(match_many_PROPERTY_IGN(state, node, text_pos, text_pos + (Py_ssize_t)count, TRUE) - text_pos); *is_partial = count == (size_t)(state->text_length - text_pos) && count < max_count && state->partial_side == 
RE_PARTIAL_RIGHT; return count; case RE_OP_PROPERTY_IGN_REV: count = min_size_t((size_t)(text_pos - state->slice_start), max_count); count = (size_t)(text_pos - match_many_PROPERTY_IGN_REV(state, node, text_pos, text_pos - (Py_ssize_t)count, TRUE)); *is_partial = count == (size_t)(text_pos) && count < max_count && state->partial_side == RE_PARTIAL_LEFT; return count; case RE_OP_PROPERTY_REV: count = min_size_t((size_t)(text_pos - state->slice_start), max_count); count = (size_t)(text_pos - match_many_PROPERTY_REV(state, node, text_pos, text_pos - (Py_ssize_t)count, TRUE)); *is_partial = count == (size_t)(text_pos) && count < max_count && state->partial_side == RE_PARTIAL_LEFT; return count; case RE_OP_RANGE: count = min_size_t((size_t)(state->slice_end - text_pos), max_count); count = (size_t)(match_many_RANGE(state, node, text_pos, text_pos + (Py_ssize_t)count, TRUE) - text_pos); *is_partial = count == (size_t)(state->text_length - text_pos) && count < max_count && state->partial_side == RE_PARTIAL_RIGHT; return count; case RE_OP_RANGE_IGN: count = min_size_t((size_t)(state->slice_end - text_pos), max_count); count = (size_t)(match_many_RANGE_IGN(state, node, text_pos, text_pos + (Py_ssize_t)count, TRUE) - text_pos); *is_partial = count == (size_t)(state->text_length - text_pos) && count < max_count && state->partial_side == RE_PARTIAL_RIGHT; return count; case RE_OP_RANGE_IGN_REV: count = min_size_t((size_t)(text_pos - state->slice_start), max_count); count = (size_t)(text_pos - match_many_RANGE_IGN_REV(state, node, text_pos, text_pos - (Py_ssize_t)count, TRUE)); *is_partial = count == (size_t)(text_pos) && count < max_count && state->partial_side == RE_PARTIAL_LEFT; return count; case RE_OP_RANGE_REV: count = min_size_t((size_t)(text_pos - state->slice_start), max_count); count = (size_t)(text_pos - match_many_RANGE_REV(state, node, text_pos, text_pos - (Py_ssize_t)count, TRUE)); *is_partial = count == (size_t)(text_pos) && count < max_count && 
state->partial_side == RE_PARTIAL_LEFT; return count; case RE_OP_SET_DIFF: case RE_OP_SET_INTER: case RE_OP_SET_SYM_DIFF: case RE_OP_SET_UNION: count = min_size_t((size_t)(state->slice_end - text_pos), max_count); count = (size_t)(match_many_SET(state, node, text_pos, text_pos + (Py_ssize_t)count, TRUE) - text_pos); *is_partial = count == (size_t)(state->text_length - text_pos) && count < max_count && state->partial_side == RE_PARTIAL_RIGHT; return count; case RE_OP_SET_DIFF_IGN: case RE_OP_SET_INTER_IGN: case RE_OP_SET_SYM_DIFF_IGN: case RE_OP_SET_UNION_IGN: count = min_size_t((size_t)(state->slice_end - text_pos), max_count); count = (size_t)(match_many_SET_IGN(state, node, text_pos, text_pos + (Py_ssize_t)count, TRUE) - text_pos); *is_partial = count == (size_t)(state->text_length - text_pos) && count < max_count && state->partial_side == RE_PARTIAL_RIGHT; return count; case RE_OP_SET_DIFF_IGN_REV: case RE_OP_SET_INTER_IGN_REV: case RE_OP_SET_SYM_DIFF_IGN_REV: case RE_OP_SET_UNION_IGN_REV: count = min_size_t((size_t)(text_pos - state->slice_start), max_count); count = (size_t)(text_pos - match_many_SET_IGN_REV(state, node, text_pos, text_pos - (Py_ssize_t)count, TRUE)); *is_partial = count == (size_t)(text_pos) && count < max_count && state->partial_side == RE_PARTIAL_LEFT; return count; case RE_OP_SET_DIFF_REV: case RE_OP_SET_INTER_REV: case RE_OP_SET_SYM_DIFF_REV: case RE_OP_SET_UNION_REV: count = min_size_t((size_t)(text_pos - state->slice_start), max_count); count = (size_t)(text_pos - match_many_SET_REV(state, node, text_pos, text_pos - (Py_ssize_t)count, TRUE)); *is_partial = count == (size_t)(text_pos) && count < max_count && state->partial_side == RE_PARTIAL_LEFT; return count; } return 0; } /* Performs a simple string search. 
*/ Py_LOCAL_INLINE(Py_ssize_t) simple_string_search(RE_State* state, RE_Node* node, Py_ssize_t text_pos, Py_ssize_t limit, BOOL* is_partial) { Py_ssize_t length; RE_CODE* values; Py_UCS4 check_char; length = (Py_ssize_t)node->value_count; values = node->values; check_char = values[0]; *is_partial = FALSE; switch (state->charsize) { case 1: { Py_UCS1* text = (Py_UCS1*)state->text; Py_UCS1* text_ptr = text + text_pos; Py_UCS1* limit_ptr = text + limit; while (text_ptr < limit_ptr) { if (text_ptr[0] == check_char) { Py_ssize_t s_pos; s_pos = 1; for (;;) { if (s_pos >= length) /* End of search string. */ return text_ptr - text; if (text_ptr + s_pos >= limit_ptr) { /* Off the end of the text. */ if (state->partial_side == RE_PARTIAL_RIGHT) { /* Partial match. */ *is_partial = TRUE; return text_ptr - text; } return -1; } if (!same_char(text_ptr[s_pos], values[s_pos])) break; ++s_pos; } } ++text_ptr; } text_pos = text_ptr - text; break; } case 2: { Py_UCS2* text = (Py_UCS2*)state->text; Py_UCS2* text_ptr = text + text_pos; Py_UCS2* limit_ptr = text + limit; while (text_ptr < limit_ptr) { if (text_ptr[0] == check_char) { Py_ssize_t s_pos; s_pos = 1; for (;;) { if (s_pos >= length) /* End of search string. */ return text_ptr - text; if (text_ptr + s_pos >= limit_ptr) { /* Off the end of the text. */ if (state->partial_side == RE_PARTIAL_RIGHT) { /* Partial match. */ *is_partial = TRUE; return text_ptr - text; } return -1; } if (!same_char(text_ptr[s_pos], values[s_pos])) break; ++s_pos; } } ++text_ptr; } text_pos = text_ptr - text; break; } case 4: { Py_UCS4* text = (Py_UCS4*)state->text; Py_UCS4* text_ptr = text + text_pos; Py_UCS4* limit_ptr = text + limit; while (text_ptr < limit_ptr) { if (text_ptr[0] == check_char) { Py_ssize_t s_pos; s_pos = 1; for (;;) { if (s_pos >= length) /* End of search string. */ return text_ptr - text; if (text_ptr + s_pos >= limit_ptr) { /* Off the end of the text. */ if (state->partial_side == RE_PARTIAL_RIGHT) { /* Partial match. 
*/ *is_partial = TRUE; return text_ptr - text; } return -1; } if (!same_char(text_ptr[s_pos], values[s_pos])) break; ++s_pos; } } ++text_ptr; } text_pos = text_ptr - text; break; } } /* Off the end of the text. */ if (state->partial_side == RE_PARTIAL_RIGHT) { /* Partial match. */ *is_partial = TRUE; return text_pos; } return -1; } /* Performs a simple string search, ignoring case. */ Py_LOCAL_INLINE(Py_ssize_t) simple_string_search_ign(RE_State* state, RE_Node* node, Py_ssize_t text_pos, Py_ssize_t limit, BOOL* is_partial) { Py_ssize_t length; RE_CODE* values; RE_EncodingTable* encoding; RE_LocaleInfo* locale_info; Py_UCS4 cases[RE_MAX_CASES]; int case_count; length = (Py_ssize_t)node->value_count; values = node->values; encoding = state->encoding; locale_info = state->locale_info; case_count = encoding->all_cases(locale_info, values[0], cases); *is_partial = FALSE; switch (state->charsize) { case 1: { Py_UCS1* text = (Py_UCS1*)state->text; Py_UCS1* text_ptr = text + text_pos; Py_UCS1* limit_ptr = text + limit; while (text_ptr < limit_ptr) { if (any_case(text_ptr[0], case_count, cases)) { Py_ssize_t s_pos; s_pos = 1; for (;;) { if (s_pos >= length) /* End of search string. */ return text_ptr - text; if (text_ptr + s_pos >= limit_ptr) { /* Off the end of the text. */ if (state->partial_side == RE_PARTIAL_RIGHT) { /* Partial match. */ *is_partial = TRUE; return text_ptr - text; } return -1; } if (!same_char_ign(encoding, locale_info, text_ptr[s_pos], values[s_pos])) break; ++s_pos; } } ++text_ptr; } text_pos = text_ptr - text; break; } case 2: { Py_UCS2* text = (Py_UCS2*)state->text; Py_UCS2* text_ptr = text + text_pos; Py_UCS2* limit_ptr = text + limit; while (text_ptr < limit_ptr) { if (any_case(text_ptr[0], case_count, cases)) { Py_ssize_t s_pos; s_pos = 1; for (;;) { if (s_pos >= length) /* End of search string. */ return text_ptr - text; if (text_ptr + s_pos >= limit_ptr) { /* Off the end of the text. 
*/ if (state->partial_side == RE_PARTIAL_RIGHT) { /* Partial match. */ *is_partial = TRUE; return text_ptr - text; } return -1; } if (!same_char_ign(encoding, locale_info, text_ptr[s_pos], values[s_pos])) break; ++s_pos; } } ++text_ptr; } text_pos = text_ptr - text; break; } case 4: { Py_UCS4* text = (Py_UCS4*)state->text; Py_UCS4* text_ptr = text + text_pos; Py_UCS4* limit_ptr = text + limit; while (text_ptr < limit_ptr) { if (any_case(text_ptr[0], case_count, cases)) { Py_ssize_t s_pos; s_pos = 1; for (;;) { if (s_pos >= length) /* End of search string. */ return text_ptr - text; if (text_ptr + s_pos >= limit_ptr) { /* Off the end of the text. */ if (state->partial_side == RE_PARTIAL_RIGHT) { /* Partial match. */ *is_partial = TRUE; return text_ptr - text; } return -1; } if (!same_char_ign(encoding, locale_info, text_ptr[s_pos], values[s_pos])) break; ++s_pos; } } ++text_ptr; } text_pos = text_ptr - text; break; } } /* Off the end of the text. */ if (state->partial_side == RE_PARTIAL_RIGHT) { /* Partial match. */ *is_partial = TRUE; return text_pos; } return -1; } /* Performs a simple string search, backwards, ignoring case. */ Py_LOCAL_INLINE(Py_ssize_t) simple_string_search_ign_rev(RE_State* state, RE_Node* node, Py_ssize_t text_pos, Py_ssize_t limit, BOOL* is_partial) { Py_ssize_t length; RE_CODE* values; RE_EncodingTable* encoding; RE_LocaleInfo* locale_info; Py_UCS4 cases[RE_MAX_CASES]; int case_count; length = (Py_ssize_t)node->value_count; values = node->values; encoding = state->encoding; locale_info = state->locale_info; case_count = encoding->all_cases(locale_info, values[length - 1], cases); *is_partial = FALSE; switch (state->charsize) { case 1: { Py_UCS1* text = (Py_UCS1*)state->text; Py_UCS1* text_ptr = text + text_pos; Py_UCS1* limit_ptr = text + limit; while (text_ptr > limit_ptr) { if (any_case(text_ptr[-1], case_count, cases)) { Py_ssize_t s_pos; s_pos = 1; for (;;) { if (s_pos >= length) /* End of search string. 
*/ return text_ptr - text; if (text_ptr - s_pos <= limit_ptr) { /* Off the end of the text. */ if (state->partial_side == RE_PARTIAL_LEFT) { /* Partial match. */ *is_partial = TRUE; return text_ptr - text; } return -1; } if (!same_char_ign(encoding, locale_info, text_ptr[- s_pos - 1], values[length - s_pos - 1])) break; ++s_pos; } } --text_ptr; } text_pos = text_ptr - text; break; } case 2: { Py_UCS2* text = (Py_UCS2*)state->text; Py_UCS2* text_ptr = text + text_pos; Py_UCS2* limit_ptr = text + limit; while (text_ptr > limit_ptr) { if (any_case(text_ptr[-1], case_count, cases)) { Py_ssize_t s_pos; s_pos = 1; for (;;) { if (s_pos >= length) /* End of search string. */ return text_ptr - text; if (text_ptr - s_pos <= limit_ptr) { /* Off the end of the text. */ if (state->partial_side == RE_PARTIAL_LEFT) { /* Partial match. */ *is_partial = TRUE; return text_ptr - text; } return -1; } if (!same_char_ign(encoding, locale_info, text_ptr[- s_pos - 1], values[length - s_pos - 1])) break; ++s_pos; } } --text_ptr; } text_pos = text_ptr - text; break; } case 4: { Py_UCS4* text = (Py_UCS4*)state->text; Py_UCS4* text_ptr = text + text_pos; Py_UCS4* limit_ptr = text + limit; while (text_ptr > limit_ptr) { if (any_case(text_ptr[-1], case_count, cases)) { Py_ssize_t s_pos; s_pos = 1; for (;;) { if (s_pos >= length) /* End of search string. */ return text_ptr - text; if (text_ptr - s_pos <= limit_ptr) { /* Off the end of the text. */ if (state->partial_side == RE_PARTIAL_LEFT) { /* Partial match. */ *is_partial = TRUE; return text_ptr - text; } return -1; } if (!same_char_ign(encoding, locale_info, text_ptr[- s_pos - 1], values[length - s_pos - 1])) break; ++s_pos; } } --text_ptr; } text_pos = text_ptr - text; break; } } /* Off the end of the text. */ if (state->partial_side == RE_PARTIAL_LEFT) { /* Partial match. */ *is_partial = TRUE; return text_pos; } return -1; } /* Performs a simple string search, backwards. 
*/ Py_LOCAL_INLINE(Py_ssize_t) simple_string_search_rev(RE_State* state, RE_Node* node, Py_ssize_t text_pos, Py_ssize_t limit, BOOL* is_partial) { Py_ssize_t length; RE_CODE* values; Py_UCS4 check_char; length = (Py_ssize_t)node->value_count; values = node->values; check_char = values[length - 1]; *is_partial = FALSE; switch (state->charsize) { case 1: { Py_UCS1* text = (Py_UCS1*)state->text; Py_UCS1* text_ptr = text + text_pos; Py_UCS1* limit_ptr = text + limit; while (text_ptr > limit_ptr) { if (text_ptr[-1] == check_char) { Py_ssize_t s_pos; s_pos = 1; for (;;) { if (s_pos >= length) /* End of search string. */ return text_ptr - text; if (text_ptr - s_pos <= limit_ptr) { /* Off the end of the text. */ if (state->partial_side == RE_PARTIAL_LEFT) { /* Partial match. */ *is_partial = TRUE; return text_ptr - text; } return -1; } if (!same_char(text_ptr[- s_pos - 1], values[length - s_pos - 1])) break; ++s_pos; } } --text_ptr; } text_pos = text_ptr - text; break; } case 2: { Py_UCS2* text = (Py_UCS2*)state->text; Py_UCS2* text_ptr = text + text_pos; Py_UCS2* limit_ptr = text + limit; while (text_ptr > limit_ptr) { if (text_ptr[-1] == check_char) { Py_ssize_t s_pos; s_pos = 1; for (;;) { if (s_pos >= length) /* End of search string. */ return text_ptr - text; if (text_ptr - s_pos <= limit_ptr) { /* Off the end of the text. */ if (state->partial_side == RE_PARTIAL_LEFT) { /* Partial match. */ *is_partial = TRUE; return text_ptr - text; } return -1; } if (!same_char(text_ptr[- s_pos - 1], values[length - s_pos - 1])) break; ++s_pos; } } --text_ptr; } text_pos = text_ptr - text; break; } case 4: { Py_UCS4* text = (Py_UCS4*)state->text; Py_UCS4* text_ptr = text + text_pos; Py_UCS4* limit_ptr = text + limit; while (text_ptr > limit_ptr) { if (text_ptr[-1] == check_char) { Py_ssize_t s_pos; s_pos = 1; for (;;) { if (s_pos >= length) /* End of search string. */ return text_ptr - text; if (text_ptr - s_pos <= limit_ptr) { /* Off the end of the text. 
*/ if (state->partial_side == RE_PARTIAL_LEFT) { /* Partial match. */ *is_partial = TRUE; return text_ptr - text; } return -1; } if (!same_char(text_ptr[- s_pos - 1], values[length - s_pos - 1])) break; ++s_pos; } } --text_ptr; } text_pos = text_ptr - text; break; } } /* Off the end of the text. */ if (state->partial_side == RE_PARTIAL_LEFT) { /* Partial match. */ *is_partial = TRUE; return text_pos; } return -1; } /* Performs a Boyer-Moore fast string search. */ Py_LOCAL_INLINE(Py_ssize_t) fast_string_search(RE_State* state, RE_Node* node, Py_ssize_t text_pos, Py_ssize_t limit) { void* text; Py_ssize_t length; RE_CODE* values; Py_ssize_t* bad_character_offset; Py_ssize_t* good_suffix_offset; Py_ssize_t last_pos; Py_UCS4 check_char; text = state->text; length = (Py_ssize_t)node->value_count; values = node->values; good_suffix_offset = node->string.good_suffix_offset; bad_character_offset = node->string.bad_character_offset; last_pos = length - 1; check_char = values[last_pos]; limit -= length; switch (state->charsize) { case 1: { Py_UCS1* text_ptr; Py_UCS1* limit_ptr; text_ptr = (Py_UCS1*)text + text_pos; limit_ptr = (Py_UCS1*)text + limit; while (text_ptr <= limit_ptr) { Py_UCS4 ch; ch = text_ptr[last_pos]; if (ch == check_char) { Py_ssize_t pos; pos = last_pos - 1; while (pos >= 0 && same_char(text_ptr[pos], values[pos])) --pos; if (pos < 0) return text_ptr - (Py_UCS1*)text; text_ptr += good_suffix_offset[pos]; } else text_ptr += bad_character_offset[ch & 0xFF]; } break; } case 2: { Py_UCS2* text_ptr; Py_UCS2* limit_ptr; text_ptr = (Py_UCS2*)text + text_pos; limit_ptr = (Py_UCS2*)text + limit; while (text_ptr <= limit_ptr) { Py_UCS4 ch; ch = text_ptr[last_pos]; if (ch == check_char) { Py_ssize_t pos; pos = last_pos - 1; while (pos >= 0 && same_char(text_ptr[pos], values[pos])) --pos; if (pos < 0) return text_ptr - (Py_UCS2*)text; text_ptr += good_suffix_offset[pos]; } else text_ptr += bad_character_offset[ch & 0xFF]; } break; } case 4: { Py_UCS4* text_ptr; 
Py_UCS4* limit_ptr; text_ptr = (Py_UCS4*)text + text_pos; limit_ptr = (Py_UCS4*)text + limit; while (text_ptr <= limit_ptr) { Py_UCS4 ch; ch = text_ptr[last_pos]; if (ch == check_char) { Py_ssize_t pos; pos = last_pos - 1; while (pos >= 0 && same_char(text_ptr[pos], values[pos])) --pos; if (pos < 0) return text_ptr - (Py_UCS4*)text; text_ptr += good_suffix_offset[pos]; } else text_ptr += bad_character_offset[ch & 0xFF]; } break; } } return -1; } /* Performs a Boyer-Moore fast string search, ignoring case. */ Py_LOCAL_INLINE(Py_ssize_t) fast_string_search_ign(RE_State* state, RE_Node* node, Py_ssize_t text_pos, Py_ssize_t limit) { RE_EncodingTable* encoding; RE_LocaleInfo* locale_info; void* text; Py_ssize_t length; RE_CODE* values; Py_ssize_t* bad_character_offset; Py_ssize_t* good_suffix_offset; Py_ssize_t last_pos; Py_UCS4 cases[RE_MAX_CASES]; int case_count; encoding = state->encoding; locale_info = state->locale_info; text = state->text; length = (Py_ssize_t)node->value_count; values = node->values; good_suffix_offset = node->string.good_suffix_offset; bad_character_offset = node->string.bad_character_offset; last_pos = length - 1; case_count = encoding->all_cases(locale_info, values[last_pos], cases); limit -= length; switch (state->charsize) { case 1: { Py_UCS1* text_ptr; Py_UCS1* limit_ptr; text_ptr = (Py_UCS1*)text + text_pos; limit_ptr = (Py_UCS1*)text + limit; while (text_ptr <= limit_ptr) { Py_UCS4 ch; ch = text_ptr[last_pos]; if (any_case(ch, case_count, cases)) { Py_ssize_t pos; pos = last_pos - 1; while (pos >= 0 && same_char_ign(encoding, locale_info, text_ptr[pos], values[pos])) --pos; if (pos < 0) return text_ptr - (Py_UCS1*)text; text_ptr += good_suffix_offset[pos]; } else text_ptr += bad_character_offset[ch & 0xFF]; } break; } case 2: { Py_UCS2* text_ptr; Py_UCS2* limit_ptr; text_ptr = (Py_UCS2*)text + text_pos; limit_ptr = (Py_UCS2*)text + limit; while (text_ptr <= limit_ptr) { Py_UCS4 ch; ch = text_ptr[last_pos]; if (any_case(ch, case_count, 
cases)) { Py_ssize_t pos; pos = last_pos - 1; while (pos >= 0 && same_char_ign(encoding, locale_info, text_ptr[pos], values[pos])) --pos; if (pos < 0) return text_ptr - (Py_UCS2*)text; text_ptr += good_suffix_offset[pos]; } else text_ptr += bad_character_offset[ch & 0xFF]; } break; } case 4: { Py_UCS4* text_ptr; Py_UCS4* limit_ptr; text_ptr = (Py_UCS4*)text + text_pos; limit_ptr = (Py_UCS4*)text + limit; while (text_ptr <= limit_ptr) { Py_UCS4 ch; ch = text_ptr[last_pos]; if (any_case(ch, case_count, cases)) { Py_ssize_t pos; pos = last_pos - 1; while (pos >= 0 && same_char_ign(encoding, locale_info, text_ptr[pos], values[pos])) --pos; if (pos < 0) return text_ptr - (Py_UCS4*)text; text_ptr += good_suffix_offset[pos]; } else text_ptr += bad_character_offset[ch & 0xFF]; } break; } } return -1; } /* Performs a Boyer-Moore fast string search, backwards, ignoring case. */ Py_LOCAL_INLINE(Py_ssize_t) fast_string_search_ign_rev(RE_State* state, RE_Node* node, Py_ssize_t text_pos, Py_ssize_t limit) { RE_EncodingTable* encoding; RE_LocaleInfo* locale_info; void* text; Py_ssize_t length; RE_CODE* values; Py_ssize_t* bad_character_offset; Py_ssize_t* good_suffix_offset; Py_UCS4 cases[RE_MAX_CASES]; int case_count; encoding = state->encoding; locale_info = state->locale_info; text = state->text; length = (Py_ssize_t)node->value_count; values = node->values; good_suffix_offset = node->string.good_suffix_offset; bad_character_offset = node->string.bad_character_offset; case_count = encoding->all_cases(locale_info, values[0], cases); text_pos -= length; switch (state->charsize) { case 1: { Py_UCS1* text_ptr; Py_UCS1* limit_ptr; text_ptr = (Py_UCS1*)text + text_pos; limit_ptr = (Py_UCS1*)text + limit; while (text_ptr >= limit_ptr) { Py_UCS4 ch; ch = text_ptr[0]; if (any_case(ch, case_count, cases)) { Py_ssize_t pos; pos = 1; while (pos < length && same_char_ign(encoding, locale_info, text_ptr[pos], values[pos])) ++pos; if (pos >= length) return text_ptr - (Py_UCS1*)text + length; 
text_ptr += good_suffix_offset[pos]; } else text_ptr += bad_character_offset[ch & 0xFF]; } break; } case 2: { Py_UCS2* text_ptr; Py_UCS2* limit_ptr; text_ptr = (Py_UCS2*)text + text_pos; limit_ptr = (Py_UCS2*)text + limit; while (text_ptr >= limit_ptr) { Py_UCS4 ch; ch = text_ptr[0]; if (any_case(ch, case_count, cases)) { Py_ssize_t pos; pos = 1; while (pos < length && same_char_ign(encoding, locale_info, text_ptr[pos], values[pos])) ++pos; if (pos >= length) return text_ptr - (Py_UCS2*)text + length; text_ptr += good_suffix_offset[pos]; } else text_ptr += bad_character_offset[ch & 0xFF]; } break; } case 4: { Py_UCS4* text_ptr; Py_UCS4* limit_ptr; text_ptr = (Py_UCS4*)text + text_pos; limit_ptr = (Py_UCS4*)text + limit; while (text_ptr >= limit_ptr) { Py_UCS4 ch; ch = text_ptr[0]; if (any_case(ch, case_count, cases)) { Py_ssize_t pos; pos = 1; while (pos < length && same_char_ign(encoding, locale_info, text_ptr[pos], values[pos])) ++pos; if (pos >= length) return text_ptr - (Py_UCS4*)text + length; text_ptr += good_suffix_offset[pos]; } else text_ptr += bad_character_offset[ch & 0xFF]; } break; } } return -1; } /* Performs a Boyer-Moore fast string search, backwards. 
*/ Py_LOCAL_INLINE(Py_ssize_t) fast_string_search_rev(RE_State* state, RE_Node* node, Py_ssize_t text_pos, Py_ssize_t limit) { void* text; Py_ssize_t length; RE_CODE* values; Py_ssize_t* bad_character_offset; Py_ssize_t* good_suffix_offset; Py_UCS4 check_char; text = state->text; length = (Py_ssize_t)node->value_count; values = node->values; good_suffix_offset = node->string.good_suffix_offset; bad_character_offset = node->string.bad_character_offset; check_char = values[0]; text_pos -= length; switch (state->charsize) { case 1: { Py_UCS1* text_ptr; Py_UCS1* limit_ptr; text_ptr = (Py_UCS1*)text + text_pos; limit_ptr = (Py_UCS1*)text + limit; while (text_ptr >= limit_ptr) { Py_UCS4 ch; ch = text_ptr[0]; if (ch == check_char) { Py_ssize_t pos; pos = 1; while (pos < length && same_char(text_ptr[pos], values[pos])) ++pos; if (pos >= length) return text_ptr - (Py_UCS1*)text + length; text_ptr += good_suffix_offset[pos]; } else text_ptr += bad_character_offset[ch & 0xFF]; } break; } case 2: { Py_UCS2* text_ptr; Py_UCS2* limit_ptr; text_ptr = (Py_UCS2*)text + text_pos; limit_ptr = (Py_UCS2*)text + limit; while (text_ptr >= limit_ptr) { Py_UCS4 ch; ch = text_ptr[0]; if (ch == check_char) { Py_ssize_t pos; pos = 1; while (pos < length && same_char(text_ptr[pos], values[pos])) ++pos; if (pos >= length) return text_ptr - (Py_UCS2*)text + length; text_ptr += good_suffix_offset[pos]; } else text_ptr += bad_character_offset[ch & 0xFF]; } break; } case 4: { Py_UCS4* text_ptr; Py_UCS4* limit_ptr; text_ptr = (Py_UCS4*)text + text_pos; limit_ptr = (Py_UCS4*)text + limit; while (text_ptr >= limit_ptr) { Py_UCS4 ch; ch = text_ptr[0]; if (ch == check_char) { Py_ssize_t pos; pos = 1; while (pos < length && same_char(text_ptr[pos], values[pos])) ++pos; if (pos >= length) return text_ptr - (Py_UCS4*)text + length; text_ptr += good_suffix_offset[pos]; } else text_ptr += bad_character_offset[ch & 0xFF]; } break; } } return -1; } /* Builds the tables for a Boyer-Moore fast string search. 
*/ Py_LOCAL_INLINE(BOOL) build_fast_tables(RE_State* state, RE_Node* node, BOOL ignore) { Py_ssize_t length; RE_CODE* values; Py_ssize_t* bad; Py_ssize_t* good; Py_UCS4 ch; Py_ssize_t last_pos; Py_ssize_t pos; BOOL (*is_same_char)(RE_EncodingTable* encoding, RE_LocaleInfo* locale_info, Py_UCS4 ch1, Py_UCS4 ch2); Py_ssize_t suffix_len; BOOL saved_start; Py_ssize_t s; Py_ssize_t i; Py_ssize_t s_start; Py_UCS4 codepoints[RE_MAX_CASES]; length = (Py_ssize_t)node->value_count; if (length < RE_MIN_FAST_LENGTH) return TRUE; values = node->values; bad = (Py_ssize_t*)re_alloc(256 * sizeof(bad[0])); good = (Py_ssize_t*)re_alloc((size_t)length * sizeof(good[0])); if (!bad || !good) { re_dealloc(bad); re_dealloc(good); return FALSE; } for (ch = 0; ch < 0x100; ch++) bad[ch] = length; last_pos = length - 1; for (pos = 0; pos < last_pos; pos++) { Py_ssize_t offset; offset = last_pos - pos; ch = values[pos]; if (ignore) { int count; int i; count = state->encoding->all_cases(state->locale_info, ch, codepoints); for (i = 0; i < count; i++) bad[codepoints[i] & 0xFF] = offset; } else bad[ch & 0xFF] = offset; } is_same_char = ignore ? same_char_ign_wrapper : same_char_wrapper; suffix_len = 2; pos = length - suffix_len; saved_start = FALSE; s = pos - 1; i = suffix_len - 1; s_start = s; while (pos >= 0) { /* Look for another occurrence of the suffix. */ while (i > 0) { /* Have we dropped off the end of the string? */ if (s + i < 0) break; if (is_same_char(state->encoding, state->locale_info, values[s + i], values[pos + i])) /* It still matches. */ --i; else { /* Start again further along. */ --s; i = suffix_len - 1; } } if (s >= 0 && is_same_char(state->encoding, state->locale_info, values[s], values[pos])) { /* We haven't dropped off the end of the string, and the suffix has * matched this far, so this is a good starting point for the next * iteration. */ --s; if (!saved_start) { s_start = s; saved_start = TRUE; } } else { /* Calculate the suffix offset. 
*/ good[pos] = pos - s; /* Extend the suffix and start searching for _this_ one. */ --pos; ++suffix_len; /* Where's a good place to start searching? */ if (saved_start) { s = s_start; saved_start = FALSE; } else --s; /* Can we short-circuit the searching? */ if (s < 0) break; } i = suffix_len - 1; } /* Fill-in any remaining entries. */ while (pos >= 0) { good[pos] = pos - s; --pos; --s; } node->string.bad_character_offset = bad; node->string.good_suffix_offset = good; return TRUE; } /* Builds the tables for a Boyer-Moore fast string search, backwards. */ Py_LOCAL_INLINE(BOOL) build_fast_tables_rev(RE_State* state, RE_Node* node, BOOL ignore) { Py_ssize_t length; RE_CODE* values; Py_ssize_t* bad; Py_ssize_t* good; Py_UCS4 ch; Py_ssize_t last_pos; Py_ssize_t pos; BOOL (*is_same_char)(RE_EncodingTable* encoding, RE_LocaleInfo* locale_info, Py_UCS4 ch1, Py_UCS4 ch2); Py_ssize_t suffix_len; BOOL saved_start; Py_ssize_t s; Py_ssize_t i; Py_ssize_t s_start; Py_UCS4 codepoints[RE_MAX_CASES]; length = (Py_ssize_t)node->value_count; if (length < RE_MIN_FAST_LENGTH) return TRUE; values = node->values; bad = (Py_ssize_t*)re_alloc(256 * sizeof(bad[0])); good = (Py_ssize_t*)re_alloc((size_t)length * sizeof(good[0])); if (!bad || !good) { re_dealloc(bad); re_dealloc(good); return FALSE; } for (ch = 0; ch < 0x100; ch++) bad[ch] = -length; last_pos = length - 1; for (pos = last_pos; pos > 0; pos--) { Py_ssize_t offset; offset = -pos; ch = values[pos]; if (ignore) { int count; int i; count = state->encoding->all_cases(state->locale_info, ch, codepoints); for (i = 0; i < count; i++) bad[codepoints[i] & 0xFF] = offset; } else bad[ch & 0xFF] = offset; } is_same_char = ignore ? same_char_ign_wrapper : same_char_wrapper; suffix_len = 2; pos = suffix_len - 1; saved_start = FALSE; s = pos + 1; i = suffix_len - 1; s_start = s; while (pos < length) { /* Look for another occurrence of the suffix. */ while (i > 0) { /* Have we dropped off the end of the string? 
*/ if (s - i >= length) break; if (is_same_char(state->encoding, state->locale_info, values[s - i], values[pos - i])) /* It still matches. */ --i; else { /* Start again further along. */ ++s; i = suffix_len - 1; } } if (s < length && is_same_char(state->encoding, state->locale_info, values[s], values[pos])) { /* We haven't dropped off the end of the string, and the suffix has * matched this far, so this is a good starting point for the next * iteration. */ ++s; if (!saved_start) { s_start = s; saved_start = TRUE; } } else { /* Calculate the suffix offset. */ good[pos] = pos - s; /* Extend the suffix and start searching for _this_ one. */ ++pos; ++suffix_len; /* Where's a good place to start searching? */ if (saved_start) { s = s_start; saved_start = FALSE; } else ++s; /* Can we short-circuit the searching? */ if (s >= length) break; } i = suffix_len - 1; } /* Fill-in any remaining entries. */ while (pos < length) { good[pos] = pos - s; ++pos; ++s; } node->string.bad_character_offset = bad; node->string.good_suffix_offset = good; return TRUE; } /* Performs a string search. */ Py_LOCAL_INLINE(Py_ssize_t) string_search(RE_SafeState* safe_state, RE_Node* node, Py_ssize_t text_pos, Py_ssize_t limit, BOOL* is_partial) { RE_State* state; Py_ssize_t found_pos; state = safe_state->re_state; *is_partial = FALSE; /* Has the node been initialised for fast searching, if necessary? */ if (!(node->status & RE_STATUS_FAST_INIT)) { /* Ideally the pattern should be immutable and shareable across threads. * Internally, however, it isn't. For safety we need to hold the GIL. */ acquire_GIL(safe_state); /* Double-check because of multithreading. */ if (!(node->status & RE_STATUS_FAST_INIT)) { build_fast_tables(state, node, FALSE); node->status |= RE_STATUS_FAST_INIT; } release_GIL(safe_state); } if (node->string.bad_character_offset) { /* Start with a fast search. This will find the string if it's complete * (i.e. not truncated).
*/ found_pos = fast_string_search(state, node, text_pos, limit); if (found_pos < 0 && state->partial_side == RE_PARTIAL_RIGHT) /* We didn't find the string, but it could've been truncated, so * try again, starting close to the end. */ found_pos = simple_string_search(state, node, limit - (Py_ssize_t)(node->value_count - 1), limit, is_partial); } else found_pos = simple_string_search(state, node, text_pos, limit, is_partial); return found_pos; } /* Performs a string search, ignoring case. */ Py_LOCAL_INLINE(Py_ssize_t) string_search_fld(RE_SafeState* safe_state, RE_Node* node, Py_ssize_t text_pos, Py_ssize_t limit, Py_ssize_t* new_pos, BOOL* is_partial) { RE_State* state; RE_EncodingTable* encoding; RE_LocaleInfo* locale_info; int (*full_case_fold)(RE_LocaleInfo* locale_info, Py_UCS4 ch, Py_UCS4* folded); Py_UCS4 (*char_at)(void* text, Py_ssize_t pos); void* text; RE_CODE* values; Py_ssize_t start_pos; int f_pos; int folded_len; Py_ssize_t length; Py_ssize_t s_pos; Py_UCS4 folded[RE_MAX_FOLDED]; state = safe_state->re_state; encoding = state->encoding; locale_info = state->locale_info; full_case_fold = encoding->full_case_fold; char_at = state->char_at; text = state->text; values = node->values; start_pos = text_pos; f_pos = 0; folded_len = 0; length = (Py_ssize_t)node->value_count; s_pos = 0; *is_partial = FALSE; while (s_pos < length || f_pos < folded_len) { if (f_pos >= folded_len) { /* Fetch and casefold another character. */ if (text_pos >= limit) { if (text_pos >= state->text_length && state->partial_side == RE_PARTIAL_RIGHT) { *is_partial = TRUE; return start_pos; } return -1; } folded_len = full_case_fold(locale_info, char_at(text, text_pos), folded); f_pos = 0; } if (s_pos < length && same_char_ign(encoding, locale_info, values[s_pos], folded[f_pos])) { ++s_pos; ++f_pos; if (f_pos >= folded_len) ++text_pos; } else { ++start_pos; text_pos = start_pos; f_pos = 0; folded_len = 0; s_pos = 0; } } /* We found the string. 
*/ if (new_pos) *new_pos = text_pos; return start_pos; } /* Performs a string search, backwards, ignoring case. */ Py_LOCAL_INLINE(Py_ssize_t) string_search_fld_rev(RE_SafeState* safe_state, RE_Node* node, Py_ssize_t text_pos, Py_ssize_t limit, Py_ssize_t* new_pos, BOOL* is_partial) { RE_State* state; RE_EncodingTable* encoding; RE_LocaleInfo* locale_info; int (*full_case_fold)(RE_LocaleInfo* locale_info, Py_UCS4 ch, Py_UCS4* folded); Py_UCS4 (*char_at)(void* text, Py_ssize_t pos); void* text; RE_CODE* values; Py_ssize_t start_pos; int f_pos; int folded_len; Py_ssize_t length; Py_ssize_t s_pos; Py_UCS4 folded[RE_MAX_FOLDED]; state = safe_state->re_state; encoding = state->encoding; locale_info = state->locale_info; full_case_fold = encoding->full_case_fold; char_at = state->char_at; text = state->text; values = node->values; start_pos = text_pos; f_pos = 0; folded_len = 0; length = (Py_ssize_t)node->value_count; s_pos = 0; *is_partial = FALSE; while (s_pos < length || f_pos < folded_len) { if (f_pos >= folded_len) { /* Fetch and casefold another character. */ if (text_pos <= limit) { if (text_pos <= 0 && state->partial_side == RE_PARTIAL_LEFT) { *is_partial = TRUE; return start_pos; } return -1; } folded_len = full_case_fold(locale_info, char_at(text, text_pos - 1), folded); f_pos = 0; } if (s_pos < length && same_char_ign(encoding, locale_info, values[length - s_pos - 1], folded[folded_len - f_pos - 1])) { ++s_pos; ++f_pos; if (f_pos >= folded_len) --text_pos; } else { --start_pos; text_pos = start_pos; f_pos = 0; folded_len = 0; s_pos = 0; } } /* We found the string. */ if (new_pos) *new_pos = text_pos; return start_pos; } /* Performs a string search, ignoring case. 
*/ Py_LOCAL_INLINE(Py_ssize_t) string_search_ign(RE_SafeState* safe_state, RE_Node* node, Py_ssize_t text_pos, Py_ssize_t limit, BOOL* is_partial) { RE_State* state; Py_ssize_t found_pos; state = safe_state->re_state; *is_partial = FALSE; /* Has the node been initialised for fast searching, if necessary? */ if (!(node->status & RE_STATUS_FAST_INIT)) { /* Ideally the pattern should be immutable and shareable across threads. * Internally, however, it isn't. For safety we need to hold the GIL. */ acquire_GIL(safe_state); /* Double-check because of multithreading. */ if (!(node->status & RE_STATUS_FAST_INIT)) { build_fast_tables(state, node, TRUE); node->status |= RE_STATUS_FAST_INIT; } release_GIL(safe_state); } if (node->string.bad_character_offset) { /* Start with a fast search. This will find the string if it's complete * (i.e. not truncated). */ found_pos = fast_string_search_ign(state, node, text_pos, limit); if (found_pos < 0 && state->partial_side == RE_PARTIAL_RIGHT) /* We didn't find the string, but it could've been truncated, so * try again, starting close to the end. */ found_pos = simple_string_search_ign(state, node, limit - (Py_ssize_t)(node->value_count - 1), limit, is_partial); } else found_pos = simple_string_search_ign(state, node, text_pos, limit, is_partial); return found_pos; } /* Performs a string search, backwards, ignoring case. */ Py_LOCAL_INLINE(Py_ssize_t) string_search_ign_rev(RE_SafeState* safe_state, RE_Node* node, Py_ssize_t text_pos, Py_ssize_t limit, BOOL* is_partial) { RE_State* state; Py_ssize_t found_pos; state = safe_state->re_state; *is_partial = FALSE; /* Has the node been initialised for fast searching, if necessary? */ if (!(node->status & RE_STATUS_FAST_INIT)) { /* Ideally the pattern should be immutable and shareable across threads. * Internally, however, it isn't. For safety we need to hold the GIL. */ acquire_GIL(safe_state); /* Double-check because of multithreading.
*/ if (!(node->status & RE_STATUS_FAST_INIT)) { build_fast_tables_rev(state, node, TRUE); node->status |= RE_STATUS_FAST_INIT; } release_GIL(safe_state); } if (node->string.bad_character_offset) { /* Start with a fast search. This will find the string if it's complete * (i.e. not truncated). */ found_pos = fast_string_search_ign_rev(state, node, text_pos, limit); if (found_pos < 0 && state->partial_side == RE_PARTIAL_LEFT) /* We didn't find the string, but it could've been truncated, so * try again, starting close to the end. */ found_pos = simple_string_search_ign_rev(state, node, limit + (Py_ssize_t)(node->value_count - 1), limit, is_partial); } else found_pos = simple_string_search_ign_rev(state, node, text_pos, limit, is_partial); return found_pos; } /* Performs a string search, backwards. */ Py_LOCAL_INLINE(Py_ssize_t) string_search_rev(RE_SafeState* safe_state, RE_Node* node, Py_ssize_t text_pos, Py_ssize_t limit, BOOL* is_partial) { RE_State* state; Py_ssize_t found_pos; state = safe_state->re_state; *is_partial = FALSE; /* Has the node been initialised for fast searching, if necessary? */ if (!(node->status & RE_STATUS_FAST_INIT)) { /* Ideally the pattern should be immutable and shareable across threads. * Internally, however, it isn't. For safety we need to hold the GIL. */ acquire_GIL(safe_state); /* Double-check because of multithreading. */ if (!(node->status & RE_STATUS_FAST_INIT)) { build_fast_tables_rev(state, node, FALSE); node->status |= RE_STATUS_FAST_INIT; } release_GIL(safe_state); } if (node->string.bad_character_offset) { /* Start with a fast search. This will find the string if it's complete * (i.e. not truncated). */ found_pos = fast_string_search_rev(state, node, text_pos, limit); if (found_pos < 0 && state->partial_side == RE_PARTIAL_LEFT) /* We didn't find the string, but it could've been truncated, so * try again, starting close to the end.
*/ found_pos = simple_string_search_rev(state, node, limit + (Py_ssize_t)(node->value_count - 1), limit, is_partial); } else found_pos = simple_string_search_rev(state, node, text_pos, limit, is_partial); return found_pos; } /* Returns how many characters there could be before full case-folding. */ Py_LOCAL_INLINE(Py_ssize_t) possible_unfolded_length(Py_ssize_t length) { if (length == 0) return 0; if (length < RE_MAX_FOLDED) return 1; return length / RE_MAX_FOLDED; } /* Checks whether there's any character except a newline at a position. */ Py_LOCAL_INLINE(int) try_match_ANY(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { if (text_pos >= state->text_length) { if (state->partial_side == RE_PARTIAL_RIGHT) return RE_ERROR_PARTIAL; return RE_ERROR_FAILURE; } return bool_as_status(text_pos < state->slice_end && matches_ANY(state->encoding, node, state->char_at(state->text, text_pos))); } /* Checks whether there's any character at all at a position. */ Py_LOCAL_INLINE(int) try_match_ANY_ALL(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { if (text_pos >= state->text_length) { if (state->partial_side == RE_PARTIAL_RIGHT) return RE_ERROR_PARTIAL; return RE_ERROR_FAILURE; } return bool_as_status(text_pos < state->slice_end); } /* Checks whether there's any character at all at a position, backwards. */ Py_LOCAL_INLINE(int) try_match_ANY_ALL_REV(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { if (text_pos <= 0) { if (state->partial_side == RE_PARTIAL_LEFT) return RE_ERROR_PARTIAL; return RE_ERROR_FAILURE; } return bool_as_status(text_pos > state->slice_start); } /* Checks whether there's any character except a newline at a position, * backwards. 
*/ Py_LOCAL_INLINE(int) try_match_ANY_REV(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { if (text_pos <= 0) { if (state->partial_side == RE_PARTIAL_LEFT) return RE_ERROR_PARTIAL; return RE_ERROR_FAILURE; } return bool_as_status(text_pos > state->slice_start && matches_ANY(state->encoding, node, state->char_at(state->text, text_pos - 1))); } /* Checks whether there's any character except a line separator at a position. */ Py_LOCAL_INLINE(int) try_match_ANY_U(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { if (text_pos >= state->text_length) { if (state->partial_side == RE_PARTIAL_RIGHT) return RE_ERROR_PARTIAL; return RE_ERROR_FAILURE; } return bool_as_status(text_pos < state->slice_end && matches_ANY_U(state->encoding, node, state->char_at(state->text, text_pos))); } /* Checks whether there's any character except a line separator at a position, * backwards. */ Py_LOCAL_INLINE(int) try_match_ANY_U_REV(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { if (text_pos <= 0) { if (state->partial_side == RE_PARTIAL_LEFT) return RE_ERROR_PARTIAL; return RE_ERROR_FAILURE; } return bool_as_status(text_pos > state->slice_start && matches_ANY_U(state->encoding, node, state->char_at(state->text, text_pos - 1))); } /* Checks whether a position is on a word boundary. */ Py_LOCAL_INLINE(int) try_match_BOUNDARY(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { return bool_as_status(state->encoding->at_boundary(state, text_pos) == node->match); } /* Checks whether there's a character at a position. 
*/ Py_LOCAL_INLINE(int) try_match_CHARACTER(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { if (text_pos >= state->text_length) { if (state->partial_side == RE_PARTIAL_RIGHT) return RE_ERROR_PARTIAL; return RE_ERROR_FAILURE; } return bool_as_status(text_pos < state->slice_end && matches_CHARACTER(state->encoding, state->locale_info, node, state->char_at(state->text, text_pos)) == node->match); } /* Checks whether there's a character at a position, ignoring case. */ Py_LOCAL_INLINE(int) try_match_CHARACTER_IGN(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { if (text_pos >= state->text_length) { if (state->partial_side == RE_PARTIAL_RIGHT) return RE_ERROR_PARTIAL; return RE_ERROR_FAILURE; } return bool_as_status(text_pos < state->slice_end && matches_CHARACTER_IGN(state->encoding, state->locale_info, node, state->char_at(state->text, text_pos)) == node->match); } /* Checks whether there's a character at a position, ignoring case, backwards. */ Py_LOCAL_INLINE(int) try_match_CHARACTER_IGN_REV(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { if (text_pos <= 0) { if (state->partial_side == RE_PARTIAL_LEFT) return RE_ERROR_PARTIAL; return RE_ERROR_FAILURE; } return bool_as_status(text_pos > state->slice_start && matches_CHARACTER_IGN(state->encoding, state->locale_info, node, state->char_at(state->text, text_pos - 1)) == node->match); } /* Checks whether there's a character at a position, backwards. */ Py_LOCAL_INLINE(int) try_match_CHARACTER_REV(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { if (text_pos <= 0) { if (state->partial_side == RE_PARTIAL_LEFT) return RE_ERROR_PARTIAL; return RE_ERROR_FAILURE; } return bool_as_status(text_pos > state->slice_start && matches_CHARACTER(state->encoding, state->locale_info, node, state->char_at(state->text, text_pos - 1)) == node->match); } /* Checks whether a position is on a default word boundary. 
*/ Py_LOCAL_INLINE(int) try_match_DEFAULT_BOUNDARY(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { return bool_as_status(state->encoding->at_default_boundary(state, text_pos) == node->match); } /* Checks whether a position is at the default end of a word. */ Py_LOCAL_INLINE(int) try_match_DEFAULT_END_OF_WORD(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { return bool_as_status(state->encoding->at_default_word_end(state, text_pos)); } /* Checks whether a position is at the default start of a word. */ Py_LOCAL_INLINE(int) try_match_DEFAULT_START_OF_WORD(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { return bool_as_status(state->encoding->at_default_word_start(state, text_pos)); } /* Checks whether a position is at the end of a line. */ Py_LOCAL_INLINE(int) try_match_END_OF_LINE(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { return bool_as_status(text_pos >= state->slice_end || state->char_at(state->text, text_pos) == '\n'); } /* Checks whether a position is at the end of a line. */ Py_LOCAL_INLINE(int) try_match_END_OF_LINE_U(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { return bool_as_status(state->encoding->at_line_end(state, text_pos)); } /* Checks whether a position is at the end of the string. */ Py_LOCAL_INLINE(int) try_match_END_OF_STRING(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { return bool_as_status(text_pos >= state->text_length); } /* Checks whether a position is at the end of a line or the string. */ Py_LOCAL_INLINE(int) try_match_END_OF_STRING_LINE(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { return bool_as_status(text_pos >= state->text_length || text_pos == state->final_newline); } /* Checks whether a position is at the end of a line or the string. */ Py_LOCAL_INLINE(int) try_match_END_OF_STRING_LINE_U(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { return bool_as_status(text_pos >= state->text_length || text_pos == state->final_line_sep); } /* Checks whether a position is at the end of a word.
*/ Py_LOCAL_INLINE(int) try_match_END_OF_WORD(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { return bool_as_status(state->encoding->at_word_end(state, text_pos)); } /* Checks whether a position is on a grapheme boundary. */ Py_LOCAL_INLINE(int) try_match_GRAPHEME_BOUNDARY(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { return bool_as_status(state->encoding->at_grapheme_boundary(state, text_pos)); } /* Checks whether there's a character with a certain property at a position. */ Py_LOCAL_INLINE(int) try_match_PROPERTY(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { if (text_pos >= state->text_length) { if (state->partial_side == RE_PARTIAL_RIGHT) return RE_ERROR_PARTIAL; return RE_ERROR_FAILURE; } return bool_as_status(text_pos < state->slice_end && matches_PROPERTY(state->encoding, state->locale_info, node, state->char_at(state->text, text_pos)) == node->match); } /* Checks whether there's a character with a certain property at a position, * ignoring case. */ Py_LOCAL_INLINE(int) try_match_PROPERTY_IGN(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { if (text_pos >= state->text_length) { if (state->partial_side == RE_PARTIAL_RIGHT) return RE_ERROR_PARTIAL; return RE_ERROR_FAILURE; } return bool_as_status(text_pos < state->slice_end && matches_PROPERTY_IGN(state->encoding, state->locale_info, node, state->char_at(state->text, text_pos)) == node->match); } /* Checks whether there's a character with a certain property at a position, * ignoring case, backwards. 
*/ Py_LOCAL_INLINE(int) try_match_PROPERTY_IGN_REV(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { if (text_pos <= 0) { if (state->partial_side == RE_PARTIAL_LEFT) return RE_ERROR_PARTIAL; return RE_ERROR_FAILURE; } return bool_as_status(text_pos > state->slice_start && matches_PROPERTY_IGN(state->encoding, state->locale_info, node, state->char_at(state->text, text_pos - 1)) == node->match); } /* Checks whether there's a character with a certain property at a position, * backwards. */ Py_LOCAL_INLINE(int) try_match_PROPERTY_REV(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { if (text_pos <= 0) { if (state->partial_side == RE_PARTIAL_LEFT) return RE_ERROR_PARTIAL; return RE_ERROR_FAILURE; } return bool_as_status(text_pos > state->slice_start && matches_PROPERTY(state->encoding, state->locale_info, node, state->char_at(state->text, text_pos - 1)) == node->match); } /* Checks whether there's a character in a certain range at a position. */ Py_LOCAL_INLINE(int) try_match_RANGE(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { if (text_pos >= state->text_length) { if (state->partial_side == RE_PARTIAL_RIGHT) return RE_ERROR_PARTIAL; return RE_ERROR_FAILURE; } return bool_as_status(text_pos < state->slice_end && matches_RANGE(state->encoding, state->locale_info, node, state->char_at(state->text, text_pos)) == node->match); } /* Checks whether there's a character in a certain range at a position, * ignoring case. */ Py_LOCAL_INLINE(int) try_match_RANGE_IGN(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { if (text_pos >= state->text_length) { if (state->partial_side == RE_PARTIAL_RIGHT) return RE_ERROR_PARTIAL; return RE_ERROR_FAILURE; } return bool_as_status(text_pos < state->slice_end && matches_RANGE_IGN(state->encoding, state->locale_info, node, state->char_at(state->text, text_pos)) == node->match); } /* Checks whether there's a character in a certain range at a position, * ignoring case, backwards. 
*/ Py_LOCAL_INLINE(int) try_match_RANGE_IGN_REV(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { if (text_pos <= 0) { if (state->partial_side == RE_PARTIAL_LEFT) return RE_ERROR_PARTIAL; return RE_ERROR_FAILURE; } return bool_as_status(text_pos > state->slice_start && matches_RANGE_IGN(state->encoding, state->locale_info, node, state->char_at(state->text, text_pos - 1)) == node->match); } /* Checks whether there's a character in a certain range at a position, * backwards. */ Py_LOCAL_INLINE(int) try_match_RANGE_REV(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { if (text_pos <= 0) { if (state->partial_side == RE_PARTIAL_LEFT) return RE_ERROR_PARTIAL; return RE_ERROR_FAILURE; } return bool_as_status(text_pos > state->slice_start && matches_RANGE(state->encoding, state->locale_info, node, state->char_at(state->text, text_pos - 1)) == node->match); } /* Checks whether a position is at the search anchor. */ Py_LOCAL_INLINE(int) try_match_SEARCH_ANCHOR(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { return bool_as_status(text_pos == state->search_anchor); } /* Checks whether there's a character in a certain set at a position. */ Py_LOCAL_INLINE(int) try_match_SET(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { if (text_pos >= state->text_length) { if (state->partial_side == RE_PARTIAL_RIGHT) return RE_ERROR_PARTIAL; return RE_ERROR_FAILURE; } return bool_as_status(text_pos < state->slice_end && matches_SET(state->encoding, state->locale_info, node, state->char_at(state->text, text_pos)) == node->match); } /* Checks whether there's a character in a certain set at a position, ignoring * case. 
*/ Py_LOCAL_INLINE(int) try_match_SET_IGN(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { if (text_pos >= state->text_length) { if (state->partial_side == RE_PARTIAL_RIGHT) return RE_ERROR_PARTIAL; return RE_ERROR_FAILURE; } return bool_as_status(text_pos < state->slice_end && matches_SET_IGN(state->encoding, state->locale_info, node, state->char_at(state->text, text_pos)) == node->match); } /* Checks whether there's a character in a certain set at a position, ignoring * case, backwards. */ Py_LOCAL_INLINE(int) try_match_SET_IGN_REV(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { if (text_pos <= 0) { if (state->partial_side == RE_PARTIAL_LEFT) return RE_ERROR_PARTIAL; return RE_ERROR_FAILURE; } return bool_as_status(text_pos > state->slice_start && matches_SET_IGN(state->encoding, state->locale_info, node, state->char_at(state->text, text_pos - 1)) == node->match); } /* Checks whether there's a character in a certain set at a position, * backwards. */ Py_LOCAL_INLINE(int) try_match_SET_REV(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { if (text_pos <= 0) { if (state->partial_side == RE_PARTIAL_LEFT) return RE_ERROR_PARTIAL; return RE_ERROR_FAILURE; } return bool_as_status(text_pos > state->slice_start && matches_SET(state->encoding, state->locale_info, node, state->char_at(state->text, text_pos - 1)) == node->match); } /* Checks whether a position is at the start of a line. */ Py_LOCAL_INLINE(int) try_match_START_OF_LINE(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { return bool_as_status(text_pos <= 0 || state->char_at(state->text, text_pos - 1) == '\n'); } /* Checks whether a position is at the start of a line. */ Py_LOCAL_INLINE(int) try_match_START_OF_LINE_U(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { return bool_as_status(state->encoding->at_line_start(state, text_pos)); } /* Checks whether a position is at the start of the string. 
*/ Py_LOCAL_INLINE(int) try_match_START_OF_STRING(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { return bool_as_status(text_pos <= 0); } /* Checks whether a position is at the start of a word. */ Py_LOCAL_INLINE(int) try_match_START_OF_WORD(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { return bool_as_status(state->encoding->at_word_start(state, text_pos)); } /* Checks whether there's a certain string at a position. */ Py_LOCAL_INLINE(int) try_match_STRING(RE_State* state, RE_NextNode* next, RE_Node* node, Py_ssize_t text_pos, RE_Position* next_position) { Py_ssize_t length; Py_UCS4 (*char_at)(void* text, Py_ssize_t pos); RE_CODE* values; Py_ssize_t s_pos; length = (Py_ssize_t)node->value_count; char_at = state->char_at; values = node->values; for (s_pos = 0; s_pos < length; s_pos++) { if (text_pos + s_pos >= state->slice_end) { if (state->partial_side == RE_PARTIAL_RIGHT) return RE_ERROR_PARTIAL; return RE_ERROR_FAILURE; } if (!same_char(char_at(state->text, text_pos + s_pos), values[s_pos])) return RE_ERROR_FAILURE; } next_position->node = next->match_next; next_position->text_pos = text_pos + next->match_step; return RE_ERROR_SUCCESS; } /* Checks whether there's a certain string at a position, ignoring case. 
*/ Py_LOCAL_INLINE(int) try_match_STRING_FLD(RE_State* state, RE_NextNode* next, RE_Node* node, Py_ssize_t text_pos, RE_Position* next_position) { Py_ssize_t length; Py_UCS4 (*char_at)(void* text, Py_ssize_t pos); RE_EncodingTable* encoding; RE_LocaleInfo* locale_info; int (*full_case_fold)(RE_LocaleInfo* locale_info, Py_UCS4 ch, Py_UCS4* folded); Py_ssize_t s_pos; RE_CODE* values; int folded_len; int f_pos; Py_ssize_t start_pos; Py_UCS4 folded[RE_MAX_FOLDED]; length = (Py_ssize_t)node->value_count; char_at = state->char_at; encoding = state->encoding; locale_info = state->locale_info; full_case_fold = encoding->full_case_fold; s_pos = 0; values = node->values; folded_len = 0; f_pos = 0; start_pos = text_pos; while (s_pos < length) { if (f_pos >= folded_len) { /* Fetch and casefold another character. */ if (text_pos >= state->slice_end) { if (state->partial_side == RE_PARTIAL_RIGHT) return RE_ERROR_PARTIAL; return RE_ERROR_FAILURE; } folded_len = full_case_fold(locale_info, char_at(state->text, text_pos), folded); f_pos = 0; } if (!same_char_ign(encoding, locale_info, folded[f_pos], values[s_pos])) return RE_ERROR_FAILURE; ++s_pos; ++f_pos; if (f_pos >= folded_len) ++text_pos; } if (f_pos < folded_len) return RE_ERROR_FAILURE; next_position->node = next->match_next; if (next->match_step == 0) next_position->text_pos = start_pos; else next_position->text_pos = text_pos; return RE_ERROR_SUCCESS; } /* Checks whether there's a certain string at a position, ignoring case, * backwards. 
*/ Py_LOCAL_INLINE(int) try_match_STRING_FLD_REV(RE_State* state, RE_NextNode* next, RE_Node* node, Py_ssize_t text_pos, RE_Position* next_position) { Py_ssize_t length; Py_UCS4 (*char_at)(void* text, Py_ssize_t pos); RE_EncodingTable* encoding; RE_LocaleInfo* locale_info; int (*full_case_fold)(RE_LocaleInfo* locale_info, Py_UCS4 ch, Py_UCS4* folded); Py_ssize_t s_pos; RE_CODE* values; int folded_len; int f_pos; Py_ssize_t start_pos; Py_UCS4 folded[RE_MAX_FOLDED]; length = (Py_ssize_t)node->value_count; char_at = state->char_at; encoding = state->encoding; locale_info = state->locale_info; full_case_fold = encoding->full_case_fold; s_pos = 0; values = node->values; folded_len = 0; f_pos = 0; start_pos = text_pos; while (s_pos < length) { if (f_pos >= folded_len) { /* Fetch and casefold another character. */ if (text_pos <= state->slice_start) { if (state->partial_side == RE_PARTIAL_LEFT) return RE_ERROR_PARTIAL; return RE_ERROR_FAILURE; } folded_len = full_case_fold(locale_info, char_at(state->text, text_pos - 1), folded); f_pos = 0; } if (!same_char_ign(encoding, locale_info, folded[folded_len - f_pos - 1], values[length - s_pos - 1])) return RE_ERROR_FAILURE; ++s_pos; ++f_pos; if (f_pos >= folded_len) --text_pos; } if (f_pos < folded_len) return RE_ERROR_FAILURE; next_position->node = next->match_next; if (next->match_step == 0) next_position->text_pos = start_pos; else next_position->text_pos = text_pos; return RE_ERROR_SUCCESS; } /* Checks whether there's a certain string at a position, ignoring case. 
*/ Py_LOCAL_INLINE(int) try_match_STRING_IGN(RE_State* state, RE_NextNode* next, RE_Node* node, Py_ssize_t text_pos, RE_Position* next_position) { Py_ssize_t length; Py_UCS4 (*char_at)(void* text, Py_ssize_t pos); RE_EncodingTable* encoding; RE_LocaleInfo* locale_info; RE_CODE* values; Py_ssize_t s_pos; length = (Py_ssize_t)node->value_count; char_at = state->char_at; encoding = state->encoding; locale_info = state->locale_info; values = node->values; for (s_pos = 0; s_pos < length; s_pos++) { if (text_pos + s_pos >= state->slice_end) { if (state->partial_side == RE_PARTIAL_RIGHT) return RE_ERROR_PARTIAL; return RE_ERROR_FAILURE; } if (!same_char_ign(encoding, locale_info, char_at(state->text, text_pos + s_pos), values[s_pos])) return RE_ERROR_FAILURE; } next_position->node = next->match_next; next_position->text_pos = text_pos + next->match_step; return RE_ERROR_SUCCESS; } /* Checks whether there's a certain string at a position, ignoring case, * backwards. */ Py_LOCAL_INLINE(int) try_match_STRING_IGN_REV(RE_State* state, RE_NextNode* next, RE_Node* node, Py_ssize_t text_pos, RE_Position* next_position) { Py_ssize_t length; Py_UCS4 (*char_at)(void* text, Py_ssize_t pos); RE_EncodingTable* encoding; RE_LocaleInfo* locale_info; RE_CODE* values; Py_ssize_t s_pos; length = (Py_ssize_t)node->value_count; char_at = state->char_at; encoding = state->encoding; locale_info = state->locale_info; values = node->values; for (s_pos = 0; s_pos < length; s_pos++) { if (text_pos - s_pos <= state->slice_start) { if (state->partial_side == RE_PARTIAL_LEFT) return RE_ERROR_PARTIAL; return RE_ERROR_FAILURE; } if (!same_char_ign(encoding, locale_info, char_at(state->text, text_pos - s_pos - 1), values[length - s_pos - 1])) return RE_ERROR_FAILURE; } next_position->node = next->match_next; next_position->text_pos = text_pos + next->match_step; return RE_ERROR_SUCCESS; } /* Checks whether there's a certain string at a position, backwards. 
*/ Py_LOCAL_INLINE(int) try_match_STRING_REV(RE_State* state, RE_NextNode* next, RE_Node* node, Py_ssize_t text_pos, RE_Position* next_position) { Py_ssize_t length; Py_UCS4 (*char_at)(void* text, Py_ssize_t pos); RE_CODE* values; Py_ssize_t s_pos; length = (Py_ssize_t)node->value_count; char_at = state->char_at; values = node->values; for (s_pos = 0; s_pos < length; s_pos++) { if (text_pos - s_pos <= state->slice_start) { if (state->partial_side == RE_PARTIAL_LEFT) return RE_ERROR_PARTIAL; return RE_ERROR_FAILURE; } if (!same_char(char_at(state->text, text_pos - s_pos - 1), values[length - s_pos - 1])) return RE_ERROR_FAILURE; } next_position->node = next->match_next; next_position->text_pos = text_pos + next->match_step; return RE_ERROR_SUCCESS; } /* Tries a match at the current text position. * * Returns the next node and text position if the match succeeds. */ Py_LOCAL_INLINE(int) try_match(RE_State* state, RE_NextNode* next, Py_ssize_t text_pos, RE_Position* next_position) { RE_Node* test; int status; test = next->test; if (test->status & RE_STATUS_FUZZY) { next_position->node = next->node; next_position->text_pos = text_pos; return RE_ERROR_SUCCESS; } switch (test->op) { case RE_OP_ANY: status = try_match_ANY(state, test, text_pos); break; case RE_OP_ANY_ALL: status = try_match_ANY_ALL(state, test, text_pos); break; case RE_OP_ANY_ALL_REV: status = try_match_ANY_ALL_REV(state, test, text_pos); break; case RE_OP_ANY_REV: status = try_match_ANY_REV(state, test, text_pos); break; case RE_OP_ANY_U: status = try_match_ANY_U(state, test, text_pos); break; case RE_OP_ANY_U_REV: status = try_match_ANY_U_REV(state, test, text_pos); break; case RE_OP_BOUNDARY: status = try_match_BOUNDARY(state, test, text_pos); break; case RE_OP_BRANCH: status = try_match(state, &test->next_1, text_pos, next_position); if (status == RE_ERROR_FAILURE) status = try_match(state, &test->nonstring.next_2, text_pos, next_position); break; case RE_OP_CHARACTER: status = 
try_match_CHARACTER(state, test, text_pos); break; case RE_OP_CHARACTER_IGN: status = try_match_CHARACTER_IGN(state, test, text_pos); break; case RE_OP_CHARACTER_IGN_REV: status = try_match_CHARACTER_IGN_REV(state, test, text_pos); break; case RE_OP_CHARACTER_REV: status = try_match_CHARACTER_REV(state, test, text_pos); break; case RE_OP_DEFAULT_BOUNDARY: status = try_match_DEFAULT_BOUNDARY(state, test, text_pos); break; case RE_OP_DEFAULT_END_OF_WORD: status = try_match_DEFAULT_END_OF_WORD(state, test, text_pos); break; case RE_OP_DEFAULT_START_OF_WORD: status = try_match_DEFAULT_START_OF_WORD(state, test, text_pos); break; case RE_OP_END_OF_LINE: status = try_match_END_OF_LINE(state, test, text_pos); break; case RE_OP_END_OF_LINE_U: status = try_match_END_OF_LINE_U(state, test, text_pos); break; case RE_OP_END_OF_STRING: status = try_match_END_OF_STRING(state, test, text_pos); break; case RE_OP_END_OF_STRING_LINE: status = try_match_END_OF_STRING_LINE(state, test, text_pos); break; case RE_OP_END_OF_STRING_LINE_U: status = try_match_END_OF_STRING_LINE_U(state, test, text_pos); break; case RE_OP_END_OF_WORD: status = try_match_END_OF_WORD(state, test, text_pos); break; case RE_OP_GRAPHEME_BOUNDARY: status = try_match_GRAPHEME_BOUNDARY(state, test, text_pos); break; case RE_OP_PROPERTY: status = try_match_PROPERTY(state, test, text_pos); break; case RE_OP_PROPERTY_IGN: status = try_match_PROPERTY_IGN(state, test, text_pos); break; case RE_OP_PROPERTY_IGN_REV: status = try_match_PROPERTY_IGN_REV(state, test, text_pos); break; case RE_OP_PROPERTY_REV: status = try_match_PROPERTY_REV(state, test, text_pos); break; case RE_OP_RANGE: status = try_match_RANGE(state, test, text_pos); break; case RE_OP_RANGE_IGN: status = try_match_RANGE_IGN(state, test, text_pos); break; case RE_OP_RANGE_IGN_REV: status = try_match_RANGE_IGN_REV(state, test, text_pos); break; case RE_OP_RANGE_REV: status = try_match_RANGE_REV(state, test, text_pos); break; case RE_OP_SEARCH_ANCHOR: status 
= try_match_SEARCH_ANCHOR(state, test, text_pos); break; case RE_OP_SET_DIFF: case RE_OP_SET_INTER: case RE_OP_SET_SYM_DIFF: case RE_OP_SET_UNION: status = try_match_SET(state, test, text_pos); break; case RE_OP_SET_DIFF_IGN: case RE_OP_SET_INTER_IGN: case RE_OP_SET_SYM_DIFF_IGN: case RE_OP_SET_UNION_IGN: status = try_match_SET_IGN(state, test, text_pos); break; case RE_OP_SET_DIFF_IGN_REV: case RE_OP_SET_INTER_IGN_REV: case RE_OP_SET_SYM_DIFF_IGN_REV: case RE_OP_SET_UNION_IGN_REV: status = try_match_SET_IGN_REV(state, test, text_pos); break; case RE_OP_SET_DIFF_REV: case RE_OP_SET_INTER_REV: case RE_OP_SET_SYM_DIFF_REV: case RE_OP_SET_UNION_REV: status = try_match_SET_REV(state, test, text_pos); break; case RE_OP_START_OF_LINE: status = try_match_START_OF_LINE(state, test, text_pos); break; case RE_OP_START_OF_LINE_U: status = try_match_START_OF_LINE_U(state, test, text_pos); break; case RE_OP_START_OF_STRING: status = try_match_START_OF_STRING(state, test, text_pos); break; case RE_OP_START_OF_WORD: status = try_match_START_OF_WORD(state, test, text_pos); break; case RE_OP_STRING: return try_match_STRING(state, next, test, text_pos, next_position); case RE_OP_STRING_FLD: return try_match_STRING_FLD(state, next, test, text_pos, next_position); case RE_OP_STRING_FLD_REV: return try_match_STRING_FLD_REV(state, next, test, text_pos, next_position); case RE_OP_STRING_IGN: return try_match_STRING_IGN(state, next, test, text_pos, next_position); case RE_OP_STRING_IGN_REV: return try_match_STRING_IGN_REV(state, next, test, text_pos, next_position); case RE_OP_STRING_REV: return try_match_STRING_REV(state, next, test, text_pos, next_position); default: next_position->node = next->node; next_position->text_pos = text_pos; return RE_ERROR_SUCCESS; } if (status != RE_ERROR_SUCCESS) return status; next_position->node = next->match_next; next_position->text_pos = text_pos + next->match_step; return RE_ERROR_SUCCESS; } /* Searches for a word boundary. 
*/ Py_LOCAL_INLINE(Py_ssize_t) search_start_BOUNDARY(RE_State* state, RE_Node* node, Py_ssize_t text_pos, BOOL* is_partial) { BOOL (*at_boundary)(RE_State* state, Py_ssize_t text_pos); at_boundary = state->encoding->at_boundary; *is_partial = FALSE; for (;;) { if (at_boundary(state, text_pos) == node->match) return text_pos; if (text_pos >= state->slice_end) return -1; ++text_pos; } } /* Searches for a word boundary, backwards. */ Py_LOCAL_INLINE(Py_ssize_t) search_start_BOUNDARY_rev(RE_State* state, RE_Node* node, Py_ssize_t text_pos, BOOL* is_partial) { BOOL (*at_boundary)(RE_State* state, Py_ssize_t text_pos); at_boundary = state->encoding->at_boundary; *is_partial = FALSE; for (;;) { if (at_boundary(state, text_pos) == node->match) return text_pos; if (text_pos <= state->slice_start) return -1; --text_pos; } } /* Searches for a default word boundary. */ Py_LOCAL_INLINE(Py_ssize_t) search_start_DEFAULT_BOUNDARY(RE_State* state, RE_Node* node, Py_ssize_t text_pos, BOOL* is_partial) { BOOL (*at_default_boundary)(RE_State* state, Py_ssize_t text_pos); at_default_boundary = state->encoding->at_default_boundary; *is_partial = FALSE; for (;;) { if (at_default_boundary(state, text_pos) == node->match) return text_pos; if (text_pos >= state->slice_end) return -1; ++text_pos; } } /* Searches for a default word boundary, backwards. */ Py_LOCAL_INLINE(Py_ssize_t) search_start_DEFAULT_BOUNDARY_rev(RE_State* state, RE_Node* node, Py_ssize_t text_pos, BOOL* is_partial) { BOOL (*at_default_boundary)(RE_State* state, Py_ssize_t text_pos); at_default_boundary = state->encoding->at_default_boundary; *is_partial = FALSE; for (;;) { if (at_default_boundary(state, text_pos) == node->match) return text_pos; if (text_pos <= state->slice_start) return -1; --text_pos; } } /* Searches for the default end of a word. 
*/ Py_LOCAL_INLINE(Py_ssize_t) search_start_DEFAULT_END_OF_WORD(RE_State* state, RE_Node* node, Py_ssize_t text_pos, BOOL* is_partial) { BOOL (*at_default_word_end)(RE_State* state, Py_ssize_t text_pos); at_default_word_end = state->encoding->at_default_word_end; *is_partial = FALSE; for (;;) { if (at_default_word_end(state, text_pos) == node->match) return text_pos; if (text_pos >= state->slice_end) return -1; ++text_pos; } } /* Searches for the default end of a word, backwards. */ Py_LOCAL_INLINE(Py_ssize_t) search_start_DEFAULT_END_OF_WORD_rev(RE_State* state, RE_Node* node, Py_ssize_t text_pos, BOOL* is_partial) { BOOL (*at_default_word_end)(RE_State* state, Py_ssize_t text_pos); at_default_word_end = state->encoding->at_default_word_end; *is_partial = FALSE; for (;;) { if (at_default_word_end(state, text_pos) == node->match) return text_pos; if (text_pos <= state->slice_start) return -1; --text_pos; } } /* Searches for the default start of a word. */ Py_LOCAL_INLINE(Py_ssize_t) search_start_DEFAULT_START_OF_WORD(RE_State* state, RE_Node* node, Py_ssize_t text_pos, BOOL* is_partial) { BOOL (*at_default_word_start)(RE_State* state, Py_ssize_t text_pos); at_default_word_start = state->encoding->at_default_word_start; *is_partial = FALSE; for (;;) { if (at_default_word_start(state, text_pos) == node->match) return text_pos; if (text_pos >= state->slice_end) return -1; ++text_pos; } } /* Searches for the default start of a word, backwards. */ Py_LOCAL_INLINE(Py_ssize_t) search_start_DEFAULT_START_OF_WORD_rev(RE_State* state, RE_Node* node, Py_ssize_t text_pos, BOOL* is_partial) { BOOL (*at_default_word_start)(RE_State* state, Py_ssize_t text_pos); at_default_word_start = state->encoding->at_default_word_start; *is_partial = FALSE; for (;;) { if (at_default_word_start(state, text_pos) == node->match) return text_pos; if (text_pos <= state->slice_start) return -1; --text_pos; } } /* Searches for the end of line. 
*/ Py_LOCAL_INLINE(Py_ssize_t) search_start_END_OF_LINE(RE_State* state, RE_Node* node, Py_ssize_t text_pos, BOOL* is_partial) { *is_partial = FALSE; for (;;) { if (text_pos >= state->text_length || state->char_at(state->text, text_pos) == '\n') return text_pos; if (text_pos >= state->slice_end) return -1; ++text_pos; } } /* Searches for the end of line, backwards. */ Py_LOCAL_INLINE(Py_ssize_t) search_start_END_OF_LINE_rev(RE_State* state, RE_Node* node, Py_ssize_t text_pos, BOOL* is_partial) { *is_partial = FALSE; for (;;) { if (text_pos >= state->text_length || state->char_at(state->text, text_pos) == '\n') return text_pos; if (text_pos <= state->slice_start) return -1; --text_pos; } } /* Searches for the end of the string. */ Py_LOCAL_INLINE(Py_ssize_t) search_start_END_OF_STRING(RE_State* state, RE_Node* node, Py_ssize_t text_pos, BOOL* is_partial) { *is_partial = FALSE; if (state->slice_end >= state->text_length) return state->text_length; return -1; } /* Searches for the end of the string, backwards. */ Py_LOCAL_INLINE(Py_ssize_t) search_start_END_OF_STRING_rev(RE_State* state, RE_Node* node, Py_ssize_t text_pos, BOOL* is_partial) { *is_partial = FALSE; if (text_pos >= state->text_length) return text_pos; return -1; } /* Searches for the end of the string or line. */ Py_LOCAL_INLINE(Py_ssize_t) search_start_END_OF_STRING_LINE(RE_State* state, RE_Node* node, Py_ssize_t text_pos, BOOL* is_partial) { *is_partial = FALSE; if (text_pos <= state->final_newline) text_pos = state->final_newline; else if (text_pos <= state->text_length) text_pos = state->text_length; if (text_pos > state->slice_end) return -1; if (text_pos >= state->text_length) return text_pos; return text_pos; } /* Searches for the end of the string or line, backwards. 
*/ Py_LOCAL_INLINE(Py_ssize_t) search_start_END_OF_STRING_LINE_rev(RE_State* state, RE_Node* node, Py_ssize_t text_pos, BOOL* is_partial) { *is_partial = FALSE; if (text_pos >= state->text_length) text_pos = state->text_length; else if (text_pos >= state->final_newline) text_pos = state->final_newline; else return -1; if (text_pos < state->slice_start) return -1; if (text_pos <= 0) return text_pos; return text_pos; } /* Searches for the end of a word. */ Py_LOCAL_INLINE(Py_ssize_t) search_start_END_OF_WORD(RE_State* state, RE_Node* node, Py_ssize_t text_pos, BOOL* is_partial) { BOOL (*at_word_end)(RE_State* state, Py_ssize_t text_pos); at_word_end = state->encoding->at_word_end; *is_partial = FALSE; for (;;) { if (at_word_end(state, text_pos) == node->match) return text_pos; if (text_pos >= state->slice_end) return -1; ++text_pos; } } /* Searches for the end of a word, backwards. */ Py_LOCAL_INLINE(Py_ssize_t) search_start_END_OF_WORD_rev(RE_State* state, RE_Node* node, Py_ssize_t text_pos, BOOL* is_partial) { BOOL (*at_word_end)(RE_State* state, Py_ssize_t text_pos); at_word_end = state->encoding->at_word_end; *is_partial = FALSE; for (;;) { if (at_word_end(state, text_pos) == node->match) return text_pos; if (text_pos <= state->slice_start) return -1; --text_pos; } } /* Searches for a grapheme boundary. */ Py_LOCAL_INLINE(Py_ssize_t) search_start_GRAPHEME_BOUNDARY(RE_State* state, RE_Node* node, Py_ssize_t text_pos, BOOL* is_partial) { BOOL (*at_grapheme_boundary)(RE_State* state, Py_ssize_t text_pos); at_grapheme_boundary = state->encoding->at_grapheme_boundary; *is_partial = FALSE; for (;;) { if (at_grapheme_boundary(state, text_pos) == node->match) return text_pos; if (text_pos >= state->slice_end) return -1; ++text_pos; } } /* Searches for a grapheme boundary, backwards. 
*/ Py_LOCAL_INLINE(Py_ssize_t) search_start_GRAPHEME_BOUNDARY_rev(RE_State* state, RE_Node* node, Py_ssize_t text_pos, BOOL* is_partial) { BOOL (*at_grapheme_boundary)(RE_State* state, Py_ssize_t text_pos); at_grapheme_boundary = state->encoding->at_grapheme_boundary; *is_partial = FALSE; for (;;) { if (at_grapheme_boundary(state, text_pos) == node->match) return text_pos; if (text_pos <= state->slice_start) return -1; --text_pos; } } /* Searches for the start of line. */ Py_LOCAL_INLINE(Py_ssize_t) search_start_START_OF_LINE(RE_State* state, RE_Node* node, Py_ssize_t text_pos, BOOL* is_partial) { *is_partial = FALSE; for (;;) { if (text_pos <= 0 || state->char_at(state->text, text_pos - 1) == '\n') return text_pos; if (text_pos >= state->slice_end) return -1; ++text_pos; } } /* Searches for the start of line, backwards. */ Py_LOCAL_INLINE(Py_ssize_t) search_start_START_OF_LINE_rev(RE_State* state, RE_Node* node, Py_ssize_t text_pos, BOOL* is_partial) { *is_partial = FALSE; for (;;) { if (text_pos <= 0 || state->char_at(state->text, text_pos - 1) == '\n') return text_pos; if (text_pos <= state->slice_start) return -1; --text_pos; } } /* Searches for the start of the string. */ Py_LOCAL_INLINE(Py_ssize_t) search_start_START_OF_STRING(RE_State* state, RE_Node* node, Py_ssize_t text_pos, BOOL* is_partial) { *is_partial = FALSE; if (text_pos <= 0) return text_pos; return -1; } /* Searches for the start of the string, backwards. */ Py_LOCAL_INLINE(Py_ssize_t) search_start_START_OF_STRING_rev(RE_State* state, RE_Node* node, Py_ssize_t text_pos, BOOL* is_partial) { *is_partial = FALSE; if (state->slice_start <= 0) return 0; return -1; } /* Searches for the start of a word. 
*/ Py_LOCAL_INLINE(Py_ssize_t) search_start_START_OF_WORD(RE_State* state, RE_Node* node, Py_ssize_t text_pos, BOOL* is_partial) { BOOL (*at_word_start)(RE_State* state, Py_ssize_t text_pos); at_word_start = state->encoding->at_word_start; *is_partial = FALSE; for (;;) { if (at_word_start(state, text_pos) == node->match) return text_pos; if (text_pos >= state->slice_end) return -1; ++text_pos; } } /* Searches for the start of a word, backwards. */ Py_LOCAL_INLINE(Py_ssize_t) search_start_START_OF_WORD_rev(RE_State* state, RE_Node* node, Py_ssize_t text_pos, BOOL* is_partial) { BOOL (*at_word_start)(RE_State* state, Py_ssize_t text_pos); at_word_start = state->encoding->at_word_start; *is_partial = FALSE; for (;;) { if (at_word_start(state, text_pos) == node->match) return text_pos; if (text_pos <= state->slice_start) return -1; --text_pos; } } /* Searches for a string. */ Py_LOCAL_INLINE(Py_ssize_t) search_start_STRING(RE_SafeState* safe_state, RE_Node* node, Py_ssize_t text_pos, BOOL* is_partial) { RE_State* state; state = safe_state->re_state; *is_partial = FALSE; if ((node->status & RE_STATUS_REQUIRED) && text_pos == state->req_pos) return text_pos; return string_search(safe_state, node, text_pos, state->slice_end, is_partial); } /* Searches for a string, ignoring case. */ Py_LOCAL_INLINE(Py_ssize_t) search_start_STRING_FLD(RE_SafeState* safe_state, RE_Node* node, Py_ssize_t text_pos, Py_ssize_t* new_pos, BOOL* is_partial) { RE_State* state; state = safe_state->re_state; *is_partial = FALSE; if ((node->status & RE_STATUS_REQUIRED) && text_pos == state->req_pos) { *new_pos = state->req_end; return text_pos; } return string_search_fld(safe_state, node, text_pos, state->slice_end, new_pos, is_partial); } /* Searches for a string, ignoring case, backwards. 
*/ Py_LOCAL_INLINE(Py_ssize_t) search_start_STRING_FLD_REV(RE_SafeState* safe_state, RE_Node* node, Py_ssize_t text_pos, Py_ssize_t* new_pos, BOOL* is_partial) { RE_State* state; state = safe_state->re_state; *is_partial = FALSE; if ((node->status & RE_STATUS_REQUIRED) && text_pos == state->req_pos) { *new_pos = state->req_end; return text_pos; } return string_search_fld_rev(safe_state, node, text_pos, state->slice_start, new_pos, is_partial); } /* Searches for a string, ignoring case. */ Py_LOCAL_INLINE(Py_ssize_t) search_start_STRING_IGN(RE_SafeState* safe_state, RE_Node* node, Py_ssize_t text_pos, BOOL* is_partial) { RE_State* state; state = safe_state->re_state; *is_partial = FALSE; if ((node->status & RE_STATUS_REQUIRED) && text_pos == state->req_pos) return text_pos; return string_search_ign(safe_state, node, text_pos, state->slice_end, is_partial); } /* Searches for a string, ignoring case, backwards. */ Py_LOCAL_INLINE(Py_ssize_t) search_start_STRING_IGN_REV(RE_SafeState* safe_state, RE_Node* node, Py_ssize_t text_pos, BOOL* is_partial) { RE_State* state; state = safe_state->re_state; *is_partial = FALSE; if ((node->status & RE_STATUS_REQUIRED) && text_pos == state->req_pos) return text_pos; return string_search_ign_rev(safe_state, node, text_pos, state->slice_start, is_partial); } /* Searches for a string, backwards. */ Py_LOCAL_INLINE(Py_ssize_t) search_start_STRING_REV(RE_SafeState* safe_state, RE_Node* node, Py_ssize_t text_pos, BOOL* is_partial) { RE_State* state; state = safe_state->re_state; *is_partial = FALSE; if ((node->status & RE_STATUS_REQUIRED) && text_pos == state->req_pos) return text_pos; return string_search_rev(safe_state, node, text_pos, state->slice_start, is_partial); } /* Searches for the start of a match. 
*/ Py_LOCAL_INLINE(int) search_start(RE_SafeState* safe_state, RE_NextNode* next, RE_Position* new_position, int search_index) { RE_State* state; Py_ssize_t start_pos; RE_Node* test; RE_Node* node; RE_SearchPosition* info; Py_ssize_t text_pos; state = safe_state->re_state; start_pos = state->text_pos; TRACE(("<> at %d\n", start_pos)) test = next->test; node = next->node; if (state->reverse) { if (start_pos < state->slice_start) { if (state->partial_side == RE_PARTIAL_LEFT) { new_position->text_pos = state->slice_start; return RE_ERROR_PARTIAL; } return RE_ERROR_FAILURE; } } else { if (start_pos > state->slice_end) { if (state->partial_side == RE_PARTIAL_RIGHT) { new_position->text_pos = state->slice_end; return RE_ERROR_PARTIAL; } } } if (test->status & RE_STATUS_FUZZY) { /* Don't call 'search_start' again. */ state->pattern->do_search_start = FALSE; state->match_pos = start_pos; new_position->node = node; new_position->text_pos = start_pos; return RE_ERROR_SUCCESS; } again: if (!state->pattern->is_fuzzy && state->partial_side == RE_PARTIAL_NONE) { if (state->reverse) { if (start_pos - state->min_width < state->slice_start) return RE_ERROR_FAILURE; } else { if (start_pos + state->min_width > state->slice_end) return RE_ERROR_FAILURE; } } if (search_index < MAX_SEARCH_POSITIONS) { info = &state->search_positions[search_index]; if (state->reverse) { if (info->start_pos >= 0 && info->start_pos >= start_pos && start_pos >= info->match_pos) { state->match_pos = info->match_pos; new_position->text_pos = state->match_pos; new_position->node = node; return RE_ERROR_SUCCESS; } } else { if (info->start_pos >= 0 && info->start_pos <= start_pos && start_pos <= info->match_pos) { state->match_pos = info->match_pos; new_position->text_pos = state->match_pos; new_position->node = node; return RE_ERROR_SUCCESS; } } } else info = NULL; switch (test->op) { case RE_OP_ANY: start_pos = match_many_ANY(state, test, start_pos, state->slice_end, FALSE); if (start_pos >= 
state->text_length) { if (state->partial_side == RE_PARTIAL_RIGHT) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } } if (start_pos >= state->slice_end) return RE_ERROR_FAILURE; break; case RE_OP_ANY_ALL: case RE_OP_ANY_ALL_REV: break; case RE_OP_ANY_REV: start_pos = match_many_ANY_REV(state, test, start_pos, state->slice_start, FALSE); if (start_pos <= 0) { if (state->partial_side == RE_PARTIAL_LEFT) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } } if (start_pos <= state->slice_start) return RE_ERROR_FAILURE; break; case RE_OP_ANY_U: start_pos = match_many_ANY_U(state, test, start_pos, state->slice_end, FALSE); if (start_pos >= state->text_length) { if (state->partial_side == RE_PARTIAL_RIGHT) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } } if (start_pos >= state->slice_end) return RE_ERROR_FAILURE; break; case RE_OP_ANY_U_REV: start_pos = match_many_ANY_U_REV(state, test, start_pos, state->slice_start, FALSE); if (start_pos <= 0) { if (state->partial_side == RE_PARTIAL_LEFT) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } } if (start_pos <= state->slice_start) return RE_ERROR_FAILURE; break; case RE_OP_BOUNDARY: { BOOL is_partial; if (state->reverse) start_pos = search_start_BOUNDARY_rev(state, test, start_pos, &is_partial); else start_pos = search_start_BOUNDARY(state, test, start_pos, &is_partial); if (start_pos < 0) return RE_ERROR_FAILURE; if (is_partial) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } break; } case RE_OP_CHARACTER: start_pos = match_many_CHARACTER(state, test, start_pos, state->slice_end, FALSE); if (start_pos >= state->text_length) { if (state->partial_side == RE_PARTIAL_RIGHT) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } } if (start_pos >= state->slice_end) return RE_ERROR_FAILURE; break; case RE_OP_CHARACTER_IGN: start_pos = match_many_CHARACTER_IGN(state, test, start_pos, state->slice_end, FALSE); if (start_pos >= 
state->text_length) { if (state->partial_side == RE_PARTIAL_RIGHT) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } } if (start_pos >= state->slice_end) return RE_ERROR_FAILURE; break; case RE_OP_CHARACTER_IGN_REV: start_pos = match_many_CHARACTER_IGN_REV(state, test, start_pos, state->slice_start, FALSE); if (start_pos <= 0) { if (state->partial_side == RE_PARTIAL_LEFT) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } } if (start_pos <= state->slice_start) return RE_ERROR_FAILURE; break; case RE_OP_CHARACTER_REV: start_pos = match_many_CHARACTER_REV(state, test, start_pos, state->slice_start, FALSE); if (start_pos <= 0) { if (state->partial_side == RE_PARTIAL_LEFT) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } } if (start_pos <= state->slice_start) return RE_ERROR_FAILURE; break; case RE_OP_DEFAULT_BOUNDARY: { BOOL is_partial; if (state->reverse) start_pos = search_start_DEFAULT_BOUNDARY_rev(state, test, start_pos, &is_partial); else start_pos = search_start_DEFAULT_BOUNDARY(state, test, start_pos, &is_partial); if (start_pos < 0) return RE_ERROR_FAILURE; if (is_partial) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } break; } case RE_OP_DEFAULT_END_OF_WORD: { BOOL is_partial; if (state->reverse) start_pos = search_start_DEFAULT_END_OF_WORD_rev(state, test, start_pos, &is_partial); else start_pos = search_start_DEFAULT_END_OF_WORD(state, test, start_pos, &is_partial); if (start_pos < 0) return RE_ERROR_FAILURE; if (is_partial) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } break; } case RE_OP_DEFAULT_START_OF_WORD: { BOOL is_partial; if (state->reverse) start_pos = search_start_DEFAULT_START_OF_WORD_rev(state, test, start_pos, &is_partial); else start_pos = search_start_DEFAULT_START_OF_WORD(state, test, start_pos, &is_partial); if (start_pos < 0) return RE_ERROR_FAILURE; if (is_partial) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } break; } case 
RE_OP_END_OF_LINE: { BOOL is_partial; if (state->reverse) start_pos = search_start_END_OF_LINE_rev(state, test, start_pos, &is_partial); else start_pos = search_start_END_OF_LINE(state, test, start_pos, &is_partial); if (start_pos < 0) return RE_ERROR_FAILURE; if (is_partial) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } break; } case RE_OP_END_OF_STRING: { BOOL is_partial; if (state->reverse) start_pos = search_start_END_OF_STRING_rev(state, test, start_pos, &is_partial); else start_pos = search_start_END_OF_STRING(state, test, start_pos, &is_partial); if (start_pos < 0) return RE_ERROR_FAILURE; if (is_partial) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } break; } case RE_OP_END_OF_STRING_LINE: { BOOL is_partial; if (state->reverse) start_pos = search_start_END_OF_STRING_LINE_rev(state, test, start_pos, &is_partial); else start_pos = search_start_END_OF_STRING_LINE(state, test, start_pos, &is_partial); if (start_pos < 0) return RE_ERROR_FAILURE; if (is_partial) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } break; } case RE_OP_END_OF_WORD: { BOOL is_partial; if (state->reverse) start_pos = search_start_END_OF_WORD_rev(state, test, start_pos, &is_partial); else start_pos = search_start_END_OF_WORD(state, test, start_pos, &is_partial); if (start_pos < 0) return RE_ERROR_FAILURE; if (is_partial) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } break; } case RE_OP_GRAPHEME_BOUNDARY: { BOOL is_partial; if (state->reverse) start_pos = search_start_GRAPHEME_BOUNDARY_rev(state, test, start_pos, &is_partial); else start_pos = search_start_GRAPHEME_BOUNDARY(state, test, start_pos, &is_partial); if (start_pos < 0) return RE_ERROR_FAILURE; if (is_partial) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } break; } case RE_OP_PROPERTY: start_pos = match_many_PROPERTY(state, test, start_pos, state->slice_end, FALSE); if (start_pos >= state->text_length) { if (state->partial_side == 
RE_PARTIAL_RIGHT) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } } if (start_pos >= state->slice_end) return RE_ERROR_FAILURE; break; case RE_OP_PROPERTY_IGN: start_pos = match_many_PROPERTY_IGN(state, test, start_pos, state->slice_end, FALSE); if (start_pos >= state->text_length) { if (state->partial_side == RE_PARTIAL_RIGHT) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } } if (start_pos >= state->slice_end) return RE_ERROR_FAILURE; break; case RE_OP_PROPERTY_IGN_REV: start_pos = match_many_PROPERTY_IGN_REV(state, test, start_pos, state->slice_start, FALSE); if (start_pos <= 0) { if (state->partial_side == RE_PARTIAL_LEFT) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } } if (start_pos <= state->slice_start) return RE_ERROR_FAILURE; break; case RE_OP_PROPERTY_REV: start_pos = match_many_PROPERTY_REV(state, test, start_pos, state->slice_start, FALSE); if (start_pos <= 0) { if (state->partial_side == RE_PARTIAL_LEFT) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } } if (start_pos <= state->slice_start) return RE_ERROR_FAILURE; break; case RE_OP_RANGE: start_pos = match_many_RANGE(state, test, start_pos, state->slice_end, FALSE); if (start_pos >= state->text_length) { if (state->partial_side == RE_PARTIAL_RIGHT) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } } if (start_pos >= state->slice_end) return RE_ERROR_FAILURE; break; case RE_OP_RANGE_IGN: start_pos = match_many_RANGE_IGN(state, test, start_pos, state->slice_end, FALSE); if (start_pos >= state->text_length) { if (state->partial_side == RE_PARTIAL_RIGHT) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } } if (start_pos >= state->slice_end) return RE_ERROR_FAILURE; break; case RE_OP_RANGE_IGN_REV: start_pos = match_many_RANGE_IGN_REV(state, test, start_pos, state->slice_start, FALSE); if (start_pos <= 0) { if (state->partial_side == RE_PARTIAL_LEFT) { new_position->text_pos = start_pos; return 
RE_ERROR_PARTIAL; } } if (start_pos <= state->slice_start) return RE_ERROR_FAILURE; break; case RE_OP_RANGE_REV: start_pos = match_many_RANGE_REV(state, test, start_pos, state->slice_start, FALSE); if (start_pos <= 0) { if (state->partial_side == RE_PARTIAL_LEFT) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } } if (start_pos <= state->slice_start) return RE_ERROR_FAILURE; break; case RE_OP_SEARCH_ANCHOR: if (state->reverse) { if (start_pos < state->search_anchor) return RE_ERROR_FAILURE; } else { if (start_pos > state->search_anchor) return RE_ERROR_FAILURE; } start_pos = state->search_anchor; break; case RE_OP_SET_DIFF: case RE_OP_SET_INTER: case RE_OP_SET_SYM_DIFF: case RE_OP_SET_UNION: start_pos = match_many_SET(state, test, start_pos, state->slice_end, FALSE); if (start_pos >= state->text_length) { if (state->partial_side == RE_PARTIAL_RIGHT) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } } if (start_pos >= state->slice_end) return RE_ERROR_FAILURE; break; case RE_OP_SET_DIFF_IGN: case RE_OP_SET_INTER_IGN: case RE_OP_SET_SYM_DIFF_IGN: case RE_OP_SET_UNION_IGN: start_pos = match_many_SET_IGN(state, test, start_pos, state->slice_end, FALSE); if (start_pos >= state->text_length) { if (state->partial_side == RE_PARTIAL_RIGHT) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } } if (start_pos >= state->slice_end) return RE_ERROR_FAILURE; break; case RE_OP_SET_DIFF_IGN_REV: case RE_OP_SET_INTER_IGN_REV: case RE_OP_SET_SYM_DIFF_IGN_REV: case RE_OP_SET_UNION_IGN_REV: start_pos = match_many_SET_IGN_REV(state, test, start_pos, state->slice_start, FALSE); if (start_pos <= 0) { if (state->partial_side == RE_PARTIAL_LEFT) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } } if (start_pos <= state->slice_start) return RE_ERROR_FAILURE; break; case RE_OP_SET_DIFF_REV: case RE_OP_SET_INTER_REV: case RE_OP_SET_SYM_DIFF_REV: case RE_OP_SET_UNION_REV: start_pos = match_many_SET_REV(state, test, start_pos, state->slice_start, FALSE); if (start_pos <= 
0) { if (state->partial_side == RE_PARTIAL_LEFT) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } } if (start_pos <= state->slice_start) return RE_ERROR_FAILURE; break; case RE_OP_START_OF_LINE: { BOOL is_partial; if (state->reverse) start_pos = search_start_START_OF_LINE_rev(state, test, start_pos, &is_partial); else start_pos = search_start_START_OF_LINE(state, test, start_pos, &is_partial); if (start_pos < 0) return RE_ERROR_FAILURE; if (is_partial) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } break; } case RE_OP_START_OF_STRING: { BOOL is_partial; if (state->reverse) start_pos = search_start_START_OF_STRING_rev(state, test, start_pos, &is_partial); else start_pos = search_start_START_OF_STRING(state, test, start_pos, &is_partial); if (start_pos < 0) return RE_ERROR_FAILURE; if (is_partial) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } break; } case RE_OP_START_OF_WORD: { BOOL is_partial; if (state->reverse) start_pos = search_start_START_OF_WORD_rev(state, test, start_pos, &is_partial); else start_pos = search_start_START_OF_WORD(state, test, start_pos, &is_partial); if (start_pos < 0) return RE_ERROR_FAILURE; if (is_partial) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } break; } case RE_OP_STRING: { BOOL is_partial; start_pos = search_start_STRING(safe_state, test, start_pos, &is_partial); if (start_pos < 0) return RE_ERROR_FAILURE; if (is_partial) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } break; } case RE_OP_STRING_FLD: { Py_ssize_t new_pos; BOOL is_partial; start_pos = search_start_STRING_FLD(safe_state, test, start_pos, &new_pos, &is_partial); if (start_pos < 0) return RE_ERROR_FAILURE; if (is_partial) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } /* Can we look further ahead? 
*/ if (test == node) { if (test->next_1.node) { int status; status = try_match(state, &test->next_1, new_pos, new_position); if (status < 0) return status; if (status == RE_ERROR_FAILURE) { ++start_pos; if (start_pos >= state->slice_end) { if (state->partial_side == RE_PARTIAL_RIGHT) { new_position->text_pos = state->slice_start; return RE_ERROR_PARTIAL; } return RE_ERROR_FAILURE; } goto again; } } /* It's a possible match. */ state->match_pos = start_pos; if (info) { info->start_pos = state->text_pos; info->match_pos = state->match_pos; } return RE_ERROR_SUCCESS; } break; } case RE_OP_STRING_FLD_REV: { Py_ssize_t new_pos; BOOL is_partial; start_pos = search_start_STRING_FLD_REV(safe_state, test, start_pos, &new_pos, &is_partial); if (start_pos < 0) return RE_ERROR_FAILURE; if (is_partial) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } /* Can we look further ahead? */ if (test == node) { if (test->next_1.node) { int status; status = try_match(state, &test->next_1, new_pos, new_position); if (status < 0) return status; if (status == RE_ERROR_FAILURE) { --start_pos; if (start_pos <= state->slice_start) { if (state->partial_side == RE_PARTIAL_LEFT) { new_position->text_pos = state->slice_start; return RE_ERROR_PARTIAL; } return RE_ERROR_FAILURE; } goto again; } } /* It's a possible match. 
*/ state->match_pos = start_pos; if (info) { info->start_pos = state->text_pos; info->match_pos = state->match_pos; } return RE_ERROR_SUCCESS; } break; } case RE_OP_STRING_IGN: { BOOL is_partial; start_pos = search_start_STRING_IGN(safe_state, test, start_pos, &is_partial); if (start_pos < 0) return RE_ERROR_FAILURE; if (is_partial) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } break; } case RE_OP_STRING_IGN_REV: { BOOL is_partial; start_pos = search_start_STRING_IGN_REV(safe_state, test, start_pos, &is_partial); if (start_pos < 0) return RE_ERROR_FAILURE; if (is_partial) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } break; } case RE_OP_STRING_REV: { BOOL is_partial; start_pos = search_start_STRING_REV(safe_state, test, start_pos, &is_partial); if (start_pos < 0) return RE_ERROR_FAILURE; if (is_partial) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } break; } default: /* Don't call 'search_start' again. */ state->pattern->do_search_start = FALSE; state->match_pos = start_pos; new_position->node = node; new_position->text_pos = start_pos; return RE_ERROR_SUCCESS; } /* Can we look further ahead? */ if (test == node) { text_pos = start_pos + test->step; if (test->next_1.node) { int status; status = try_match(state, &test->next_1, text_pos, new_position); if (status < 0) return status; if (status == RE_ERROR_FAILURE) { if (state->reverse) { --start_pos; if (start_pos < state->slice_start) { if (state->partial_side == RE_PARTIAL_LEFT) { new_position->text_pos = state->slice_start; return RE_ERROR_PARTIAL; } return RE_ERROR_FAILURE; } } else { ++start_pos; if (start_pos > state->slice_end) { if (state->partial_side == RE_PARTIAL_RIGHT) { new_position->text_pos = state->slice_end; return RE_ERROR_PARTIAL; } return RE_ERROR_FAILURE; } } goto again; } } } else { new_position->node = node; new_position->text_pos = start_pos; } /* It's a possible match. 
*/ state->match_pos = start_pos; if (info) { info->start_pos = state->text_pos; info->match_pos = state->match_pos; } return RE_ERROR_SUCCESS; } /* Saves a capture group. */ Py_LOCAL_INLINE(BOOL) save_capture(RE_SafeState* safe_state, size_t private_index, size_t public_index) { RE_State* state; RE_GroupData* private_group; RE_GroupData* public_group; state = safe_state->re_state; /* Capture group indexes are 1-based (excluding group 0, which is the * entire matched string). */ private_group = &state->groups[private_index - 1]; public_group = &state->groups[public_index - 1]; /* Will the repeated captures ever be visible? */ if (!state->visible_captures) { public_group->captures[0] = private_group->span; public_group->capture_count = 1; return TRUE; } if (public_group->capture_count >= public_group->capture_capacity) { size_t new_capacity; RE_GroupSpan* new_captures; new_capacity = public_group->capture_capacity * 2; new_capacity = max_size_t(new_capacity, RE_INIT_CAPTURE_SIZE); new_captures = (RE_GroupSpan*)safe_realloc(safe_state, public_group->captures, new_capacity * sizeof(RE_GroupSpan)); if (!new_captures) return FALSE; public_group->captures = new_captures; public_group->capture_capacity = new_capacity; } public_group->captures[public_group->capture_count++] = private_group->span; return TRUE; } /* Unsaves a capture group. */ Py_LOCAL_INLINE(void) unsave_capture(RE_State* state, size_t private_index, size_t public_index) { /* Capture group indexes are 1-based (excluding group 0, which is the * entire matched string). */ if (state->groups[public_index - 1].capture_count > 0) --state->groups[public_index - 1].capture_count; } /* Pushes the groups for backtracking. 
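The group save stack used by push_groups, pop_groups and drop_groups keeps its blocks in a doubly linked chain and reuses them on later pushes instead of freeing them on every pop. The following is only a minimal standalone sketch of that reuse pattern, not this module's code: the names SavedBlock, SaveStack, stack_push and stack_pop are hypothetical, and the payload is a single int rather than the group spans and capture counts.

```c
#include <assert.h>
#include <stdlib.h>

// Hypothetical, simplified stand-in for RE_SavedGroups: one int of payload.
typedef struct SavedBlock {
    struct SavedBlock* previous;
    struct SavedBlock* next;
    int value;
} SavedBlock;

typedef struct SaveStack {
    SavedBlock* first;    // Oldest block ever allocated (kept for reuse).
    SavedBlock* current;  // Top of the stack, or NULL if it is empty.
    int allocations;      // Counts calls to malloc, for illustration only.
} SaveStack;

// Pushes a value, reusing an already-allocated block when one is available.
// Returns 1 on success, 0 on allocation failure.
static int stack_push(SaveStack* stack, int value) {
    SavedBlock* block;

    if (stack->current && stack->current->next)
        block = stack->current->next;
    else if (!stack->current && stack->first)
        block = stack->first;
    else {
        block = (SavedBlock*)malloc(sizeof(SavedBlock));
        if (!block)
            return 0;
        ++stack->allocations;
        block->previous = stack->current;
        block->next = NULL;
        if (block->previous)
            block->previous->next = block;
        else
            stack->first = block;
    }

    block->value = value;
    stack->current = block;
    return 1;
}

// Pops the top value; the block stays in the chain for later reuse.
// The caller must not pop an empty stack.
static int stack_pop(SaveStack* stack) {
    int value = stack->current->value;
    stack->current = stack->current->previous;
    return value;
}
```

In this sketch a pop only moves the 'current' pointer back, so repeated push and pop cycles during backtracking do not hit the allocator once the chain has grown to its working depth.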
*/ Py_LOCAL_INLINE(BOOL) push_groups(RE_SafeState* safe_state) { RE_State* state; size_t group_count; RE_SavedGroups* current; size_t g; state = safe_state->re_state; group_count = state->pattern->true_group_count; if (group_count == 0) return TRUE; current = state->current_saved_groups; if (current && current->next) current = current->next; else if (!current && state->first_saved_groups) current = state->first_saved_groups; else { RE_SavedGroups* new_block; new_block = (RE_SavedGroups*)safe_alloc(safe_state, sizeof(RE_SavedGroups)); if (!new_block) return FALSE; new_block->spans = (RE_GroupSpan*)safe_alloc(safe_state, group_count * sizeof(RE_GroupSpan)); new_block->counts = (size_t*)safe_alloc(safe_state, group_count * sizeof(Py_ssize_t)); if (!new_block->spans || !new_block->counts) { safe_dealloc(safe_state, new_block->spans); safe_dealloc(safe_state, new_block->counts); safe_dealloc(safe_state, new_block); return FALSE; } new_block->previous = current; new_block->next = NULL; if (new_block->previous) new_block->previous->next = new_block; else state->first_saved_groups = new_block; current = new_block; } for (g = 0; g < group_count; g++) { current->spans[g] = state->groups[g].span; current->counts[g] = state->groups[g].capture_count; } state->current_saved_groups = current; return TRUE; } /* Pops the groups for backtracking. */ Py_LOCAL_INLINE(void) pop_groups(RE_State* state) { size_t group_count; RE_SavedGroups* current; size_t g; group_count = state->pattern->true_group_count; if (group_count == 0) return; current = state->current_saved_groups; for (g = 0; g < group_count; g++) { state->groups[g].span = current->spans[g]; state->groups[g].capture_count = current->counts[g]; } state->current_saved_groups = current->previous; } /* Drops the groups for backtracking. 
*/ Py_LOCAL_INLINE(void) drop_groups(RE_State* state) { if (state->pattern->true_group_count != 0) state->current_saved_groups = state->current_saved_groups->previous; } /* Pushes the repeats for backtracking. */ Py_LOCAL_INLINE(BOOL) push_repeats(RE_SafeState* safe_state) { RE_State* state; PatternObject* pattern; size_t repeat_count; RE_SavedRepeats* current; size_t r; state = safe_state->re_state; pattern = state->pattern; repeat_count = pattern->repeat_count; if (repeat_count == 0) return TRUE; current = state->current_saved_repeats; if (current && current->next) current = current->next; else if (!current && state->first_saved_repeats) current = state->first_saved_repeats; else { RE_SavedRepeats* new_block; new_block = (RE_SavedRepeats*)safe_alloc(safe_state, sizeof(RE_SavedRepeats)); if (!new_block) return FALSE; new_block->repeats = (RE_RepeatData*)safe_alloc(safe_state, repeat_count * sizeof(RE_RepeatData)); if (!new_block->repeats) { safe_dealloc(safe_state, new_block); return FALSE; } memset(new_block->repeats, 0, repeat_count * sizeof(RE_RepeatData)); new_block->previous = current; new_block->next = NULL; if (new_block->previous) new_block->previous->next = new_block; else state->first_saved_repeats = new_block; current = new_block; } for (r = 0; r < repeat_count; r++) { if (!copy_repeat_data(safe_state, &current->repeats[r], &state->repeats[r])) return FALSE; } state->current_saved_repeats = current; return TRUE; } /* Pops the repeats for backtracking. */ Py_LOCAL_INLINE(void) pop_repeats(RE_State* state) { PatternObject* pattern; size_t repeat_count; RE_SavedRepeats* current; size_t r; pattern = state->pattern; repeat_count = pattern->repeat_count; if (repeat_count == 0) return; current = state->current_saved_repeats; for (r = 0; r < repeat_count; r++) copy_repeat_data(NULL, &state->repeats[r], &current->repeats[r]); state->current_saved_repeats = current->previous; } /* Drops the repeats for backtracking. 
*/ Py_LOCAL_INLINE(void) drop_repeats(RE_State* state) { PatternObject* pattern; size_t repeat_count; RE_SavedRepeats* current; pattern = state->pattern; repeat_count = pattern->repeat_count; if (repeat_count == 0) return; current = state->current_saved_repeats; state->current_saved_repeats = current->previous; } /* Inserts a new span in a guard list. */ Py_LOCAL_INLINE(BOOL) insert_guard_span(RE_SafeState* safe_state, RE_GuardList* guard_list, size_t index) { size_t n; if (guard_list->count >= guard_list->capacity) { size_t new_capacity; RE_GuardSpan* new_spans; new_capacity = guard_list->capacity * 2; if (new_capacity == 0) new_capacity = RE_INIT_GUARDS_BLOCK_SIZE; new_spans = (RE_GuardSpan*)safe_realloc(safe_state, guard_list->spans, new_capacity * sizeof(RE_GuardSpan)); if (!new_spans) return FALSE; guard_list->capacity = new_capacity; guard_list->spans = new_spans; } n = guard_list->count - index; if (n > 0) memmove(guard_list->spans + index + 1, guard_list->spans + index, n * sizeof(RE_GuardSpan)); ++guard_list->count; return TRUE; } /* Deletes a span in a guard list. */ Py_LOCAL_INLINE(void) delete_guard_span(RE_GuardList* guard_list, size_t index) { size_t n; n = guard_list->count - index - 1; if (n > 0) memmove(guard_list->spans + index, guard_list->spans + index + 1, n * sizeof(RE_GuardSpan)); --guard_list->count; } /* Checks whether a position is guarded against further matching. */ Py_LOCAL_INLINE(BOOL) is_guarded(RE_GuardList* guard_list, Py_ssize_t text_pos) { size_t low; size_t high; /* Is this position in the guard list? 
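The lookup that follows is a binary search over guard spans, which are kept sorted by position and do not overlap. A minimal standalone sketch of the same search is given here for illustration only; the Span type and the find_span helper are hypothetical names that are not part of this module, and plain long is used in place of Py_ssize_t.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical inclusive span of guarded positions, like RE_GuardSpan
// without the 'protect' field.
typedef struct Span {
    long low;
    long high;
} Span;

// Returns the index of the span containing 'pos', or -1 if there is none.
// 'spans' must be sorted by 'low' and must not overlap.
static long find_span(const Span* spans, size_t count, long pos) {
    size_t low = 0;
    size_t high = count;

    while (low < high) {
        size_t mid = low + (high - low) / 2;

        if (pos < spans[mid].low)
            high = mid;
        else if (pos > spans[mid].high)
            low = mid + 1;
        else
            return (long)mid;
    }

    return -1;
}
```

When the search fails, 'low' is the index at which a new span for 'pos' would have to be inserted to keep the list sorted, which is presumably why the surrounding function records it in last_low for a later call to guard.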
*/ if (guard_list->count == 0 || text_pos < guard_list->spans[0].low) guard_list->last_low = 0; else if (text_pos > guard_list->spans[guard_list->count - 1].high) guard_list->last_low = guard_list->count; else { low = 0; high = guard_list->count; while (low < high) { size_t mid; RE_GuardSpan* span; mid = (low + high) / 2; span = &guard_list->spans[mid]; if (text_pos < span->low) high = mid; else if (text_pos > span->high) low = mid + 1; else return span->protect; } guard_list->last_low = low; } guard_list->last_text_pos = text_pos; return FALSE; } /* Guards a position against further matching. */ Py_LOCAL_INLINE(BOOL) guard(RE_SafeState* safe_state, RE_GuardList* guard_list, Py_ssize_t text_pos, BOOL protect) { size_t low; size_t high; /* Where should the new position be added? */ if (text_pos == guard_list->last_text_pos) low = guard_list->last_low; else { low = 0; high = guard_list->count; while (low < high) { size_t mid; RE_GuardSpan* span; mid = (low + high) / 2; span = &guard_list->spans[mid]; if (text_pos < span->low) high = mid; else if (text_pos > span->high) low = mid + 1; else return TRUE; } } /* Add the position to the guard list. */ if (low > 0 && guard_list->spans[low - 1].high + 1 == text_pos && guard_list->spans[low - 1].protect == protect) { /* The new position is just above this span. */ if (low < guard_list->count && guard_list->spans[low].low - 1 == text_pos && guard_list->spans[low].protect == protect) { /* The new position joins 2 spans. */ guard_list->spans[low - 1].high = guard_list->spans[low].high; delete_guard_span(guard_list, low); } else /* Extend the span. */ guard_list->spans[low - 1].high = text_pos; } else if (low < guard_list->count && guard_list->spans[low].low - 1 == text_pos && guard_list->spans[low].protect == protect) /* The new position is just below this span. */ /* Extend the span. */ guard_list->spans[low].low = text_pos; else { /* Insert a new span. 
*/ if (!insert_guard_span(safe_state, guard_list, low)) return FALSE; guard_list->spans[low].low = text_pos; guard_list->spans[low].high = text_pos; guard_list->spans[low].protect = protect; } guard_list->last_text_pos = -1; return TRUE; } /* Guards a position against further matching for a repeat. */ Py_LOCAL_INLINE(BOOL) guard_repeat(RE_SafeState* safe_state, size_t index, Py_ssize_t text_pos, RE_STATUS_T guard_type, BOOL protect) { RE_State* state; RE_GuardList* guard_list; state = safe_state->re_state; /* Is a guard active here? */ if (!(state->pattern->repeat_info[index].status & guard_type)) return TRUE; /* Which guard list? */ if (guard_type & RE_STATUS_BODY) guard_list = &state->repeats[index].body_guard_list; else guard_list = &state->repeats[index].tail_guard_list; return guard(safe_state, guard_list, text_pos, protect); } /* Guards a range of positions against further matching for a repeat. */ Py_LOCAL_INLINE(BOOL) guard_repeat_range(RE_SafeState* safe_state, size_t index, Py_ssize_t lo_pos, Py_ssize_t hi_pos, RE_STATUS_T guard_type, BOOL protect) { RE_State* state; RE_GuardList* guard_list; Py_ssize_t pos; state = safe_state->re_state; /* Is a guard active here? */ if (!(state->pattern->repeat_info[index].status & guard_type)) return TRUE; /* Which guard list? */ if (guard_type & RE_STATUS_BODY) guard_list = &state->repeats[index].body_guard_list; else guard_list = &state->repeats[index].tail_guard_list; for (pos = lo_pos; pos <= hi_pos; pos++) { if (!guard(safe_state, guard_list, pos, protect)) return FALSE; } return TRUE; } /* Checks whether a position is guarded against further matching for a repeat. */ Py_LOCAL_INLINE(BOOL) is_repeat_guarded(RE_SafeState* safe_state, size_t index, Py_ssize_t text_pos, RE_STATUS_T guard_type) { RE_State* state; RE_GuardList* guard_list; state = safe_state->re_state; /* Is a guard active here? */ if (!(state->pattern->repeat_info[index].status & guard_type)) return FALSE; /* Which guard list? 
*/ if (guard_type == RE_STATUS_BODY) guard_list = &state->repeats[index].body_guard_list; else guard_list = &state->repeats[index].tail_guard_list; return is_guarded(guard_list, text_pos); } /* Builds a Unicode string. */ Py_LOCAL_INLINE(PyObject*) build_unicode_value(void* buffer, Py_ssize_t len, Py_ssize_t buffer_charsize) { return PyUnicode_FromUnicode(buffer, len); } /* Builds a bytestring. Returns NULL if any member is too wide. */ Py_LOCAL_INLINE(PyObject*) build_bytes_value(void* buffer, Py_ssize_t len, Py_ssize_t buffer_charsize) { Py_UCS1* byte_buffer; Py_ssize_t i; PyObject* result; if (buffer_charsize == 1) return Py_BuildValue("s#", buffer, len); byte_buffer = re_alloc((size_t)len); if (!byte_buffer) return NULL; for (i = 0; i < len; i++) { Py_UCS2 c = ((Py_UCS2*)buffer)[i]; if (c > 0xFF) goto too_wide; byte_buffer[i] = (Py_UCS1)c; } result = Py_BuildValue("s#", byte_buffer, len); re_dealloc(byte_buffer); return result; too_wide: re_dealloc(byte_buffer); return NULL; } /* Looks for a string in a string set. */ Py_LOCAL_INLINE(int) string_set_contains(RE_State* state, PyObject* string_set, Py_ssize_t first, Py_ssize_t last) { PyObject* string; int status; if (state->is_unicode) string = build_unicode_value(state->point_to(state->text, first), last - first, state->charsize); else string = build_bytes_value(state->point_to(state->text, first), last - first, state->charsize); if (!string) return RE_ERROR_INTERNAL; status = PySet_Contains(string_set, string); Py_DECREF(string); return status; } /* Looks for a string in a string set, ignoring case. 
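The function below handles the Turkic 'I' by finding the next ambiguous character, substituting each alternative in turn and recursing on the rest of the buffer, and it only tests set membership once no ambiguous character remains. A minimal standalone sketch of that recursion follows, under simplifying assumptions: it works on plain char buffers, the helpers alternatives, set_contains and contains_any_variant are hypothetical names, and only ASCII 'I' and 'i' are treated as alternatives, whereas the real code asks the encoding table and can also yield U+0130 and U+0131.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical fixed alternatives for an ASCII 'I' or 'i'.
// Returns the number of alternatives written to 'out' (0 if none apply).
static int alternatives(char ch, char* out) {
    if (ch == 'I' || ch == 'i') {
        out[0] = 'I';
        out[1] = 'i';
        return 2;
    }
    return 0;
}

// Hypothetical string set: a plain array of C strings, searched linearly.
static int set_contains(const char* const* set, size_t count, const char* s) {
    size_t k;

    for (k = 0; k < count; k++) {
        if (strcmp(set[k], s) == 0)
            return 1;
    }
    return 0;
}

// Tries every combination of alternatives for the ambiguous characters in
// 'buffer' from 'index' onwards. 'buffer' is NUL-terminated, has length
// 'len', and is modified in place.
static int contains_any_variant(const char* const* set, size_t count,
  char* buffer, size_t index, size_t len) {
    char alts[2];
    int n = 0;
    int i;

    // Skip ahead to the next character that has alternatives.
    while (index < len && (n = alternatives(buffer[index], alts)) == 0)
        ++index;

    // No ambiguous character is left, so test this variant.
    if (index >= len)
        return set_contains(set, count, buffer);

    for (i = 0; i < n; i++) {
        buffer[index] = alts[i];
        // Recurse for the remainder of the string.
        if (contains_any_variant(set, count, buffer, index + 1, len))
            return 1;
    }

    return 0;
}
```

The number of variants tried grows exponentially with the number of ambiguous characters, which is acceptable in this sketch only because such characters are assumed to be rare in any one candidate string.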
*/ Py_LOCAL_INLINE(int) string_set_contains_ign(RE_State* state, PyObject* string_set, void* buffer, Py_ssize_t index, Py_ssize_t len, Py_ssize_t buffer_charsize) { Py_UCS4 (*char_at)(void* text, Py_ssize_t pos); void (*set_char_at)(void* text, Py_ssize_t pos, Py_UCS4 ch); RE_EncodingTable* encoding; RE_LocaleInfo* locale_info; BOOL (*possible_turkic)(RE_LocaleInfo* locale_info, Py_UCS4 ch); Py_UCS4 codepoints[4]; switch (buffer_charsize) { case 1: char_at = bytes1_char_at; set_char_at = bytes1_set_char_at; break; case 2: char_at = bytes2_char_at; set_char_at = bytes2_set_char_at; break; case 4: char_at = bytes4_char_at; set_char_at = bytes4_set_char_at; break; default: char_at = bytes1_char_at; set_char_at = bytes1_set_char_at; break; } encoding = state->encoding; locale_info = state->locale_info; possible_turkic = encoding->possible_turkic; /* Look for a possible Turkic 'I'. */ while (index < len && !possible_turkic(locale_info, char_at(buffer, index))) ++index; if (index < len) { /* Possible Turkic 'I'. */ int count; int i; /* Try all the alternatives to the 'I'. */ count = encoding->all_turkic_i(locale_info, char_at(buffer, index), codepoints); for (i = 0; i < count; i++) { int status; set_char_at(buffer, index, codepoints[i]); /* Recurse for the remainder of the string. */ status = string_set_contains_ign(state, string_set, buffer, index + 1, len, buffer_charsize); if (status != 0) return status; } return 0; } else { /* No Turkic 'I'. */ PyObject* string; int status; if (state->is_unicode) string = build_unicode_value(buffer, len, buffer_charsize); else string = build_bytes_value(buffer, len, buffer_charsize); if (!string) return RE_ERROR_MEMORY; status = PySet_Contains(string_set, string); Py_DECREF(string); return status; } } /* Creates a partial string set for truncation at the left or right side. 
*/ Py_LOCAL_INLINE(int) make_partial_string_set(RE_State* state, RE_Node* node) { PatternObject* pattern; int partial_side; PyObject* string_set; PyObject* partial_set; PyObject* iter = NULL; PyObject* item = NULL; PyObject* slice = NULL; pattern = state->pattern; partial_side = state->partial_side; if (partial_side != RE_PARTIAL_LEFT && partial_side != RE_PARTIAL_RIGHT) return RE_ERROR_INTERNAL; /* Fetch the full string set. PyList_GET_ITEM borrows a reference. */ string_set = PyList_GET_ITEM(pattern->named_list_indexes, node->values[0]); if (!string_set) return RE_ERROR_INTERNAL; /* Gets the list of partial string sets. */ if (!pattern->partial_named_lists[partial_side]) { size_t size; size = pattern->named_lists_count * sizeof(PyObject*); pattern->partial_named_lists[partial_side] = re_alloc(size); if (!pattern->partial_named_lists[partial_side]) return RE_ERROR_INTERNAL; memset(pattern->partial_named_lists[partial_side], 0, size); } /* Get the partial string set. */ partial_set = pattern->partial_named_lists[partial_side][node->values[0]]; if (partial_set) return 1; /* Build the partial string set. */ partial_set = PySet_New(NULL); if (!partial_set) return RE_ERROR_INTERNAL; iter = PyObject_GetIter(string_set); if (!iter) goto error; item = PyIter_Next(iter); while (item) { Py_ssize_t len; Py_ssize_t first; Py_ssize_t last; len = PySequence_Length(item); if (len == -1) goto error; first = 0; last = len; while (last - first > 1) { int status; /* Shorten the entry. 
*/ if (partial_side == RE_PARTIAL_LEFT) ++first; else --last; slice = PySequence_GetSlice(item, first, last); if (!slice) goto error; status = PySet_Add(partial_set, slice); Py_DECREF(slice); if (status < 0) goto error; } Py_DECREF(item); item = PyIter_Next(iter); } if (PyErr_Occurred()) goto error; Py_DECREF(iter); pattern->partial_named_lists[partial_side][node->values[0]] = partial_set; return 1; error: Py_XDECREF(item); Py_XDECREF(iter); Py_DECREF(partial_set); return RE_ERROR_INTERNAL; } /* Tries to match a string at the current position with a member of a string * set, forwards or backwards. */ Py_LOCAL_INLINE(int) string_set_match_fwdrev(RE_SafeState* safe_state, RE_Node* node, BOOL reverse) { RE_State* state; Py_ssize_t min_len; Py_ssize_t max_len; Py_ssize_t text_available; Py_ssize_t slice_available; int partial_side; Py_ssize_t len; Py_ssize_t first; Py_ssize_t last; int status; PyObject* string_set; state = safe_state->re_state; min_len = (Py_ssize_t)node->values[1]; max_len = (Py_ssize_t)node->values[2]; acquire_GIL(safe_state); if (reverse) { text_available = state->text_pos; slice_available = state->text_pos - state->slice_start; partial_side = RE_PARTIAL_LEFT; } else { text_available = state->text_length - state->text_pos; slice_available = state->slice_end - state->text_pos; partial_side = RE_PARTIAL_RIGHT; } /* Get as many characters as we need for the longest possible match. */ len = min_ssize_t(max_len, slice_available); if (reverse) { first = state->text_pos - len; last = state->text_pos; } else { first = state->text_pos; last = state->text_pos + len; } /* If we didn't get all of the characters we need, is a partial match * allowed? */ if (len < max_len && len == text_available && state->partial_side == partial_side) { if (len == 0) { /* An empty string is always a possible partial match. */ status = RE_ERROR_PARTIAL; goto finished; } /* Make a set of the possible partial matches. 
*/ status = make_partial_string_set(state, node); if (status < 0) goto finished; /* Fetch the partial string set. */ string_set = state->pattern->partial_named_lists[partial_side][node->values[0]]; /* Is the text we have a partial match? */ status = string_set_contains(state, string_set, first, last); if (status < 0) goto finished; if (status == 1) { /* Advance past the match. */ if (reverse) state->text_pos -= len; else state->text_pos += len; status = RE_ERROR_PARTIAL; goto finished; } } /* Fetch the string set. PyList_GET_ITEM borrows a reference. */ string_set = PyList_GET_ITEM(state->pattern->named_list_indexes, node->values[0]); if (!string_set) { status = RE_ERROR_INTERNAL; goto finished; } /* We've already looked for a partial match (if allowed), but what about a * complete match? */ while (len >= min_len) { status = string_set_contains(state, string_set, first, last); if (status == 1) { /* Advance past the match. */ if (reverse) state->text_pos -= len; else state->text_pos += len; status = 1; goto finished; } /* Look for a shorter match. */ --len; if (reverse) ++first; else --last; } /* No match. */ status = 0; finished: release_GIL(safe_state); return status; } /* Tries to match a string at the current position with a member of a string * set, ignoring case, forwards or backwards. 
*/ Py_LOCAL_INLINE(int) string_set_match_fld_fwdrev(RE_SafeState* safe_state, RE_Node* node, BOOL reverse) { RE_State* state; int (*full_case_fold)(RE_LocaleInfo* locale_info, Py_UCS4 ch, Py_UCS4* folded); Py_UCS4 (*char_at)(void* text, Py_ssize_t pos); Py_ssize_t folded_charsize; void (*set_char_at)(void* text, Py_ssize_t pos, Py_UCS4 ch); Py_ssize_t min_len; Py_ssize_t max_len; Py_ssize_t buf_len; void* folded; int status; BOOL* end_of_fold = NULL; Py_ssize_t text_available; Py_ssize_t slice_available; Py_ssize_t t_pos; Py_ssize_t f_pos; int step; int partial_side; Py_ssize_t len; Py_ssize_t consumed; Py_UCS4 codepoints[RE_MAX_FOLDED]; Py_ssize_t first; Py_ssize_t last; PyObject* string_set; state = safe_state->re_state; full_case_fold = state->encoding->full_case_fold; char_at = state->char_at; /* The folded string will have the same width as the original string. */ folded_charsize = state->charsize; switch (folded_charsize) { case 1: set_char_at = bytes1_set_char_at; break; case 2: set_char_at = bytes2_set_char_at; break; case 4: set_char_at = bytes4_set_char_at; break; default: return RE_ERROR_INTERNAL; } min_len = (Py_ssize_t)node->values[1]; max_len = (Py_ssize_t)node->values[2]; acquire_GIL(safe_state); /* Allocate a buffer for the folded string. 
*/ buf_len = max_len + RE_MAX_FOLDED; folded = re_alloc((size_t)(buf_len * folded_charsize)); if (!folded) { status = RE_ERROR_MEMORY; goto finished; } end_of_fold = re_alloc((size_t)buf_len * sizeof(BOOL)); if (!end_of_fold) { status = RE_ERROR_MEMORY; goto finished; } memset(end_of_fold, 0, (size_t)buf_len * sizeof(BOOL)); if (reverse) { text_available = state->text_pos; slice_available = state->text_pos - state->slice_start; t_pos = state->text_pos - 1; f_pos = buf_len; step = -1; partial_side = RE_PARTIAL_LEFT; } else { text_available = state->text_length - state->text_pos; slice_available = state->slice_end - state->text_pos; t_pos = state->text_pos; f_pos = 0; step = 1; partial_side = RE_PARTIAL_RIGHT; } /* We can stop getting characters as soon as the case-folded string is long * enough (each codepoint from the text can expand to more than one folded * codepoint). */ len = 0; end_of_fold[len] = TRUE; consumed = 0; while (len < max_len && consumed < slice_available) { int count; int j; count = full_case_fold(state->locale_info, char_at(state->text, t_pos), codepoints); if (reverse) f_pos -= count; for (j = 0; j < count; j++) set_char_at(folded, f_pos + j, codepoints[j]); if (!reverse) f_pos += count; len += count; end_of_fold[len] = TRUE; ++consumed; t_pos += step; } if (reverse) { first = f_pos; last = buf_len; } else { first = 0; last = f_pos; } /* If we didn't get all of the characters we need, is a partial match * allowed? */ if (len < max_len && len == text_available && state->partial_side == partial_side) { if (len == 0) { /* An empty string is always a possible partial match. */ status = RE_ERROR_PARTIAL; goto finished; } /* Make a set of the possible partial matches. */ status = make_partial_string_set(state, node); if (status < 0) goto finished; /* Fetch the partial string set. */ string_set = state->pattern->partial_named_lists[partial_side][node->values[0]]; /* Is the text we have a partial match? 
*/ status = string_set_contains_ign(state, string_set, folded, first, last, folded_charsize); if (status < 0) goto finished; if (status == 1) { /* Advance past the match. */ if (reverse) state->text_pos -= consumed; else state->text_pos += consumed; status = RE_ERROR_PARTIAL; goto finished; } } /* Fetch the string set. PyList_GET_ITEM borrows a reference. */ string_set = PyList_GET_ITEM(state->pattern->named_list_indexes, node->values[0]); if (!string_set) { status = RE_ERROR_INTERNAL; goto finished; } /* We've already looked for a partial match (if allowed), but what about a * complete match? */ while (len >= min_len) { if (end_of_fold[len]) { status = string_set_contains_ign(state, string_set, folded, first, last, folded_charsize); if (status == 1) { /* Advance past the match. */ if (reverse) state->text_pos -= consumed; else state->text_pos += consumed; status = 1; goto finished; } --consumed; } /* Look for a shorter match. */ --len; if (reverse) ++first; else --last; } /* No match. */ status = 0; finished: re_dealloc(end_of_fold); re_dealloc(folded); release_GIL(safe_state); return status; } /* Tries to match a string at the current position with a member of a string * set, ignoring case, forwards or backwards. 
*/ Py_LOCAL_INLINE(int) string_set_match_ign_fwdrev(RE_SafeState* safe_state, RE_Node* node, BOOL reverse) { RE_State* state; Py_UCS4 (*simple_case_fold)(RE_LocaleInfo* locale_info, Py_UCS4 ch); Py_UCS4 (*char_at)(void* text, Py_ssize_t pos); Py_ssize_t folded_charsize; void (*set_char_at)(void* text, Py_ssize_t pos, Py_UCS4 ch); Py_ssize_t min_len; Py_ssize_t max_len; void* folded; int status; Py_ssize_t text_available; Py_ssize_t slice_available; Py_ssize_t t_pos; Py_ssize_t f_pos; int step; int partial_side; Py_ssize_t len; Py_ssize_t i; Py_ssize_t first; Py_ssize_t last; PyObject* string_set; state = safe_state->re_state; simple_case_fold = state->encoding->simple_case_fold; char_at = state->char_at; /* The folded string will have the same width as the original string. */ folded_charsize = state->charsize; switch (folded_charsize) { case 1: set_char_at = bytes1_set_char_at; break; case 2: set_char_at = bytes2_set_char_at; break; case 4: set_char_at = bytes4_set_char_at; break; default: return RE_ERROR_INTERNAL; } min_len = (Py_ssize_t)node->values[1]; max_len = (Py_ssize_t)node->values[2]; acquire_GIL(safe_state); /* Allocate a buffer for the folded string. */ folded = re_alloc((size_t)(max_len * folded_charsize)); if (!folded) { status = RE_ERROR_MEMORY; goto finished; } if (reverse) { text_available = state->text_pos; slice_available = state->text_pos - state->slice_start; t_pos = state->text_pos - 1; f_pos = max_len - 1; step = -1; partial_side = RE_PARTIAL_LEFT; } else { text_available = state->text_length - state->text_pos; slice_available = state->slice_end - state->text_pos; t_pos = state->text_pos; f_pos = 0; step = 1; partial_side = RE_PARTIAL_RIGHT; } /* Get as many characters as we need for the longest possible match. 
*/ len = min_ssize_t(max_len, slice_available); for (i = 0; i < len; i ++) { Py_UCS4 ch; ch = simple_case_fold(state->locale_info, char_at(state->text, t_pos)); set_char_at(folded, f_pos, ch); t_pos += step; f_pos += step; } if (reverse) { first = f_pos; last = max_len; } else { first = 0; last = f_pos; } /* If we didn't get all of the characters we need, is a partial match * allowed? */ if (len < max_len && len == text_available && state->partial_side == partial_side) { if (len == 0) { /* An empty string is always a possible partial match. */ status = RE_ERROR_PARTIAL; goto finished; } /* Make a set of the possible partial matches. */ status = make_partial_string_set(state, node); if (status < 0) goto finished; /* Fetch the partial string set. */ string_set = state->pattern->partial_named_lists[partial_side][node->values[0]]; /* Is the text we have a partial match? */ status = string_set_contains_ign(state, string_set, folded, first, last, folded_charsize); if (status < 0) goto finished; if (status == 1) { /* Advance past the match. */ if (reverse) state->text_pos -= len; else state->text_pos += len; status = RE_ERROR_PARTIAL; goto finished; } } /* Fetch the string set. PyList_GET_ITEM borrows a reference. */ string_set = PyList_GET_ITEM(state->pattern->named_list_indexes, node->values[0]); if (!string_set) { status = RE_ERROR_INTERNAL; goto finished; } /* We've already looked for a partial match (if allowed), but what about a * complete match? */ while (len >= min_len) { status = string_set_contains_ign(state, string_set, folded, first, last, folded_charsize); if (status == 1) { /* Advance past the match. */ if (reverse) state->text_pos -= len; else state->text_pos += len; status = 1; goto finished; } /* Look for a shorter match. */ --len; if (reverse) ++first; else --last; } /* No match. */ status = 0; finished: re_dealloc(folded); release_GIL(safe_state); return status; } /* Checks whether any additional fuzzy error is permitted. 
*/ Py_LOCAL_INLINE(BOOL) any_error_permitted(RE_State* state) { RE_FuzzyInfo* fuzzy_info; RE_CODE* values; fuzzy_info = &state->fuzzy_info; values = fuzzy_info->node->values; return fuzzy_info->total_cost <= values[RE_FUZZY_VAL_MAX_COST] && fuzzy_info->counts[RE_FUZZY_ERR] < values[RE_FUZZY_VAL_MAX_ERR] && state->total_errors <= state->max_errors; } /* Checks whether this additional fuzzy error is permitted. */ Py_LOCAL_INLINE(BOOL) this_error_permitted(RE_State* state, int fuzzy_type) { RE_FuzzyInfo* fuzzy_info; RE_CODE* values; fuzzy_info = &state->fuzzy_info; values = fuzzy_info->node->values; return fuzzy_info->total_cost + values[RE_FUZZY_VAL_COST_BASE + fuzzy_type] <= values[RE_FUZZY_VAL_MAX_COST] && fuzzy_info->counts[fuzzy_type] < values[RE_FUZZY_VAL_MAX_BASE + fuzzy_type] && state->total_errors + 1 <= state->max_errors; } /* Checks whether we've reached the end of the text during a fuzzy partial * match. */ Py_LOCAL_INLINE(int) check_fuzzy_partial(RE_State* state, Py_ssize_t text_pos) { switch (state->partial_side) { case RE_PARTIAL_LEFT: if (text_pos < 0) return RE_ERROR_PARTIAL; break; case RE_PARTIAL_RIGHT: if (text_pos > state->text_length) return RE_ERROR_PARTIAL; break; } return RE_ERROR_FAILURE; } /* Checks a fuzzy match of an item. */ Py_LOCAL_INLINE(int) next_fuzzy_match_item(RE_State* state, RE_FuzzyData* data, BOOL is_string, int step) { Py_ssize_t new_pos; if (this_error_permitted(state, data->fuzzy_type)) { switch (data->fuzzy_type) { case RE_FUZZY_DEL: /* Could a character at text_pos have been deleted? */ if (is_string) data->new_string_pos += step; else data->new_node = data->new_node->next_1.node; return RE_ERROR_SUCCESS; case RE_FUZZY_INS: /* Could the character at text_pos have been inserted? 
*/ if (!data->permit_insertion) return RE_ERROR_FAILURE; new_pos = data->new_text_pos + step; if (state->slice_start <= new_pos && new_pos <= state->slice_end) { data->new_text_pos = new_pos; return RE_ERROR_SUCCESS; } return check_fuzzy_partial(state, new_pos); case RE_FUZZY_SUB: /* Could the character at text_pos have been substituted? */ new_pos = data->new_text_pos + step; if (state->slice_start <= new_pos && new_pos <= state->slice_end) { data->new_text_pos = new_pos; if (is_string) data->new_string_pos += step; else data->new_node = data->new_node->next_1.node; return RE_ERROR_SUCCESS; } return check_fuzzy_partial(state, new_pos); } } return RE_ERROR_FAILURE; } /* Tries a fuzzy match of an item of width 0 or 1. */ Py_LOCAL_INLINE(int) fuzzy_match_item(RE_SafeState* safe_state, BOOL search, Py_ssize_t* text_pos, RE_Node** node, int step) { RE_State* state; RE_FuzzyData data; RE_FuzzyInfo* fuzzy_info; RE_CODE* values; RE_BacktrackData* bt_data; state = safe_state->re_state; if (!any_error_permitted(state)) { *node = NULL; return RE_ERROR_SUCCESS; } data.new_text_pos = *text_pos; data.new_node = *node; fuzzy_info = &state->fuzzy_info; values = fuzzy_info->node->values; if (step == 0) { if (data.new_node->status & RE_STATUS_REVERSE) { data.step = -1; data.limit = state->slice_start; } else { data.step = 1; data.limit = state->slice_end; } } else data.step = step; /* Permit insertion except initially when searching (it's better just to * start searching one character later). 
*/ data.permit_insertion = !search || data.new_text_pos != state->search_anchor; for (data.fuzzy_type = 0; data.fuzzy_type < RE_FUZZY_COUNT; data.fuzzy_type++) { int status; status = next_fuzzy_match_item(state, &data, FALSE, step); if (status < 0) return status; if (status == RE_ERROR_SUCCESS) goto found; } *node = NULL; return RE_ERROR_SUCCESS; found: if (!add_backtrack(safe_state, (*node)->op)) return RE_ERROR_FAILURE; bt_data = state->backtrack; bt_data->fuzzy_item.position.text_pos = *text_pos; bt_data->fuzzy_item.position.node = *node; bt_data->fuzzy_item.fuzzy_type = (RE_INT8)data.fuzzy_type; bt_data->fuzzy_item.step = (RE_INT8)step; ++fuzzy_info->counts[data.fuzzy_type]; ++fuzzy_info->counts[RE_FUZZY_ERR]; fuzzy_info->total_cost += values[RE_FUZZY_VAL_COST_BASE + data.fuzzy_type]; ++state->total_errors; *text_pos = data.new_text_pos; *node = data.new_node; return RE_ERROR_SUCCESS; } /* Retries a fuzzy match of an item of width 0 or 1. */ Py_LOCAL_INLINE(int) retry_fuzzy_match_item(RE_SafeState* safe_state, BOOL search, Py_ssize_t* text_pos, RE_Node** node, BOOL advance) { RE_State* state; RE_FuzzyInfo* fuzzy_info; RE_CODE* values; RE_BacktrackData* bt_data; RE_FuzzyData data; int step; state = safe_state->re_state; fuzzy_info = &state->fuzzy_info; values = fuzzy_info->node->values; bt_data = state->backtrack; data.new_text_pos = bt_data->fuzzy_item.position.text_pos; data.new_node = bt_data->fuzzy_item.position.node; data.fuzzy_type = bt_data->fuzzy_item.fuzzy_type; data.step = bt_data->fuzzy_item.step; if (data.fuzzy_type >= 0) { --fuzzy_info->counts[data.fuzzy_type]; --fuzzy_info->counts[RE_FUZZY_ERR]; fuzzy_info->total_cost -= values[RE_FUZZY_VAL_COST_BASE + data.fuzzy_type]; --state->total_errors; } /* Permit insertion except initially when searching (it's better just to * start searching one character later). */ data.permit_insertion = !search || data.new_text_pos != state->search_anchor; step = advance ? 
data.step : 0; for (++data.fuzzy_type; data.fuzzy_type < RE_FUZZY_COUNT; data.fuzzy_type++) { int status; status = next_fuzzy_match_item(state, &data, FALSE, step); if (status < 0) return status; if (status == RE_ERROR_SUCCESS) goto found; } discard_backtrack(state); *node = NULL; return RE_ERROR_SUCCESS; found: bt_data->fuzzy_item.fuzzy_type = (RE_INT8)data.fuzzy_type; ++fuzzy_info->counts[data.fuzzy_type]; ++fuzzy_info->counts[RE_FUZZY_ERR]; fuzzy_info->total_cost += values[RE_FUZZY_VAL_COST_BASE + data.fuzzy_type]; ++state->total_errors; *text_pos = data.new_text_pos; *node = data.new_node; return RE_ERROR_SUCCESS; } /* Tries a fuzzy insertion. */ Py_LOCAL_INLINE(int) fuzzy_insert(RE_SafeState* safe_state, Py_ssize_t text_pos, RE_Node* node) { RE_State* state; RE_BacktrackData* bt_data; RE_FuzzyInfo* fuzzy_info; RE_CODE* values; state = safe_state->re_state; /* No insertion or deletion. */ if (!add_backtrack(safe_state, node->op)) return RE_ERROR_FAILURE; bt_data = state->backtrack; bt_data->fuzzy_insert.position.text_pos = text_pos; bt_data->fuzzy_insert.position.node = node; bt_data->fuzzy_insert.count = 0; bt_data->fuzzy_insert.too_few_errors = state->too_few_errors; bt_data->fuzzy_insert.fuzzy_node = node; /* END_FUZZY node. */ /* Check whether there are too few errors. */ fuzzy_info = &state->fuzzy_info; /* The node in this case is the END_FUZZY node. */ values = node->values; if (fuzzy_info->counts[RE_FUZZY_DEL] < values[RE_FUZZY_VAL_MIN_DEL] || fuzzy_info->counts[RE_FUZZY_INS] < values[RE_FUZZY_VAL_MIN_INS] || fuzzy_info->counts[RE_FUZZY_SUB] < values[RE_FUZZY_VAL_MIN_SUB] || fuzzy_info->counts[RE_FUZZY_ERR] < values[RE_FUZZY_VAL_MIN_ERR]) state->too_few_errors = RE_ERROR_SUCCESS; return RE_ERROR_SUCCESS; } /* Retries a fuzzy insertion. 
*/ Py_LOCAL_INLINE(int) retry_fuzzy_insert(RE_SafeState* safe_state, Py_ssize_t* text_pos, RE_Node** node) { RE_State* state; RE_FuzzyInfo* fuzzy_info; RE_CODE* values; RE_BacktrackData* bt_data; Py_ssize_t new_text_pos; RE_Node* new_node; int step; Py_ssize_t limit; RE_Node* fuzzy_node; state = safe_state->re_state; fuzzy_info = &state->fuzzy_info; values = fuzzy_info->node->values; bt_data = state->backtrack; new_text_pos = bt_data->fuzzy_insert.position.text_pos; new_node = bt_data->fuzzy_insert.position.node; if (new_node->status & RE_STATUS_REVERSE) { step = -1; limit = state->slice_start; } else { step = 1; limit = state->slice_end; } /* Could the character at text_pos have been inserted? */ if (!this_error_permitted(state, RE_FUZZY_INS) || new_text_pos == limit) { size_t count; count = bt_data->fuzzy_insert.count; fuzzy_info->counts[RE_FUZZY_INS] -= count; fuzzy_info->counts[RE_FUZZY_ERR] -= count; fuzzy_info->total_cost -= values[RE_FUZZY_VAL_INS_COST] * count; state->total_errors -= count; state->too_few_errors = bt_data->fuzzy_insert.too_few_errors; discard_backtrack(state); *node = NULL; return RE_ERROR_SUCCESS; } ++bt_data->fuzzy_insert.count; ++fuzzy_info->counts[RE_FUZZY_INS]; ++fuzzy_info->counts[RE_FUZZY_ERR]; fuzzy_info->total_cost += values[RE_FUZZY_VAL_INS_COST]; ++state->total_errors; /* Check whether there are too few errors. */ state->too_few_errors = bt_data->fuzzy_insert.too_few_errors; fuzzy_node = bt_data->fuzzy_insert.fuzzy_node; /* END_FUZZY node. */ values = fuzzy_node->values; if (fuzzy_info->counts[RE_FUZZY_DEL] < values[RE_FUZZY_VAL_MIN_DEL] || fuzzy_info->counts[RE_FUZZY_INS] < values[RE_FUZZY_VAL_MIN_INS] || fuzzy_info->counts[RE_FUZZY_SUB] < values[RE_FUZZY_VAL_MIN_SUB] || fuzzy_info->counts[RE_FUZZY_ERR] < values[RE_FUZZY_VAL_MIN_ERR]) state->too_few_errors = RE_ERROR_SUCCESS; *text_pos = new_text_pos + step * (Py_ssize_t)bt_data->fuzzy_insert.count; *node = new_node; return RE_ERROR_SUCCESS; } /* Tries a fuzzy match of a string. 
*/ Py_LOCAL_INLINE(int) fuzzy_match_string(RE_SafeState* safe_state, BOOL search, Py_ssize_t* text_pos, RE_Node* node, Py_ssize_t* string_pos, BOOL* matched, int step) { RE_State* state; RE_FuzzyData data; RE_FuzzyInfo* fuzzy_info; RE_CODE* values; RE_BacktrackData* bt_data; state = safe_state->re_state; if (!any_error_permitted(state)) { *matched = FALSE; return RE_ERROR_SUCCESS; } data.new_text_pos = *text_pos; data.new_string_pos = *string_pos; data.step = step; fuzzy_info = &state->fuzzy_info; values = fuzzy_info->node->values; /* Permit insertion except initially when searching (it's better just to * start searching one character later). */ data.permit_insertion = !search || data.new_text_pos != state->search_anchor; for (data.fuzzy_type = 0; data.fuzzy_type < RE_FUZZY_COUNT; data.fuzzy_type++) { int status; status = next_fuzzy_match_item(state, &data, TRUE, data.step); if (status < 0) return status; if (status == RE_ERROR_SUCCESS) goto found; } *matched = FALSE; return RE_ERROR_SUCCESS; found: if (!add_backtrack(safe_state, node->op)) return RE_ERROR_FAILURE; bt_data = state->backtrack; bt_data->fuzzy_string.position.text_pos = *text_pos; bt_data->fuzzy_string.position.node = node; bt_data->fuzzy_string.string_pos = *string_pos; bt_data->fuzzy_string.fuzzy_type = (RE_INT8)data.fuzzy_type; bt_data->fuzzy_string.step = (RE_INT8)step; ++fuzzy_info->counts[data.fuzzy_type]; ++fuzzy_info->counts[RE_FUZZY_ERR]; fuzzy_info->total_cost += values[RE_FUZZY_VAL_COST_BASE + data.fuzzy_type]; ++state->total_errors; *text_pos = data.new_text_pos; *string_pos = data.new_string_pos; *matched = TRUE; return RE_ERROR_SUCCESS; } /* Retries a fuzzy match of a string. 
*/ Py_LOCAL_INLINE(int) retry_fuzzy_match_string(RE_SafeState* safe_state, BOOL search, Py_ssize_t* text_pos, RE_Node** node, Py_ssize_t* string_pos, BOOL* matched) { RE_State* state; RE_FuzzyInfo* fuzzy_info; RE_CODE* values; RE_BacktrackData* bt_data; RE_FuzzyData data; RE_Node* new_node; state = safe_state->re_state; fuzzy_info = &state->fuzzy_info; values = fuzzy_info->node->values; bt_data = state->backtrack; data.new_text_pos = bt_data->fuzzy_string.position.text_pos; new_node = bt_data->fuzzy_string.position.node; data.new_string_pos = bt_data->fuzzy_string.string_pos; data.fuzzy_type = bt_data->fuzzy_string.fuzzy_type; data.step = bt_data->fuzzy_string.step; --fuzzy_info->counts[data.fuzzy_type]; --fuzzy_info->counts[RE_FUZZY_ERR]; fuzzy_info->total_cost -= values[RE_FUZZY_VAL_COST_BASE + data.fuzzy_type]; --state->total_errors; /* Permit insertion except initially when searching (it's better just to * start searching one character later). */ data.permit_insertion = !search || data.new_text_pos != state->search_anchor; for (++data.fuzzy_type; data.fuzzy_type < RE_FUZZY_COUNT; data.fuzzy_type++) { int status; status = next_fuzzy_match_item(state, &data, TRUE, data.step); if (status < 0) return status; if (status == RE_ERROR_SUCCESS) goto found; } discard_backtrack(state); *matched = FALSE; return RE_ERROR_SUCCESS; found: bt_data->fuzzy_string.fuzzy_type = (RE_INT8)data.fuzzy_type; ++fuzzy_info->counts[data.fuzzy_type]; ++fuzzy_info->counts[RE_FUZZY_ERR]; fuzzy_info->total_cost += values[RE_FUZZY_VAL_COST_BASE + data.fuzzy_type]; ++state->total_errors; *text_pos = data.new_text_pos; *node = new_node; *string_pos = data.new_string_pos; *matched = TRUE; return RE_ERROR_SUCCESS; } /* Checks a fuzzy match of a string. 
*/ Py_LOCAL_INLINE(int) next_fuzzy_match_string_fld(RE_State* state, RE_FuzzyData* data) { int new_pos; if (this_error_permitted(state, data->fuzzy_type)) { switch (data->fuzzy_type) { case RE_FUZZY_DEL: /* Could a character at text_pos have been deleted? */ data->new_string_pos += data->step; return RE_ERROR_SUCCESS; case RE_FUZZY_INS: /* Could the character at text_pos have been inserted? */ if (!data->permit_insertion) return RE_ERROR_FAILURE; new_pos = data->new_folded_pos + data->step; if (0 <= new_pos && new_pos <= data->folded_len) { data->new_folded_pos = new_pos; return RE_ERROR_SUCCESS; } return check_fuzzy_partial(state, new_pos); case RE_FUZZY_SUB: /* Could the character at text_pos have been substituted? */ new_pos = data->new_folded_pos + data->step; if (0 <= new_pos && new_pos <= data->folded_len) { data->new_folded_pos = new_pos; data->new_string_pos += data->step; return RE_ERROR_SUCCESS; } return check_fuzzy_partial(state, new_pos); } } return RE_ERROR_FAILURE; } /* Tries a fuzzy match of a string, ignoring case. */ Py_LOCAL_INLINE(int) fuzzy_match_string_fld(RE_SafeState* safe_state, BOOL search, Py_ssize_t* text_pos, RE_Node* node, Py_ssize_t* string_pos, int* folded_pos, int folded_len, BOOL* matched, int step) { RE_State* state; Py_ssize_t new_text_pos; RE_FuzzyData data; RE_FuzzyInfo* fuzzy_info; RE_CODE* values; RE_BacktrackData* bt_data; state = safe_state->re_state; if (!any_error_permitted(state)) { *matched = FALSE; return RE_ERROR_SUCCESS; } new_text_pos = *text_pos; data.new_string_pos = *string_pos; data.new_folded_pos = *folded_pos; data.folded_len = folded_len; data.step = step; fuzzy_info = &state->fuzzy_info; values = fuzzy_info->node->values; /* Permit insertion except initially when searching (it's better just to * start searching one character later). 
*/ data.permit_insertion = !search || new_text_pos != state->search_anchor; if (step > 0) { if (data.new_folded_pos != 0) data.permit_insertion = RE_ERROR_SUCCESS; } else { if (data.new_folded_pos != folded_len) data.permit_insertion = RE_ERROR_SUCCESS; } for (data.fuzzy_type = 0; data.fuzzy_type < RE_FUZZY_COUNT; data.fuzzy_type++) { int status; status = next_fuzzy_match_string_fld(state, &data); if (status < 0) return status; if (status == RE_ERROR_SUCCESS) goto found; } *matched = FALSE; return RE_ERROR_SUCCESS; found: if (!add_backtrack(safe_state, node->op)) return RE_ERROR_FAILURE; bt_data = state->backtrack; bt_data->fuzzy_string.position.text_pos = *text_pos; bt_data->fuzzy_string.position.node = node; bt_data->fuzzy_string.string_pos = *string_pos; bt_data->fuzzy_string.folded_pos = (RE_INT8)(*folded_pos); bt_data->fuzzy_string.folded_len = (RE_INT8)folded_len; bt_data->fuzzy_string.fuzzy_type = (RE_INT8)data.fuzzy_type; bt_data->fuzzy_string.step = (RE_INT8)step; ++fuzzy_info->counts[data.fuzzy_type]; ++fuzzy_info->counts[RE_FUZZY_ERR]; fuzzy_info->total_cost += values[RE_FUZZY_VAL_COST_BASE + data.fuzzy_type]; ++state->total_errors; *text_pos = new_text_pos; *string_pos = data.new_string_pos; *folded_pos = data.new_folded_pos; *matched = TRUE; return RE_ERROR_SUCCESS; } /* Retries a fuzzy match of a string, ignoring case. 
*/ Py_LOCAL_INLINE(int) retry_fuzzy_match_string_fld(RE_SafeState* safe_state, BOOL search, Py_ssize_t* text_pos, RE_Node** node, Py_ssize_t* string_pos, int* folded_pos, BOOL* matched) { RE_State* state; RE_FuzzyInfo* fuzzy_info; RE_CODE* values; RE_BacktrackData* bt_data; Py_ssize_t new_text_pos; RE_Node* new_node; RE_FuzzyData data; state = safe_state->re_state; fuzzy_info = &state->fuzzy_info; values = fuzzy_info->node->values; bt_data = state->backtrack; new_text_pos = bt_data->fuzzy_string.position.text_pos; new_node = bt_data->fuzzy_string.position.node; data.new_string_pos = bt_data->fuzzy_string.string_pos; data.new_folded_pos = bt_data->fuzzy_string.folded_pos; data.folded_len = bt_data->fuzzy_string.folded_len; data.fuzzy_type = bt_data->fuzzy_string.fuzzy_type; data.step = bt_data->fuzzy_string.step; --fuzzy_info->counts[data.fuzzy_type]; --fuzzy_info->counts[RE_FUZZY_ERR]; fuzzy_info->total_cost -= values[RE_FUZZY_VAL_COST_BASE + data.fuzzy_type]; --state->total_errors; /* Permit insertion except initially when searching (it's better just to * start searching one character later). 
*/ data.permit_insertion = !search || new_text_pos != state->search_anchor; if (data.step > 0) { if (data.new_folded_pos != 0) data.permit_insertion = RE_ERROR_SUCCESS; } else { if (data.new_folded_pos != bt_data->fuzzy_string.folded_len) data.permit_insertion = RE_ERROR_SUCCESS; } for (++data.fuzzy_type; data.fuzzy_type < RE_FUZZY_COUNT; data.fuzzy_type++) { int status; status = next_fuzzy_match_string_fld(state, &data); if (status < 0) return status; if (status == RE_ERROR_SUCCESS) goto found; } discard_backtrack(state); *matched = FALSE; return RE_ERROR_SUCCESS; found: bt_data->fuzzy_string.fuzzy_type = (RE_INT8)data.fuzzy_type; ++fuzzy_info->counts[data.fuzzy_type]; ++fuzzy_info->counts[RE_FUZZY_ERR]; fuzzy_info->total_cost += values[RE_FUZZY_VAL_COST_BASE + data.fuzzy_type]; ++state->total_errors; *text_pos = new_text_pos; *node = new_node; *string_pos = data.new_string_pos; *folded_pos = data.new_folded_pos; *matched = TRUE; return RE_ERROR_SUCCESS; } /* Checks a fuzzy match of a group reference, ignoring case. */ Py_LOCAL_INLINE(int) next_fuzzy_match_group_fld(RE_State* state, RE_FuzzyData* data) { int new_pos; if (this_error_permitted(state, data->fuzzy_type)) { switch (data->fuzzy_type) { case RE_FUZZY_DEL: /* Could a character at text_pos have been deleted? */ data->new_gfolded_pos += data->step; return RE_ERROR_SUCCESS; case RE_FUZZY_INS: /* Could the character at text_pos have been inserted? */ if (!data->permit_insertion) return RE_ERROR_FAILURE; new_pos = data->new_folded_pos + data->step; if (0 <= new_pos && new_pos <= data->folded_len) { data->new_folded_pos = new_pos; return RE_ERROR_SUCCESS; } return check_fuzzy_partial(state, new_pos); case RE_FUZZY_SUB: /* Could the character at text_pos have been substituted? 
*/ new_pos = data->new_folded_pos + data->step; if (0 <= new_pos && new_pos <= data->folded_len) { data->new_folded_pos = new_pos; data->new_gfolded_pos += data->step; return RE_ERROR_SUCCESS; } return check_fuzzy_partial(state, new_pos); } } return RE_ERROR_FAILURE; } /* Tries a fuzzy match of a group reference, ignoring case. */ Py_LOCAL_INLINE(int) fuzzy_match_group_fld(RE_SafeState* safe_state, BOOL search, Py_ssize_t* text_pos, RE_Node* node, int* folded_pos, int folded_len, Py_ssize_t* group_pos, int* gfolded_pos, int gfolded_len, BOOL* matched, int step) { RE_State* state; Py_ssize_t new_text_pos; RE_FuzzyData data; Py_ssize_t new_group_pos; RE_FuzzyInfo* fuzzy_info; RE_CODE* values; RE_BacktrackData* bt_data; state = safe_state->re_state; if (!any_error_permitted(state)) { *matched = FALSE; return RE_ERROR_SUCCESS; } new_text_pos = *text_pos; data.new_folded_pos = *folded_pos; data.folded_len = folded_len; new_group_pos = *group_pos; data.new_gfolded_pos = *gfolded_pos; data.step = step; fuzzy_info = &state->fuzzy_info; values = fuzzy_info->node->values; /* Permit insertion except initially when searching (it's better just to * start searching one character later). 
*/ data.permit_insertion = !search || new_text_pos != state->search_anchor; if (data.step > 0) { if (data.new_folded_pos != 0) data.permit_insertion = RE_ERROR_SUCCESS; } else { if (data.new_folded_pos != folded_len) data.permit_insertion = RE_ERROR_SUCCESS; } for (data.fuzzy_type = 0; data.fuzzy_type < RE_FUZZY_COUNT; data.fuzzy_type++) { int status; status = next_fuzzy_match_group_fld(state, &data); if (status < 0) return status; if (status == RE_ERROR_SUCCESS) goto found; } *matched = FALSE; return RE_ERROR_SUCCESS; found: if (!add_backtrack(safe_state, node->op)) return RE_ERROR_FAILURE; bt_data = state->backtrack; bt_data->fuzzy_string.position.text_pos = *text_pos; bt_data->fuzzy_string.position.node = node; bt_data->fuzzy_string.string_pos = *group_pos; bt_data->fuzzy_string.folded_pos = (RE_INT8)(*folded_pos); bt_data->fuzzy_string.folded_len = (RE_INT8)folded_len; bt_data->fuzzy_string.gfolded_pos = (RE_INT8)(*gfolded_pos); bt_data->fuzzy_string.gfolded_len = (RE_INT8)gfolded_len; bt_data->fuzzy_string.fuzzy_type = (RE_INT8)data.fuzzy_type; bt_data->fuzzy_string.step = (RE_INT8)step; ++fuzzy_info->counts[data.fuzzy_type]; ++fuzzy_info->counts[RE_FUZZY_ERR]; fuzzy_info->total_cost += values[RE_FUZZY_VAL_COST_BASE + data.fuzzy_type]; ++state->total_errors; *text_pos = new_text_pos; *group_pos = new_group_pos; *folded_pos = data.new_folded_pos; *gfolded_pos = data.new_gfolded_pos; *matched = TRUE; return RE_ERROR_SUCCESS; } /* Retries a fuzzy match of a group reference, ignoring case. 
*/ Py_LOCAL_INLINE(int) retry_fuzzy_match_group_fld(RE_SafeState* safe_state, BOOL search, Py_ssize_t* text_pos, RE_Node** node, int* folded_pos, Py_ssize_t* group_pos, int* gfolded_pos, BOOL* matched) { RE_State* state; RE_FuzzyInfo* fuzzy_info; RE_CODE* values; RE_BacktrackData* bt_data; Py_ssize_t new_text_pos; RE_Node* new_node; Py_ssize_t new_group_pos; RE_FuzzyData data; state = safe_state->re_state; fuzzy_info = &state->fuzzy_info; values = fuzzy_info->node->values; bt_data = state->backtrack; new_text_pos = bt_data->fuzzy_string.position.text_pos; new_node = bt_data->fuzzy_string.position.node; new_group_pos = bt_data->fuzzy_string.string_pos; data.new_folded_pos = bt_data->fuzzy_string.folded_pos; data.folded_len = bt_data->fuzzy_string.folded_len; data.new_gfolded_pos = bt_data->fuzzy_string.gfolded_pos; data.fuzzy_type = bt_data->fuzzy_string.fuzzy_type; data.step = bt_data->fuzzy_string.step; --fuzzy_info->counts[data.fuzzy_type]; --fuzzy_info->counts[RE_FUZZY_ERR]; fuzzy_info->total_cost -= values[RE_FUZZY_VAL_COST_BASE + data.fuzzy_type]; --state->total_errors; /* Permit insertion except initially when searching (it's better just to * start searching one character later). 
*/ data.permit_insertion = !search || new_text_pos != state->search_anchor || data.new_folded_pos != bt_data->fuzzy_string.folded_len; for (++data.fuzzy_type; data.fuzzy_type < RE_FUZZY_COUNT; data.fuzzy_type++) { int status; status = next_fuzzy_match_group_fld(state, &data); if (status < 0) return status; if (status == RE_ERROR_SUCCESS) goto found; } discard_backtrack(state); *matched = FALSE; return RE_ERROR_SUCCESS; found: bt_data->fuzzy_string.fuzzy_type = (RE_INT8)data.fuzzy_type; ++fuzzy_info->counts[data.fuzzy_type]; ++fuzzy_info->counts[RE_FUZZY_ERR]; fuzzy_info->total_cost += values[RE_FUZZY_VAL_COST_BASE + data.fuzzy_type]; ++state->total_errors; *text_pos = new_text_pos; *node = new_node; *group_pos = new_group_pos; *folded_pos = data.new_folded_pos; *gfolded_pos = data.new_gfolded_pos; *matched = TRUE; return RE_ERROR_SUCCESS; } /* Locates the required string, if there's one. */ Py_LOCAL_INLINE(Py_ssize_t) locate_required_string(RE_SafeState* safe_state, BOOL search) { RE_State* state; PatternObject* pattern; Py_ssize_t found_pos; Py_ssize_t end_pos; state = safe_state->re_state; pattern = state->pattern; if (!pattern->req_string) /* There isn't a required string, so start matching from the current * position. */ return state->text_pos; /* Search for the required string and calculate where to start matching. */ switch (pattern->req_string->op) { case RE_OP_STRING: { BOOL is_partial; Py_ssize_t limit; if (search || pattern->req_offset < 0) limit = state->slice_end; else { limit = state->slice_start + pattern->req_offset + (Py_ssize_t)pattern->req_string->value_count; if (limit > state->slice_end || limit < 0) limit = state->slice_end; } if (state->req_pos < 0 || state->text_pos > state->req_pos) /* First time or already passed it. */ found_pos = string_search(safe_state, pattern->req_string, state->text_pos, limit, &is_partial); else { found_pos = state->req_pos; is_partial = FALSE; } if (found_pos < 0) /* The required string wasn't found. 
*/ return -1; if (!is_partial) { /* Record where the required string matched. */ state->req_pos = found_pos; state->req_end = found_pos + (Py_ssize_t)pattern->req_string->value_count; } if (pattern->req_offset >= 0) { /* Step back from the required string to where we should start * matching. */ found_pos -= pattern->req_offset; if (found_pos >= state->text_pos) return found_pos; } break; } case RE_OP_STRING_FLD: { BOOL is_partial; Py_ssize_t limit; if (search || pattern->req_offset < 0) limit = state->slice_end; else { limit = state->slice_start + pattern->req_offset + (Py_ssize_t)pattern->req_string->value_count; if (limit > state->slice_end || limit < 0) limit = state->slice_end; } if (state->req_pos < 0 || state->text_pos > state->req_pos) /* First time or already passed it. */ found_pos = string_search_fld(safe_state, pattern->req_string, state->text_pos, limit, &end_pos, &is_partial); else { found_pos = state->req_pos; is_partial = FALSE; } if (found_pos < 0) /* The required string wasn't found. */ return -1; if (!is_partial) { /* Record where the required string matched. */ state->req_pos = found_pos; state->req_end = end_pos; } if (pattern->req_offset >= 0) { /* Step back from the required string to where we should start * matching. */ found_pos -= pattern->req_offset; if (found_pos >= state->text_pos) return found_pos; } break; } case RE_OP_STRING_FLD_REV: { BOOL is_partial; Py_ssize_t limit; if (search || pattern->req_offset < 0) limit = state->slice_start; else { limit = state->slice_end - pattern->req_offset - (Py_ssize_t)pattern->req_string->value_count; if (limit < state->slice_start) limit = state->slice_start; } if (state->req_pos < 0 || state->text_pos < state->req_pos) /* First time or already passed it. */ found_pos = string_search_fld_rev(safe_state, pattern->req_string, state->text_pos, limit, &end_pos, &is_partial); else { found_pos = state->req_pos; is_partial = FALSE; } if (found_pos < 0) /* The required string wasn't found. 
*/ return -1; if (!is_partial) { /* Record where the required string matched. */ state->req_pos = found_pos; state->req_end = end_pos; } if (pattern->req_offset >= 0) { /* Step back from the required string to where we should start * matching. */ found_pos += pattern->req_offset; if (found_pos <= state->text_pos) return found_pos; } break; } case RE_OP_STRING_IGN: { BOOL is_partial; Py_ssize_t limit; if (search || pattern->req_offset < 0) limit = state->slice_end; else { limit = state->slice_start + pattern->req_offset + (Py_ssize_t)pattern->req_string->value_count; if (limit > state->slice_end || limit < 0) limit = state->slice_end; } if (state->req_pos < 0 || state->text_pos > state->req_pos) /* First time or already passed it. */ found_pos = string_search_ign(safe_state, pattern->req_string, state->text_pos, limit, &is_partial); else { found_pos = state->req_pos; is_partial = FALSE; } if (found_pos < 0) /* The required string wasn't found. */ return -1; if (!is_partial) { /* Record where the required string matched. */ state->req_pos = found_pos; state->req_end = found_pos + (Py_ssize_t)pattern->req_string->value_count; } if (pattern->req_offset >= 0) { /* Step back from the required string to where we should start * matching. */ found_pos -= pattern->req_offset; if (found_pos >= state->text_pos) return found_pos; } break; } case RE_OP_STRING_IGN_REV: { BOOL is_partial; Py_ssize_t limit; if (search || pattern->req_offset < 0) limit = state->slice_start; else { limit = state->slice_end - pattern->req_offset - (Py_ssize_t)pattern->req_string->value_count; if (limit < state->slice_start) limit = state->slice_start; } if (state->req_pos < 0 || state->text_pos < state->req_pos) /* First time or already passed it. */ found_pos = string_search_ign_rev(safe_state, pattern->req_string, state->text_pos, limit, &is_partial); else { found_pos = state->req_pos; is_partial = FALSE; } if (found_pos < 0) /* The required string wasn't found. 
*/ return -1; if (!is_partial) { /* Record where the required string matched. */ state->req_pos = found_pos; state->req_end = found_pos - (Py_ssize_t)pattern->req_string->value_count; } if (pattern->req_offset >= 0) { /* Step back from the required string to where we should start * matching. */ found_pos += pattern->req_offset; if (found_pos <= state->text_pos) return found_pos; } break; } case RE_OP_STRING_REV: { BOOL is_partial; Py_ssize_t limit; if (search || pattern->req_offset < 0) limit = state->slice_start; else { limit = state->slice_end - pattern->req_offset - (Py_ssize_t)pattern->req_string->value_count; if (limit < state->slice_start) limit = state->slice_start; } if (state->req_pos < 0 || state->text_pos < state->req_pos) /* First time or already passed it. */ found_pos = string_search_rev(safe_state, pattern->req_string, state->text_pos, limit, &is_partial); else { found_pos = state->req_pos; is_partial = FALSE; } if (found_pos < 0) /* The required string wasn't found. */ return -1; if (!is_partial) { /* Record where the required string matched. */ state->req_pos = found_pos; state->req_end = found_pos - (Py_ssize_t)pattern->req_string->value_count; } if (pattern->req_offset >= 0) { /* Step back from the required string to where we should start * matching. */ found_pos += pattern->req_offset; if (found_pos <= state->text_pos) return found_pos; } break; } } /* Start matching from the current position. */ return state->text_pos; } /* Tries to match a character pattern. 
*/ Py_LOCAL_INLINE(int) match_one(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { switch (node->op) { case RE_OP_ANY: return try_match_ANY(state, node, text_pos); case RE_OP_ANY_ALL: return try_match_ANY_ALL(state, node, text_pos); case RE_OP_ANY_ALL_REV: return try_match_ANY_ALL_REV(state, node, text_pos); case RE_OP_ANY_REV: return try_match_ANY_REV(state, node, text_pos); case RE_OP_ANY_U: return try_match_ANY_U(state, node, text_pos); case RE_OP_ANY_U_REV: return try_match_ANY_U_REV(state, node, text_pos); case RE_OP_CHARACTER: return try_match_CHARACTER(state, node, text_pos); case RE_OP_CHARACTER_IGN: return try_match_CHARACTER_IGN(state, node, text_pos); case RE_OP_CHARACTER_IGN_REV: return try_match_CHARACTER_IGN_REV(state, node, text_pos); case RE_OP_CHARACTER_REV: return try_match_CHARACTER_REV(state, node, text_pos); case RE_OP_PROPERTY: return try_match_PROPERTY(state, node, text_pos); case RE_OP_PROPERTY_IGN: return try_match_PROPERTY_IGN(state, node, text_pos); case RE_OP_PROPERTY_IGN_REV: return try_match_PROPERTY_IGN_REV(state, node, text_pos); case RE_OP_PROPERTY_REV: return try_match_PROPERTY_REV(state, node, text_pos); case RE_OP_RANGE: return try_match_RANGE(state, node, text_pos); case RE_OP_RANGE_IGN: return try_match_RANGE_IGN(state, node, text_pos); case RE_OP_RANGE_IGN_REV: return try_match_RANGE_IGN_REV(state, node, text_pos); case RE_OP_RANGE_REV: return try_match_RANGE_REV(state, node, text_pos); case RE_OP_SET_DIFF: case RE_OP_SET_INTER: case RE_OP_SET_SYM_DIFF: case RE_OP_SET_UNION: return try_match_SET(state, node, text_pos); case RE_OP_SET_DIFF_IGN: case RE_OP_SET_INTER_IGN: case RE_OP_SET_SYM_DIFF_IGN: case RE_OP_SET_UNION_IGN: return try_match_SET_IGN(state, node, text_pos); case RE_OP_SET_DIFF_IGN_REV: case RE_OP_SET_INTER_IGN_REV: case RE_OP_SET_SYM_DIFF_IGN_REV: case RE_OP_SET_UNION_IGN_REV: return try_match_SET_IGN_REV(state, node, text_pos); case RE_OP_SET_DIFF_REV: case RE_OP_SET_INTER_REV: case RE_OP_SET_SYM_DIFF_REV: 
case RE_OP_SET_UNION_REV: return try_match_SET_REV(state, node, text_pos); } return FALSE; } /* Tests whether 2 nodes contain the same values. */ Py_LOCAL_INLINE(BOOL) same_values(RE_Node* node_1, RE_Node* node_2) { size_t i; if (node_1->value_count != node_2->value_count) return FALSE; for (i = 0; i < node_1->value_count; i++) { if (node_1->values[i] != node_2->values[i]) return FALSE; } return TRUE; } /* Tests whether 2 nodes are equivalent (both string-like in the same way). */ Py_LOCAL_INLINE(BOOL) equivalent_nodes(RE_Node* node_1, RE_Node* node_2) { switch (node_1->op) { case RE_OP_CHARACTER: case RE_OP_STRING: switch (node_2->op) { case RE_OP_CHARACTER: case RE_OP_STRING: return same_values(node_1, node_2); } break; case RE_OP_CHARACTER_IGN: case RE_OP_STRING_IGN: switch (node_2->op) { case RE_OP_CHARACTER_IGN: case RE_OP_STRING_IGN: return same_values(node_1, node_2); } break; case RE_OP_CHARACTER_IGN_REV: case RE_OP_STRING_IGN_REV: switch (node_2->op) { case RE_OP_CHARACTER_IGN_REV: case RE_OP_STRING_IGN_REV: return same_values(node_1, node_2); } break; case RE_OP_CHARACTER_REV: case RE_OP_STRING_REV: switch (node_2->op) { case RE_OP_CHARACTER_REV: case RE_OP_STRING_REV: return same_values(node_1, node_2); } break; } return FALSE; } /* Prunes the backtracking. */ Py_LOCAL_INLINE(void) prune_backtracking(RE_State* state) { RE_AtomicBlock* current; current = state->current_atomic_block; if (current && current->count > 0) { /* In an atomic group or a lookaround. */ RE_AtomicData* atomic; /* Discard any backtracking info from inside the atomic group or * lookaround. */ atomic = &current->items[current->count - 1]; state->current_backtrack_block = atomic->current_backtrack_block; state->current_backtrack_block->count = atomic->backtrack_count; } else { /* In the outermost pattern. */ while (state->current_backtrack_block->previous) state->current_backtrack_block = state->current_backtrack_block->previous; /* Keep the bottom FAILURE on the backtracking stack. 
*/ state->current_backtrack_block->count = 1; } } /* Saves the match as the best POSIX match (leftmost longest) found so far. */ Py_LOCAL_INLINE(BOOL) save_best_match(RE_SafeState* safe_state) { RE_State* state; size_t group_count; size_t g; state = safe_state->re_state; state->best_match_pos = state->match_pos; state->best_text_pos = state->text_pos; state->found_match = TRUE; memmove(state->best_fuzzy_counts, state->total_fuzzy_counts, sizeof(state->total_fuzzy_counts)); group_count = state->pattern->true_group_count; if (group_count == 0) return TRUE; acquire_GIL(safe_state); if (!state->best_match_groups) { /* Allocate storage for the groups of the best match. */ state->best_match_groups = (RE_GroupData*)re_alloc(group_count * sizeof(RE_GroupData)); if (!state->best_match_groups) goto error; memset(state->best_match_groups, 0, group_count * sizeof(RE_GroupData)); for (g = 0; g < group_count; g++) { RE_GroupData* best; RE_GroupData* group; best = &state->best_match_groups[g]; group = &state->groups[g]; best->capture_capacity = group->capture_capacity; best->captures = (RE_GroupSpan*)re_alloc(best->capture_capacity * sizeof(RE_GroupSpan)); if (!best->captures) goto error; } } /* Copy the group spans and captures. */ for (g = 0; g < group_count; g++) { RE_GroupData* best; RE_GroupData* group; best = &state->best_match_groups[g]; group = &state->groups[g]; best->span = group->span; best->capture_count = group->capture_count; if (best->capture_count > best->capture_capacity) { /* We need more space for the captures, so grow the buffer to fit * them before copying. */ best->capture_capacity = best->capture_count; re_dealloc(best->captures); best->captures = (RE_GroupSpan*)re_alloc(best->capture_capacity * sizeof(RE_GroupSpan)); if (!best->captures) goto error; } /* Copy the captures for this group. */ memmove(best->captures, group->captures, group->capture_count * sizeof(RE_GroupSpan)); } release_GIL(safe_state); return TRUE; error: release_GIL(safe_state); return FALSE; } /* Restores the best match for a POSIX match (leftmost longest). 
*/ Py_LOCAL_INLINE(void) restore_best_match(RE_SafeState* safe_state) { RE_State* state; size_t group_count; size_t g; state = safe_state->re_state; if (!state->found_match) return; state->match_pos = state->best_match_pos; state->text_pos = state->best_text_pos; memmove(state->total_fuzzy_counts, state->best_fuzzy_counts, sizeof(state->total_fuzzy_counts)); group_count = state->pattern->true_group_count; if (group_count == 0) return; /* Copy the group spans and captures. */ for (g = 0; g < group_count; g++) { RE_GroupData* group; RE_GroupData* best; group = &state->groups[g]; best = &state->best_match_groups[g]; group->span = best->span; group->capture_count = best->capture_count; /* Copy the captures for this group. */ memmove(group->captures, best->captures, best->capture_count * sizeof(RE_GroupSpan)); } } /* Checks whether the new match is better than the current match for a POSIX * match (leftmost longest) and saves it if it is. */ Py_LOCAL_INLINE(BOOL) check_posix_match(RE_SafeState* safe_state) { RE_State* state; Py_ssize_t best_length; Py_ssize_t new_length; state = safe_state->re_state; if (!state->found_match) return save_best_match(safe_state); /* Check the overall match. */ if (state->reverse) { /* We're searching backwards. */ best_length = state->match_pos - state->best_text_pos; new_length = state->match_pos - state->text_pos; } else { /* We're searching forwards. */ best_length = state->best_text_pos - state->match_pos; new_length = state->text_pos - state->match_pos; } if (new_length > best_length) /* It's a longer match. */ return save_best_match(safe_state); return TRUE; } /* Performs a depth-first match or search from the context. 
*/ Py_LOCAL_INLINE(int) basic_match(RE_SafeState* safe_state, BOOL search) { RE_State* state; RE_EncodingTable* encoding; RE_LocaleInfo* locale_info; PatternObject* pattern; RE_Node* start_node; RE_NextNode start_pair; Py_UCS4 (*char_at)(void* text, Py_ssize_t pos); Py_ssize_t pattern_step; /* The overall step of the pattern (forwards or backwards). */ Py_ssize_t string_pos; BOOL do_search_start; Py_ssize_t found_pos; int status; RE_Node* node; int folded_pos; int gfolded_pos; TRACE(("<>\n")) state = safe_state->re_state; encoding = state->encoding; locale_info = state->locale_info; pattern = state->pattern; start_node = pattern->start_node; /* Look beyond any initial group node. */ start_pair.node = start_node; start_pair.test = pattern->start_test; /* Is the pattern anchored to the start or end of the string? */ switch (start_pair.test->op) { case RE_OP_END_OF_STRING: if (state->reverse) { /* Searching backwards. */ if (state->text_pos != state->text_length) return RE_ERROR_FAILURE; /* Don't bother to search further because it's anchored. */ search = FALSE; } break; case RE_OP_START_OF_STRING: if (!state->reverse) { /* Searching forwards. */ if (state->text_pos != 0) return RE_ERROR_FAILURE; /* Don't bother to search further because it's anchored. */ search = FALSE; } break; } char_at = state->char_at; pattern_step = state->reverse ? -1 : 1; string_pos = -1; do_search_start = pattern->do_search_start; state->fewest_errors = state->max_errors; if (do_search_start && pattern->req_string && equivalent_nodes(start_pair.test, pattern->req_string)) do_search_start = FALSE; /* Add a backtrack entry for failure. */ if (!add_backtrack(safe_state, RE_OP_FAILURE)) return RE_ERROR_BACKTRACKING; start_match: /* If we're searching, advance along the string until there could be a * match. 
*/ if (pattern->pattern_call_ref >= 0) { RE_GuardList* guard_list; guard_list = &state->group_call_guard_list[pattern->pattern_call_ref]; guard_list->count = 0; guard_list->last_text_pos = -1; } /* Locate the required string, if there's one, unless this is a recursive * call of 'basic_match'. */ if (!pattern->req_string) found_pos = state->text_pos; else { found_pos = locate_required_string(safe_state, search); if (found_pos < 0) return RE_ERROR_FAILURE; } if (search) { state->text_pos = found_pos; if (do_search_start) { RE_Position new_position; next_match_1: /* 'search_start' will clear 'do_search_start' if it can't perform * a fast search for the next possible match. This enables us to * avoid the overhead of the call subsequently. */ status = search_start(safe_state, &start_pair, &new_position, 0); if (status == RE_ERROR_PARTIAL) { state->match_pos = state->text_pos; return status; } else if (status != RE_ERROR_SUCCESS) return status; node = new_position.node; state->text_pos = new_position.text_pos; if (node->op == RE_OP_SUCCESS) { /* Must the match advance past its start? */ if (state->text_pos != state->search_anchor || !state->must_advance) return RE_ERROR_SUCCESS; state->text_pos = state->match_pos + pattern_step; goto next_match_1; } /* 'do_search_start' may have been cleared. */ do_search_start = pattern->do_search_start; } else { /* Avoiding 'search_start', which we've found can't perform a fast * search for the next possible match. */ node = start_node; next_match_2: if (state->reverse) { if (state->text_pos < state->slice_start) { if (state->partial_side == RE_PARTIAL_LEFT) return RE_ERROR_PARTIAL; return RE_ERROR_FAILURE; } } else { if (state->text_pos > state->slice_end) { if (state->partial_side == RE_PARTIAL_RIGHT) return RE_ERROR_PARTIAL; return RE_ERROR_FAILURE; } } state->match_pos = state->text_pos; if (node->op == RE_OP_SUCCESS) { /* Must the match advance past its start? 
*/ if (state->text_pos != state->search_anchor || !state->must_advance) { BOOL success; if (state->match_all) { /* We want to match all of the slice. */ if (state->reverse) success = state->text_pos == state->slice_start; else success = state->text_pos == state->slice_end; } else success = TRUE; if (success) return RE_ERROR_SUCCESS; } state->text_pos = state->match_pos + pattern_step; goto next_match_2; } } } else { /* The start position is anchored to the current position. */ if (found_pos != state->text_pos) return RE_ERROR_FAILURE; node = start_node; } advance: /* The main matching loop. */ for (;;) { TRACE(("%d|", state->text_pos)) /* Should we abort the matching? */ ++state->iterations; if (state->iterations == 0 && safe_check_signals(safe_state)) return RE_ERROR_INTERRUPTED; switch (node->op) { case RE_OP_ANY: /* Any character except a newline. */ TRACE(("%s\n", re_op_text[node->op])) status = try_match_ANY(state, node, state->text_pos); if (status < 0) return status; if (status == RE_ERROR_SUCCESS) { ++state->text_pos; node = node->next_1.node; } else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, 1); if (status < 0) return status; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_ANY_ALL: /* Any character at all. */ TRACE(("%s\n", re_op_text[node->op])) status = try_match_ANY_ALL(state, node, state->text_pos); if (status < 0) return status; if (status == RE_ERROR_SUCCESS) { ++state->text_pos; node = node->next_1.node; } else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, 1); if (status < 0) return status; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_ANY_ALL_REV: /* Any character at all, backwards. 
*/ TRACE(("%s\n", re_op_text[node->op])) status = try_match_ANY_ALL_REV(state, node, state->text_pos); if (status < 0) return status; if (status == RE_ERROR_SUCCESS) { --state->text_pos; node = node->next_1.node; } else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, -1); if (status < 0) return status; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_ANY_REV: /* Any character except a newline, backwards. */ TRACE(("%s\n", re_op_text[node->op])) status = try_match_ANY_REV(state, node, state->text_pos); if (status < 0) return status; if (status == RE_ERROR_SUCCESS) { --state->text_pos; node = node->next_1.node; } else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, -1); if (status < 0) return status; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_ANY_U: /* Any character except a line separator. */ TRACE(("%s\n", re_op_text[node->op])) status = try_match_ANY_U(state, node, state->text_pos); if (status < 0) return status; if (status == RE_ERROR_SUCCESS) { ++state->text_pos; node = node->next_1.node; } else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, 1); if (status < 0) return status; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_ANY_U_REV: /* Any character except a line separator, backwards. */ TRACE(("%s\n", re_op_text[node->op])) status = try_match_ANY_U_REV(state, node, state->text_pos); if (status < 0) return status; if (status == RE_ERROR_SUCCESS) { --state->text_pos; node = node->next_1.node; } else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, -1); if (status < 0) return status; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_ATOMIC: /* Start of an atomic group. 
*/ { RE_AtomicData* atomic; TRACE(("%s\n", re_op_text[node->op])) if (!add_backtrack(safe_state, RE_OP_ATOMIC)) return RE_ERROR_BACKTRACKING; state->backtrack->atomic.too_few_errors = state->too_few_errors; state->backtrack->atomic.capture_change = state->capture_change; atomic = push_atomic(safe_state); if (!atomic) return RE_ERROR_MEMORY; atomic->backtrack_count = state->current_backtrack_block->count; atomic->current_backtrack_block = state->current_backtrack_block; atomic->is_lookaround = FALSE; atomic->has_groups = (node->status & RE_STATUS_HAS_GROUPS) != 0; atomic->has_repeats = (node->status & RE_STATUS_HAS_REPEATS) != 0; /* Save the groups and repeats. */ if (atomic->has_groups && !push_groups(safe_state)) return RE_ERROR_MEMORY; if (atomic->has_repeats && !push_repeats(safe_state)) return RE_ERROR_MEMORY; node = node->next_1.node; break; } case RE_OP_BOUNDARY: /* On a word boundary. */ TRACE(("%s %d\n", re_op_text[node->op], node->match)) status = try_match_BOUNDARY(state, node, state->text_pos); if (status < 0) return status; if (status == RE_ERROR_SUCCESS) node = node->next_1.node; else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, 0); if (status < 0) return status; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_BRANCH: /* 2-way branch. */ { RE_Position next_position; TRACE(("%s\n", re_op_text[node->op])) status = try_match(state, &node->next_1, state->text_pos, &next_position); if (status < 0) return status; if (status == RE_ERROR_SUCCESS) { if (!add_backtrack(safe_state, RE_OP_BRANCH)) return RE_ERROR_BACKTRACKING; state->backtrack->branch.position.node = node->nonstring.next_2.node; state->backtrack->branch.position.text_pos = state->text_pos; node = next_position.node; state->text_pos = next_position.text_pos; } else node = node->nonstring.next_2.node; break; } case RE_OP_CALL_REF: /* A group call reference. 
*/ { TRACE(("%s %d\n", re_op_text[node->op], node->values[0])) if (!push_group_return(safe_state, NULL)) return RE_ERROR_MEMORY; if (!add_backtrack(safe_state, RE_OP_CALL_REF)) return RE_ERROR_BACKTRACKING; node = node->next_1.node; break; } case RE_OP_CHARACTER: /* A character. */ TRACE(("%s %d %d\n", re_op_text[node->op], node->match, node->values[0])) if (state->text_pos >= state->text_length && state->partial_side == RE_PARTIAL_RIGHT) return RE_ERROR_PARTIAL; if (state->text_pos < state->slice_end && matches_CHARACTER(encoding, locale_info, node, char_at(state->text, state->text_pos)) == node->match) { state->text_pos += node->step; node = node->next_1.node; } else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, 1); if (status < 0) return RE_ERROR_PARTIAL; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_CHARACTER_IGN: /* A character, ignoring case. */ TRACE(("%s %d %d\n", re_op_text[node->op], node->match, node->values[0])) if (state->text_pos >= state->text_length && state->partial_side == RE_PARTIAL_RIGHT) return RE_ERROR_PARTIAL; if (state->text_pos < state->slice_end && matches_CHARACTER_IGN(encoding, locale_info, node, char_at(state->text, state->text_pos)) == node->match) { state->text_pos += node->step; node = node->next_1.node; } else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, 1); if (status < 0) return RE_ERROR_PARTIAL; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_CHARACTER_IGN_REV: /* A character, backwards, ignoring case. 
*/ TRACE(("%s %d %d\n", re_op_text[node->op], node->match, node->values[0])) if (state->text_pos <= 0 && state->partial_side == RE_PARTIAL_LEFT) return RE_ERROR_PARTIAL; if (state->text_pos > state->slice_start && matches_CHARACTER_IGN(encoding, locale_info, node, char_at(state->text, state->text_pos - 1)) == node->match) { state->text_pos += node->step; node = node->next_1.node; } else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, -1); if (status < 0) return RE_ERROR_PARTIAL; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_CHARACTER_REV: /* A character, backwards. */ TRACE(("%s %d %d\n", re_op_text[node->op], node->match, node->values[0])) if (state->text_pos <= 0 && state->partial_side == RE_PARTIAL_LEFT) return RE_ERROR_PARTIAL; if (state->text_pos > state->slice_start && matches_CHARACTER(encoding, locale_info, node, char_at(state->text, state->text_pos - 1)) == node->match) { state->text_pos += node->step; node = node->next_1.node; } else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, -1); if (status < 0) return RE_ERROR_PARTIAL; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_CONDITIONAL: /* Start of a conditional subpattern. 
*/ { RE_AtomicData* conditional; TRACE(("%s %d\n", re_op_text[node->op], node->match)) if (!add_backtrack(safe_state, RE_OP_CONDITIONAL)) return RE_ERROR_BACKTRACKING; state->backtrack->lookaround.too_few_errors = state->too_few_errors; state->backtrack->lookaround.capture_change = state->capture_change; state->backtrack->lookaround.inside = TRUE; state->backtrack->lookaround.node = node; conditional = push_atomic(safe_state); if (!conditional) return RE_ERROR_MEMORY; conditional->backtrack_count = state->current_backtrack_block->count; conditional->current_backtrack_block = state->current_backtrack_block; conditional->slice_start = state->slice_start; conditional->slice_end = state->slice_end; conditional->text_pos = state->text_pos; conditional->node = node; conditional->backtrack = state->backtrack; conditional->is_lookaround = TRUE; conditional->has_groups = (node->status & RE_STATUS_HAS_GROUPS) != 0; conditional->has_repeats = (node->status & RE_STATUS_HAS_REPEATS) != 0; /* Save the groups and repeats. */ if (conditional->has_groups && !push_groups(safe_state)) return RE_ERROR_MEMORY; if (conditional->has_repeats && !push_repeats(safe_state)) return RE_ERROR_MEMORY; conditional->saved_groups = state->current_saved_groups; conditional->saved_repeats = state->current_saved_repeats; state->slice_start = 0; state->slice_end = state->text_length; node = node->next_1.node; break; } case RE_OP_DEFAULT_BOUNDARY: /* On a default word boundary. */ TRACE(("%s %d\n", re_op_text[node->op], node->match)) status = try_match_DEFAULT_BOUNDARY(state, node, state->text_pos); if (status < 0) return status; if (status == RE_ERROR_SUCCESS) node = node->next_1.node; else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, 0); if (status < 0) return status; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_DEFAULT_END_OF_WORD: /* At the default end of a word. 
*/ TRACE(("%s\n", re_op_text[node->op])) status = try_match_DEFAULT_END_OF_WORD(state, node, state->text_pos); if (status < 0) return status; if (status == RE_ERROR_SUCCESS) node = node->next_1.node; else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, 0); if (status < 0) return status; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_DEFAULT_START_OF_WORD: /* At the default start of a word. */ TRACE(("%s\n", re_op_text[node->op])) status = try_match_DEFAULT_START_OF_WORD(state, node, state->text_pos); if (status < 0) return status; if (status == RE_ERROR_SUCCESS) node = node->next_1.node; else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, 0); if (status < 0) return status; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_END_ATOMIC: /* End of an atomic group. */ { RE_AtomicData* atomic; /* Discard any backtracking info from inside the atomic group. */ atomic = top_atomic(safe_state); state->current_backtrack_block = atomic->current_backtrack_block; state->current_backtrack_block->count = atomic->backtrack_count; node = node->next_1.node; break; } case RE_OP_END_CONDITIONAL: /* End of a conditional subpattern. */ { RE_AtomicData* conditional; conditional = pop_atomic(safe_state); while (!conditional->is_lookaround) { if (conditional->has_repeats) drop_repeats(state); if (conditional->has_groups) drop_groups(state); conditional = pop_atomic(safe_state); } state->text_pos = conditional->text_pos; state->slice_end = conditional->slice_end; state->slice_start = conditional->slice_start; /* Discard any backtracking info from inside the lookaround. 
*/ state->current_backtrack_block = conditional->current_backtrack_block; state->current_backtrack_block->count = conditional->backtrack_count; state->current_saved_groups = conditional->saved_groups; state->current_saved_repeats = conditional->saved_repeats; /* The lookaround's subpattern has matched. We're now going to * leave the lookaround. */ conditional->backtrack->lookaround.inside = FALSE; if (conditional->node->match) { /* It's a positive lookaround that's succeeded. * * Go to the 'true' branch. */ node = node->next_1.node; } else { /* It's a negative lookaround that's succeeded. * * Go to the 'false' branch. */ node = node->nonstring.next_2.node; } break; } case RE_OP_END_FUZZY: /* End of fuzzy matching. */ TRACE(("%s\n", re_op_text[node->op])) if (!fuzzy_insert(safe_state, state->text_pos, node)) return RE_ERROR_BACKTRACKING; /* If there were too few errors in the fuzzy section, try again. */ if (state->too_few_errors) { state->too_few_errors = FALSE; goto backtrack; } state->total_fuzzy_counts[RE_FUZZY_SUB] += state->fuzzy_info.counts[RE_FUZZY_SUB]; state->total_fuzzy_counts[RE_FUZZY_INS] += state->fuzzy_info.counts[RE_FUZZY_INS]; state->total_fuzzy_counts[RE_FUZZY_DEL] += state->fuzzy_info.counts[RE_FUZZY_DEL]; node = node->next_1.node; break; case RE_OP_END_GREEDY_REPEAT: /* End of a greedy repeat. */ { RE_CODE index; RE_RepeatData* rp_data; BOOL changed; BOOL try_body; int body_status; RE_Position next_body_position; BOOL try_tail; int tail_status; RE_Position next_tail_position; RE_BacktrackData* bt_data; TRACE(("%s %d\n", re_op_text[node->op], node->values[0])) /* Repeat indexes are 0-based. */ index = node->values[0]; rp_data = &state->repeats[index]; /* The body has matched successfully at this position. */ if (!guard_repeat(safe_state, index, rp_data->start, RE_STATUS_BODY, FALSE)) return RE_ERROR_MEMORY; ++rp_data->count; /* Have we advanced through the text or has a capture group changed?
*/ changed = rp_data->capture_change != state->capture_change || state->text_pos != rp_data->start; /* The counts are of type size_t, so the format needs to specify * that. */ TRACE(("min is %" PY_FORMAT_SIZE_T "u, max is %" PY_FORMAT_SIZE_T "u, count is %" PY_FORMAT_SIZE_T "u\n", node->values[1], node->values[2], rp_data->count)) /* Could the body or tail match? */ try_body = changed && (rp_data->count < node->values[2] || ~node->values[2] == 0) && !is_repeat_guarded(safe_state, index, state->text_pos, RE_STATUS_BODY); if (try_body) { body_status = try_match(state, &node->next_1, state->text_pos, &next_body_position); if (body_status < 0) return body_status; if (body_status == RE_ERROR_FAILURE) try_body = FALSE; } else body_status = RE_ERROR_FAILURE; try_tail = (!changed || rp_data->count >= node->values[1]) && !is_repeat_guarded(safe_state, index, state->text_pos, RE_STATUS_TAIL); if (try_tail) { tail_status = try_match(state, &node->nonstring.next_2, state->text_pos, &next_tail_position); if (tail_status < 0) return tail_status; if (tail_status == RE_ERROR_FAILURE) try_tail = FALSE; } else tail_status = RE_ERROR_FAILURE; if (!try_body && !try_tail) { /* Neither the body nor the tail could match. */ --rp_data->count; goto backtrack; } if (body_status < 0 || (body_status == 0 && tail_status < 0)) return RE_ERROR_PARTIAL; /* Record info in case we backtrack into the body. */ if (!add_backtrack(safe_state, RE_OP_BODY_END)) return RE_ERROR_BACKTRACKING; bt_data = state->backtrack; bt_data->repeat.index = index; bt_data->repeat.count = rp_data->count - 1; bt_data->repeat.start = rp_data->start; bt_data->repeat.capture_change = rp_data->capture_change; if (try_body) { /* Both the body and the tail could match. */ if (try_tail) { /* The body takes precedence. If the body fails to match * then we want to try the tail before backtracking * further. */ /* Record backtracking info for matching the tail. 
*/ if (!add_backtrack(safe_state, RE_OP_MATCH_TAIL)) return RE_ERROR_BACKTRACKING; bt_data = state->backtrack; bt_data->repeat.position = next_tail_position; bt_data->repeat.index = index; bt_data->repeat.count = rp_data->count; bt_data->repeat.start = rp_data->start; bt_data->repeat.capture_change = rp_data->capture_change; bt_data->repeat.text_pos = state->text_pos; } /* Record backtracking info in case the body fails to match. */ if (!add_backtrack(safe_state, RE_OP_BODY_START)) return RE_ERROR_BACKTRACKING; bt_data = state->backtrack; bt_data->repeat.index = index; bt_data->repeat.text_pos = state->text_pos; rp_data->capture_change = state->capture_change; rp_data->start = state->text_pos; /* Advance into the body. */ node = next_body_position.node; state->text_pos = next_body_position.text_pos; } else { /* Only the tail could match. */ /* Advance into the tail. */ node = next_tail_position.node; state->text_pos = next_tail_position.text_pos; } break; } case RE_OP_END_GROUP: /* End of a capture group. */ { RE_CODE private_index; RE_CODE public_index; RE_GroupData* group; RE_BacktrackData* bt_data; TRACE(("%s %d\n", re_op_text[node->op], node->values[1])) /* Capture group indexes are 1-based (excluding group 0, which is * the entire matched string). */ private_index = node->values[0]; public_index = node->values[1]; group = &state->groups[private_index - 1]; if (!add_backtrack(safe_state, RE_OP_END_GROUP)) return RE_ERROR_BACKTRACKING; bt_data = state->backtrack; bt_data->group.private_index = private_index; bt_data->group.public_index = public_index; bt_data->group.text_pos = group->span.end; bt_data->group.capture = (BOOL)node->values[2]; bt_data->group.current_capture = group->current_capture; if (pattern->group_info[private_index - 1].referenced && group->span.end != state->text_pos) ++state->capture_change; group->span.end = state->text_pos; /* Save the capture? 
*/ if (node->values[2]) { group->current_capture = (Py_ssize_t)group->capture_count; if (!save_capture(safe_state, private_index, public_index)) return RE_ERROR_MEMORY; } node = node->next_1.node; break; } case RE_OP_END_LAZY_REPEAT: /* End of a lazy repeat. */ { RE_CODE index; RE_RepeatData* rp_data; BOOL changed; BOOL try_body; int body_status; RE_Position next_body_position; BOOL try_tail; int tail_status; RE_Position next_tail_position; RE_BacktrackData* bt_data; TRACE(("%s %d\n", re_op_text[node->op], node->values[0])) /* Repeat indexes are 0-based. */ index = node->values[0]; rp_data = &state->repeats[index]; /* The body has matched successfully at this position. */ if (!guard_repeat(safe_state, index, rp_data->start, RE_STATUS_BODY, FALSE)) return RE_ERROR_MEMORY; ++rp_data->count; /* Have we advanced through the text or has a capture group changed? */ changed = rp_data->capture_change != state->capture_change || state->text_pos != rp_data->start; /* The counts are of type size_t, so the format needs to specify * that. */ TRACE(("min is %" PY_FORMAT_SIZE_T "u, max is %" PY_FORMAT_SIZE_T "u, count is %" PY_FORMAT_SIZE_T "u\n", node->values[1], node->values[2], rp_data->count)) /* Could the body or tail match?
*/ try_body = changed && (rp_data->count < node->values[2] || ~node->values[2] == 0) && !is_repeat_guarded(safe_state, index, state->text_pos, RE_STATUS_BODY); if (try_body) { body_status = try_match(state, &node->next_1, state->text_pos, &next_body_position); if (body_status < 0) return body_status; if (body_status == RE_ERROR_FAILURE) try_body = FALSE; } else body_status = RE_ERROR_FAILURE; try_tail = (!changed || rp_data->count >= node->values[1]); if (try_tail) { tail_status = try_match(state, &node->nonstring.next_2, state->text_pos, &next_tail_position); if (tail_status < 0) return tail_status; if (tail_status == RE_ERROR_FAILURE) try_tail = FALSE; } else tail_status = RE_ERROR_FAILURE; if (!try_body && !try_tail) { /* Neither the body nor the tail could match. */ --rp_data->count; goto backtrack; } if (body_status < 0 || (body_status == 0 && tail_status < 0)) return RE_ERROR_PARTIAL; /* Record info in case we backtrack into the body. */ if (!add_backtrack(safe_state, RE_OP_BODY_END)) return RE_ERROR_BACKTRACKING; bt_data = state->backtrack; bt_data->repeat.index = index; bt_data->repeat.count = rp_data->count - 1; bt_data->repeat.start = rp_data->start; bt_data->repeat.capture_change = rp_data->capture_change; if (try_body) { /* Both the body and the tail could match. */ if (try_tail) { /* The tail takes precedence. If the tail fails to match * then we want to try the body before backtracking * further. */ /* Record backtracking info for matching the body. */ if (!add_backtrack(safe_state, RE_OP_MATCH_BODY)) return RE_ERROR_BACKTRACKING; bt_data = state->backtrack; bt_data->repeat.position = next_body_position; bt_data->repeat.index = index; bt_data->repeat.count = rp_data->count; bt_data->repeat.start = rp_data->start; bt_data->repeat.capture_change = rp_data->capture_change; bt_data->repeat.text_pos = state->text_pos; /* Advance into the tail. 
*/ node = next_tail_position.node; state->text_pos = next_tail_position.text_pos; } else { /* Only the body could match. */ /* Record backtracking info in case the body fails to * match. */ if (!add_backtrack(safe_state, RE_OP_BODY_START)) return RE_ERROR_BACKTRACKING; bt_data = state->backtrack; bt_data->repeat.index = index; bt_data->repeat.text_pos = state->text_pos; rp_data->capture_change = state->capture_change; rp_data->start = state->text_pos; /* Advance into the body. */ node = next_body_position.node; state->text_pos = next_body_position.text_pos; } } else { /* Only the tail could match. */ /* Advance into the tail. */ node = next_tail_position.node; state->text_pos = next_tail_position.text_pos; } break; } case RE_OP_END_LOOKAROUND: /* End of a lookaround subpattern. */ { RE_AtomicData* lookaround; lookaround = pop_atomic(safe_state); while (!lookaround->is_lookaround) { if (lookaround->has_repeats) drop_repeats(state); if (lookaround->has_groups) drop_groups(state); lookaround = pop_atomic(safe_state); } state->text_pos = lookaround->text_pos; state->slice_end = lookaround->slice_end; state->slice_start = lookaround->slice_start; /* Discard any backtracking info from inside the lookaround. */ state->current_backtrack_block = lookaround->current_backtrack_block; state->current_backtrack_block->count = lookaround->backtrack_count; state->current_saved_groups = lookaround->saved_groups; state->current_saved_repeats = lookaround->saved_repeats; if (lookaround->node->match) { /* It's a positive lookaround that's succeeded. We're now going * to leave the lookaround. */ lookaround->backtrack->lookaround.inside = FALSE; node = node->next_1.node; } else { /* It's a negative lookaround that's succeeded. The groups and * certain flags may have changed. We need to restore them and * then backtrack. 
*/ if (lookaround->has_repeats) pop_repeats(state); if (lookaround->has_groups) pop_groups(state); state->too_few_errors = lookaround->backtrack->lookaround.too_few_errors; state->capture_change = lookaround->backtrack->lookaround.capture_change; discard_backtrack(state); goto backtrack; } break; } case RE_OP_END_OF_LINE: /* At the end of a line. */ TRACE(("%s\n", re_op_text[node->op])) status = try_match_END_OF_LINE(state, node, state->text_pos); if (status < 0) return status; if (status == RE_ERROR_SUCCESS) node = node->next_1.node; else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, 0); if (status < 0) return status; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_END_OF_LINE_U: /* At the end of a line. */ TRACE(("%s\n", re_op_text[node->op])) status = try_match_END_OF_LINE_U(state, node, state->text_pos); if (status < 0) return status; if (status == RE_ERROR_SUCCESS) node = node->next_1.node; else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, 0); if (status < 0) return status; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_END_OF_STRING: /* At the end of the string. */ TRACE(("%s\n", re_op_text[node->op])) status = try_match_END_OF_STRING(state, node, state->text_pos); if (status < 0) return status; if (status == RE_ERROR_SUCCESS) node = node->next_1.node; else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, 0); if (status < 0) return status; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_END_OF_STRING_LINE: /* At end of string or final newline. 
*/ TRACE(("%s\n", re_op_text[node->op])) status = try_match_END_OF_STRING_LINE(state, node, state->text_pos); if (status < 0) return status; if (status == RE_ERROR_SUCCESS) node = node->next_1.node; else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, 0); if (status < 0) return status; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_END_OF_STRING_LINE_U: /* At end of string or final newline. */ TRACE(("%s\n", re_op_text[node->op])) status = try_match_END_OF_STRING_LINE_U(state, node, state->text_pos); if (status < 0) return status; if (status == RE_ERROR_SUCCESS) node = node->next_1.node; else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, 0); if (status < 0) return status; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_END_OF_WORD: /* At the end of a word. */ TRACE(("%s\n", re_op_text[node->op])) status = try_match_END_OF_WORD(state, node, state->text_pos); if (status < 0) return status; if (status == RE_ERROR_SUCCESS) node = node->next_1.node; else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, 0); if (status < 0) return status; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_FAILURE: /* Failure. */ goto backtrack; case RE_OP_FUZZY: /* Fuzzy matching. */ { RE_FuzzyInfo* fuzzy_info; RE_BacktrackData* bt_data; TRACE(("%s\n", re_op_text[node->op])) fuzzy_info = &state->fuzzy_info; /* Save the current fuzzy info. */ if (!add_backtrack(safe_state, RE_OP_FUZZY)) return RE_ERROR_BACKTRACKING; bt_data = state->backtrack; memmove(&bt_data->fuzzy.fuzzy_info, fuzzy_info, sizeof(RE_FuzzyInfo)); bt_data->fuzzy.index = node->values[0]; bt_data->fuzzy.text_pos = state->text_pos; /* Initialise the new fuzzy info. 
*/ memset(fuzzy_info->counts, 0, 4 * sizeof(fuzzy_info->counts[0])); fuzzy_info->total_cost = 0; fuzzy_info->node = node; node = node->next_1.node; break; } case RE_OP_GRAPHEME_BOUNDARY: /* On a grapheme boundary. */ TRACE(("%s\n", re_op_text[node->op])) status = try_match_GRAPHEME_BOUNDARY(state, node, state->text_pos); if (status < 0) return status; if (status == RE_ERROR_SUCCESS) node = node->next_1.node; else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, 0); if (status < 0) return status; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_GREEDY_REPEAT: /* Greedy repeat. */ { RE_CODE index; RE_RepeatData* rp_data; RE_BacktrackData* bt_data; BOOL try_body; int body_status; RE_Position next_body_position; BOOL try_tail; int tail_status; RE_Position next_tail_position; TRACE(("%s %d\n", re_op_text[node->op], node->values[0])) /* Repeat indexes are 0-based. */ index = node->values[0]; rp_data = &state->repeats[index]; /* We might need to backtrack into the head, so save the current * repeat. */ if (!add_backtrack(safe_state, RE_OP_GREEDY_REPEAT)) return RE_ERROR_BACKTRACKING; bt_data = state->backtrack; bt_data->repeat.index = index; bt_data->repeat.count = rp_data->count; bt_data->repeat.start = rp_data->start; bt_data->repeat.capture_change = rp_data->capture_change; bt_data->repeat.text_pos = state->text_pos; /* Initialise the new repeat. */ rp_data->count = 0; rp_data->start = state->text_pos; rp_data->capture_change = state->capture_change; /* Could the body or tail match? 
*/ try_body = node->values[2] > 0 && !is_repeat_guarded(safe_state, index, state->text_pos, RE_STATUS_BODY); if (try_body) { body_status = try_match(state, &node->next_1, state->text_pos, &next_body_position); if (body_status < 0) return body_status; if (body_status == RE_ERROR_FAILURE) try_body = FALSE; } else body_status = RE_ERROR_FAILURE; try_tail = node->values[1] == 0; if (try_tail) { tail_status = try_match(state, &node->nonstring.next_2, state->text_pos, &next_tail_position); if (tail_status < 0) return tail_status; if (tail_status == RE_ERROR_FAILURE) try_tail = FALSE; } else tail_status = RE_ERROR_FAILURE; if (!try_body && !try_tail) /* Neither the body nor the tail could match. */ goto backtrack; if (body_status < 0 || (body_status == 0 && tail_status < 0)) return RE_ERROR_PARTIAL; if (try_body) { if (try_tail) { /* Both the body and the tail could match, but the body * takes precedence. If the body fails to match then we * want to try the tail before backtracking further. */ /* Record backtracking info for matching the tail. */ if (!add_backtrack(safe_state, RE_OP_MATCH_TAIL)) return RE_ERROR_BACKTRACKING; bt_data = state->backtrack; bt_data->repeat.position = next_tail_position; bt_data->repeat.index = index; bt_data->repeat.count = rp_data->count; bt_data->repeat.start = rp_data->start; bt_data->repeat.capture_change = rp_data->capture_change; bt_data->repeat.text_pos = state->text_pos; } /* Advance into the body. */ node = next_body_position.node; state->text_pos = next_body_position.text_pos; } else { /* Only the tail could match. */ /* Advance into the tail. */ node = next_tail_position.node; state->text_pos = next_tail_position.text_pos; } break; } case RE_OP_GREEDY_REPEAT_ONE: /* Greedy repeat for one character. */ { RE_CODE index; RE_RepeatData* rp_data; size_t count; BOOL is_partial; BOOL match; RE_BacktrackData* bt_data; TRACE(("%s %d\n", re_op_text[node->op], node->values[0])) /* Repeat indexes are 0-based. 
*/ index = node->values[0]; rp_data = &state->repeats[index]; if (is_repeat_guarded(safe_state, index, state->text_pos, RE_STATUS_BODY)) goto backtrack; /* Count how many times the character repeats, up to the maximum. */ count = count_one(state, node->nonstring.next_2.node, state->text_pos, node->values[2], &is_partial); if (is_partial) { state->text_pos += (Py_ssize_t)count * node->step; return RE_ERROR_PARTIAL; } /* Unmatch until it's not guarded. */ match = FALSE; for (;;) { if (count < node->values[1]) /* The number of repeats is below the minimum. */ break; if (!is_repeat_guarded(safe_state, index, state->text_pos + (Py_ssize_t)count * node->step, RE_STATUS_TAIL)) { /* It's not guarded at this position. */ match = TRUE; break; } if (count == 0) break; --count; } if (!match) { /* The repeat has failed to match at this position. */ if (!guard_repeat(safe_state, index, state->text_pos, RE_STATUS_BODY, TRUE)) return RE_ERROR_MEMORY; goto backtrack; } if (count > node->values[1]) { /* Record the backtracking info. */ if (!add_backtrack(safe_state, RE_OP_GREEDY_REPEAT_ONE)) return RE_ERROR_BACKTRACKING; bt_data = state->backtrack; bt_data->repeat.position.node = node; bt_data->repeat.index = index; bt_data->repeat.text_pos = rp_data->start; bt_data->repeat.count = rp_data->count; rp_data->start = state->text_pos; rp_data->count = count; } /* Advance into the tail. */ state->text_pos += (Py_ssize_t)count * node->step; node = node->next_1.node; break; } case RE_OP_GROUP_CALL: /* Group call. */ { size_t index; size_t g; size_t r; TRACE(("%s %d\n", re_op_text[node->op], node->values[0])) index = node->values[0]; /* Save the capture groups and repeat guards. */ if (!push_group_return(safe_state, node->next_1.node)) return RE_ERROR_MEMORY; /* Clear the capture groups for the group call. They'll be restored * on return. 
*/ for (g = 0; g < state->pattern->true_group_count; g++) { RE_GroupData* group; group = &state->groups[g]; group->span.start = -1; group->span.end = -1; group->current_capture = -1; } /* Clear the repeat guards for the group call. They'll be restored * on return. */ for (r = 0; r < state->pattern->repeat_count; r++) { RE_RepeatData* repeat; repeat = &state->repeats[r]; repeat->body_guard_list.count = 0; repeat->body_guard_list.last_text_pos = -1; repeat->tail_guard_list.count = 0; repeat->tail_guard_list.last_text_pos = -1; } /* Call a group, skipping its CALL_REF node. */ node = pattern->call_ref_info[index].node->next_1.node; if (!add_backtrack(safe_state, RE_OP_GROUP_CALL)) return RE_ERROR_BACKTRACKING; break; } case RE_OP_GROUP_EXISTS: /* Capture group exists. */ { TRACE(("%s %d\n", re_op_text[node->op], node->values[0])) /* Capture group indexes are 1-based (excluding group 0, which is * the entire matched string). * * Check whether the captured text, if any, exists at this position * in the string. * * A group index of 0, however, means that it's a DEFINE, which we * should skip. */ if (node->values[0] == 0) /* Skip past the body. */ node = node->nonstring.next_2.node; else { RE_GroupData* group; group = &state->groups[node->values[0] - 1]; if (group->current_capture >= 0) /* The 'true' branch. */ node = node->next_1.node; else /* The 'false' branch. */ node = node->nonstring.next_2.node; } break; } case RE_OP_GROUP_RETURN: /* Group return. */ { RE_Node* return_node; RE_BacktrackData* bt_data; TRACE(("%s\n", re_op_text[node->op])) return_node = top_group_return(state); if (!add_backtrack(safe_state, RE_OP_GROUP_RETURN)) return RE_ERROR_BACKTRACKING; bt_data = state->backtrack; bt_data->group_call.node = return_node; bt_data->group_call.capture_change = state->capture_change; if (return_node) { /* The group was called. */ node = return_node; /* Save the groups. */ if (!push_groups(safe_state)) return RE_ERROR_MEMORY; /* Save the repeats. 
*/ if (!push_repeats(safe_state)) return RE_ERROR_MEMORY; } else /* The group was not called. */ node = node->next_1.node; pop_group_return(state); break; } case RE_OP_KEEP: /* Keep. */ { RE_BacktrackData* bt_data; TRACE(("%s\n", re_op_text[node->op])) if (!add_backtrack(safe_state, RE_OP_KEEP)) return RE_ERROR_BACKTRACKING; bt_data = state->backtrack; bt_data->keep.match_pos = state->match_pos; state->match_pos = state->text_pos; node = node->next_1.node; break; } case RE_OP_LAZY_REPEAT: /* Lazy repeat. */ { RE_CODE index; RE_RepeatData* rp_data; RE_BacktrackData* bt_data; BOOL try_body; int body_status; RE_Position next_body_position; BOOL try_tail; int tail_status; RE_Position next_tail_position; TRACE(("%s %d\n", re_op_text[node->op], node->values[0])) /* Repeat indexes are 0-based. */ index = node->values[0]; rp_data = &state->repeats[index]; /* We might need to backtrack into the head, so save the current * repeat. */ if (!add_backtrack(safe_state, RE_OP_LAZY_REPEAT)) return RE_ERROR_BACKTRACKING; bt_data = state->backtrack; bt_data->repeat.index = index; bt_data->repeat.count = rp_data->count; bt_data->repeat.start = rp_data->start; bt_data->repeat.capture_change = rp_data->capture_change; bt_data->repeat.text_pos = state->text_pos; /* Initialise the new repeat. */ rp_data->count = 0; rp_data->start = state->text_pos; rp_data->capture_change = state->capture_change; /* Could the body or tail match? 
*/ try_body = node->values[2] > 0 && !is_repeat_guarded(safe_state, index, state->text_pos, RE_STATUS_BODY); if (try_body) { body_status = try_match(state, &node->next_1, state->text_pos, &next_body_position); if (body_status < 0) return body_status; if (body_status == RE_ERROR_FAILURE) try_body = FALSE; } else body_status = RE_ERROR_FAILURE; try_tail = node->values[1] == 0; if (try_tail) { tail_status = try_match(state, &node->nonstring.next_2, state->text_pos, &next_tail_position); if (tail_status < 0) return tail_status; if (tail_status == RE_ERROR_FAILURE) try_tail = FALSE; } else tail_status = RE_ERROR_FAILURE; if (!try_body && !try_tail) /* Neither the body nor the tail could match. */ goto backtrack; if (body_status < 0 || (body_status == 0 && tail_status < 0)) return RE_ERROR_PARTIAL; if (try_body) { if (try_tail) { /* Both the body and the tail could match, but the tail * takes precedence. If the tail fails to match then we * want to try the body before backtracking further. */ /* Record backtracking info for matching the body. */ if (!add_backtrack(safe_state, RE_OP_MATCH_BODY)) return RE_ERROR_BACKTRACKING; bt_data = state->backtrack; bt_data->repeat.position = next_body_position; bt_data->repeat.index = index; bt_data->repeat.count = rp_data->count; bt_data->repeat.start = rp_data->start; bt_data->repeat.capture_change = rp_data->capture_change; bt_data->repeat.text_pos = state->text_pos; /* Advance into the tail. */ node = next_tail_position.node; state->text_pos = next_tail_position.text_pos; } else { /* Advance into the body. */ node = next_body_position.node; state->text_pos = next_body_position.text_pos; } } else { /* Only the tail could match. */ /* Advance into the tail. */ node = next_tail_position.node; state->text_pos = next_tail_position.text_pos; } break; } case RE_OP_LAZY_REPEAT_ONE: /* Lazy repeat for one character.
*/ { RE_CODE index; RE_RepeatData* rp_data; size_t count; BOOL is_partial; TRACE(("%s %d\n", re_op_text[node->op], node->values[0])) /* Repeat indexes are 0-based. */ index = node->values[0]; rp_data = &state->repeats[index]; if (is_repeat_guarded(safe_state, index, state->text_pos, RE_STATUS_BODY)) goto backtrack; /* Count how many times the character repeats, up to the minimum. */ count = count_one(state, node->nonstring.next_2.node, state->text_pos, node->values[1], &is_partial); if (is_partial) { state->text_pos += (Py_ssize_t)count * node->step; return RE_ERROR_PARTIAL; } /* Have we matched at least the minimum? */ if (count < node->values[1]) { /* The repeat has failed to match at this position. */ if (!guard_repeat(safe_state, index, state->text_pos, RE_STATUS_BODY, TRUE)) return RE_ERROR_MEMORY; goto backtrack; } if (count < node->values[2]) { /* The match is shorter than the maximum, so we might need to * backtrack the repeat to consume more. */ RE_BacktrackData* bt_data; if (!add_backtrack(safe_state, RE_OP_LAZY_REPEAT_ONE)) return RE_ERROR_BACKTRACKING; bt_data = state->backtrack; bt_data->repeat.position.node = node; bt_data->repeat.index = index; bt_data->repeat.text_pos = rp_data->start; bt_data->repeat.count = rp_data->count; rp_data->start = state->text_pos; rp_data->count = count; } /* Advance into the tail. */ state->text_pos += (Py_ssize_t)count * node->step; node = node->next_1.node; break; } case RE_OP_LOOKAROUND: /* Start of a lookaround subpattern.
*/ { RE_AtomicData* lookaround; TRACE(("%s %d\n", re_op_text[node->op], node->match)) if (!add_backtrack(safe_state, RE_OP_LOOKAROUND)) return RE_ERROR_BACKTRACKING; state->backtrack->lookaround.too_few_errors = state->too_few_errors; state->backtrack->lookaround.capture_change = state->capture_change; state->backtrack->lookaround.inside = TRUE; state->backtrack->lookaround.node = node; lookaround = push_atomic(safe_state); if (!lookaround) return RE_ERROR_MEMORY; lookaround->backtrack_count = state->current_backtrack_block->count; lookaround->current_backtrack_block = state->current_backtrack_block; lookaround->slice_start = state->slice_start; lookaround->slice_end = state->slice_end; lookaround->text_pos = state->text_pos; lookaround->node = node; lookaround->backtrack = state->backtrack; lookaround->is_lookaround = TRUE; lookaround->has_groups = (node->status & RE_STATUS_HAS_GROUPS) != 0; lookaround->has_repeats = (node->status & RE_STATUS_HAS_REPEATS) != 0; /* Save the groups and repeats. */ if (lookaround->has_groups && !push_groups(safe_state)) return RE_ERROR_MEMORY; if (lookaround->has_repeats && !push_repeats(safe_state)) return RE_ERROR_MEMORY; lookaround->saved_groups = state->current_saved_groups; lookaround->saved_repeats = state->current_saved_repeats; state->slice_start = 0; state->slice_end = state->text_length; node = node->next_1.node; break; } case RE_OP_PROPERTY: /* A property. 
*/ TRACE(("%s %d %d\n", re_op_text[node->op], node->match, node->values[0])) if (state->text_pos >= state->text_length && state->partial_side == RE_PARTIAL_RIGHT) return RE_ERROR_PARTIAL; if (state->text_pos < state->slice_end && matches_PROPERTY(encoding, locale_info, node, char_at(state->text, state->text_pos)) == node->match) { state->text_pos += node->step; node = node->next_1.node; } else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, 1); if (status < 0) return RE_ERROR_PARTIAL; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_PROPERTY_IGN: /* A property, ignoring case. */ TRACE(("%s %d %d\n", re_op_text[node->op], node->match, node->values[0])) if (state->text_pos >= state->text_length && state->partial_side == RE_PARTIAL_RIGHT) return RE_ERROR_PARTIAL; if (state->text_pos < state->slice_end && matches_PROPERTY_IGN(encoding, locale_info, node, char_at(state->text, state->text_pos)) == node->match) { state->text_pos += node->step; node = node->next_1.node; } else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, 1); if (status < 0) return RE_ERROR_PARTIAL; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_PROPERTY_IGN_REV: /* A property, backwards, ignoring case. */ TRACE(("%s %d %d\n", re_op_text[node->op], node->match, node->values[0])) if (state->text_pos <= 0 && state->partial_side == RE_PARTIAL_LEFT) return RE_ERROR_PARTIAL; if (state->text_pos > state->slice_start && matches_PROPERTY_IGN(encoding, locale_info, node, char_at(state->text, state->text_pos - 1)) == node->match) { state->text_pos += node->step; node = node->next_1.node; } else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, -1); if (status < 0) return RE_ERROR_PARTIAL; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_PROPERTY_REV: /* A property, backwards. 
*/ TRACE(("%s %d %d\n", re_op_text[node->op], node->match, node->values[0])) if (state->text_pos <= 0 && state->partial_side == RE_PARTIAL_LEFT) return RE_ERROR_PARTIAL; if (state->text_pos > state->slice_start && matches_PROPERTY(encoding, locale_info, node, char_at(state->text, state->text_pos - 1)) == node->match) { state->text_pos += node->step; node = node->next_1.node; } else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, -1); if (status < 0) return RE_ERROR_PARTIAL; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_PRUNE: /* Prune the backtracking. */ TRACE(("%s\n", re_op_text[node->op])) prune_backtracking(state); node = node->next_1.node; break; case RE_OP_RANGE: /* A range. */ TRACE(("%s %d %d %d\n", re_op_text[node->op], node->match, node->values[0], node->values[1])) if (state->text_pos >= state->text_length && state->partial_side == RE_PARTIAL_RIGHT) return RE_ERROR_PARTIAL; if (state->text_pos < state->slice_end && matches_RANGE(encoding, locale_info, node, char_at(state->text, state->text_pos)) == node->match) { state->text_pos += node->step; node = node->next_1.node; } else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, 1); if (status < 0) return RE_ERROR_PARTIAL; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_RANGE_IGN: /* A range, ignoring case. 
*/ TRACE(("%s %d %d %d\n", re_op_text[node->op], node->match, node->values[0], node->values[1])) if (state->text_pos >= state->text_length && state->partial_side == RE_PARTIAL_RIGHT) return RE_ERROR_PARTIAL; if (state->text_pos < state->slice_end && matches_RANGE_IGN(encoding, locale_info, node, char_at(state->text, state->text_pos)) == node->match) { state->text_pos += node->step; node = node->next_1.node; } else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, 1); if (status < 0) return RE_ERROR_PARTIAL; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_RANGE_IGN_REV: /* A range, backwards, ignoring case. */ TRACE(("%s %d %d %d\n", re_op_text[node->op], node->match, node->values[0], node->values[1])) if (state->text_pos <= 0 && state->partial_side == RE_PARTIAL_LEFT) return RE_ERROR_PARTIAL; if (state->text_pos > state->slice_start && matches_RANGE_IGN(encoding, locale_info, node, char_at(state->text, state->text_pos - 1)) == node->match) { state->text_pos += node->step; node = node->next_1.node; } else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, -1); if (status < 0) return RE_ERROR_PARTIAL; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_RANGE_REV: /* A range, backwards. 
*/ TRACE(("%s %d %d %d\n", re_op_text[node->op], node->match, node->values[0], node->values[1])) if (state->text_pos <= 0 && state->partial_side == RE_PARTIAL_LEFT) return RE_ERROR_PARTIAL; if (state->text_pos > state->slice_start && matches_RANGE(encoding, locale_info, node, char_at(state->text, state->text_pos - 1)) == node->match) { state->text_pos += node->step; node = node->next_1.node; } else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, -1); if (status < 0) return RE_ERROR_PARTIAL; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_REF_GROUP: /* Reference to a capture group. */ { RE_GroupData* group; RE_GroupSpan* span; TRACE(("%s %d\n", re_op_text[node->op], node->values[0])) /* Capture group indexes are 1-based (excluding group 0, which is * the entire matched string). * * Check whether the captured text, if any, exists at this position * in the string. */ /* Did the group capture anything? */ group = &state->groups[node->values[0] - 1]; if (group->current_capture < 0) goto backtrack; span = &group->captures[group->current_capture]; if (string_pos < 0) string_pos = span->start; /* Try comparing. */ while (string_pos < span->end) { if (state->text_pos >= state->text_length && state->partial_side == RE_PARTIAL_RIGHT) return RE_ERROR_PARTIAL; if (state->text_pos < state->slice_end && same_char(char_at(state->text, state->text_pos), char_at(state->text, string_pos))) { ++string_pos; ++state->text_pos; } else if (node->status & RE_STATUS_FUZZY) { BOOL matched; status = fuzzy_match_string(safe_state, search, &state->text_pos, node, &string_pos, &matched, 1); if (status < 0) return RE_ERROR_PARTIAL; if (!matched) { string_pos = -1; goto backtrack; } } else { string_pos = -1; goto backtrack; } } string_pos = -1; /* Successful match. */ node = node->next_1.node; break; } case RE_OP_REF_GROUP_FLD: /* Reference to a capture group, ignoring case. 
*/ { RE_GroupData* group; RE_GroupSpan* span; int (*full_case_fold)(RE_LocaleInfo* locale_info, Py_UCS4 ch, Py_UCS4* folded); int folded_len; int gfolded_len; Py_UCS4 folded[RE_MAX_FOLDED]; Py_UCS4 gfolded[RE_MAX_FOLDED]; TRACE(("%s %d\n", re_op_text[node->op], node->values[0])) /* Capture group indexes are 1-based (excluding group 0, which is * the entire matched string). * * Check whether the captured text, if any, exists at this position * in the string. */ /* Did the group capture anything? */ group = &state->groups[node->values[0] - 1]; if (group->current_capture < 0) goto backtrack; span = &group->captures[group->current_capture]; full_case_fold = encoding->full_case_fold; if (string_pos < 0) { string_pos = span->start; folded_pos = 0; folded_len = 0; gfolded_pos = 0; gfolded_len = 0; } else { folded_len = full_case_fold(locale_info, char_at(state->text, state->text_pos), folded); gfolded_len = full_case_fold(locale_info, char_at(state->text, string_pos), gfolded); } /* Try comparing. */ while (string_pos < span->end) { /* Case-fold at current position in text. */ if (folded_pos >= folded_len) { if (state->text_pos >= state->text_length && state->partial_side == RE_PARTIAL_RIGHT) return RE_ERROR_PARTIAL; if (state->text_pos < state->slice_end) folded_len = full_case_fold(locale_info, char_at(state->text, state->text_pos), folded); else folded_len = 0; folded_pos = 0; } /* Case-fold at current position in group. 
*/ if (gfolded_pos >= gfolded_len) { gfolded_len = full_case_fold(locale_info, char_at(state->text, string_pos), gfolded); gfolded_pos = 0; } if (folded_pos < folded_len && folded[folded_pos] == gfolded[gfolded_pos]) { ++folded_pos; ++gfolded_pos; } else if (node->status & RE_STATUS_FUZZY) { BOOL matched; status = fuzzy_match_group_fld(safe_state, search, &state->text_pos, node, &folded_pos, folded_len, &string_pos, &gfolded_pos, gfolded_len, &matched, 1); if (status < 0) return RE_ERROR_PARTIAL; if (!matched) { string_pos = -1; goto backtrack; } } else { string_pos = -1; goto backtrack; } if (folded_pos >= folded_len && folded_len > 0) ++state->text_pos; if (gfolded_pos >= gfolded_len) ++string_pos; } string_pos = -1; if (folded_pos < folded_len || gfolded_pos < gfolded_len) goto backtrack; /* Successful match. */ node = node->next_1.node; break; } case RE_OP_REF_GROUP_FLD_REV: /* Reference to a capture group, ignoring case. */ { RE_GroupData* group; RE_GroupSpan* span; int (*full_case_fold)(RE_LocaleInfo* locale_info, Py_UCS4 ch, Py_UCS4* folded); int folded_len; int gfolded_len; Py_UCS4 folded[RE_MAX_FOLDED]; Py_UCS4 gfolded[RE_MAX_FOLDED]; TRACE(("%s %d\n", re_op_text[node->op], node->values[0])) /* Capture group indexes are 1-based (excluding group 0, which is * the entire matched string). * * Check whether the captured text, if any, exists at this position * in the string. */ /* Did the group capture anything? */ group = &state->groups[node->values[0] - 1]; if (group->current_capture < 0) goto backtrack; span = &group->captures[group->current_capture]; full_case_fold = encoding->full_case_fold; if (string_pos < 0) { string_pos = span->end; folded_pos = 0; folded_len = 0; gfolded_pos = 0; gfolded_len = 0; } else { folded_len = full_case_fold(locale_info, char_at(state->text, state->text_pos - 1), folded); gfolded_len = full_case_fold(locale_info, char_at(state->text, string_pos - 1), gfolded); } /* Try comparing. 
*/ while (string_pos > span->start) { /* Case-fold at current position in text. */ if (folded_pos <= 0) { if (state->text_pos <= 0 && state->partial_side == RE_PARTIAL_LEFT) return RE_ERROR_PARTIAL; if (state->text_pos > state->slice_start) folded_len = full_case_fold(locale_info, char_at(state->text, state->text_pos - 1), folded); else folded_len = 0; folded_pos = folded_len; } /* Case-fold at current position in group. */ if (gfolded_pos <= 0) { gfolded_len = full_case_fold(locale_info, char_at(state->text, string_pos - 1), gfolded); gfolded_pos = gfolded_len; } if (folded_pos > 0 && folded[folded_pos - 1] == gfolded[gfolded_pos - 1]) { --folded_pos; --gfolded_pos; } else if (node->status & RE_STATUS_FUZZY) { BOOL matched; status = fuzzy_match_group_fld(safe_state, search, &state->text_pos, node, &folded_pos, folded_len, &string_pos, &gfolded_pos, gfolded_len, &matched, -1); if (status < 0) return RE_ERROR_PARTIAL; if (!matched) { string_pos = -1; goto backtrack; } } else { string_pos = -1; goto backtrack; } if (folded_pos <= 0 && folded_len > 0) --state->text_pos; if (gfolded_pos <= 0) --string_pos; } string_pos = -1; if (folded_pos > 0 || gfolded_pos > 0) goto backtrack; /* Successful match. */ node = node->next_1.node; break; } case RE_OP_REF_GROUP_IGN: /* Reference to a capture group, ignoring case. */ { RE_GroupData* group; RE_GroupSpan* span; TRACE(("%s %d\n", re_op_text[node->op], node->values[0])) /* Capture group indexes are 1-based (excluding group 0, which is * the entire matched string). * * Check whether the captured text, if any, exists at this position * in the string. */ /* Did the group capture anything? */ group = &state->groups[node->values[0] - 1]; if (group->current_capture < 0) goto backtrack; span = &group->captures[group->current_capture]; if (string_pos < 0) string_pos = span->start; /* Try comparing. 
*/ while (string_pos < span->end) { if (state->text_pos >= state->text_length && state->partial_side == RE_PARTIAL_RIGHT) return RE_ERROR_PARTIAL; if (state->text_pos < state->slice_end && same_char_ign(encoding, locale_info, char_at(state->text, state->text_pos), char_at(state->text, string_pos))) { ++string_pos; ++state->text_pos; } else if (node->status & RE_STATUS_FUZZY) { BOOL matched; status = fuzzy_match_string(safe_state, search, &state->text_pos, node, &string_pos, &matched, 1); if (status < 0) return RE_ERROR_PARTIAL; if (!matched) { string_pos = -1; goto backtrack; } } else { string_pos = -1; goto backtrack; } } string_pos = -1; /* Successful match. */ node = node->next_1.node; break; } case RE_OP_REF_GROUP_IGN_REV: /* Reference to a capture group, ignoring case. */ { RE_GroupData* group; RE_GroupSpan* span; TRACE(("%s %d\n", re_op_text[node->op], node->values[0])) /* Capture group indexes are 1-based (excluding group 0, which is * the entire matched string). * * Check whether the captured text, if any, exists at this position * in the string. */ /* Did the group capture anything? */ group = &state->groups[node->values[0] - 1]; if (group->current_capture < 0) goto backtrack; span = &group->captures[group->current_capture]; if (string_pos < 0) string_pos = span->end; /* Try comparing. */ while (string_pos > span->start) { if (state->text_pos <= 0 && state->partial_side == RE_PARTIAL_LEFT) return RE_ERROR_PARTIAL; if (state->text_pos > state->slice_start && same_char_ign(encoding, locale_info, char_at(state->text, state->text_pos - 1), char_at(state->text, string_pos - 1))) { --string_pos; --state->text_pos; } else if (node->status & RE_STATUS_FUZZY) { BOOL matched; status = fuzzy_match_string(safe_state, search, &state->text_pos, node, &string_pos, &matched, -1); if (status < 0) return RE_ERROR_PARTIAL; if (!matched) { string_pos = -1; goto backtrack; } } else { string_pos = -1; goto backtrack; } } string_pos = -1; /* Successful match. 
*/
            node = node->next_1.node;
            break;
        }
        case RE_OP_REF_GROUP_REV: /* Reference to a capture group, backwards. */
        {
            RE_GroupData* group;
            RE_GroupSpan* span;
            TRACE(("%s %d\n", re_op_text[node->op], node->values[0]))

            /* Capture group indexes are 1-based (excluding group 0, which is
             * the entire matched string).
             *
             * Check whether the captured text, if any, exists at this position
             * in the string.
             */

            /* Did the group capture anything? */
            group = &state->groups[node->values[0] - 1];
            if (group->current_capture < 0)
                goto backtrack;

            span = &group->captures[group->current_capture];

            if (string_pos < 0)
                string_pos = span->end;

            /* Try comparing. */
            while (string_pos > span->start) {
                if (state->text_pos <= 0 && state->partial_side ==
                  RE_PARTIAL_LEFT)
                    return RE_ERROR_PARTIAL;

                if (state->text_pos > state->slice_start &&
                  same_char(char_at(state->text, state->text_pos - 1),
                  char_at(state->text, string_pos - 1))) {
                    --string_pos;
                    --state->text_pos;
                } else if (node->status & RE_STATUS_FUZZY) {
                    BOOL matched;

                    status = fuzzy_match_string(safe_state, search,
                      &state->text_pos, node, &string_pos, &matched, -1);
                    if (status < 0)
                        return RE_ERROR_PARTIAL;

                    if (!matched) {
                        string_pos = -1;
                        goto backtrack;
                    }
                } else {
                    string_pos = -1;
                    goto backtrack;
                }
            }

            string_pos = -1;

            /* Successful match. */
            node = node->next_1.node;
            break;
        }
        case RE_OP_SEARCH_ANCHOR: /* At the start of the search. */
            TRACE(("%s %d\n", re_op_text[node->op], node->values[0]))

            if (state->text_pos == state->search_anchor)
                node = node->next_1.node;
            else if (node->status & RE_STATUS_FUZZY) {
                status = fuzzy_match_item(safe_state, search, &state->text_pos,
                  &node, 0);
                if (status < 0)
                    return status;

                if (!node)
                    goto backtrack;
            } else
                goto backtrack;
            break;
        case RE_OP_SET_DIFF: /* Character set.
*/ case RE_OP_SET_INTER: case RE_OP_SET_SYM_DIFF: case RE_OP_SET_UNION: TRACE(("%s %d\n", re_op_text[node->op], node->match)) if (state->text_pos >= state->text_length && state->partial_side == RE_PARTIAL_RIGHT) return RE_ERROR_PARTIAL; if (state->text_pos < state->slice_end && matches_SET(encoding, locale_info, node, char_at(state->text, state->text_pos)) == node->match) { state->text_pos += node->step; node = node->next_1.node; } else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, 1); if (status < 0) return RE_ERROR_PARTIAL; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_SET_DIFF_IGN: /* Character set, ignoring case. */ case RE_OP_SET_INTER_IGN: case RE_OP_SET_SYM_DIFF_IGN: case RE_OP_SET_UNION_IGN: TRACE(("%s %d\n", re_op_text[node->op], node->match)) if (state->text_pos >= state->text_length && state->partial_side == RE_PARTIAL_RIGHT) return RE_ERROR_PARTIAL; if (state->text_pos < state->slice_end && matches_SET_IGN(encoding, locale_info, node, char_at(state->text, state->text_pos)) == node->match) { state->text_pos += node->step; node = node->next_1.node; } else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, 1); if (status < 0) return RE_ERROR_PARTIAL; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_SET_DIFF_IGN_REV: /* Character set, ignoring case. 
*/ case RE_OP_SET_INTER_IGN_REV: case RE_OP_SET_SYM_DIFF_IGN_REV: case RE_OP_SET_UNION_IGN_REV: TRACE(("%s %d\n", re_op_text[node->op], node->match)) if (state->text_pos <= 0 && state->partial_side == RE_PARTIAL_LEFT) return RE_ERROR_PARTIAL; if (state->text_pos > state->slice_start && matches_SET_IGN(encoding, locale_info, node, char_at(state->text, state->text_pos - 1)) == node->match) { state->text_pos += node->step; node = node->next_1.node; } else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, -1); if (status < 0) return RE_ERROR_PARTIAL; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_SET_DIFF_REV: /* Character set. */ case RE_OP_SET_INTER_REV: case RE_OP_SET_SYM_DIFF_REV: case RE_OP_SET_UNION_REV: TRACE(("%s %d\n", re_op_text[node->op], node->match)) if (state->text_pos <= 0 && state->partial_side == RE_PARTIAL_LEFT) return RE_ERROR_PARTIAL; if (state->text_pos > state->slice_start && matches_SET(encoding, locale_info, node, char_at(state->text, state->text_pos - 1)) == node->match) { state->text_pos += node->step; node = node->next_1.node; } else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, -1); if (status < 0) return RE_ERROR_PARTIAL; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_SKIP: /* Skip the part of the text already matched. */ TRACE(("%s\n", re_op_text[node->op])) if (node->status & RE_STATUS_REVERSE) state->slice_end = state->text_pos; else state->slice_start = state->text_pos; prune_backtracking(state); node = node->next_1.node; break; case RE_OP_START_GROUP: /* Start of a capture group. */ { RE_CODE private_index; RE_CODE public_index; RE_GroupData* group; RE_BacktrackData* bt_data; TRACE(("%s %d\n", re_op_text[node->op], node->values[1])) /* Capture group indexes are 1-based (excluding group 0, which is * the entire matched string). 
*/ private_index = node->values[0]; public_index = node->values[1]; group = &state->groups[private_index - 1]; if (!add_backtrack(safe_state, RE_OP_START_GROUP)) return RE_ERROR_BACKTRACKING; bt_data = state->backtrack; bt_data->group.private_index = private_index; bt_data->group.public_index = public_index; bt_data->group.text_pos = group->span.start; bt_data->group.capture = (BOOL)node->values[2]; bt_data->group.current_capture = group->current_capture; if (pattern->group_info[private_index - 1].referenced && group->span.start != state->text_pos) ++state->capture_change; group->span.start = state->text_pos; /* Save the capture? */ if (node->values[2]) { group->current_capture = (Py_ssize_t)group->capture_count; if (!save_capture(safe_state, private_index, public_index)) return RE_ERROR_MEMORY; } node = node->next_1.node; break; } case RE_OP_START_OF_LINE: /* At the start of a line. */ TRACE(("%s\n", re_op_text[node->op])) status = try_match_START_OF_LINE(state, node, state->text_pos); if (status < 0) return status; if (status == RE_ERROR_SUCCESS) node = node->next_1.node; else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, 0); if (status < 0) return status; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_START_OF_LINE_U: /* At the start of a line. */ TRACE(("%s\n", re_op_text[node->op])) status = try_match_START_OF_LINE_U(state, node, state->text_pos); if (status < 0) return status; if (status == RE_ERROR_SUCCESS) node = node->next_1.node; else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, 0); if (status < 0) return status; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_START_OF_STRING: /* At the start of the string. 
*/ TRACE(("%s\n", re_op_text[node->op])) status = try_match_START_OF_STRING(state, node, state->text_pos); if (status < 0) return status; if (status == RE_ERROR_SUCCESS) node = node->next_1.node; else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, 0); if (status < 0) return status; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_START_OF_WORD: /* At the start of a word. */ TRACE(("%s\n", re_op_text[node->op])) status = try_match_START_OF_WORD(state, node, state->text_pos); if (status < 0) return status; if (status == RE_ERROR_SUCCESS) node = node->next_1.node; else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, 0); if (status < 0) return status; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_STRING: /* A string. */ { Py_ssize_t length; RE_CODE* values; TRACE(("%s %d\n", re_op_text[node->op], node->value_count)) if ((node->status & RE_STATUS_REQUIRED) && state->text_pos == state->req_pos && string_pos < 0) state->text_pos = state->req_end; else { length = (Py_ssize_t)node->value_count; if (string_pos < 0) string_pos = 0; values = node->values; /* Try comparing. */ while (string_pos < length) { if (state->text_pos >= state->text_length && state->partial_side == RE_PARTIAL_RIGHT) return RE_ERROR_PARTIAL; if (state->text_pos < state->slice_end && same_char(char_at(state->text, state->text_pos), values[string_pos])) { ++string_pos; ++state->text_pos; } else if (node->status & RE_STATUS_FUZZY) { BOOL matched; status = fuzzy_match_string(safe_state, search, &state->text_pos, node, &string_pos, &matched, 1); if (status < 0) return RE_ERROR_PARTIAL; if (!matched) { string_pos = -1; goto backtrack; } } else { string_pos = -1; goto backtrack; } } } string_pos = -1; /* Successful match. */ node = node->next_1.node; break; } case RE_OP_STRING_FLD: /* A string, ignoring case. 
*/ { Py_ssize_t length; int (*full_case_fold)(RE_LocaleInfo* locale_info, Py_UCS4 ch, Py_UCS4* folded); RE_CODE* values; int folded_len; Py_UCS4 folded[RE_MAX_FOLDED]; TRACE(("%s %d\n", re_op_text[node->op], node->value_count)) if ((node->status & RE_STATUS_REQUIRED) && state->text_pos == state->req_pos && string_pos < 0) state->text_pos = state->req_end; else { length = (Py_ssize_t)node->value_count; full_case_fold = encoding->full_case_fold; if (string_pos < 0) { string_pos = 0; folded_pos = 0; folded_len = 0; } else { folded_len = full_case_fold(locale_info, char_at(state->text, state->text_pos), folded); if (folded_pos >= folded_len) { if (state->text_pos >= state->slice_end) goto backtrack; ++state->text_pos; folded_pos = 0; folded_len = 0; } } values = node->values; /* Try comparing. */ while (string_pos < length) { if (folded_pos >= folded_len) { if (state->text_pos >= state->text_length && state->partial_side == RE_PARTIAL_RIGHT) return RE_ERROR_PARTIAL; if (state->text_pos < state->slice_end) folded_len = full_case_fold(locale_info, char_at(state->text, state->text_pos), folded); else folded_len = 0; folded_pos = 0; } if (folded_pos < folded_len && same_char_ign(encoding, locale_info, folded[folded_pos], values[string_pos])) { ++string_pos; ++folded_pos; if (folded_pos >= folded_len) ++state->text_pos; } else if (node->status & RE_STATUS_FUZZY) { BOOL matched; status = fuzzy_match_string_fld(safe_state, search, &state->text_pos, node, &string_pos, &folded_pos, folded_len, &matched, 1); if (status < 0) return RE_ERROR_PARTIAL; if (!matched) { string_pos = -1; goto backtrack; } if (folded_pos >= folded_len && folded_len > 0) ++state->text_pos; } else { string_pos = -1; goto backtrack; } } if (node->status & RE_STATUS_FUZZY) { while (folded_pos < folded_len) { BOOL matched; if (!fuzzy_match_string_fld(safe_state, search, &state->text_pos, node, &string_pos, &folded_pos, folded_len, &matched, 1)) return RE_ERROR_BACKTRACKING; if (!matched) { string_pos = -1; 
                            goto backtrack;
                        }

                        if (folded_pos >= folded_len && folded_len > 0)
                            ++state->text_pos;
                    }
                }

                string_pos = -1;

                if (folded_pos < folded_len)
                    goto backtrack;
            }

            /* Successful match. */
            node = node->next_1.node;
            break;
        }
        case RE_OP_STRING_FLD_REV: /* A string, ignoring case, backwards. */
        {
            Py_ssize_t length;
            int (*full_case_fold)(RE_LocaleInfo* locale_info, Py_UCS4 ch,
              Py_UCS4* folded);
            RE_CODE* values;
            int folded_len;
            Py_UCS4 folded[RE_MAX_FOLDED];
            TRACE(("%s %d\n", re_op_text[node->op], node->value_count))

            if ((node->status & RE_STATUS_REQUIRED) && state->text_pos ==
              state->req_pos && string_pos < 0)
                state->text_pos = state->req_end;
            else {
                length = (Py_ssize_t)node->value_count;

                full_case_fold = encoding->full_case_fold;

                if (string_pos < 0) {
                    string_pos = length;
                    folded_pos = 0;
                    folded_len = 0;
                } else {
                    folded_len = full_case_fold(locale_info,
                      char_at(state->text, state->text_pos - 1), folded);
                    if (folded_pos <= 0) {
                        if (state->text_pos <= state->slice_start)
                            goto backtrack;

                        --state->text_pos;
                        folded_pos = 0;
                        folded_len = 0;
                    }
                }

                values = node->values;

                /* Try comparing.
*/ while (string_pos > 0) { if (folded_pos <= 0) { if (state->text_pos <= 0 && state->partial_side == RE_PARTIAL_LEFT) return RE_ERROR_PARTIAL; if (state->text_pos > state->slice_start) folded_len = full_case_fold(locale_info, char_at(state->text, state->text_pos - 1), folded); else folded_len = 0; folded_pos = folded_len; } if (folded_pos > 0 && same_char_ign(encoding, locale_info, folded[folded_pos - 1], values[string_pos - 1])) { --string_pos; --folded_pos; if (folded_pos <= 0) --state->text_pos; } else if (node->status & RE_STATUS_FUZZY) { BOOL matched; status = fuzzy_match_string_fld(safe_state, search, &state->text_pos, node, &string_pos, &folded_pos, folded_len, &matched, -1); if (status < 0) return RE_ERROR_PARTIAL; if (!matched) { string_pos = -1; goto backtrack; } if (folded_pos <= 0 && folded_len > 0) --state->text_pos; } else { string_pos = -1; goto backtrack; } } if (node->status & RE_STATUS_FUZZY) { while (folded_pos > 0) { BOOL matched; if (!fuzzy_match_string_fld(safe_state, search, &state->text_pos, node, &string_pos, &folded_pos, folded_len, &matched, -1)) return RE_ERROR_BACKTRACKING; if (!matched) { string_pos = -1; goto backtrack; } if (folded_pos <= 0 && folded_len > 0) --state->text_pos; } } string_pos = -1; if (folded_pos > 0) goto backtrack; } /* Successful match. */ node = node->next_1.node; break; } case RE_OP_STRING_IGN: /* A string, ignoring case. */ { Py_ssize_t length; RE_CODE* values; TRACE(("%s %d\n", re_op_text[node->op], node->value_count)) if ((node->status & RE_STATUS_REQUIRED) && state->text_pos == state->req_pos && string_pos < 0) state->text_pos = state->req_end; else { length = (Py_ssize_t)node->value_count; if (string_pos < 0) string_pos = 0; values = node->values; /* Try comparing. 
*/ while (string_pos < length) { if (state->text_pos >= state->text_length && state->partial_side == RE_PARTIAL_RIGHT) return RE_ERROR_PARTIAL; if (state->text_pos < state->slice_end && same_char_ign(encoding, locale_info, char_at(state->text, state->text_pos), values[string_pos])) { ++string_pos; ++state->text_pos; } else if (node->status & RE_STATUS_FUZZY) { BOOL matched; status = fuzzy_match_string(safe_state, search, &state->text_pos, node, &string_pos, &matched, 1); if (status < 0) return RE_ERROR_PARTIAL; if (!matched) { string_pos = -1; goto backtrack; } } else { string_pos = -1; goto backtrack; } } } string_pos = -1; /* Successful match. */ node = node->next_1.node; break; } case RE_OP_STRING_IGN_REV: /* A string, ignoring case. */ { Py_ssize_t length; RE_CODE* values; TRACE(("%s %d\n", re_op_text[node->op], node->value_count)) if ((node->status & RE_STATUS_REQUIRED) && state->text_pos == state->req_pos && string_pos < 0) state->text_pos = state->req_end; else { length = (Py_ssize_t)node->value_count; if (string_pos < 0) string_pos = length; values = node->values; /* Try comparing. */ while (string_pos > 0) { if (state->text_pos <= 0 && state->partial_side == RE_PARTIAL_LEFT) return RE_ERROR_PARTIAL; if (state->text_pos > state->slice_start && same_char_ign(encoding, locale_info, char_at(state->text, state->text_pos - 1), values[string_pos - 1])) { --string_pos; --state->text_pos; } else if (node->status & RE_STATUS_FUZZY) { BOOL matched; status = fuzzy_match_string(safe_state, search, &state->text_pos, node, &string_pos, &matched, -1); if (status < 0) return RE_ERROR_PARTIAL; if (!matched) { string_pos = -1; goto backtrack; } } else { string_pos = -1; goto backtrack; } } } string_pos = -1; /* Successful match. */ node = node->next_1.node; break; } case RE_OP_STRING_REV: /* A string. 
*/ { Py_ssize_t length; RE_CODE* values; TRACE(("%s %d\n", re_op_text[node->op], node->value_count)) if ((node->status & RE_STATUS_REQUIRED) && state->text_pos == state->req_pos && string_pos < 0) state->text_pos = state->req_end; else { length = (Py_ssize_t)node->value_count; if (string_pos < 0) string_pos = length; values = node->values; /* Try comparing. */ while (string_pos > 0) { if (state->text_pos <= 0 && state->partial_side == RE_PARTIAL_LEFT) return RE_ERROR_PARTIAL; if (state->text_pos > state->slice_start && same_char(char_at(state->text, state->text_pos - 1), values[string_pos - 1])) { --string_pos; --state->text_pos; } else if (node->status & RE_STATUS_FUZZY) { BOOL matched; status = fuzzy_match_string(safe_state, search, &state->text_pos, node, &string_pos, &matched, -1); if (status < 0) return RE_ERROR_PARTIAL; if (!matched) { string_pos = -1; goto backtrack; } } else { string_pos = -1; goto backtrack; } } } string_pos = -1; /* Successful match. */ node = node->next_1.node; break; } case RE_OP_STRING_SET: /* Member of a string set. */ { int status; TRACE(("%s\n", re_op_text[node->op])) status = string_set_match_fwdrev(safe_state, node, FALSE); if (status < 0) return status; if (status == 0) goto backtrack; node = node->next_1.node; break; } case RE_OP_STRING_SET_FLD: /* Member of a string set, ignoring case. */ { int status; TRACE(("%s\n", re_op_text[node->op])) status = string_set_match_fld_fwdrev(safe_state, node, FALSE); if (status < 0) return status; if (status == 0) goto backtrack; node = node->next_1.node; break; } case RE_OP_STRING_SET_FLD_REV: /* Member of a string set, ignoring case. */ { int status; TRACE(("%s\n", re_op_text[node->op])) status = string_set_match_fld_fwdrev(safe_state, node, TRUE); if (status < 0) return status; if (status == 0) goto backtrack; node = node->next_1.node; break; } case RE_OP_STRING_SET_IGN: /* Member of a string set, ignoring case. 
*/ { int status; TRACE(("%s\n", re_op_text[node->op])) status = string_set_match_ign_fwdrev(safe_state, node, FALSE); if (status < 0) return status; if (status == 0) goto backtrack; node = node->next_1.node; break; } case RE_OP_STRING_SET_IGN_REV: /* Member of a string set, ignoring case. */ { int status; TRACE(("%s\n", re_op_text[node->op])) status = string_set_match_ign_fwdrev(safe_state, node, TRUE); if (status < 0) return status; if (status == 0) goto backtrack; node = node->next_1.node; break; } case RE_OP_STRING_SET_REV: /* Member of a string set. */ { int status; TRACE(("%s\n", re_op_text[node->op])) status = string_set_match_fwdrev(safe_state, node, TRUE); if (status < 0) return status; if (status == 0) goto backtrack; node = node->next_1.node; break; } case RE_OP_SUCCESS: /* Success. */ /* Must the match advance past its start? */ TRACE(("%s\n", re_op_text[node->op])) if (state->text_pos == state->search_anchor && state->must_advance) goto backtrack; if (state->match_all) { /* We want to match all of the slice. */ if (state->reverse) { if (state->text_pos != state->slice_start) goto backtrack; } else { if (state->text_pos != state->slice_end) goto backtrack; } } if (state->pattern->flags & RE_FLAG_POSIX) { /* If we're looking for a POSIX match, check whether this one * is better and then keep looking. */ if (!check_posix_match(safe_state)) return RE_ERROR_MEMORY; goto backtrack; } return RE_ERROR_SUCCESS; default: /* Illegal opcode! */ TRACE(("UNKNOWN OP %d\n", node->op)) return RE_ERROR_ILLEGAL; } } backtrack: for (;;) { RE_BacktrackData* bt_data; TRACE(("BACKTRACK ")) /* Should we abort the matching? */ ++state->iterations; if (state->iterations == 0 && safe_check_signals(safe_state)) return RE_ERROR_INTERRUPTED; bt_data = last_backtrack(state); switch (bt_data->op) { case RE_OP_ANY: /* Any character except a newline. */ case RE_OP_ANY_ALL: /* Any character at all. */ case RE_OP_ANY_ALL_REV: /* Any character at all, backwards. 
*/ case RE_OP_ANY_REV: /* Any character except a newline, backwards. */ case RE_OP_ANY_U: /* Any character except a line separator. */ case RE_OP_ANY_U_REV: /* Any character except a line separator, backwards. */ case RE_OP_CHARACTER: /* A character. */ case RE_OP_CHARACTER_IGN: /* A character, ignoring case. */ case RE_OP_CHARACTER_IGN_REV: /* A character, ignoring case, backwards. */ case RE_OP_CHARACTER_REV: /* A character, backwards. */ case RE_OP_PROPERTY: /* A property. */ case RE_OP_PROPERTY_IGN: /* A property, ignoring case. */ case RE_OP_PROPERTY_IGN_REV: /* A property, ignoring case, backwards. */ case RE_OP_PROPERTY_REV: /* A property, backwards. */ case RE_OP_RANGE: /* A range. */ case RE_OP_RANGE_IGN: /* A range, ignoring case. */ case RE_OP_RANGE_IGN_REV: /* A range, ignoring case, backwards. */ case RE_OP_RANGE_REV: /* A range, backwards. */ case RE_OP_SET_DIFF: /* Set difference. */ case RE_OP_SET_DIFF_IGN: /* Set difference, ignoring case. */ case RE_OP_SET_DIFF_IGN_REV: /* Set difference, ignoring case, backwards. */ case RE_OP_SET_DIFF_REV: /* Set difference, backwards. */ case RE_OP_SET_INTER: /* Set intersection. */ case RE_OP_SET_INTER_IGN: /* Set intersection, ignoring case. */ case RE_OP_SET_INTER_IGN_REV: /* Set intersection, ignoring case, backwards. */ case RE_OP_SET_INTER_REV: /* Set intersection, backwards. */ case RE_OP_SET_SYM_DIFF: /* Set symmetric difference. */ case RE_OP_SET_SYM_DIFF_IGN: /* Set symmetric difference, ignoring case. */ case RE_OP_SET_SYM_DIFF_IGN_REV: /* Set symmetric difference, ignoring case, backwards. */ case RE_OP_SET_SYM_DIFF_REV: /* Set symmetric difference, backwards. */ case RE_OP_SET_UNION: /* Set union. */ case RE_OP_SET_UNION_IGN: /* Set union, ignoring case. */ case RE_OP_SET_UNION_IGN_REV: /* Set union, ignoring case, backwards. */ case RE_OP_SET_UNION_REV: /* Set union, backwards. 
*/
            TRACE(("%s\n", re_op_text[bt_data->op]))

            status = retry_fuzzy_match_item(safe_state, search,
              &state->text_pos, &node, TRUE);
            if (status < 0)
                return RE_ERROR_PARTIAL;

            if (node)
                goto advance;
            break;
        case RE_OP_ATOMIC: /* Start of an atomic group. */
        {
            RE_AtomicData* atomic;

            /* Backtrack to the start of an atomic group. */
            atomic = pop_atomic(safe_state);

            if (atomic->has_repeats)
                pop_repeats(state);

            if (atomic->has_groups)
                pop_groups(state);

            state->too_few_errors = bt_data->atomic.too_few_errors;
            state->capture_change = bt_data->atomic.capture_change;

            discard_backtrack(state);
            break;
        }
        case RE_OP_BODY_END: /* End of the body of a repeat. */
        {
            RE_RepeatData* rp_data;
            TRACE(("%s %d\n", re_op_text[bt_data->op], bt_data->repeat.index))

            /* We're backtracking into the body. */
            rp_data = &state->repeats[bt_data->repeat.index];

            /* Restore the repeat info. */
            rp_data->count = bt_data->repeat.count;
            rp_data->start = bt_data->repeat.start;
            rp_data->capture_change = bt_data->repeat.capture_change;

            discard_backtrack(state);
            break;
        }
        case RE_OP_BODY_START: /* Start of the body of a repeat. */
        {
            TRACE(("%s %d\n", re_op_text[bt_data->op], bt_data->repeat.index))

            /* The body may have failed to match at this position. */
            if (!guard_repeat(safe_state, bt_data->repeat.index,
              bt_data->repeat.text_pos, RE_STATUS_BODY, TRUE))
                return RE_ERROR_MEMORY;

            discard_backtrack(state);
            break;
        }
        case RE_OP_BOUNDARY: /* On a word boundary. */
        case RE_OP_DEFAULT_BOUNDARY: /* On a default word boundary. */
        case RE_OP_DEFAULT_END_OF_WORD: /* At a default end of a word. */
        case RE_OP_DEFAULT_START_OF_WORD: /* At a default start of a word. */
        case RE_OP_END_OF_LINE: /* At the end of a line. */
        case RE_OP_END_OF_LINE_U: /* At the end of a line. */
        case RE_OP_END_OF_STRING: /* At the end of the string. */
        case RE_OP_END_OF_STRING_LINE: /* At end of string or final newline. */
        case RE_OP_END_OF_STRING_LINE_U: /* At end of string or final newline. */
        case RE_OP_END_OF_WORD: /* At end of a word. */
        case RE_OP_GRAPHEME_BOUNDARY: /* On a grapheme boundary.
*/ case RE_OP_SEARCH_ANCHOR: /* At the start of the search. */ case RE_OP_START_OF_LINE: /* At the start of a line. */ case RE_OP_START_OF_LINE_U: /* At the start of a line. */ case RE_OP_START_OF_STRING: /* At the start of the string. */ case RE_OP_START_OF_WORD: /* At start of a word. */ TRACE(("%s\n", re_op_text[bt_data->op])) status = retry_fuzzy_match_item(safe_state, search, &state->text_pos, &node, FALSE); if (status < 0) return RE_ERROR_PARTIAL; if (node) goto advance; break; case RE_OP_BRANCH: /* 2-way branch. */ TRACE(("%s\n", re_op_text[bt_data->op])) node = bt_data->branch.position.node; state->text_pos = bt_data->branch.position.text_pos; discard_backtrack(state); goto advance; case RE_OP_CALL_REF: /* A group call ref. */ case RE_OP_GROUP_CALL: /* Group call. */ TRACE(("%s\n", re_op_text[bt_data->op])) pop_group_return(state); discard_backtrack(state); break; case RE_OP_CONDITIONAL: /* Conditional subpattern. */ { TRACE(("%s\n", re_op_text[bt_data->op])) if (bt_data->lookaround.inside) { /* Backtracked to the start of a lookaround. */ RE_AtomicData* conditional; conditional = pop_atomic(safe_state); state->text_pos = conditional->text_pos; state->slice_end = conditional->slice_end; state->slice_start = conditional->slice_start; state->current_backtrack_block = conditional->current_backtrack_block; state->current_backtrack_block->count = conditional->backtrack_count; /* Restore the groups and repeats and certain flags. */ if (conditional->has_repeats) pop_repeats(state); if (conditional->has_groups) pop_groups(state); state->too_few_errors = bt_data->lookaround.too_few_errors; state->capture_change = bt_data->lookaround.capture_change; if (bt_data->lookaround.node->match) { /* It's a positive lookaround that's failed. * * Go to the 'false' branch. */ node = bt_data->lookaround.node->nonstring.next_2.node; } else { /* It's a negative lookaround that's failed. * * Go to the 'true' branch. 
*/ node = bt_data->lookaround.node->nonstring.next_2.node; } discard_backtrack(state); goto advance; } else { /* Backtracked to a lookaround. If it's a positive lookaround * that succeeded, we need to restore the groups; if it's a * negative lookaround that failed, it would have completely * backtracked inside and already restored the groups. We also * need to restore certain flags. */ if (bt_data->lookaround.node->match) pop_groups(state); state->too_few_errors = bt_data->lookaround.too_few_errors; state->capture_change = bt_data->lookaround.capture_change; discard_backtrack(state); } break; } case RE_OP_END_FUZZY: /* End of fuzzy matching. */ TRACE(("%s\n", re_op_text[bt_data->op])) state->total_fuzzy_counts[RE_FUZZY_SUB] -= state->fuzzy_info.counts[RE_FUZZY_SUB]; state->total_fuzzy_counts[RE_FUZZY_INS] -= state->fuzzy_info.counts[RE_FUZZY_INS]; state->total_fuzzy_counts[RE_FUZZY_DEL] -= state->fuzzy_info.counts[RE_FUZZY_DEL]; /* We need to retry the fuzzy match. */ status = retry_fuzzy_insert(safe_state, &state->text_pos, &node); if (status < 0) return RE_ERROR_PARTIAL; /* If there were too few errors, in the fuzzy section, try again. */ if (state->too_few_errors) { state->too_few_errors = FALSE; goto backtrack; } if (node) { state->total_fuzzy_counts[RE_FUZZY_SUB] += state->fuzzy_info.counts[RE_FUZZY_SUB]; state->total_fuzzy_counts[RE_FUZZY_INS] += state->fuzzy_info.counts[RE_FUZZY_INS]; state->total_fuzzy_counts[RE_FUZZY_DEL] += state->fuzzy_info.counts[RE_FUZZY_DEL]; node = node->next_1.node; goto advance; } break; case RE_OP_END_GROUP: /* End of a capture group. */ { RE_CODE private_index; RE_GroupData* group; TRACE(("%s %d\n", re_op_text[bt_data->op], bt_data->group.public_index)) private_index = bt_data->group.private_index; group = &state->groups[private_index - 1]; /* Unsave the capture? 
*/ if (bt_data->group.capture) unsave_capture(state, bt_data->group.private_index, bt_data->group.public_index); if (pattern->group_info[private_index - 1].referenced && group->span.end != bt_data->group.text_pos) --state->capture_change; group->span.end = bt_data->group.text_pos; group->current_capture = bt_data->group.current_capture; discard_backtrack(state); break; } case RE_OP_FAILURE: { TRACE(("%s\n", re_op_text[bt_data->op])) /* Have we been looking for a POSIX match? */ if (state->found_match) { restore_best_match(safe_state); return RE_OP_SUCCESS; } /* Do we have to advance? */ if (!search) return RE_ERROR_FAILURE; /* Can we advance? */ state->text_pos = state->match_pos; if (state->reverse) { if (state->text_pos <= state->slice_start) return RE_ERROR_FAILURE; } else { if (state->text_pos >= state->slice_end) return RE_ERROR_FAILURE; } /* Skip over any repeated leading characters. */ switch (start_node->op) { case RE_OP_GREEDY_REPEAT_ONE: case RE_OP_LAZY_REPEAT_ONE: { size_t count; BOOL is_partial; /* How many characters did the repeat actually match? */ count = count_one(state, start_node->nonstring.next_2.node, state->text_pos, start_node->values[2], &is_partial); /* If it's fewer than the maximum then skip over those * characters. */ if (count < start_node->values[2]) state->text_pos += (Py_ssize_t)count * pattern_step; break; } } /* Advance and try to match again. We also need to check whether we * need to skip. */ if (state->reverse) { if (state->text_pos > state->slice_end) state->text_pos = state->slice_end; else --state->text_pos; } else { if (state->text_pos < state->slice_start) state->text_pos = state->slice_start; else ++state->text_pos; } /* Clear the groups. */ clear_groups(state); goto start_match; } case RE_OP_FUZZY: /* Fuzzy matching. */ { RE_FuzzyInfo* fuzzy_info; TRACE(("%s\n", re_op_text[bt_data->op])) /* Restore the previous fuzzy info. 
*/ fuzzy_info = &state->fuzzy_info; memmove(fuzzy_info, &bt_data->fuzzy.fuzzy_info, sizeof(RE_FuzzyInfo)); discard_backtrack(state); break; } case RE_OP_GREEDY_REPEAT: /* Greedy repeat. */ case RE_OP_LAZY_REPEAT: /* Lazy repeat. */ { RE_RepeatData* rp_data; TRACE(("%s\n", re_op_text[bt_data->op])) /* The repeat failed to match. */ rp_data = &state->repeats[bt_data->repeat.index]; /* The body may have failed to match at this position. */ if (!guard_repeat(safe_state, bt_data->repeat.index, bt_data->repeat.text_pos, RE_STATUS_BODY, TRUE)) return RE_ERROR_MEMORY; /* Restore the previous repeat. */ rp_data->count = bt_data->repeat.count; rp_data->start = bt_data->repeat.start; rp_data->capture_change = bt_data->repeat.capture_change; discard_backtrack(state); break; } case RE_OP_GREEDY_REPEAT_ONE: /* Greedy repeat for one character. */ { RE_RepeatData* rp_data; size_t count; Py_ssize_t step; Py_ssize_t pos; Py_ssize_t limit; RE_Node* test; BOOL match; BOOL m; size_t index; TRACE(("%s\n", re_op_text[bt_data->op])) node = bt_data->repeat.position.node; rp_data = &state->repeats[bt_data->repeat.index]; /* Unmatch one character at a time until the tail could match or we * have reached the minimum. */ state->text_pos = rp_data->start; count = rp_data->count; step = node->step; pos = state->text_pos + (Py_ssize_t)count * step; limit = state->text_pos + (Py_ssize_t)node->values[1] * step; /* The tail failed to match at this position. */ if (!guard_repeat(safe_state, bt_data->repeat.index, pos, RE_STATUS_TAIL, TRUE)) return RE_ERROR_MEMORY; /* A (*SKIP) might have changed the size of the slice. */ if (step > 0) { if (limit < state->slice_start) limit = state->slice_start; } else { if (limit > state->slice_end) limit = state->slice_end; } if (pos == limit) { /* We've backtracked the repeat as far as we can. 
*/ rp_data->start = bt_data->repeat.text_pos; rp_data->count = bt_data->repeat.count; discard_backtrack(state); break; } test = node->next_1.test; m = test->match; index = node->values[0]; match = FALSE; if (test->status & RE_STATUS_FUZZY) { for (;;) { int status; RE_Position next_position; pos -= step; status = try_match(state, &node->next_1, pos, &next_position); if (status < 0) return status; if (status != RE_ERROR_FAILURE && !is_repeat_guarded(safe_state, index, pos, RE_STATUS_TAIL)) { match = TRUE; break; } if (pos == limit) break; } } else { /* A repeated single-character match is often followed by a * literal, so checking specially for it can be a good * optimisation when working with long strings. */ switch (test->op) { case RE_OP_CHARACTER: { Py_UCS4 ch; ch = test->values[0]; for (;;) { --pos; if (same_char(char_at(state->text, pos), ch) == m && !is_repeat_guarded(safe_state, index, pos, RE_STATUS_TAIL)) { match = TRUE; break; } if (pos == limit) break; } break; } case RE_OP_CHARACTER_IGN: { Py_UCS4 ch; ch = test->values[0]; for (;;) { --pos; if (same_char_ign(encoding, locale_info, char_at(state->text, pos), ch) == m && !is_repeat_guarded(safe_state, index, pos, RE_STATUS_TAIL)) { match = TRUE; break; } if (pos == limit) break; } break; } case RE_OP_CHARACTER_IGN_REV: { Py_UCS4 ch; ch = test->values[0]; for (;;) { ++pos; if (same_char_ign(encoding, locale_info, char_at(state->text, pos - 1), ch) == m && !is_repeat_guarded(safe_state, index, pos, RE_STATUS_TAIL)) { match = TRUE; break; } if (pos == limit) break; } break; } case RE_OP_CHARACTER_REV: { Py_UCS4 ch; ch = test->values[0]; for (;;) { ++pos; if (same_char(char_at(state->text, pos - 1), ch) == m && !is_repeat_guarded(safe_state, index, pos, RE_STATUS_TAIL)) { match = TRUE; break; } if (pos == limit) break; } break; } case RE_OP_STRING: { Py_ssize_t length; length = (Py_ssize_t)test->value_count; /* The tail is a string. We don't want to go off the end of * the slice. 
*/ pos = min_ssize_t(pos - 1, state->slice_end - length); for (;;) { Py_ssize_t found; BOOL is_partial; if (pos < limit) break; found = string_search_rev(safe_state, test, pos + length, limit, &is_partial); if (is_partial) return RE_ERROR_PARTIAL; if (found < 0) break; pos = found - length; if (!is_repeat_guarded(safe_state, index, pos, RE_STATUS_TAIL)) { match = TRUE; break; } --pos; } break; } case RE_OP_STRING_FLD: { int (*full_case_fold)(RE_LocaleInfo* locale_info, Py_UCS4 ch, Py_UCS4* folded); Py_ssize_t folded_length; size_t i; Py_UCS4 folded[RE_MAX_FOLDED]; full_case_fold = encoding->full_case_fold; folded_length = 0; for (i = 0; i < test->value_count; i++) folded_length += full_case_fold(locale_info, test->values[i], folded); /* The tail is a string. We don't want to go off the end of * the slice. */ pos = min_ssize_t(pos - 1, state->slice_end - folded_length); for (;;) { Py_ssize_t found; Py_ssize_t new_pos; BOOL is_partial; if (pos < limit) break; found = string_search_fld_rev(safe_state, test, pos + folded_length, limit, &new_pos, &is_partial); if (is_partial) return RE_ERROR_PARTIAL; if (found < 0) break; pos = found - folded_length; if (!is_repeat_guarded(safe_state, index, pos, RE_STATUS_TAIL)) { match = TRUE; break; } --pos; } break; } case RE_OP_STRING_FLD_REV: { int (*full_case_fold)(RE_LocaleInfo* locale_info, Py_UCS4 ch, Py_UCS4* folded); Py_ssize_t folded_length; size_t i; Py_UCS4 folded[RE_MAX_FOLDED]; full_case_fold = encoding->full_case_fold; folded_length = 0; for (i = 0; i < test->value_count; i++) folded_length += full_case_fold(locale_info, test->values[i], folded); /* The tail is a string. We don't want to go off the end of * the slice. 
*/ pos = max_ssize_t(pos + 1, state->slice_start + folded_length); for (;;) { Py_ssize_t found; Py_ssize_t new_pos; BOOL is_partial; if (pos > limit) break; found = string_search_fld(safe_state, test, pos - folded_length, limit, &new_pos, &is_partial); if (is_partial) return RE_ERROR_PARTIAL; if (found < 0) break; pos = found + folded_length; if (!is_repeat_guarded(safe_state, index, pos, RE_STATUS_TAIL)) { match = TRUE; break; } ++pos; } break; } case RE_OP_STRING_IGN: { Py_ssize_t length; length = (Py_ssize_t)test->value_count; /* The tail is a string. We don't want to go off the end of * the slice. */ pos = min_ssize_t(pos - 1, state->slice_end - length); for (;;) { Py_ssize_t found; BOOL is_partial; if (pos < limit) break; found = string_search_ign_rev(safe_state, test, pos + length, limit, &is_partial); if (is_partial) return RE_ERROR_PARTIAL; if (found < 0) break; pos = found - length; if (!is_repeat_guarded(safe_state, index, pos, RE_STATUS_TAIL)) { match = TRUE; break; } --pos; } break; } case RE_OP_STRING_IGN_REV: { Py_ssize_t length; length = (Py_ssize_t)test->value_count; /* The tail is a string. We don't want to go off the end of * the slice. */ pos = max_ssize_t(pos + 1, state->slice_start + length); for (;;) { Py_ssize_t found; BOOL is_partial; if (pos > limit) break; found = string_search_ign(safe_state, test, pos - length, limit, &is_partial); if (is_partial) return RE_ERROR_PARTIAL; if (found < 0) break; pos = found + length; if (!is_repeat_guarded(safe_state, index, pos, RE_STATUS_TAIL)) { match = TRUE; break; } ++pos; } break; } case RE_OP_STRING_REV: { Py_ssize_t length; length = (Py_ssize_t)test->value_count; /* The tail is a string. We don't want to go off the end of * the slice. 
*/ pos = max_ssize_t(pos + 1, state->slice_start + length); for (;;) { Py_ssize_t found; BOOL is_partial; if (pos > limit) break; found = string_search(safe_state, test, pos - length, limit, &is_partial); if (is_partial) return RE_ERROR_PARTIAL; if (found < 0) break; pos = found + length; if (!is_repeat_guarded(safe_state, index, pos, RE_STATUS_TAIL)) { match = TRUE; break; } ++pos; } break; } default: for (;;) { RE_Position next_position; pos -= step; status = try_match(state, &node->next_1, pos, &next_position); if (status < 0) return status; if (status == RE_ERROR_SUCCESS && !is_repeat_guarded(safe_state, index, pos, RE_STATUS_TAIL)) { match = TRUE; break; } if (pos == limit) break; } break; } } if (match) { count = (size_t)abs_ssize_t(pos - state->text_pos); /* The tail could match. */ if (count > node->values[1]) /* The match is longer than the minimum, so we might need * to backtrack the repeat again to consume less. */ rp_data->count = count; else { /* We've reached or passed the minimum, so we won't need to * backtrack the repeat again. */ rp_data->start = bt_data->repeat.text_pos; rp_data->count = bt_data->repeat.count; discard_backtrack(state); /* Have we passed the minimum? */ if (count < node->values[1]) goto backtrack; } node = node->next_1.node; state->text_pos = pos; goto advance; } else { /* Don't try this repeated match again. */ if (step > 0) { if (!guard_repeat_range(safe_state, bt_data->repeat.index, limit, pos, RE_STATUS_BODY, TRUE)) return RE_ERROR_MEMORY; } else if (step < 0) { if (!guard_repeat_range(safe_state, bt_data->repeat.index, pos, limit, RE_STATUS_BODY, TRUE)) return RE_ERROR_MEMORY; } /* We've backtracked the repeat as far as we can. */ rp_data->start = bt_data->repeat.text_pos; rp_data->count = bt_data->repeat.count; discard_backtrack(state); } break; } case RE_OP_GROUP_RETURN: /* Group return. 
*/ { RE_Node* return_node; TRACE(("%s\n", re_op_text[bt_data->op])) return_node = bt_data->group_call.node; push_group_return(safe_state, return_node); if (return_node) { /* Restore the groups. */ pop_groups(state); state->capture_change = bt_data->group_call.capture_change; /* Restore the repeats. */ pop_repeats(state); } discard_backtrack(state); break; } case RE_OP_KEEP: /* Keep. */ { state->match_pos = bt_data->keep.match_pos; discard_backtrack(state); break; } case RE_OP_LAZY_REPEAT_ONE: /* Lazy repeat for one character. */ { RE_RepeatData* rp_data; size_t count; Py_ssize_t step; Py_ssize_t pos; Py_ssize_t available; size_t max_count; Py_ssize_t limit; RE_Node* repeated; RE_Node* test; BOOL match; BOOL m; size_t index; TRACE(("%s\n", re_op_text[bt_data->op])) node = bt_data->repeat.position.node; rp_data = &state->repeats[bt_data->repeat.index]; /* Match one character at a time until the tail could match or we * have reached the maximum. */ state->text_pos = rp_data->start; count = rp_data->count; step = node->step; pos = state->text_pos + (Py_ssize_t)count * step; available = step > 0 ? 
state->slice_end - state->text_pos : state->text_pos - state->slice_start; max_count = min_size_t((size_t)available, node->values[2]); limit = state->text_pos + (Py_ssize_t)max_count * step; repeated = node->nonstring.next_2.node; test = node->next_1.test; m = test->match; index = node->values[0]; match = FALSE; if (test->status & RE_STATUS_FUZZY) { for (;;) { RE_Position next_position; status = match_one(state, repeated, pos); if (status < 0) return status; if (status == RE_ERROR_FAILURE) break; pos += step; status = try_match(state, &node->next_1, pos, &next_position); if (status < 0) return status; if (status == RE_ERROR_SUCCESS && !is_repeat_guarded(safe_state, index, pos, RE_STATUS_TAIL)) { match = TRUE; break; } if (pos == limit) break; } } else { /* A repeated single-character match is often followed by a * literal, so checking specially for it can be a good * optimisation when working with long strings. */ switch (test->op) { case RE_OP_CHARACTER: { Py_UCS4 ch; ch = test->values[0]; /* The tail is a character. We don't want to go off the end * of the slice. */ limit = min_ssize_t(limit, state->slice_end - 1); for (;;) { if (pos >= state->text_length && state->partial_side == RE_PARTIAL_RIGHT) return RE_ERROR_PARTIAL; if (pos >= limit) break; status = match_one(state, repeated, pos); if (status < 0) return status; if (status == RE_ERROR_FAILURE) break; ++pos; if (same_char(char_at(state->text, pos), ch) == m && !is_repeat_guarded(safe_state, index, pos, RE_STATUS_TAIL)) { match = TRUE; break; } } break; } case RE_OP_CHARACTER_IGN: { Py_UCS4 ch; ch = test->values[0]; /* The tail is a character. We don't want to go off the end * of the slice. 
*/ limit = min_ssize_t(limit, state->slice_end - 1); for (;;) { if (pos >= state->text_length && state->partial_side == RE_PARTIAL_RIGHT) return RE_ERROR_PARTIAL; if (pos >= limit) break; status = match_one(state, repeated, pos); if (status < 0) return status; if (status == RE_ERROR_FAILURE) break; ++pos; if (same_char_ign(encoding, locale_info, char_at(state->text, pos), ch) == m && !is_repeat_guarded(safe_state, index, pos, RE_STATUS_TAIL)) { match = TRUE; break; } } break; } case RE_OP_CHARACTER_IGN_REV: { Py_UCS4 ch; ch = test->values[0]; /* The tail is a character. We don't want to go off the end * of the slice. */ limit = max_ssize_t(limit, state->slice_start + 1); for (;;) { if (pos <= 0 && state->partial_side == RE_PARTIAL_LEFT) return RE_ERROR_PARTIAL; if (pos <= limit) break; status = match_one(state, repeated, pos); if (status < 0) return status; if (status == RE_ERROR_FAILURE) break; --pos; if (same_char_ign(encoding, locale_info, char_at(state->text, pos - 1), ch) == m && !is_repeat_guarded(safe_state, index, pos, RE_STATUS_TAIL)) { match = TRUE; break; } } break; } case RE_OP_CHARACTER_REV: { Py_UCS4 ch; ch = test->values[0]; /* The tail is a character. We don't want to go off the end * of the slice. */ limit = max_ssize_t(limit, state->slice_start + 1); for (;;) { if (pos <= 0 && state->partial_side == RE_PARTIAL_LEFT) return RE_ERROR_PARTIAL; if (pos <= limit) break; status = match_one(state, repeated, pos); if (status < 0) return status; if (status == RE_ERROR_FAILURE) break; --pos; if (same_char(char_at(state->text, pos - 1), ch) == m && !is_repeat_guarded(safe_state, index, pos, RE_STATUS_TAIL)) { match = TRUE; break; } } break; } case RE_OP_STRING: { Py_ssize_t length; length = (Py_ssize_t)test->value_count; /* The tail is a string. We don't want to go off the end of * the slice. 
*/ limit = min_ssize_t(limit, state->slice_end - length); for (;;) { Py_ssize_t found; BOOL is_partial; if (pos >= state->text_length && state->partial_side == RE_PARTIAL_RIGHT) return RE_ERROR_PARTIAL; if (pos >= limit) break; /* Look for the tail string. */ found = string_search(safe_state, test, pos + 1, limit + length, &is_partial); if (is_partial) return RE_ERROR_PARTIAL; if (found < 0) break; if (repeated->op == RE_OP_ANY_ALL) /* Anything can precede the tail. */ pos = found; else { /* Check that what precedes the tail will match. */ while (pos != found) { status = match_one(state, repeated, pos); if (status < 0) return status; if (status == RE_ERROR_FAILURE) break; ++pos; } if (pos != found) /* Something preceding the tail didn't match. */ break; } if (!is_repeat_guarded(safe_state, index, pos, RE_STATUS_TAIL)) { match = TRUE; break; } } break; } case RE_OP_STRING_FLD: { /* The tail is a string. We don't want to go off the end of * the slice. */ limit = min_ssize_t(limit, state->slice_end); for (;;) { Py_ssize_t found; Py_ssize_t new_pos; BOOL is_partial; if (pos >= state->text_length && state->partial_side == RE_PARTIAL_RIGHT) return RE_ERROR_PARTIAL; if (pos >= limit) break; /* Look for the tail string. */ found = string_search_fld(safe_state, test, pos + 1, limit, &new_pos, &is_partial); if (is_partial) return RE_ERROR_PARTIAL; if (found < 0) break; if (repeated->op == RE_OP_ANY_ALL) /* Anything can precede the tail. */ pos = found; else { /* Check that what precedes the tail will match. */ while (pos != found) { status = match_one(state, repeated, pos); if (status < 0) return status; if (status == RE_ERROR_FAILURE) break; ++pos; } if (pos != found) /* Something preceding the tail didn't match. */ break; } if (!is_repeat_guarded(safe_state, index, pos, RE_STATUS_TAIL)) { match = TRUE; break; } } break; } case RE_OP_STRING_FLD_REV: { /* The tail is a string. We don't want to go off the end of * the slice. 
*/ limit = max_ssize_t(limit, state->slice_start); for (;;) { Py_ssize_t found; Py_ssize_t new_pos; BOOL is_partial; if (pos <= 0 && state->partial_side == RE_PARTIAL_LEFT) return RE_ERROR_PARTIAL; if (pos <= limit) break; /* Look for the tail string. */ found = string_search_fld_rev(safe_state, test, pos - 1, limit, &new_pos, &is_partial); if (is_partial) return RE_ERROR_PARTIAL; if (found < 0) break; if (repeated->op == RE_OP_ANY_ALL) /* Anything can precede the tail. */ pos = found; else { /* Check that what precedes the tail will match. */ while (pos != found) { status = match_one(state, repeated, pos); if (status < 0) return status; if (status == RE_ERROR_FAILURE) break; --pos; } if (pos != found) /* Something preceding the tail didn't match. */ break; } if (!is_repeat_guarded(safe_state, index, pos, RE_STATUS_TAIL)) { match = TRUE; break; } } break; } case RE_OP_STRING_IGN: { Py_ssize_t length; length = (Py_ssize_t)test->value_count; /* The tail is a string. We don't want to go off the end of * the slice. */ limit = min_ssize_t(limit, state->slice_end - length); for (;;) { Py_ssize_t found; BOOL is_partial; if (pos >= state->text_length && state->partial_side == RE_PARTIAL_RIGHT) return RE_ERROR_PARTIAL; if (pos >= limit) break; /* Look for the tail string. */ found = string_search_ign(safe_state, test, pos + 1, limit + length, &is_partial); if (is_partial) return RE_ERROR_PARTIAL; if (found < 0) break; if (repeated->op == RE_OP_ANY_ALL) /* Anything can precede the tail. */ pos = found; else { /* Check that what precedes the tail will match. */ while (pos != found) { status = match_one(state, repeated, pos); if (status < 0) return status; if (status == RE_ERROR_FAILURE) break; ++pos; } if (pos != found) /* Something preceding the tail didn't match. 
*/ break; } if (!is_repeat_guarded(safe_state, index, pos, RE_STATUS_TAIL)) { match = TRUE; break; } } break; } case RE_OP_STRING_IGN_REV: { Py_ssize_t length; length = (Py_ssize_t)test->value_count; /* The tail is a string. We don't want to go off the end of * the slice. */ limit = max_ssize_t(limit, state->slice_start + length); for (;;) { Py_ssize_t found; BOOL is_partial; if (pos <= 0 && state->partial_side == RE_PARTIAL_LEFT) return RE_ERROR_PARTIAL; if (pos <= limit) break; /* Look for the tail string. */ found = string_search_ign_rev(safe_state, test, pos - 1, limit - length, &is_partial); if (is_partial) return RE_ERROR_PARTIAL; if (found < 0) break; if (repeated->op == RE_OP_ANY_ALL) /* Anything can precede the tail. */ pos = found; else { /* Check that what precedes the tail will match. */ while (pos != found) { status = match_one(state, repeated, pos); if (status < 0) return status; if (status == RE_ERROR_FAILURE) break; --pos; } if (pos != found) /* Something preceding the tail didn't match. */ break; } if (!is_repeat_guarded(safe_state, index, pos, RE_STATUS_TAIL)) { match = TRUE; break; } } break; } case RE_OP_STRING_REV: { Py_ssize_t length; length = (Py_ssize_t)test->value_count; /* The tail is a string. We don't want to go off the end of * the slice. */ limit = max_ssize_t(limit, state->slice_start + length); for (;;) { Py_ssize_t found; BOOL is_partial; if (pos <= 0 && state->partial_side == RE_PARTIAL_LEFT) return RE_ERROR_PARTIAL; if (pos <= limit) break; /* Look for the tail string. */ found = string_search_rev(safe_state, test, pos - 1, limit - length, &is_partial); if (is_partial) return RE_ERROR_PARTIAL; if (found < 0) break; if (repeated->op == RE_OP_ANY_ALL) /* Anything can precede the tail. */ pos = found; else { /* Check that what precedes the tail will match. 
*/ while (pos != found) { status = match_one(state, repeated, pos); if (status < 0) return status; if (status == RE_ERROR_FAILURE) break; --pos; } if (pos != found) /* Something preceding the tail didn't match. */ break; } if (!is_repeat_guarded(safe_state, index, pos, RE_STATUS_TAIL)) { match = TRUE; break; } } break; } default: for (;;) { RE_Position next_position; status = match_one(state, repeated, pos); if (status < 0) return status; if (status == RE_ERROR_FAILURE) break; pos += step; status = try_match(state, &node->next_1, pos, &next_position); if (status < 0) return RE_ERROR_PARTIAL; if (status == RE_ERROR_SUCCESS && !is_repeat_guarded(safe_state, index, pos, RE_STATUS_TAIL)) { match = TRUE; break; } if (pos == limit) break; } break; } } if (match) { /* The tail could match. */ count = (size_t)abs_ssize_t(pos - state->text_pos); state->text_pos = pos; if (count < max_count) { /* The match is shorter than the maximum, so we might need * to backtrack the repeat again to consume more. */ rp_data->count = count; } else { /* We've reached or passed the maximum, so we won't need to * backtrack the repeat again. */ rp_data->start = bt_data->repeat.text_pos; rp_data->count = bt_data->repeat.count; discard_backtrack(state); /* Have we passed the maximum? */ if (count > max_count) goto backtrack; } node = node->next_1.node; goto advance; } else { /* The tail couldn't match. */ rp_data->start = bt_data->repeat.text_pos; rp_data->count = bt_data->repeat.count; discard_backtrack(state); } break; } case RE_OP_LOOKAROUND: /* Lookaround subpattern. */ { TRACE(("%s\n", re_op_text[bt_data->op])) if (bt_data->lookaround.inside) { /* Backtracked to the start of a lookaround. 
*/ RE_AtomicData* lookaround; lookaround = pop_atomic(safe_state); state->text_pos = lookaround->text_pos; state->slice_end = lookaround->slice_end; state->slice_start = lookaround->slice_start; state->current_backtrack_block = lookaround->current_backtrack_block; state->current_backtrack_block->count = lookaround->backtrack_count; /* Restore the groups and repeats and certain flags. */ if (lookaround->has_repeats) pop_repeats(state); if (lookaround->has_groups) pop_groups(state); state->too_few_errors = bt_data->lookaround.too_few_errors; state->capture_change = bt_data->lookaround.capture_change; if (bt_data->lookaround.node->match) { /* It's a positive lookaround that's failed. */ discard_backtrack(state); } else { /* It's a negative lookaround that's failed. Record that * we've now left the lookaround and continue to the * following node. */ bt_data->lookaround.inside = FALSE; node = bt_data->lookaround.node->nonstring.next_2.node; goto advance; } } else { /* Backtracked to a lookaround. If it's a positive lookaround * that succeeded, we need to restore the groups; if it's a * negative lookaround that failed, it would have completely * backtracked inside and already restored the groups. We also * need to restore certain flags. */ if (bt_data->lookaround.node->match && (bt_data->lookaround.node->status & RE_STATUS_HAS_GROUPS)) pop_groups(state); state->too_few_errors = bt_data->lookaround.too_few_errors; state->capture_change = bt_data->lookaround.capture_change; discard_backtrack(state); } break; } case RE_OP_MATCH_BODY: { RE_RepeatData* rp_data; TRACE(("%s %d\n", re_op_text[bt_data->op], bt_data->repeat.index)) /* We want to match the body. */ rp_data = &state->repeats[bt_data->repeat.index]; /* Restore the repeat info. */ rp_data->count = bt_data->repeat.count; rp_data->start = bt_data->repeat.start; rp_data->capture_change = bt_data->repeat.capture_change; /* Record backtracking info in case the body fails to match. 
*/ bt_data->op = RE_OP_BODY_START; /* Advance into the body. */ node = bt_data->repeat.position.node; state->text_pos = bt_data->repeat.position.text_pos; goto advance; } case RE_OP_MATCH_TAIL: { RE_RepeatData* rp_data; TRACE(("%s %d\n", re_op_text[bt_data->op], bt_data->repeat.index)) /* We want to match the tail. */ rp_data = &state->repeats[bt_data->repeat.index]; /* Restore the repeat info. */ rp_data->count = bt_data->repeat.count; rp_data->start = bt_data->repeat.start; rp_data->capture_change = bt_data->repeat.capture_change; /* Advance into the tail. */ node = bt_data->repeat.position.node; state->text_pos = bt_data->repeat.position.text_pos; discard_backtrack(state); goto advance; } case RE_OP_REF_GROUP: /* Reference to a capture group. */ case RE_OP_REF_GROUP_IGN: /* Reference to a capture group, ignoring case. */ case RE_OP_REF_GROUP_IGN_REV: /* Reference to a capture group, backwards, ignoring case. */ case RE_OP_REF_GROUP_REV: /* Reference to a capture group, backwards. */ case RE_OP_STRING: /* A string. */ case RE_OP_STRING_IGN: /* A string, ignoring case. */ case RE_OP_STRING_IGN_REV: /* A string, backwards, ignoring case. */ case RE_OP_STRING_REV: /* A string, backwards. */ { BOOL matched; TRACE(("%s\n", re_op_text[bt_data->op])) status = retry_fuzzy_match_string(safe_state, search, &state->text_pos, &node, &string_pos, &matched); if (status < 0) return RE_ERROR_PARTIAL; if (matched) goto advance; string_pos = -1; break; } case RE_OP_REF_GROUP_FLD: /* Reference to a capture group, ignoring case. */ case RE_OP_REF_GROUP_FLD_REV: /* Reference to a capture group, backwards, ignoring case. */ { BOOL matched; TRACE(("%s\n", re_op_text[bt_data->op])) status = retry_fuzzy_match_group_fld(safe_state, search, &state->text_pos, &node, &folded_pos, &string_pos, &gfolded_pos, &matched); if (status < 0) return RE_ERROR_PARTIAL; if (matched) goto advance; string_pos = -1; break; } case RE_OP_START_GROUP: /* Start of a capture group. 
*/ { RE_CODE private_index; RE_GroupData* group; TRACE(("%s %d\n", re_op_text[bt_data->op], bt_data->group.public_index)) private_index = bt_data->group.private_index; group = &state->groups[private_index - 1]; /* Unsave the capture? */ if (bt_data->group.capture) unsave_capture(state, bt_data->group.private_index, bt_data->group.public_index); if (pattern->group_info[private_index - 1].referenced && group->span.start != bt_data->group.text_pos) --state->capture_change; group->span.start = bt_data->group.text_pos; group->current_capture = bt_data->group.current_capture; discard_backtrack(state); break; } case RE_OP_STRING_FLD: /* A string, ignoring case. */ case RE_OP_STRING_FLD_REV: /* A string, backwards, ignoring case. */ { BOOL matched; TRACE(("%s\n", re_op_text[bt_data->op])) status = retry_fuzzy_match_string_fld(safe_state, search, &state->text_pos, &node, &string_pos, &folded_pos, &matched); if (status < 0) return RE_ERROR_PARTIAL; if (matched) goto advance; string_pos = -1; break; } default: TRACE(("UNKNOWN OP %d\n", bt_data->op)) return RE_ERROR_ILLEGAL; } } } /* Saves group data for fuzzy matching. */ Py_LOCAL_INLINE(RE_GroupData*) save_groups(RE_SafeState* safe_state, RE_GroupData* saved_groups) { RE_State* state; PatternObject* pattern; size_t g; /* Re-acquire the GIL. 
*/ acquire_GIL(safe_state); state = safe_state->re_state; pattern = state->pattern; if (!saved_groups) { saved_groups = (RE_GroupData*)re_alloc(pattern->true_group_count * sizeof(RE_GroupData)); if (!saved_groups) goto error; memset(saved_groups, 0, pattern->true_group_count * sizeof(RE_GroupData)); } for (g = 0; g < pattern->true_group_count; g++) { RE_GroupData* orig; RE_GroupData* copy; orig = &state->groups[g]; copy = &saved_groups[g]; copy->span = orig->span; if (orig->capture_count > copy->capture_capacity) { RE_GroupSpan* cap_copy; cap_copy = (RE_GroupSpan*)re_realloc(copy->captures, orig->capture_count * sizeof(RE_GroupSpan)); if (!cap_copy) goto error; copy->capture_capacity = orig->capture_count; copy->captures = cap_copy; } copy->capture_count = orig->capture_count; Py_MEMCPY(copy->captures, orig->captures, orig->capture_count * sizeof(RE_GroupSpan)); } /* Release the GIL. */ release_GIL(safe_state); return saved_groups; error: if (saved_groups) { for (g = 0; g < pattern->true_group_count; g++) re_dealloc(saved_groups[g].captures); re_dealloc(saved_groups); } /* Release the GIL. */ release_GIL(safe_state); return NULL; } /* Restores group data for fuzzy matching. */ Py_LOCAL_INLINE(void) restore_groups(RE_SafeState* safe_state, RE_GroupData* saved_groups) { RE_State* state; PatternObject* pattern; size_t g; /* Re-acquire the GIL. */ acquire_GIL(safe_state); state = safe_state->re_state; pattern = state->pattern; for (g = 0; g < pattern->true_group_count; g++) re_dealloc(state->groups[g].captures); Py_MEMCPY(state->groups, saved_groups, pattern->true_group_count * sizeof(RE_GroupData)); re_dealloc(saved_groups); /* Release the GIL. */ release_GIL(safe_state); } /* Discards group data for fuzzy matching. */ Py_LOCAL_INLINE(void) discard_groups(RE_SafeState* safe_state, RE_GroupData* saved_groups) { RE_State* state; PatternObject* pattern; size_t g; /* Re-acquire the GIL. 
*/ acquire_GIL(safe_state); state = safe_state->re_state; pattern = state->pattern; for (g = 0; g < pattern->true_group_count; g++) re_dealloc(saved_groups[g].captures); re_dealloc(saved_groups); /* Release the GIL. */ release_GIL(safe_state); } /* Saves the fuzzy info. */ Py_LOCAL_INLINE(void) save_fuzzy_counts(RE_State* state, size_t* fuzzy_counts) { Py_MEMCPY(fuzzy_counts, state->total_fuzzy_counts, sizeof(state->total_fuzzy_counts)); } /* Restores the fuzzy info. */ Py_LOCAL_INLINE(void) restore_fuzzy_counts(RE_State* state, size_t* fuzzy_counts) { Py_MEMCPY(state->total_fuzzy_counts, fuzzy_counts, sizeof(state->total_fuzzy_counts)); } /* Makes the list of best matches found so far. */ Py_LOCAL_INLINE(void) make_best_list(RE_BestList* best_list) { best_list->capacity = 0; best_list->count = 0; best_list->entries = NULL; } /* Clears the list of best matches found so far. */ Py_LOCAL_INLINE(void) clear_best_list(RE_BestList* best_list) { best_list->count = 0; } /* Adds a new entry to the list of best matches found so far. */ Py_LOCAL_INLINE(BOOL) add_to_best_list(RE_SafeState* safe_state, RE_BestList* best_list, Py_ssize_t match_pos, Py_ssize_t text_pos) { RE_BestEntry* entry; if (best_list->count >= best_list->capacity) { RE_BestEntry* new_entries; best_list->capacity = best_list->capacity == 0 ? 16 : best_list->capacity * 2; new_entries = safe_realloc(safe_state, best_list->entries, best_list->capacity * sizeof(RE_BestEntry)); if (!new_entries) return FALSE; best_list->entries = new_entries; } entry = &best_list->entries[best_list->count++]; entry->match_pos = match_pos; entry->text_pos = text_pos; return TRUE; } /* Destroy the list of best matches found so far. */ Py_LOCAL_INLINE(void) destroy_best_list(RE_SafeState* safe_state, RE_BestList* best_list) { if (best_list->entries) safe_dealloc(safe_state, best_list->entries); } /* Performs a match or search from the current text position for a best fuzzy * match. 
*/ Py_LOCAL_INLINE(int) do_best_fuzzy_match(RE_SafeState* safe_state, BOOL search) { RE_State* state; Py_ssize_t available; int step; size_t fewest_errors; BOOL must_advance; BOOL found_match; RE_BestList best_list; Py_ssize_t start_pos; int status; TRACE(("<>\n")) state = safe_state->re_state; if (state->reverse) { available = state->text_pos - state->slice_start; step = -1; } else { available = state->slice_end - state->text_pos; step = 1; } /* The maximum permitted cost. */ state->max_errors = PY_SSIZE_T_MAX; fewest_errors = PY_SSIZE_T_MAX; state->best_text_pos = state->reverse ? state->slice_start : state->slice_end; must_advance = state->must_advance; found_match = FALSE; make_best_list(&best_list); /* Search the text for the best match. */ start_pos = state->text_pos; while (state->slice_start <= start_pos && start_pos <= state->slice_end) { state->text_pos = start_pos; state->must_advance = must_advance; /* Initialise the state. */ init_match(state); status = RE_ERROR_SUCCESS; if (state->max_errors == 0 && state->partial_side == RE_PARTIAL_NONE) { /* An exact match, and partial matches not permitted. */ if (available < state->min_width || (available == 0 && state->must_advance)) status = RE_ERROR_FAILURE; } if (status == RE_ERROR_SUCCESS) status = basic_match(safe_state, search); /* Has an error occurred, or is it a partial match? */ if (status < 0) break; if (status == RE_ERROR_SUCCESS) { /* It was a successful match. */ found_match = TRUE; if (state->total_errors < fewest_errors) { /* This match was better than any of the previous ones. */ fewest_errors = state->total_errors; if (state->total_errors == 0) /* It was a perfect match. */ break; /* Forget all the previous worse matches and remember this one. */ clear_best_list(&best_list); if (!add_to_best_list(safe_state, &best_list, state->match_pos, state->text_pos)) return RE_ERROR_MEMORY; } else if (state->total_errors == fewest_errors) /* This match was as good as the previous matches. 
Remember * this one. */ if (!add_to_best_list(safe_state, &best_list, state->match_pos, state->text_pos)) { destroy_best_list(safe_state, &best_list); return RE_ERROR_MEMORY; } } /* Should we keep searching? */ if (!search) break; start_pos = state->match_pos + step; } if (found_match) { /* We found a match. */ if (fewest_errors > 0) { /* It doesn't look like a perfect match. */ int i; Py_ssize_t slice_start; Py_ssize_t slice_end; size_t error_limit; size_t best_fuzzy_counts[RE_FUZZY_COUNT]; RE_GroupData* best_groups; Py_ssize_t best_match_pos; Py_ssize_t best_text_pos; slice_start = state->slice_start; slice_end = state->slice_end; error_limit = fewest_errors; if (error_limit > RE_MAX_ERRORS) error_limit = RE_MAX_ERRORS; best_groups = NULL; /* Look again at the best of the matches that we've seen. */ for (i = 0; i < best_list.count; i++) { RE_BestEntry* entry; Py_ssize_t max_offset; Py_ssize_t offset; /* Look for the best fit at this position. */ entry = &best_list.entries[i]; if (search) { max_offset = state->reverse ? entry->match_pos - state->slice_start : state->slice_end - entry->match_pos; if (max_offset > (Py_ssize_t)fewest_errors) max_offset = (Py_ssize_t)fewest_errors; if (max_offset > (Py_ssize_t)error_limit) max_offset = (Py_ssize_t)error_limit; } else max_offset = 0; start_pos = entry->match_pos; offset = 0; while (offset <= max_offset) { state->max_errors = 1; while (state->max_errors <= error_limit) { state->text_pos = start_pos; init_match(state); status = basic_match(safe_state, FALSE); if (status == RE_ERROR_SUCCESS) { BOOL better = FALSE; if (state->total_errors < error_limit || (i == 0 && offset == 0)) better = TRUE; else if (state->total_errors == error_limit) /* The cost is as low as the current best, but * is it earlier? */ better = state->reverse ?
state->match_pos > best_match_pos : state->match_pos < best_match_pos; if (better) { save_fuzzy_counts(state, best_fuzzy_counts); best_groups = save_groups(safe_state, best_groups); if (!best_groups) { destroy_best_list(safe_state, &best_list); return RE_ERROR_MEMORY; } best_match_pos = state->match_pos; best_text_pos = state->text_pos; error_limit = state->total_errors; } break; } ++state->max_errors; } start_pos += step; ++offset; } if (status == RE_ERROR_SUCCESS && state->total_errors == 0) break; } if (best_groups) { status = RE_ERROR_SUCCESS; state->match_pos = best_match_pos; state->text_pos = best_text_pos; restore_groups(safe_state, best_groups); restore_fuzzy_counts(state, best_fuzzy_counts); } else { /* None of the "best" matches could be improved on, so pick the * first. */ RE_BestEntry* entry; /* Look at only the part of the string around the match. */ entry = &best_list.entries[0]; if (state->reverse) { state->slice_start = entry->text_pos; state->slice_end = entry->match_pos; } else { state->slice_start = entry->match_pos; state->slice_end = entry->text_pos; } /* We'll expand the part that we're looking at to compensate for * any matching errors that have occurred. */ if (state->slice_start - slice_start >= (Py_ssize_t)fewest_errors) state->slice_start -= (Py_ssize_t)fewest_errors; else state->slice_start = slice_start; if (slice_end - state->slice_end >= (Py_ssize_t)fewest_errors) state->slice_end += (Py_ssize_t)fewest_errors; else state->slice_end = slice_end; state->max_errors = fewest_errors; state->text_pos = entry->match_pos; init_match(state); status = basic_match(safe_state, search); } state->slice_start = slice_start; state->slice_end = slice_end; } } destroy_best_list(safe_state, &best_list); return status; } /* Performs a match or search from the current text position for an enhanced * fuzzy match.
*/ Py_LOCAL_INLINE(int) do_enhanced_fuzzy_match(RE_SafeState* safe_state, BOOL search) { RE_State* state; PatternObject* pattern; Py_ssize_t available; size_t fewest_errors; RE_GroupData* best_groups; Py_ssize_t best_match_pos; BOOL must_advance; Py_ssize_t slice_start; Py_ssize_t slice_end; int status; size_t best_fuzzy_counts[RE_FUZZY_COUNT]; Py_ssize_t best_text_pos = 0; /* Initialise to stop compiler warning. */ TRACE(("<>\n")) state = safe_state->re_state; pattern = state->pattern; if (state->reverse) available = state->text_pos - state->slice_start; else available = state->slice_end - state->text_pos; /* The maximum permitted cost. */ state->max_errors = PY_SSIZE_T_MAX; fewest_errors = PY_SSIZE_T_MAX; best_groups = NULL; state->best_match_pos = state->text_pos; state->best_text_pos = state->reverse ? state->slice_start : state->slice_end; best_match_pos = state->text_pos; must_advance = state->must_advance; slice_start = state->slice_start; slice_end = state->slice_end; for (;;) { /* If there's a better match, it won't start earlier in the string than * the current best match, so there's no need to start earlier than * that match. */ state->must_advance = must_advance; /* Initialise the state. */ init_match(state); status = RE_ERROR_SUCCESS; if (state->max_errors == 0 && state->partial_side == RE_PARTIAL_NONE) { /* An exact match, and partial matches not permitted. */ if (available < state->min_width || (available == 0 && state->must_advance)) status = RE_ERROR_FAILURE; } if (status == RE_ERROR_SUCCESS) status = basic_match(safe_state, search); /* Has an error occurred, or is it a partial match? 
*/ if (status < 0) break; if (status == RE_ERROR_SUCCESS) { BOOL better; better = state->total_errors < fewest_errors; if (better) { BOOL same_match; fewest_errors = state->total_errors; state->max_errors = fewest_errors; save_fuzzy_counts(state, best_fuzzy_counts); same_match = state->match_pos == best_match_pos && state->text_pos == best_text_pos; same_match = FALSE; if (best_groups) { size_t g; /* Did we get the same match as the best so far? */ for (g = 0; same_match && g < pattern->public_group_count; g++) { same_match = state->groups[g].span.start == best_groups[g].span.start && state->groups[g].span.end == best_groups[g].span.end; } } /* Save the best result so far. */ best_groups = save_groups(safe_state, best_groups); if (!best_groups) { status = RE_ERROR_MEMORY; break; } best_match_pos = state->match_pos; best_text_pos = state->text_pos; if (same_match || state->total_errors == 0) break; state->max_errors = state->total_errors; if (state->max_errors < RE_MAX_ERRORS) --state->max_errors; } else break; if (state->reverse) { state->slice_start = state->text_pos; state->slice_end = state->match_pos; } else { state->slice_start = state->match_pos; state->slice_end = state->text_pos; } state->text_pos = state->match_pos; if (state->max_errors == PY_SSIZE_T_MAX) state->max_errors = 0; } else break; } state->slice_start = slice_start; state->slice_end = slice_end; if (best_groups) { if (status == RE_ERROR_SUCCESS && state->total_errors == 0) /* We have a perfect match, so discard the previous best match. */ discard_groups(safe_state, best_groups); else { /* Restore the previous best match. */ status = RE_ERROR_SUCCESS; state->match_pos = best_match_pos; state->text_pos = best_text_pos; restore_groups(safe_state, best_groups); restore_fuzzy_counts(state, best_fuzzy_counts); } } return status; } /* Performs a match or search from the current text position for a simple fuzzy * match.
*/ Py_LOCAL_INLINE(int) do_simple_fuzzy_match(RE_SafeState* safe_state, BOOL search) { RE_State* state; Py_ssize_t available; int status; TRACE(("<>\n")) state = safe_state->re_state; if (state->reverse) available = state->text_pos - state->slice_start; else available = state->slice_end - state->text_pos; /* The maximum permitted cost. */ state->max_errors = PY_SSIZE_T_MAX; state->best_match_pos = state->text_pos; state->best_text_pos = state->reverse ? state->slice_start : state->slice_end; /* Initialise the state. */ init_match(state); status = RE_ERROR_SUCCESS; if (state->max_errors == 0 && state->partial_side == RE_PARTIAL_NONE) { /* An exact match, and partial matches not permitted. */ if (available < state->min_width || (available == 0 && state->must_advance)) status = RE_ERROR_FAILURE; } if (status == RE_ERROR_SUCCESS) status = basic_match(safe_state, search); return status; } /* Performs a match or search from the current text position for an exact * match. */ Py_LOCAL_INLINE(int) do_exact_match(RE_SafeState* safe_state, BOOL search) { RE_State* state; Py_ssize_t available; int status; TRACE(("<>\n")) state = safe_state->re_state; if (state->reverse) available = state->text_pos - state->slice_start; else available = state->slice_end - state->text_pos; /* The maximum permitted cost. */ state->max_errors = 0; state->best_match_pos = state->text_pos; state->best_text_pos = state->reverse ? state->slice_start : state->slice_end; /* Initialise the state. */ init_match(state); status = RE_ERROR_SUCCESS; if (state->max_errors == 0 && state->partial_side == RE_PARTIAL_NONE) { /* An exact match, and partial matches not permitted. */ if (available < state->min_width || (available == 0 && state->must_advance)) status = RE_ERROR_FAILURE; } if (status == RE_ERROR_SUCCESS) status = basic_match(safe_state, search); return status; } /* Performs a match or search from the current text position. * * The state can sometimes be shared across threads. 
In such instances there's * a lock (mutex) on it. The lock is held for the duration of matching. */ Py_LOCAL_INLINE(int) do_match(RE_SafeState* safe_state, BOOL search) { RE_State* state; PatternObject* pattern; int status; TRACE(("<>\n")) state = safe_state->re_state; pattern = state->pattern; /* Is there enough to search? */ if (state->reverse) { if (state->text_pos < state->slice_start) return FALSE; } else { if (state->text_pos > state->slice_end) return FALSE; } /* Release the GIL. */ release_GIL(safe_state); if (pattern->is_fuzzy) { if (pattern->flags & RE_FLAG_BESTMATCH) status = do_best_fuzzy_match(safe_state, search); else if (pattern->flags & RE_FLAG_ENHANCEMATCH) status = do_enhanced_fuzzy_match(safe_state, search); else status = do_simple_fuzzy_match(safe_state, search); } else status = do_exact_match(safe_state, search); if (status == RE_ERROR_SUCCESS || status == RE_ERROR_PARTIAL) { Py_ssize_t max_end_index; RE_GroupInfo* group_info; size_t g; /* Store the results. */ state->lastindex = -1; state->lastgroup = -1; max_end_index = -1; if (status == RE_ERROR_PARTIAL) { /* We've matched up to the limit of the slice. */ if (state->reverse) state->text_pos = state->slice_start; else state->text_pos = state->slice_end; } /* Store the capture groups. */ group_info = pattern->group_info; for (g = 0; g < pattern->public_group_count; g++) { RE_GroupSpan* span; span = &state->groups[g].span; /* The string positions are of type Py_ssize_t, so the format needs * to specify that. */ TRACE(("group %d from %" PY_FORMAT_SIZE_T "d to %" PY_FORMAT_SIZE_T "d\n", g + 1, span->start, span->end)) if (span->start >= 0 && span->end >= 0 && group_info[g].end_index > max_end_index) { max_end_index = group_info[g].end_index; state->lastindex = (Py_ssize_t)g + 1; if (group_info[g].has_name) state->lastgroup = (Py_ssize_t)g + 1; } } } /* Re-acquire the GIL. 
*/ acquire_GIL(safe_state); if (status < 0 && status != RE_ERROR_PARTIAL && !PyErr_Occurred()) set_error(status, NULL); return status; } /* Gets a string from a Python object. * * If the function returns true and str_info->should_release is true then it's * the responsibility of the caller to release the buffer when it's no longer * needed. */ Py_LOCAL_INLINE(BOOL) get_string(PyObject* string, RE_StringInfo* str_info) { /* Given a Python object, return a data pointer, a length (in characters), * and a character size. Return FALSE if the object is not a string (or not * compatible). */ PyBufferProcs* buffer; Py_ssize_t bytes; Py_ssize_t size; /* Unicode objects do not support the buffer API. So, get the data directly * instead. */ if (PyUnicode_Check(string)) { /* Unicode strings don't always support the buffer interface. */ str_info->characters = (void*)PyUnicode_AS_DATA(string); str_info->length = PyUnicode_GET_SIZE(string); str_info->charsize = sizeof(Py_UNICODE); str_info->is_unicode = TRUE; str_info->should_release = FALSE; return TRUE; } /* Get pointer to string buffer. */ #if PY_VERSION_HEX >= 0x02060000 buffer = Py_TYPE(string)->tp_as_buffer; str_info->view.len = -1; #else buffer = string->ob_type->tp_as_buffer; #endif if (!buffer) { PyErr_SetString(PyExc_TypeError, "expected string or buffer"); return FALSE; } #if PY_VERSION_HEX >= 0x02060000 if (buffer->bf_getbuffer && (*buffer->bf_getbuffer)(string, &str_info->view, PyBUF_SIMPLE) >= 0) /* It's a new-style buffer. */ str_info->should_release = TRUE; else #endif if (buffer->bf_getreadbuffer && buffer->bf_getsegcount && buffer->bf_getsegcount(string, NULL) == 1) /* It's an old-style buffer. */ str_info->should_release = FALSE; else { PyErr_SetString(PyExc_TypeError, "expected string or buffer"); return FALSE; } /* Determine buffer size. */ #if PY_VERSION_HEX >= 0x02060000 if (str_info->should_release) { /* It's a new-style buffer.
*/ bytes = str_info->view.len; str_info->characters = str_info->view.buf; if (str_info->characters == NULL) { PyBuffer_Release(&str_info->view); PyErr_SetString(PyExc_ValueError, "buffer is NULL"); return FALSE; } } else #endif /* It's an old-style buffer. */ bytes = buffer->bf_getreadbuffer(string, 0, &str_info->characters); if (bytes < 0) { #if PY_VERSION_HEX >= 0x02060000 if (str_info->should_release) PyBuffer_Release(&str_info->view); #endif PyErr_SetString(PyExc_TypeError, "buffer has negative size"); return FALSE; } /* Determine character size. */ size = PyObject_Size(string); if (PyString_Check(string) || bytes == size) str_info->charsize = 1; else { #if PY_VERSION_HEX >= 0x02060000 if (str_info->should_release) PyBuffer_Release(&str_info->view); #endif PyErr_SetString(PyExc_TypeError, "buffer size mismatch"); return FALSE; } str_info->length = size; str_info->is_unicode = FALSE; return TRUE; } /* Deallocates the groups storage. */ Py_LOCAL_INLINE(void) dealloc_groups(RE_GroupData* groups, size_t group_count) { size_t g; if (!groups) return; for (g = 0; g < group_count; g++) re_dealloc(groups[g].captures); re_dealloc(groups); } /* Initialises a state object. 
*/ Py_LOCAL_INLINE(BOOL) state_init_2(RE_State* state, PatternObject* pattern, PyObject* string, RE_StringInfo* str_info, Py_ssize_t start, Py_ssize_t end, BOOL overlapped, int concurrent, BOOL partial, BOOL use_lock, BOOL visible_captures, BOOL match_all) { int i; Py_ssize_t final_pos; state->groups = NULL; state->best_match_groups = NULL; state->repeats = NULL; state->visible_captures = visible_captures; state->match_all = match_all; state->backtrack_block.previous = NULL; state->backtrack_block.next = NULL; state->backtrack_block.capacity = RE_BACKTRACK_BLOCK_SIZE; state->backtrack_allocated = RE_BACKTRACK_BLOCK_SIZE; state->current_atomic_block = NULL; state->first_saved_groups = NULL; state->current_saved_groups = NULL; state->first_saved_repeats = NULL; state->current_saved_repeats = NULL; state->lock = NULL; state->fuzzy_guards = NULL; state->first_group_call_frame = NULL; state->current_group_call_frame = NULL; state->group_call_guard_list = NULL; state->req_pos = -1; /* The call guards used by recursive patterns. */ if (pattern->call_ref_info_count > 0) { state->group_call_guard_list = (RE_GuardList*)re_alloc(pattern->call_ref_info_count * sizeof(RE_GuardList)); if (!state->group_call_guard_list) goto error; memset(state->group_call_guard_list, 0, pattern->call_ref_info_count * sizeof(RE_GuardList)); } /* The capture groups. 
*/ if (pattern->true_group_count) { size_t g; if (pattern->groups_storage) { state->groups = pattern->groups_storage; pattern->groups_storage = NULL; } else { state->groups = (RE_GroupData*)re_alloc(pattern->true_group_count * sizeof(RE_GroupData)); if (!state->groups) goto error; memset(state->groups, 0, pattern->true_group_count * sizeof(RE_GroupData)); for (g = 0; g < pattern->true_group_count; g++) { RE_GroupSpan* captures; captures = (RE_GroupSpan*)re_alloc(sizeof(RE_GroupSpan)); if (!captures) { size_t i; for (i = 0; i < g; i++) re_dealloc(state->groups[i].captures); goto error; } state->groups[g].captures = captures; state->groups[g].capture_capacity = 1; } } } /* Adjust boundaries. */ if (start < 0) start += str_info->length; if (start < 0) start = 0; else if (start > str_info->length) start = str_info->length; if (end < 0) end += str_info->length; if (end < 0) end = 0; else if (end > str_info->length) end = str_info->length; state->overlapped = overlapped; state->min_width = pattern->min_width; /* Initialise the getters and setters for the character size. */ state->charsize = str_info->charsize; state->is_unicode = str_info->is_unicode; #if PY_VERSION_HEX >= 0x02060000 /* Are we using a buffer object? If so, we need to copy the info. */ state->should_release = str_info->should_release; if (state->should_release) state->view = str_info->view; #endif switch (state->charsize) { case 1: state->char_at = bytes1_char_at; state->set_char_at = bytes1_set_char_at; state->point_to = bytes1_point_to; break; case 2: state->char_at = bytes2_char_at; state->set_char_at = bytes2_set_char_at; state->point_to = bytes2_point_to; break; case 4: state->char_at = bytes4_char_at; state->set_char_at = bytes4_set_char_at; state->point_to = bytes4_point_to; break; default: goto error; } state->encoding = pattern->encoding; state->locale_info = pattern->locale_info; /* The state object contains a reference to the string and also a pointer * to its contents. 
* * The documentation says that the end of the slice behaves like the end of * the string. */ state->text = str_info->characters; state->text_length = end; state->reverse = (pattern->flags & RE_FLAG_REVERSE) != 0; if (partial) state->partial_side = state->reverse ? RE_PARTIAL_LEFT : RE_PARTIAL_RIGHT; else state->partial_side = RE_PARTIAL_NONE; state->slice_start = start; state->slice_end = state->text_length; state->text_pos = state->reverse ? state->slice_end : state->slice_start; /* Point to the final newline and line separator if it's at the end of the * string, otherwise just -1. */ state->final_newline = -1; state->final_line_sep = -1; final_pos = state->text_length - 1; if (final_pos >= 0) { Py_UCS4 ch; ch = state->char_at(state->text, final_pos); if (ch == 0x0A) { /* The string ends with LF. */ state->final_newline = final_pos; state->final_line_sep = final_pos; /* Does the string end with CR/LF? */ --final_pos; if (final_pos >= 0 && state->char_at(state->text, final_pos) == 0x0D) state->final_line_sep = final_pos; } else { /* The string doesn't end with LF, but it could be another kind of * line separator. */ if (state->encoding->is_line_sep(ch)) state->final_line_sep = final_pos; } } /* If the 'new' behaviour is enabled then split correctly on zero-width * matches. 
*/ state->version_0 = (pattern->flags & RE_FLAG_VERSION1) == 0; state->must_advance = FALSE; state->pattern = pattern; state->string = string; if (pattern->repeat_count) { if (pattern->repeats_storage) { state->repeats = pattern->repeats_storage; pattern->repeats_storage = NULL; } else { state->repeats = (RE_RepeatData*)re_alloc(pattern->repeat_count * sizeof(RE_RepeatData)); if (!state->repeats) goto error; memset(state->repeats, 0, pattern->repeat_count * sizeof(RE_RepeatData)); } } if (pattern->fuzzy_count) { state->fuzzy_guards = (RE_FuzzyGuards*)re_alloc(pattern->fuzzy_count * sizeof(RE_FuzzyGuards)); if (!state->fuzzy_guards) goto error; memset(state->fuzzy_guards, 0, pattern->fuzzy_count * sizeof(RE_FuzzyGuards)); } Py_INCREF(state->pattern); Py_INCREF(state->string); /* Multithreading is allowed during matching when explicitly enabled or on * immutable strings. */ switch (concurrent) { case RE_CONC_NO: state->is_multithreaded = FALSE; break; case RE_CONC_YES: state->is_multithreaded = TRUE; break; default: state->is_multithreaded = PyUnicode_Check(string) || PyString_Check(string); break; } /* A state struct can sometimes be shared across threads. In such * instances, if multithreading is enabled we need to protect the state * with a lock (mutex) during matching. */ if (state->is_multithreaded && use_lock) state->lock = PyThread_allocate_lock(); for (i = 0; i < MAX_SEARCH_POSITIONS; i++) state->search_positions[i].start_pos = -1; return TRUE; error: re_dealloc(state->group_call_guard_list); re_dealloc(state->repeats); dealloc_groups(state->groups, pattern->true_group_count); re_dealloc(state->fuzzy_guards); state->repeats = NULL; state->groups = NULL; state->fuzzy_guards = NULL; return FALSE; } #if PY_VERSION_HEX >= 0x02060000 /* Releases the string's buffer, if necessary. */ Py_LOCAL_INLINE(void) release_buffer(RE_StringInfo* str_info) { if (str_info->should_release) PyBuffer_Release(&str_info->view); } #endif /* Initialises a state object. 
*/ Py_LOCAL_INLINE(BOOL) state_init(RE_State* state, PatternObject* pattern, PyObject* string, Py_ssize_t start, Py_ssize_t end, BOOL overlapped, int concurrent, BOOL partial, BOOL use_lock, BOOL visible_captures, BOOL match_all) { RE_StringInfo str_info; /* Get the string to search or match. */ if (!get_string(string, &str_info)) return FALSE; /* If we fail to initialise the state then we need to release the buffer if * the string is a buffer object. */ if (!state_init_2(state, pattern, string, &str_info, start, end, overlapped, concurrent, partial, use_lock, visible_captures, match_all)) { #if PY_VERSION_HEX >= 0x02060000 release_buffer(&str_info); #endif return FALSE; } /* The state has been initialised successfully, so now the state has the * responsibility of releasing the buffer if the string is a buffer object. */ return TRUE; } /* Deallocates repeat data. */ Py_LOCAL_INLINE(void) dealloc_repeats(RE_RepeatData* repeats, size_t repeat_count) { size_t i; if (!repeats) return; for (i = 0; i < repeat_count; i++) { re_dealloc(repeats[i].body_guard_list.spans); re_dealloc(repeats[i].tail_guard_list.spans); } re_dealloc(repeats); } /* Deallocates fuzzy guards. */ Py_LOCAL_INLINE(void) dealloc_fuzzy_guards(RE_FuzzyGuards* guards, size_t fuzzy_count) { size_t i; if (!guards) return; for (i = 0; i < fuzzy_count; i++) { re_dealloc(guards[i].body_guard_list.spans); re_dealloc(guards[i].tail_guard_list.spans); } re_dealloc(guards); } /* Finalises a state object, discarding its contents. */ Py_LOCAL_INLINE(void) state_fini(RE_State* state) { RE_BacktrackBlock* current_backtrack; RE_AtomicBlock* current_atomic; PatternObject* pattern; RE_SavedGroups* saved_groups; RE_SavedRepeats* saved_repeats; RE_GroupCallFrame* frame; size_t i; /* Discard the lock (mutex) if there's one. */ if (state->lock) PyThread_free_lock(state->lock); /* Deallocate the backtrack blocks. 
*/ current_backtrack = state->backtrack_block.next; while (current_backtrack) { RE_BacktrackBlock* next; next = current_backtrack->next; re_dealloc(current_backtrack); state->backtrack_allocated -= RE_BACKTRACK_BLOCK_SIZE; current_backtrack = next; } /* Deallocate the atomic blocks. */ current_atomic = state->current_atomic_block; while (current_atomic) { RE_AtomicBlock* next; next = current_atomic->next; re_dealloc(current_atomic); current_atomic = next; } state->current_atomic_block = NULL; pattern = state->pattern; saved_groups = state->first_saved_groups; while (saved_groups) { RE_SavedGroups* next; next = saved_groups->next; re_dealloc(saved_groups->spans); re_dealloc(saved_groups->counts); re_dealloc(saved_groups); saved_groups = next; } saved_repeats = state->first_saved_repeats; while (saved_repeats) { RE_SavedRepeats* next; next = saved_repeats->next; dealloc_repeats(saved_repeats->repeats, pattern->repeat_count); re_dealloc(saved_repeats); saved_repeats = next; } if (state->best_match_groups) dealloc_groups(state->best_match_groups, pattern->true_group_count); if (pattern->groups_storage) dealloc_groups(state->groups, pattern->true_group_count); else pattern->groups_storage = state->groups; if (pattern->repeats_storage) dealloc_repeats(state->repeats, pattern->repeat_count); else pattern->repeats_storage = state->repeats; frame = state->first_group_call_frame; while (frame) { RE_GroupCallFrame* next; next = frame->next; dealloc_groups(frame->groups, pattern->true_group_count); dealloc_repeats(frame->repeats, pattern->repeat_count); re_dealloc(frame); frame = next; } for (i = 0; i < pattern->call_ref_info_count; i++) re_dealloc(state->group_call_guard_list[i].spans); if (state->group_call_guard_list) re_dealloc(state->group_call_guard_list); if (state->fuzzy_guards) dealloc_fuzzy_guards(state->fuzzy_guards, pattern->fuzzy_count); Py_DECREF(state->pattern); Py_DECREF(state->string); #if PY_VERSION_HEX >= 0x02060000 if (state->should_release) 
PyBuffer_Release(&state->view); #endif } /* Converts a string index to an integer. * * If the index is None then the default will be returned. */ Py_LOCAL_INLINE(Py_ssize_t) as_string_index(PyObject* obj, Py_ssize_t def) { Py_ssize_t value; if (obj == Py_None) return def; value = PyInt_AsSsize_t(obj); if (value != -1 || !PyErr_Occurred()) return value; PyErr_Clear(); value = PyLong_AsLong(obj); if (value != -1 || !PyErr_Occurred()) return value; set_error(RE_ERROR_INDEX, NULL); return 0; } /* Deallocates a MatchObject. */ static void match_dealloc(PyObject* self_) { MatchObject* self; self = (MatchObject*)self_; Py_XDECREF(self->string); Py_XDECREF(self->substring); Py_DECREF(self->pattern); if (self->groups) re_dealloc(self->groups); Py_XDECREF(self->regs); PyObject_DEL(self); } /* Restricts a value to a range. */ Py_LOCAL_INLINE(Py_ssize_t) limited_range(Py_ssize_t value, Py_ssize_t lower, Py_ssize_t upper) { if (value < lower) return lower; if (value > upper) return upper; return value; } /* Gets a slice from a Unicode string. */ Py_LOCAL_INLINE(PyObject*) unicode_slice(PyObject* string, Py_ssize_t start, Py_ssize_t end) { Py_ssize_t length; Py_UNICODE* buffer; length = PyUnicode_GET_SIZE(string); start = limited_range(start, 0, length); end = limited_range(end, 0, length); buffer = PyUnicode_AsUnicode(string); return PyUnicode_FromUnicode(buffer + start, end - start); } /* Gets a slice from a bytestring. */ Py_LOCAL_INLINE(PyObject*) bytes_slice(PyObject* string, Py_ssize_t start, Py_ssize_t end) { Py_ssize_t length; char* buffer; length = PyString_GET_SIZE(string); start = limited_range(start, 0, length); end = limited_range(end, 0, length); buffer = PyString_AsString(string); return PyString_FromStringAndSize(buffer + start, end - start); } /* Gets a slice from a string, returning either a Unicode string or a * bytestring. 
*/ Py_LOCAL_INLINE(PyObject*) get_slice(PyObject* string, Py_ssize_t start, Py_ssize_t end) { if (PyUnicode_Check(string)) return unicode_slice(string, start, end); if (PyString_Check(string)) return bytes_slice(string, start, end); return PySequence_GetSlice(string, start, end); } /* Gets a MatchObject's group by integer index. */ static PyObject* match_get_group_by_index(MatchObject* self, Py_ssize_t index, PyObject* def) { RE_GroupSpan* span; if (index < 0 || (size_t)index > self->group_count) { /* Raise error if we were given a bad group number. */ set_error(RE_ERROR_NO_SUCH_GROUP, NULL); return NULL; } if (index == 0) return get_slice(self->substring, self->match_start - self->substring_offset, self->match_end - self->substring_offset); /* Capture group indexes are 1-based (excluding group 0, which is the * entire matched string). */ span = &self->groups[index - 1].span; if (span->start < 0 || span->end < 0) { /* Return default value if the string or group is undefined. */ Py_INCREF(def); return def; } return get_slice(self->substring, span->start - self->substring_offset, span->end - self->substring_offset); } /* Gets a MatchObject's start by integer index. */ static PyObject* match_get_start_by_index(MatchObject* self, Py_ssize_t index) { RE_GroupSpan* span; if (index < 0 || (size_t)index > self->group_count) { /* Raise error if we were given a bad group number. */ set_error(RE_ERROR_NO_SUCH_GROUP, NULL); return NULL; } if (index == 0) return Py_BuildValue("n", self->match_start); /* Capture group indexes are 1-based (excluding group 0, which is the * entire matched string). */ span = &self->groups[index - 1].span; return Py_BuildValue("n", span->start); } /* Gets a MatchObject's starts by integer index. */ static PyObject* match_get_starts_by_index(MatchObject* self, Py_ssize_t index) { RE_GroupData* group; PyObject* result; PyObject* item; size_t i; if (index < 0 || (size_t)index > self->group_count) { /* Raise error if we were given a bad group number. 
*/ set_error(RE_ERROR_NO_SUCH_GROUP, NULL); return NULL; } if (index == 0) { result = PyList_New(1); if (!result) return NULL; item = Py_BuildValue("n", self->match_start); if (!item) goto error; /* PyList_SET_ITEM borrows the reference. */ PyList_SET_ITEM(result, 0, item); return result; } /* Capture group indexes are 1-based (excluding group 0, which is the * entire matched string). */ group = &self->groups[index - 1]; result = PyList_New((Py_ssize_t)group->capture_count); if (!result) return NULL; for (i = 0; i < group->capture_count; i++) { item = Py_BuildValue("n", group->captures[i].start); if (!item) goto error; /* PyList_SET_ITEM borrows the reference. */ PyList_SET_ITEM(result, i, item); } return result; error: Py_DECREF(result); return NULL; } /* Gets a MatchObject's end by integer index. */ static PyObject* match_get_end_by_index(MatchObject* self, Py_ssize_t index) { RE_GroupSpan* span; if (index < 0 || (size_t)index > self->group_count) { /* Raise error if we were given a bad group number. */ set_error(RE_ERROR_NO_SUCH_GROUP, NULL); return NULL; } if (index == 0) return Py_BuildValue("n", self->match_end); /* Capture group indexes are 1-based (excluding group 0, which is the * entire matched string). */ span = &self->groups[index - 1].span; return Py_BuildValue("n", span->end); } /* Gets a MatchObject's ends by integer index. */ static PyObject* match_get_ends_by_index(MatchObject* self, Py_ssize_t index) { RE_GroupData* group; PyObject* result; PyObject* item; size_t i; if (index < 0 || (size_t)index > self->group_count) { /* Raise error if we were given a bad group number. */ set_error(RE_ERROR_NO_SUCH_GROUP, NULL); return NULL; } if (index == 0) { result = PyList_New(1); if (!result) return NULL; item = Py_BuildValue("n", self->match_end); if (!item) goto error; /* PyList_SET_ITEM borrows the reference. 
*/ PyList_SET_ITEM(result, 0, item); return result; } /* Capture group indexes are 1-based (excluding group 0, which is the * entire matched string). */ group = &self->groups[index - 1]; result = PyList_New((Py_ssize_t)group->capture_count); if (!result) return NULL; for (i = 0; i < group->capture_count; i++) { item = Py_BuildValue("n", group->captures[i].end); if (!item) goto error; /* PyList_SET_ITEM borrows the reference. */ PyList_SET_ITEM(result, i, item); } return result; error: Py_DECREF(result); return NULL; } /* Gets a MatchObject's span by integer index. */ static PyObject* match_get_span_by_index(MatchObject* self, Py_ssize_t index) { RE_GroupSpan* span; if (index < 0 || (size_t)index > self->group_count) { /* Raise error if we were given a bad group number. */ set_error(RE_ERROR_NO_SUCH_GROUP, NULL); return NULL; } if (index == 0) return Py_BuildValue("nn", self->match_start, self->match_end); /* Capture group indexes are 1-based (excluding group 0, which is the * entire matched string). */ span = &self->groups[index - 1].span; return Py_BuildValue("nn", span->start, span->end); } /* Gets a MatchObject's spans by integer index. */ static PyObject* match_get_spans_by_index(MatchObject* self, Py_ssize_t index) { PyObject* result; PyObject* item; RE_GroupData* group; size_t i; if (index < 0 || (size_t)index > self->group_count) { /* Raise error if we were given a bad group number. */ set_error(RE_ERROR_NO_SUCH_GROUP, NULL); return NULL; } if (index == 0) { result = PyList_New(1); if (!result) return NULL; item = Py_BuildValue("nn", self->match_start, self->match_end); if (!item) goto error; /* PyList_SET_ITEM borrows the reference. */ PyList_SET_ITEM(result, 0, item); return result; } /* Capture group indexes are 1-based (excluding group 0, which is the * entire matched string). 
*/ group = &self->groups[index - 1]; result = PyList_New((Py_ssize_t)group->capture_count); if (!result) return NULL; for (i = 0; i < group->capture_count; i++) { item = Py_BuildValue("nn", group->captures[i].start, group->captures[i].end); if (!item) goto error; /* PyList_SET_ITEM borrows the reference. */ PyList_SET_ITEM(result, i, item); } return result; error: Py_DECREF(result); return NULL; } /* Gets a MatchObject's captures by integer index. */ static PyObject* match_get_captures_by_index(MatchObject* self, Py_ssize_t index) { PyObject* result; PyObject* slice; RE_GroupData* group; size_t i; if (index < 0 || (size_t)index > self->group_count) { /* Raise error if we were given a bad group number. */ set_error(RE_ERROR_NO_SUCH_GROUP, NULL); return NULL; } if (index == 0) { result = PyList_New(1); if (!result) return NULL; slice = get_slice(self->substring, self->match_start - self->substring_offset, self->match_end - self->substring_offset); if (!slice) goto error; /* PyList_SET_ITEM borrows the reference. */ PyList_SET_ITEM(result, 0, slice); return result; } /* Capture group indexes are 1-based (excluding group 0, which is the * entire matched string). */ group = &self->groups[index - 1]; result = PyList_New((Py_ssize_t)group->capture_count); if (!result) return NULL; for (i = 0; i < group->capture_count; i++) { slice = get_slice(self->substring, group->captures[i].start - self->substring_offset, group->captures[i].end - self->substring_offset); if (!slice) goto error; /* PyList_SET_ITEM borrows the reference. */ PyList_SET_ITEM(result, i, slice); } return result; error: Py_DECREF(result); return NULL; } /* Converts a group index to an integer. 
*/ Py_LOCAL_INLINE(Py_ssize_t) as_group_index(PyObject* obj) { Py_ssize_t value; value = PyInt_AsSsize_t(obj); if (value != -1 || !PyErr_Occurred()) return value; PyErr_Clear(); value = PyLong_AsLong(obj); if (value != -1 || !PyErr_Occurred()) return value; set_error(RE_ERROR_INDEX, NULL); return -1; } /* Gets a MatchObject's group index. * * The supplied index can be an integer or a string (group name) object. */ Py_LOCAL_INLINE(Py_ssize_t) match_get_group_index(MatchObject* self, PyObject* index, BOOL allow_neg) { Py_ssize_t group; /* Is the index an integer? */ group = as_group_index(index); if (group != -1 || !PyErr_Occurred()) { Py_ssize_t min_group = 0; /* Adjust negative indices where valid and allowed. */ if (group < 0 && allow_neg) { group += (Py_ssize_t)self->group_count + 1; min_group = 1; } if (min_group <= group && (size_t)group <= self->group_count) return group; return -1; } /* The index might be a group name. */ if (self->pattern->groupindex) { /* Look up the name. */ PyErr_Clear(); index = PyObject_GetItem(self->pattern->groupindex, index); if (index) { /* Check that we have an integer. */ group = as_group_index(index); Py_DECREF(index); if (group != -1 || !PyErr_Occurred()) return group; } } PyErr_Clear(); return -1; } /* Gets a MatchObject's group by object index. */ Py_LOCAL_INLINE(PyObject*) match_get_group(MatchObject* self, PyObject* index, PyObject* def, BOOL allow_neg) { /* Check that the index is an integer or a string. */ if (PyInt_Check(index) || PyLong_Check(index) || PyUnicode_Check(index) || PyString_Check(index)) return match_get_group_by_index(self, match_get_group_index(self, index, allow_neg), def); set_error(RE_ERROR_GROUP_INDEX_TYPE, index); return NULL; } /* Gets info from a MatchObject by object index. */ Py_LOCAL_INLINE(PyObject*) get_by_arg(MatchObject* self, PyObject* index, RE_GetByIndexFunc get_by_index) { /* Check that the index is an integer or a string. 
*/ if (PyInt_Check(index) || PyLong_Check(index) || PyUnicode_Check(index) || PyString_Check(index)) return get_by_index(self, match_get_group_index(self, index, FALSE)); set_error(RE_ERROR_GROUP_INDEX_TYPE, index); return NULL; } /* MatchObject's 'group' method. */ static PyObject* match_group(MatchObject* self, PyObject* args) { Py_ssize_t size; PyObject* result; Py_ssize_t i; size = PyTuple_GET_SIZE(args); switch (size) { case 0: /* group() */ result = match_get_group_by_index(self, 0, Py_None); break; case 1: /* group(x). PyTuple_GET_ITEM borrows the reference. */ result = match_get_group(self, PyTuple_GET_ITEM(args, 0), Py_None, FALSE); break; default: /* group(x, y, z, ...) */ /* Fetch multiple items. */ result = PyTuple_New(size); if (!result) return NULL; for (i = 0; i < size; i++) { PyObject* item; /* PyTuple_GET_ITEM borrows the reference. */ item = match_get_group(self, PyTuple_GET_ITEM(args, i), Py_None, FALSE); if (!item) { Py_DECREF(result); return NULL; } /* PyTuple_SET_ITEM borrows the reference. */ PyTuple_SET_ITEM(result, i, item); } break; } return result; } /* Generic method for getting info from a MatchObject. */ Py_LOCAL_INLINE(PyObject*) get_from_match(MatchObject* self, PyObject* args, RE_GetByIndexFunc get_by_index) { Py_ssize_t size; PyObject* result; Py_ssize_t i; size = PyTuple_GET_SIZE(args); switch (size) { case 0: /* get() */ result = get_by_index(self, 0); break; case 1: /* get(x). PyTuple_GET_ITEM borrows the reference. */ result = get_by_arg(self, PyTuple_GET_ITEM(args, 0), get_by_index); break; default: /* get(x, y, z, ...) */ /* Fetch multiple items. */ result = PyTuple_New(size); if (!result) return NULL; for (i = 0; i < size; i++) { PyObject* item; /* PyTuple_GET_ITEM borrows the reference. */ item = get_by_arg(self, PyTuple_GET_ITEM(args, i), get_by_index); if (!item) { Py_DECREF(result); return NULL; } /* PyTuple_SET_ITEM borrows the reference. 
*/ PyTuple_SET_ITEM(result, i, item); } break; } return result; } /* MatchObject's 'start' method. */ static PyObject* match_start(MatchObject* self, PyObject* args) { return get_from_match(self, args, match_get_start_by_index); } /* MatchObject's 'starts' method. */ static PyObject* match_starts(MatchObject* self, PyObject* args) { return get_from_match(self, args, match_get_starts_by_index); } /* MatchObject's 'end' method. */ static PyObject* match_end(MatchObject* self, PyObject* args) { return get_from_match(self, args, match_get_end_by_index); } /* MatchObject's 'ends' method. */ static PyObject* match_ends(MatchObject* self, PyObject* args) { return get_from_match(self, args, match_get_ends_by_index); } /* MatchObject's 'span' method. */ static PyObject* match_span(MatchObject* self, PyObject* args) { return get_from_match(self, args, match_get_span_by_index); } /* MatchObject's 'spans' method. */ static PyObject* match_spans(MatchObject* self, PyObject* args) { return get_from_match(self, args, match_get_spans_by_index); } /* MatchObject's 'captures' method. */ static PyObject* match_captures(MatchObject* self, PyObject* args) { return get_from_match(self, args, match_get_captures_by_index); } /* MatchObject's 'groups' method. */ static PyObject* match_groups(MatchObject* self, PyObject* args, PyObject* kwargs) { PyObject* result; size_t g; PyObject* def = Py_None; static char* kwlist[] = { "default", NULL }; if (!PyArg_ParseTupleAndKeywords(args, kwargs, "|O:groups", kwlist, &def)) return NULL; result = PyTuple_New((Py_ssize_t)self->group_count); if (!result) return NULL; /* Group 0 is the entire matched portion of the string. */ for (g = 0; g < self->group_count; g++) { PyObject* item; item = match_get_group_by_index(self, (Py_ssize_t)g + 1, def); if (!item) goto error; /* PyTuple_SET_ITEM borrows the reference. */ PyTuple_SET_ITEM(result, g, item); } return result; error: Py_DECREF(result); return NULL; } /* MatchObject's 'groupdict' method. 
*/ static PyObject* match_groupdict(MatchObject* self, PyObject* args, PyObject* kwargs) { PyObject* result; PyObject* keys; Py_ssize_t g; PyObject* def = Py_None; static char* kwlist[] = { "default", NULL }; if (!PyArg_ParseTupleAndKeywords(args, kwargs, "|O:groupdict", kwlist, &def)) return NULL; result = PyDict_New(); if (!result || !self->pattern->groupindex) return result; keys = PyMapping_Keys(self->pattern->groupindex); if (!keys) goto failed; for (g = 0; g < PyList_GET_SIZE(keys); g++) { PyObject* key; PyObject* value; int status; /* PyList_GET_ITEM borrows a reference. */ key = PyList_GET_ITEM(keys, g); if (!key) goto failed; value = match_get_group(self, key, def, FALSE); if (!value) goto failed; status = PyDict_SetItem(result, key, value); Py_DECREF(value); if (status < 0) goto failed; } Py_DECREF(keys); return result; failed: Py_XDECREF(keys); Py_DECREF(result); return NULL; } /* MatchObject's 'capturesdict' method. */ static PyObject* match_capturesdict(MatchObject* self) { PyObject* result; PyObject* keys; Py_ssize_t g; result = PyDict_New(); if (!result || !self->pattern->groupindex) return result; keys = PyMapping_Keys(self->pattern->groupindex); if (!keys) goto failed; for (g = 0; g < PyList_GET_SIZE(keys); g++) { PyObject* key; Py_ssize_t group; PyObject* captures; int status; /* PyList_GET_ITEM borrows a reference. */ key = PyList_GET_ITEM(keys, g); if (!key) goto failed; group = match_get_group_index(self, key, FALSE); if (group < 0) goto failed; captures = match_get_captures_by_index(self, group); if (!captures) goto failed; status = PyDict_SetItem(result, key, captures); Py_DECREF(captures); if (status < 0) goto failed; } Py_DECREF(keys); return result; failed: Py_XDECREF(keys); Py_DECREF(result); return NULL; } /* Gets a Python object by name from a named module. 
*/ Py_LOCAL_INLINE(PyObject*) get_object(char* module_name, char* object_name) { PyObject* module; PyObject* object; module = PyImport_ImportModule(module_name); if (!module) return NULL; object = PyObject_GetAttrString(module, object_name); Py_DECREF(module); return object; } /* Calls a function in a module. */ Py_LOCAL_INLINE(PyObject*) call(char* module_name, char* function_name, PyObject* args) { PyObject* function; PyObject* result; if (!args) return NULL; function = get_object(module_name, function_name); if (!function) return NULL; result = PyObject_CallObject(function, args); Py_DECREF(function); Py_DECREF(args); return result; } /* Gets a replacement item from the replacement list. * * The replacement item could be a string literal or a group. */ Py_LOCAL_INLINE(PyObject*) get_match_replacement(MatchObject* self, PyObject* item, size_t group_count) { Py_ssize_t index; if (PyUnicode_Check(item) || PyString_Check(item)) { /* It's a literal, which can be added directly to the list. */ Py_INCREF(item); return item; } /* Is it a group reference? */ index = as_group_index(item); if (index == -1 && PyErr_Occurred()) { /* Not a group either! */ set_error(RE_ERROR_REPLACEMENT, NULL); return NULL; } if (index == 0) { /* The entire matched portion of the string. */ return get_slice(self->substring, self->match_start - self->substring_offset, self->match_end - self->substring_offset); } else if (index >= 1 && (size_t)index <= group_count) { /* A group. If it didn't match then return None instead. */ RE_GroupData* group; group = &self->groups[index - 1]; if (group->capture_count > 0) return get_slice(self->substring, group->span.start - self->substring_offset, group->span.end - self->substring_offset); else { Py_INCREF(Py_None); return Py_None; } } else { /* No such group. */ set_error(RE_ERROR_NO_SUCH_GROUP, NULL); return NULL; } } /* Initialises the join list. 
*/ Py_LOCAL_INLINE(void) init_join_list(JoinInfo* join_info, BOOL reversed, BOOL is_unicode) { join_info->list = NULL; join_info->item = NULL; join_info->reversed = reversed; join_info->is_unicode = is_unicode; } /* Adds an item to the join list. */ Py_LOCAL_INLINE(int) add_to_join_list(JoinInfo* join_info, PyObject* item) { PyObject* new_item; int status; if (join_info->is_unicode) { if (PyUnicode_Check(item)) { new_item = item; Py_INCREF(new_item); } else { new_item = PyUnicode_FromObject(item); if (!new_item) { set_error(RE_ERROR_NOT_UNICODE, item); return RE_ERROR_NOT_UNICODE; } } } else { if (PyString_Check(item)) { new_item = item; Py_INCREF(new_item); } else { new_item = PyUnicode_FromObject(item); if (!new_item) { set_error(RE_ERROR_NOT_STRING, item); return RE_ERROR_NOT_STRING; } } } /* If the list already exists then just add the item to it. */ if (join_info->list) { status = PyList_Append(join_info->list, new_item); if (status < 0) goto error; Py_DECREF(new_item); return status; } /* If we already have an item then we now have 2(!) and we need to put them * into a list. */ if (join_info->item) { join_info->list = PyList_New(2); if (!join_info->list) { status = RE_ERROR_MEMORY; goto error; } /* PyList_SET_ITEM borrows the reference. */ PyList_SET_ITEM(join_info->list, 0, join_info->item); join_info->item = NULL; /* PyList_SET_ITEM borrows the reference. */ PyList_SET_ITEM(join_info->list, 1, new_item); return 0; } /* This is the first item. */ join_info->item = new_item; return 0; error: Py_DECREF(new_item); set_error(status, NULL); return status; } /* Clears the join list. */ Py_LOCAL_INLINE(void) clear_join_list(JoinInfo* join_info) { Py_XDECREF(join_info->list); Py_XDECREF(join_info->item); } /* Joins together a list of strings for pattern_subx. */ Py_LOCAL_INLINE(PyObject*) join_list_info(JoinInfo* join_info) { /* If the list already exists then just do the join. 
*/ if (join_info->list) { PyObject* joiner; PyObject* result; if (join_info->reversed) /* The list needs to be reversed before being joined. */ PyList_Reverse(join_info->list); if (join_info->is_unicode) { /* Concatenate the Unicode strings. */ joiner = PyUnicode_FromUnicode(NULL, 0); if (!joiner) { clear_join_list(join_info); return NULL; } result = PyUnicode_Join(joiner, join_info->list); } else { joiner = PyString_FromString(""); if (!joiner) { clear_join_list(join_info); return NULL; } /* Concatenate the bytestrings. */ result = _PyString_Join(joiner, join_info->list); } Py_DECREF(joiner); clear_join_list(join_info); return result; } /* We have only 1 item, so we'll just return it. */ if (join_info->item) return join_info->item; /* There are no items, so return an empty string. */ if (join_info->is_unicode) return PyUnicode_FromUnicode(NULL, 0); else return PyString_FromString(""); } /* Checks whether a string replacement is a literal. * * To keep it simple we'll say that a literal is a string which can be used * as-is. * * Returns its length if it is a literal, otherwise -1. */ Py_LOCAL_INLINE(Py_ssize_t) check_replacement_string(PyObject* str_replacement, unsigned char special_char) { RE_StringInfo str_info; Py_UCS4 (*char_at)(void* text, Py_ssize_t pos); Py_ssize_t pos; if (!get_string(str_replacement, &str_info)) return -1; switch (str_info.charsize) { case 1: char_at = bytes1_char_at; break; case 2: char_at = bytes2_char_at; break; case 4: char_at = bytes4_char_at; break; default: #if PY_VERSION_HEX >= 0x02060000 release_buffer(&str_info); #endif return -1; } for (pos = 0; pos < str_info.length; pos++) { if (char_at(str_info.characters, pos) == special_char) { #if PY_VERSION_HEX >= 0x02060000 release_buffer(&str_info); #endif return -1; } } #if PY_VERSION_HEX >= 0x02060000 release_buffer(&str_info); #endif return str_info.length; } /* MatchObject's 'expand' method. 
*/ static PyObject* match_expand(MatchObject* self, PyObject* str_template) { Py_ssize_t literal_length; PyObject* replacement; JoinInfo join_info; Py_ssize_t size; Py_ssize_t i; /* Is the template just a literal? */ literal_length = check_replacement_string(str_template, '\\'); if (literal_length >= 0) { /* It's a literal. */ Py_INCREF(str_template); return str_template; } /* Hand the template to the template compiler. */ replacement = call(RE_MODULE, "_compile_replacement_helper", PyTuple_Pack(2, self->pattern, str_template)); if (!replacement) return NULL; init_join_list(&join_info, FALSE, PyUnicode_Check(self->string)); /* Add each part of the template to the list. */ size = PyList_GET_SIZE(replacement); for (i = 0; i < size; i++) { PyObject* item; PyObject* str_item; /* PyList_GET_ITEM borrows a reference. */ item = PyList_GET_ITEM(replacement, i); str_item = get_match_replacement(self, item, self->group_count); if (!str_item) goto error; /* Add to the list. */ if (str_item == Py_None) Py_DECREF(str_item); else { int status; status = add_to_join_list(&join_info, str_item); Py_DECREF(str_item); if (status < 0) goto error; } } Py_DECREF(replacement); /* Convert the list to a single string (also cleans up join_info). */ return join_list_info(&join_info); error: clear_join_list(&join_info); Py_DECREF(replacement); return NULL; } #if PY_VERSION_HEX >= 0x02060000 /* Gets a MatchObject's group dictionary. */ Py_LOCAL_INLINE(PyObject*) match_get_group_dict(MatchObject* self) { PyObject* result; PyObject* keys; Py_ssize_t g; result = PyDict_New(); if (!result || !self->pattern->groupindex) return result; keys = PyMapping_Keys(self->pattern->groupindex); if (!keys) goto failed; for (g = 0; g < PyList_GET_SIZE(keys); g++) { PyObject* key; PyObject* value; int status; /* PyList_GET_ITEM borrows a reference. 
*/ key = PyList_GET_ITEM(keys, g); if (!key) goto failed; value = match_get_group(self, key, Py_None, FALSE); if (!value) goto failed; status = PyDict_SetItem(result, key, value); Py_DECREF(value); if (status < 0) goto failed; } Py_DECREF(keys); return result; failed: Py_XDECREF(keys); Py_DECREF(result); return NULL; } static PyTypeObject Capture_Type = { PyObject_HEAD_INIT(NULL) 0, "_" RE_MODULE "." "Capture", sizeof(MatchObject) }; /* Creates a new CaptureObject. */ Py_LOCAL_INLINE(PyObject*) make_capture_object(MatchObject** match_indirect, Py_ssize_t index) { CaptureObject* capture; capture = PyObject_NEW(CaptureObject, &Capture_Type); if (!capture) return NULL; capture->group_index = index; capture->match_indirect = match_indirect; return (PyObject*)capture; } #if PY_VERSION_HEX >= 0x02060000 /* Makes a MatchObject's capture dictionary. */ Py_LOCAL_INLINE(PyObject*) make_capture_dict(MatchObject* match, MatchObject** match_indirect) { PyObject* result; PyObject* keys; PyObject* values = NULL; Py_ssize_t g; result = PyDict_New(); if (!result) return result; keys = PyMapping_Keys(match->pattern->groupindex); if (!keys) goto failed; values = PyMapping_Values(match->pattern->groupindex); if (!values) goto failed; for (g = 0; g < PyList_GET_SIZE(keys); g++) { PyObject* key; PyObject* value; Py_ssize_t v; int status; /* PyList_GET_ITEM borrows a reference. */ key = PyList_GET_ITEM(keys, g); if (!key) goto failed; /* PyList_GET_ITEM borrows a reference. */ value = PyList_GET_ITEM(values, g); if (!value) goto failed; v = PyLong_AsLong(value); if (v == -1 && PyErr_Occurred()) goto failed; value = make_capture_object(match_indirect, v); if (!value) goto failed; status = PyDict_SetItem(result, key, value); Py_DECREF(value); if (status < 0) goto failed; } Py_DECREF(values); Py_DECREF(keys); return result; failed: Py_XDECREF(values); Py_XDECREF(keys); Py_DECREF(result); return NULL; } #endif /* MatchObject's 'expandf' method. 
*/ static PyObject* match_expandf(MatchObject* self, PyObject* str_template) { PyObject* format_func; PyObject* args = NULL; size_t g; PyObject* kwargs = NULL; PyObject* result; format_func = PyObject_GetAttrString(str_template, "format"); if (!format_func) return NULL; args = PyTuple_New((Py_ssize_t)self->group_count + 1); if (!args) goto error; for (g = 0; g < self->group_count + 1; g++) /* PyTuple_SetItem borrows the reference. */ PyTuple_SetItem(args, (Py_ssize_t)g, make_capture_object(&self, (Py_ssize_t)g)); kwargs = make_capture_dict(self, &self); if (!kwargs) goto error; result = PyObject_Call(format_func, args, kwargs); Py_DECREF(kwargs); Py_DECREF(args); Py_DECREF(format_func); return result; error: Py_XDECREF(args); Py_DECREF(format_func); return NULL; } #endif Py_LOCAL_INLINE(PyObject*) make_match_copy(MatchObject* self); /* MatchObject's '__copy__' method. */ static PyObject* match_copy(MatchObject* self, PyObject *unused) { return make_match_copy(self); } /* MatchObject's '__deepcopy__' method. */ static PyObject* match_deepcopy(MatchObject* self, PyObject* memo) { return make_match_copy(self); } /* MatchObject's 'regs' attribute. */ static PyObject* match_regs(MatchObject* self) { PyObject* regs; PyObject* item; size_t g; regs = PyTuple_New((Py_ssize_t)self->group_count + 1); if (!regs) return NULL; item = Py_BuildValue("nn", self->match_start, self->match_end); if (!item) goto error; /* PyTuple_SET_ITEM borrows the reference. */ PyTuple_SET_ITEM(regs, 0, item); for (g = 0; g < self->group_count; g++) { RE_GroupSpan* span; span = &self->groups[g].span; item = Py_BuildValue("nn", span->start, span->end); if (!item) goto error; /* PyTuple_SET_ITEM borrows the reference. */ PyTuple_SET_ITEM(regs, g + 1, item); } Py_INCREF(regs); self->regs = regs; return regs; error: Py_DECREF(regs); return NULL; } /* MatchObject's slice method. 
*/ Py_LOCAL_INLINE(PyObject*) match_get_group_slice(MatchObject* self, PyObject* slice) { Py_ssize_t start; Py_ssize_t end; Py_ssize_t step; Py_ssize_t slice_length; if (PySlice_GetIndicesEx((PySliceObject*)slice, (Py_ssize_t)self->group_count + 1, &start, &end, &step, &slice_length) < 0) return NULL; if (slice_length <= 0) return PyTuple_New(0); else { PyObject* result; Py_ssize_t cur; Py_ssize_t i; result = PyTuple_New(slice_length); if (!result) return NULL; cur = start; for (i = 0; i < slice_length; i++) { /* PyTuple_SetItem steals the reference. */ PyTuple_SetItem(result, i, match_get_group_by_index(self, cur, Py_None)); cur += step; } return result; } } /* MatchObject's length method. */ Py_LOCAL_INLINE(Py_ssize_t) match_length(MatchObject* self) { return (Py_ssize_t)self->group_count + 1; } /* MatchObject's '__getitem__' method. */ static PyObject* match_getitem(MatchObject* self, PyObject* item) { if (PySlice_Check(item)) return match_get_group_slice(self, item); return match_get_group(self, item, Py_None, TRUE); } /* Determines the portion of the target string which is covered by the group * captures. */ Py_LOCAL_INLINE(void) determine_target_substring(MatchObject* match, Py_ssize_t* slice_start, Py_ssize_t* slice_end) { Py_ssize_t start; Py_ssize_t end; size_t g; start = match->pos; end = match->endpos; for (g = 0; g < match->group_count; g++) { RE_GroupSpan* span; size_t c; span = &match->groups[g].span; if (span->start >= 0 && span->start < start) start = span->start; if (span->end >= 0 && span->end > end) end = span->end; for (c = 0; c < match->groups[g].capture_count; c++) { RE_GroupSpan* span; /* Examine each capture in turn, not just the first one. */ span = &match->groups[g].captures[c]; if (span->start >= 0 && span->start < start) start = span->start; if (span->end >= 0 && span->end > end) end = span->end; } } *slice_start = start; *slice_end = end; } /* MatchObject's 'detach_string' method. 
*/ static PyObject* match_detach_string(MatchObject* self, PyObject* unused) { if (self->string) { Py_ssize_t start; Py_ssize_t end; PyObject* substring; determine_target_substring(self, &start, &end); substring = get_slice(self->string, start, end); if (substring) { Py_XDECREF(self->substring); self->substring = substring; self->substring_offset = start; Py_DECREF(self->string); self->string = NULL; } } Py_INCREF(Py_None); return Py_None; } /* The documentation of a MatchObject. */ PyDoc_STRVAR(match_group_doc, "group([group1, ...]) --> string or tuple of strings.\n\ Return one or more subgroups of the match. If there is a single argument,\n\ the result is a single string, or None if the group did not contribute to\n\ the match; if there are multiple arguments, the result is a tuple with one\n\ item per argument; if there are no arguments, the whole match is returned.\n\ Group 0 is the whole match."); PyDoc_STRVAR(match_start_doc, "start([group1, ...]) --> int or tuple of ints.\n\ Return the index of the start of one or more subgroups of the match. If\n\ there is a single argument, the result is an index, or -1 if the group did\n\ not contribute to the match; if there are multiple arguments, the result is\n\ a tuple with one item per argument; if there are no arguments, the index of\n\ the start of the whole match is returned. Group 0 is the whole match."); PyDoc_STRVAR(match_end_doc, "end([group1, ...]) --> int or tuple of ints.\n\ Return the index of the end of one or more subgroups of the match. If there\n\ is a single argument, the result is an index, or -1 if the group did not\n\ contribute to the match; if there are multiple arguments, the result is a\n\ tuple with one item per argument; if there are no arguments, the index of\n\ the end of the whole match is returned. 
Group 0 is the whole match."); PyDoc_STRVAR(match_span_doc, "span([group1, ...]) --> 2-tuple of int or tuple of 2-tuple of ints.\n\ Return the span (a 2-tuple of the indices of the start and end) of one or\n\ more subgroups of the match. If there is a single argument, the result is a\n\ span, or (-1, -1) if the group did not contribute to the match; if there are\n\ multiple arguments, the result is a tuple with one item per argument; if\n\ there are no arguments, the span of the whole match is returned. Group 0 is\n\ the whole match."); PyDoc_STRVAR(match_groups_doc, "groups(default=None) --> tuple of strings.\n\ Return a tuple containing all the subgroups of the match. The argument is\n\ the default for groups that did not participate in the match."); PyDoc_STRVAR(match_groupdict_doc, "groupdict(default=None) --> dict.\n\ Return a dictionary containing all the named subgroups of the match, keyed\n\ by the subgroup name. The argument is the value to be given for groups that\n\ did not participate in the match."); PyDoc_STRVAR(match_capturesdict_doc, "capturesdict() --> dict.\n\ Return a dictionary containing the captures of all the named subgroups of the\n\ match, keyed by the subgroup name."); PyDoc_STRVAR(match_expand_doc, "expand(template) --> string.\n\ Return the string obtained by doing backslash substitution on the template,\n\ as done by the sub() method."); #if PY_VERSION_HEX >= 0x02060000 PyDoc_STRVAR(match_expandf_doc, "expandf(format) --> string.\n\ Return the string obtained by using the format, as done by the subf()\n\ method."); #endif PyDoc_STRVAR(match_captures_doc, "captures([group1, ...]) --> list of strings or tuple of list of strings.\n\ Return the captures of one or more subgroups of the match. If there is a\n\ single argument, the result is a list of strings; if there are multiple\n\ arguments, the result is a tuple of lists with one item per argument; if\n\ there are no arguments, the captures of the whole match are returned. 
Group\n\ 0 is the whole match."); PyDoc_STRVAR(match_starts_doc, "starts([group1, ...]) --> list of ints or tuple of list of ints.\n\ Return the indices of the starts of the captures of one or more subgroups of\n\ the match. If there is a single argument, the result is a list of indices;\n\ if there are multiple arguments, the result is a tuple of lists with one\n\ item per argument; if there are no arguments, the indices of the starts of\n\ the captures of the whole match are returned. Group 0 is the whole match."); PyDoc_STRVAR(match_ends_doc, "ends([group1, ...]) --> list of ints or tuple of list of ints.\n\ Return the indices of the ends of the captures of one or more subgroups of\n\ the match. If there is a single argument, the result is a list of indices;\n\ if there are multiple arguments, the result is a tuple of lists with one\n\ item per argument; if there are no arguments, the indices of the ends of the\n\ captures of the whole match are returned. Group 0 is the whole match."); PyDoc_STRVAR(match_spans_doc, "spans([group1, ...]) --> list of 2-tuple of ints or tuple of list of 2-tuple of ints.\n\ Return the spans (a 2-tuple of the indices of the start and end) of the\n\ captures of one or more subgroups of the match. If there is a single\n\ argument, the result is a list of spans; if there are multiple arguments,\n\ the result is a tuple of lists with one item per argument; if there are no\n\ arguments, the spans of the captures of the whole match are returned. Group\n\ 0 is the whole match."); PyDoc_STRVAR(match_detach_string_doc, "detach_string()\n\ Detaches the target string from the match object. The 'string' attribute\n\ will become None."); /* MatchObject's methods. 
*/ static PyMethodDef match_methods[] = { {"group", (PyCFunction)match_group, METH_VARARGS, match_group_doc}, {"start", (PyCFunction)match_start, METH_VARARGS, match_start_doc}, {"end", (PyCFunction)match_end, METH_VARARGS, match_end_doc}, {"span", (PyCFunction)match_span, METH_VARARGS, match_span_doc}, {"groups", (PyCFunction)match_groups, METH_VARARGS|METH_KEYWORDS, match_groups_doc}, {"groupdict", (PyCFunction)match_groupdict, METH_VARARGS|METH_KEYWORDS, match_groupdict_doc}, {"capturesdict", (PyCFunction)match_capturesdict, METH_NOARGS, match_capturesdict_doc}, {"expand", (PyCFunction)match_expand, METH_O, match_expand_doc}, #if PY_VERSION_HEX >= 0x02060000 {"expandf", (PyCFunction)match_expandf, METH_O, match_expandf_doc}, #endif {"captures", (PyCFunction)match_captures, METH_VARARGS, match_captures_doc}, {"starts", (PyCFunction)match_starts, METH_VARARGS, match_starts_doc}, {"ends", (PyCFunction)match_ends, METH_VARARGS, match_ends_doc}, {"spans", (PyCFunction)match_spans, METH_VARARGS, match_spans_doc}, {"detach_string", (PyCFunction)match_detach_string, METH_NOARGS, match_detach_string_doc}, {"__copy__", (PyCFunction)match_copy, METH_NOARGS}, {"__deepcopy__", (PyCFunction)match_deepcopy, METH_O}, {"__getitem__", (PyCFunction)match_getitem, METH_O|METH_COEXIST}, {NULL, NULL} }; PyDoc_STRVAR(match_doc, "Match object"); /* MatchObject's 'lastindex' attribute. */ static PyObject* match_lastindex(PyObject* self_) { MatchObject* self; self = (MatchObject*)self_; if (self->lastindex >= 0) return Py_BuildValue("n", self->lastindex); Py_INCREF(Py_None); return Py_None; } /* MatchObject's 'lastgroup' attribute. */ static PyObject* match_lastgroup(PyObject* self_) { MatchObject* self; self = (MatchObject*)self_; if (self->pattern->indexgroup && self->lastgroup >= 0) { PyObject* index; PyObject* result; index = Py_BuildValue("n", self->lastgroup); /* PyDict_GetItem returns a borrowed reference. 
*/ result = PyDict_GetItem(self->pattern->indexgroup, index); Py_DECREF(index); if (result) { Py_INCREF(result); return result; } PyErr_Clear(); } Py_INCREF(Py_None); return Py_None; } /* MatchObject's 'string' attribute. */ static PyObject* match_string(PyObject* self_) { MatchObject* self; self = (MatchObject*)self_; if (self->string) { Py_INCREF(self->string); return self->string; } else { Py_INCREF(Py_None); return Py_None; } } #if PY_VERSION_HEX < 0x02060000 /* MatchObject's 'partial' attribute. */ static PyObject* match_partial(PyObject* self_) { MatchObject* self; PyObject* result; self = (MatchObject*)self_; result = self->partial ? Py_True : Py_False; Py_INCREF(result); return result; } #endif /* MatchObject's 'fuzzy_counts' attribute. */ static PyObject* match_fuzzy_counts(PyObject* self_) { MatchObject* self; self = (MatchObject*)self_; return Py_BuildValue("nnn", self->fuzzy_counts[RE_FUZZY_SUB], self->fuzzy_counts[RE_FUZZY_INS], self->fuzzy_counts[RE_FUZZY_DEL]); } static PyGetSetDef match_getset[] = { {"lastindex", (getter)match_lastindex, (setter)NULL, "The group number of the last matched capturing group, or None."}, {"lastgroup", (getter)match_lastgroup, (setter)NULL, "The name of the last matched capturing group, or None."}, {"regs", (getter)match_regs, (setter)NULL, "A tuple of the spans of the capturing groups."}, {"string", (getter)match_string, (setter)NULL, "The string that was searched, or None if it has been detached."}, #if PY_VERSION_HEX < 0x02060000 {"partial", (getter)match_partial, (setter)NULL, "Whether it's a partial match."}, #endif {"fuzzy_counts", (getter)match_fuzzy_counts, (setter)NULL, "A tuple of the number of substitutions, insertions and deletions."}, {NULL} /* Sentinel */ }; static PyMemberDef match_members[] = { {"re", T_OBJECT, offsetof(MatchObject, pattern), READONLY, "The regex object that produced this match object."}, {"pos", T_PYSSIZET, offsetof(MatchObject, pos), READONLY, "The position at which the regex engine 
started searching."}, {"endpos", T_PYSSIZET, offsetof(MatchObject, endpos), READONLY, "The final position beyond which the regex engine won't search."}, #if PY_VERSION_HEX >= 0x02060000 {"partial", T_BOOL, offsetof(MatchObject, partial), READONLY, "Whether it's a partial match."}, #endif {NULL} /* Sentinel */ }; static PyMappingMethods match_as_mapping = { (lenfunc)match_length, /* mp_length */ (binaryfunc)match_getitem, /* mp_subscript */ 0, /* mp_ass_subscript */ }; static PyTypeObject Match_Type = { PyObject_HEAD_INIT(NULL) 0, "_" RE_MODULE "." "Match", sizeof(MatchObject) }; /* Copies the groups. */ Py_LOCAL_INLINE(RE_GroupData*) copy_groups(RE_GroupData* groups, size_t group_count) { size_t span_count; size_t g; RE_GroupData* groups_copy; RE_GroupSpan* spans_copy; size_t offset; /* Calculate the total size of the group info. */ span_count = 0; for (g = 0; g < group_count; g++) span_count += groups[g].capture_count; /* Allocate the storage for the group info in a single block. */ groups_copy = (RE_GroupData*)re_alloc(group_count * sizeof(RE_GroupData) + span_count * sizeof(RE_GroupSpan)); if (!groups_copy) return NULL; /* The storage for the spans comes after the other group info. */ spans_copy = (RE_GroupSpan*)&groups_copy[group_count]; /* There's no need to initialise the spans info. */ memset(groups_copy, 0, group_count * sizeof(RE_GroupData)); offset = 0; for (g = 0; g < group_count; g++) { RE_GroupData* orig; RE_GroupData* copy; orig = &groups[g]; copy = &groups_copy[g]; copy->span = orig->span; copy->captures = &spans_copy[offset]; offset += orig->capture_count; if (orig->capture_count > 0) { Py_MEMCPY(copy->captures, orig->captures, orig->capture_count * sizeof(RE_GroupSpan)); copy->capture_capacity = orig->capture_count; copy->capture_count = orig->capture_count; } } return groups_copy; } /* Makes a copy of a MatchObject.
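The single-block layout used by copy_groups above can be tried outside CPython. What follows is only a minimal sketch: the `Span` and `Group` types and the name `copy_groups_sketch` are hypothetical stand-ins, plain `malloc` replaces `re_alloc`, and at least one group is assumed, as in the callers above.

```c
#include <stdlib.h>
#include <string.h>

// Hypothetical stand-ins for RE_GroupSpan and RE_GroupData.
typedef struct { long start; long end; } Span;
typedef struct { size_t capture_count; Span* captures; } Group;

// Copies 'count' groups (count > 0 is assumed) into one allocation:
// the Group headers come first and every capture span is packed
// directly after them, so a single free() releases the whole copy.
static Group* copy_groups_sketch(const Group* groups, size_t count) {
    size_t span_count = 0;
    size_t offset = 0;
    size_t g;
    Group* copy;
    Span* spans;

    // Work out how many spans have to fit behind the headers.
    for (g = 0; g < count; g++)
        span_count += groups[g].capture_count;

    copy = (Group*)malloc(count * sizeof(Group) +
      span_count * sizeof(Span));
    if (!copy)
        return NULL;

    // The span storage starts where the header array ends.
    spans = (Span*)&copy[count];

    for (g = 0; g < count; g++) {
        copy[g].capture_count = groups[g].capture_count;
        copy[g].captures = &spans[offset];
        if (groups[g].capture_count > 0)
            memcpy(copy[g].captures, groups[g].captures,
              groups[g].capture_count * sizeof(Span));
        offset += groups[g].capture_count;
    }

    return copy;
}
```

Keeping headers and spans in one block means the copy needs one allocation and one release, at the cost of the capture lists not being growable afterwards.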
*/ Py_LOCAL_INLINE(PyObject*) make_match_copy(MatchObject* self) { MatchObject* match; if (!self->string) { /* The target string has been detached, so the MatchObject is now * immutable. */ Py_INCREF(self); return (PyObject*)self; } /* Create a MatchObject. */ match = PyObject_NEW(MatchObject, &Match_Type); if (!match) return NULL; Py_MEMCPY(match, self, sizeof(MatchObject)); Py_INCREF(match->string); Py_INCREF(match->substring); Py_INCREF(match->pattern); /* Copy the groups to the MatchObject. */ if (self->group_count > 0) { match->groups = copy_groups(self->groups, self->group_count); if (!match->groups) { Py_DECREF(match); return NULL; } } return (PyObject*)match; } /* Creates a new MatchObject. */ Py_LOCAL_INLINE(PyObject*) pattern_new_match(PatternObject* pattern, RE_State* state, int status) { /* Create MatchObject (from state object). */ if (status > 0 || status == RE_ERROR_PARTIAL) { MatchObject* match; /* Create a MatchObject. */ match = PyObject_NEW(MatchObject, &Match_Type); if (!match) return NULL; match->string = state->string; match->substring = state->string; match->substring_offset = 0; match->pattern = pattern; match->regs = NULL; if (pattern->is_fuzzy) { match->fuzzy_counts[RE_FUZZY_SUB] = state->total_fuzzy_counts[RE_FUZZY_SUB]; match->fuzzy_counts[RE_FUZZY_INS] = state->total_fuzzy_counts[RE_FUZZY_INS]; match->fuzzy_counts[RE_FUZZY_DEL] = state->total_fuzzy_counts[RE_FUZZY_DEL]; } else memset(match->fuzzy_counts, 0, sizeof(match->fuzzy_counts)); match->partial = status == RE_ERROR_PARTIAL; Py_INCREF(match->string); Py_INCREF(match->substring); Py_INCREF(match->pattern); /* Copy the groups to the MatchObject. 
*/ if (pattern->public_group_count > 0) { match->groups = copy_groups(state->groups, pattern->public_group_count); if (!match->groups) { Py_DECREF(match); return NULL; } } else match->groups = NULL; match->group_count = pattern->public_group_count; match->pos = state->slice_start; match->endpos = state->slice_end; if (state->reverse) { match->match_start = state->text_pos; match->match_end = state->match_pos; } else { match->match_start = state->match_pos; match->match_end = state->text_pos; } match->lastindex = state->lastindex; match->lastgroup = state->lastgroup; return (PyObject*)match; } else if (status == 0) { /* No match. */ Py_INCREF(Py_None); return Py_None; } else { /* Internal error. */ set_error(status, NULL); return NULL; } } /* Gets the text of a capture group from a state. */ Py_LOCAL_INLINE(PyObject*) state_get_group(RE_State* state, Py_ssize_t index, PyObject* string, BOOL empty) { RE_GroupData* group; Py_ssize_t start; Py_ssize_t end; group = &state->groups[index - 1]; if (string != Py_None && index >= 1 && (size_t)index <= state->pattern->public_group_count && group->capture_count > 0) { start = group->span.start; end = group->span.end; } else { if (empty) /* Want an empty string. */ start = end = 0; else { Py_INCREF(Py_None); return Py_None; } } return get_slice(string, start, end); } /* Acquires the lock (mutex) on the state if there's one. * * It also increments the owner's refcount just to ensure that it won't be * destroyed by another thread. */ Py_LOCAL_INLINE(void) acquire_state_lock(PyObject* owner, RE_SafeState* safe_state) { RE_State* state; state = safe_state->re_state; if (state->lock) { /* In order to avoid deadlock we need to release the GIL while trying * to acquire the lock. */ Py_INCREF(owner); if (!PyThread_acquire_lock(state->lock, 0)) { release_GIL(safe_state); PyThread_acquire_lock(state->lock, 1); acquire_GIL(safe_state); } } } /* Releases the lock (mutex) on the state if there's one. 
* * It also decrements the owner's refcount, which was incremented when the lock * was acquired. */ Py_LOCAL_INLINE(void) release_state_lock(PyObject* owner, RE_SafeState* safe_state) { RE_State* state; state = safe_state->re_state; if (state->lock) { PyThread_release_lock(state->lock); Py_DECREF(owner); } } /* Implements the functionality of ScannerObject's search and match methods. */ Py_LOCAL_INLINE(PyObject*) scanner_search_or_match(ScannerObject* self, BOOL search) { RE_State* state; RE_SafeState safe_state; PyObject* match; state = &self->state; /* Initialise the "safe state" structure. */ safe_state.re_state = state; safe_state.thread_state = NULL; /* Acquire the state lock in case we're sharing the scanner object across * threads. */ acquire_state_lock((PyObject*)self, &safe_state); if (self->status == RE_ERROR_FAILURE || self->status == RE_ERROR_PARTIAL) { /* No or partial match. */ release_state_lock((PyObject*)self, &safe_state); Py_INCREF(Py_None); return Py_None; } else if (self->status < 0) { /* Internal error. */ release_state_lock((PyObject*)self, &safe_state); set_error(self->status, NULL); return NULL; } /* Look for another match. */ self->status = do_match(&safe_state, search); if (self->status >= 0 || self->status == RE_ERROR_PARTIAL) { /* Create the match object. */ match = pattern_new_match(self->pattern, state, self->status); if (search && state->overlapped) { /* Advance one character. */ Py_ssize_t step; step = state->reverse ? -1 : 1; state->text_pos = state->match_pos + step; state->must_advance = FALSE; } else /* Continue from where we left off, but don't allow 2 contiguous * zero-width matches. */ state->must_advance = state->text_pos == state->match_pos; } else /* Internal error. */ match = NULL; /* Release the state lock. */ release_state_lock((PyObject*)self, &safe_state); return match; } /* ScannerObject's 'match' method.
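The rule in scanner_search_or_match above, "continue from where we left off, but don't allow 2 contiguous zero-width matches", can be sketched with a toy matcher. Everything in this sketch is hypothetical: `next_match` hard-codes the pattern `a*` and `scan_all` plays the role of the scanner loop. It only illustrates how a `must_advance` flag is used and is not how the engine itself is written.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical toy matcher for the pattern a* (a run of 'a' characters,
// possibly empty). The search starts at 'pos'. When 'must_advance' is
// set, an empty match at 'pos' itself is rejected.
// Returns 1 and fills start/end on success, 0 when there is no match.
static int next_match(const char* text, size_t pos, int must_advance,
  size_t* start, size_t* end) {
    size_t length = strlen(text);
    size_t s;

    for (s = pos; s <= length; s++) {
        size_t e = s;

        while (e < length && text[e] == 'a')
            e++;

        // Reject a zero-width match at the position where we resumed.
        if (must_advance && s == pos && e == s)
            continue;

        *start = s;
        *end = e;
        return 1;
    }

    return 0;
}

// Collects up to 'max' match spans, resuming each search where the
// previous match ended.
static size_t scan_all(const char* text, size_t spans[][2], size_t max) {
    size_t pos = 0;
    size_t count = 0;
    int must_advance = 0;
    size_t start;
    size_t end;

    while (count < max && next_match(text, pos, must_advance, &start,
      &end)) {
        spans[count][0] = start;
        spans[count][1] = end;
        count++;

        // Continue from the end of the match; if it was zero-width, the
        // next match may not be empty at the same place.
        pos = end;
        must_advance = end == start;
    }

    return count;
}
```

For the text "baac" this sketch produces the spans (0, 0), (1, 3), (3, 3) and (4, 4): an empty match may directly follow a non-empty one, but two empty matches never share a position.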
*/ static PyObject* scanner_match(ScannerObject* self, PyObject* unused) { return scanner_search_or_match(self, FALSE); } /* ScannerObject's 'search' method. */ static PyObject* scanner_search(ScannerObject* self, PyObject *unused) { return scanner_search_or_match(self, TRUE); } /* ScannerObject's 'next' method. */ static PyObject* scanner_next(PyObject* self) { PyObject* match; match = scanner_search((ScannerObject*)self, NULL); if (match == Py_None) { /* No match. */ Py_DECREF(Py_None); PyErr_SetNone(PyExc_StopIteration); return NULL; } return match; } /* Returns an iterator for a ScannerObject. * * The iterator is actually the ScannerObject itself. */ static PyObject* scanner_iter(PyObject* self) { Py_INCREF(self); return self; } /* Gets the next result from a scanner iterator. */ static PyObject* scanner_iternext(PyObject* self) { PyObject* match; match = scanner_search((ScannerObject*)self, NULL); if (match == Py_None) { /* No match. */ Py_DECREF(match); return NULL; } return match; } /* Makes a copy of a ScannerObject. * * It actually doesn't make a copy, just returns the original object. */ Py_LOCAL_INLINE(PyObject*) make_scanner_copy(ScannerObject* self) { Py_INCREF(self); return (PyObject*)self; } /* ScannerObject's '__copy__' method. */ static PyObject* scanner_copy(ScannerObject* self, PyObject *unused) { return make_scanner_copy(self); } /* ScannerObject's '__deepcopy__' method. */ static PyObject* scanner_deepcopy(ScannerObject* self, PyObject* memo) { return make_scanner_copy(self); } /* The documentation of a ScannerObject. */ PyDoc_STRVAR(scanner_match_doc, "match() --> MatchObject or None.\n\ Match at the current position in the string."); PyDoc_STRVAR(scanner_search_doc, "search() --> MatchObject or None.\n\ Search from the current position in the string."); /* ScannerObject's methods. 
*/ static PyMethodDef scanner_methods[] = { {"next", (PyCFunction)scanner_next, METH_NOARGS}, {"match", (PyCFunction)scanner_match, METH_NOARGS, scanner_match_doc}, {"search", (PyCFunction)scanner_search, METH_NOARGS, scanner_search_doc}, {"__copy__", (PyCFunction)scanner_copy, METH_NOARGS}, {"__deepcopy__", (PyCFunction)scanner_deepcopy, METH_O}, {NULL, NULL} }; PyDoc_STRVAR(scanner_doc, "Scanner object"); /* Deallocates a ScannerObject. */ static void scanner_dealloc(PyObject* self_) { ScannerObject* self; self = (ScannerObject*)self_; state_fini(&self->state); Py_DECREF(self->pattern); PyObject_DEL(self); } static PyMemberDef scanner_members[] = { {"pattern", T_OBJECT, offsetof(ScannerObject, pattern), READONLY, "The regex object that produced this scanner object."}, {NULL} /* Sentinel */ }; static PyTypeObject Scanner_Type = { PyObject_HEAD_INIT(NULL) 0, "_" RE_MODULE "." "Scanner", sizeof(ScannerObject) }; /* Decodes a 'concurrent' argument. */ Py_LOCAL_INLINE(int) decode_concurrent(PyObject* concurrent) { Py_ssize_t value; if (concurrent == Py_None) return RE_CONC_DEFAULT; value = PyLong_AsLong(concurrent); if (value == -1 && PyErr_Occurred()) { set_error(RE_ERROR_CONCURRENT, NULL); return -1; } return value ? RE_CONC_YES : RE_CONC_NO; } /* Decodes a 'partial' argument. */ Py_LOCAL_INLINE(BOOL) decode_partial(PyObject* partial) { Py_ssize_t value; if (partial == Py_False) return FALSE; if (partial == Py_True) return TRUE; value = PyLong_AsLong(partial); if (value == -1 && PyErr_Occurred()) { PyErr_Clear(); return TRUE; } return value != 0; } /* Creates a new ScannerObject. */ static PyObject* pattern_scanner(PatternObject* pattern, PyObject* args, PyObject* kwargs) { /* Create search state object. 
*/ ScannerObject* self; Py_ssize_t start; Py_ssize_t end; int conc; BOOL part; PyObject* string; PyObject* pos = Py_None; PyObject* endpos = Py_None; Py_ssize_t overlapped = FALSE; PyObject* concurrent = Py_None; PyObject* partial = Py_False; static char* kwlist[] = { "string", "pos", "endpos", "overlapped", "concurrent", "partial", NULL }; if (!PyArg_ParseTupleAndKeywords(args, kwargs, "O|OOnOO:scanner", kwlist, &string, &pos, &endpos, &overlapped, &concurrent, &partial)) return NULL; start = as_string_index(pos, 0); if (start == -1 && PyErr_Occurred()) return NULL; end = as_string_index(endpos, PY_SSIZE_T_MAX); if (end == -1 && PyErr_Occurred()) return NULL; conc = decode_concurrent(concurrent); if (conc < 0) return NULL; part = decode_partial(partial); /* Create a scanner object. */ self = PyObject_NEW(ScannerObject, &Scanner_Type); if (!self) return NULL; self->pattern = pattern; Py_INCREF(self->pattern); /* The MatchObject, and therefore repeated captures, will be visible. */ if (!state_init(&self->state, pattern, string, start, end, overlapped != 0, conc, part, TRUE, TRUE, FALSE)) { PyObject_DEL(self); return NULL; } self->status = RE_ERROR_SUCCESS; return (PyObject*) self; } /* Performs the split for the SplitterObject. */ Py_LOCAL_INLINE(PyObject*) next_split_part(SplitterObject* self) { RE_State* state; RE_SafeState safe_state; PyObject* result = NULL; /* Initialise to stop compiler warning. */ state = &self->state; /* Initialise the "safe state" structure. */ safe_state.re_state = state; safe_state.thread_state = NULL; /* Acquire the state lock in case we're sharing the splitter object across * threads. */ acquire_state_lock((PyObject*)self, &safe_state); if (self->status == RE_ERROR_FAILURE || self->status == RE_ERROR_PARTIAL) { /* Finished. */ release_state_lock((PyObject*)self, &safe_state); result = Py_False; Py_INCREF(result); return result; } else if (self->status < 0) { /* Internal error. 
*/ release_state_lock((PyObject*)self, &safe_state); set_error(self->status, NULL); return NULL; } if (self->index == 0) { if (self->split_count < self->maxsplit) { Py_ssize_t step; Py_ssize_t end_pos; if (state->reverse) { step = -1; end_pos = state->slice_start; } else { step = 1; end_pos = state->slice_end; } retry: self->status = do_match(&safe_state, TRUE); if (self->status < 0) goto error; if (self->status == RE_ERROR_SUCCESS) { if (state->version_0) { /* Version 0 behaviour is to advance one character if the * split was zero-width. Unfortunately, this can give an * incorrect result. GvR wants this behaviour to be * retained so as not to break any existing software which * might rely on it. */ if (state->text_pos == state->match_pos) { if (self->last_pos == end_pos) goto no_match; /* Advance one character. */ state->text_pos += step; state->must_advance = FALSE; goto retry; } } ++self->split_count; /* Get segment before this match. */ if (state->reverse) result = get_slice(state->string, state->match_pos, self->last_pos); else result = get_slice(state->string, self->last_pos, state->match_pos); if (!result) goto error; self->last_pos = state->text_pos; /* Version 0 behaviour is to advance one character if the match * was zero-width. Unfortunately, this can give an incorrect * result. GvR wants this behaviour to be retained so as not to * break any existing software which might rely on it. */ if (state->version_0) { if (state->text_pos == state->match_pos) /* Advance one character. */ state->text_pos += step; state->must_advance = FALSE; } else /* Continue from where we left off, but don't allow a * contiguous zero-width match. */ state->must_advance = TRUE; } } else goto no_match; if (self->status == RE_ERROR_FAILURE || self->status == RE_ERROR_PARTIAL) { no_match: /* Get segment following last match (even if empty). 
*/ if (state->reverse) result = get_slice(state->string, 0, self->last_pos); else result = get_slice(state->string, self->last_pos, state->text_length); if (!result) goto error; } } else { /* Add group. */ result = state_get_group(state, self->index, state->string, FALSE); if (!result) goto error; } ++self->index; if ((size_t)self->index > state->pattern->public_group_count) self->index = 0; /* Release the state lock. */ release_state_lock((PyObject*)self, &safe_state); return result; error: /* Release the state lock. */ release_state_lock((PyObject*)self, &safe_state); return NULL; } /* SplitterObject's 'split' method. */ static PyObject* splitter_split(SplitterObject* self, PyObject *unused) { PyObject* result; result = next_split_part(self); if (result == Py_False) { /* The sentinel. */ Py_DECREF(Py_False); Py_INCREF(Py_None); return Py_None; } return result; } /* SplitterObject's 'next' method. */ static PyObject* splitter_next(PyObject* self) { PyObject* result; result = next_split_part((SplitterObject*)self); if (result == Py_False) { /* No match. */ Py_DECREF(Py_False); PyErr_SetNone(PyExc_StopIteration); return NULL; } return result; } /* Returns an iterator for a SplitterObject. * * The iterator is actually the SplitterObject itself. */ static PyObject* splitter_iter(PyObject* self) { Py_INCREF(self); return self; } /* Gets the next result from a splitter iterator. */ static PyObject* splitter_iternext(PyObject* self) { PyObject* result; result = next_split_part((SplitterObject*)self); if (result == Py_False) { /* No match. */ Py_DECREF(result); return NULL; } return result; } /* Makes a copy of a SplitterObject. * * It actually doesn't make a copy, just returns the original object. */ Py_LOCAL_INLINE(PyObject*) make_splitter_copy(SplitterObject* self) { Py_INCREF(self); return (PyObject*)self; } /* SplitterObject's '__copy__' method. 
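In the splitter code above, next_split_part signals "finished" with Py_False because None is already a legitimate result (a group that didn't take part in the match). A minimal sketch of the same sentinel idea in plain C, where `PartIter`, `next_part` and `FINISHED` are hypothetical names and NULL stands in for None:

```c
#include <stddef.h>

// Hypothetical sentinel: its address marks "no more parts", leaving
// NULL free to mean "this part has no value".
static const char FINISHED_MARK = 0;
#define FINISHED (&FINISHED_MARK)

typedef struct {
    const char* const* parts;  // May contain NULL entries.
    size_t count;
    size_t index;
} PartIter;

// Returns the next part, NULL for a part with no value, or FINISHED
// once every part has been produced.
static const char* next_part(PartIter* it) {
    if (it->index >= it->count)
        return FINISHED;

    return it->parts[it->index++];
}
```

Each caller then decides what "finished" means for it, much as the 'split' method turns the sentinel into None while the 'next' method turns it into StopIteration.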
*/ static PyObject* splitter_copy(SplitterObject* self, PyObject *unused) { return make_splitter_copy(self); } /* SplitterObject's '__deepcopy__' method. */ static PyObject* splitter_deepcopy(SplitterObject* self, PyObject* memo) { return make_splitter_copy(self); } /* The documentation of a SplitterObject. */ PyDoc_STRVAR(splitter_split_doc, "split() --> string or None.\n\ Return the next part of the split string."); /* SplitterObject's methods. */ static PyMethodDef splitter_methods[] = { {"next", (PyCFunction)splitter_next, METH_NOARGS}, {"split", (PyCFunction)splitter_split, METH_NOARGS, splitter_split_doc}, {"__copy__", (PyCFunction)splitter_copy, METH_NOARGS}, {"__deepcopy__", (PyCFunction)splitter_deepcopy, METH_O}, {NULL, NULL} }; PyDoc_STRVAR(splitter_doc, "Splitter object"); /* Deallocates a SplitterObject. */ static void splitter_dealloc(PyObject* self_) { SplitterObject* self; self = (SplitterObject*)self_; state_fini(&self->state); Py_DECREF(self->pattern); PyObject_DEL(self); } #if PY_VERSION_HEX >= 0x02060000 /* Converts a captures index to an integer. * * A negative capture index in 'expandf' and 'subf' is passed as a string * because negative indexes are not supported by 'str.format'. */ Py_LOCAL_INLINE(Py_ssize_t) index_to_integer(PyObject* item) { Py_ssize_t value; value = PyInt_AsSsize_t(item); if (value != -1 || !PyErr_Occurred()) return value; PyErr_Clear(); value = PyLong_AsLong(item); if (value != -1 || !PyErr_Occurred()) return value; PyErr_Clear(); /* Is the index a string representation of an integer? 
*/ if (PyUnicode_Check(item)) { PyObject* int_obj; Py_UNICODE* characters; Py_ssize_t length; characters = (Py_UNICODE*)PyUnicode_AS_DATA(item); length = PyUnicode_GET_SIZE(item); int_obj = PyLong_FromUnicode(characters, length, 0); if (!int_obj) goto error; value = PyLong_AsLong(int_obj); Py_DECREF(int_obj); if (!PyErr_Occurred()) return value; } else if (PyString_Check(item)) { char* characters; PyObject* int_obj; characters = PyString_AsString(item); int_obj = PyLong_FromString(characters, NULL, 0); if (!int_obj) goto error; value = PyLong_AsLong(int_obj); Py_DECREF(int_obj); if (!PyErr_Occurred()) return value; } error: PyErr_Format(PyExc_TypeError, "list indices must be integers, not %.200s", item->ob_type->tp_name); return -1; } /* CaptureObject's length method. */ Py_LOCAL_INLINE(Py_ssize_t) capture_length(CaptureObject* self) { MatchObject* match; RE_GroupData* group; if (self->group_index == 0) return 1; match = *self->match_indirect; group = &match->groups[self->group_index - 1]; return (Py_ssize_t)group->capture_count; } /* CaptureObject's '__getitem__' method. 
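The method that follows accepts negative indices the way Python lists do: the capture count is added to a negative index before the bounds check. A minimal sketch of that normalisation, using a hypothetical helper name `normalise_index` that doesn't appear in the module:

```c
#include <stddef.h>

// Hypothetical helper: maps a Python-style index onto [0, count).
// Returns 1 and stores the adjusted index on success, or 0 when the
// index is out of range (where the real method raises IndexError).
static int normalise_index(long index, size_t count, size_t* result) {
    // A negative index counts back from the end.
    if (index < 0)
        index += (long)count;

    if (index < 0 || index >= (long)count)
        return 0;

    *result = (size_t)index;
    return 1;
}
```

With 3 captures, -1 maps to 2, while both 3 and -4 are rejected.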
*/ static PyObject* capture_getitem(CaptureObject* self, PyObject* item) { Py_ssize_t index; MatchObject* match; Py_ssize_t start; Py_ssize_t end; index = index_to_integer(item); if (index == -1 && PyErr_Occurred()) return NULL; match = *self->match_indirect; if (self->group_index == 0) { if (index < 0) index += 1; if (index != 0) { PyErr_SetString(PyExc_IndexError, "list index out of range"); return NULL; } start = match->match_start; end = match->match_end; } else { RE_GroupData* group; RE_GroupSpan* span; group = &match->groups[self->group_index - 1]; if (index < 0) index += group->capture_count; if (index < 0 || index >= (Py_ssize_t)group->capture_count) { PyErr_SetString(PyExc_IndexError, "list index out of range"); return NULL; } span = &group->captures[index]; start = span->start; end = span->end; } return get_slice(match->substring, start - match->substring_offset, end - match->substring_offset); } static PyMappingMethods capture_as_mapping = { (lenfunc)capture_length, /* mp_length */ (binaryfunc)capture_getitem, /* mp_subscript */ 0, /* mp_ass_subscript */ }; /* CaptureObject's methods. */ static PyMethodDef capture_methods[] = { {"__getitem__", (PyCFunction)capture_getitem, METH_O|METH_COEXIST}, {NULL, NULL} }; /* Deallocates a CaptureObject. */ static void capture_dealloc(PyObject* self_) { CaptureObject* self; self = (CaptureObject*)self_; PyObject_DEL(self); } /* CaptureObject's 'str' method. */ static PyObject* capture_str(PyObject* self_) { CaptureObject* self; MatchObject* match; self = (CaptureObject*)self_; match = *self->match_indirect; return match_get_group_by_index(match, self->group_index, Py_None); } #endif static PyMemberDef splitter_members[] = { {"pattern", T_OBJECT, offsetof(SplitterObject, pattern), READONLY, "The regex object that produced this splitter object."}, {NULL} /* Sentinel */ }; static PyTypeObject Splitter_Type = { PyObject_HEAD_INIT(NULL) 0, "_" RE_MODULE "." 
"Splitter", sizeof(SplitterObject) }; /* Creates a new SplitterObject. */ Py_LOCAL_INLINE(PyObject*) pattern_splitter(PatternObject* pattern, PyObject* args, PyObject* kwargs) { /* Create split state object. */ int conc; SplitterObject* self; RE_State* state; PyObject* string; Py_ssize_t maxsplit = 0; PyObject* concurrent = Py_None; static char* kwlist[] = { "string", "maxsplit", "concurrent", NULL }; if (!PyArg_ParseTupleAndKeywords(args, kwargs, "O|nO:splitter", kwlist, &string, &maxsplit, &concurrent)) return NULL; conc = decode_concurrent(concurrent); if (conc < 0) return NULL; /* Create a splitter object. */ self = PyObject_NEW(SplitterObject, &Splitter_Type); if (!self) return NULL; self->pattern = pattern; Py_INCREF(self->pattern); if (maxsplit == 0) maxsplit = PY_SSIZE_T_MAX; state = &self->state; /* The MatchObject, and therefore repeated captures, will not be visible. */ if (!state_init(state, pattern, string, 0, PY_SSIZE_T_MAX, FALSE, conc, FALSE, TRUE, FALSE, FALSE)) { PyObject_DEL(self); return NULL; } self->maxsplit = maxsplit; self->last_pos = state->reverse ? state->text_length : 0; self->split_count = 0; self->index = 0; self->status = 1; return (PyObject*) self; } /* Implements the functionality of PatternObject's search and match methods. */ Py_LOCAL_INLINE(PyObject*) pattern_search_or_match(PatternObject* self, PyObject* args, PyObject* kwargs, char* args_desc, BOOL search, BOOL match_all) { Py_ssize_t start; Py_ssize_t end; int conc; BOOL part; RE_State state; RE_SafeState safe_state; int status; PyObject* match; PyObject* string; PyObject* pos = Py_None; PyObject* endpos = Py_None; PyObject* concurrent = Py_None; PyObject* partial = Py_False; static char* kwlist[] = { "string", "pos", "endpos", "concurrent", "partial", NULL }; /* When working with a short string, such as a line from a file, the * relative cost of PyArg_ParseTupleAndKeywords can be significant, and * it's worth not using it when there are only positional arguments. 
*/ Py_ssize_t arg_count; if (args && !kwargs && PyTuple_CheckExact(args)) arg_count = PyTuple_GET_SIZE(args); else arg_count = -1; if (1 <= arg_count && arg_count <= 5) { /* PyTuple_GET_ITEM borrows the reference. */ string = PyTuple_GET_ITEM(args, 0); if (arg_count >= 2) pos = PyTuple_GET_ITEM(args, 1); if (arg_count >= 3) endpos = PyTuple_GET_ITEM(args, 2); if (arg_count >= 4) concurrent = PyTuple_GET_ITEM(args, 3); if (arg_count >= 5) partial = PyTuple_GET_ITEM(args, 4); } else if (!PyArg_ParseTupleAndKeywords(args, kwargs, args_desc, kwlist, &string, &pos, &endpos, &concurrent, &partial)) return NULL; start = as_string_index(pos, 0); if (start == -1 && PyErr_Occurred()) return NULL; end = as_string_index(endpos, PY_SSIZE_T_MAX); if (end == -1 && PyErr_Occurred()) return NULL; conc = decode_concurrent(concurrent); if (conc < 0) return NULL; part = decode_partial(partial); /* The MatchObject, and therefore repeated captures, will be visible. */ if (!state_init(&state, self, string, start, end, FALSE, conc, part, FALSE, TRUE, match_all)) return NULL; /* Initialise the "safe state" structure. */ safe_state.re_state = &state; safe_state.thread_state = NULL; status = do_match(&safe_state, search); if (status >= 0 || status == RE_ERROR_PARTIAL) /* Create the match object. */ match = pattern_new_match(self, &state, status); else match = NULL; state_fini(&state); return match; } /* PatternObject's 'match' method. */ static PyObject* pattern_match(PatternObject* self, PyObject* args, PyObject* kwargs) { return pattern_search_or_match(self, args, kwargs, "O|OOOO:match", FALSE, FALSE); } /* PatternObject's 'fullmatch' method. */ static PyObject* pattern_fullmatch(PatternObject* self, PyObject* args, PyObject* kwargs) { return pattern_search_or_match(self, args, kwargs, "O|OOOO:fullmatch", FALSE, TRUE); } /* PatternObject's 'search' method. 
*/ static PyObject* pattern_search(PatternObject* self, PyObject* args, PyObject* kwargs) { return pattern_search_or_match(self, args, kwargs, "O|OOOO:search", TRUE, FALSE); } /* Gets the limits of the matching. */ Py_LOCAL_INLINE(BOOL) get_limits(PyObject* pos, PyObject* endpos, Py_ssize_t length, Py_ssize_t* start, Py_ssize_t* end) { Py_ssize_t s; Py_ssize_t e; s = as_string_index(pos, 0); if (s == -1 && PyErr_Occurred()) return FALSE; e = as_string_index(endpos, PY_SSIZE_T_MAX); if (e == -1 && PyErr_Occurred()) return FALSE; /* Adjust boundaries. */ if (s < 0) s += length; if (s < 0) s = 0; else if (s > length) s = length; if (e < 0) e += length; if (e < 0) e = 0; else if (e > length) e = length; *start = s; *end = e; return TRUE; } /* Gets a replacement item from the replacement list. * * The replacement item could be a string literal or a group. * * It can return None to represent an empty string. */ Py_LOCAL_INLINE(PyObject*) get_sub_replacement(PyObject* item, PyObject* string, RE_State* state, size_t group_count) { Py_ssize_t index; if (PyUnicode_CheckExact(item) || PyString_CheckExact(item)) { /* It's a literal, which can be added directly to the list. */ Py_INCREF(item); return item; } /* Is it a group reference? */ index = as_group_index(item); if (index == -1 && PyErr_Occurred()) { /* Not a group either! */ set_error(RE_ERROR_REPLACEMENT, NULL); return NULL; } if (index == 0) { /* The entire matched portion of the string. */ if (state->match_pos == state->text_pos) { /* Return None for "". */ Py_INCREF(Py_None); return Py_None; } if (state->reverse) return get_slice(string, state->text_pos, state->match_pos); else return get_slice(string, state->match_pos, state->text_pos); } else if (1 <= index && (size_t)index <= group_count) { /* A group. */ RE_GroupData* group; group = &state->groups[index - 1]; if (group->capture_count == 0 && group->span.start != group->span.end) { /* The group didn't match or is "", so return None for "". 
*/ Py_INCREF(Py_None); return Py_None; } return get_slice(string, group->span.start, group->span.end); } else { /* No such group. */ set_error(RE_ERROR_INVALID_GROUP_REF, NULL); return NULL; } } /* PatternObject's 'subx' method. */ Py_LOCAL_INLINE(PyObject*) pattern_subx(PatternObject* self, PyObject* str_template, PyObject* string, Py_ssize_t maxsub, int sub_type, PyObject* pos, PyObject* endpos, int concurrent) { RE_StringInfo str_info; Py_ssize_t start; Py_ssize_t end; BOOL is_callable = FALSE; PyObject* replacement = NULL; BOOL is_literal = FALSE; #if PY_VERSION_HEX >= 0x02060000 BOOL is_format = FALSE; #endif BOOL is_template = FALSE; RE_State state; RE_SafeState safe_state; JoinInfo join_info; Py_ssize_t sub_count; Py_ssize_t last_pos; Py_ssize_t step; PyObject* item; MatchObject* match; #if PY_VERSION_HEX >= 0x02060000 BOOL built_capture = FALSE; #endif PyObject* args; PyObject* kwargs; Py_ssize_t end_pos; /* Get the string. */ if (!get_string(string, &str_info)) return NULL; /* Get the limits of the search. */ if (!get_limits(pos, endpos, str_info.length, &start, &end)) { #if PY_VERSION_HEX >= 0x02060000 release_buffer(&str_info); #endif return NULL; } /* If the pattern is too long for the string, then take a shortcut, unless * it's a fuzzy pattern. */ if (!self->is_fuzzy && self->min_width > end - start) { PyObject* result; Py_INCREF(string); if (sub_type & RE_SUBN) result = Py_BuildValue("Nn", string, 0); else result = string; #if PY_VERSION_HEX >= 0x02060000 release_buffer(&str_info); #endif return result; } if (maxsub == 0) maxsub = PY_SSIZE_T_MAX; /* sub/subn takes either a function or a string template. */ if (PyCallable_Check(str_template)) { /* It's callable. */ is_callable = TRUE; replacement = str_template; Py_INCREF(replacement); #if PY_VERSION_HEX >= 0x02060000 } else if (sub_type & RE_SUBF) { /* Is it a literal format? * * To keep it simple we'll say that a literal is a string which can be * used as-is, so no placeholders. 
*/ Py_ssize_t literal_length; literal_length = check_replacement_string(str_template, '{'); if (literal_length > 0) { /* It's a literal. */ is_literal = TRUE; replacement = str_template; Py_INCREF(replacement); } else if (literal_length < 0) { /* It isn't a literal, so get the 'format' method. */ is_format = TRUE; replacement = PyObject_GetAttrString(str_template, "format"); if (!replacement) { release_buffer(&str_info); return NULL; } } #endif } else { /* Is it a literal template? * * To keep it simple we'll say that a literal is a string which can be * used as-is, so no backslashes. */ Py_ssize_t literal_length; literal_length = check_replacement_string(str_template, '\\'); if (literal_length > 0) { /* It's a literal. */ is_literal = TRUE; replacement = str_template; Py_INCREF(replacement); } else if (literal_length < 0 ) { /* It isn't a literal, so hand it over to the template compiler. */ is_template = TRUE; replacement = call(RE_MODULE, "_compile_replacement_helper", PyTuple_Pack(2, self, str_template)); if (!replacement) { #if PY_VERSION_HEX >= 0x02060000 release_buffer(&str_info); #endif return NULL; } } } /* The MatchObject, and therefore repeated captures, will be visible only * if the replacement is callable or subf is used. */ #if PY_VERSION_HEX >= 0x02060000 if (!state_init_2(&state, self, string, &str_info, start, end, FALSE, concurrent, FALSE, FALSE, is_callable || (sub_type & RE_SUBF) != 0, FALSE)) { release_buffer(&str_info); #else if (!state_init_2(&state, self, string, &str_info, start, end, FALSE, concurrent, FALSE, FALSE, is_callable, FALSE)) { #endif Py_XDECREF(replacement); return NULL; } /* Initialise the "safe state" structure. */ safe_state.re_state = &state; safe_state.thread_state = NULL; init_join_list(&join_info, state.reverse, PyUnicode_Check(string)); sub_count = 0; last_pos = state.reverse ? state.text_length : 0; step = state.reverse ? 
-1 : 1; while (sub_count < maxsub) { int status; status = do_match(&safe_state, TRUE); if (status < 0) goto error; if (status == 0) break; /* Append the segment before this match. */ if (state.match_pos != last_pos) { if (state.reverse) item = get_slice(string, state.match_pos, last_pos); else item = get_slice(string, last_pos, state.match_pos); if (!item) goto error; /* Add to the list. */ status = add_to_join_list(&join_info, item); Py_DECREF(item); if (status < 0) goto error; } /* Add this match. */ if (is_literal) { /* The replacement is a literal string. */ status = add_to_join_list(&join_info, replacement); if (status < 0) goto error; #if PY_VERSION_HEX >= 0x02060000 } else if (is_format) { /* The replacement is a format string. */ size_t g; /* We need to create the arguments for the 'format' method. We'll * start by creating a MatchObject. */ match = (MatchObject*)pattern_new_match(self, &state, 1); if (!match) goto error; /* We'll build the args and kwargs the first time. They'll be using * capture objects which refer to the match object indirectly; this * means that args and kwargs can be reused with different match * objects. */ if (!built_capture) { /* The args are a tuple of the capture group matches. */ args = PyTuple_New(match->group_count + 1); if (!args) { Py_DECREF(match); goto error; } for (g = 0; g < match->group_count + 1; g++) /* PyTuple_SetItem steals the reference. */ PyTuple_SetItem(args, (Py_ssize_t)g, make_capture_object(&match, (Py_ssize_t)g)); /* The kwargs are a dict of the named capture group matches. */ kwargs = make_capture_dict(match, &match); if (!kwargs) { Py_DECREF(args); Py_DECREF(match); goto error; } built_capture = TRUE; } /* Call the 'format' method. */ item = PyObject_Call(replacement, args, kwargs); Py_DECREF(match); if (!item) goto error; /* Add the result to the list.
*/ status = add_to_join_list(&join_info, item); Py_DECREF(item); if (status < 0) goto error; #endif } else if (is_template) { /* The replacement is a list template. */ Py_ssize_t count; Py_ssize_t index; Py_ssize_t step; /* Add each part of the template to the list. */ count = PyList_GET_SIZE(replacement); if (join_info.reversed) { /* We're searching backwards, so we'll be reversing the list * when it's complete. Therefore, we need to add the items of * the template in reverse order for them to be in the correct * order after the reversal. */ index = count - 1; step = -1; } else { /* We're searching forwards. */ index = 0; step = 1; } while (count > 0) { PyObject* item; PyObject* str_item; /* PyList_GET_ITEM borrows a reference. */ item = PyList_GET_ITEM(replacement, index); str_item = get_sub_replacement(item, string, &state, self->public_group_count); if (!str_item) goto error; /* Add the result to the list. */ if (str_item == Py_None) /* None for "". */ Py_DECREF(str_item); else { status = add_to_join_list(&join_info, str_item); Py_DECREF(str_item); if (status < 0) goto error; } --count; index += step; } } else if (is_callable) { /* Pass a MatchObject to the replacement function. */ PyObject* match; PyObject* args; /* We need to create a MatchObject to pass to the replacement * function. */ match = pattern_new_match(self, &state, 1); if (!match) goto error; /* The args for the replacement function. */ args = PyTuple_Pack(1, match); if (!args) { Py_DECREF(match); goto error; } /* Call the replacement function. */ item = PyObject_CallObject(replacement, args); Py_DECREF(args); Py_DECREF(match); if (!item) goto error; /* Add the result to the list. */ status = add_to_join_list(&join_info, item); Py_DECREF(item); if (status < 0) goto error; } ++sub_count; last_pos = state.text_pos; if (state.version_0) { /* Always advance after a zero-width match. 
*/ if (state.match_pos == state.text_pos) { state.text_pos += step; state.must_advance = FALSE; } else state.must_advance = TRUE; } else /* Continue from where we left off, but don't allow a contiguous * zero-width match. */ state.must_advance = state.match_pos == state.text_pos; } /* Get the segment following the last match. We use 'length' instead of * 'text_length' because the latter is truncated to 'slice_end', a * documented idiosyncrasy of the 're' module. */ end_pos = state.reverse ? 0 : str_info.length; if (last_pos != end_pos) { int status; /* The segment is part of the original string. */ if (state.reverse) item = get_slice(string, 0, last_pos); else item = get_slice(string, last_pos, str_info.length); if (!item) goto error; status = add_to_join_list(&join_info, item); Py_DECREF(item); if (status < 0) goto error; } Py_XDECREF(replacement); /* Convert the list to a single string (also cleans up join_info). */ item = join_list_info(&join_info); state_fini(&state); #if PY_VERSION_HEX >= 0x02060000 if (built_capture) { Py_DECREF(kwargs); Py_DECREF(args); } #endif if (!item) return NULL; if (sub_type & RE_SUBN) return Py_BuildValue("Nn", item, sub_count); return item; error: #if PY_VERSION_HEX >= 0x02060000 if (built_capture) { Py_DECREF(kwargs); Py_DECREF(args); } #endif clear_join_list(&join_info); state_fini(&state); Py_XDECREF(replacement); return NULL; } /* PatternObject's 'sub' method.
*/ static PyObject* pattern_sub(PatternObject* self, PyObject* args, PyObject* kwargs) { int conc; PyObject* replacement; PyObject* string; Py_ssize_t count = 0; PyObject* pos = Py_None; PyObject* endpos = Py_None; PyObject* concurrent = Py_None; static char* kwlist[] = { "repl", "string", "count", "pos", "endpos", "concurrent", NULL }; if (!PyArg_ParseTupleAndKeywords(args, kwargs, "OO|nOOO:sub", kwlist, &replacement, &string, &count, &pos, &endpos, &concurrent)) return NULL; conc = decode_concurrent(concurrent); if (conc < 0) return NULL; return pattern_subx(self, replacement, string, count, RE_SUB, pos, endpos, conc); } #if PY_VERSION_HEX >= 0x02060000 /* PatternObject's 'subf' method. */ static PyObject* pattern_subf(PatternObject* self, PyObject* args, PyObject* kwargs) { int conc; PyObject* format; PyObject* string; Py_ssize_t count = 0; PyObject* pos = Py_None; PyObject* endpos = Py_None; PyObject* concurrent = Py_None; static char* kwlist[] = { "format", "string", "count", "pos", "endpos", "concurrent", NULL }; if (!PyArg_ParseTupleAndKeywords(args, kwargs, "OO|nOOO:subf", kwlist, &format, &string, &count, &pos, &endpos, &concurrent)) return NULL; conc = decode_concurrent(concurrent); if (conc < 0) return NULL; return pattern_subx(self, format, string, count, RE_SUBF, pos, endpos, conc); } #endif /* PatternObject's 'subn' method.
*/ static PyObject* pattern_subn(PatternObject* self, PyObject* args, PyObject* kwargs) { int conc; PyObject* replacement; PyObject* string; Py_ssize_t count = 0; PyObject* pos = Py_None; PyObject* endpos = Py_None; PyObject* concurrent = Py_None; static char* kwlist[] = { "repl", "string", "count", "pos", "endpos", "concurrent", NULL }; if (!PyArg_ParseTupleAndKeywords(args, kwargs, "OO|nOOO:subn", kwlist, &replacement, &string, &count, &pos, &endpos, &concurrent)) return NULL; conc = decode_concurrent(concurrent); if (conc < 0) return NULL; return pattern_subx(self, replacement, string, count, RE_SUBN, pos, endpos, conc); } #if PY_VERSION_HEX >= 0x02060000 /* PatternObject's 'subfn' method. */ static PyObject* pattern_subfn(PatternObject* self, PyObject* args, PyObject* kwargs) { int conc; PyObject* format; PyObject* string; Py_ssize_t count = 0; PyObject* pos = Py_None; PyObject* endpos = Py_None; PyObject* concurrent = Py_None; static char* kwlist[] = { "format", "string", "count", "pos", "endpos", "concurrent", NULL }; if (!PyArg_ParseTupleAndKeywords(args, kwargs, "OO|nOOO:subfn", kwlist, &format, &string, &count, &pos, &endpos, &concurrent)) return NULL; conc = decode_concurrent(concurrent); if (conc < 0) return NULL; return pattern_subx(self, format, string, count, RE_SUBF | RE_SUBN, pos, endpos, conc); } #endif /* PatternObject's 'split' method.
*/ static PyObject* pattern_split(PatternObject* self, PyObject* args, PyObject* kwargs) { int conc; RE_State state; RE_SafeState safe_state; PyObject* list; PyObject* item; int status; Py_ssize_t split_count; size_t g; Py_ssize_t start_pos; Py_ssize_t end_pos; Py_ssize_t step; Py_ssize_t last_pos; PyObject* string; Py_ssize_t maxsplit = 0; PyObject* concurrent = Py_None; static char* kwlist[] = { "string", "maxsplit", "concurrent", NULL }; if (!PyArg_ParseTupleAndKeywords(args, kwargs, "O|nO:split", kwlist, &string, &maxsplit, &concurrent)) return NULL; if (maxsplit == 0) maxsplit = PY_SSIZE_T_MAX; conc = decode_concurrent(concurrent); if (conc < 0) return NULL; /* The MatchObject, and therefore repeated captures, will not be visible. */ if (!state_init(&state, self, string, 0, PY_SSIZE_T_MAX, FALSE, conc, FALSE, FALSE, FALSE, FALSE)) return NULL; /* Initialise the "safe state" structure. */ safe_state.re_state = &state; safe_state.thread_state = NULL; list = PyList_New(0); if (!list) { state_fini(&state); return NULL; } split_count = 0; if (state.reverse) { start_pos = state.text_length; end_pos = 0; step = -1; } else { start_pos = 0; end_pos = state.text_length; step = 1; } last_pos = start_pos; while (split_count < maxsplit) { status = do_match(&safe_state, TRUE); if (status < 0) goto error; if (status == 0) /* No more matches. */ break; if (state.version_0) { /* Version 0 behaviour is to advance one character if the split was * zero-width. Unfortunately, this can give an incorrect result. * GvR wants this behaviour to be retained so as not to break any * existing software which might rely on it. */ if (state.text_pos == state.match_pos) { if (last_pos == end_pos) break; /* Advance one character. */ state.text_pos += step; state.must_advance = FALSE; continue; } } /* Get segment before this match. 
*/ if (state.reverse) item = get_slice(string, state.match_pos, last_pos); else item = get_slice(string, last_pos, state.match_pos); if (!item) goto error; status = PyList_Append(list, item); Py_DECREF(item); if (status < 0) goto error; /* Add groups (if any). */ for (g = 1; g <= self->public_group_count; g++) { item = state_get_group(&state, (Py_ssize_t)g, string, FALSE); if (!item) goto error; status = PyList_Append(list, item); Py_DECREF(item); if (status < 0) goto error; } ++split_count; last_pos = state.text_pos; /* Version 0 behaviour is to advance one character if the match was * zero-width. Unfortunately, this can give an incorrect result. GvR * wants this behaviour to be retained so as not to break any existing * software which might rely on it. */ if (state.version_0) { if (state.text_pos == state.match_pos) /* Advance one character. */ state.text_pos += step; state.must_advance = FALSE; } else /* Continue from where we left off, but don't allow a contiguous * zero-width match. */ state.must_advance = TRUE; } /* Get segment following last match (even if empty). */ if (state.reverse) item = get_slice(string, 0, last_pos); else item = get_slice(string, last_pos, state.text_length); if (!item) goto error; status = PyList_Append(list, item); Py_DECREF(item); if (status < 0) goto error; state_fini(&state); return list; error: Py_DECREF(list); state_fini(&state); return NULL; } /* PatternObject's 'splititer' method. */ static PyObject* pattern_splititer(PatternObject* pattern, PyObject* args, PyObject* kwargs) { return pattern_splitter(pattern, args, kwargs); } /* PatternObject's 'findall' method. 
*/ static PyObject* pattern_findall(PatternObject* self, PyObject* args, PyObject* kwargs) { Py_ssize_t start; Py_ssize_t end; int conc; RE_State state; RE_SafeState safe_state; PyObject* list; Py_ssize_t step; int status; Py_ssize_t b; Py_ssize_t e; size_t g; PyObject* string; PyObject* pos = Py_None; PyObject* endpos = Py_None; Py_ssize_t overlapped = FALSE; PyObject* concurrent = Py_None; static char* kwlist[] = { "string", "pos", "endpos", "overlapped", "concurrent", NULL }; if (!PyArg_ParseTupleAndKeywords(args, kwargs, "O|OOnO:findall", kwlist, &string, &pos, &endpos, &overlapped, &concurrent)) return NULL; start = as_string_index(pos, 0); if (start == -1 && PyErr_Occurred()) return NULL; end = as_string_index(endpos, PY_SSIZE_T_MAX); if (end == -1 && PyErr_Occurred()) return NULL; conc = decode_concurrent(concurrent); if (conc < 0) return NULL; /* The MatchObject, and therefore repeated captures, will not be visible. */ if (!state_init(&state, self, string, start, end, overlapped != 0, conc, FALSE, FALSE, FALSE, FALSE)) return NULL; /* Initialise the "safe state" structure. */ safe_state.re_state = &state; safe_state.thread_state = NULL; list = PyList_New(0); if (!list) { state_fini(&state); return NULL; } step = state.reverse ? -1 : 1; while (state.slice_start <= state.text_pos && state.text_pos <= state.slice_end) { PyObject* item; status = do_match(&safe_state, TRUE); if (status < 0) goto error; if (status == 0) break; /* Don't bother to build a MatchObject. 
*/ switch (self->public_group_count) { case 0: if (state.reverse) { b = state.text_pos; e = state.match_pos; } else { b = state.match_pos; e = state.text_pos; } item = get_slice(string, b, e); if (!item) goto error; break; case 1: item = state_get_group(&state, 1, string, TRUE); if (!item) goto error; break; default: item = PyTuple_New((Py_ssize_t)self->public_group_count); if (!item) goto error; for (g = 0; g < self->public_group_count; g++) { PyObject* o; o = state_get_group(&state, (Py_ssize_t)g + 1, string, TRUE); if (!o) { Py_DECREF(item); goto error; } /* PyTuple_SET_ITEM steals the reference. */ PyTuple_SET_ITEM(item, g, o); } break; } status = PyList_Append(list, item); Py_DECREF(item); if (status < 0) goto error; if (state.overlapped) { /* Advance one character. */ state.text_pos = state.match_pos + step; state.must_advance = FALSE; } else /* Continue from where we left off, but don't allow 2 contiguous * zero-width matches. */ state.must_advance = state.text_pos == state.match_pos; } state_fini(&state); return list; error: Py_DECREF(list); state_fini(&state); return NULL; } /* PatternObject's 'finditer' method. */ static PyObject* pattern_finditer(PatternObject* pattern, PyObject* args, PyObject* kwargs) { return pattern_scanner(pattern, args, kwargs); } /* Makes a copy of a PatternObject. */ Py_LOCAL_INLINE(PyObject*) make_pattern_copy(PatternObject* self) { Py_INCREF(self); return (PyObject*)self; } /* PatternObject's '__copy__' method. */ static PyObject* pattern_copy(PatternObject* self, PyObject *unused) { return make_pattern_copy(self); } /* PatternObject's '__deepcopy__' method. */ static PyObject* pattern_deepcopy(PatternObject* self, PyObject* memo) { return make_pattern_copy(self); } /* The documentation of a PatternObject.
*/ PyDoc_STRVAR(pattern_match_doc, "match(string, pos=None, endpos=None, concurrent=None) --> MatchObject or None.\n\ Match zero or more characters at the beginning of the string."); PyDoc_STRVAR(pattern_fullmatch_doc, "fullmatch(string, pos=None, endpos=None, concurrent=None) --> MatchObject or None.\n\ Match zero or more characters against all of the string."); PyDoc_STRVAR(pattern_search_doc, "search(string, pos=None, endpos=None, concurrent=None) --> MatchObject or None.\n\ Search through string looking for a match, and return a corresponding\n\ match object instance. Return None if no match is found."); PyDoc_STRVAR(pattern_sub_doc, "sub(repl, string, count=0, pos=None, endpos=None, concurrent=None) --> newstring\n\ Return the string obtained by replacing the leftmost (or rightmost with a\n\ reverse pattern) non-overlapping occurrences of pattern in string by the\n\ replacement repl."); #if PY_VERSION_HEX >= 0x02060000 PyDoc_STRVAR(pattern_subf_doc, "subf(format, string, count=0, pos=None, endpos=None, concurrent=None) --> newstring\n\ Return the string obtained by replacing the leftmost (or rightmost with a\n\ reverse pattern) non-overlapping occurrences of pattern in string by the\n\ replacement format."); #endif PyDoc_STRVAR(pattern_subn_doc, "subn(repl, string, count=0, pos=None, endpos=None, concurrent=None) --> (newstring, number of subs)\n\ Return the tuple (new_string, number_of_subs_made) found by replacing the\n\ leftmost (or rightmost with a reverse pattern) non-overlapping occurrences\n\ of pattern with the replacement repl."); #if PY_VERSION_HEX >= 0x02060000 PyDoc_STRVAR(pattern_subfn_doc, "subfn(format, string, count=0, pos=None, endpos=None, concurrent=None) --> (newstring, number of subs)\n\ Return the tuple (new_string, number_of_subs_made) found by replacing the\n\ leftmost (or rightmost with a reverse pattern) non-overlapping occurrences\n\ of pattern with the replacement format."); #endif 
PyDoc_STRVAR(pattern_split_doc, "split(string, maxsplit=0, concurrent=None) --> list.\n\ Split string by the occurrences of pattern."); PyDoc_STRVAR(pattern_splititer_doc, "splititer(string, maxsplit=0, concurrent=None) --> iterator.\n\ Return an iterator yielding the parts of a split string."); PyDoc_STRVAR(pattern_findall_doc, "findall(string, pos=None, endpos=None, overlapped=False, concurrent=None) --> list.\n\ Return a list of all matches of pattern in string. The matches may be\n\ overlapped if overlapped is True."); PyDoc_STRVAR(pattern_finditer_doc, "finditer(string, pos=None, endpos=None, overlapped=False, concurrent=None) --> iterator.\n\ Return an iterator over all matches for the RE pattern in string. The\n\ matches may be overlapped if overlapped is True. For each match, the\n\ iterator returns a MatchObject."); PyDoc_STRVAR(pattern_scanner_doc, "scanner(string, pos=None, endpos=None, overlapped=False, concurrent=None) --> scanner.\n\ Return a scanner for the RE pattern in string. The matches may be overlapped\n\ if overlapped is True."); /* The methods of a PatternObject. 
*/ static PyMethodDef pattern_methods[] = { {"match", (PyCFunction)pattern_match, METH_VARARGS|METH_KEYWORDS, pattern_match_doc}, {"fullmatch", (PyCFunction)pattern_fullmatch, METH_VARARGS|METH_KEYWORDS, pattern_fullmatch_doc}, {"search", (PyCFunction)pattern_search, METH_VARARGS|METH_KEYWORDS, pattern_search_doc}, {"sub", (PyCFunction)pattern_sub, METH_VARARGS|METH_KEYWORDS, pattern_sub_doc}, #if PY_VERSION_HEX >= 0x02060000 {"subf", (PyCFunction)pattern_subf, METH_VARARGS|METH_KEYWORDS, pattern_subf_doc}, #endif {"subn", (PyCFunction)pattern_subn, METH_VARARGS|METH_KEYWORDS, pattern_subn_doc}, #if PY_VERSION_HEX >= 0x02060000 {"subfn", (PyCFunction)pattern_subfn, METH_VARARGS|METH_KEYWORDS, pattern_subfn_doc}, #endif {"split", (PyCFunction)pattern_split, METH_VARARGS|METH_KEYWORDS, pattern_split_doc}, {"splititer", (PyCFunction)pattern_splititer, METH_VARARGS|METH_KEYWORDS, pattern_splititer_doc}, {"findall", (PyCFunction)pattern_findall, METH_VARARGS|METH_KEYWORDS, pattern_findall_doc}, {"finditer", (PyCFunction)pattern_finditer, METH_VARARGS|METH_KEYWORDS, pattern_finditer_doc}, {"scanner", (PyCFunction)pattern_scanner, METH_VARARGS|METH_KEYWORDS, pattern_scanner_doc}, {"__copy__", (PyCFunction)pattern_copy, METH_NOARGS}, {"__deepcopy__", (PyCFunction)pattern_deepcopy, METH_O}, {NULL, NULL} }; PyDoc_STRVAR(pattern_doc, "Compiled regex object"); /* Deallocates a PatternObject. */ static void pattern_dealloc(PyObject* self_) { PatternObject* self; size_t i; int partial_side; self = (PatternObject*)self_; /* Discard the nodes. */ for (i = 0; i < self->node_count; i++) { RE_Node* node; node = self->node_list[i]; re_dealloc(node->values); if (node->status & RE_STATUS_STRING) { re_dealloc(node->string.bad_character_offset); re_dealloc(node->string.good_suffix_offset); } re_dealloc(node); } re_dealloc(self->node_list); /* Discard the group info. */ re_dealloc(self->group_info); /* Discard the call_ref info. 
*/ re_dealloc(self->call_ref_info); /* Discard the repeat info. */ re_dealloc(self->repeat_info); dealloc_groups(self->groups_storage, self->true_group_count); dealloc_repeats(self->repeats_storage, self->repeat_count); if (self->weakreflist) PyObject_ClearWeakRefs((PyObject*)self); Py_XDECREF(self->pattern); Py_XDECREF(self->groupindex); Py_XDECREF(self->indexgroup); for (partial_side = 0; partial_side < 2; partial_side++) { if (self->partial_named_lists[partial_side]) { for (i = 0; i < self->named_lists_count; i++) Py_XDECREF(self->partial_named_lists[partial_side][i]); re_dealloc(self->partial_named_lists[partial_side]); } } Py_DECREF(self->named_lists); Py_DECREF(self->named_list_indexes); re_dealloc(self->locale_info); PyObject_DEL(self); } /* Info about the various flags that can be passed in. */ typedef struct RE_FlagName { char* name; int value; } RE_FlagName; /* We won't bother about the A flag in Python 2. */ static RE_FlagName flag_names[] = { {"B", RE_FLAG_BESTMATCH}, {"D", RE_FLAG_DEBUG}, {"S", RE_FLAG_DOTALL}, {"F", RE_FLAG_FULLCASE}, {"I", RE_FLAG_IGNORECASE}, {"L", RE_FLAG_LOCALE}, {"M", RE_FLAG_MULTILINE}, {"P", RE_FLAG_POSIX}, {"R", RE_FLAG_REVERSE}, {"T", RE_FLAG_TEMPLATE}, {"U", RE_FLAG_UNICODE}, {"X", RE_FLAG_VERBOSE}, {"V0", RE_FLAG_VERSION0}, {"V1", RE_FLAG_VERSION1}, {"W", RE_FLAG_WORD}, }; /* Appends a string to a list. */ Py_LOCAL_INLINE(BOOL) append_string(PyObject* list, char* string) { PyObject* item; int status; item = Py_BuildValue("s", string); if (!item) return FALSE; status = PyList_Append(list, item); Py_DECREF(item); if (status < 0) return FALSE; return TRUE; } /* Appends a (decimal) integer to a list. 
*/ Py_LOCAL_INLINE(BOOL) append_integer(PyObject* list, Py_ssize_t value) { PyObject* int_obj; PyObject* repr_obj; int status; int_obj = Py_BuildValue("n", value); if (!int_obj) return FALSE; repr_obj = PyObject_Repr(int_obj); Py_DECREF(int_obj); if (!repr_obj) return FALSE; status = PyList_Append(list, repr_obj); Py_DECREF(repr_obj); if (status < 0) return FALSE; return TRUE; } /* MatchObject's '__repr__' method. */ static PyObject* match_repr(PyObject* self_) { MatchObject* self; PyObject* list; PyObject* matched_substring; PyObject* matched_repr; int status; PyObject* separator; PyObject* result; self = (MatchObject*)self_; list = PyList_New(0); if (!list) return NULL; if (!append_string(list, "<regex.Match object; span=(")) goto error; if (!append_integer(list, self->match_start)) goto error; if (! append_string(list, ", ")) goto error; if (!append_integer(list, self->match_end)) goto error; if (!append_string(list, "), match=")) goto error; matched_substring = get_slice(self->substring, self->match_start - self->substring_offset, self->match_end - self->substring_offset); if (!matched_substring) goto error; matched_repr = PyObject_Repr(matched_substring); Py_DECREF(matched_substring); if (!matched_repr) goto error; status = PyList_Append(list, matched_repr); Py_DECREF(matched_repr); if (status < 0) goto error; if (self->fuzzy_counts[RE_FUZZY_SUB] != 0 || self->fuzzy_counts[RE_FUZZY_INS] != 0 || self->fuzzy_counts[RE_FUZZY_DEL] != 0) { if (! append_string(list, ", fuzzy_counts=(")) goto error; if (!append_integer(list, (Py_ssize_t)self->fuzzy_counts[RE_FUZZY_SUB])) goto error; if (! append_string(list, ", ")) goto error; if (!append_integer(list, (Py_ssize_t)self->fuzzy_counts[RE_FUZZY_INS])) goto error; if (! append_string(list, ", ")) goto error; if (!append_integer(list, (Py_ssize_t)self->fuzzy_counts[RE_FUZZY_DEL])) goto error; if (! append_string(list, ")")) goto error; } if (self->partial) { if (!append_string(list, ", partial=True")) goto error; } if (! 
append_string(list, ">")) goto error; separator = Py_BuildValue("s", ""); if (!separator) goto error; result = PyUnicode_Join(separator, list); Py_DECREF(separator); Py_DECREF(list); return result; error: Py_DECREF(list); return NULL; } /* PatternObject's '__repr__' method. */ static PyObject* pattern_repr(PyObject* self_) { PatternObject* self; PyObject* list; PyObject* item; int status; int flag_count; unsigned int i; Py_ssize_t pos; PyObject *key; PyObject *value; PyObject* separator; PyObject* result; self = (PatternObject*)self_; list = PyList_New(0); if (!list) return NULL; if (!append_string(list, "regex.Regex(")) goto error; item = PyObject_Repr(self->pattern); if (!item) goto error; status = PyList_Append(list, item); Py_DECREF(item); if (status < 0) goto error; flag_count = 0; for (i = 0; i < sizeof(flag_names) / sizeof(flag_names[0]); i++) { if (self->flags & flag_names[i].value) { if (flag_count == 0) { if (!append_string(list, ", flags=")) goto error; } else { if (!append_string(list, " | ")) goto error; } if (!append_string(list, "regex.")) goto error; if (!append_string(list, flag_names[i].name)) goto error; ++flag_count; } } pos = 0; /* PyDict_Next borrows references. */ while (PyDict_Next(self->named_lists, &pos, &key, &value)) { if (!append_string(list, ", ")) goto error; status = PyList_Append(list, key); if (status < 0) goto error; if (!append_string(list, "=")) goto error; item = PyObject_Repr(value); if (!item) goto error; status = PyList_Append(list, item); Py_DECREF(item); if (status < 0) goto error; } if (!append_string(list, ")")) goto error; separator = Py_BuildValue("s", ""); if (!separator) goto error; result = PyUnicode_Join(separator, list); Py_DECREF(separator); Py_DECREF(list); return result; error: Py_DECREF(list); return NULL; } /* PatternObject's 'groupindex' method. 
*/ static PyObject* pattern_groupindex(PyObject* self_) { PatternObject* self; self = (PatternObject*)self_; return PyDict_Copy(self->groupindex); } static PyGetSetDef pattern_getset[] = { {"groupindex", (getter)pattern_groupindex, (setter)NULL, "A dictionary mapping group names to group numbers."}, {NULL} /* Sentinel */ }; static PyMemberDef pattern_members[] = { {"pattern", T_OBJECT, offsetof(PatternObject, pattern), READONLY, "The pattern string from which the regex object was compiled."}, {"flags", T_PYSSIZET, offsetof(PatternObject, flags), READONLY, "The regex matching flags."}, {"groups", T_PYSSIZET, offsetof(PatternObject, public_group_count), READONLY, "The number of capturing groups in the pattern."}, {"named_lists", T_OBJECT, offsetof(PatternObject, named_lists), READONLY, "The named lists used by the regex."}, {NULL} /* Sentinel */ }; static PyTypeObject Pattern_Type = { PyObject_HEAD_INIT(NULL) 0, "_" RE_MODULE "." "Pattern", sizeof(PatternObject) }; /* Building the nodes is made simpler by allowing branches to have a single * exit. These need to be removed. */ Py_LOCAL_INLINE(void) skip_one_way_branches(PatternObject* pattern) { BOOL modified; /* If a node refers to a 1-way branch then make the former refer to the * latter's destination. Repeat until they're all done. */ do { size_t i; modified = FALSE; for (i = 0; i < pattern->node_count; i++) { RE_Node* node; RE_Node* next; node = pattern->node_list[i]; /* Check the first destination. */ next = node->next_1.node; if (next && next->op == RE_OP_BRANCH && !next->nonstring.next_2.node) { node->next_1.node = next->next_1.node; modified = TRUE; } /* Check the second destination. */ next = node->nonstring.next_2.node; if (next && next->op == RE_OP_BRANCH && !next->nonstring.next_2.node) { node->nonstring.next_2.node = next->next_1.node; modified = TRUE; } } } while (modified); /* The start node might be a 1-way branch. Skip over it because it'll be * removed. It might even be the first in a chain. 
*/ while (pattern->start_node->op == RE_OP_BRANCH && !pattern->start_node->nonstring.next_2.node) pattern->start_node = pattern->start_node->next_1.node; } /* Adds guards to repeats which are followed by a reference to a group. * * Returns whether a guard was added for a node at or after the given node. */ Py_LOCAL_INLINE(RE_STATUS_T) add_repeat_guards(PatternObject* pattern, RE_Node* node) { RE_STATUS_T result; result = RE_STATUS_NEITHER; for (;;) { if (node->status & RE_STATUS_VISITED_AG) return node->status & (RE_STATUS_REPEAT | RE_STATUS_REF); switch (node->op) { case RE_OP_BRANCH: { RE_STATUS_T branch_1_result; RE_STATUS_T branch_2_result; RE_STATUS_T status; branch_1_result = add_repeat_guards(pattern, node->next_1.node); branch_2_result = add_repeat_guards(pattern, node->nonstring.next_2.node); status = max_status_3(result, branch_1_result, branch_2_result); node->status = RE_STATUS_VISITED_AG | status; return status; } case RE_OP_END_GREEDY_REPEAT: case RE_OP_END_LAZY_REPEAT: node->status |= RE_STATUS_VISITED_AG; return result; case RE_OP_GREEDY_REPEAT: case RE_OP_LAZY_REPEAT: { BOOL limited; RE_STATUS_T body_result; RE_STATUS_T tail_result; RE_RepeatInfo* repeat_info; RE_STATUS_T status; limited = ~node->values[2] != 0; if (limited) body_result = RE_STATUS_LIMITED; else body_result = add_repeat_guards(pattern, node->next_1.node); tail_result = add_repeat_guards(pattern, node->nonstring.next_2.node); repeat_info = &pattern->repeat_info[node->values[0]]; if (body_result != RE_STATUS_REF) repeat_info->status |= RE_STATUS_BODY; if (tail_result != RE_STATUS_REF) repeat_info->status |= RE_STATUS_TAIL; if (limited) result = max_status_2(result, RE_STATUS_LIMITED); else result = max_status_2(result, RE_STATUS_REPEAT); status = max_status_3(result, body_result, tail_result); node->status |= RE_STATUS_VISITED_AG | status; return status; } case RE_OP_GREEDY_REPEAT_ONE: case RE_OP_LAZY_REPEAT_ONE: { BOOL limited; RE_STATUS_T tail_result; RE_RepeatInfo* repeat_info; 
RE_STATUS_T status; limited = ~node->values[2] != 0; tail_result = add_repeat_guards(pattern, node->next_1.node); repeat_info = &pattern->repeat_info[node->values[0]]; repeat_info->status |= RE_STATUS_BODY; if (tail_result != RE_STATUS_REF) repeat_info->status |= RE_STATUS_TAIL; if (limited) result = max_status_2(result, RE_STATUS_LIMITED); else result = max_status_2(result, RE_STATUS_REPEAT); status = max_status_3(result, RE_STATUS_REPEAT, tail_result); node->status = RE_STATUS_VISITED_AG | status; return status; } case RE_OP_GROUP_CALL: case RE_OP_REF_GROUP: case RE_OP_REF_GROUP_FLD: case RE_OP_REF_GROUP_FLD_REV: case RE_OP_REF_GROUP_IGN: case RE_OP_REF_GROUP_IGN_REV: case RE_OP_REF_GROUP_REV: result = RE_STATUS_REF; node = node->next_1.node; break; case RE_OP_GROUP_EXISTS: { RE_STATUS_T branch_1_result; RE_STATUS_T branch_2_result; RE_STATUS_T status; branch_1_result = add_repeat_guards(pattern, node->next_1.node); branch_2_result = add_repeat_guards(pattern, node->nonstring.next_2.node); status = max_status_4(result, branch_1_result, branch_2_result, RE_STATUS_REF); node->status = RE_STATUS_VISITED_AG | status; return status; } case RE_OP_SUCCESS: node->status = RE_STATUS_VISITED_AG | result; return result; default: node = node->next_1.node; break; } } } /* Adds an index to a node's values unless it's already present. * * 'offset' is the offset of the index count within the values. */ Py_LOCAL_INLINE(BOOL) add_index(RE_Node* node, size_t offset, size_t index) { size_t index_count; size_t first_index; size_t i; RE_CODE* new_values; if (!node) return TRUE; index_count = node->values[offset]; first_index = offset + 1; /* Is the index already present? */ for (i = 0; i < index_count; i++) { if (node->values[first_index + i] == index) return TRUE; } /* Allocate more space for the new index. 
*/ new_values = re_realloc(node->values, (node->value_count + 1) * sizeof(RE_CODE)); if (!new_values) return FALSE; ++node->value_count; node->values = new_values; node->values[first_index + node->values[offset]++] = (RE_CODE)index; return TRUE; } /* Records the index of every repeat and fuzzy section within atomic * subpatterns and lookarounds. */ Py_LOCAL_INLINE(BOOL) record_subpattern_repeats_and_fuzzy_sections(RE_Node* parent_node, size_t offset, size_t repeat_count, RE_Node* node) { while (node) { if (node->status & RE_STATUS_VISITED_REP) return TRUE; node->status |= RE_STATUS_VISITED_REP; switch (node->op) { case RE_OP_BRANCH: case RE_OP_GROUP_EXISTS: if (!record_subpattern_repeats_and_fuzzy_sections(parent_node, offset, repeat_count, node->next_1.node)) return FALSE; node = node->nonstring.next_2.node; break; case RE_OP_END_FUZZY: node = node->next_1.node; break; case RE_OP_END_GREEDY_REPEAT: case RE_OP_END_LAZY_REPEAT: return TRUE; case RE_OP_FUZZY: /* Record the fuzzy index. */ if (!add_index(parent_node, offset, repeat_count + node->values[0])) return FALSE; node = node->next_1.node; break; case RE_OP_GREEDY_REPEAT: case RE_OP_LAZY_REPEAT: /* Record the repeat index. */ if (!add_index(parent_node, offset, node->values[0])) return FALSE; if (!record_subpattern_repeats_and_fuzzy_sections(parent_node, offset, repeat_count, node->next_1.node)) return FALSE; node = node->nonstring.next_2.node; break; case RE_OP_GREEDY_REPEAT_ONE: case RE_OP_LAZY_REPEAT_ONE: /* Record the repeat index. */ if (!add_index(parent_node, offset, node->values[0])) return FALSE; node = node->next_1.node; break; default: node = node->next_1.node; break; } } return TRUE; } /* Marks nodes which are being used as used. 
 */
Py_LOCAL_INLINE(void) use_nodes(RE_Node* node) {
    while (node && !(node->status & RE_STATUS_USED)) {
        node->status |= RE_STATUS_USED;
        if (!(node->status & RE_STATUS_STRING)) {
            if (node->nonstring.next_2.node)
                use_nodes(node->nonstring.next_2.node);
        }
        node = node->next_1.node;
    }
}

/* Discards any unused nodes.
 *
 * Optimising the nodes might result in some nodes no longer being used.
 */
Py_LOCAL_INLINE(void) discard_unused_nodes(PatternObject* pattern) {
    size_t i;
    size_t new_count;

    /* Mark the nodes which are being used. */
    use_nodes(pattern->start_node);

    for (i = 0; i < pattern->call_ref_info_capacity; i++)
        use_nodes(pattern->call_ref_info[i].node);

    new_count = 0;
    for (i = 0; i < pattern->node_count; i++) {
        RE_Node* node;

        node = pattern->node_list[i];
        if (node->status & RE_STATUS_USED)
            pattern->node_list[new_count++] = node;
        else {
            re_dealloc(node->values);
            if (node->status & RE_STATUS_STRING) {
                re_dealloc(node->string.bad_character_offset);
                re_dealloc(node->string.good_suffix_offset);
            }
            re_dealloc(node);
        }
    }

    pattern->node_count = new_count;
}

/* Marks all the groups which are named. Returns FALSE if there's an error. */
Py_LOCAL_INLINE(BOOL) mark_named_groups(PatternObject* pattern) {
    size_t i;

    for (i = 0; i < pattern->public_group_count; i++) {
        RE_GroupInfo* group_info;
        PyObject* index;
        int status;

        group_info = &pattern->group_info[i];

        /* Group numbers are 1-based. */
        index = Py_BuildValue("n", i + 1);
        if (!index)
            return FALSE;

        status = PyDict_Contains(pattern->indexgroup, index);
        Py_DECREF(index);
        if (status < 0)
            return FALSE;

        group_info->has_name = status == 1;
    }

    return TRUE;
}

/* Sets the test node.
 *
 * The test node lets the matcher look ahead in the pattern, allowing it to
 * avoid the cost of housekeeping, only to find that what follows doesn't match
 * anyway.
*/ Py_LOCAL_INLINE(void) set_test_node(RE_NextNode* next) { RE_Node* node = next->node; RE_Node* test; next->test = node; next->match_next = node; next->match_step = 0; if (!node) return; test = node; while (test->op == RE_OP_END_GROUP || test->op == RE_OP_START_GROUP) test = test->next_1.node; next->test = test; if (test != node) return; switch (test->op) { case RE_OP_ANY: case RE_OP_ANY_ALL: case RE_OP_ANY_ALL_REV: case RE_OP_ANY_REV: case RE_OP_ANY_U: case RE_OP_ANY_U_REV: case RE_OP_BOUNDARY: case RE_OP_CHARACTER: case RE_OP_CHARACTER_IGN: case RE_OP_CHARACTER_IGN_REV: case RE_OP_CHARACTER_REV: case RE_OP_DEFAULT_BOUNDARY: case RE_OP_DEFAULT_END_OF_WORD: case RE_OP_DEFAULT_START_OF_WORD: case RE_OP_END_OF_LINE: case RE_OP_END_OF_LINE_U: case RE_OP_END_OF_STRING: case RE_OP_END_OF_STRING_LINE: case RE_OP_END_OF_STRING_LINE_U: case RE_OP_END_OF_WORD: case RE_OP_GRAPHEME_BOUNDARY: case RE_OP_PROPERTY: case RE_OP_PROPERTY_IGN: case RE_OP_PROPERTY_IGN_REV: case RE_OP_PROPERTY_REV: case RE_OP_RANGE: case RE_OP_RANGE_IGN: case RE_OP_RANGE_IGN_REV: case RE_OP_RANGE_REV: case RE_OP_SEARCH_ANCHOR: case RE_OP_SET_DIFF: case RE_OP_SET_DIFF_IGN: case RE_OP_SET_DIFF_IGN_REV: case RE_OP_SET_DIFF_REV: case RE_OP_SET_INTER: case RE_OP_SET_INTER_IGN: case RE_OP_SET_INTER_IGN_REV: case RE_OP_SET_INTER_REV: case RE_OP_SET_SYM_DIFF: case RE_OP_SET_SYM_DIFF_IGN: case RE_OP_SET_SYM_DIFF_IGN_REV: case RE_OP_SET_SYM_DIFF_REV: case RE_OP_SET_UNION: case RE_OP_SET_UNION_IGN: case RE_OP_SET_UNION_IGN_REV: case RE_OP_SET_UNION_REV: case RE_OP_START_OF_LINE: case RE_OP_START_OF_LINE_U: case RE_OP_START_OF_STRING: case RE_OP_START_OF_WORD: case RE_OP_STRING: case RE_OP_STRING_FLD: case RE_OP_STRING_FLD_REV: case RE_OP_STRING_IGN: case RE_OP_STRING_IGN_REV: case RE_OP_STRING_REV: next->match_next = test->next_1.node; next->match_step = test->step; break; case RE_OP_GREEDY_REPEAT_ONE: case RE_OP_LAZY_REPEAT_ONE: if (test->values[1] > 0) next->test = test; break; } } /* Sets the test nodes. 
 */
Py_LOCAL_INLINE(void) set_test_nodes(PatternObject* pattern) {
    RE_Node** node_list;
    size_t i;

    node_list = pattern->node_list;
    for (i = 0; i < pattern->node_count; i++) {
        RE_Node* node;

        node = node_list[i];
        set_test_node(&node->next_1);
        if (!(node->status & RE_STATUS_STRING))
            set_test_node(&node->nonstring.next_2);
    }
}

/* Optimises the pattern. */
Py_LOCAL_INLINE(BOOL) optimise_pattern(PatternObject* pattern) {
    size_t i;

    /* Building the nodes is made simpler by allowing branches to have a single
     * exit. These need to be removed.
     */
    skip_one_way_branches(pattern);

    /* Add position guards for repeat bodies containing a reference to a group
     * or repeat tails followed at some point by a reference to a group.
     */
    add_repeat_guards(pattern, pattern->start_node);

    /* Record the index of repeats and fuzzy sections within the body of atomic
     * and lookaround nodes.
     */
    if (!record_subpattern_repeats_and_fuzzy_sections(NULL, 0,
      pattern->repeat_count, pattern->start_node))
        return FALSE;

    for (i = 0; i < pattern->call_ref_info_count; i++) {
        RE_Node* node;

        node = pattern->call_ref_info[i].node;
        if (!record_subpattern_repeats_and_fuzzy_sections(NULL, 0,
          pattern->repeat_count, node))
            return FALSE;
    }

    /* Discard any unused nodes. */
    discard_unused_nodes(pattern);

    /* Set the test nodes. */
    set_test_nodes(pattern);

    /* Mark all the groups that are named. */
    if (!mark_named_groups(pattern))
        return FALSE;

    return TRUE;
}

/* Creates a new pattern node.
 */
Py_LOCAL_INLINE(RE_Node*) create_node(PatternObject* pattern, RE_UINT8 op,
  RE_CODE flags, Py_ssize_t step, size_t value_count) {
    RE_Node* node;

    node = (RE_Node*)re_alloc(sizeof(*node));
    if (!node)
        return NULL;
    memset(node, 0, sizeof(RE_Node));

    node->value_count = value_count;
    if (node->value_count > 0) {
        node->values = (RE_CODE*)re_alloc(node->value_count * sizeof(RE_CODE));
        if (!node->values)
            goto error;
    } else
        node->values = NULL;

    node->op = op;
    node->match = (flags & RE_POSITIVE_OP) != 0;
    node->status = (RE_STATUS_T)(flags << RE_STATUS_SHIFT);
    node->step = step;

    /* Ensure that there's enough storage to record the new node. */
    if (pattern->node_count >= pattern->node_capacity) {
        size_t new_capacity;
        RE_Node** new_node_list;

        /* Work on a local copy so that the recorded capacity changes only if
         * the reallocation succeeds.
         */
        new_capacity = pattern->node_capacity * 2;
        if (new_capacity == 0)
            new_capacity = RE_INIT_NODE_LIST_SIZE;
        new_node_list = (RE_Node**)re_realloc(pattern->node_list,
          new_capacity * sizeof(RE_Node*));
        if (!new_node_list)
            goto error;
        pattern->node_list = new_node_list;
        pattern->node_capacity = new_capacity;
    }

    /* Record the new node. */
    pattern->node_list[pattern->node_count++] = node;

    return node;

error:
    re_dealloc(node->values);
    re_dealloc(node);
    return NULL;
}

/* Adds a node as a next node for another node. */
Py_LOCAL_INLINE(void) add_node(RE_Node* node_1, RE_Node* node_2) {
    if (!node_1->next_1.node)
        node_1->next_1.node = node_2;
    else
        node_1->nonstring.next_2.node = node_2;
}

/* Ensures that the entry for a group's details actually exists. */
Py_LOCAL_INLINE(BOOL) ensure_group(PatternObject* pattern, size_t group) {
    size_t old_capacity;
    size_t new_capacity;
    RE_GroupInfo* new_group_info;

    if (group <= pattern->true_group_count)
        /* We already have an entry for the group. */
        return TRUE;

    /* Increase the storage capacity to include the new entry if it's
     * insufficient.
*/ old_capacity = pattern->group_info_capacity; new_capacity = pattern->group_info_capacity; while (group > new_capacity) new_capacity += RE_LIST_SIZE_INC; if (new_capacity > old_capacity) { new_group_info = (RE_GroupInfo*)re_realloc(pattern->group_info, new_capacity * sizeof(RE_GroupInfo)); if (!new_group_info) return FALSE; memset(new_group_info + old_capacity, 0, (new_capacity - old_capacity) * sizeof(RE_GroupInfo)); pattern->group_info = new_group_info; pattern->group_info_capacity = new_capacity; } pattern->true_group_count = group; return TRUE; } /* Records that there's a reference to a group. */ Py_LOCAL_INLINE(BOOL) record_ref_group(PatternObject* pattern, size_t group) { if (!ensure_group(pattern, group)) return FALSE; pattern->group_info[group - 1].referenced = TRUE; return TRUE; } /* Records that there's a new group. */ Py_LOCAL_INLINE(BOOL) record_group(PatternObject* pattern, size_t group, RE_Node* node) { if (!ensure_group(pattern, group)) return FALSE; if (group >= 1) { RE_GroupInfo* info; info = &pattern->group_info[group - 1]; info->end_index = (Py_ssize_t)pattern->true_group_count; info->node = node; } return TRUE; } /* Records that a group has closed. */ Py_LOCAL_INLINE(void) record_group_end(PatternObject* pattern, size_t group) { if (group >= 1) pattern->group_info[group - 1].end_index = ++pattern->group_end_index; } /* Ensures that the entry for a call_ref's details actually exists. */ Py_LOCAL_INLINE(BOOL) ensure_call_ref(PatternObject* pattern, size_t call_ref) { size_t old_capacity; size_t new_capacity; RE_CallRefInfo* new_call_ref_info; if (call_ref < pattern->call_ref_info_count) /* We already have an entry for the call_ref. */ return TRUE; /* Increase the storage capacity to include the new entry if it's * insufficient. 
*/ old_capacity = pattern->call_ref_info_capacity; new_capacity = pattern->call_ref_info_capacity; while (call_ref >= new_capacity) new_capacity += RE_LIST_SIZE_INC; if (new_capacity > old_capacity) { new_call_ref_info = (RE_CallRefInfo*)re_realloc(pattern->call_ref_info, new_capacity * sizeof(RE_CallRefInfo)); if (!new_call_ref_info) return FALSE; memset(new_call_ref_info + old_capacity, 0, (new_capacity - old_capacity) * sizeof(RE_CallRefInfo)); pattern->call_ref_info = new_call_ref_info; pattern->call_ref_info_capacity = new_capacity; } pattern->call_ref_info_count = 1 + call_ref; return TRUE; } /* Records that a call_ref is defined. */ Py_LOCAL_INLINE(BOOL) record_call_ref_defined(PatternObject* pattern, size_t call_ref, RE_Node* node) { if (!ensure_call_ref(pattern, call_ref)) return FALSE; pattern->call_ref_info[call_ref].defined = TRUE; pattern->call_ref_info[call_ref].node = node; return TRUE; } /* Records that a call_ref is used. */ Py_LOCAL_INLINE(BOOL) record_call_ref_used(PatternObject* pattern, size_t call_ref) { if (!ensure_call_ref(pattern, call_ref)) return FALSE; pattern->call_ref_info[call_ref].used = TRUE; return TRUE; } /* Checks whether a node matches one and only one character. */ Py_LOCAL_INLINE(BOOL) sequence_matches_one(RE_Node* node) { while (node->op == RE_OP_BRANCH && !node->nonstring.next_2.node) node = node->next_1.node; if (node->next_1.node || (node->status & RE_STATUS_FUZZY)) return FALSE; return node_matches_one_character(node); } /* Records a repeat. */ Py_LOCAL_INLINE(BOOL) record_repeat(PatternObject* pattern, size_t index, size_t repeat_depth) { size_t old_capacity; size_t new_capacity; /* Increase the storage capacity to include the new entry if it's * insufficient. 
*/ old_capacity = pattern->repeat_info_capacity; new_capacity = pattern->repeat_info_capacity; while (index >= new_capacity) new_capacity += RE_LIST_SIZE_INC; if (new_capacity > old_capacity) { RE_RepeatInfo* new_repeat_info; new_repeat_info = (RE_RepeatInfo*)re_realloc(pattern->repeat_info, new_capacity * sizeof(RE_RepeatInfo)); if (!new_repeat_info) return FALSE; memset(new_repeat_info + old_capacity, 0, (new_capacity - old_capacity) * sizeof(RE_RepeatInfo)); pattern->repeat_info = new_repeat_info; pattern->repeat_info_capacity = new_capacity; } if (index >= pattern->repeat_count) pattern->repeat_count = index + 1; if (repeat_depth > 0) pattern->repeat_info[index].status |= RE_STATUS_INNER; return TRUE; } Py_LOCAL_INLINE(Py_ssize_t) get_step(RE_CODE op) { switch (op) { case RE_OP_ANY: case RE_OP_ANY_ALL: case RE_OP_ANY_U: case RE_OP_CHARACTER: case RE_OP_CHARACTER_IGN: case RE_OP_PROPERTY: case RE_OP_PROPERTY_IGN: case RE_OP_RANGE: case RE_OP_RANGE_IGN: case RE_OP_SET_DIFF: case RE_OP_SET_DIFF_IGN: case RE_OP_SET_INTER: case RE_OP_SET_INTER_IGN: case RE_OP_SET_SYM_DIFF: case RE_OP_SET_SYM_DIFF_IGN: case RE_OP_SET_UNION: case RE_OP_SET_UNION_IGN: case RE_OP_STRING: case RE_OP_STRING_FLD: case RE_OP_STRING_IGN: return 1; case RE_OP_ANY_ALL_REV: case RE_OP_ANY_REV: case RE_OP_ANY_U_REV: case RE_OP_CHARACTER_IGN_REV: case RE_OP_CHARACTER_REV: case RE_OP_PROPERTY_IGN_REV: case RE_OP_PROPERTY_REV: case RE_OP_RANGE_IGN_REV: case RE_OP_RANGE_REV: case RE_OP_SET_DIFF_IGN_REV: case RE_OP_SET_DIFF_REV: case RE_OP_SET_INTER_IGN_REV: case RE_OP_SET_INTER_REV: case RE_OP_SET_SYM_DIFF_IGN_REV: case RE_OP_SET_SYM_DIFF_REV: case RE_OP_SET_UNION_IGN_REV: case RE_OP_SET_UNION_REV: case RE_OP_STRING_FLD_REV: case RE_OP_STRING_IGN_REV: case RE_OP_STRING_REV: return -1; } return 0; } Py_LOCAL_INLINE(int) build_sequence(RE_CompileArgs* args); /* Builds an ANY node. 
*/ Py_LOCAL_INLINE(int) build_ANY(RE_CompileArgs* args) { RE_UINT8 op; RE_CODE flags; Py_ssize_t step; RE_Node* node; /* codes: opcode, flags. */ if (args->code + 1 > args->end_code) return RE_ERROR_ILLEGAL; op = (RE_UINT8)args->code[0]; flags = args->code[1]; step = get_step(op); /* Create the node. */ node = create_node(args->pattern, op, flags, step, 0); if (!node) return RE_ERROR_MEMORY; args->code += 2; /* Append the node. */ add_node(args->end, node); args->end = node; ++args->min_width; return RE_ERROR_SUCCESS; } /* Builds a FUZZY node. */ Py_LOCAL_INLINE(int) build_FUZZY(RE_CompileArgs* args) { RE_CODE flags; RE_Node* start_node; RE_Node* end_node; RE_CODE index; RE_CompileArgs subargs; int status; /* codes: opcode, flags, constraints, sequence, end. */ if (args->code + 13 > args->end_code) return RE_ERROR_ILLEGAL; flags = args->code[1]; /* Create nodes for the start and end of the fuzzy sequence. */ start_node = create_node(args->pattern, RE_OP_FUZZY, flags, 0, 9); end_node = create_node(args->pattern, RE_OP_END_FUZZY, flags, 0, 5); if (!start_node || !end_node) return RE_ERROR_MEMORY; index = (RE_CODE)args->pattern->fuzzy_count++; start_node->values[0] = index; end_node->values[0] = index; /* The constraints consist of 4 pairs of limits and the cost equation. */ end_node->values[RE_FUZZY_VAL_MIN_DEL] = args->code[2]; /* Deletion minimum. */ end_node->values[RE_FUZZY_VAL_MIN_INS] = args->code[4]; /* Insertion minimum. */ end_node->values[RE_FUZZY_VAL_MIN_SUB] = args->code[6]; /* Substitution minimum. */ end_node->values[RE_FUZZY_VAL_MIN_ERR] = args->code[8]; /* Error minimum. */ start_node->values[RE_FUZZY_VAL_MAX_DEL] = args->code[3]; /* Deletion maximum. */ start_node->values[RE_FUZZY_VAL_MAX_INS] = args->code[5]; /* Insertion maximum. */ start_node->values[RE_FUZZY_VAL_MAX_SUB] = args->code[7]; /* Substitution maximum. */ start_node->values[RE_FUZZY_VAL_MAX_ERR] = args->code[9]; /* Error maximum. 
*/ start_node->values[RE_FUZZY_VAL_DEL_COST] = args->code[10]; /* Deletion cost. */ start_node->values[RE_FUZZY_VAL_INS_COST] = args->code[11]; /* Insertion cost. */ start_node->values[RE_FUZZY_VAL_SUB_COST] = args->code[12]; /* Substitution cost. */ start_node->values[RE_FUZZY_VAL_MAX_COST] = args->code[13]; /* Total cost. */ args->code += 14; subargs = *args; subargs.within_fuzzy = TRUE; /* Compile the sequence and check that we've reached the end of the * subpattern. */ status = build_sequence(&subargs); if (status != RE_ERROR_SUCCESS) return status; if (subargs.code[0] != RE_OP_END) return RE_ERROR_ILLEGAL; args->code = subargs.code; args->min_width += subargs.min_width; args->has_captures |= subargs.has_captures; args->is_fuzzy = TRUE; args->has_groups |= subargs.has_groups; args->has_repeats |= subargs.has_repeats; ++args->code; /* Append the fuzzy sequence. */ add_node(args->end, start_node); add_node(start_node, subargs.start); add_node(subargs.end, end_node); args->end = end_node; return RE_ERROR_SUCCESS; } /* Builds an ATOMIC node. */ Py_LOCAL_INLINE(int) build_ATOMIC(RE_CompileArgs* args) { RE_Node* atomic_node; RE_CompileArgs subargs; int status; RE_Node* end_node; /* codes: opcode, sequence, end. */ if (args->code + 1 > args->end_code) return RE_ERROR_ILLEGAL; atomic_node = create_node(args->pattern, RE_OP_ATOMIC, 0, 0, 0); if (!atomic_node) return RE_ERROR_MEMORY; ++args->code; /* Compile the sequence and check that we've reached the end of it. */ subargs = *args; status = build_sequence(&subargs); if (status != RE_ERROR_SUCCESS) return status; if (subargs.code[0] != RE_OP_END) return RE_ERROR_ILLEGAL; args->code = subargs.code; ++args->code; /* Check the subpattern. 
*/ args->min_width += subargs.min_width; args->has_captures |= subargs.has_captures; args->is_fuzzy |= subargs.is_fuzzy; args->has_groups |= subargs.has_groups; args->has_repeats |= subargs.has_repeats; if (subargs.has_groups) atomic_node->status |= RE_STATUS_HAS_GROUPS; if (subargs.has_repeats) atomic_node->status |= RE_STATUS_HAS_REPEATS; /* Create the node to terminate the subpattern. */ end_node = create_node(subargs.pattern, RE_OP_END_ATOMIC, 0, 0, 0); if (!end_node) return RE_ERROR_MEMORY; /* Append the new sequence. */ add_node(args->end, atomic_node); add_node(atomic_node, subargs.start); add_node(subargs.end, end_node); args->end = end_node; return RE_ERROR_SUCCESS; } /* Builds a BOUNDARY node. */ Py_LOCAL_INLINE(int) build_BOUNDARY(RE_CompileArgs* args) { RE_UINT8 op; RE_CODE flags; RE_Node* node; /* codes: opcode, flags. */ if (args->code + 1 > args->end_code) return RE_ERROR_ILLEGAL; op = (RE_UINT8)args->code[0]; flags = args->code[1]; args->code += 2; /* Create the node. */ node = create_node(args->pattern, op, flags, 0, 0); if (!node) return RE_ERROR_MEMORY; /* Append the node. */ add_node(args->end, node); args->end = node; return RE_ERROR_SUCCESS; } /* Builds a BRANCH node. */ Py_LOCAL_INLINE(int) build_BRANCH(RE_CompileArgs* args) { RE_Node* branch_node; RE_Node* join_node; Py_ssize_t min_width; RE_CompileArgs subargs; int status; /* codes: opcode, branch, next, branch, end. */ if (args->code + 2 > args->end_code) return RE_ERROR_ILLEGAL; /* Create nodes for the start and end of the branch sequence. */ branch_node = create_node(args->pattern, RE_OP_BRANCH, 0, 0, 0); join_node = create_node(args->pattern, RE_OP_BRANCH, 0, 0, 0); if (!branch_node || !join_node) return RE_ERROR_MEMORY; /* Append the node. */ add_node(args->end, branch_node); args->end = join_node; min_width = PY_SSIZE_T_MAX; subargs = *args; /* A branch in the regular expression is compiled into a series of 2-way * branches. 
*/ do { RE_Node* next_branch_node; /* Skip over the 'BRANCH' or 'NEXT' opcode. */ ++subargs.code; /* Compile the sequence until the next 'BRANCH' or 'NEXT' opcode. */ status = build_sequence(&subargs); if (status != RE_ERROR_SUCCESS) return status; min_width = min_ssize_t(min_width, subargs.min_width); args->has_captures |= subargs.has_captures; args->is_fuzzy |= subargs.is_fuzzy; args->has_groups |= subargs.has_groups; args->has_repeats |= subargs.has_repeats; /* Append the sequence. */ add_node(branch_node, subargs.start); add_node(subargs.end, join_node); /* Create a start node for the next sequence and append it. */ next_branch_node = create_node(subargs.pattern, RE_OP_BRANCH, 0, 0, 0); if (!next_branch_node) return RE_ERROR_MEMORY; add_node(branch_node, next_branch_node); branch_node = next_branch_node; } while (subargs.code < subargs.end_code && subargs.code[0] == RE_OP_NEXT); /* We should have reached the end of the branch. */ if (subargs.code[0] != RE_OP_END) return RE_ERROR_ILLEGAL; args->code = subargs.code; ++args->code; args->min_width += min_width; return RE_ERROR_SUCCESS; } /* Builds a CALL_REF node. */ Py_LOCAL_INLINE(int) build_CALL_REF(RE_CompileArgs* args) { RE_CODE call_ref; RE_Node* start_node; RE_Node* end_node; RE_CompileArgs subargs; int status; /* codes: opcode, call_ref. */ if (args->code + 1 > args->end_code) return RE_ERROR_ILLEGAL; call_ref = args->code[1]; args->code += 2; /* Create nodes for the start and end of the subpattern. */ start_node = create_node(args->pattern, RE_OP_CALL_REF, 0, 0, 1); end_node = create_node(args->pattern, RE_OP_GROUP_RETURN, 0, 0, 0); if (!start_node || !end_node) return RE_ERROR_MEMORY; start_node->values[0] = call_ref; /* Compile the sequence and check that we've reached the end of the * subpattern. 
*/ subargs = *args; status = build_sequence(&subargs); if (status != RE_ERROR_SUCCESS) return status; if (subargs.code[0] != RE_OP_END) return RE_ERROR_ILLEGAL; args->code = subargs.code; args->min_width += subargs.min_width; args->has_captures |= subargs.has_captures; args->is_fuzzy |= subargs.is_fuzzy; args->has_groups |= subargs.has_groups; args->has_repeats |= subargs.has_repeats; ++args->code; /* Record that we defined a call_ref. */ if (!record_call_ref_defined(args->pattern, call_ref, start_node)) return RE_ERROR_MEMORY; /* Append the node. */ add_node(args->end, start_node); add_node(start_node, subargs.start); add_node(subargs.end, end_node); args->end = end_node; return RE_ERROR_SUCCESS; } /* Builds a CHARACTER or PROPERTY node. */ Py_LOCAL_INLINE(int) build_CHARACTER_or_PROPERTY(RE_CompileArgs* args) { RE_UINT8 op; RE_CODE flags; Py_ssize_t step; RE_Node* node; /* codes: opcode, flags, value. */ if (args->code + 2 > args->end_code) return RE_ERROR_ILLEGAL; op = (RE_UINT8)args->code[0]; flags = args->code[1]; step = get_step(op); if (flags & RE_ZEROWIDTH_OP) step = 0; /* Create the node. */ node = create_node(args->pattern, op, flags, step, 1); if (!node) return RE_ERROR_MEMORY; node->values[0] = args->code[2]; args->code += 3; /* Append the node. */ add_node(args->end, node); args->end = node; if (step != 0) ++args->min_width; return RE_ERROR_SUCCESS; } /* Builds a CONDITIONAL node. */ Py_LOCAL_INLINE(int) build_CONDITIONAL(RE_CompileArgs* args) { RE_CODE flags; BOOL forward; RE_Node* test_node; RE_CompileArgs subargs; int status; RE_Node* end_test_node; RE_Node* end_node; Py_ssize_t min_width; /* codes: opcode, flags, forward, sequence, next, sequence, next, sequence, * end. */ if (args->code + 4 > args->end_code) return RE_ERROR_ILLEGAL; flags = args->code[1]; forward = (BOOL)args->code[2]; /* Create a node for the lookaround. 
*/ test_node = create_node(args->pattern, RE_OP_CONDITIONAL, flags, 0, 0); if (!test_node) return RE_ERROR_MEMORY; args->code += 3; add_node(args->end, test_node); /* Compile the lookaround test and check that we've reached the end of the * subpattern. */ subargs = *args; subargs.forward = forward; status = build_sequence(&subargs); if (status != RE_ERROR_SUCCESS) return status; if (subargs.code[0] != RE_OP_NEXT) return RE_ERROR_ILLEGAL; args->code = subargs.code; ++args->code; /* Check the lookaround subpattern. */ args->has_captures |= subargs.has_captures; args->is_fuzzy |= subargs.is_fuzzy; args->has_groups |= subargs.has_groups; args->has_repeats |= subargs.has_repeats; if (subargs.has_groups) test_node->status |= RE_STATUS_HAS_GROUPS; if (subargs.has_repeats) test_node->status |= RE_STATUS_HAS_REPEATS; /* Create the node to terminate the test. */ end_test_node = create_node(args->pattern, RE_OP_END_CONDITIONAL, 0, 0, 0); if (!end_test_node) return RE_ERROR_MEMORY; /* test node -> test -> end test node */ add_node(test_node, subargs.start); add_node(subargs.end, end_test_node); /* Compile the true branch. */ subargs = *args; status = build_sequence(&subargs); if (status != RE_ERROR_SUCCESS) return status; /* Check the true branch. */ args->code = subargs.code; args->has_captures |= subargs.has_captures; args->is_fuzzy |= subargs.is_fuzzy; args->has_groups |= subargs.has_groups; args->has_repeats |= subargs.has_repeats; min_width = subargs.min_width; /* Create the terminating node. */ end_node = create_node(args->pattern, RE_OP_BRANCH, 0, 0, 0); if (!end_node) return RE_ERROR_MEMORY; /* end test node -> true branch -> end node */ add_node(end_test_node, subargs.start); add_node(subargs.end, end_node); if (args->code[0] == RE_OP_NEXT) { /* There's a false branch. */ ++args->code; /* Compile the false branch. */ subargs.code = args->code; status = build_sequence(&subargs); if (status != RE_ERROR_SUCCESS) return status; /* Check the false branch. 
*/ args->code = subargs.code; args->has_captures |= subargs.has_captures; args->is_fuzzy |= subargs.is_fuzzy; args->has_groups |= subargs.has_groups; args->has_repeats |= subargs.has_repeats; min_width = min_ssize_t(min_width, subargs.min_width); /* test node -> false branch -> end node */ add_node(test_node, subargs.start); add_node(subargs.end, end_node); } else /* end test node -> end node */ add_node(end_test_node, end_node); if (args->code[0] != RE_OP_END) return RE_ERROR_ILLEGAL; args->min_width += min_width; ++args->code; args->end = end_node; return RE_ERROR_SUCCESS; } /* Builds a GROUP node. */ Py_LOCAL_INLINE(int) build_GROUP(RE_CompileArgs* args) { RE_CODE private_group; RE_CODE public_group; RE_Node* start_node; RE_Node* end_node; RE_CompileArgs subargs; int status; /* codes: opcode, private_group, public_group. */ if (args->code + 2 > args->end_code) return RE_ERROR_ILLEGAL; private_group = args->code[1]; public_group = args->code[2]; args->code += 3; /* Create nodes for the start and end of the capture group. */ start_node = create_node(args->pattern, args->forward ? RE_OP_START_GROUP : RE_OP_END_GROUP, 0, 0, 3); end_node = create_node(args->pattern, args->forward ? RE_OP_END_GROUP : RE_OP_START_GROUP, 0, 0, 3); if (!start_node || !end_node) return RE_ERROR_MEMORY; start_node->values[0] = private_group; end_node->values[0] = private_group; start_node->values[1] = public_group; end_node->values[1] = public_group; /* Signal that the capture should be saved when it's complete. */ start_node->values[2] = 0; end_node->values[2] = 1; /* Record that we have a new capture group. */ if (!record_group(args->pattern, private_group, start_node)) return RE_ERROR_MEMORY; /* Compile the sequence and check that we've reached the end of the capture * group. 
     */
    subargs = *args;
    status = build_sequence(&subargs);
    if (status != RE_ERROR_SUCCESS)
        return status;

    if (subargs.code[0] != RE_OP_END)
        return RE_ERROR_ILLEGAL;

    args->code = subargs.code;
    args->min_width += subargs.min_width;
    args->has_captures |= subargs.has_captures | subargs.visible_captures;
    args->is_fuzzy |= subargs.is_fuzzy;
    args->has_groups |= TRUE;
    args->has_repeats |= subargs.has_repeats;

    ++args->code;

    /* Record that the capture group has closed. */
    record_group_end(args->pattern, private_group);

    /* Append the capture group. */
    add_node(args->end, start_node);
    add_node(start_node, subargs.start);
    add_node(subargs.end, end_node);
    args->end = end_node;

    return RE_ERROR_SUCCESS;
}

/* Builds a GROUP_CALL node. */
Py_LOCAL_INLINE(int) build_GROUP_CALL(RE_CompileArgs* args) {
    RE_CODE call_ref;
    RE_Node* node;

    /* codes: opcode, call_ref. */
    if (args->code + 1 > args->end_code)
        return RE_ERROR_ILLEGAL;

    call_ref = args->code[1];

    /* Create the node. */
    node = create_node(args->pattern, RE_OP_GROUP_CALL, 0, 0, 1);
    if (!node)
        return RE_ERROR_MEMORY;

    node->values[0] = call_ref;
    node->status |= RE_STATUS_HAS_GROUPS;
    node->status |= RE_STATUS_HAS_REPEATS;

    args->code += 2;

    /* Record that we used a call_ref. */
    if (!record_call_ref_used(args->pattern, call_ref))
        return RE_ERROR_MEMORY;

    /* Append the node. */
    add_node(args->end, node);
    args->end = node;

    return RE_ERROR_SUCCESS;
}

/* Builds a GROUP_EXISTS node. */
Py_LOCAL_INLINE(int) build_GROUP_EXISTS(RE_CompileArgs* args) {
    RE_CODE group;
    RE_Node* start_node;
    RE_Node* end_node;
    RE_CompileArgs subargs;
    int status;
    Py_ssize_t min_width;

    /* codes: opcode, group, sequence, next, sequence, end. */
    if (args->code + 2 > args->end_code)
        return RE_ERROR_ILLEGAL;

    group = args->code[1];

    args->code += 2;

    /* Record that we have a reference to a group. If group is 0, then we have
     * a DEFINE and not a true group.
*/ if (group > 0 && !record_ref_group(args->pattern, group)) return RE_ERROR_MEMORY; /* Create nodes for the start and end of the structure. */ start_node = create_node(args->pattern, RE_OP_GROUP_EXISTS, 0, 0, 1); end_node = create_node(args->pattern, RE_OP_BRANCH, 0, 0, 0); if (!start_node || !end_node) return RE_ERROR_MEMORY; start_node->values[0] = group; subargs = *args; status = build_sequence(&subargs); if (status != RE_ERROR_SUCCESS) return status; args->code = subargs.code; args->has_captures |= subargs.has_captures; args->is_fuzzy |= subargs.is_fuzzy; args->has_groups |= subargs.has_groups; args->has_repeats |= subargs.has_repeats; min_width = subargs.min_width; /* Append the start node. */ add_node(args->end, start_node); add_node(start_node, subargs.start); if (args->code[0] == RE_OP_NEXT) { RE_Node* true_branch_end; ++args->code; true_branch_end = subargs.end; subargs.code = args->code; status = build_sequence(&subargs); if (status != RE_ERROR_SUCCESS) return status; args->code = subargs.code; args->has_captures |= subargs.has_captures; args->is_fuzzy |= subargs.is_fuzzy; if (group == 0) { /* Join the 2 branches end-to-end and bypass it. The sequence * itself will never be matched as a whole, so it doesn't matter. */ min_width = 0; add_node(start_node, end_node); add_node(true_branch_end, subargs.start); } else { args->has_groups |= subargs.has_groups; args->has_repeats |= subargs.has_repeats; min_width = min_ssize_t(min_width, subargs.min_width); add_node(start_node, subargs.start); add_node(true_branch_end, end_node); } add_node(subargs.end, end_node); } else { add_node(start_node, end_node); add_node(subargs.end, end_node); min_width = 0; } args->min_width += min_width; if (args->code[0] != RE_OP_END) return RE_ERROR_ILLEGAL; ++args->code; args->end = end_node; return RE_ERROR_SUCCESS; } /* Builds a LOOKAROUND node. 
*/ Py_LOCAL_INLINE(int) build_LOOKAROUND(RE_CompileArgs* args) { RE_CODE flags; BOOL forward; RE_Node* lookaround_node; RE_CompileArgs subargs; int status; RE_Node* end_node; RE_Node* next_node; /* codes: opcode, flags, forward, sequence, end. */ if (args->code + 3 > args->end_code) return RE_ERROR_ILLEGAL; flags = args->code[1]; forward = (BOOL)args->code[2]; /* Create a node for the lookaround. */ lookaround_node = create_node(args->pattern, RE_OP_LOOKAROUND, flags, 0, 0); if (!lookaround_node) return RE_ERROR_MEMORY; args->code += 3; /* Compile the sequence and check that we've reached the end of the * subpattern. */ subargs = *args; subargs.forward = forward; status = build_sequence(&subargs); if (status != RE_ERROR_SUCCESS) return status; if (subargs.code[0] != RE_OP_END) return RE_ERROR_ILLEGAL; args->code = subargs.code; ++args->code; /* Check the subpattern. */ args->has_captures |= subargs.has_captures; args->is_fuzzy |= subargs.is_fuzzy; args->has_groups |= subargs.has_groups; args->has_repeats |= subargs.has_repeats; if (subargs.has_groups) lookaround_node->status |= RE_STATUS_HAS_GROUPS; if (subargs.has_repeats) lookaround_node->status |= RE_STATUS_HAS_REPEATS; /* Create the node to terminate the subpattern. */ end_node = create_node(args->pattern, RE_OP_END_LOOKAROUND, 0, 0, 0); if (!end_node) return RE_ERROR_MEMORY; /* Make a continuation node. */ next_node = create_node(args->pattern, RE_OP_BRANCH, 0, 0, 0); if (!next_node) return RE_ERROR_MEMORY; /* Append the new sequence. */ add_node(args->end, lookaround_node); add_node(lookaround_node, subargs.start); add_node(lookaround_node, next_node); add_node(subargs.end, end_node); add_node(end_node, next_node); args->end = next_node; return RE_ERROR_SUCCESS; } /* Builds a RANGE node. */ Py_LOCAL_INLINE(int) build_RANGE(RE_CompileArgs* args) { RE_UINT8 op; RE_CODE flags; Py_ssize_t step; RE_Node* node; /* codes: opcode, flags, lower, upper. 
*/ if (args->code + 3 > args->end_code) return RE_ERROR_ILLEGAL; op = (RE_UINT8)args->code[0]; flags = args->code[1]; step = get_step(op); if (flags & RE_ZEROWIDTH_OP) step = 0; /* Create the node. */ node = create_node(args->pattern, op, flags, step, 2); if (!node) return RE_ERROR_MEMORY; node->values[0] = args->code[2]; node->values[1] = args->code[3]; args->code += 4; /* Append the node. */ add_node(args->end, node); args->end = node; if (step != 0) ++args->min_width; return RE_ERROR_SUCCESS; } /* Builds a REF_GROUP node. */ Py_LOCAL_INLINE(int) build_REF_GROUP(RE_CompileArgs* args) { RE_CODE flags; RE_CODE group; RE_Node* node; /* codes: opcode, flags, group. */ if (args->code + 2 > args->end_code) return RE_ERROR_ILLEGAL; flags = args->code[1]; group = args->code[2]; node = create_node(args->pattern, (RE_UINT8)args->code[0], flags, 0, 1); if (!node) return RE_ERROR_MEMORY; node->values[0] = group; args->code += 3; /* Record that we have a reference to a group. */ if (!record_ref_group(args->pattern, group)) return RE_ERROR_MEMORY; /* Append the reference. */ add_node(args->end, node); args->end = node; return RE_ERROR_SUCCESS; } /* Builds a REPEAT node. */ Py_LOCAL_INLINE(int) build_REPEAT(RE_CompileArgs* args) { BOOL greedy; RE_CODE min_count; RE_CODE max_count; int status; /* codes: opcode, min_count, max_count, sequence, end. */ if (args->code + 3 > args->end_code) return RE_ERROR_ILLEGAL; greedy = args->code[0] == RE_OP_GREEDY_REPEAT; min_count = args->code[1]; max_count = args->code[2]; if (args->code[1] > args->code[2]) return RE_ERROR_ILLEGAL; args->code += 3; if (min_count == 1 && max_count == 1) { /* Singly-repeated sequence. 
*/ RE_CompileArgs subargs; subargs = *args; status = build_sequence(&subargs); if (status != RE_ERROR_SUCCESS) return status; if (subargs.code[0] != RE_OP_END) return RE_ERROR_ILLEGAL; args->code = subargs.code; args->min_width += subargs.min_width; args->has_captures |= subargs.has_captures; args->is_fuzzy |= subargs.is_fuzzy; args->has_groups |= subargs.has_groups; args->has_repeats |= subargs.has_repeats; ++args->code; /* Append the sequence. */ add_node(args->end, subargs.start); args->end = subargs.end; } else { size_t index; RE_Node* repeat_node; RE_CompileArgs subargs; index = args->pattern->repeat_count; /* Create the nodes for the repeat. */ repeat_node = create_node(args->pattern, greedy ? RE_OP_GREEDY_REPEAT : RE_OP_LAZY_REPEAT, 0, args->forward ? 1 : -1, 4); if (!repeat_node || !record_repeat(args->pattern, index, args->repeat_depth)) return RE_ERROR_MEMORY; repeat_node->values[0] = (RE_CODE)index; repeat_node->values[1] = min_count; repeat_node->values[2] = max_count; repeat_node->values[3] = args->forward; if (args->within_fuzzy) args->pattern->repeat_info[index].status |= RE_STATUS_BODY; /* Compile the 'body' and check that we've reached the end of it. */ subargs = *args; subargs.visible_captures = TRUE; ++subargs.repeat_depth; status = build_sequence(&subargs); if (status != RE_ERROR_SUCCESS) return status; if (subargs.code[0] != RE_OP_END) return RE_ERROR_ILLEGAL; args->code = subargs.code; args->min_width += (Py_ssize_t)min_count * subargs.min_width; args->has_captures |= subargs.has_captures; args->is_fuzzy |= subargs.is_fuzzy; args->has_groups |= subargs.has_groups; args->has_repeats = TRUE; ++args->code; /* Is it a repeat of something which will match a single character? * * If it's in a fuzzy section then it won't be optimised as a * single-character repeat. */ if (sequence_matches_one(subargs.start)) { repeat_node->op = greedy ? RE_OP_GREEDY_REPEAT_ONE : RE_OP_LAZY_REPEAT_ONE; /* Append the new sequence. 
*/ add_node(args->end, repeat_node); repeat_node->nonstring.next_2.node = subargs.start; args->end = repeat_node; } else { RE_Node* end_repeat_node; RE_Node* end_node; end_repeat_node = create_node(args->pattern, greedy ? RE_OP_END_GREEDY_REPEAT : RE_OP_END_LAZY_REPEAT, 0, args->forward ? 1 : -1, 4); if (!end_repeat_node) return RE_ERROR_MEMORY; end_repeat_node->values[0] = repeat_node->values[0]; end_repeat_node->values[1] = repeat_node->values[1]; end_repeat_node->values[2] = repeat_node->values[2]; end_repeat_node->values[3] = args->forward; end_node = create_node(args->pattern, RE_OP_BRANCH, 0, 0, 0); if (!end_node) return RE_ERROR_MEMORY; /* Append the new sequence. */ add_node(args->end, repeat_node); add_node(repeat_node, subargs.start); add_node(repeat_node, end_node); add_node(subargs.end, end_repeat_node); add_node(end_repeat_node, subargs.start); add_node(end_repeat_node, end_node); args->end = end_node; } } return RE_ERROR_SUCCESS; } /* Builds a STRING node. */ Py_LOCAL_INLINE(int) build_STRING(RE_CompileArgs* args, BOOL is_charset) { RE_CODE flags; RE_CODE length; RE_UINT8 op; Py_ssize_t step; RE_Node* node; size_t i; /* codes: opcode, flags, length, characters. */ flags = args->code[1]; length = args->code[2]; if (args->code + 3 + length > args->end_code) return RE_ERROR_ILLEGAL; op = (RE_UINT8)args->code[0]; step = get_step(op); /* Create the node. */ node = create_node(args->pattern, op, flags, step * (Py_ssize_t)length, length); if (!node) return RE_ERROR_MEMORY; if (!is_charset) node->status |= RE_STATUS_STRING; for (i = 0; i < length; i++) node->values[i] = args->code[3 + i]; args->code += 3 + length; /* Append the node. */ add_node(args->end, node); args->end = node; /* Because of full case-folding, one character in the text could match * multiple characters in the pattern. 
*/ if (op == RE_OP_STRING_FLD || op == RE_OP_STRING_FLD_REV) args->min_width += possible_unfolded_length((Py_ssize_t)length); else args->min_width += (Py_ssize_t)length; return RE_ERROR_SUCCESS; } /* Builds a SET node. */ Py_LOCAL_INLINE(int) build_SET(RE_CompileArgs* args) { RE_UINT8 op; RE_CODE flags; Py_ssize_t step; RE_Node* node; Py_ssize_t min_width; int status; /* codes: opcode, flags, members. */ op = (RE_UINT8)args->code[0]; flags = args->code[1]; step = get_step(op); if (flags & RE_ZEROWIDTH_OP) step = 0; node = create_node(args->pattern, op, flags, step, 0); if (!node) return RE_ERROR_MEMORY; args->code += 2; /* Append the node. */ add_node(args->end, node); args->end = node; min_width = args->min_width; /* Compile the character set. */ do { switch (args->code[0]) { case RE_OP_CHARACTER: case RE_OP_PROPERTY: status = build_CHARACTER_or_PROPERTY(args); if (status != RE_ERROR_SUCCESS) return status; break; case RE_OP_RANGE: status = build_RANGE(args); if (status != RE_ERROR_SUCCESS) return status; break; case RE_OP_SET_DIFF: case RE_OP_SET_INTER: case RE_OP_SET_SYM_DIFF: case RE_OP_SET_UNION: status = build_SET(args); if (status != RE_ERROR_SUCCESS) return status; break; case RE_OP_STRING: /* A set of characters. build_STRING returns a status code, not a * BOOL, so propagate it like the other builders. */ status = build_STRING(args, TRUE); if (status != RE_ERROR_SUCCESS) return status; break; default: /* Illegal opcode for a character set. */ return RE_ERROR_ILLEGAL; } } while (args->code < args->end_code && args->code[0] != RE_OP_END); /* Check that we've reached the end correctly. (The last opcode should be * 'END'.) */ if (args->code >= args->end_code || args->code[0] != RE_OP_END) return RE_ERROR_ILLEGAL; ++args->code; /* At this point the set's members are in the main sequence. They need to * be moved out-of-line. */ node->nonstring.next_2.node = node->next_1.node; node->next_1.node = NULL; args->end = node; args->min_width = min_width; if (step != 0) ++args->min_width; return RE_ERROR_SUCCESS; } /* Builds a STRING_SET node.
*/ Py_LOCAL_INLINE(int) build_STRING_SET(RE_CompileArgs* args) { RE_CODE index; RE_CODE min_len; RE_CODE max_len; RE_Node* node; /* codes: opcode, index, min_len, max_len. */ if (args->code + 3 > args->end_code) return RE_ERROR_ILLEGAL; index = args->code[1]; min_len = args->code[2]; max_len = args->code[3]; node = create_node(args->pattern, (RE_UINT8)args->code[0], 0, 0, 3); if (!node) return RE_ERROR_MEMORY; node->values[0] = index; node->values[1] = min_len; node->values[2] = max_len; args->code += 4; /* Append the reference. */ add_node(args->end, node); args->end = node; return RE_ERROR_SUCCESS; } /* Builds a SUCCESS node. */ Py_LOCAL_INLINE(int) build_SUCCESS(RE_CompileArgs* args) { RE_Node* node; /* code: opcode. */ /* Create the node. */ node = create_node(args->pattern, (RE_UINT8)args->code[0], 0, 0, 0); if (!node) return RE_ERROR_MEMORY; ++args->code; /* Append the node. */ add_node(args->end, node); args->end = node; return RE_ERROR_SUCCESS; } /* Builds a zero-width node. */ Py_LOCAL_INLINE(int) build_zerowidth(RE_CompileArgs* args) { RE_CODE flags; RE_Node* node; /* codes: opcode, flags. */ if (args->code + 1 > args->end_code) return RE_ERROR_ILLEGAL; flags = args->code[1]; /* Create the node. */ node = create_node(args->pattern, (RE_UINT8)args->code[0], flags, 0, 0); if (!node) return RE_ERROR_MEMORY; args->code += 2; /* Append the node. */ add_node(args->end, node); args->end = node; return RE_ERROR_SUCCESS; } /* Builds a sequence of nodes from regular expression code. */ Py_LOCAL_INLINE(int) build_sequence(RE_CompileArgs* args) { int status; /* Guarantee that there's something to attach to. */ args->start = create_node(args->pattern, RE_OP_BRANCH, 0, 0, 0); if (!args->start) return RE_ERROR_MEMORY; args->end = args->start; args->min_width = 0; args->has_captures = FALSE; args->is_fuzzy = FALSE; args->has_groups = FALSE; args->has_repeats = FALSE; /* The sequence should end with an opcode we don't understand. If it * doesn't then the code is illegal.
*/ while (args->code < args->end_code) { /* The following code groups opcodes by format, not function. */ switch (args->code[0]) { case RE_OP_ANY: case RE_OP_ANY_ALL: case RE_OP_ANY_ALL_REV: case RE_OP_ANY_REV: case RE_OP_ANY_U: case RE_OP_ANY_U_REV: /* A simple opcode with no trailing codewords and width of 1. */ status = build_ANY(args); if (status != RE_ERROR_SUCCESS) return status; break; case RE_OP_ATOMIC: /* An atomic sequence. */ status = build_ATOMIC(args); if (status != RE_ERROR_SUCCESS) return status; break; case RE_OP_BOUNDARY: case RE_OP_DEFAULT_BOUNDARY: case RE_OP_DEFAULT_END_OF_WORD: case RE_OP_DEFAULT_START_OF_WORD: case RE_OP_END_OF_WORD: case RE_OP_GRAPHEME_BOUNDARY: case RE_OP_KEEP: case RE_OP_SKIP: case RE_OP_START_OF_WORD: /* A word or grapheme boundary. */ status = build_BOUNDARY(args); if (status != RE_ERROR_SUCCESS) return status; break; case RE_OP_BRANCH: /* A 2-way branch. */ status = build_BRANCH(args); if (status != RE_ERROR_SUCCESS) return status; break; case RE_OP_CALL_REF: /* A group call ref. */ status = build_CALL_REF(args); if (status != RE_ERROR_SUCCESS) return status; break; case RE_OP_CHARACTER: case RE_OP_CHARACTER_IGN: case RE_OP_CHARACTER_IGN_REV: case RE_OP_CHARACTER_REV: case RE_OP_PROPERTY: case RE_OP_PROPERTY_IGN: case RE_OP_PROPERTY_IGN_REV: case RE_OP_PROPERTY_REV: /* A character literal or a property. */ status = build_CHARACTER_or_PROPERTY(args); if (status != RE_ERROR_SUCCESS) return status; break; case RE_OP_CONDITIONAL: /* A lookaround conditional. */ status = build_CONDITIONAL(args); if (status != RE_ERROR_SUCCESS) return status; break; case RE_OP_END_OF_LINE: case RE_OP_END_OF_LINE_U: case RE_OP_END_OF_STRING: case RE_OP_END_OF_STRING_LINE: case RE_OP_END_OF_STRING_LINE_U: case RE_OP_SEARCH_ANCHOR: case RE_OP_START_OF_LINE: case RE_OP_START_OF_LINE_U: case RE_OP_START_OF_STRING: /* A simple opcode with no trailing codewords and width of 0. 
*/ status = build_zerowidth(args); if (status != RE_ERROR_SUCCESS) return status; break; case RE_OP_FAILURE: case RE_OP_PRUNE: case RE_OP_SUCCESS: status = build_SUCCESS(args); if (status != RE_ERROR_SUCCESS) return status; break; case RE_OP_FUZZY: /* A fuzzy sequence. */ status = build_FUZZY(args); if (status != RE_ERROR_SUCCESS) return status; break; case RE_OP_GREEDY_REPEAT: case RE_OP_LAZY_REPEAT: /* A repeated sequence. */ status = build_REPEAT(args); if (status != RE_ERROR_SUCCESS) return status; break; case RE_OP_GROUP: /* A capture group. */ status = build_GROUP(args); if (status != RE_ERROR_SUCCESS) return status; break; case RE_OP_GROUP_CALL: /* A group call. */ status = build_GROUP_CALL(args); if (status != RE_ERROR_SUCCESS) return status; break; case RE_OP_GROUP_EXISTS: /* A conditional sequence. */ status = build_GROUP_EXISTS(args); if (status != RE_ERROR_SUCCESS) return status; break; case RE_OP_LOOKAROUND: /* A lookaround. */ status = build_LOOKAROUND(args); if (status != RE_ERROR_SUCCESS) return status; break; case RE_OP_RANGE: case RE_OP_RANGE_IGN: case RE_OP_RANGE_IGN_REV: case RE_OP_RANGE_REV: /* A range. */ status = build_RANGE(args); if (status != RE_ERROR_SUCCESS) return status; break; case RE_OP_REF_GROUP: case RE_OP_REF_GROUP_FLD: case RE_OP_REF_GROUP_FLD_REV: case RE_OP_REF_GROUP_IGN: case RE_OP_REF_GROUP_IGN_REV: case RE_OP_REF_GROUP_REV: /* A reference to a group. */ status = build_REF_GROUP(args); if (status != RE_ERROR_SUCCESS) return status; break; case RE_OP_SET_DIFF: case RE_OP_SET_DIFF_IGN: case RE_OP_SET_DIFF_IGN_REV: case RE_OP_SET_DIFF_REV: case RE_OP_SET_INTER: case RE_OP_SET_INTER_IGN: case RE_OP_SET_INTER_IGN_REV: case RE_OP_SET_INTER_REV: case RE_OP_SET_SYM_DIFF: case RE_OP_SET_SYM_DIFF_IGN: case RE_OP_SET_SYM_DIFF_IGN_REV: case RE_OP_SET_SYM_DIFF_REV: case RE_OP_SET_UNION: case RE_OP_SET_UNION_IGN: case RE_OP_SET_UNION_IGN_REV: case RE_OP_SET_UNION_REV: /* A set. 
*/ status = build_SET(args); if (status != RE_ERROR_SUCCESS) return status; break; case RE_OP_STRING: case RE_OP_STRING_FLD: case RE_OP_STRING_FLD_REV: case RE_OP_STRING_IGN: case RE_OP_STRING_IGN_REV: case RE_OP_STRING_REV: /* A string literal. build_STRING returns a status code, not a BOOL, * so propagate it like the other builders. */ status = build_STRING(args, FALSE); if (status != RE_ERROR_SUCCESS) return status; break; case RE_OP_STRING_SET: case RE_OP_STRING_SET_FLD: case RE_OP_STRING_SET_FLD_REV: case RE_OP_STRING_SET_IGN: case RE_OP_STRING_SET_IGN_REV: case RE_OP_STRING_SET_REV: /* A reference to a list. */ status = build_STRING_SET(args); if (status != RE_ERROR_SUCCESS) return status; break; default: /* We've found an opcode which we don't recognise. We'll leave it * for the caller. */ return RE_ERROR_SUCCESS; } } /* If we're here then we should be at the end of the code, otherwise we * have an error. */ return args->code == args->end_code ? RE_ERROR_SUCCESS : RE_ERROR_ILLEGAL; } /* Compiles the regular expression code to 'nodes'. * * Various details about the regular expression are discovered during * compilation and stored in the PatternObject. */ Py_LOCAL_INLINE(BOOL) compile_to_nodes(RE_CODE* code, RE_CODE* end_code, PatternObject* pattern) { RE_CompileArgs args; int status; /* Compile a regex sequence and then check that we've reached the end * correctly. (The last opcode should be 'SUCCESS'.) * * If successful, 'start' and 'end' will point to the start and end nodes * of the compiled sequence. */ args.code = code; args.end_code = end_code; args.pattern = pattern; args.forward = (pattern->flags & RE_FLAG_REVERSE) == 0; args.visible_captures = FALSE; args.has_captures = FALSE; args.repeat_depth = 0; args.is_fuzzy = FALSE; args.within_fuzzy = FALSE; status = build_sequence(&args); if (status == RE_ERROR_ILLEGAL) set_error(RE_ERROR_ILLEGAL, NULL); if (status != RE_ERROR_SUCCESS) return FALSE; pattern->min_width = args.min_width; pattern->is_fuzzy = args.is_fuzzy; pattern->do_search_start = TRUE; pattern->start_node = args.start; /* Optimise the pattern.
*/ if (!optimise_pattern(pattern)) return FALSE; pattern->start_test = locate_test_start(pattern->start_node); /* Get the call_ref for the entire pattern, if any. */ if (pattern->start_node->op == RE_OP_CALL_REF) pattern->pattern_call_ref = (Py_ssize_t)pattern->start_node->values[0]; else pattern->pattern_call_ref = -1; return TRUE; } /* Gets the required characters for a regex. * * In the event of an error, it just pretends that there are no required * characters. */ Py_LOCAL_INLINE(void) get_required_chars(PyObject* required_chars, RE_CODE** req_chars, size_t* req_length) { Py_ssize_t len; RE_CODE* chars; Py_ssize_t i; *req_chars = NULL; *req_length = 0; len = PyTuple_GET_SIZE(required_chars); if (len < 1 || PyErr_Occurred()) { PyErr_Clear(); return; } chars = (RE_CODE*)re_alloc((size_t)len * sizeof(RE_CODE)); if (!chars) goto error; for (i = 0; i < len; i++) { PyObject* o; size_t value; /* PyTuple_GET_ITEM borrows a reference. */ o = PyTuple_GET_ITEM(required_chars, i); value = PyLong_AsUnsignedLong(o); if ((Py_ssize_t)value == -1 && PyErr_Occurred()) goto error; chars[i] = (RE_CODE)value; if (chars[i] != value) goto error; } *req_chars = chars; *req_length = (size_t)len; return; error: PyErr_Clear(); re_dealloc(chars); } /* Makes a STRING node. */ Py_LOCAL_INLINE(RE_Node*) make_STRING_node(PatternObject* pattern, RE_UINT8 op, size_t length, RE_CODE* chars) { Py_ssize_t step; RE_Node* node; size_t i; step = get_step(op); /* Create the node. */ node = create_node(pattern, op, 0, step * (Py_ssize_t)length, length); if (!node) return NULL; node->status |= RE_STATUS_STRING; for (i = 0; i < length; i++) node->values[i] = chars[i]; return node; } /* Scans all of the characters in the current locale for their properties.
*/ Py_LOCAL_INLINE(void) scan_locale_chars(RE_LocaleInfo* locale_info) { int c; for (c = 0; c < 0x100; c++) { unsigned short props = 0; if (isalnum(c)) props |= RE_LOCALE_ALNUM; if (isalpha(c)) props |= RE_LOCALE_ALPHA; if (iscntrl(c)) props |= RE_LOCALE_CNTRL; if (isdigit(c)) props |= RE_LOCALE_DIGIT; if (isgraph(c)) props |= RE_LOCALE_GRAPH; if (islower(c)) props |= RE_LOCALE_LOWER; if (isprint(c)) props |= RE_LOCALE_PRINT; if (ispunct(c)) props |= RE_LOCALE_PUNCT; if (isspace(c)) props |= RE_LOCALE_SPACE; if (isupper(c)) props |= RE_LOCALE_UPPER; locale_info->properties[c] = props; locale_info->uppercase[c] = (unsigned char)toupper(c); locale_info->lowercase[c] = (unsigned char)tolower(c); } } /* Compiles regular expression code to a PatternObject. * * The regular expression code is provided as a list and is then compiled to * 'nodes'. Various details about the regular expression are discovered during * compilation and stored in the PatternObject. */ static PyObject* re_compile(PyObject* self_, PyObject* args) { PyObject* pattern; Py_ssize_t flags = 0; PyObject* code_list; PyObject* groupindex; PyObject* indexgroup; PyObject* named_lists; PyObject* named_list_indexes; Py_ssize_t req_offset; PyObject* required_chars; Py_ssize_t req_flags; size_t public_group_count; Py_ssize_t code_len; RE_CODE* code; Py_ssize_t i; RE_CODE* req_chars; size_t req_length; PatternObject* self; BOOL unicode; BOOL locale; BOOL ascii; BOOL ok; if (!PyArg_ParseTuple(args, "OnOOOOOnOnn:re_compile", &pattern, &flags, &code_list, &groupindex, &indexgroup, &named_lists, &named_list_indexes, &req_offset, &required_chars, &req_flags, &public_group_count)) return NULL; /* Read the regex code. */ code_len = PyList_GET_SIZE(code_list); code = (RE_CODE*)re_alloc((size_t)code_len * sizeof(RE_CODE)); if (!code) return NULL; for (i = 0; i < code_len; i++) { PyObject* o; size_t value; /* PyList_GET_ITEM borrows a reference. 
*/ o = PyList_GET_ITEM(code_list, i); value = PyLong_AsUnsignedLong(o); if ((Py_ssize_t)value == -1 && PyErr_Occurred()) goto error; code[i] = (RE_CODE)value; if (code[i] != value) goto error; } /* Get the required characters. */ get_required_chars(required_chars, &req_chars, &req_length); /* Create the PatternObject. */ self = PyObject_NEW(PatternObject, &Pattern_Type); if (!self) { set_error(RE_ERROR_MEMORY, NULL); re_dealloc(req_chars); re_dealloc(code); return NULL; } /* Initialise the PatternObject. */ self->pattern = pattern; self->flags = flags; self->weakreflist = NULL; self->start_node = NULL; self->repeat_count = 0; self->true_group_count = 0; self->public_group_count = public_group_count; self->group_end_index = 0; self->groupindex = groupindex; self->indexgroup = indexgroup; self->named_lists = named_lists; self->named_lists_count = (size_t)PyDict_Size(named_lists); self->partial_named_lists[0] = NULL; self->partial_named_lists[1] = NULL; self->named_list_indexes = named_list_indexes; self->node_capacity = 0; self->node_count = 0; self->node_list = NULL; self->group_info_capacity = 0; self->group_info = NULL; self->call_ref_info_capacity = 0; self->call_ref_info_count = 0; self->call_ref_info = NULL; self->repeat_info_capacity = 0; self->repeat_info = NULL; self->groups_storage = NULL; self->repeats_storage = NULL; self->fuzzy_count = 0; self->recursive = FALSE; self->req_offset = req_offset; self->req_string = NULL; self->locale_info = NULL; Py_INCREF(self->pattern); Py_INCREF(self->groupindex); Py_INCREF(self->indexgroup); Py_INCREF(self->named_lists); Py_INCREF(self->named_list_indexes); /* Initialise the character encoding. 
*/ unicode = (flags & RE_FLAG_UNICODE) != 0; locale = (flags & RE_FLAG_LOCALE) != 0; ascii = (flags & RE_FLAG_ASCII) != 0; if (!unicode && !locale && !ascii) { if (PyString_Check(self->pattern)) ascii = RE_FLAG_ASCII; else unicode = RE_FLAG_UNICODE; } if (unicode) self->encoding = &unicode_encoding; else if (locale) self->encoding = &locale_encoding; else if (ascii) self->encoding = &ascii_encoding; /* Compile the regular expression code to nodes. */ ok = compile_to_nodes(code, code + code_len, self); /* We no longer need the regular expression code. */ re_dealloc(code); if (!ok) { Py_DECREF(self); re_dealloc(req_chars); return NULL; } /* Make a node for the required string, if there's one. */ if (req_chars) { /* Remove the FULLCASE flag if it's not a Unicode pattern or not * ignoring case. */ if (!(self->flags & RE_FLAG_UNICODE) || !(self->flags & RE_FLAG_IGNORECASE)) req_flags &= ~RE_FLAG_FULLCASE; if (self->flags & RE_FLAG_REVERSE) { switch (req_flags) { case 0: self->req_string = make_STRING_node(self, RE_OP_STRING_REV, req_length, req_chars); break; case RE_FLAG_IGNORECASE | RE_FLAG_FULLCASE: self->req_string = make_STRING_node(self, RE_OP_STRING_FLD_REV, req_length, req_chars); break; case RE_FLAG_IGNORECASE: self->req_string = make_STRING_node(self, RE_OP_STRING_IGN_REV, req_length, req_chars); break; } } else { switch (req_flags) { case 0: self->req_string = make_STRING_node(self, RE_OP_STRING, req_length, req_chars); break; case RE_FLAG_IGNORECASE | RE_FLAG_FULLCASE: self->req_string = make_STRING_node(self, RE_OP_STRING_FLD, req_length, req_chars); break; case RE_FLAG_IGNORECASE: self->req_string = make_STRING_node(self, RE_OP_STRING_IGN, req_length, req_chars); break; } } re_dealloc(req_chars); } if (locale) { /* Store info about the characters in the locale for locale-sensitive * matching. 
*/ self->locale_info = re_alloc(sizeof(RE_LocaleInfo)); if (!self->locale_info) { Py_DECREF(self); return NULL; } scan_locale_chars(self->locale_info); } return (PyObject*)self; error: re_dealloc(code); set_error(RE_ERROR_ILLEGAL, NULL); return NULL; } /* Gets the size of the codewords. */ static PyObject* get_code_size(PyObject* self, PyObject* unused) { return Py_BuildValue("n", sizeof(RE_CODE)); } /* Gets the property dict. */ static PyObject* get_properties(PyObject* self_, PyObject* args) { Py_INCREF(property_dict); return property_dict; } /* Folds the case of a string. */ static PyObject* fold_case(PyObject* self_, PyObject* args) { RE_StringInfo str_info; Py_UCS4 (*char_at)(void* text, Py_ssize_t pos); RE_EncodingTable* encoding; RE_LocaleInfo locale_info; Py_ssize_t folded_charsize; void (*set_char_at)(void* text, Py_ssize_t pos, Py_UCS4 ch); Py_ssize_t buf_size; void* folded; Py_ssize_t folded_len; PyObject* result; Py_ssize_t flags; PyObject* string; if (!PyArg_ParseTuple(args, "nO:fold_case", &flags, &string)) return NULL; if (!(flags & RE_FLAG_IGNORECASE)) { Py_INCREF(string); return string; } /* Get the string. */ if (!get_string(string, &str_info)) return NULL; /* Get the function for reading from the original string. */ switch (str_info.charsize) { case 1: char_at = bytes1_char_at; break; case 2: char_at = bytes2_char_at; break; case 4: char_at = bytes4_char_at; break; default: #if PY_VERSION_HEX >= 0x02060000 release_buffer(&str_info); #endif return NULL; } /* What's the encoding? */ if (flags & RE_FLAG_UNICODE) encoding = &unicode_encoding; else if (flags & RE_FLAG_LOCALE) { encoding = &locale_encoding; scan_locale_chars(&locale_info); } else if (flags & RE_FLAG_ASCII) encoding = &ascii_encoding; else encoding = &unicode_encoding; /* The folded string will have the same width as the original string. */ folded_charsize = str_info.charsize; /* Get the function for writing to the folded string. 
*/ switch (folded_charsize) { case 1: set_char_at = bytes1_set_char_at; break; case 2: set_char_at = bytes2_set_char_at; break; case 4: set_char_at = bytes4_set_char_at; break; default: #if PY_VERSION_HEX >= 0x02060000 release_buffer(&str_info); #endif return NULL; } /* Allocate a buffer for the folded string. */ if (flags & RE_FLAG_FULLCASE) /* When using full case-folding with Unicode, some single codepoints * are mapped to multiple codepoints. */ buf_size = str_info.length * RE_MAX_FOLDED; else buf_size = str_info.length; folded = re_alloc((size_t)(buf_size * folded_charsize)); if (!folded) { #if PY_VERSION_HEX >= 0x02060000 release_buffer(&str_info); #endif return NULL; } /* Fold the case of the string. */ folded_len = 0; if (flags & RE_FLAG_FULLCASE) { /* Full case-folding. */ int (*full_case_fold)(RE_LocaleInfo* locale_info, Py_UCS4 ch, Py_UCS4* folded); Py_ssize_t i; Py_UCS4 codepoints[RE_MAX_FOLDED]; full_case_fold = encoding->full_case_fold; for (i = 0; i < str_info.length; i++) { int count; int j; count = full_case_fold(&locale_info, char_at(str_info.characters, i), codepoints); for (j = 0; j < count; j++) set_char_at(folded, folded_len + j, codepoints[j]); folded_len += count; } } else { /* Simple case-folding. */ Py_UCS4 (*simple_case_fold)(RE_LocaleInfo* locale_info, Py_UCS4 ch); Py_ssize_t i; simple_case_fold = encoding->simple_case_fold; for (i = 0; i < str_info.length; i++) { Py_UCS4 ch; ch = simple_case_fold(&locale_info, char_at(str_info.characters, i)); set_char_at(folded, i, ch); } folded_len = str_info.length; } /* Build the result string. */ if (str_info.is_unicode) result = build_unicode_value(folded, folded_len, folded_charsize); else result = build_bytes_value(folded, folded_len, folded_charsize); re_dealloc(folded); #if PY_VERSION_HEX >= 0x02060000 /* Release the original string's buffer. */ release_buffer(&str_info); #endif return result; } /* Returns a tuple of the Unicode characters that expand on full case-folding. 
*/ static PyObject* get_expand_on_folding(PyObject* self, PyObject* unused) { int count; PyObject* result; int i; /* How many characters are there? */ count = sizeof(re_expand_on_folding) / sizeof(re_expand_on_folding[0]); /* Put all the characters in a tuple. */ result = PyTuple_New(count); if (!result) return NULL; for (i = 0; i < count; i++) { Py_UNICODE codepoint; PyObject* item; codepoint = re_expand_on_folding[i]; item = build_unicode_value(&codepoint, 1, sizeof(codepoint)); if (!item) goto error; /* PyTuple_SetItem steals the reference. */ PyTuple_SetItem(result, i, item); } return result; error: Py_DECREF(result); return NULL; } /* Returns whether a character has a given value for a Unicode property. */ static PyObject* has_property_value(PyObject* self_, PyObject* args) { BOOL v; Py_ssize_t property_value; Py_ssize_t character; if (!PyArg_ParseTuple(args, "nn:has_property_value", &property_value, &character)) return NULL; v = unicode_has_property((RE_CODE)property_value, (Py_UCS4)character) ? 1 : 0; return Py_BuildValue("n", v); } /* Returns a list of all the simple cases of a character. * * If full case-folding is turned on and the character also expands on full * case-folding, a None is appended to the list. */ static PyObject* get_all_cases(PyObject* self_, PyObject* args) { RE_EncodingTable* encoding; RE_LocaleInfo locale_info; int count; Py_UCS4 cases[RE_MAX_CASES]; PyObject* result; int i; Py_UCS4 folded[RE_MAX_FOLDED]; Py_ssize_t flags; Py_ssize_t character; if (!PyArg_ParseTuple(args, "nn:get_all_cases", &flags, &character)) return NULL; /* What's the encoding? */ if (flags & RE_FLAG_UNICODE) encoding = &unicode_encoding; else if (flags & RE_FLAG_LOCALE) { encoding = &locale_encoding; scan_locale_chars(&locale_info); } else if (flags & RE_FLAG_ASCII) encoding = &ascii_encoding; else encoding = &ascii_encoding; /* Get all the simple cases.
*/ count = encoding->all_cases(&locale_info, (Py_UCS4)character, cases); result = PyList_New(count); if (!result) return NULL; for (i = 0; i < count; i++) { PyObject* item; item = Py_BuildValue("n", cases[i]); if (!item) goto error; /* PyList_SetItem steals the reference. */ PyList_SetItem(result, i, item); } /* If the character also expands on full case-folding, append a None. */ if ((flags & RE_FULL_CASE_FOLDING) == RE_FULL_CASE_FOLDING) { count = encoding->full_case_fold(&locale_info, (Py_UCS4)character, folded); if (count > 1 && PyList_Append(result, Py_None) < 0) goto error; } return result; error: Py_DECREF(result); return NULL; } /* The table of the module's functions. */ static PyMethodDef _functions[] = { {"compile", (PyCFunction)re_compile, METH_VARARGS}, {"get_code_size", (PyCFunction)get_code_size, METH_NOARGS}, {"get_properties", (PyCFunction)get_properties, METH_VARARGS}, {"fold_case", (PyCFunction)fold_case, METH_VARARGS}, {"get_expand_on_folding", (PyCFunction)get_expand_on_folding, METH_NOARGS}, {"has_property_value", (PyCFunction)has_property_value, METH_VARARGS}, {"get_all_cases", (PyCFunction)get_all_cases, METH_VARARGS}, {NULL, NULL} }; /* Initialises the property dictionary. */ Py_LOCAL_INLINE(BOOL) init_property_dict(void) { size_t value_set_count; size_t i; PyObject** value_dicts; property_dict = NULL; /* How many value sets are there? */ value_set_count = 0; for (i = 0; i < sizeof(re_property_values) / sizeof(re_property_values[0]); i++) { RE_PropertyValue* value; value = &re_property_values[i]; if (value->value_set >= value_set_count) value_set_count = (size_t)value->value_set + 1; } /* Quick references for the value sets. */ value_dicts = (PyObject**)re_alloc(value_set_count * sizeof(value_dicts[0])); if (!value_dicts) return FALSE; memset(value_dicts, 0, value_set_count * sizeof(value_dicts[0])); /* Build the property values dictionaries.
*/ for (i = 0; i < sizeof(re_property_values) / sizeof(re_property_values[0]); i++) { RE_PropertyValue* value; PyObject* v; int status; value = &re_property_values[i]; if (!value_dicts[value->value_set]) { value_dicts[value->value_set] = PyDict_New(); if (!value_dicts[value->value_set]) goto error; } v = Py_BuildValue("i", value->id); if (!v) goto error; status = PyDict_SetItemString(value_dicts[value->value_set], re_strings[value->name], v); Py_DECREF(v); if (status < 0) goto error; } /* Build the property dictionary. */ property_dict = PyDict_New(); if (!property_dict) goto error; for (i = 0; i < sizeof(re_properties) / sizeof(re_properties[0]); i++) { RE_Property* property; PyObject* v; int status; property = &re_properties[i]; v = Py_BuildValue("iO", property->id, value_dicts[property->value_set]); if (!v) goto error; status = PyDict_SetItemString(property_dict, re_strings[property->name], v); Py_DECREF(v); if (status < 0) goto error; } /* DECREF the value sets. Any unused ones will be deallocated. */ for (i = 0; i < value_set_count; i++) Py_XDECREF(value_dicts[i]); re_dealloc(value_dicts); return TRUE; error: Py_XDECREF(property_dict); /* DECREF the value sets. */ for (i = 0; i < value_set_count; i++) Py_XDECREF(value_dicts[i]); re_dealloc(value_dicts); return FALSE; } /* Initialises the module. */ PyMODINIT_FUNC init_regex(void) { PyObject* m; PyObject* d; PyObject* x; #if defined(VERBOSE) /* Unbuffered in case it crashes! */ setvbuf(stdout, NULL, _IONBF, 0); #endif /* Initialise Pattern_Type. */ Pattern_Type.tp_dealloc = pattern_dealloc; Pattern_Type.tp_repr = pattern_repr; Pattern_Type.tp_flags = Py_TPFLAGS_HAVE_WEAKREFS; Pattern_Type.tp_doc = pattern_doc; Pattern_Type.tp_weaklistoffset = offsetof(PatternObject, weakreflist); Pattern_Type.tp_methods = pattern_methods; Pattern_Type.tp_members = pattern_members; Pattern_Type.tp_getset = pattern_getset; /* Initialise Match_Type. 
*/ Match_Type.tp_dealloc = match_dealloc; Match_Type.tp_repr = match_repr; Match_Type.tp_as_mapping = &match_as_mapping; Match_Type.tp_flags = Py_TPFLAGS_DEFAULT; Match_Type.tp_doc = match_doc; Match_Type.tp_methods = match_methods; Match_Type.tp_members = match_members; Match_Type.tp_getset = match_getset; /* Initialise Scanner_Type. */ Scanner_Type.tp_dealloc = scanner_dealloc; Scanner_Type.tp_flags = Py_TPFLAGS_DEFAULT; Scanner_Type.tp_doc = scanner_doc; Scanner_Type.tp_iter = scanner_iter; Scanner_Type.tp_iternext = scanner_iternext; Scanner_Type.tp_methods = scanner_methods; Scanner_Type.tp_members = scanner_members; /* Initialise Splitter_Type. */ Splitter_Type.tp_dealloc = splitter_dealloc; Splitter_Type.tp_flags = Py_TPFLAGS_DEFAULT; Splitter_Type.tp_doc = splitter_doc; Splitter_Type.tp_iter = splitter_iter; Splitter_Type.tp_iternext = splitter_iternext; Splitter_Type.tp_methods = splitter_methods; Splitter_Type.tp_members = splitter_members; #if PY_VERSION_HEX >= 0x02060000 /* Initialise Capture_Type. */ Capture_Type.tp_dealloc = capture_dealloc; Capture_Type.tp_str = capture_str; Capture_Type.tp_as_mapping = &capture_as_mapping; Capture_Type.tp_flags = Py_TPFLAGS_DEFAULT; Capture_Type.tp_methods = capture_methods; #endif /* Initialize object types */ if (PyType_Ready(&Pattern_Type) < 0) return; if (PyType_Ready(&Match_Type) < 0) return; if (PyType_Ready(&Scanner_Type) < 0) return; if (PyType_Ready(&Splitter_Type) < 0) return; #if PY_VERSION_HEX >= 0x02060000 if (PyType_Ready(&Capture_Type) < 0) return; #endif error_exception = NULL; m = Py_InitModule("_" RE_MODULE, _functions); if (!m) return; d = PyModule_GetDict(m); x = PyInt_FromLong(RE_MAGIC); if (x) { PyDict_SetItemString(d, "MAGIC", x); Py_DECREF(x); } x = PyInt_FromLong(sizeof(RE_CODE)); if (x) { PyDict_SetItemString(d, "CODE_SIZE", x); Py_DECREF(x); } x = PyString_FromString(copyright); if (x) { PyDict_SetItemString(d, "copyright", x); Py_DECREF(x); } /* Initialise the property dictionary. 
*/ if (!init_property_dict()) return; } /* vim:ts=4:sw=4:et */ regex-2016.01.10/Python2/_regex.h /* * Secret Labs' Regular Expression Engine * * regular expression matching engine * * Copyright (c) 1997-2001 by Secret Labs AB. All rights reserved. * * NOTE: This file is generated by regex.py. If you need * to change anything in here, edit regex.py and run it. * * 2010-01-16 mrab Re-written */ /* Supports Unicode version 8.0.0. */ #define RE_MAGIC 20100116 #include "_regex_unicode.h" /* Operators. */ #define RE_OP_FAILURE 0 #define RE_OP_SUCCESS 1 #define RE_OP_ANY 2 #define RE_OP_ANY_ALL 3 #define RE_OP_ANY_ALL_REV 4 #define RE_OP_ANY_REV 5 #define RE_OP_ANY_U 6 #define RE_OP_ANY_U_REV 7 #define RE_OP_ATOMIC 8 #define RE_OP_BOUNDARY 9 #define RE_OP_BRANCH 10 #define RE_OP_CALL_REF 11 #define RE_OP_CHARACTER 12 #define RE_OP_CHARACTER_IGN 13 #define RE_OP_CHARACTER_IGN_REV 14 #define RE_OP_CHARACTER_REV 15 #define RE_OP_CONDITIONAL 16 #define RE_OP_DEFAULT_BOUNDARY 17 #define RE_OP_DEFAULT_END_OF_WORD 18 #define RE_OP_DEFAULT_START_OF_WORD 19 #define RE_OP_END 20 #define RE_OP_END_OF_LINE 21 #define RE_OP_END_OF_LINE_U 22 #define RE_OP_END_OF_STRING 23 #define RE_OP_END_OF_STRING_LINE 24 #define RE_OP_END_OF_STRING_LINE_U 25 #define RE_OP_END_OF_WORD 26 #define RE_OP_FUZZY 27 #define RE_OP_GRAPHEME_BOUNDARY 28 #define RE_OP_GREEDY_REPEAT 29 #define RE_OP_GROUP 30 #define RE_OP_GROUP_CALL 31 #define RE_OP_GROUP_EXISTS 32 #define RE_OP_KEEP 33 #define RE_OP_LAZY_REPEAT 34 #define RE_OP_LOOKAROUND 35 #define RE_OP_NEXT 36 #define RE_OP_PROPERTY 37 #define RE_OP_PROPERTY_IGN 38 #define RE_OP_PROPERTY_IGN_REV 39 #define RE_OP_PROPERTY_REV 40 #define RE_OP_PRUNE 41 #define RE_OP_RANGE 42 #define RE_OP_RANGE_IGN 43 #define RE_OP_RANGE_IGN_REV 44 #define RE_OP_RANGE_REV 45 #define RE_OP_REF_GROUP 46 #define RE_OP_REF_GROUP_FLD 47 #define RE_OP_REF_GROUP_FLD_REV 48 #define RE_OP_REF_GROUP_IGN 49 #define
RE_OP_REF_GROUP_IGN_REV 50 #define RE_OP_REF_GROUP_REV 51 #define RE_OP_SEARCH_ANCHOR 52 #define RE_OP_SET_DIFF 53 #define RE_OP_SET_DIFF_IGN 54 #define RE_OP_SET_DIFF_IGN_REV 55 #define RE_OP_SET_DIFF_REV 56 #define RE_OP_SET_INTER 57 #define RE_OP_SET_INTER_IGN 58 #define RE_OP_SET_INTER_IGN_REV 59 #define RE_OP_SET_INTER_REV 60 #define RE_OP_SET_SYM_DIFF 61 #define RE_OP_SET_SYM_DIFF_IGN 62 #define RE_OP_SET_SYM_DIFF_IGN_REV 63 #define RE_OP_SET_SYM_DIFF_REV 64 #define RE_OP_SET_UNION 65 #define RE_OP_SET_UNION_IGN 66 #define RE_OP_SET_UNION_IGN_REV 67 #define RE_OP_SET_UNION_REV 68 #define RE_OP_SKIP 69 #define RE_OP_START_OF_LINE 70 #define RE_OP_START_OF_LINE_U 71 #define RE_OP_START_OF_STRING 72 #define RE_OP_START_OF_WORD 73 #define RE_OP_STRING 74 #define RE_OP_STRING_FLD 75 #define RE_OP_STRING_FLD_REV 76 #define RE_OP_STRING_IGN 77 #define RE_OP_STRING_IGN_REV 78 #define RE_OP_STRING_REV 79 #define RE_OP_STRING_SET 80 #define RE_OP_STRING_SET_FLD 81 #define RE_OP_STRING_SET_FLD_REV 82 #define RE_OP_STRING_SET_IGN 83 #define RE_OP_STRING_SET_IGN_REV 84 #define RE_OP_STRING_SET_REV 85 #define RE_OP_BODY_END 86 #define RE_OP_BODY_START 87 #define RE_OP_END_ATOMIC 88 #define RE_OP_END_CONDITIONAL 89 #define RE_OP_END_FUZZY 90 #define RE_OP_END_GREEDY_REPEAT 91 #define RE_OP_END_GROUP 92 #define RE_OP_END_LAZY_REPEAT 93 #define RE_OP_END_LOOKAROUND 94 #define RE_OP_GREEDY_REPEAT_ONE 95 #define RE_OP_GROUP_RETURN 96 #define RE_OP_LAZY_REPEAT_ONE 97 #define RE_OP_MATCH_BODY 98 #define RE_OP_MATCH_TAIL 99 #define RE_OP_START_GROUP 100 char* re_op_text[] = { "RE_OP_FAILURE", "RE_OP_SUCCESS", "RE_OP_ANY", "RE_OP_ANY_ALL", "RE_OP_ANY_ALL_REV", "RE_OP_ANY_REV", "RE_OP_ANY_U", "RE_OP_ANY_U_REV", "RE_OP_ATOMIC", "RE_OP_BOUNDARY", "RE_OP_BRANCH", "RE_OP_CALL_REF", "RE_OP_CHARACTER", "RE_OP_CHARACTER_IGN", "RE_OP_CHARACTER_IGN_REV", "RE_OP_CHARACTER_REV", "RE_OP_CONDITIONAL", "RE_OP_DEFAULT_BOUNDARY", "RE_OP_DEFAULT_END_OF_WORD", "RE_OP_DEFAULT_START_OF_WORD", 
"RE_OP_END", "RE_OP_END_OF_LINE", "RE_OP_END_OF_LINE_U", "RE_OP_END_OF_STRING", "RE_OP_END_OF_STRING_LINE", "RE_OP_END_OF_STRING_LINE_U", "RE_OP_END_OF_WORD", "RE_OP_FUZZY", "RE_OP_GRAPHEME_BOUNDARY", "RE_OP_GREEDY_REPEAT", "RE_OP_GROUP", "RE_OP_GROUP_CALL", "RE_OP_GROUP_EXISTS", "RE_OP_KEEP", "RE_OP_LAZY_REPEAT", "RE_OP_LOOKAROUND", "RE_OP_NEXT", "RE_OP_PROPERTY", "RE_OP_PROPERTY_IGN", "RE_OP_PROPERTY_IGN_REV", "RE_OP_PROPERTY_REV", "RE_OP_PRUNE", "RE_OP_RANGE", "RE_OP_RANGE_IGN", "RE_OP_RANGE_IGN_REV", "RE_OP_RANGE_REV", "RE_OP_REF_GROUP", "RE_OP_REF_GROUP_FLD", "RE_OP_REF_GROUP_FLD_REV", "RE_OP_REF_GROUP_IGN", "RE_OP_REF_GROUP_IGN_REV", "RE_OP_REF_GROUP_REV", "RE_OP_SEARCH_ANCHOR", "RE_OP_SET_DIFF", "RE_OP_SET_DIFF_IGN", "RE_OP_SET_DIFF_IGN_REV", "RE_OP_SET_DIFF_REV", "RE_OP_SET_INTER", "RE_OP_SET_INTER_IGN", "RE_OP_SET_INTER_IGN_REV", "RE_OP_SET_INTER_REV", "RE_OP_SET_SYM_DIFF", "RE_OP_SET_SYM_DIFF_IGN", "RE_OP_SET_SYM_DIFF_IGN_REV", "RE_OP_SET_SYM_DIFF_REV", "RE_OP_SET_UNION", "RE_OP_SET_UNION_IGN", "RE_OP_SET_UNION_IGN_REV", "RE_OP_SET_UNION_REV", "RE_OP_SKIP", "RE_OP_START_OF_LINE", "RE_OP_START_OF_LINE_U", "RE_OP_START_OF_STRING", "RE_OP_START_OF_WORD", "RE_OP_STRING", "RE_OP_STRING_FLD", "RE_OP_STRING_FLD_REV", "RE_OP_STRING_IGN", "RE_OP_STRING_IGN_REV", "RE_OP_STRING_REV", "RE_OP_STRING_SET", "RE_OP_STRING_SET_FLD", "RE_OP_STRING_SET_FLD_REV", "RE_OP_STRING_SET_IGN", "RE_OP_STRING_SET_IGN_REV", "RE_OP_STRING_SET_REV", "RE_OP_BODY_END", "RE_OP_BODY_START", "RE_OP_END_ATOMIC", "RE_OP_END_CONDITIONAL", "RE_OP_END_FUZZY", "RE_OP_END_GREEDY_REPEAT", "RE_OP_END_GROUP", "RE_OP_END_LAZY_REPEAT", "RE_OP_END_LOOKAROUND", "RE_OP_GREEDY_REPEAT_ONE", "RE_OP_GROUP_RETURN", "RE_OP_LAZY_REPEAT_ONE", "RE_OP_MATCH_BODY", "RE_OP_MATCH_TAIL", "RE_OP_START_GROUP", }; #define RE_FLAG_ASCII 0x80 #define RE_FLAG_BESTMATCH 0x1000 #define RE_FLAG_DEBUG 0x200 #define RE_FLAG_DOTALL 0x10 #define RE_FLAG_ENHANCEMATCH 0x8000 #define RE_FLAG_FULLCASE 0x4000 #define RE_FLAG_IGNORECASE 
0x2 #define RE_FLAG_LOCALE 0x4 #define RE_FLAG_MULTILINE 0x8 #define RE_FLAG_POSIX 0x10000 #define RE_FLAG_REVERSE 0x400 #define RE_FLAG_TEMPLATE 0x1 #define RE_FLAG_UNICODE 0x20 #define RE_FLAG_VERBOSE 0x40 #define RE_FLAG_VERSION0 0x2000 #define RE_FLAG_VERSION1 0x100 #define RE_FLAG_WORD 0x800 regex-2016.01.10/Python2/_regex_core.py # # Secret Labs' Regular Expression Engine core module # # Copyright (c) 1998-2001 by Secret Labs AB. All rights reserved. # # This version of the SRE library can be redistributed under CNRI's # Python 1.6 license. For any other use, please contact Secret Labs # AB (info@pythonware.com). # # Portions of this engine have been developed in cooperation with # CNRI. Hewlett-Packard provided funding for 1.6 integration and # other compatibility work. # # 2010-01-16 mrab Python front-end re-written and extended import string import sys import unicodedata from collections import defaultdict import _regex __all__ = ["A", "ASCII", "B", "BESTMATCH", "D", "DEBUG", "E", "ENHANCEMATCH", "F", "FULLCASE", "I", "IGNORECASE", "L", "LOCALE", "M", "MULTILINE", "P", "POSIX", "R", "REVERSE", "S", "DOTALL", "T", "TEMPLATE", "U", "UNICODE", "V0", "VERSION0", "V1", "VERSION1", "W", "WORD", "X", "VERBOSE", "error", "Scanner"] # The regex exception. class error(Exception): def __init__(self, message, pattern=None, pos=None): newline = u'\n' if isinstance(pattern, unicode) else '\n' self.msg = message self.pattern = pattern self.pos = pos if pattern is not None and pos is not None: self.lineno = pattern.count(newline, 0, pos) + 1 self.colno = pos - pattern.rfind(newline, 0, pos) message = "%s at position %d" % (message, pos) if newline in pattern: message += " (line %d, column %d)" % (self.lineno, self.colno) Exception.__init__(self, message) # The exception for when a positional flag has been turned on in the old # behaviour.
class _UnscopedFlagSet(Exception): pass # The exception for when parsing fails and we want to try something else. class ParseError(Exception): pass # The exception for when there isn't a valid first set. class _FirstSetError(Exception): pass # Flags. A = ASCII = 0x80 # Assume ASCII locale. B = BESTMATCH = 0x1000 # Best fuzzy match. D = DEBUG = 0x200 # Print parsed pattern. E = ENHANCEMATCH = 0x8000 # Attempt to improve the fit after finding the first # fuzzy match. F = FULLCASE = 0x4000 # Unicode full case-folding. I = IGNORECASE = 0x2 # Ignore case. L = LOCALE = 0x4 # Assume current 8-bit locale. M = MULTILINE = 0x8 # Make anchors look for newline. P = POSIX = 0x10000 # POSIX-style matching (leftmost longest). R = REVERSE = 0x400 # Search backwards. S = DOTALL = 0x10 # Make dot match newline. U = UNICODE = 0x20 # Assume Unicode locale. V0 = VERSION0 = 0x2000 # Old legacy behaviour. V1 = VERSION1 = 0x100 # New enhanced behaviour. W = WORD = 0x800 # Default Unicode word breaks. X = VERBOSE = 0x40 # Ignore whitespace and comments. T = TEMPLATE = 0x1 # Template (present because re module has it). DEFAULT_VERSION = VERSION1 _ALL_VERSIONS = VERSION0 | VERSION1 _ALL_ENCODINGS = ASCII | LOCALE | UNICODE # The default flags for the various versions. DEFAULT_FLAGS = {VERSION0: 0, VERSION1: FULLCASE} # The mask for the flags. GLOBAL_FLAGS = (_ALL_ENCODINGS | _ALL_VERSIONS | BESTMATCH | DEBUG | ENHANCEMATCH | POSIX | REVERSE) SCOPED_FLAGS = FULLCASE | IGNORECASE | MULTILINE | DOTALL | WORD | VERBOSE ALPHA = frozenset(string.ascii_letters) DIGITS = frozenset(string.digits) ALNUM = ALPHA | DIGITS OCT_DIGITS = frozenset(string.octdigits) HEX_DIGITS = frozenset(string.hexdigits) SPECIAL_CHARS = frozenset("()|?*+{^$.[\\#") | frozenset([""]) NAMED_CHAR_PART = ALNUM | frozenset(" -") PROPERTY_NAME_PART = ALNUM | frozenset(" &_-.") SET_OPS = ("||", "~~", "&&", "--") # The width of the code words inside the regex engine. 
BYTES_PER_CODE = _regex.get_code_size() BITS_PER_CODE = BYTES_PER_CODE * 8 # The repeat count which represents infinity. UNLIMITED = (1 << BITS_PER_CODE) - 1 # The regular expression flags. REGEX_FLAGS = {"a": ASCII, "b": BESTMATCH, "e": ENHANCEMATCH, "f": FULLCASE, "i": IGNORECASE, "L": LOCALE, "m": MULTILINE, "p": POSIX, "r": REVERSE, "s": DOTALL, "u": UNICODE, "V0": VERSION0, "V1": VERSION1, "w": WORD, "x": VERBOSE} # The case flags. CASE_FLAGS = FULLCASE | IGNORECASE NOCASE = 0 FULLIGNORECASE = FULLCASE | IGNORECASE FULL_CASE_FOLDING = UNICODE | FULLIGNORECASE CASE_FLAGS_COMBINATIONS = {0: 0, FULLCASE: 0, IGNORECASE: IGNORECASE, FULLIGNORECASE: FULLIGNORECASE} # The number of digits in hexadecimal escapes. HEX_ESCAPES = {"x": 2, "u": 4, "U": 8} # A singleton which indicates a comment within a pattern. COMMENT = object() FLAGS = object() # The names of the opcodes. OPCODES = """ FAILURE SUCCESS ANY ANY_ALL ANY_ALL_REV ANY_REV ANY_U ANY_U_REV ATOMIC BOUNDARY BRANCH CALL_REF CHARACTER CHARACTER_IGN CHARACTER_IGN_REV CHARACTER_REV CONDITIONAL DEFAULT_BOUNDARY DEFAULT_END_OF_WORD DEFAULT_START_OF_WORD END END_OF_LINE END_OF_LINE_U END_OF_STRING END_OF_STRING_LINE END_OF_STRING_LINE_U END_OF_WORD FUZZY GRAPHEME_BOUNDARY GREEDY_REPEAT GROUP GROUP_CALL GROUP_EXISTS KEEP LAZY_REPEAT LOOKAROUND NEXT PROPERTY PROPERTY_IGN PROPERTY_IGN_REV PROPERTY_REV PRUNE RANGE RANGE_IGN RANGE_IGN_REV RANGE_REV REF_GROUP REF_GROUP_FLD REF_GROUP_FLD_REV REF_GROUP_IGN REF_GROUP_IGN_REV REF_GROUP_REV SEARCH_ANCHOR SET_DIFF SET_DIFF_IGN SET_DIFF_IGN_REV SET_DIFF_REV SET_INTER SET_INTER_IGN SET_INTER_IGN_REV SET_INTER_REV SET_SYM_DIFF SET_SYM_DIFF_IGN SET_SYM_DIFF_IGN_REV SET_SYM_DIFF_REV SET_UNION SET_UNION_IGN SET_UNION_IGN_REV SET_UNION_REV SKIP START_OF_LINE START_OF_LINE_U START_OF_STRING START_OF_WORD STRING STRING_FLD STRING_FLD_REV STRING_IGN STRING_IGN_REV STRING_REV STRING_SET STRING_SET_FLD STRING_SET_FLD_REV STRING_SET_IGN STRING_SET_IGN_REV STRING_SET_REV """ # Define the 
opcodes in a namespace. class Namespace(object): pass OP = Namespace() for i, op in enumerate(OPCODES.split()): setattr(OP, op, i) def _shrink_cache(cache_dict, args_dict, locale_sensitive, max_length, divisor=5): """Make room in the given cache. Args: cache_dict: The cache dictionary to modify. args_dict: The dictionary of named list args used by patterns. max_length: Maximum # of entries in cache_dict before it is shrunk. divisor: Cache will shrink to max_length - 1/divisor*max_length items. """ # Toss out a fraction of the entries at random to make room for new ones. # A random algorithm was chosen as opposed to simply cache_dict.popitem() # as popitem could penalize the same regular expression repeatedly based # on its internal hash value. Being random should spread the cache miss # love around. cache_keys = tuple(cache_dict.keys()) overage = len(cache_keys) - max_length if overage < 0: # Cache is already within limits. Normally this should not happen # but it could due to multithreading. return number_to_toss = max_length // divisor + overage # The import is done here to avoid a circular dependency. import random if not hasattr(random, 'sample'): # Do nothing while resolving the circular dependency: # re->random->warnings->tokenize->string->re return for doomed_key in random.sample(cache_keys, number_to_toss): try: del cache_dict[doomed_key] except KeyError: # Ignore problems if the cache changed from another thread. pass # Rebuild the arguments and locale-sensitivity dictionaries. args_dict.clear() sensitivity_dict = {} for pattern, pattern_type, flags, args, default_version, locale in tuple(cache_dict): args_dict[pattern, pattern_type, flags, default_version, locale] = args try: sensitivity_dict[pattern_type, pattern] = locale_sensitive[pattern_type, pattern] except KeyError: pass locale_sensitive.clear() locale_sensitive.update(sensitivity_dict) def _fold_case(info, string): "Folds the case of a string." 
flags = info.flags if (flags & _ALL_ENCODINGS) == 0: flags |= info.guess_encoding return _regex.fold_case(flags, string) def is_cased(info, char): "Checks whether a character is cased." return len(_regex.get_all_cases(info.flags, char)) > 1 def _compile_firstset(info, fs): "Compiles the firstset for the pattern." if not fs or None in fs: return [] # If we ignore the case, for simplicity we won't build a firstset. members = set() for i in fs: if isinstance(i, Character) and not i.positive: return [] if i.case_flags: if isinstance(i, Character): if is_cased(info, i.value): return [] elif isinstance(i, SetBase): return [] members.add(i.with_flags(case_flags=NOCASE)) # Build the firstset. fs = SetUnion(info, list(members), zerowidth=True) fs = fs.optimise(info, in_set=True) # Compile the firstset. return fs.compile(bool(info.flags & REVERSE)) def _flatten_code(code): "Flattens the code from a list of tuples." flat_code = [] for c in code: flat_code.extend(c) return flat_code def make_character(info, value, in_set=False): "Makes a character literal." if in_set: # A character set is built case-sensitively. return Character(value) return Character(value, case_flags=info.flags & CASE_FLAGS) def make_ref_group(info, name, position): "Makes a group reference." return RefGroup(info, name, position, case_flags=info.flags & CASE_FLAGS) def make_string_set(info, name): "Makes a string set." return StringSet(info, name, case_flags=info.flags & CASE_FLAGS) def make_property(info, prop, in_set): "Makes a property." if in_set: return prop return prop.with_flags(case_flags=info.flags & CASE_FLAGS) def _parse_pattern(source, info): "Parses a pattern, eg. 'a|b|c'." branches = [parse_sequence(source, info)] while source.match("|"): branches.append(parse_sequence(source, info)) if len(branches) == 1: return branches[0] return Branch(branches) def parse_sequence(source, info): "Parses a sequence, eg. 'abc'." 
sequence = [] applied = False while True: # Get literal characters followed by an element. characters, case_flags, element = parse_literal_and_element(source, info) if not element: # No element, just a literal. We've also reached the end of the # sequence. append_literal(characters, case_flags, sequence) break if element is COMMENT or element is FLAGS: append_literal(characters, case_flags, sequence) elif type(element) is tuple: # It looks like we've found a quantifier. ch, saved_pos = element counts = parse_quantifier(source, info, ch) if counts: # It _is_ a quantifier. apply_quantifier(source, info, counts, characters, case_flags, ch, saved_pos, applied, sequence) applied = True else: # It's not a quantifier. Maybe it's a fuzzy constraint. constraints = parse_fuzzy(source, ch) if constraints: # It _is_ a fuzzy constraint. apply_constraint(source, info, constraints, characters, case_flags, saved_pos, applied, sequence) applied = True else: # The element was just a literal. characters.append(ord(ch)) append_literal(characters, case_flags, sequence) applied = False else: # We have a literal followed by something else. append_literal(characters, case_flags, sequence) sequence.append(element) applied = False return make_sequence(sequence) def apply_quantifier(source, info, counts, characters, case_flags, ch, saved_pos, applied, sequence): if characters: # The quantifier applies to the last character. append_literal(characters[ : -1], case_flags, sequence) element = Character(characters[-1], case_flags=case_flags) else: # The quantifier applies to the last item in the sequence. if applied: raise error("multiple repeat", source.string, saved_pos) if not sequence: raise error("nothing to repeat", source.string, saved_pos) element = sequence.pop() min_count, max_count = counts saved_pos = source.pos ch = source.get() if ch == "?": # The "?" suffix that means it's a lazy repeat. repeated = LazyRepeat elif ch == "+": # The "+" suffix that means it's a possessive repeat. 
repeated = PossessiveRepeat else: # No suffix means that it's a greedy repeat. source.pos = saved_pos repeated = GreedyRepeat # Ignore the quantifier if it applies to a zero-width item or the number of # repeats is fixed at 1. if not element.is_empty() and (min_count != 1 or max_count != 1): element = repeated(element, min_count, max_count) sequence.append(element) def apply_constraint(source, info, constraints, characters, case_flags, saved_pos, applied, sequence): if characters: # The constraint applies to the last character. append_literal(characters[ : -1], case_flags, sequence) element = Character(characters[-1], case_flags=case_flags) sequence.append(Fuzzy(element, constraints)) else: # The constraint applies to the last item in the sequence. if applied or not sequence: raise error("nothing for fuzzy constraint", source.string, saved_pos) element = sequence.pop() # If a group is marked as fuzzy then put all of the fuzzy part in the # group. if isinstance(element, Group): element.subpattern = Fuzzy(element.subpattern, constraints) sequence.append(element) else: sequence.append(Fuzzy(element, constraints)) def append_literal(characters, case_flags, sequence): if characters: sequence.append(Literal(characters, case_flags=case_flags)) def PossessiveRepeat(element, min_count, max_count): "Builds a possessive repeat." return Atomic(GreedyRepeat(element, min_count, max_count)) _QUANTIFIERS = {"?": (0, 1), "*": (0, None), "+": (1, None)} def parse_quantifier(source, info, ch): "Parses a quantifier." q = _QUANTIFIERS.get(ch) if q: # It's a quantifier. return q if ch == "{": # Looks like a limited repeated element, eg. 'a{2,3}'. counts = parse_limited_quantifier(source) if counts: return counts return None def is_above_limit(count): "Checks whether a count is above the maximum." return count is not None and count >= UNLIMITED def parse_limited_quantifier(source): "Parses a limited quantifier." 
saved_pos = source.pos min_count = parse_count(source) if source.match(","): max_count = parse_count(source) # No minimum means 0 and no maximum means unlimited. min_count = int(min_count or 0) max_count = int(max_count) if max_count else None if max_count is not None and min_count > max_count: raise error("min repeat greater than max repeat", source.string, saved_pos) else: if not min_count: source.pos = saved_pos return None min_count = max_count = int(min_count) if is_above_limit(min_count) or is_above_limit(max_count): raise error("repeat count too big", source.string, saved_pos) if not source.match ("}"): source.pos = saved_pos return None return min_count, max_count def parse_fuzzy(source, ch): "Parses a fuzzy setting, if present." if ch != "{": return None saved_pos = source.pos constraints = {} try: parse_fuzzy_item(source, constraints) while source.match(","): parse_fuzzy_item(source, constraints) except ParseError: source.pos = saved_pos return None if not source.match("}"): raise error("expected }", source.string, source.pos) return constraints def parse_fuzzy_item(source, constraints): "Parses a fuzzy setting item." saved_pos = source.pos try: parse_cost_constraint(source, constraints) except ParseError: source.pos = saved_pos parse_cost_equation(source, constraints) def parse_cost_constraint(source, constraints): "Parses a cost constraint." saved_pos = source.pos ch = source.get() if ch in ALPHA: # Syntax: constraint [("<=" | "<") cost] constraint = parse_constraint(source, constraints, ch) max_inc = parse_fuzzy_compare(source) if max_inc is None: # No maximum cost. constraints[constraint] = 0, None else: # There's a maximum cost. cost_pos = source.pos max_cost = int(parse_count(source)) # Inclusive or exclusive limit? 
if not max_inc: max_cost -= 1 if max_cost < 0: raise error("bad fuzzy cost limit", source.string, cost_pos) constraints[constraint] = 0, max_cost elif ch in DIGITS: # Syntax: cost ("<=" | "<") constraint ("<=" | "<") cost source.pos = saved_pos try: # Minimum cost. min_cost = int(parse_count(source)) min_inc = parse_fuzzy_compare(source) if min_inc is None: raise ParseError() constraint = parse_constraint(source, constraints, source.get()) max_inc = parse_fuzzy_compare(source) if max_inc is None: raise ParseError() # Maximum cost. cost_pos = source.pos max_cost = int(parse_count(source)) # Inclusive or exclusive limits? if not min_inc: min_cost += 1 if not max_inc: max_cost -= 1 if not 0 <= min_cost <= max_cost: raise error("bad fuzzy cost limit", source.string, cost_pos) constraints[constraint] = min_cost, max_cost except ValueError: raise ParseError() else: raise ParseError() def parse_constraint(source, constraints, ch): "Parses a constraint." if ch not in "deis": raise error("bad fuzzy constraint", source.string, source.pos) if ch in constraints: raise error("repeated fuzzy constraint", source.string, source.pos) return ch def parse_fuzzy_compare(source): "Parses a cost comparator." if source.match("<="): return True elif source.match("<"): return False else: return None def parse_cost_equation(source, constraints): "Parses a cost equation." if "cost" in constraints: raise error("more than one cost equation", source.string, source.pos) cost = {} parse_cost_term(source, cost) while source.match("+"): parse_cost_term(source, cost) max_inc = parse_fuzzy_compare(source) if max_inc is None: raise error("missing fuzzy cost limit", source.string, source.pos) max_cost = int(parse_count(source)) if not max_inc: max_cost -= 1 if max_cost < 0: raise error("bad fuzzy cost limit", source.string, source.pos) cost["max"] = max_cost constraints["cost"] = cost def parse_cost_term(source, cost): "Parses a cost equation term." 
coeff = parse_count(source) ch = source.get() if ch not in "dis": raise ParseError() if ch in cost: raise error("repeated fuzzy cost", source.string, source.pos) cost[ch] = int(coeff or 1) def parse_count(source): "Parses a quantifier's count, which can be empty." return source.get_while(DIGITS) def parse_literal_and_element(source, info): """Parses a literal followed by an element. The element is FLAGS if it's an inline flag or None if it has reached the end of a sequence. """ characters = [] case_flags = info.flags & CASE_FLAGS while True: saved_pos = source.pos ch = source.get() if ch in SPECIAL_CHARS: if ch in ")|": # The end of a sequence. At the end of the pattern ch is "". source.pos = saved_pos return characters, case_flags, None elif ch == "\\": # An escape sequence outside a set. element = parse_escape(source, info, False) return characters, case_flags, element elif ch == "(": # A parenthesised subpattern or a flag. element = parse_paren(source, info) if element and element is not COMMENT: return characters, case_flags, element elif ch == ".": # Any character. if info.flags & DOTALL: element = AnyAll() elif info.flags & WORD: element = AnyU() else: element = Any() return characters, case_flags, element elif ch == "[": # A character set. element = parse_set(source, info) return characters, case_flags, element elif ch == "^": # The start of a line or the string. if info.flags & MULTILINE: if info.flags & WORD: element = StartOfLineU() else: element = StartOfLine() else: element = StartOfString() return characters, case_flags, element elif ch == "$": # The end of a line or the string. if info.flags & MULTILINE: if info.flags & WORD: element = EndOfLineU() else: element = EndOfLine() else: if info.flags & WORD: element = EndOfStringLineU() else: element = EndOfStringLine() return characters, case_flags, element elif ch in "?*+{": # Looks like a quantifier. return characters, case_flags, (ch, saved_pos) else: # A literal. 
characters.append(ord(ch)) else: # A literal. characters.append(ord(ch)) def parse_paren(source, info): """Parses a parenthesised subpattern or a flag. Returns FLAGS if it's an inline flag. """ saved_pos = source.pos ch = source.get() if ch == "?": # (?... saved_pos_2 = source.pos ch = source.get() if ch == "<": # (?<... saved_pos_3 = source.pos ch = source.get() if ch in ("=", "!"): # (?<=... or (?<!...: lookbehind. return parse_lookaround(source, info, True, ch == "=") # (?<...: a named capture group. source.pos = saved_pos_3 name = parse_name(source) group = info.open_group(name) source.expect(">") saved_flags = info.flags try: subpattern = _parse_pattern(source, info) source.expect(")") finally: info.flags = saved_flags source.ignore_space = bool(info.flags & VERBOSE) info.close_group() return Group(info, group, subpattern) if ch in ("=", "!"): # (?=... or (?!...: lookahead. return parse_lookaround(source, info, False, ch == "=") if ch == "P": # (?P...: a Python extension. return parse_extension(source, info) if ch == "#": # (?#...: a comment. return parse_comment(source) if ch == "(": # (?(...: a conditional subpattern. return parse_conditional(source, info) if ch == ">": # (?>...: an atomic subpattern. return parse_atomic(source, info) if ch == "|": # (?|...: a common/reset groups branch. return parse_common(source, info) if ch == "R" or "0" <= ch <= "9": # (?R...: probably a call to a group. return parse_call_group(source, info, ch, saved_pos_2) if ch == "&": # (?&...: a call to a named group. return parse_call_named_group(source, info, saved_pos_2) # (?...: probably a flags subpattern. source.pos = saved_pos_2 return parse_flags_subpattern(source, info) if ch == "*": # (*... saved_pos_2 = source.pos word = source.get_while(set(")>"), include=False) if word[ : 1].isalpha(): verb = VERBS.get(word) if not verb: raise error("unknown verb", source.string, saved_pos_2) source.expect(")") return verb # (...: an unnamed capture group.
source.pos = saved_pos group = info.open_group() saved_flags = info.flags try: subpattern = _parse_pattern(source, info) source.expect(")") finally: info.flags = saved_flags source.ignore_space = bool(info.flags & VERBOSE) info.close_group() return Group(info, group, subpattern) def parse_extension(source, info): "Parses a Python extension." saved_pos = source.pos ch = source.get() if ch == "<": # (?P<...: a named capture group. name = parse_name(source) group = info.open_group(name) source.expect(">") saved_flags = info.flags try: subpattern = _parse_pattern(source, info) source.expect(")") finally: info.flags = saved_flags source.ignore_space = bool(info.flags & VERBOSE) info.close_group() return Group(info, group, subpattern) if ch == "=": # (?P=...: a named group reference. name = parse_name(source, allow_numeric=True) source.expect(")") if info.is_open_group(name): raise error("cannot refer to an open group", source.string, saved_pos) return make_ref_group(info, name, saved_pos) if ch == ">" or ch == "&": # (?P>...: a call to a group. return parse_call_named_group(source, info, saved_pos) source.pos = saved_pos raise error("unknown extension", source.string, saved_pos) def parse_comment(source): "Parses a comment." source.skip_while(set(")"), include=False) source.expect(")") return COMMENT def parse_lookaround(source, info, behind, positive): "Parses a lookaround." saved_flags = info.flags try: subpattern = _parse_pattern(source, info) source.expect(")") finally: info.flags = saved_flags source.ignore_space = bool(info.flags & VERBOSE) return LookAround(behind, positive, subpattern) def parse_conditional(source, info): "Parses a conditional subpattern." saved_flags = info.flags saved_pos = source.pos ch = source.get() if ch == "?": # (?(?... ch = source.get() if ch in ("=", "!"): # (?(?=... or (?(?!...: lookahead conditional. return parse_lookaround_conditional(source, info, False, ch == "=") if ch == "<": # (?(?<... 
ch = source.get() if ch in ("=", "!"): # (?(?<=... or (?(?"), include=False) if not name: raise error("missing group name", source.string, source.pos) if name.isdigit(): min_group = 0 if allow_group_0 else 1 if not allow_numeric or int(name) < min_group: raise error("bad character in group name", source.string, source.pos) else: if not is_identifier(name): raise error("bad character in group name", source.string, source.pos) return name def is_identifier(name): if not name: return False if name[0] not in ALPHA and name[0] != "_": return False name = name.replace("_", "") return not name or all(c in ALNUM for c in name) def is_octal(string): "Checks whether a string is octal." return all(ch in OCT_DIGITS for ch in string) def is_decimal(string): "Checks whether a string is decimal." return all(ch in DIGITS for ch in string) def is_hexadecimal(string): "Checks whether a string is hexadecimal." return all(ch in HEX_DIGITS for ch in string) def parse_escape(source, info, in_set): "Parses an escape sequence." saved_ignore = source.ignore_space source.ignore_space = False ch = source.get() source.ignore_space = saved_ignore if not ch: # A backslash at the end of the pattern. raise error("bad escape (end of pattern)", source.string, source.pos) if ch in HEX_ESCAPES: # A hexadecimal escape sequence. return parse_hex_escape(source, info, HEX_ESCAPES[ch], in_set, ch) elif ch == "g" and not in_set: # A group reference. saved_pos = source.pos try: return parse_group_ref(source, info) except error: # Invalid as a group reference, so assume it's a literal. source.pos = saved_pos return make_character(info, ord(ch), in_set) elif ch == "G" and not in_set: # A search anchor. return SearchAnchor() elif ch == "L" and not in_set: # A string set. return parse_string_set(source, info) elif ch == "N": # A named codepoint. return parse_named_char(source, info, in_set) elif ch in "pP": # A Unicode property, positive or negative. 
return parse_property(source, info, ch == "p", in_set) elif ch == "X" and not in_set: # A grapheme cluster. return Grapheme() elif ch in ALPHA: # An alphabetic escape sequence. # Positional escapes aren't allowed inside a character set. if not in_set: if info.flags & WORD: value = WORD_POSITION_ESCAPES.get(ch) else: value = POSITION_ESCAPES.get(ch) if value: return value value = CHARSET_ESCAPES.get(ch) if value: return value value = CHARACTER_ESCAPES.get(ch) if value: return Character(ord(value)) return make_character(info, ord(ch), in_set) elif ch in DIGITS: # A numeric escape sequence. return parse_numeric_escape(source, info, ch, in_set) else: # A literal. return make_character(info, ord(ch), in_set) def parse_numeric_escape(source, info, ch, in_set): "Parses a numeric escape sequence." if in_set or ch == "0": # Octal escape sequence, max 3 digits. return parse_octal_escape(source, info, [ch], in_set) # At least 1 digit, so either octal escape or group. digits = ch saved_pos = source.pos ch = source.get() if ch in DIGITS: # At least 2 digits, so either octal escape or group. digits += ch saved_pos = source.pos ch = source.get() if is_octal(digits) and ch in OCT_DIGITS: # 3 octal digits, so octal escape sequence. encoding = info.flags & _ALL_ENCODINGS if encoding == ASCII or encoding == LOCALE: octal_mask = 0xFF else: octal_mask = 0x1FF value = int(digits + ch, 8) & octal_mask return make_character(info, value) # Group reference. source.pos = saved_pos if info.is_open_group(digits): raise error("cannot refer to an open group", source.string, source.pos) return make_ref_group(info, digits, source.pos) def parse_octal_escape(source, info, digits, in_set): "Parses an octal escape sequence." 
saved_pos = source.pos ch = source.get() while len(digits) < 3 and ch in OCT_DIGITS: digits.append(ch) saved_pos = source.pos ch = source.get() source.pos = saved_pos try: value = int("".join(digits), 8) return make_character(info, value, in_set) except ValueError: if digits[0] in OCT_DIGITS: raise error("incomplete escape \\%s" % ''.join(digits), source.string, source.pos) else: raise error("bad escape \\%s" % digits[0], source.string, source.pos) def parse_hex_escape(source, info, expected_len, in_set, type): "Parses a hex escape sequence." digits = [] for i in range(expected_len): ch = source.get() if ch not in HEX_DIGITS: raise error("incomplete escape \\%s%s" % (type, ''.join(digits)), source.string, source.pos) digits.append(ch) value = int("".join(digits), 16) return make_character(info, value, in_set) def parse_group_ref(source, info): "Parses a group reference." source.expect("<") saved_pos = source.pos name = parse_name(source, True) source.expect(">") if info.is_open_group(name): raise error("cannot refer to an open group", source.string, source.pos) return make_ref_group(info, name, saved_pos) def parse_string_set(source, info): "Parses a string set reference." source.expect("<") name = parse_name(source, True) source.expect(">") if name is None or name not in info.kwargs: raise error("undefined named list", source.string, source.pos) return make_string_set(info, name) def parse_named_char(source, info, in_set): "Parses a named character." saved_pos = source.pos if source.match("{"): name = source.get_while(NAMED_CHAR_PART) if source.match("}"): try: value = unicodedata.lookup(name) return make_character(info, ord(value), in_set) except KeyError: raise error("undefined character name", source.string, source.pos) source.pos = saved_pos return make_character(info, ord("N"), in_set) def parse_property(source, info, positive, in_set): "Parses a Unicode property." 
saved_pos = source.pos ch = source.get() if ch == "{": negate = source.match("^") prop_name, name = parse_property_name(source) if source.match("}"): # It's correctly delimited. prop = lookup_property(prop_name, name, positive != negate, source) return make_property(info, prop, in_set) elif ch and ch in "CLMNPSZ": # An abbreviated property, eg \pL. prop = lookup_property(None, ch, positive, source) return make_property(info, prop, in_set) # Not a property, so treat as a literal "p" or "P". source.pos = saved_pos ch = "p" if positive else "P" return make_character(info, ord(ch), in_set) def parse_property_name(source): "Parses a property name, which may be qualified." name = source.get_while(PROPERTY_NAME_PART) saved_pos = source.pos ch = source.get() if ch and ch in ":=": prop_name = name name = source.get_while(ALNUM | set(" &_-./")).strip() if name: # Name after the ":" or "=", so it's a qualified name. saved_pos = source.pos else: # No name after the ":" or "=", so assume it's an unqualified name. prop_name, name = None, prop_name else: prop_name = None source.pos = saved_pos return prop_name, name def parse_set(source, info): "Parses a character set." version = (info.flags & _ALL_VERSIONS) or DEFAULT_VERSION saved_ignore = source.ignore_space source.ignore_space = False # Negative set? negate = source.match("^") try: if version == VERSION0: item = parse_set_imp_union(source, info) else: item = parse_set_union(source, info) if not source.match("]"): raise error("missing ]", source.string, source.pos) finally: source.ignore_space = saved_ignore if negate: item = item.with_flags(positive=not item.positive) item = item.with_flags(case_flags=info.flags & CASE_FLAGS) return item def parse_set_union(source, info): "Parses a set union ([x||y])." 
    items = [parse_set_symm_diff(source, info)]
    while source.match("||"):
        items.append(parse_set_symm_diff(source, info))

    if len(items) == 1:
        return items[0]
    return SetUnion(info, items)

def parse_set_symm_diff(source, info):
    "Parses a set symmetric difference ([x~~y])."
    items = [parse_set_inter(source, info)]
    while source.match("~~"):
        items.append(parse_set_inter(source, info))

    if len(items) == 1:
        return items[0]
    return SetSymDiff(info, items)

def parse_set_inter(source, info):
    "Parses a set intersection ([x&&y])."
    items = [parse_set_diff(source, info)]
    while source.match("&&"):
        items.append(parse_set_diff(source, info))

    if len(items) == 1:
        return items[0]
    return SetInter(info, items)

def parse_set_diff(source, info):
    "Parses a set difference ([x--y])."
    items = [parse_set_imp_union(source, info)]
    while source.match("--"):
        items.append(parse_set_imp_union(source, info))

    if len(items) == 1:
        return items[0]
    return SetDiff(info, items)

def parse_set_imp_union(source, info):
    "Parses a set implicit union ([xy])."
    version = (info.flags & _ALL_VERSIONS) or DEFAULT_VERSION

    items = [parse_set_member(source, info)]
    while True:
        saved_pos = source.pos
        if source.match("]"):
            # End of the set.
            source.pos = saved_pos
            break

        if version == VERSION1 and any(source.match(op) for op in SET_OPS):
            # The new behaviour has set operators.
            source.pos = saved_pos
            break

        items.append(parse_set_member(source, info))

    if len(items) == 1:
        return items[0]
    return SetUnion(info, items)

def parse_set_member(source, info):
    "Parses a member in a character set."
    # Parse a set item.
    start = parse_set_item(source, info)
    saved_pos1 = source.pos
    if (not isinstance(start, Character) or not start.positive or not
      source.match("-")):
        # It's not the start of a range.
        return start

    version = (info.flags & _ALL_VERSIONS) or DEFAULT_VERSION

    # It looks like the start of a range of characters.
    saved_pos2 = source.pos
    if version == VERSION1 and source.match("-"):
        # It's actually the set difference operator '--', so return the
        # character.
        source.pos = saved_pos1
        return start

    if source.match("]"):
        # We've reached the end of the set, so return both the character and
        # hyphen.
        source.pos = saved_pos2
        return SetUnion(info, [start, Character(ord("-"))])

    # Parse a set item.
    end = parse_set_item(source, info)
    if not isinstance(end, Character) or not end.positive:
        # It's not a range, so return the character, hyphen and property.
        return SetUnion(info, [start, Character(ord("-")), end])

    # It _is_ a range.
    if start.value > end.value:
        raise error("bad character range", source.string, source.pos)

    if start.value == end.value:
        return start

    return Range(start.value, end.value)

def parse_set_item(source, info):
    "Parses an item in a character set."
    version = (info.flags & _ALL_VERSIONS) or DEFAULT_VERSION

    if source.match("\\"):
        # An escape sequence in a set.
        return parse_escape(source, info, True)

    saved_pos = source.pos
    if source.match("[:"):
        # Looks like a POSIX character class.
        try:
            return parse_posix_class(source, info)
        except ParseError:
            # Not a POSIX character class.
            source.pos = saved_pos

    if version == VERSION1 and source.match("["):
        # It's the start of a nested set.

        # Negative set?
        negate = source.match("^")
        item = parse_set_union(source, info)

        if not source.match("]"):
            raise error("missing ]", source.string, source.pos)

        if negate:
            item = item.with_flags(positive=not item.positive)

        return item

    ch = source.get()
    if not ch:
        raise error("unterminated character set", source.string, source.pos)

    return Character(ord(ch))

def parse_posix_class(source, info):
    "Parses a POSIX character class."
    negate = source.match("^")
    prop_name, name = parse_property_name(source)
    if not source.match(":]"):
        raise ParseError()

    return lookup_property(prop_name, name, not negate, source, posix=True)

def float_to_rational(flt):
    "Converts a float to a rational pair."
    int_part = int(flt)
    error = flt - int_part
    if abs(error) < 0.0001:
        return int_part, 1

    den, num = float_to_rational(1.0 / error)

    return int_part * den + num, den

def numeric_to_rational(numeric):
    "Converts a numeric string to a rational string, if possible."
    if numeric[ : 1] == "-":
        sign, numeric = numeric[0], numeric[1 : ]
    else:
        sign = ""

    parts = numeric.split("/")
    if len(parts) == 2:
        num, den = float_to_rational(float(parts[0]) / float(parts[1]))
    elif len(parts) == 1:
        num, den = float_to_rational(float(parts[0]))
    else:
        raise ValueError()

    result = "%s%s/%s" % (sign, num, den)
    if result.endswith("/1"):
        return result[ : -2]

    return result

def standardise_name(name):
    "Standardises a property or value name."
    try:
        return numeric_to_rational("".join(name))
    except (ValueError, ZeroDivisionError):
        return "".join(ch for ch in name if ch not in "_- ").upper()

_posix_classes = set('ALNUM DIGIT PUNCT XDIGIT'.split())

def lookup_property(property, value, positive, source=None, posix=False):
    "Looks up a property."
    # Normalise the names (which may still be lists).
    property = standardise_name(property) if property else None
    value = standardise_name(value)

    if (property, value) == ("GENERALCATEGORY", "ASSIGNED"):
        property, value, positive = "GENERALCATEGORY", "UNASSIGNED", not positive

    if posix and not property and value.upper() in _posix_classes:
        value = 'POSIX' + value

    if property:
        # Both the property and the value are provided.
        prop = PROPERTIES.get(property)
        if not prop:
            if not source:
                raise error("unknown property")

            raise error("unknown property", source.string, source.pos)

        prop_id, value_dict = prop
        val_id = value_dict.get(value)
        if val_id is None:
            if not source:
                raise error("unknown property value")

            raise error("unknown property value", source.string, source.pos)

        if "YES" in value_dict and val_id == 0:
            positive, val_id = not positive, 1

        return Property((prop_id << 16) | val_id, positive)

    # Only the value is provided.
    # It might be the name of a GC, script or block value.
    for property in ("GC", "SCRIPT", "BLOCK"):
        prop_id, value_dict = PROPERTIES.get(property)
        val_id = value_dict.get(value)
        if val_id is not None:
            return Property((prop_id << 16) | val_id, positive)

    # It might be the name of a binary property.
    prop = PROPERTIES.get(value)
    if prop:
        prop_id, value_dict = prop
        if "YES" in value_dict:
            return Property((prop_id << 16) | 1, positive)

    # It might be the name of a binary property starting with a prefix.
    if value.startswith("IS"):
        prop = PROPERTIES.get(value[2 : ])
        if prop:
            prop_id, value_dict = prop
            if "YES" in value_dict:
                return Property((prop_id << 16) | 1, positive)

    # It might be the name of a script or block starting with a prefix.
    for prefix, property in (("IS", "SCRIPT"), ("IN", "BLOCK")):
        if value.startswith(prefix):
            prop_id, value_dict = PROPERTIES.get(property)
            val_id = value_dict.get(value[2 : ])
            if val_id is not None:
                return Property((prop_id << 16) | val_id, positive)

    # Unknown property.
    if not source:
        raise error("unknown property")

    raise error("unknown property", source.string, source.pos)

def _compile_replacement(source, pattern, is_unicode):
    "Compiles a replacement template escape sequence."
    ch = source.get()
    if ch in ALPHA:
        # An alphabetic escape sequence.
        value = CHARACTER_ESCAPES.get(ch)
        if value:
            return False, [ord(value)]

        if ch in HEX_ESCAPES and (ch == "x" or is_unicode):
            # A hexadecimal escape sequence.
            return False, [parse_repl_hex_escape(source, HEX_ESCAPES[ch], ch)]

        if ch == "g":
            # A group reference.
            return True, [compile_repl_group(source, pattern)]

        if ch == "N" and is_unicode:
            # A named character.
            value = parse_repl_named_char(source)
            if value is not None:
                return False, [value]

        # Any other alphabetic escape is kept as a backslash plus the letter.
        return False, [ord("\\"), ord(ch)]

    if isinstance(source.sep, str):
        octal_mask = 0xFF
    else:
        octal_mask = 0x1FF

    if ch == "0":
        # An octal escape sequence.
        digits = ch
        while len(digits) < 3:
            saved_pos = source.pos
            ch = source.get()
            if ch not in OCT_DIGITS:
                source.pos = saved_pos
                break
            digits += ch

        return False, [int(digits, 8) & octal_mask]

    if ch in DIGITS:
        # Either an octal escape sequence (3 digits) or a group reference (max
        # 2 digits).
        digits = ch
        saved_pos = source.pos
        ch = source.get()
        if ch in DIGITS:
            digits += ch
            saved_pos = source.pos
            ch = source.get()
            if ch and is_octal(digits + ch):
                # An octal escape sequence.
                return False, [int(digits + ch, 8) & octal_mask]

        # A group reference.
        source.pos = saved_pos
        return True, [int(digits)]

    if ch == "\\":
        # An escaped backslash is a backslash.
        return False, [ord("\\")]

    if not ch:
        # A trailing backslash.
        raise error("bad escape (end of pattern)", source.string, source.pos)

    # An escaped non-backslash is a backslash followed by the literal.
    return False, [ord("\\"), ord(ch)]

def parse_repl_hex_escape(source, expected_len, type):
    "Parses a hex escape sequence in a replacement string."
    digits = []
    for i in range(expected_len):
        ch = source.get()
        if ch not in HEX_DIGITS:
            raise error("incomplete escape \\%s%s" % (type, ''.join(digits)),
              source.string, source.pos)
        digits.append(ch)

    return int("".join(digits), 16)

def parse_repl_named_char(source):
    "Parses a named character in a replacement string."
    saved_pos = source.pos
    if source.match("{"):
        name = source.get_while(ALPHA | set(" "))

        if source.match("}"):
            try:
                value = unicodedata.lookup(name)
                return ord(value)
            except KeyError:
                raise error("undefined character name", source.string,
                  source.pos)

    source.pos = saved_pos
    return None

def compile_repl_group(source, pattern):
    "Compiles a replacement template group reference."
source.expect("<") name = parse_name(source, True, True) source.expect(">") if name.isdigit(): index = int(name) if not 0 <= index <= pattern.groups: raise error("invalid group reference", source.string, source.pos) return index try: return pattern.groupindex[name] except KeyError: raise IndexError("unknown group") # The regular expression is parsed into a syntax tree. The different types of # node are defined below. INDENT = " " POSITIVE_OP = 0x1 ZEROWIDTH_OP = 0x2 FUZZY_OP = 0x4 REVERSE_OP = 0x8 REQUIRED_OP = 0x10 POS_TEXT = {False: "NON-MATCH", True: "MATCH"} CASE_TEXT = {NOCASE: "", IGNORECASE: " SIMPLE_IGNORE_CASE", FULLCASE: "", FULLIGNORECASE: " FULL_IGNORE_CASE"} def make_sequence(items): if len(items) == 1: return items[0] return Sequence(items) # Common base class for all nodes. class RegexBase(object): def __init__(self): self._key = self.__class__ def with_flags(self, positive=None, case_flags=None, zerowidth=None): if positive is None: positive = self.positive else: positive = bool(positive) if case_flags is None: case_flags = self.case_flags else: case_flags = CASE_FLAGS_COMBINATIONS[case_flags & CASE_FLAGS] if zerowidth is None: zerowidth = self.zerowidth else: zerowidth = bool(zerowidth) if (positive == self.positive and case_flags == self.case_flags and zerowidth == self.zerowidth): return self return self.rebuild(positive, case_flags, zerowidth) def fix_groups(self, pattern, reverse, fuzzy): pass def optimise(self, info): return self def pack_characters(self, info): return self def remove_captures(self): return self def is_atomic(self): return True def can_be_affix(self): return True def contains_group(self): return False def get_firstset(self, reverse): raise _FirstSetError() def has_simple_start(self): return False def compile(self, reverse=False, fuzzy=False): return self._compile(reverse, fuzzy) def dump(self, indent, reverse): self._dump(indent, reverse) def is_empty(self): return False def __hash__(self): return hash(self._key) def 
__eq__(self, other): return type(self) is type(other) and self._key == other._key def __ne__(self, other): return not self.__eq__(other) def get_required_string(self, reverse): return self.max_width(), None # Base class for zero-width nodes. class ZeroWidthBase(RegexBase): def __init__(self, positive=True): RegexBase.__init__(self) self.positive = bool(positive) self._key = self.__class__, self.positive def get_firstset(self, reverse): return set([None]) def _compile(self, reverse, fuzzy): flags = 0 if self.positive: flags |= POSITIVE_OP if fuzzy: flags |= FUZZY_OP if reverse: flags |= REVERSE_OP return [(self._opcode, flags)] def _dump(self, indent, reverse): print "%s%s %s" % (INDENT * indent, self._op_name, POS_TEXT[self.positive]) def max_width(self): return 0 class Any(RegexBase): _opcode = {False: OP.ANY, True: OP.ANY_REV} _op_name = "ANY" def has_simple_start(self): return True def _compile(self, reverse, fuzzy): flags = 0 if fuzzy: flags |= FUZZY_OP return [(self._opcode[reverse], flags)] def _dump(self, indent, reverse): print "%s%s" % (INDENT * indent, self._op_name) def max_width(self): return 1 class AnyAll(Any): _opcode = {False: OP.ANY_ALL, True: OP.ANY_ALL_REV} _op_name = "ANY_ALL" class AnyU(Any): _opcode = {False: OP.ANY_U, True: OP.ANY_U_REV} _op_name = "ANY_U" class Atomic(RegexBase): def __init__(self, subpattern): RegexBase.__init__(self) self.subpattern = subpattern def fix_groups(self, pattern, reverse, fuzzy): self.subpattern.fix_groups(pattern, reverse, fuzzy) def optimise(self, info): self.subpattern = self.subpattern.optimise(info) if self.subpattern.is_empty(): return self.subpattern return self def pack_characters(self, info): self.subpattern = self.subpattern.pack_characters(info) return self def remove_captures(self): self.subpattern = self.subpattern.remove_captures() return self def can_be_affix(self): return self.subpattern.can_be_affix() def contains_group(self): return self.subpattern.contains_group() def get_firstset(self, 
reverse): return self.subpattern.get_firstset(reverse) def has_simple_start(self): return self.subpattern.has_simple_start() def _compile(self, reverse, fuzzy): return ([(OP.ATOMIC, )] + self.subpattern.compile(reverse, fuzzy) + [(OP.END, )]) def _dump(self, indent, reverse): print "%sATOMIC" % (INDENT * indent) self.subpattern.dump(indent + 1, reverse) def is_empty(self): return self.subpattern.is_empty() def __eq__(self, other): return (type(self) is type(other) and self.subpattern == other.subpattern) def max_width(self): return self.subpattern.max_width() def get_required_string(self, reverse): return self.subpattern.get_required_string(reverse) class Boundary(ZeroWidthBase): _opcode = OP.BOUNDARY _op_name = "BOUNDARY" class Branch(RegexBase): def __init__(self, branches): RegexBase.__init__(self) self.branches = branches def fix_groups(self, pattern, reverse, fuzzy): for b in self.branches: b.fix_groups(pattern, reverse, fuzzy) def optimise(self, info): # Flatten branches within branches. branches = Branch._flatten_branches(info, self.branches) # Move any common prefix or suffix out of the branches. prefix, branches = Branch._split_common_prefix(info, branches) # Try to reduce adjacent single-character branches to sets. 
branches = Branch._reduce_to_set(info, branches) if len(branches) > 1: sequence = [Branch(branches)] else: sequence = branches return make_sequence(prefix + sequence) def pack_characters(self, info): self.branches = [b.pack_characters(info) for b in self.branches] return self def remove_captures(self): self.branches = [b.remove_captures() for b in self.branches] return self def is_atomic(self): return all(b.is_atomic() for b in self.branches) def can_be_affix(self): return all(b.can_be_affix() for b in self.branches) def contains_group(self): return any(b.contains_group() for b in self.branches) def get_firstset(self, reverse): fs = set() for b in self.branches: fs |= b.get_firstset(reverse) return fs or set([None]) def _compile(self, reverse, fuzzy): code = [(OP.BRANCH, )] for b in self.branches: code.extend(b.compile(reverse, fuzzy)) code.append((OP.NEXT, )) code[-1] = (OP.END, ) return code def _dump(self, indent, reverse): print "%sBRANCH" % (INDENT * indent) self.branches[0].dump(indent + 1, reverse) for b in self.branches[1 : ]: print "%sOR" % (INDENT * indent) b.dump(indent + 1, reverse) @staticmethod def _flatten_branches(info, branches): # Flatten the branches so that there aren't branches of branches. new_branches = [] for b in branches: b = b.optimise(info) if isinstance(b, Branch): new_branches.extend(b.branches) else: new_branches.append(b) return new_branches @staticmethod def _split_common_prefix(info, branches): # Common leading items can be moved out of the branches. # Get the items in the branches. alternatives = [] for b in branches: if isinstance(b, Sequence): alternatives.append(b.items) else: alternatives.append([b]) # What is the maximum possible length of the prefix? max_count = min(len(a) for a in alternatives) # What is the longest common prefix? 
prefix = alternatives[0] pos = 0 end_pos = max_count while pos < end_pos and prefix[pos].can_be_affix() and all(a[pos] == prefix[pos] for a in alternatives): pos += 1 count = pos if info.flags & UNICODE: # We need to check that we're not splitting a sequence of # characters which could form part of full case-folding. count = pos while count > 0 and not all(Branch._can_split(a, count) for a in alternatives): count -= 1 # No common prefix is possible. if count == 0: return [], branches # Rebuild the branches. new_branches = [] for a in alternatives: new_branches.append(make_sequence(a[count : ])) return prefix[ : count], new_branches @staticmethod def _split_common_suffix(info, branches): # Common trailing items can be moved out of the branches. # Get the items in the branches. alternatives = [] for b in branches: if isinstance(b, Sequence): alternatives.append(b.items) else: alternatives.append([b]) # What is the maximum possible length of the suffix? max_count = min(len(a) for a in alternatives) # What is the longest common suffix? suffix = alternatives[0] pos = -1 end_pos = -1 - max_count while pos > end_pos and suffix[pos].can_be_affix() and all(a[pos] == suffix[pos] for a in alternatives): pos -= 1 count = -1 - pos if info.flags & UNICODE: # We need to check that we're not splitting a sequence of # characters which could form part of full case-folding. while count > 0 and not all(Branch._can_split_rev(a, count) for a in alternatives): count -= 1 # No common suffix is possible. if count == 0: return [], branches # Rebuild the branches. new_branches = [] for a in alternatives: new_branches.append(make_sequence(a[ : -count])) return suffix[-count : ], new_branches @staticmethod def _can_split(items, count): # Check the characters either side of the proposed split. if not Branch._is_full_case(items, count - 1): return True if not Branch._is_full_case(items, count): return True # Check whether a 1-1 split would be OK. 
if Branch._is_folded(items[count - 1 : count + 1]): return False # Check whether a 1-2 split would be OK. if (Branch._is_full_case(items, count + 2) and Branch._is_folded(items[count - 1 : count + 2])): return False # Check whether a 2-1 split would be OK. if (Branch._is_full_case(items, count - 2) and Branch._is_folded(items[count - 2 : count + 1])): return False return True @staticmethod def _can_split_rev(items, count): end = len(items) # Check the characters either side of the proposed split. if not Branch._is_full_case(items, end - count): return True if not Branch._is_full_case(items, end - count - 1): return True # Check whether a 1-1 split would be OK. if Branch._is_folded(items[end - count - 1 : end - count + 1]): return False # Check whether a 1-2 split would be OK. if (Branch._is_full_case(items, end - count + 2) and Branch._is_folded(items[end - count - 1 : end - count + 2])): return False # Check whether a 2-1 split would be OK. if (Branch._is_full_case(items, end - count - 2) and Branch._is_folded(items[end - count - 2 : end - count + 1])): return False return True @staticmethod def _merge_common_prefixes(info, branches): # Branches with the same case-sensitive character prefix can be grouped # together if they are separated only by other branches with a # character prefix. prefixed = defaultdict(list) order = {} new_branches = [] for b in branches: if Branch._is_simple_character(b): # Branch starts with a simple character. prefixed[b.value].append([b]) order.setdefault(b.value, len(order)) elif (isinstance(b, Sequence) and b.items and Branch._is_simple_character(b.items[0])): # Branch starts with a simple character. 
prefixed[b.items[0].value].append(b.items) order.setdefault(b.items[0].value, len(order)) else: Branch._flush_char_prefix(info, prefixed, order, new_branches) new_branches.append(b) Branch._flush_char_prefix(info, prefixed, order, new_branches) return new_branches @staticmethod def _is_simple_character(c): return isinstance(c, Character) and c.positive and not c.case_flags @staticmethod def _reduce_to_set(info, branches): # Can the branches be reduced to a set? new_branches = [] items = set() case_flags = NOCASE for b in branches: if isinstance(b, (Character, Property, SetBase)): # Branch starts with a single character. if b.case_flags != case_flags: # Different case sensitivity, so flush. Branch._flush_set_members(info, items, case_flags, new_branches) case_flags = b.case_flags items.add(b.with_flags(case_flags=NOCASE)) else: Branch._flush_set_members(info, items, case_flags, new_branches) new_branches.append(b) Branch._flush_set_members(info, items, case_flags, new_branches) return new_branches @staticmethod def _flush_char_prefix(info, prefixed, order, new_branches): # Flush the prefixed branches. if not prefixed: return for value, branches in sorted(prefixed.items(), key=lambda pair: order[pair[0]]): if len(branches) == 1: new_branches.append(make_sequence(branches[0])) else: subbranches = [] optional = False for b in branches: if len(b) > 1: subbranches.append(make_sequence(b[1 : ])) elif not optional: subbranches.append(Sequence()) optional = True sequence = Sequence([Character(value), Branch(subbranches)]) new_branches.append(sequence.optimise(info)) prefixed.clear() order.clear() @staticmethod def _flush_set_members(info, items, case_flags, new_branches): # Flush the set members. 
if not items: return if len(items) == 1: item = list(items)[0] else: item = SetUnion(info, list(items)).optimise(info) new_branches.append(item.with_flags(case_flags=case_flags)) items.clear() @staticmethod def _is_full_case(items, i): if not 0 <= i < len(items): return False item = items[i] return (isinstance(item, Character) and item.positive and (item.case_flags & FULLIGNORECASE) == FULLIGNORECASE) @staticmethod def _is_folded(items): if len(items) < 2: return False for i in items: if (not isinstance(i, Character) or not i.positive or not i.case_flags): return False folded = u"".join(unichr(i.value) for i in items) folded = _regex.fold_case(FULL_CASE_FOLDING, folded) # Get the characters which expand to multiple codepoints on folding. expanding_chars = _regex.get_expand_on_folding() for c in expanding_chars: if folded == _regex.fold_case(FULL_CASE_FOLDING, c): return True return False def is_empty(self): return all(b.is_empty() for b in self.branches) def __eq__(self, other): return type(self) is type(other) and self.branches == other.branches def max_width(self): return max(b.max_width() for b in self.branches) class CallGroup(RegexBase): def __init__(self, info, group, position): RegexBase.__init__(self) self.info = info self.group = group self.position = position self._key = self.__class__, self.group def fix_groups(self, pattern, reverse, fuzzy): try: self.group = int(self.group) except ValueError: try: self.group = self.info.group_index[self.group] except KeyError: raise error("invalid group reference", pattern, self.position) if not 0 <= self.group <= self.info.group_count: raise error("unknown group", pattern, self.position) if self.group > 0 and self.info.open_group_count[self.group] > 1: raise error("ambiguous group reference", pattern, self.position) self.info.group_calls.append((self, reverse, fuzzy)) self._key = self.__class__, self.group def remove_captures(self): raise error("group reference not allowed", pattern, self.position) def _compile(self, 
reverse, fuzzy): return [(OP.GROUP_CALL, self.call_ref)] def _dump(self, indent, reverse): print "%sGROUP_CALL %s" % (INDENT * indent, self.group) def __eq__(self, other): return type(self) is type(other) and self.group == other.group def max_width(self): return UNLIMITED class Character(RegexBase): _opcode = {(NOCASE, False): OP.CHARACTER, (IGNORECASE, False): OP.CHARACTER_IGN, (FULLCASE, False): OP.CHARACTER, (FULLIGNORECASE, False): OP.CHARACTER_IGN, (NOCASE, True): OP.CHARACTER_REV, (IGNORECASE, True): OP.CHARACTER_IGN_REV, (FULLCASE, True): OP.CHARACTER_REV, (FULLIGNORECASE, True): OP.CHARACTER_IGN_REV} def __init__(self, value, positive=True, case_flags=NOCASE, zerowidth=False): RegexBase.__init__(self) self.value = value self.positive = bool(positive) self.case_flags = CASE_FLAGS_COMBINATIONS[case_flags] self.zerowidth = bool(zerowidth) if (self.positive and (self.case_flags & FULLIGNORECASE) == FULLIGNORECASE): self.folded = _regex.fold_case(FULL_CASE_FOLDING, unichr(self.value)) else: self.folded = unichr(self.value) self._key = (self.__class__, self.value, self.positive, self.case_flags, self.zerowidth) def rebuild(self, positive, case_flags, zerowidth): return Character(self.value, positive, case_flags, zerowidth) def optimise(self, info, in_set=False): return self def get_firstset(self, reverse): return set([self]) def has_simple_start(self): return True def _compile(self, reverse, fuzzy): flags = 0 if self.positive: flags |= POSITIVE_OP if self.zerowidth: flags |= ZEROWIDTH_OP if fuzzy: flags |= FUZZY_OP code = PrecompiledCode([self._opcode[self.case_flags, reverse], flags, self.value]) if len(self.folded) > 1: # The character expands on full case-folding. 
code = Branch([code, String([ord(c) for c in self.folded], case_flags=self.case_flags)]) return code.compile(reverse, fuzzy) def _dump(self, indent, reverse): display = repr(unichr(self.value)).lstrip("bu") print "%sCHARACTER %s %s%s" % (INDENT * indent, POS_TEXT[self.positive], display, CASE_TEXT[self.case_flags]) def matches(self, ch): return (ch == self.value) == self.positive def max_width(self): return len(self.folded) def get_required_string(self, reverse): if not self.positive: return 1, None self.folded_characters = tuple(ord(c) for c in self.folded) return 0, self class Conditional(RegexBase): def __init__(self, info, group, yes_item, no_item, position): RegexBase.__init__(self) self.info = info self.group = group self.yes_item = yes_item self.no_item = no_item self.position = position def fix_groups(self, pattern, reverse, fuzzy): try: self.group = int(self.group) except ValueError: try: self.group = self.info.group_index[self.group] except KeyError: if self.group == 'DEFINE': # 'DEFINE' is a special name unless there's a group with # that name. 
                    self.group = 0
                else:
                    raise error("unknown group", pattern, self.position)

        if not 0 <= self.group <= self.info.group_count:
            raise error("invalid group reference", pattern, self.position)

        self.yes_item.fix_groups(pattern, reverse, fuzzy)
        self.no_item.fix_groups(pattern, reverse, fuzzy)

    def optimise(self, info):
        yes_item = self.yes_item.optimise(info)
        no_item = self.no_item.optimise(info)

        return Conditional(info, self.group, yes_item, no_item, self.position)

    def pack_characters(self, info):
        self.yes_item = self.yes_item.pack_characters(info)
        self.no_item = self.no_item.pack_characters(info)
        return self

    def remove_captures(self):
        self.yes_item = self.yes_item.remove_captures()
        self.no_item = self.no_item.remove_captures()
        # Callers assign the result back to the parent node, so return self
        # like the other remove_captures implementations.
        return self

    def is_atomic(self):
        return self.yes_item.is_atomic() and self.no_item.is_atomic()

    def can_be_affix(self):
        return self.yes_item.can_be_affix() and self.no_item.can_be_affix()

    def contains_group(self):
        return self.yes_item.contains_group() or self.no_item.contains_group()

    def get_firstset(self, reverse):
        return (self.yes_item.get_firstset(reverse) |
          self.no_item.get_firstset(reverse))

    def _compile(self, reverse, fuzzy):
        code = [(OP.GROUP_EXISTS, self.group)]
        code.extend(self.yes_item.compile(reverse, fuzzy))
        add_code = self.no_item.compile(reverse, fuzzy)
        if add_code:
            code.append((OP.NEXT, ))
            code.extend(add_code)

        code.append((OP.END, ))

        return code

    def _dump(self, indent, reverse):
        print "%sGROUP_EXISTS %s" % (INDENT * indent, self.group)
        self.yes_item.dump(indent + 1, reverse)
        if not self.no_item.is_empty():
            print "%sOR" % (INDENT * indent)
            self.no_item.dump(indent + 1, reverse)

    def is_empty(self):
        return self.yes_item.is_empty() and self.no_item.is_empty()

    def __eq__(self, other):
        return type(self) is type(other) and (self.group, self.yes_item,
          self.no_item) == (other.group, other.yes_item, other.no_item)

    def max_width(self):
        return max(self.yes_item.max_width(), self.no_item.max_width())

class DefaultBoundary(ZeroWidthBase):
    _opcode =
OP.DEFAULT_BOUNDARY _op_name = "DEFAULT_BOUNDARY" class DefaultEndOfWord(ZeroWidthBase): _opcode = OP.DEFAULT_END_OF_WORD _op_name = "DEFAULT_END_OF_WORD" class DefaultStartOfWord(ZeroWidthBase): _opcode = OP.DEFAULT_START_OF_WORD _op_name = "DEFAULT_START_OF_WORD" class EndOfLine(ZeroWidthBase): _opcode = OP.END_OF_LINE _op_name = "END_OF_LINE" class EndOfLineU(EndOfLine): _opcode = OP.END_OF_LINE_U _op_name = "END_OF_LINE_U" class EndOfString(ZeroWidthBase): _opcode = OP.END_OF_STRING _op_name = "END_OF_STRING" class EndOfStringLine(ZeroWidthBase): _opcode = OP.END_OF_STRING_LINE _op_name = "END_OF_STRING_LINE" class EndOfStringLineU(EndOfStringLine): _opcode = OP.END_OF_STRING_LINE_U _op_name = "END_OF_STRING_LINE_U" class EndOfWord(ZeroWidthBase): _opcode = OP.END_OF_WORD _op_name = "END_OF_WORD" class Failure(ZeroWidthBase): _op_name = "FAILURE" def _compile(self, reverse, fuzzy): return [(OP.FAILURE, )] class Fuzzy(RegexBase): def __init__(self, subpattern, constraints=None): RegexBase.__init__(self) if constraints is None: constraints = {} self.subpattern = subpattern self.constraints = constraints # If an error type is mentioned in the cost equation, then its maximum # defaults to unlimited. if "cost" in constraints: for e in "dis": if e in constraints["cost"]: constraints.setdefault(e, (0, None)) # If any error type is mentioned, then all the error maxima default to # 0, otherwise they default to unlimited. if set(constraints) & set("dis"): for e in "dis": constraints.setdefault(e, (0, 0)) else: for e in "dis": constraints.setdefault(e, (0, None)) # The maximum of the generic error type defaults to unlimited. constraints.setdefault("e", (0, None)) # The cost equation defaults to equal costs. Also, the cost of any # error type not mentioned in the cost equation defaults to 0. 
if "cost" in constraints: for e in "dis": constraints["cost"].setdefault(e, 0) else: constraints["cost"] = {"d": 1, "i": 1, "s": 1, "max": constraints["e"][1]} def fix_groups(self, pattern, reverse, fuzzy): self.subpattern.fix_groups(pattern, reverse, True) def pack_characters(self, info): self.subpattern = self.subpattern.pack_characters(info) return self def remove_captures(self): self.subpattern = self.subpattern.remove_captures() return self def is_atomic(self): return self.subpattern.is_atomic() def contains_group(self): return self.subpattern.contains_group() def _compile(self, reverse, fuzzy): # The individual limits. arguments = [] for e in "dise": v = self.constraints[e] arguments.append(v[0]) arguments.append(UNLIMITED if v[1] is None else v[1]) # The coeffs of the cost equation. for e in "dis": arguments.append(self.constraints["cost"][e]) # The maximum of the cost equation. v = self.constraints["cost"]["max"] arguments.append(UNLIMITED if v is None else v) flags = 0 if reverse: flags |= REVERSE_OP return ([(OP.FUZZY, flags) + tuple(arguments)] + self.subpattern.compile(reverse, True) + [(OP.END,)]) def _dump(self, indent, reverse): constraints = self._constraints_to_string() if constraints: constraints = " " + constraints print "%sFUZZY%s" % (INDENT * indent, constraints) self.subpattern.dump(indent + 1, reverse) def is_empty(self): return self.subpattern.is_empty() def __eq__(self, other): return (type(self) is type(other) and self.subpattern == other.subpattern) def max_width(self): return UNLIMITED def _constraints_to_string(self): constraints = [] for name in "ids": min, max = self.constraints[name] if max == 0: continue con = "" if min > 0: con = "%s<=" % min con += name if max is not None: con += "<=%s" % max constraints.append(con) cost = [] for name in "ids": coeff = self.constraints["cost"][name] if coeff > 0: cost.append("%s%s" % (coeff, name)) limit = self.constraints["cost"]["max"] if limit is not None and limit > 0: cost = "%s<=%s" % 
("+".join(cost), limit) constraints.append(cost) return ",".join(constraints) class Grapheme(RegexBase): def _compile(self, reverse, fuzzy): # Match at least 1 character until a grapheme boundary is reached. Note # that this is the same whether matching forwards or backwards. grapheme_matcher = Atomic(Sequence([LazyRepeat(AnyAll(), 1, None), GraphemeBoundary()])) return grapheme_matcher.compile(reverse, fuzzy) def _dump(self, indent, reverse): print "%sGRAPHEME" % (INDENT * indent) def max_width(self): return UNLIMITED class GraphemeBoundary: def compile(self, reverse, fuzzy): return [(OP.GRAPHEME_BOUNDARY, 1)] class GreedyRepeat(RegexBase): _opcode = OP.GREEDY_REPEAT _op_name = "GREEDY_REPEAT" def __init__(self, subpattern, min_count, max_count): RegexBase.__init__(self) self.subpattern = subpattern self.min_count = min_count self.max_count = max_count def fix_groups(self, pattern, reverse, fuzzy): self.subpattern.fix_groups(pattern, reverse, fuzzy) def optimise(self, info): subpattern = self.subpattern.optimise(info) return type(self)(subpattern, self.min_count, self.max_count) def pack_characters(self, info): self.subpattern = self.subpattern.pack_characters(info) return self def remove_captures(self): self.subpattern = self.subpattern.remove_captures() return self def is_atomic(self): return self.min_count == self.max_count and self.subpattern.is_atomic() def contains_group(self): return self.subpattern.contains_group() def get_firstset(self, reverse): fs = self.subpattern.get_firstset(reverse) if self.min_count == 0: fs.add(None) return fs def _compile(self, reverse, fuzzy): repeat = [self._opcode, self.min_count] if self.max_count is None: repeat.append(UNLIMITED) else: repeat.append(self.max_count) subpattern = self.subpattern.compile(reverse, fuzzy) if not subpattern: return [] return ([tuple(repeat)] + subpattern + [(OP.END, )]) def _dump(self, indent, reverse): if self.max_count is None: limit = "INF" else: limit = self.max_count print "%s%s %s %s" % 
(INDENT * indent, self._op_name, self.min_count, limit) self.subpattern.dump(indent + 1, reverse) def is_empty(self): return self.subpattern.is_empty() def __eq__(self, other): return type(self) is type(other) and (self.subpattern, self.min_count, self.max_count) == (other.subpattern, other.min_count, other.max_count) def max_width(self): if self.max_count is None: return UNLIMITED return self.subpattern.max_width() * self.max_count def get_required_string(self, reverse): max_count = UNLIMITED if self.max_count is None else self.max_count if self.min_count == 0: w = self.subpattern.max_width() * max_count return min(w, UNLIMITED), None ofs, req = self.subpattern.get_required_string(reverse) if req: return ofs, req w = self.subpattern.max_width() * max_count return min(w, UNLIMITED), None class Group(RegexBase): def __init__(self, info, group, subpattern): RegexBase.__init__(self) self.info = info self.group = group self.subpattern = subpattern self.call_ref = None def fix_groups(self, pattern, reverse, fuzzy): self.info.defined_groups[self.group] = (self, reverse, fuzzy) self.subpattern.fix_groups(pattern, reverse, fuzzy) def optimise(self, info): subpattern = self.subpattern.optimise(info) return Group(self.info, self.group, subpattern) def pack_characters(self, info): self.subpattern = self.subpattern.pack_characters(info) return self def remove_captures(self): return self.subpattern.remove_captures() def is_atomic(self): return self.subpattern.is_atomic() def can_be_affix(self): return False def contains_group(self): return True def get_firstset(self, reverse): return self.subpattern.get_firstset(reverse) def has_simple_start(self): return self.subpattern.has_simple_start() def _compile(self, reverse, fuzzy): code = [] key = self.group, reverse, fuzzy ref = self.info.call_refs.get(key) if ref is not None: code += [(OP.CALL_REF, ref)] public_group = private_group = self.group if private_group < 0: public_group = self.info.private_groups[private_group] 
            private_group = self.info.group_count - private_group

        code += ([(OP.GROUP, private_group, public_group)] +
          self.subpattern.compile(reverse, fuzzy) + [(OP.END, )])

        if ref is not None:
            code += [(OP.END, )]

        return code

    def _dump(self, indent, reverse):
        group = self.group
        if group < 0:
            # Look up the public group via the pattern info, as _compile does.
            group = self.info.private_groups[group]
        print "%sGROUP %s" % (INDENT * indent, group)
        self.subpattern.dump(indent + 1, reverse)

    def __eq__(self, other):
        return (type(self) is type(other) and (self.group, self.subpattern) ==
          (other.group, other.subpattern))

    def max_width(self):
        return self.subpattern.max_width()

    def get_required_string(self, reverse):
        return self.subpattern.get_required_string(reverse)

class Keep(ZeroWidthBase):
    _opcode = OP.KEEP
    _op_name = "KEEP"

class LazyRepeat(GreedyRepeat):
    _opcode = OP.LAZY_REPEAT
    _op_name = "LAZY_REPEAT"

class LookAround(RegexBase):
    _dir_text = {False: "AHEAD", True: "BEHIND"}

    def __init__(self, behind, positive, subpattern):
        RegexBase.__init__(self)
        self.behind = bool(behind)
        self.positive = bool(positive)
        self.subpattern = subpattern

    def fix_groups(self, pattern, reverse, fuzzy):
        self.subpattern.fix_groups(pattern, self.behind, fuzzy)

    def optimise(self, info):
        subpattern = self.subpattern.optimise(info)
        if self.positive and subpattern.is_empty():
            return subpattern

        return LookAround(self.behind, self.positive, subpattern)

    def pack_characters(self, info):
        self.subpattern = self.subpattern.pack_characters(info)
        return self

    def remove_captures(self):
        return self.subpattern.remove_captures()

    def is_atomic(self):
        return self.subpattern.is_atomic()

    def can_be_affix(self):
        return self.subpattern.can_be_affix()

    def contains_group(self):
        return self.subpattern.contains_group()

    def _compile(self, reverse, fuzzy):
        return ([(OP.LOOKAROUND, int(self.positive), int(not self.behind))] +
          self.subpattern.compile(self.behind) + [(OP.END, )])

    def _dump(self, indent, reverse):
        print "%sLOOK%s %s" % (INDENT * indent, self._dir_text[self.behind],
          POS_TEXT[self.positive])
        self.subpattern.dump(indent + 1, self.behind)

    def is_empty(self):
        return self.positive and self.subpattern.is_empty()

    def __eq__(self, other):
        return type(self) is type(other) and (self.behind, self.positive,
          self.subpattern) == (other.behind, other.positive, other.subpattern)

    def max_width(self):
        return 0

class LookAroundConditional(RegexBase):
    _dir_text = {False: "AHEAD", True: "BEHIND"}

    def __init__(self, behind, positive, subpattern, yes_item, no_item):
        RegexBase.__init__(self)
        self.behind = bool(behind)
        self.positive = bool(positive)
        self.subpattern = subpattern
        self.yes_item = yes_item
        self.no_item = no_item

    def fix_groups(self, pattern, reverse, fuzzy):
        self.subpattern.fix_groups(pattern, reverse, fuzzy)
        self.yes_item.fix_groups(pattern, reverse, fuzzy)
        self.no_item.fix_groups(pattern, reverse, fuzzy)

    def optimise(self, info):
        subpattern = self.subpattern.optimise(info)
        yes_item = self.yes_item.optimise(info)
        no_item = self.no_item.optimise(info)

        return LookAroundConditional(self.behind, self.positive, subpattern,
          yes_item, no_item)

    def pack_characters(self, info):
        self.subpattern = self.subpattern.pack_characters(info)
        self.yes_item = self.yes_item.pack_characters(info)
        self.no_item = self.no_item.pack_characters(info)
        return self

    def remove_captures(self):
        self.subpattern = self.subpattern.remove_captures()
        self.yes_item = self.yes_item.remove_captures()
        self.no_item = self.no_item.remove_captures()
        # Return the node, as the other remove_captures methods do, so that
        # callers which store the result don't end up with None.
        return self

    def is_atomic(self):
        return (self.subpattern.is_atomic() and self.yes_item.is_atomic() and
          self.no_item.is_atomic())

    def can_be_affix(self):
        return (self.subpattern.can_be_affix() and self.yes_item.can_be_affix()
          and self.no_item.can_be_affix())

    def contains_group(self):
        return (self.subpattern.contains_group() or
          self.yes_item.contains_group() or self.no_item.contains_group())

    def get_firstset(self, reverse):
        return (self.subpattern.get_firstset(reverse) |
          self.no_item.get_firstset(reverse))

    def _compile(self, reverse, fuzzy):
        code = [(OP.CONDITIONAL,
          int(self.positive), int(not self.behind))]
        code.extend(self.subpattern.compile(self.behind, fuzzy))
        code.append((OP.NEXT, ))
        code.extend(self.yes_item.compile(reverse, fuzzy))
        add_code = self.no_item.compile(reverse, fuzzy)
        if add_code:
            code.append((OP.NEXT, ))
            code.extend(add_code)

        code.append((OP.END, ))

        return code

    def _dump(self, indent, reverse):
        print("%sCONDITIONAL %s %s" % (INDENT * indent,
          self._dir_text[self.behind], POS_TEXT[self.positive]))
        self.subpattern.dump(indent + 1, self.behind)
        print("%sEITHER" % (INDENT * indent))
        self.yes_item.dump(indent + 1, reverse)
        if not self.no_item.is_empty():
            # Use the % operator here; str.format would leave "%s" unexpanded.
            print("%sOR" % (INDENT * indent))
            self.no_item.dump(indent + 1, reverse)

    def is_empty(self):
        return (self.subpattern.is_empty() and self.yes_item.is_empty() or
          self.no_item.is_empty())

    def __eq__(self, other):
        return type(self) is type(other) and (self.subpattern, self.yes_item,
          self.no_item) == (other.subpattern, other.yes_item, other.no_item)

    def max_width(self):
        return max(self.yes_item.max_width(), self.no_item.max_width())

    def get_required_string(self, reverse):
        return self.max_width(), None

class PrecompiledCode(RegexBase):
    def __init__(self, code):
        self.code = code

    def _compile(self, reverse, fuzzy):
        return [tuple(self.code)]

class Property(RegexBase):
    _opcode = {
      (NOCASE, False): OP.PROPERTY,
      (IGNORECASE, False): OP.PROPERTY_IGN,
      (FULLCASE, False): OP.PROPERTY,
      (FULLIGNORECASE, False): OP.PROPERTY_IGN,
      (NOCASE, True): OP.PROPERTY_REV,
      (IGNORECASE, True): OP.PROPERTY_IGN_REV,
      (FULLCASE, True): OP.PROPERTY_REV,
      (FULLIGNORECASE, True): OP.PROPERTY_IGN_REV}

    def __init__(self, value, positive=True, case_flags=NOCASE,
      zerowidth=False):
        RegexBase.__init__(self)
        self.value = value
        self.positive = bool(positive)
        self.case_flags = CASE_FLAGS_COMBINATIONS[case_flags]
        self.zerowidth = bool(zerowidth)

        self._key = (self.__class__, self.value, self.positive,
          self.case_flags, self.zerowidth)

    def rebuild(self, positive, case_flags, zerowidth):
        return
Property(self.value, positive, case_flags, zerowidth) def optimise(self, info, in_set=False): return self def get_firstset(self, reverse): return set([self]) def has_simple_start(self): return True def _compile(self, reverse, fuzzy): flags = 0 if self.positive: flags |= POSITIVE_OP if self.zerowidth: flags |= ZEROWIDTH_OP if fuzzy: flags |= FUZZY_OP return [(self._opcode[self.case_flags, reverse], flags, self.value)] def _dump(self, indent, reverse): prop = PROPERTY_NAMES[self.value >> 16] name, value = prop[0], prop[1][self.value & 0xFFFF] print "%sPROPERTY %s %s:%s%s" % (INDENT * indent, POS_TEXT[self.positive], name, value, CASE_TEXT[self.case_flags]) def matches(self, ch): return _regex.has_property_value(self.value, ch) == self.positive def max_width(self): return 1 class Prune(ZeroWidthBase): _op_name = "PRUNE" def _compile(self, reverse, fuzzy): return [(OP.PRUNE, )] class Range(RegexBase): _opcode = {(NOCASE, False): OP.RANGE, (IGNORECASE, False): OP.RANGE_IGN, (FULLCASE, False): OP.RANGE, (FULLIGNORECASE, False): OP.RANGE_IGN, (NOCASE, True): OP.RANGE_REV, (IGNORECASE, True): OP.RANGE_IGN_REV, (FULLCASE, True): OP.RANGE_REV, (FULLIGNORECASE, True): OP.RANGE_IGN_REV} _op_name = "RANGE" def __init__(self, lower, upper, positive=True, case_flags=NOCASE, zerowidth=False): RegexBase.__init__(self) self.lower = lower self.upper = upper self.positive = bool(positive) self.case_flags = CASE_FLAGS_COMBINATIONS[case_flags] self.zerowidth = bool(zerowidth) self._key = (self.__class__, self.lower, self.upper, self.positive, self.case_flags, self.zerowidth) def rebuild(self, positive, case_flags, zerowidth): return Range(self.lower, self.upper, positive, case_flags, zerowidth) def optimise(self, info, in_set=False): # Is the range case-sensitive? if not self.positive or not (self.case_flags & IGNORECASE) or in_set: return self # Is full case-folding possible? 
if (not (info.flags & UNICODE) or (self.case_flags & FULLIGNORECASE) != FULLIGNORECASE): return self # Get the characters which expand to multiple codepoints on folding. expanding_chars = _regex.get_expand_on_folding() # Get the folded characters in the range. items = [] for ch in expanding_chars: if self.lower <= ord(ch) <= self.upper: folded = _regex.fold_case(FULL_CASE_FOLDING, ch) items.append(String([ord(c) for c in folded], case_flags=self.case_flags)) if not items: # We can fall back to simple case-folding. return self if len(items) < self.upper - self.lower + 1: # Not all the characters are covered by the full case-folding. items.insert(0, self) return Branch(items) def _compile(self, reverse, fuzzy): flags = 0 if self.positive: flags |= POSITIVE_OP if self.zerowidth: flags |= ZEROWIDTH_OP if fuzzy: flags |= FUZZY_OP return [(self._opcode[self.case_flags, reverse], flags, self.lower, self.upper)] def _dump(self, indent, reverse): display_lower = repr(unichr(self.lower)).lstrip("bu") display_upper = repr(unichr(self.upper)).lstrip("bu") print "%sRANGE %s %s %s%s" % (INDENT * indent, POS_TEXT[self.positive], display_lower, display_upper, CASE_TEXT[self.case_flags]) def matches(self, ch): return (self.lower <= ch <= self.upper) == self.positive def max_width(self): return 1 class RefGroup(RegexBase): _opcode = {(NOCASE, False): OP.REF_GROUP, (IGNORECASE, False): OP.REF_GROUP_IGN, (FULLCASE, False): OP.REF_GROUP, (FULLIGNORECASE, False): OP.REF_GROUP_FLD, (NOCASE, True): OP.REF_GROUP_REV, (IGNORECASE, True): OP.REF_GROUP_IGN_REV, (FULLCASE, True): OP.REF_GROUP_REV, (FULLIGNORECASE, True): OP.REF_GROUP_FLD_REV} def __init__(self, info, group, position, case_flags=NOCASE): RegexBase.__init__(self) self.info = info self.group = group self.position = position self.case_flags = CASE_FLAGS_COMBINATIONS[case_flags] self._key = self.__class__, self.group, self.case_flags def fix_groups(self, pattern, reverse, fuzzy): try: self.group = int(self.group) except ValueError: 
            try:
                self.group = self.info.group_index[self.group]
            except KeyError:
                raise error("unknown group", pattern, self.position)

        if not 1 <= self.group <= self.info.group_count:
            raise error("invalid group reference", pattern, self.position)

        self._key = self.__class__, self.group, self.case_flags

    def remove_captures(self):
        raise error("group reference not allowed", pattern, self.position)

    def _compile(self, reverse, fuzzy):
        flags = 0
        if fuzzy:
            flags |= FUZZY_OP
        return [(self._opcode[self.case_flags, reverse], flags, self.group)]

    def _dump(self, indent, reverse):
        print "%sREF_GROUP %s%s" % (INDENT * indent, self.group,
          CASE_TEXT[self.case_flags])

    def max_width(self):
        return UNLIMITED

class SearchAnchor(ZeroWidthBase):
    _opcode = OP.SEARCH_ANCHOR
    _op_name = "SEARCH_ANCHOR"

class Sequence(RegexBase):
    def __init__(self, items=None):
        RegexBase.__init__(self)
        if items is None:
            items = []

        self.items = items

    def fix_groups(self, pattern, reverse, fuzzy):
        for s in self.items:
            s.fix_groups(pattern, reverse, fuzzy)

    def optimise(self, info):
        # Flatten the sequences.
        items = []
        for s in self.items:
            s = s.optimise(info)
            if isinstance(s, Sequence):
                items.extend(s.items)
            else:
                items.append(s)

        return make_sequence(items)

    def pack_characters(self, info):
        "Packs sequences of characters into strings."
        items = []
        characters = []
        case_flags = NOCASE
        for s in self.items:
            if type(s) is Character and s.positive:
                if s.case_flags != case_flags:
                    # Different case sensitivity, so flush, unless neither the
                    # previous nor the new character is cased.
                    if s.case_flags or is_cased(info, s.value):
                        Sequence._flush_characters(info, characters,
                          case_flags, items)

                        case_flags = s.case_flags

                characters.append(s.value)
            elif type(s) is String or type(s) is Literal:
                if s.case_flags != case_flags:
                    # Different case sensitivity, so flush, unless neither the
                    # previous nor the new string is cased.
if s.case_flags or any(is_cased(info, c) for c in characters): Sequence._flush_characters(info, characters, case_flags, items) case_flags = s.case_flags characters.extend(s.characters) else: Sequence._flush_characters(info, characters, case_flags, items) items.append(s.pack_characters(info)) Sequence._flush_characters(info, characters, case_flags, items) return make_sequence(items) def remove_captures(self): self.items = [s.remove_captures() for s in self.items] return self def is_atomic(self): return all(s.is_atomic() for s in self.items) def can_be_affix(self): return False def contains_group(self): return any(s.contains_group() for s in self.items) def get_firstset(self, reverse): fs = set() items = self.items if reverse: items.reverse() for s in items: fs |= s.get_firstset(reverse) if None not in fs: return fs fs.discard(None) return fs | set([None]) def has_simple_start(self): return bool(self.items) and self.items[0].has_simple_start() def _compile(self, reverse, fuzzy): seq = self.items if reverse: seq = seq[::-1] code = [] for s in seq: code.extend(s.compile(reverse, fuzzy)) return code def _dump(self, indent, reverse): for s in self.items: s.dump(indent, reverse) @staticmethod def _flush_characters(info, characters, case_flags, items): if not characters: return # Disregard case_flags if all of the characters are case-less. 
if case_flags & IGNORECASE: if not any(is_cased(info, c) for c in characters): case_flags = NOCASE if len(characters) == 1: items.append(Character(characters[0], case_flags=case_flags)) else: items.append(String(characters, case_flags=case_flags)) characters[:] = [] def is_empty(self): return all(i.is_empty() for i in self.items) def __eq__(self, other): return type(self) is type(other) and self.items == other.items def max_width(self): return sum(s.max_width() for s in self.items) def get_required_string(self, reverse): seq = self.items if reverse: seq = seq[::-1] offset = 0 for s in seq: ofs, req = s.get_required_string(reverse) offset += ofs if req: return offset, req return offset, None class SetBase(RegexBase): def __init__(self, info, items, positive=True, case_flags=NOCASE, zerowidth=False): RegexBase.__init__(self) self.info = info self.items = tuple(items) self.positive = bool(positive) self.case_flags = CASE_FLAGS_COMBINATIONS[case_flags] self.zerowidth = bool(zerowidth) self.char_width = 1 self._key = (self.__class__, self.items, self.positive, self.case_flags, self.zerowidth) def rebuild(self, positive, case_flags, zerowidth): return type(self)(self.info, self.items, positive, case_flags, zerowidth).optimise(self.info) def get_firstset(self, reverse): return set([self]) def has_simple_start(self): return True def _compile(self, reverse, fuzzy): flags = 0 if self.positive: flags |= POSITIVE_OP if self.zerowidth: flags |= ZEROWIDTH_OP if fuzzy: flags |= FUZZY_OP code = [(self._opcode[self.case_flags, reverse], flags)] for m in self.items: code.extend(m.compile()) code.append((OP.END, )) return code def _dump(self, indent, reverse): print "%s%s %s%s" % (INDENT * indent, self._op_name, POS_TEXT[self.positive], CASE_TEXT[self.case_flags]) for i in self.items: i.dump(indent + 1, reverse) def _handle_case_folding(self, info, in_set): # Is the set case-sensitive? 
if not self.positive or not (self.case_flags & IGNORECASE) or in_set: return self # Is full case-folding possible? if (not (self.info.flags & UNICODE) or (self.case_flags & FULLIGNORECASE) != FULLIGNORECASE): return self # Get the characters which expand to multiple codepoints on folding. expanding_chars = _regex.get_expand_on_folding() # Get the folded characters in the set. items = [] seen = set() for ch in expanding_chars: if self.matches(ord(ch)): folded = _regex.fold_case(FULL_CASE_FOLDING, ch) if folded not in seen: items.append(String([ord(c) for c in folded], case_flags=self.case_flags)) seen.add(folded) if not items: # We can fall back to simple case-folding. return self return Branch([self] + items) def max_width(self): # Is the set case-sensitive? if not self.positive or not (self.case_flags & IGNORECASE): return 1 # Is full case-folding possible? if (not (self.info.flags & UNICODE) or (self.case_flags & FULLIGNORECASE) != FULLIGNORECASE): return 1 # Get the characters which expand to multiple codepoints on folding. expanding_chars = _regex.get_expand_on_folding() # Get the folded characters in the set. 
seen = set() for ch in expanding_chars: if self.matches(ord(ch)): folded = _regex.fold_case(FULL_CASE_FOLDING, ch) seen.add(folded) if not seen: return 1 return max(len(folded) for folded in seen) class SetDiff(SetBase): _opcode = {(NOCASE, False): OP.SET_DIFF, (IGNORECASE, False): OP.SET_DIFF_IGN, (FULLCASE, False): OP.SET_DIFF, (FULLIGNORECASE, False): OP.SET_DIFF_IGN, (NOCASE, True): OP.SET_DIFF_REV, (IGNORECASE, True): OP.SET_DIFF_IGN_REV, (FULLCASE, True): OP.SET_DIFF_REV, (FULLIGNORECASE, True): OP.SET_DIFF_IGN_REV} _op_name = "SET_DIFF" def optimise(self, info, in_set=False): items = self.items if len(items) > 2: items = [items[0], SetUnion(info, items[1 : ])] if len(items) == 1: return items[0].with_flags(case_flags=self.case_flags, zerowidth=self.zerowidth).optimise(info, in_set) self.items = tuple(m.optimise(info, in_set=True) for m in items) return self._handle_case_folding(info, in_set) def matches(self, ch): m = self.items[0].matches(ch) and not self.items[1].matches(ch) return m == self.positive class SetInter(SetBase): _opcode = {(NOCASE, False): OP.SET_INTER, (IGNORECASE, False): OP.SET_INTER_IGN, (FULLCASE, False): OP.SET_INTER, (FULLIGNORECASE, False): OP.SET_INTER_IGN, (NOCASE, True): OP.SET_INTER_REV, (IGNORECASE, True): OP.SET_INTER_IGN_REV, (FULLCASE, True): OP.SET_INTER_REV, (FULLIGNORECASE, True): OP.SET_INTER_IGN_REV} _op_name = "SET_INTER" def optimise(self, info, in_set=False): items = [] for m in self.items: m = m.optimise(info, in_set=True) if isinstance(m, SetInter) and m.positive: # Intersection in intersection. 
items.extend(m.items) else: items.append(m) if len(items) == 1: return items[0].with_flags(case_flags=self.case_flags, zerowidth=self.zerowidth).optimise(info, in_set) self.items = tuple(items) return self._handle_case_folding(info, in_set) def matches(self, ch): m = all(i.matches(ch) for i in self.items) return m == self.positive class SetSymDiff(SetBase): _opcode = {(NOCASE, False): OP.SET_SYM_DIFF, (IGNORECASE, False): OP.SET_SYM_DIFF_IGN, (FULLCASE, False): OP.SET_SYM_DIFF, (FULLIGNORECASE, False): OP.SET_SYM_DIFF_IGN, (NOCASE, True): OP.SET_SYM_DIFF_REV, (IGNORECASE, True): OP.SET_SYM_DIFF_IGN_REV, (FULLCASE, True): OP.SET_SYM_DIFF_REV, (FULLIGNORECASE, True): OP.SET_SYM_DIFF_IGN_REV} _op_name = "SET_SYM_DIFF" def optimise(self, info, in_set=False): items = [] for m in self.items: m = m.optimise(info, in_set=True) if isinstance(m, SetSymDiff) and m.positive: # Symmetric difference in symmetric difference. items.extend(m.items) else: items.append(m) if len(items) == 1: return items[0].with_flags(case_flags=self.case_flags, zerowidth=self.zerowidth).optimise(info, in_set) self.items = tuple(items) return self._handle_case_folding(info, in_set) def matches(self, ch): m = False for i in self.items: m = m != i.matches(ch) return m == self.positive class SetUnion(SetBase): _opcode = {(NOCASE, False): OP.SET_UNION, (IGNORECASE, False): OP.SET_UNION_IGN, (FULLCASE, False): OP.SET_UNION, (FULLIGNORECASE, False): OP.SET_UNION_IGN, (NOCASE, True): OP.SET_UNION_REV, (IGNORECASE, True): OP.SET_UNION_IGN_REV, (FULLCASE, True): OP.SET_UNION_REV, (FULLIGNORECASE, True): OP.SET_UNION_IGN_REV} _op_name = "SET_UNION" def optimise(self, info, in_set=False): items = [] for m in self.items: m = m.optimise(info, in_set=True) if isinstance(m, SetUnion) and m.positive: # Union in union. 
items.extend(m.items) else: items.append(m) if len(items) == 1: i = items[0] return i.with_flags(positive=i.positive == self.positive, case_flags=self.case_flags, zerowidth=self.zerowidth).optimise(info, in_set) self.items = tuple(items) return self._handle_case_folding(info, in_set) def _compile(self, reverse, fuzzy): flags = 0 if self.positive: flags |= POSITIVE_OP if self.zerowidth: flags |= ZEROWIDTH_OP if fuzzy: flags |= FUZZY_OP characters, others = defaultdict(list), [] for m in self.items: if isinstance(m, Character): characters[m.positive].append(m.value) else: others.append(m) code = [(self._opcode[self.case_flags, reverse], flags)] for positive, values in characters.items(): flags = 0 if positive: flags |= POSITIVE_OP if len(values) == 1: code.append((OP.CHARACTER, flags, values[0])) else: code.append((OP.STRING, flags, len(values)) + tuple(values)) for m in others: code.extend(m.compile()) code.append((OP.END, )) return code def matches(self, ch): m = any(i.matches(ch) for i in self.items) return m == self.positive class Skip(ZeroWidthBase): _op_name = "SKIP" _opcode = OP.SKIP class StartOfLine(ZeroWidthBase): _opcode = OP.START_OF_LINE _op_name = "START_OF_LINE" class StartOfLineU(StartOfLine): _opcode = OP.START_OF_LINE_U _op_name = "START_OF_LINE_U" class StartOfString(ZeroWidthBase): _opcode = OP.START_OF_STRING _op_name = "START_OF_STRING" class StartOfWord(ZeroWidthBase): _opcode = OP.START_OF_WORD _op_name = "START_OF_WORD" class String(RegexBase): _opcode = {(NOCASE, False): OP.STRING, (IGNORECASE, False): OP.STRING_IGN, (FULLCASE, False): OP.STRING, (FULLIGNORECASE, False): OP.STRING_FLD, (NOCASE, True): OP.STRING_REV, (IGNORECASE, True): OP.STRING_IGN_REV, (FULLCASE, True): OP.STRING_REV, (FULLIGNORECASE, True): OP.STRING_FLD_REV} def __init__(self, characters, case_flags=NOCASE): self.characters = tuple(characters) self.case_flags = CASE_FLAGS_COMBINATIONS[case_flags] if (self.case_flags & FULLIGNORECASE) == FULLIGNORECASE: folded_characters 
= [] for char in self.characters: folded = _regex.fold_case(FULL_CASE_FOLDING, unichr(char)) folded_characters.extend(ord(c) for c in folded) else: folded_characters = self.characters self.folded_characters = tuple(folded_characters) self.required = False self._key = self.__class__, self.characters, self.case_flags def get_firstset(self, reverse): if reverse: pos = -1 else: pos = 0 return set([Character(self.characters[pos], case_flags=self.case_flags)]) def has_simple_start(self): return True def _compile(self, reverse, fuzzy): flags = 0 if fuzzy: flags |= FUZZY_OP if self.required: flags |= REQUIRED_OP return [(self._opcode[self.case_flags, reverse], flags, len(self.folded_characters)) + self.folded_characters] def _dump(self, indent, reverse): display = repr("".join(unichr(c) for c in self.characters)).lstrip("bu") print "%sSTRING %s%s" % (INDENT * indent, display, CASE_TEXT[self.case_flags]) def max_width(self): return len(self.folded_characters) def get_required_string(self, reverse): return 0, self class Literal(String): def _dump(self, indent, reverse): for c in self.characters: display = repr(unichr(c)).lstrip("bu") print "%sCHARACTER MATCH %s%s" % (INDENT * indent, display, CASE_TEXT[self.case_flags]) class StringSet(RegexBase): _opcode = {(NOCASE, False): OP.STRING_SET, (IGNORECASE, False): OP.STRING_SET_IGN, (FULLCASE, False): OP.STRING_SET, (FULLIGNORECASE, False): OP.STRING_SET_FLD, (NOCASE, True): OP.STRING_SET_REV, (IGNORECASE, True): OP.STRING_SET_IGN_REV, (FULLCASE, True): OP.STRING_SET_REV, (FULLIGNORECASE, True): OP.STRING_SET_FLD_REV} def __init__(self, info, name, case_flags=NOCASE): self.info = info self.name = name self.case_flags = CASE_FLAGS_COMBINATIONS[case_flags] self._key = self.__class__, self.name, self.case_flags self.set_key = (name, self.case_flags) if self.set_key not in info.named_lists_used: info.named_lists_used[self.set_key] = len(info.named_lists_used) def _compile(self, reverse, fuzzy): index = 
self.info.named_lists_used[self.set_key] items = self.info.kwargs[self.name] case_flags = self.case_flags if not items: return [] encoding = self.info.flags & _ALL_ENCODINGS fold_flags = encoding | case_flags if fuzzy: choices = [self._folded(fold_flags, i) for i in items] # Sort from longest to shortest. choices.sort(key=lambda s: (-len(s), s)) branches = [] for string in choices: branches.append(Sequence([Character(c, case_flags=case_flags) for c in string])) if len(branches) > 1: branch = Branch(branches) else: branch = branches[0] branch = branch.optimise(self.info).pack_characters(self.info) return branch.compile(reverse, fuzzy) else: min_len = min(len(i) for i in items) max_len = max(len(self._folded(fold_flags, i)) for i in items) return [(self._opcode[case_flags, reverse], index, min_len, max_len)] def _dump(self, indent, reverse): print "%sSTRING_SET %s%s" % (INDENT * indent, self.name, CASE_TEXT[self.case_flags]) def _folded(self, fold_flags, item): if isinstance(item, unicode): return [ord(c) for c in _regex.fold_case(fold_flags, item)] else: return [ord(c) for c in item] def _flatten(self, s): # Flattens the branches. if isinstance(s, Branch): for b in s.branches: self._flatten(b) elif isinstance(s, Sequence) and s.items: seq = s.items while isinstance(seq[-1], Sequence): seq[-1 : ] = seq[-1].items n = 0 while n < len(seq) and isinstance(seq[n], Character): n += 1 if n > 1: seq[ : n] = [String([c.value for c in seq[ : n]], case_flags=self.case_flags)] self._flatten(seq[-1]) def max_width(self): if not self.info.kwargs[self.name]: return 0 if self.case_flags & IGNORECASE: fold_flags = (self.info.flags & _ALL_ENCODINGS) | self.case_flags return max(len(_regex.fold_case(fold_flags, i)) for i in self.info.kwargs[self.name]) else: return max(len(i) for i in self.info.kwargs[self.name]) class Source(object): "Scanner for the regular expression source string." 
    def __init__(self, string):
        if isinstance(string, unicode):
            self.string = string
            self.char_type = unichr
        else:
            self.string = string
            self.char_type = chr

        self.pos = 0
        self.ignore_space = False
        self.sep = string[ : 0]

    def get(self):
        string = self.string
        pos = self.pos

        try:
            if self.ignore_space:
                while True:
                    if string[pos].isspace():
                        # Skip over the whitespace.
                        pos += 1
                    elif string[pos] == "#":
                        # Skip over the comment to the end of the line.
                        pos = string.index("\n", pos)
                    else:
                        break

            ch = string[pos]
            self.pos = pos + 1
            return ch
        except IndexError:
            # We've reached the end of the string.
            self.pos = pos
            return string[ : 0]
        except ValueError:
            # The comment extended to the end of the string.
            self.pos = len(string)
            return string[ : 0]

    def get_many(self, count=1):
        string = self.string
        pos = self.pos

        try:
            if self.ignore_space:
                substring = []

                while len(substring) < count:
                    while True:
                        if string[pos].isspace():
                            # Skip over the whitespace.
                            pos += 1
                        elif string[pos] == "#":
                            # Skip over the comment to the end of the line.
                            pos = string.index("\n", pos)
                        else:
                            break

                    substring.append(string[pos])
                    pos += 1

                substring = "".join(substring)
            else:
                substring = string[pos : pos + count]
                pos += len(substring)

            self.pos = pos
            return substring
        except IndexError:
            # We've reached the end of the string.
            self.pos = len(string)
            return "".join(substring)
        except ValueError:
            # The comment extended to the end of the string.
            self.pos = len(string)
            return "".join(substring)

    def get_while(self, test_set, include=True):
        string = self.string
        pos = self.pos

        if self.ignore_space:
            try:
                substring = []

                while True:
                    if string[pos].isspace():
                        # Skip over the whitespace.
                        pos += 1
                    elif string[pos] == "#":
                        # Skip over the comment to the end of the line.
                        pos = string.index("\n", pos)
                    elif (string[pos] in test_set) == include:
                        substring.append(string[pos])
                        pos += 1
                    else:
                        break

                self.pos = pos
            except IndexError:
                # We've reached the end of the string.
                self.pos = len(string)
            except ValueError:
                # The comment extended to the end of the string.
                self.pos = len(string)

            return "".join(substring)
        else:
            try:
                while (string[pos] in test_set) == include:
                    pos += 1

                substring = string[self.pos : pos]
                self.pos = pos
                return substring
            except IndexError:
                # We've reached the end of the string.
                substring = string[self.pos : pos]
                self.pos = pos
                return substring

    def skip_while(self, test_set, include=True):
        string = self.string
        pos = self.pos

        try:
            if self.ignore_space:
                while True:
                    if string[pos].isspace():
                        # Skip over the whitespace.
                        pos += 1
                    elif string[pos] == "#":
                        # Skip over the comment to the end of the line.
                        pos = string.index("\n", pos)
                    elif (string[pos] in test_set) == include:
                        pos += 1
                    else:
                        break
            else:
                while (string[pos] in test_set) == include:
                    pos += 1

            self.pos = pos
        except IndexError:
            # We've reached the end of the string.
            self.pos = len(string)
        except ValueError:
            # The comment extended to the end of the string.
            self.pos = len(string)

    def match(self, substring):
        string = self.string
        pos = self.pos

        if self.ignore_space:
            try:
                for c in substring:
                    while True:
                        if string[pos].isspace():
                            # Skip over the whitespace.
                            pos += 1
                        elif string[pos] == "#":
                            # Skip over the comment to the end of the line.
                            pos = string.index("\n", pos)
                        else:
                            break

                    if string[pos] != c:
                        return False

                    pos += 1

                self.pos = pos

                return True
            except IndexError:
                # We've reached the end of the string.
                return False
            except ValueError:
                # The comment extended to the end of the string.
                return False
        else:
            if not string.startswith(substring, pos):
                return False

            self.pos = pos + len(substring)

            return True

    def expect(self, substring):
        if not self.match(substring):
            raise error("missing %s" % substring, self.string, self.pos)

    def at_end(self):
        string = self.string
        pos = self.pos

        try:
            if self.ignore_space:
                while True:
                    if string[pos].isspace():
                        pos += 1
                    elif string[pos] == "#":
                        pos = string.index("\n", pos)
                    else:
                        break

            return pos >= len(string)
        except IndexError:
            # We've reached the end of the string.
            return True
        except ValueError:
            # The comment extended to the end of the string.
            return True

class Info(object):
    "Info about the regular expression."

    def __init__(self, flags=0, char_type=None, kwargs={}):
        flags |= DEFAULT_FLAGS[(flags & _ALL_VERSIONS) or DEFAULT_VERSION]
        self.flags = flags
        self.global_flags = flags
        self.inline_locale = False

        self.kwargs = kwargs

        self.group_count = 0
        self.group_index = {}
        self.group_name = {}
        self.char_type = char_type
        self.named_lists_used = {}
        self.open_groups = []
        self.open_group_count = {}
        self.defined_groups = {}
        self.group_calls = []
        self.private_groups = {}

    def open_group(self, name=None):
        group = self.group_index.get(name)
        if group is None:
            while True:
                self.group_count += 1
                if name is None or self.group_count not in self.group_name:
                    break

            group = self.group_count
            if name:
                self.group_index[name] = group
                self.group_name[group] = name

        if group in self.open_groups:
            # We have a nested named group. We'll assign it a private group
            # number, initially negative until we can assign a proper
            # (positive) number.
            group_alias = -(len(self.private_groups) + 1)
            self.private_groups[group_alias] = group
            group = group_alias

        self.open_groups.append(group)
        self.open_group_count[group] = self.open_group_count.get(group, 0) + 1

        return group

    def close_group(self):
        self.open_groups.pop()

    def is_open_group(self, name):
        # In version 1, a group reference can refer to an open group. We'll
        # just pretend the group isn't open.
        version = (self.flags & _ALL_VERSIONS) or DEFAULT_VERSION
        if version == VERSION1:
            return False

        if name.isdigit():
            group = int(name)
        else:
            group = self.group_index.get(name)

        return group in self.open_groups

def _check_group_features(info, parsed):
    """Checks whether the reverse and fuzzy features of the group calls match
    the groups which they call.
    """
    call_refs = {}
    additional_groups = []
    for call, reverse, fuzzy in info.group_calls:
        # Look up the reference of this group call.
        key = (call.group, reverse, fuzzy)
        ref = call_refs.get(key)
        if ref is None:
            # This group doesn't have a reference yet, so look up its features.
            if call.group == 0:
                # Calling the pattern as a whole.
                rev = bool(info.flags & REVERSE)
                fuz = isinstance(parsed, Fuzzy)
                if (rev, fuz) != (reverse, fuzzy):
                    # The pattern as a whole doesn't have the features we want,
                    # so we'll need to make a copy of it with the desired
                    # features.
                    additional_groups.append((parsed, reverse, fuzzy))
            else:
                # Calling a capture group.
                def_info = info.defined_groups[call.group]
                group = def_info[0]
                if def_info[1 : ] != (reverse, fuzzy):
                    # The group doesn't have the features we want, so we'll
                    # need to make a copy of it with the desired features.
                    additional_groups.append((group, reverse, fuzzy))

            ref = len(call_refs)
            call_refs[key] = ref

        call.call_ref = ref

    info.call_refs = call_refs
    info.additional_groups = additional_groups

def _get_required_string(parsed, flags):
    "Gets the required string and related info of a parsed pattern."

    req_offset, required = parsed.get_required_string(bool(flags & REVERSE))
    if required:
        required.required = True
        if req_offset >= UNLIMITED:
            req_offset = -1

        req_flags = required.case_flags
        if not (flags & UNICODE):
            req_flags &= ~UNICODE

        req_chars = required.folded_characters
    else:
        req_offset = 0
        req_chars = ()
        req_flags = 0

    return req_offset, req_chars, req_flags

class Scanner:
    def __init__(self, lexicon, flags=0):
        self.lexicon = lexicon

        # Combine phrases into a compound pattern.
        patterns = []
        for phrase, action in lexicon:
            # Parse the regular expression.
            source = Source(phrase)
            info = Info(flags, source.char_type)
            source.ignore_space = bool(info.flags & VERBOSE)
            parsed = _parse_pattern(source, info)
            if not source.at_end():
                raise error("unbalanced parenthesis", source.string,
                  source.pos)

            # We want to forbid capture groups within each phrase.
            patterns.append(parsed.remove_captures())

        # Combine all the subpatterns into one pattern.
        info = Info(flags)
        patterns = [Group(info, g + 1, p) for g, p in enumerate(patterns)]
        parsed = Branch(patterns)

        # Optimise the compound pattern.
        parsed = parsed.optimise(info)
        parsed = parsed.pack_characters(info)

        # Get the required string.
        req_offset, req_chars, req_flags = _get_required_string(parsed,
          info.flags)

        # Check the features of the groups.
        _check_group_features(info, parsed)

        # Complain if there are any group calls. They are not supported by the
        # Scanner class.
        if info.call_refs:
            raise error("recursive regex not supported by Scanner",
              source.string, source.pos)

        reverse = bool(info.flags & REVERSE)

        # Compile the compound pattern. The result is a list of tuples.
        code = parsed.compile(reverse) + [(OP.SUCCESS, )]

        # Flatten the code into a list of ints.
        code = _flatten_code(code)

        if not parsed.has_simple_start():
            # Get the first set, if possible.
            try:
                fs_code = _compile_firstset(info, parsed.get_firstset(reverse))
                fs_code = _flatten_code(fs_code)
                code = fs_code + code
            except _FirstSetError:
                pass

        # Check the global flags for conflicts.
        version = (info.flags & _ALL_VERSIONS) or DEFAULT_VERSION
        if version not in (0, VERSION0, VERSION1):
            raise ValueError("VERSION0 and VERSION1 flags are mutually incompatible")

        # Create the PatternObject.
        #
        # Local flags like IGNORECASE affect the code generation, but aren't
        # needed by the PatternObject itself. Conversely, global flags like
        # LOCALE _don't_ affect the code generation but _are_ needed by the
        # PatternObject.
        self.scanner = _regex.compile(None, (flags & GLOBAL_FLAGS) | version,
          code, {}, {}, {}, [], req_offset, req_chars, req_flags,
          len(patterns))

    def scan(self, string):
        result = []
        append = result.append
        match = self.scanner.scanner(string).match
        i = 0
        while True:
            m = match()
            if not m:
                break
            j = m.end()
            if i == j:
                break
            action = self.lexicon[m.lastindex - 1][1]
            if hasattr(action, '__call__'):
                self.match = m
                action = action(self, m.group())
            if action is not None:
                append(action)
            i = j

        return result, string[i : ]

# Get the known properties dict.
PROPERTIES = _regex.get_properties()

# Build the inverse of the properties dict.
PROPERTY_NAMES = {}
for prop_name, (prop_id, values) in PROPERTIES.items():
    name, prop_values = PROPERTY_NAMES.get(prop_id, ("", {}))
    name = max(name, prop_name, key=len)
    PROPERTY_NAMES[prop_id] = name, prop_values

    for val_name, val_id in values.items():
        prop_values[val_id] = max(prop_values.get(val_id, ""), val_name,
          key=len)

# Character escape sequences.
CHARACTER_ESCAPES = {
    "a": "\a",
    "b": "\b",
    "f": "\f",
    "n": "\n",
    "r": "\r",
    "t": "\t",
    "v": "\v",
}

# Predefined character set escape sequences.
CHARSET_ESCAPES = {
    "d": lookup_property(None, "Digit", True),
    "D": lookup_property(None, "Digit", False),
    "s": lookup_property(None, "Space", True),
    "S": lookup_property(None, "Space", False),
    "w": lookup_property(None, "Word", True),
    "W": lookup_property(None, "Word", False),
}

# Positional escape sequences.
POSITION_ESCAPES = {
    "A": StartOfString(),
    "b": Boundary(),
    "B": Boundary(False),
    "K": Keep(),
    "m": StartOfWord(),
    "M": EndOfWord(),
    "Z": EndOfString(),
}

# Positional escape sequences when WORD flag set.
WORD_POSITION_ESCAPES = dict(POSITION_ESCAPES)
WORD_POSITION_ESCAPES.update({
    "b": DefaultBoundary(),
    "B": DefaultBoundary(False),
    "m": DefaultStartOfWord(),
    "M": DefaultEndOfWord(),
})

# Regex control verbs.
VERBS = {
    "FAIL": Failure(),
    "F": Failure(),
    "PRUNE": Prune(),
    "SKIP": Skip(),
}
regex-2016.01.10/Python2/_regex_unicode.c0000666000000000000000000253720412540663552016061 0ustar 00000000000000
/* For Unicode version 8.0.0 */

#include "_regex_unicode.h"

#define RE_BLANK_MASK ((1 << RE_PROP_ZL) | (1 << RE_PROP_ZP))
#define RE_GRAPH_MASK ((1 << RE_PROP_CC) | (1 << RE_PROP_CS) | (1 << RE_PROP_CN))
#define RE_WORD_MASK (RE_PROP_M_MASK | (1 << RE_PROP_ND) | (1 << RE_PROP_PC))

typedef struct RE_AllCases {
    RE_INT32 diffs[RE_MAX_CASES - 1];
} RE_AllCases;

typedef struct RE_FullCaseFolding {
    RE_INT32 diff;
    RE_UINT16 codepoints[RE_MAX_FOLDED - 1];
} RE_FullCaseFolding;

/* strings. */

char* re_strings[] = {
    "-1/2", "0", "1", "1/10", "1/12", "1/16", "1/2", "1/3", "1/4", "1/5",
    "1/6", "1/7", "1/8", "1/9", "10", "100", "1000", "10000", "100000",
    "1000000", "100000000", "10000000000", "1000000000000", "103", "107",
    "11", "11/12", "11/2", "118", "12", "122", "129", "13", "13/2", "130",
    "132", "133", "14", "15", "15/2", "16", "17", "17/2", "18", "19", "2",
    "2/3", "2/5", "20", "200", "2000", "20000", "200000", "202", "21", "214",
    "216", "216000", "218", "22", "220", "222", "224", "226", "228", "23",
    "230", "232", "233", "234", "24", "240", "25", "26", "27", "28", "29",
    "3", "3/16", "3/2", "3/4", "3/5", "3/8", "30", "300", "3000", "30000",
    "300000", "31", "32", "33", "34", "35", "36", "37", "38", "39", "4",
    "4/5", "40", "400", "4000", "40000", "400000", "41", "42", "43",
    "432000", "44", "45", "46", "47", "48", "49", "5", "5/12", "5/2", "5/6",
    "5/8", "50", "500", "5000", "50000", "500000", "6", "60", "600", "6000",
    "60000", "600000", "7", "7/12", "7/2", "7/8", "70", "700", "7000",
    "70000", "700000", "8", "80", "800", "8000", "80000", "800000", "84",
    "9", "9/2", "90", "900", "9000", "90000", "900000", "91", "A", "ABOVE",
    "ABOVELEFT", "ABOVERIGHT", "AEGEANNUMBERS", "AGHB", "AHEX", "AHOM", "AI",
    "AIN", "AL", "ALAPH", "ALCHEMICAL", "ALCHEMICALSYMBOLS", "ALEF",
    "ALETTER", "ALNUM", "ALPHA",
"ALPHABETIC", "ALPHABETICPF", "ALPHABETICPRESENTATIONFORMS", "ALPHANUMERIC", "AMBIGUOUS", "AN", "ANATOLIANHIEROGLYPHS", "ANCIENTGREEKMUSIC", "ANCIENTGREEKMUSICALNOTATION", "ANCIENTGREEKNUMBERS", "ANCIENTSYMBOLS", "ANY", "AR", "ARAB", "ARABIC", "ARABICEXTA", "ARABICEXTENDEDA", "ARABICLETTER", "ARABICMATH", "ARABICMATHEMATICALALPHABETICSYMBOLS", "ARABICNUMBER", "ARABICPFA", "ARABICPFB", "ARABICPRESENTATIONFORMSA", "ARABICPRESENTATIONFORMSB", "ARABICSUP", "ARABICSUPPLEMENT", "ARMENIAN", "ARMI", "ARMN", "ARROWS", "ASCII", "ASCIIHEXDIGIT", "ASSIGNED", "AT", "ATA", "ATAR", "ATB", "ATBL", "ATERM", "ATTACHEDABOVE", "ATTACHEDABOVERIGHT", "ATTACHEDBELOW", "ATTACHEDBELOWLEFT", "AVAGRAHA", "AVESTAN", "AVST", "B", "B2", "BA", "BALI", "BALINESE", "BAMU", "BAMUM", "BAMUMSUP", "BAMUMSUPPLEMENT", "BASICLATIN", "BASS", "BASSAVAH", "BATAK", "BATK", "BB", "BC", "BEH", "BELOW", "BELOWLEFT", "BELOWRIGHT", "BENG", "BENGALI", "BETH", "BIDIC", "BIDICLASS", "BIDICONTROL", "BIDIM", "BIDIMIRRORED", "BINDU", "BK", "BL", "BLANK", "BLK", "BLOCK", "BLOCKELEMENTS", "BN", "BOPO", "BOPOMOFO", "BOPOMOFOEXT", "BOPOMOFOEXTENDED", "BOTTOM", "BOTTOMANDRIGHT", "BOUNDARYNEUTRAL", "BOXDRAWING", "BR", "BRAH", "BRAHMI", "BRAHMIJOININGNUMBER", "BRAI", "BRAILLE", "BRAILLEPATTERNS", "BREAKAFTER", "BREAKBEFORE", "BREAKBOTH", "BREAKSYMBOLS", "BUGI", "BUGINESE", "BUHD", "BUHID", "BURUSHASKIYEHBARREE", "BYZANTINEMUSIC", "BYZANTINEMUSICALSYMBOLS", "C", "C&", "CAKM", "CAN", "CANADIANABORIGINAL", "CANADIANSYLLABICS", "CANONICAL", "CANONICALCOMBININGCLASS", "CANS", "CANTILLATIONMARK", "CARI", "CARIAN", "CARRIAGERETURN", "CASED", "CASEDLETTER", "CASEIGNORABLE", "CAUCASIANALBANIAN", "CB", "CC", "CCC", "CCC10", "CCC103", "CCC107", "CCC11", "CCC118", "CCC12", "CCC122", "CCC129", "CCC13", "CCC130", "CCC132", "CCC133", "CCC14", "CCC15", "CCC16", "CCC17", "CCC18", "CCC19", "CCC20", "CCC21", "CCC22", "CCC23", "CCC24", "CCC25", "CCC26", "CCC27", "CCC28", "CCC29", "CCC30", "CCC31", "CCC32", "CCC33", "CCC34", "CCC35", "CCC36", 
"CCC84", "CCC91", "CF", "CHAKMA", "CHAM", "CHANGESWHENCASEFOLDED", "CHANGESWHENCASEMAPPED", "CHANGESWHENLOWERCASED", "CHANGESWHENTITLECASED", "CHANGESWHENUPPERCASED", "CHER", "CHEROKEE", "CHEROKEESUP", "CHEROKEESUPPLEMENT", "CI", "CIRCLE", "CJ", "CJK", "CJKCOMPAT", "CJKCOMPATFORMS", "CJKCOMPATIBILITY", "CJKCOMPATIBILITYFORMS", "CJKCOMPATIBILITYIDEOGRAPHS", "CJKCOMPATIBILITYIDEOGRAPHSSUPPLEMENT", "CJKCOMPATIDEOGRAPHS", "CJKCOMPATIDEOGRAPHSSUP", "CJKEXTA", "CJKEXTB", "CJKEXTC", "CJKEXTD", "CJKEXTE", "CJKRADICALSSUP", "CJKRADICALSSUPPLEMENT", "CJKSTROKES", "CJKSYMBOLS", "CJKSYMBOLSANDPUNCTUATION", "CJKUNIFIEDIDEOGRAPHS", "CJKUNIFIEDIDEOGRAPHSEXTENSIONA", "CJKUNIFIEDIDEOGRAPHSEXTENSIONB", "CJKUNIFIEDIDEOGRAPHSEXTENSIONC", "CJKUNIFIEDIDEOGRAPHSEXTENSIOND", "CJKUNIFIEDIDEOGRAPHSEXTENSIONE", "CL", "CLOSE", "CLOSEPARENTHESIS", "CLOSEPUNCTUATION", "CM", "CN", "CNTRL", "CO", "COM", "COMBININGDIACRITICALMARKS", "COMBININGDIACRITICALMARKSEXTENDED", "COMBININGDIACRITICALMARKSFORSYMBOLS", "COMBININGDIACRITICALMARKSSUPPLEMENT", "COMBININGHALFMARKS", "COMBININGMARK", "COMBININGMARKSFORSYMBOLS", "COMMON", "COMMONINDICNUMBERFORMS", "COMMONSEPARATOR", "COMPAT", "COMPATJAMO", "COMPLEXCONTEXT", "CONDITIONALJAPANESESTARTER", "CONNECTORPUNCTUATION", "CONSONANT", "CONSONANTDEAD", "CONSONANTFINAL", "CONSONANTHEADLETTER", "CONSONANTKILLER", "CONSONANTMEDIAL", "CONSONANTPLACEHOLDER", "CONSONANTPRECEDINGREPHA", "CONSONANTPREFIXED", "CONSONANTSUBJOINED", "CONSONANTSUCCEEDINGREPHA", "CONSONANTWITHSTACKER", "CONTINGENTBREAK", "CONTROL", "CONTROLPICTURES", "COPT", "COPTIC", "COPTICEPACTNUMBERS", "COUNTINGROD", "COUNTINGRODNUMERALS", "CP", "CPRT", "CR", "CS", "CUNEIFORM", "CUNEIFORMNUMBERS", "CUNEIFORMNUMBERSANDPUNCTUATION", "CURRENCYSYMBOL", "CURRENCYSYMBOLS", "CWCF", "CWCM", "CWL", "CWT", "CWU", "CYPRIOT", "CYPRIOTSYLLABARY", "CYRILLIC", "CYRILLICEXTA", "CYRILLICEXTB", "CYRILLICEXTENDEDA", "CYRILLICEXTENDEDB", "CYRILLICSUP", "CYRILLICSUPPLEMENT", "CYRILLICSUPPLEMENTARY", "CYRL", "D", "DA", 
"DAL", "DALATHRISH", "DASH", "DASHPUNCTUATION", "DB", "DE", "DECIMAL", "DECIMALNUMBER", "DECOMPOSITIONTYPE", "DEFAULTIGNORABLECODEPOINT", "DEP", "DEPRECATED", "DESERET", "DEVA", "DEVANAGARI", "DEVANAGARIEXT", "DEVANAGARIEXTENDED", "DI", "DIA", "DIACRITIC", "DIACRITICALS", "DIACRITICALSEXT", "DIACRITICALSFORSYMBOLS", "DIACRITICALSSUP", "DIGIT", "DINGBATS", "DOMINO", "DOMINOTILES", "DOUBLEABOVE", "DOUBLEBELOW", "DOUBLEQUOTE", "DQ", "DSRT", "DT", "DUALJOINING", "DUPL", "DUPLOYAN", "E", "EA", "EARLYDYNASTICCUNEIFORM", "EASTASIANWIDTH", "EGYP", "EGYPTIANHIEROGLYPHS", "ELBA", "ELBASAN", "EMOTICONS", "EN", "ENC", "ENCLOSEDALPHANUM", "ENCLOSEDALPHANUMERICS", "ENCLOSEDALPHANUMERICSUPPLEMENT", "ENCLOSEDALPHANUMSUP", "ENCLOSEDCJK", "ENCLOSEDCJKLETTERSANDMONTHS", "ENCLOSEDIDEOGRAPHICSUP", "ENCLOSEDIDEOGRAPHICSUPPLEMENT", "ENCLOSINGMARK", "ES", "ET", "ETHI", "ETHIOPIC", "ETHIOPICEXT", "ETHIOPICEXTA", "ETHIOPICEXTENDED", "ETHIOPICEXTENDEDA", "ETHIOPICSUP", "ETHIOPICSUPPLEMENT", "EUROPEANNUMBER", "EUROPEANSEPARATOR", "EUROPEANTERMINATOR", "EX", "EXCLAMATION", "EXT", "EXTEND", "EXTENDER", "EXTENDNUMLET", "F", "FALSE", "FARSIYEH", "FE", "FEH", "FIN", "FINAL", "FINALPUNCTUATION", "FINALSEMKATH", "FIRSTSTRONGISOLATE", "FO", "FONT", "FORMAT", "FRA", "FRACTION", "FSI", "FULLWIDTH", "GAF", "GAMAL", "GC", "GCB", "GEMINATIONMARK", "GENERALCATEGORY", "GENERALPUNCTUATION", "GEOMETRICSHAPES", "GEOMETRICSHAPESEXT", "GEOMETRICSHAPESEXTENDED", "GEOR", "GEORGIAN", "GEORGIANSUP", "GEORGIANSUPPLEMENT", "GL", "GLAG", "GLAGOLITIC", "GLUE", "GOTH", "GOTHIC", "GRAN", "GRANTHA", "GRAPH", "GRAPHEMEBASE", "GRAPHEMECLUSTERBREAK", "GRAPHEMEEXTEND", "GRAPHEMELINK", "GRBASE", "GREEK", "GREEKANDCOPTIC", "GREEKEXT", "GREEKEXTENDED", "GREK", "GREXT", "GRLINK", "GUJARATI", "GUJR", "GURMUKHI", "GURU", "H", "H2", "H3", "HAH", "HALFANDFULLFORMS", "HALFMARKS", "HALFWIDTH", "HALFWIDTHANDFULLWIDTHFORMS", "HAMZAONHEHGOAL", "HAN", "HANG", "HANGUL", "HANGULCOMPATIBILITYJAMO", "HANGULJAMO", "HANGULJAMOEXTENDEDA", 
"HANGULJAMOEXTENDEDB", "HANGULSYLLABLES", "HANGULSYLLABLETYPE", "HANI", "HANO", "HANUNOO", "HATR", "HATRAN", "HE", "HEBR", "HEBREW", "HEBREWLETTER", "HEH", "HEHGOAL", "HETH", "HEX", "HEXDIGIT", "HIGHPRIVATEUSESURROGATES", "HIGHPUSURROGATES", "HIGHSURROGATES", "HIRA", "HIRAGANA", "HL", "HLUW", "HMNG", "HRKT", "HST", "HUNG", "HY", "HYPHEN", "ID", "IDC", "IDCONTINUE", "IDEO", "IDEOGRAPHIC", "IDEOGRAPHICDESCRIPTIONCHARACTERS", "IDS", "IDSB", "IDSBINARYOPERATOR", "IDST", "IDSTART", "IDSTRINARYOPERATOR", "IMPERIALARAMAIC", "IN", "INDICNUMBERFORMS", "INDICPOSITIONALCATEGORY", "INDICSYLLABICCATEGORY", "INFIXNUMERIC", "INHERITED", "INIT", "INITIAL", "INITIALPUNCTUATION", "INPC", "INSC", "INSCRIPTIONALPAHLAVI", "INSCRIPTIONALPARTHIAN", "INSEPARABLE", "INSEPERABLE", "INVISIBLESTACKER", "IOTASUBSCRIPT", "IPAEXT", "IPAEXTENSIONS", "IS", "ISO", "ISOLATED", "ITAL", "JAMO", "JAMOEXTA", "JAMOEXTB", "JAVA", "JAVANESE", "JG", "JL", "JOINC", "JOINCAUSING", "JOINCONTROL", "JOINER", "JOININGGROUP", "JOININGTYPE", "JT", "JV", "KA", "KAF", "KAITHI", "KALI", "KANA", "KANASUP", "KANASUPPLEMENT", "KANAVOICING", "KANBUN", "KANGXI", "KANGXIRADICALS", "KANNADA", "KAPH", "KATAKANA", "KATAKANAEXT", "KATAKANAORHIRAGANA", "KATAKANAPHONETICEXTENSIONS", "KAYAHLI", "KHAPH", "KHAR", "KHAROSHTHI", "KHMER", "KHMERSYMBOLS", "KHMR", "KHOJ", "KHOJKI", "KHUDAWADI", "KNDA", "KNOTTEDHEH", "KTHI", "KV", "L", "L&", "LAM", "LAMADH", "LANA", "LAO", "LAOO", "LATIN", "LATIN1", "LATIN1SUP", "LATIN1SUPPLEMENT", "LATINEXTA", "LATINEXTADDITIONAL", "LATINEXTB", "LATINEXTC", "LATINEXTD", "LATINEXTE", "LATINEXTENDEDA", "LATINEXTENDEDADDITIONAL", "LATINEXTENDEDB", "LATINEXTENDEDC", "LATINEXTENDEDD", "LATINEXTENDEDE", "LATN", "LB", "LC", "LE", "LEADINGJAMO", "LEFT", "LEFTANDRIGHT", "LEFTJOINING", "LEFTTORIGHT", "LEFTTORIGHTEMBEDDING", "LEFTTORIGHTISOLATE", "LEFTTORIGHTOVERRIDE", "LEPC", "LEPCHA", "LETTER", "LETTERLIKESYMBOLS", "LETTERNUMBER", "LF", "LIMB", "LIMBU", "LINA", "LINB", "LINEARA", "LINEARB", "LINEARBIDEOGRAMS", 
"LINEARBSYLLABARY", "LINEBREAK", "LINEFEED", "LINESEPARATOR", "LISU", "LL", "LM", "LO", "LOE", "LOGICALORDEREXCEPTION", "LOWER", "LOWERCASE", "LOWERCASELETTER", "LOWSURROGATES", "LRE", "LRI", "LRO", "LT", "LU", "LV", "LVSYLLABLE", "LVT", "LVTSYLLABLE", "LYCI", "LYCIAN", "LYDI", "LYDIAN", "M", "M&", "MAHAJANI", "MAHJ", "MAHJONG", "MAHJONGTILES", "MALAYALAM", "MAND", "MANDAIC", "MANDATORYBREAK", "MANI", "MANICHAEAN", "MANICHAEANALEPH", "MANICHAEANAYIN", "MANICHAEANBETH", "MANICHAEANDALETH", "MANICHAEANDHAMEDH", "MANICHAEANFIVE", "MANICHAEANGIMEL", "MANICHAEANHETH", "MANICHAEANHUNDRED", "MANICHAEANKAPH", "MANICHAEANLAMEDH", "MANICHAEANMEM", "MANICHAEANNUN", "MANICHAEANONE", "MANICHAEANPE", "MANICHAEANQOPH", "MANICHAEANRESH", "MANICHAEANSADHE", "MANICHAEANSAMEKH", "MANICHAEANTAW", "MANICHAEANTEN", "MANICHAEANTETH", "MANICHAEANTHAMEDH", "MANICHAEANTWENTY", "MANICHAEANWAW", "MANICHAEANYODH", "MANICHAEANZAYIN", "MARK", "MATH", "MATHALPHANUM", "MATHEMATICALALPHANUMERICSYMBOLS", "MATHEMATICALOPERATORS", "MATHOPERATORS", "MATHSYMBOL", "MB", "MC", "ME", "MED", "MEDIAL", "MEEM", "MEETEIMAYEK", "MEETEIMAYEKEXT", "MEETEIMAYEKEXTENSIONS", "MEND", "MENDEKIKAKUI", "MERC", "MERO", "MEROITICCURSIVE", "MEROITICHIEROGLYPHS", "MIAO", "MIDLETTER", "MIDNUM", "MIDNUMLET", "MIM", "MISCARROWS", "MISCELLANEOUSMATHEMATICALSYMBOLSA", "MISCELLANEOUSMATHEMATICALSYMBOLSB", "MISCELLANEOUSSYMBOLS", "MISCELLANEOUSSYMBOLSANDARROWS", "MISCELLANEOUSSYMBOLSANDPICTOGRAPHS", "MISCELLANEOUSTECHNICAL", "MISCMATHSYMBOLSA", "MISCMATHSYMBOLSB", "MISCPICTOGRAPHS", "MISCSYMBOLS", "MISCTECHNICAL", "ML", "MLYM", "MN", "MODI", "MODIFIERLETTER", "MODIFIERLETTERS", "MODIFIERSYMBOL", "MODIFIERTONELETTERS", "MODIFYINGLETTER", "MONG", "MONGOLIAN", "MRO", "MROO", "MTEI", "MULT", "MULTANI", "MUSIC", "MUSICALSYMBOLS", "MYANMAR", "MYANMAREXTA", "MYANMAREXTB", "MYANMAREXTENDEDA", "MYANMAREXTENDEDB", "MYMR", "N", "N&", "NA", "NABATAEAN", "NAN", "NAR", "NARB", "NARROW", "NB", "NBAT", "NCHAR", "ND", "NEUTRAL", "NEWLINE", 
"NEWTAILUE", "NEXTLINE", "NK", "NKO", "NKOO", "NL", "NO", "NOBLOCK", "NOBREAK", "NOJOININGGROUP", "NONCHARACTERCODEPOINT", "NONE", "NONJOINER", "NONJOINING", "NONSPACINGMARK", "NONSTARTER", "NOON", "NOTAPPLICABLE", "NOTREORDERED", "NR", "NS", "NSM", "NT", "NU", "NUKTA", "NUMBER", "NUMBERFORMS", "NUMBERJOINER", "NUMERIC", "NUMERICTYPE", "NUMERICVALUE", "NUN", "NV", "NYA", "OALPHA", "OCR", "ODI", "OGAM", "OGHAM", "OGREXT", "OIDC", "OIDS", "OLCHIKI", "OLCK", "OLDHUNGARIAN", "OLDITALIC", "OLDNORTHARABIAN", "OLDPERMIC", "OLDPERSIAN", "OLDSOUTHARABIAN", "OLDTURKIC", "OLETTER", "OLOWER", "OMATH", "ON", "OP", "OPENPUNCTUATION", "OPTICALCHARACTERRECOGNITION", "ORIYA", "ORKH", "ORNAMENTALDINGBATS", "ORYA", "OSMA", "OSMANYA", "OTHER", "OTHERALPHABETIC", "OTHERDEFAULTIGNORABLECODEPOINT", "OTHERGRAPHEMEEXTEND", "OTHERIDCONTINUE", "OTHERIDSTART", "OTHERLETTER", "OTHERLOWERCASE", "OTHERMATH", "OTHERNEUTRAL", "OTHERNUMBER", "OTHERPUNCTUATION", "OTHERSYMBOL", "OTHERUPPERCASE", "OUPPER", "OV", "OVERLAY", "OVERSTRUCK", "P", "P&", "PAHAWHHMONG", "PALM", "PALMYRENE", "PARAGRAPHSEPARATOR", "PATSYN", "PATTERNSYNTAX", "PATTERNWHITESPACE", "PATWS", "PAUC", "PAUCINHAU", "PC", "PD", "PDF", "PDI", "PE", "PERM", "PF", "PHAG", "PHAGSPA", "PHAISTOS", "PHAISTOSDISC", "PHLI", "PHLP", "PHNX", "PHOENICIAN", "PHONETICEXT", "PHONETICEXTENSIONS", "PHONETICEXTENSIONSSUPPLEMENT", "PHONETICEXTSUP", "PI", "PLAYINGCARDS", "PLRD", "PO", "POPDIRECTIONALFORMAT", "POPDIRECTIONALISOLATE", "POSIXALNUM", "POSIXDIGIT", "POSIXPUNCT", "POSIXXDIGIT", "POSTFIXNUMERIC", "PP", "PR", "PREFIXNUMERIC", "PREPEND", "PRINT", "PRIVATEUSE", "PRIVATEUSEAREA", "PRTI", "PS", "PSALTERPAHLAVI", "PUA", "PUNCT", "PUNCTUATION", "PUREKILLER", "QAAC", "QAAI", "QAF", "QAPH", "QMARK", "QU", "QUOTATION", "QUOTATIONMARK", "R", "RADICAL", "REGIONALINDICATOR", "REGISTERSHIFTER", "REH", "REJANG", "REVERSEDPE", "RI", "RIGHT", "RIGHTJOINING", "RIGHTTOLEFT", "RIGHTTOLEFTEMBEDDING", "RIGHTTOLEFTISOLATE", "RIGHTTOLEFTOVERRIDE", "RJNG", "RLE", "RLI", 
"RLO", "ROHINGYAYEH", "RUMI", "RUMINUMERALSYMBOLS", "RUNIC", "RUNR", "S", "S&", "SA", "SAD", "SADHE", "SAMARITAN", "SAMR", "SARB", "SAUR", "SAURASHTRA", "SB", "SC", "SCONTINUE", "SCRIPT", "SD", "SE", "SEEN", "SEGMENTSEPARATOR", "SEMKATH", "SENTENCEBREAK", "SEP", "SEPARATOR", "SG", "SGNW", "SHARADA", "SHAVIAN", "SHAW", "SHIN", "SHORTHANDFORMATCONTROLS", "SHRD", "SIDD", "SIDDHAM", "SIGNWRITING", "SIND", "SINGLEQUOTE", "SINH", "SINHALA", "SINHALAARCHAICNUMBERS", "SK", "SM", "SMALL", "SMALLFORMS", "SMALLFORMVARIANTS", "SML", "SO", "SOFTDOTTED", "SORA", "SORASOMPENG", "SP", "SPACE", "SPACESEPARATOR", "SPACINGMARK", "SPACINGMODIFIERLETTERS", "SPECIALS", "SQ", "SQR", "SQUARE", "ST", "STERM", "STRAIGHTWAW", "SUB", "SUND", "SUNDANESE", "SUNDANESESUP", "SUNDANESESUPPLEMENT", "SUP", "SUPARROWSA", "SUPARROWSB", "SUPARROWSC", "SUPER", "SUPERANDSUB", "SUPERSCRIPTSANDSUBSCRIPTS", "SUPMATHOPERATORS", "SUPPLEMENTALARROWSA", "SUPPLEMENTALARROWSB", "SUPPLEMENTALARROWSC", "SUPPLEMENTALMATHEMATICALOPERATORS", "SUPPLEMENTALPUNCTUATION", "SUPPLEMENTALSYMBOLSANDPICTOGRAPHS", "SUPPLEMENTARYPRIVATEUSEAREAA", "SUPPLEMENTARYPRIVATEUSEAREAB", "SUPPUAA", "SUPPUAB", "SUPPUNCTUATION", "SUPSYMBOLSANDPICTOGRAPHS", "SURROGATE", "SUTTONSIGNWRITING", "SWASHKAF", "SY", "SYLLABLEMODIFIER", "SYLO", "SYLOTINAGRI", "SYMBOL", "SYRC", "SYRIAC", "SYRIACWAW", "T", "TAGALOG", "TAGB", "TAGBANWA", "TAGS", "TAH", "TAILE", "TAITHAM", "TAIVIET", "TAIXUANJING", "TAIXUANJINGSYMBOLS", "TAKR", "TAKRI", "TALE", "TALU", "TAMIL", "TAML", "TAVT", "TAW", "TEHMARBUTA", "TEHMARBUTAGOAL", "TELU", "TELUGU", "TERM", "TERMINALPUNCTUATION", "TETH", "TFNG", "TGLG", "THAA", "THAANA", "THAI", "TIBETAN", "TIBT", "TIFINAGH", "TIRH", "TIRHUTA", "TITLECASELETTER", "TONELETTER", "TONEMARK", "TOP", "TOPANDBOTTOM", "TOPANDBOTTOMANDRIGHT", "TOPANDLEFT", "TOPANDLEFTANDRIGHT", "TOPANDRIGHT", "TRAILINGJAMO", "TRANSPARENT", "TRANSPORTANDMAP", "TRANSPORTANDMAPSYMBOLS", "TRUE", "U", "UCAS", "UCASEXT", "UGAR", "UGARITIC", "UIDEO", "UNASSIGNED", 
"UNIFIEDCANADIANABORIGINALSYLLABICS", "UNIFIEDCANADIANABORIGINALSYLLABICSEXTENDED", "UNIFIEDIDEOGRAPH", "UNKNOWN", "UP", "UPPER", "UPPERCASE", "UPPERCASELETTER", "V", "VAI", "VAII", "VARIATIONSELECTOR", "VARIATIONSELECTORS", "VARIATIONSELECTORSSUPPLEMENT", "VEDICEXT", "VEDICEXTENSIONS", "VERT", "VERTICAL", "VERTICALFORMS", "VIRAMA", "VISARGA", "VISUALORDERLEFT", "VOWEL", "VOWELDEPENDENT", "VOWELINDEPENDENT", "VOWELJAMO", "VR", "VS", "VSSUP", "W", "WARA", "WARANGCITI", "WAW", "WB", "WHITESPACE", "WIDE", "WJ", "WORD", "WORDBREAK", "WORDJOINER", "WS", "WSPACE", "XDIGIT", "XIDC", "XIDCONTINUE", "XIDS", "XIDSTART", "XPEO", "XSUX", "XX", "Y", "YEH", "YEHBARREE", "YEHWITHTAIL", "YES", "YI", "YIII", "YIJING", "YIJINGHEXAGRAMSYMBOLS", "YIRADICALS", "YISYLLABLES", "YUDH", "YUDHHE", "Z", "Z&", "ZAIN", "ZHAIN", "ZINH", "ZL", "ZP", "ZS", "ZW", "ZWSPACE", "ZYYY", "ZZZZ", }; /* strings: 12240 bytes. */ /* properties. */ RE_Property re_properties[] = { { 547, 0, 0}, { 544, 0, 0}, { 252, 1, 1}, { 251, 1, 1}, {1081, 2, 2}, {1079, 2, 2}, {1259, 3, 3}, {1254, 3, 3}, { 566, 4, 4}, { 545, 4, 4}, {1087, 5, 5}, {1078, 5, 5}, { 823, 6, 6}, { 172, 7, 6}, { 171, 7, 6}, { 767, 8, 6}, { 766, 8, 6}, {1227, 9, 6}, {1226, 9, 6}, { 294, 10, 6}, { 296, 11, 6}, { 350, 11, 6}, { 343, 12, 6}, { 433, 12, 6}, { 345, 13, 6}, { 435, 13, 6}, { 344, 14, 6}, { 434, 14, 6}, { 341, 15, 6}, { 431, 15, 6}, { 342, 16, 6}, { 432, 16, 6}, { 636, 17, 6}, { 632, 17, 6}, { 628, 18, 6}, { 627, 18, 6}, {1267, 19, 6}, {1266, 19, 6}, {1265, 20, 6}, {1264, 20, 6}, { 458, 21, 6}, { 466, 21, 6}, { 567, 22, 6}, { 575, 22, 6}, { 565, 23, 6}, { 569, 23, 6}, { 568, 24, 6}, { 576, 24, 6}, {1255, 25, 6}, {1262, 25, 6}, {1117, 25, 6}, { 244, 26, 6}, { 242, 26, 6}, { 671, 27, 6}, { 669, 27, 6}, { 451, 28, 6}, { 625, 29, 6}, {1044, 30, 6}, {1041, 30, 6}, {1188, 31, 6}, {1187, 31, 6}, { 971, 32, 6}, { 952, 32, 6}, { 612, 33, 6}, { 611, 33, 6}, { 204, 34, 6}, { 160, 34, 6}, { 964, 35, 6}, { 933, 35, 6}, { 630, 36, 6}, { 629, 36, 6}, { 
468, 37, 6}, { 467, 37, 6}, { 523, 38, 6}, { 521, 38, 6}, { 970, 39, 6}, { 951, 39, 6}, { 976, 40, 6}, { 977, 40, 6}, { 909, 41, 6}, { 895, 41, 6}, { 966, 42, 6}, { 938, 42, 6}, { 634, 43, 6}, { 633, 43, 6}, { 637, 44, 6}, { 635, 44, 6}, {1046, 45, 6}, {1223, 46, 6}, {1219, 46, 6}, { 965, 47, 6}, { 935, 47, 6}, { 460, 48, 6}, { 459, 48, 6}, {1113, 49, 6}, {1082, 49, 6}, { 765, 50, 6}, { 764, 50, 6}, { 968, 51, 6}, { 940, 51, 6}, { 967, 52, 6}, { 939, 52, 6}, {1126, 53, 6}, {1232, 54, 6}, {1248, 54, 6}, { 989, 55, 6}, { 990, 55, 6}, { 988, 56, 6}, { 987, 56, 6}, { 598, 57, 7}, { 622, 57, 7}, { 243, 58, 8}, { 234, 58, 8}, { 288, 59, 9}, { 300, 59, 9}, { 457, 60, 10}, { 482, 60, 10}, { 489, 61, 11}, { 487, 61, 11}, { 673, 62, 12}, { 667, 62, 12}, { 674, 63, 13}, { 675, 63, 13}, { 757, 64, 14}, { 732, 64, 14}, { 928, 65, 15}, { 921, 65, 15}, { 929, 66, 16}, { 931, 66, 16}, { 246, 67, 6}, { 245, 67, 6}, { 641, 68, 17}, { 648, 68, 17}, { 642, 69, 18}, { 649, 69, 18}, { 175, 70, 6}, { 170, 70, 6}, { 183, 71, 6}, { 250, 72, 6}, { 564, 73, 6}, {1027, 74, 6}, {1258, 75, 6}, {1263, 76, 6}, {1019, 77, 6}, {1018, 78, 6}, {1020, 79, 6}, {1021, 80, 6}, }; /* properties: 588 bytes. */ /* property values. 
*/ RE_PropertyValue re_property_values[] = { {1220, 0, 0}, { 383, 0, 0}, {1228, 0, 1}, { 774, 0, 1}, { 768, 0, 2}, { 761, 0, 2}, {1200, 0, 3}, { 773, 0, 3}, { 865, 0, 4}, { 762, 0, 4}, { 969, 0, 5}, { 763, 0, 5}, { 913, 0, 6}, { 863, 0, 6}, { 505, 0, 7}, { 831, 0, 7}, {1119, 0, 8}, { 830, 0, 8}, { 456, 0, 9}, { 896, 0, 9}, { 473, 0, 9}, { 747, 0, 10}, { 904, 0, 10}, { 973, 0, 11}, { 905, 0, 11}, {1118, 0, 12}, {1291, 0, 12}, { 759, 0, 13}, {1289, 0, 13}, { 986, 0, 14}, {1290, 0, 14}, { 415, 0, 15}, { 299, 0, 15}, { 384, 0, 15}, { 537, 0, 16}, { 338, 0, 16}, {1028, 0, 17}, { 385, 0, 17}, {1153, 0, 18}, { 425, 0, 18}, { 452, 0, 19}, { 994, 0, 19}, { 955, 0, 20}, {1031, 0, 20}, { 381, 0, 21}, { 997, 0, 21}, { 401, 0, 22}, { 993, 0, 22}, { 974, 0, 23}, {1015, 0, 23}, { 828, 0, 24}, {1107, 0, 24}, { 429, 0, 25}, {1079, 0, 25}, { 867, 0, 26}, {1106, 0, 26}, { 975, 0, 27}, {1112, 0, 27}, { 647, 0, 28}, {1012, 0, 28}, { 532, 0, 29}, { 999, 0, 29}, { 963, 0, 30}, { 281, 0, 30}, { 282, 0, 30}, { 745, 0, 31}, { 708, 0, 31}, { 709, 0, 31}, { 822, 0, 32}, { 783, 0, 32}, { 392, 0, 32}, { 784, 0, 32}, { 924, 0, 33}, { 885, 0, 33}, { 886, 0, 33}, {1035, 0, 34}, { 981, 0, 34}, {1034, 0, 34}, { 982, 0, 34}, {1160, 0, 35}, {1068, 0, 35}, {1069, 0, 35}, {1089, 0, 36}, {1284, 0, 36}, {1285, 0, 36}, { 295, 0, 37}, { 733, 0, 37}, { 205, 0, 38}, { 906, 1, 0}, { 893, 1, 0}, { 228, 1, 1}, { 203, 1, 1}, { 718, 1, 2}, { 717, 1, 2}, { 716, 1, 2}, { 725, 1, 3}, { 719, 1, 3}, { 727, 1, 4}, { 721, 1, 4}, { 657, 1, 5}, { 656, 1, 5}, {1120, 1, 6}, { 866, 1, 6}, { 387, 1, 7}, { 469, 1, 7}, { 571, 1, 8}, { 570, 1, 8}, { 438, 1, 9}, { 444, 1, 10}, { 443, 1, 10}, { 445, 1, 10}, { 199, 1, 11}, { 606, 1, 12}, { 186, 1, 13}, {1162, 1, 14}, { 198, 1, 15}, { 197, 1, 15}, {1193, 1, 16}, { 902, 1, 17}, {1073, 1, 18}, { 791, 1, 19}, { 188, 1, 20}, { 187, 1, 20}, { 463, 1, 21}, { 240, 1, 22}, { 579, 1, 23}, { 577, 1, 24}, { 957, 1, 25}, {1179, 1, 26}, {1186, 1, 27}, { 688, 1, 28}, { 789, 1, 29}, {1104, 1, 30}, 
{1194, 1, 31}, { 713, 1, 32}, {1195, 1, 33}, { 879, 1, 34}, { 553, 1, 35}, { 594, 1, 36}, { 662, 1, 36}, { 509, 1, 37}, { 515, 1, 38}, { 514, 1, 38}, { 347, 1, 39}, {1221, 1, 40}, {1215, 1, 40}, { 286, 1, 40}, { 937, 1, 41}, {1066, 1, 42}, {1165, 1, 43}, { 601, 1, 44}, { 277, 1, 45}, {1167, 1, 46}, { 698, 1, 47}, { 871, 1, 48}, {1222, 1, 49}, {1216, 1, 49}, { 750, 1, 50}, {1170, 1, 51}, { 899, 1, 52}, { 699, 1, 53}, { 275, 1, 54}, {1171, 1, 55}, { 388, 1, 56}, { 470, 1, 56}, { 223, 1, 57}, {1130, 1, 58}, { 231, 1, 59}, { 744, 1, 60}, { 941, 1, 61}, {1132, 1, 62}, {1131, 1, 62}, {1236, 1, 63}, {1235, 1, 63}, {1009, 1, 64}, {1008, 1, 64}, {1010, 1, 65}, {1011, 1, 65}, { 390, 1, 66}, { 472, 1, 66}, { 726, 1, 67}, { 720, 1, 67}, { 573, 1, 68}, { 572, 1, 68}, { 548, 1, 69}, {1035, 1, 69}, {1139, 1, 70}, {1138, 1, 70}, { 430, 1, 71}, { 389, 1, 72}, { 471, 1, 72}, { 393, 1, 72}, { 746, 1, 73}, { 925, 1, 74}, { 202, 1, 75}, { 826, 1, 76}, { 827, 1, 76}, { 855, 1, 77}, { 860, 1, 77}, { 416, 1, 78}, { 956, 1, 79}, { 934, 1, 79}, { 498, 1, 80}, { 497, 1, 80}, { 262, 1, 81}, { 253, 1, 82}, { 549, 1, 83}, { 852, 1, 84}, { 859, 1, 84}, { 474, 1, 85}, { 850, 1, 86}, { 856, 1, 86}, {1141, 1, 87}, {1134, 1, 87}, { 269, 1, 88}, { 268, 1, 88}, {1142, 1, 89}, {1135, 1, 89}, { 851, 1, 90}, { 857, 1, 90}, {1144, 1, 91}, {1140, 1, 91}, { 853, 1, 92}, { 849, 1, 92}, { 558, 1, 93}, { 728, 1, 94}, { 722, 1, 94}, { 418, 1, 95}, { 555, 1, 96}, { 554, 1, 96}, {1197, 1, 97}, { 512, 1, 98}, { 510, 1, 98}, { 441, 1, 99}, { 439, 1, 99}, {1145, 1, 100}, {1151, 1, 100}, { 368, 1, 101}, { 367, 1, 101}, { 687, 1, 102}, { 686, 1, 102}, { 631, 1, 103}, { 627, 1, 103}, { 371, 1, 104}, { 370, 1, 104}, { 617, 1, 105}, { 690, 1, 106}, { 256, 1, 107}, { 593, 1, 108}, { 398, 1, 108}, { 685, 1, 109}, { 258, 1, 110}, { 257, 1, 110}, { 369, 1, 111}, { 693, 1, 112}, { 691, 1, 112}, { 502, 1, 113}, { 501, 1, 113}, { 356, 1, 114}, { 354, 1, 114}, { 373, 1, 115}, { 362, 1, 115}, {1279, 1, 116}, {1278, 1, 116}, { 
372, 1, 117}, { 353, 1, 117}, {1281, 1, 118}, {1280, 1, 119}, { 760, 1, 120}, {1230, 1, 121}, { 442, 1, 122}, { 440, 1, 122}, { 225, 1, 123}, { 868, 1, 124}, { 729, 1, 125}, { 723, 1, 125}, {1159, 1, 126}, { 395, 1, 127}, { 640, 1, 127}, {1001, 1, 128}, {1077, 1, 129}, { 465, 1, 130}, { 464, 1, 130}, { 694, 1, 131}, {1050, 1, 132}, { 595, 1, 133}, { 663, 1, 133}, { 666, 1, 134}, { 883, 1, 135}, { 881, 1, 135}, { 340, 1, 136}, { 882, 1, 137}, { 880, 1, 137}, {1172, 1, 138}, { 837, 1, 139}, { 836, 1, 139}, { 513, 1, 140}, { 511, 1, 140}, { 730, 1, 141}, { 724, 1, 141}, { 349, 1, 142}, { 348, 1, 142}, { 835, 1, 143}, { 597, 1, 144}, { 592, 1, 144}, { 596, 1, 145}, { 664, 1, 145}, { 615, 1, 146}, { 613, 1, 147}, { 614, 1, 147}, { 769, 1, 148}, {1029, 1, 149}, {1033, 1, 149}, {1028, 1, 149}, { 358, 1, 150}, { 360, 1, 150}, { 174, 1, 151}, { 173, 1, 151}, { 195, 1, 152}, { 193, 1, 152}, {1233, 1, 153}, {1248, 1, 153}, {1239, 1, 154}, { 391, 1, 155}, { 586, 1, 155}, { 357, 1, 156}, { 355, 1, 156}, {1110, 1, 157}, {1109, 1, 157}, { 196, 1, 158}, { 194, 1, 158}, { 588, 1, 159}, { 585, 1, 159}, {1121, 1, 160}, { 756, 1, 161}, { 755, 1, 162}, { 158, 1, 163}, { 181, 1, 164}, { 182, 1, 165}, {1003, 1, 166}, {1002, 1, 166}, { 780, 1, 167}, { 292, 1, 168}, { 419, 1, 169}, { 944, 1, 170}, { 561, 1, 171}, { 946, 1, 172}, {1218, 1, 173}, { 947, 1, 174}, { 461, 1, 175}, {1093, 1, 176}, { 962, 1, 177}, { 493, 1, 178}, { 297, 1, 179}, { 753, 1, 180}, { 437, 1, 181}, { 638, 1, 182}, { 985, 1, 183}, { 888, 1, 184}, { 603, 1, 185}, {1007, 1, 186}, { 782, 1, 187}, { 843, 1, 188}, { 842, 1, 189}, { 697, 1, 190}, { 948, 1, 191}, { 945, 1, 192}, { 794, 1, 193}, { 217, 1, 194}, { 651, 1, 195}, { 650, 1, 196}, {1032, 1, 197}, { 949, 1, 198}, { 943, 1, 199}, {1065, 1, 200}, {1064, 1, 200}, { 265, 1, 201}, { 679, 1, 202}, {1115, 1, 203}, { 339, 1, 204}, { 785, 1, 205}, {1092, 1, 206}, {1105, 1, 207}, { 702, 1, 208}, { 876, 1, 209}, { 703, 1, 210}, { 563, 1, 211}, {1199, 1, 212}, {1099, 1, 213}, { 
864, 1, 214}, {1176, 1, 215}, { 161, 1, 216}, {1252, 1, 217}, { 992, 1, 218}, { 426, 1, 219}, { 428, 1, 220}, { 427, 1, 220}, { 488, 1, 221}, { 491, 1, 222}, { 178, 1, 223}, { 227, 1, 224}, { 226, 1, 224}, { 872, 1, 225}, { 230, 1, 226}, { 983, 1, 227}, { 844, 1, 228}, { 683, 1, 229}, { 682, 1, 229}, { 485, 1, 230}, {1096, 1, 231}, { 280, 1, 232}, { 279, 1, 232}, { 878, 1, 233}, { 877, 1, 233}, { 180, 1, 234}, { 179, 1, 234}, {1174, 1, 235}, {1173, 1, 235}, { 421, 1, 236}, { 420, 1, 236}, { 825, 1, 237}, { 824, 1, 237}, {1154, 1, 238}, { 839, 1, 239}, { 191, 1, 240}, { 190, 1, 240}, { 788, 1, 241}, { 787, 1, 241}, { 476, 1, 242}, { 475, 1, 242}, {1013, 1, 243}, { 499, 1, 244}, { 500, 1, 244}, { 504, 1, 245}, { 503, 1, 245}, { 854, 1, 246}, { 858, 1, 246}, { 494, 1, 247}, { 959, 1, 248}, {1212, 1, 249}, {1211, 1, 249}, { 167, 1, 250}, { 166, 1, 250}, { 551, 1, 251}, { 550, 1, 251}, {1143, 1, 252}, {1136, 1, 252}, {1146, 1, 253}, {1152, 1, 253}, { 374, 1, 254}, { 363, 1, 254}, { 375, 1, 255}, { 364, 1, 255}, { 376, 1, 256}, { 365, 1, 256}, { 377, 1, 257}, { 366, 1, 257}, { 359, 1, 258}, { 361, 1, 258}, {1168, 1, 259}, {1234, 1, 260}, {1249, 1, 260}, {1147, 1, 261}, {1149, 1, 261}, {1148, 1, 262}, {1150, 1, 262}, {1224, 2, 0}, {1295, 2, 0}, { 394, 2, 1}, {1294, 2, 1}, { 715, 2, 2}, { 731, 2, 2}, { 570, 2, 3}, { 574, 2, 3}, { 438, 2, 4}, { 446, 2, 4}, { 199, 2, 5}, { 201, 2, 5}, { 606, 2, 6}, { 605, 2, 6}, { 186, 2, 7}, { 185, 2, 7}, {1162, 2, 8}, {1161, 2, 8}, {1193, 2, 9}, {1192, 2, 9}, { 463, 2, 10}, { 462, 2, 10}, { 240, 2, 11}, { 239, 2, 11}, { 579, 2, 12}, { 580, 2, 12}, { 577, 2, 13}, { 578, 2, 13}, { 957, 2, 14}, { 960, 2, 14}, {1179, 2, 15}, {1180, 2, 15}, {1186, 2, 16}, {1185, 2, 16}, { 688, 2, 17}, { 704, 2, 17}, { 789, 2, 18}, { 862, 2, 18}, {1104, 2, 19}, {1103, 2, 19}, {1194, 2, 20}, { 713, 2, 21}, { 714, 2, 21}, {1195, 2, 22}, {1196, 2, 22}, { 879, 2, 23}, { 884, 2, 23}, { 553, 2, 24}, { 552, 2, 24}, { 592, 2, 25}, { 591, 2, 25}, { 509, 2, 26}, { 508, 2, 
26}, { 347, 2, 27}, { 346, 2, 27}, { 285, 2, 28}, { 289, 2, 28}, { 937, 2, 29}, { 936, 2, 29}, {1066, 2, 30}, {1067, 2, 30}, { 698, 2, 31}, { 700, 2, 31}, { 871, 2, 32}, { 870, 2, 32}, { 617, 2, 33}, { 616, 2, 33}, { 690, 2, 34}, { 681, 2, 34}, { 256, 2, 35}, { 255, 2, 35}, { 590, 2, 36}, { 599, 2, 36}, {1276, 2, 37}, {1277, 2, 37}, { 944, 2, 38}, { 661, 2, 38}, { 561, 2, 39}, { 560, 2, 39}, { 461, 2, 40}, { 481, 2, 40}, { 644, 2, 41}, {1288, 2, 41}, {1038, 2, 41}, {1165, 2, 42}, {1191, 2, 42}, { 601, 2, 43}, { 600, 2, 43}, { 277, 2, 44}, { 276, 2, 44}, {1167, 2, 45}, {1166, 2, 45}, { 750, 2, 46}, { 749, 2, 46}, {1170, 2, 47}, {1177, 2, 47}, { 754, 2, 48}, { 752, 2, 48}, {1218, 2, 49}, {1217, 2, 49}, {1093, 2, 50}, {1094, 2, 50}, { 962, 2, 51}, { 961, 2, 51}, { 436, 2, 52}, { 423, 2, 52}, { 268, 2, 53}, { 267, 2, 53}, { 275, 2, 54}, { 274, 2, 54}, { 418, 2, 55}, { 417, 2, 55}, {1037, 2, 55}, { 899, 2, 56}, {1178, 2, 56}, { 558, 2, 57}, { 557, 2, 57}, {1197, 2, 58}, {1190, 2, 58}, {1159, 2, 59}, {1158, 2, 59}, { 947, 2, 60}, {1268, 2, 60}, { 697, 2, 61}, { 696, 2, 61}, { 223, 2, 62}, { 222, 2, 62}, { 426, 2, 63}, {1269, 2, 63}, {1007, 2, 64}, {1006, 2, 64}, {1001, 2, 65}, {1000, 2, 65}, { 902, 2, 66}, { 903, 2, 66}, {1130, 2, 67}, {1129, 2, 67}, { 744, 2, 68}, { 743, 2, 68}, { 941, 2, 69}, { 942, 2, 69}, {1230, 2, 70}, {1231, 2, 70}, {1077, 2, 71}, {1076, 2, 71}, { 694, 2, 72}, { 680, 2, 72}, {1050, 2, 73}, {1059, 2, 73}, { 780, 2, 74}, { 779, 2, 74}, { 292, 2, 75}, { 291, 2, 75}, { 782, 2, 76}, { 781, 2, 76}, { 340, 2, 77}, {1171, 2, 78}, { 712, 2, 78}, {1172, 2, 79}, {1181, 2, 79}, { 217, 2, 80}, { 218, 2, 80}, { 491, 2, 81}, { 490, 2, 81}, {1073, 2, 82}, {1074, 2, 82}, { 760, 2, 83}, { 225, 2, 84}, { 224, 2, 84}, { 666, 2, 85}, { 665, 2, 85}, { 835, 2, 86}, { 874, 2, 86}, { 638, 2, 87}, { 200, 2, 87}, { 948, 2, 88}, {1075, 2, 88}, { 651, 2, 89}, {1030, 2, 89}, { 650, 2, 90}, {1004, 2, 90}, { 949, 2, 91}, { 958, 2, 91}, { 679, 2, 92}, { 706, 2, 92}, { 231, 2, 93}, 
{ 232, 2, 93}, { 265, 2, 94}, { 264, 2, 94}, { 791, 2, 95}, { 790, 2, 95}, { 339, 2, 96}, { 283, 2, 96}, { 842, 2, 97}, { 840, 2, 97}, { 843, 2, 98}, { 841, 2, 98}, { 844, 2, 99}, {1014, 2, 99}, {1092, 2, 100}, {1097, 2, 100}, {1115, 2, 101}, {1114, 2, 101}, {1176, 2, 102}, {1175, 2, 102}, { 297, 2, 103}, { 159, 2, 103}, { 230, 2, 104}, { 229, 2, 104}, { 485, 2, 105}, { 484, 2, 105}, { 493, 2, 106}, { 492, 2, 106}, { 563, 2, 107}, { 562, 2, 107}, { 983, 2, 108}, { 620, 2, 108}, { 702, 2, 109}, { 701, 2, 109}, { 753, 2, 110}, { 751, 2, 110}, { 785, 2, 111}, { 786, 2, 111}, { 794, 2, 112}, { 793, 2, 112}, { 839, 2, 113}, { 838, 2, 113}, { 864, 2, 114}, { 872, 2, 115}, { 873, 2, 115}, { 945, 2, 116}, { 891, 2, 116}, { 888, 2, 117}, { 894, 2, 117}, { 985, 2, 118}, { 984, 2, 118}, { 992, 2, 119}, { 991, 2, 119}, { 946, 2, 120}, { 998, 2, 120}, {1032, 2, 121}, {1005, 2, 121}, {1099, 2, 122}, {1098, 2, 122}, { 703, 2, 123}, {1101, 2, 123}, {1199, 2, 124}, {1198, 2, 124}, {1252, 2, 125}, {1251, 2, 125}, { 161, 2, 126}, { 178, 2, 127}, { 619, 2, 127}, { 603, 2, 128}, { 602, 2, 128}, { 876, 2, 129}, { 875, 2, 129}, { 943, 2, 130}, { 623, 2, 130}, {1100, 2, 131}, {1091, 2, 131}, { 692, 2, 132}, { 621, 2, 132}, { 963, 3, 0}, {1270, 3, 0}, { 479, 3, 1}, { 480, 3, 1}, {1102, 3, 2}, {1122, 3, 2}, { 607, 3, 3}, { 618, 3, 3}, { 424, 3, 4}, { 748, 3, 5}, { 898, 3, 6}, { 904, 3, 6}, { 522, 3, 7}, {1047, 3, 8}, {1052, 3, 8}, { 537, 3, 9}, { 535, 3, 9}, { 690, 3, 10}, { 677, 3, 10}, { 169, 3, 11}, { 734, 3, 11}, { 845, 3, 12}, { 861, 3, 12}, { 846, 3, 13}, { 863, 3, 13}, { 847, 3, 14}, { 829, 3, 14}, { 927, 3, 15}, { 922, 3, 15}, { 524, 3, 16}, { 519, 3, 16}, { 963, 4, 0}, {1270, 4, 0}, { 424, 4, 1}, { 748, 4, 2}, { 415, 4, 3}, { 383, 4, 3}, { 522, 4, 4}, { 519, 4, 4}, {1047, 4, 5}, {1052, 4, 5}, {1119, 4, 6}, {1107, 4, 6}, { 708, 4, 7}, {1229, 4, 8}, {1164, 4, 9}, { 775, 4, 10}, { 777, 4, 11}, {1026, 4, 12}, {1023, 4, 12}, { 963, 5, 0}, {1270, 5, 0}, { 424, 5, 1}, { 748, 5, 2}, { 522, 
5, 3}, { 519, 5, 3}, {1088, 5, 4}, {1083, 5, 4}, { 537, 5, 5}, { 535, 5, 5}, {1116, 5, 6}, { 766, 5, 7}, { 763, 5, 7}, {1226, 5, 8}, {1225, 5, 8}, { 950, 5, 9}, { 734, 5, 9}, { 927, 5, 10}, { 922, 5, 10}, { 211, 5, 11}, { 206, 5, 11}, {1126, 5, 12}, {1125, 5, 12}, { 379, 5, 13}, { 378, 5, 13}, {1080, 5, 14}, {1079, 5, 14}, { 905, 6, 0}, { 885, 6, 0}, { 525, 6, 0}, { 526, 6, 0}, {1275, 6, 1}, {1271, 6, 1}, {1164, 6, 1}, {1213, 6, 1}, { 916, 7, 0}, { 887, 7, 0}, { 735, 7, 1}, { 708, 7, 1}, {1246, 7, 2}, {1229, 7, 2}, {1209, 7, 3}, {1164, 7, 3}, { 776, 7, 4}, { 775, 7, 4}, { 778, 7, 5}, { 777, 7, 5}, { 739, 8, 0}, { 708, 8, 0}, {1055, 8, 1}, {1045, 8, 1}, { 516, 8, 2}, { 495, 8, 2}, { 517, 8, 3}, { 506, 8, 3}, { 518, 8, 4}, { 507, 8, 4}, { 192, 8, 5}, { 177, 8, 5}, { 396, 8, 6}, { 425, 8, 6}, { 986, 8, 7}, { 219, 8, 7}, {1085, 8, 8}, {1068, 8, 8}, {1255, 8, 9}, {1261, 8, 9}, { 972, 8, 10}, { 953, 8, 10}, { 261, 8, 11}, { 254, 8, 11}, { 913, 8, 12}, { 920, 8, 12}, { 189, 8, 13}, { 164, 8, 13}, { 742, 8, 14}, { 772, 8, 14}, {1058, 8, 15}, {1062, 8, 15}, { 740, 8, 16}, { 770, 8, 16}, {1056, 8, 17}, {1060, 8, 17}, {1016, 8, 18}, { 995, 8, 18}, { 741, 8, 19}, { 771, 8, 19}, {1057, 8, 20}, {1061, 8, 20}, { 534, 8, 21}, { 540, 8, 21}, {1017, 8, 22}, { 996, 8, 22}, { 917, 9, 0}, { 1, 9, 0}, { 918, 9, 0}, { 979, 9, 1}, { 2, 9, 1}, { 978, 9, 1}, { 923, 9, 2}, { 130, 9, 2}, { 901, 9, 2}, { 684, 9, 3}, { 139, 9, 3}, { 707, 9, 3}, {1240, 9, 4}, { 146, 9, 4}, {1247, 9, 4}, { 301, 9, 5}, { 14, 9, 5}, { 304, 9, 6}, { 25, 9, 6}, { 306, 9, 7}, { 29, 9, 7}, { 309, 9, 8}, { 32, 9, 8}, { 313, 9, 9}, { 37, 9, 9}, { 314, 9, 10}, { 38, 9, 10}, { 315, 9, 11}, { 40, 9, 11}, { 316, 9, 12}, { 41, 9, 12}, { 317, 9, 13}, { 43, 9, 13}, { 318, 9, 14}, { 44, 9, 14}, { 319, 9, 15}, { 48, 9, 15}, { 320, 9, 16}, { 54, 9, 16}, { 321, 9, 17}, { 59, 9, 17}, { 322, 9, 18}, { 65, 9, 18}, { 323, 9, 19}, { 70, 9, 19}, { 324, 9, 20}, { 72, 9, 20}, { 325, 9, 21}, { 73, 9, 21}, { 326, 9, 22}, { 74, 9, 22}, { 327, 
9, 23}, { 75, 9, 23}, { 328, 9, 24}, { 76, 9, 24}, { 329, 9, 25}, { 83, 9, 25}, { 330, 9, 26}, { 88, 9, 26}, { 331, 9, 27}, { 89, 9, 27}, { 332, 9, 28}, { 90, 9, 28}, { 333, 9, 29}, { 91, 9, 29}, { 334, 9, 30}, { 92, 9, 30}, { 335, 9, 31}, { 93, 9, 31}, { 336, 9, 32}, { 145, 9, 32}, { 337, 9, 33}, { 153, 9, 33}, { 302, 9, 34}, { 23, 9, 34}, { 303, 9, 35}, { 24, 9, 35}, { 305, 9, 36}, { 28, 9, 36}, { 307, 9, 37}, { 30, 9, 37}, { 308, 9, 38}, { 31, 9, 38}, { 310, 9, 39}, { 34, 9, 39}, { 311, 9, 40}, { 35, 9, 40}, { 214, 9, 41}, { 53, 9, 41}, { 209, 9, 41}, { 212, 9, 42}, { 55, 9, 42}, { 207, 9, 42}, { 213, 9, 43}, { 56, 9, 43}, { 208, 9, 43}, { 237, 9, 44}, { 58, 9, 44}, { 249, 9, 44}, { 236, 9, 45}, { 60, 9, 45}, { 219, 9, 45}, { 238, 9, 46}, { 61, 9, 46}, { 263, 9, 46}, { 736, 9, 47}, { 62, 9, 47}, { 708, 9, 47}, {1053, 9, 48}, { 63, 9, 48}, {1045, 9, 48}, { 156, 9, 49}, { 64, 9, 49}, { 164, 9, 49}, { 155, 9, 50}, { 66, 9, 50}, { 154, 9, 50}, { 157, 9, 51}, { 67, 9, 51}, { 184, 9, 51}, { 478, 9, 52}, { 68, 9, 52}, { 453, 9, 52}, { 477, 9, 53}, { 69, 9, 53}, { 448, 9, 53}, { 655, 9, 54}, { 71, 9, 54}, { 658, 9, 54}, { 312, 9, 55}, { 36, 9, 55}, { 215, 9, 56}, { 49, 9, 56}, { 210, 9, 56}, { 910, 10, 0}, { 287, 10, 1}, { 284, 10, 1}, { 397, 10, 2}, { 386, 10, 2}, { 536, 10, 3}, { 907, 10, 4}, { 893, 10, 4}, { 646, 10, 5}, { 645, 10, 5}, { 833, 10, 6}, { 832, 10, 6}, { 531, 10, 7}, { 530, 10, 7}, { 660, 10, 8}, { 659, 10, 8}, { 351, 10, 9}, { 496, 10, 9}, {1137, 10, 10}, {1133, 10, 10}, {1128, 10, 11}, {1238, 10, 12}, {1237, 10, 12}, {1256, 10, 13}, { 892, 10, 14}, { 890, 10, 14}, {1108, 10, 15}, {1111, 10, 15}, {1124, 10, 16}, {1123, 10, 16}, { 539, 10, 17}, { 538, 10, 17}, { 897, 11, 0}, { 885, 11, 0}, { 176, 11, 1}, { 154, 11, 1}, { 587, 11, 2}, { 581, 11, 2}, {1256, 11, 3}, {1250, 11, 3}, { 541, 11, 4}, { 525, 11, 4}, { 892, 11, 5}, { 887, 11, 5}, { 908, 12, 0}, { 163, 12, 1}, { 165, 12, 2}, { 168, 12, 3}, { 235, 12, 4}, { 241, 12, 5}, { 449, 12, 6}, { 450, 12, 7}, 
{ 486, 12, 8}, { 529, 12, 9}, { 533, 12, 10}, { 542, 12, 11}, { 543, 12, 12}, { 584, 12, 13}, { 589, 12, 14}, {1184, 12, 14}, { 604, 12, 15}, { 608, 12, 16}, { 609, 12, 17}, { 610, 12, 18}, { 678, 12, 19}, { 689, 12, 20}, { 705, 12, 21}, { 710, 12, 22}, { 711, 12, 23}, { 834, 12, 24}, { 848, 12, 25}, { 915, 12, 26}, { 930, 12, 27}, { 997, 12, 28}, {1039, 12, 29}, {1040, 12, 30}, {1049, 12, 31}, {1051, 12, 32}, {1071, 12, 33}, {1072, 12, 34}, {1084, 12, 35}, {1086, 12, 36}, {1095, 12, 37}, {1155, 12, 38}, {1169, 12, 39}, {1182, 12, 40}, {1183, 12, 41}, {1189, 12, 42}, {1253, 12, 43}, {1163, 12, 44}, {1272, 12, 45}, {1273, 12, 46}, {1274, 12, 47}, {1282, 12, 48}, {1283, 12, 49}, {1286, 12, 50}, {1287, 12, 51}, { 695, 12, 52}, { 528, 12, 53}, { 278, 12, 54}, { 527, 12, 55}, { 932, 12, 56}, {1063, 12, 57}, {1127, 12, 58}, { 795, 12, 59}, { 796, 12, 60}, { 797, 12, 61}, { 798, 12, 62}, { 799, 12, 63}, { 800, 12, 64}, { 801, 12, 65}, { 802, 12, 66}, { 803, 12, 67}, { 804, 12, 68}, { 805, 12, 69}, { 806, 12, 70}, { 807, 12, 71}, { 808, 12, 72}, { 809, 12, 73}, { 810, 12, 74}, { 811, 12, 75}, { 812, 12, 76}, { 813, 12, 77}, { 814, 12, 78}, { 815, 12, 79}, { 816, 12, 80}, { 817, 12, 81}, { 818, 12, 82}, { 819, 12, 83}, { 820, 12, 84}, { 821, 12, 85}, { 912, 13, 0}, {1214, 13, 0}, { 670, 13, 1}, { 281, 13, 1}, { 483, 13, 2}, { 447, 13, 2}, {1054, 13, 3}, {1045, 13, 3}, { 738, 13, 4}, { 708, 13, 4}, {1210, 13, 5}, {1164, 13, 5}, {1224, 14, 0}, {1270, 14, 0}, { 955, 14, 1}, { 954, 14, 1}, { 381, 14, 2}, { 378, 14, 2}, {1043, 14, 3}, {1042, 14, 3}, { 559, 14, 4}, { 556, 14, 4}, { 914, 14, 5}, { 919, 14, 5}, { 520, 14, 6}, { 519, 14, 6}, { 273, 14, 7}, {1156, 14, 7}, { 643, 14, 8}, { 658, 14, 8}, {1025, 14, 9}, {1024, 14, 9}, {1022, 14, 10}, {1015, 14, 10}, { 927, 14, 11}, { 922, 14, 11}, { 172, 14, 12}, { 164, 14, 12}, { 630, 14, 13}, { 626, 14, 13}, { 652, 14, 14}, { 639, 14, 14}, { 653, 14, 14}, { 625, 14, 15}, { 624, 14, 15}, { 392, 14, 16}, { 382, 14, 16}, { 271, 14, 17}, { 
233, 14, 17}, { 270, 14, 18}, { 221, 14, 18}, {1117, 14, 19}, {1116, 14, 19}, { 792, 14, 20}, { 248, 14, 20}, { 293, 14, 21}, { 424, 14, 21}, { 758, 14, 22}, { 748, 14, 22}, { 414, 14, 23}, { 298, 14, 23}, { 399, 14, 24}, {1070, 14, 24}, { 176, 14, 25}, { 162, 14, 25}, { 272, 14, 26}, { 220, 14, 26}, {1153, 14, 27}, {1090, 14, 27}, {1293, 14, 28}, {1292, 14, 28}, { 900, 14, 29}, { 904, 14, 29}, {1260, 14, 30}, {1257, 14, 30}, { 668, 14, 31}, { 676, 14, 32}, { 675, 14, 33}, { 582, 14, 34}, { 583, 14, 35}, { 380, 14, 36}, { 422, 14, 36}, { 607, 14, 37}, { 618, 14, 37}, { 400, 14, 38}, { 352, 14, 38}, {1047, 14, 39}, {1052, 14, 39}, { 910, 15, 0}, { 927, 15, 1}, { 922, 15, 1}, { 473, 15, 2}, { 466, 15, 2}, { 455, 15, 3}, { 454, 15, 3}, { 889, 16, 0}, { 0, 16, 1}, { 1, 16, 2}, { 5, 16, 3}, { 4, 16, 4}, { 3, 16, 5}, { 13, 16, 6}, { 12, 16, 7}, { 11, 16, 8}, { 10, 16, 9}, { 78, 16, 10}, { 9, 16, 11}, { 8, 16, 12}, { 7, 16, 13}, { 82, 16, 14}, { 47, 16, 15}, { 115, 16, 16}, { 6, 16, 17}, { 131, 16, 18}, { 81, 16, 19}, { 118, 16, 20}, { 46, 16, 21}, { 80, 16, 22}, { 98, 16, 23}, { 117, 16, 24}, { 133, 16, 25}, { 26, 16, 26}, { 2, 16, 27}, { 79, 16, 28}, { 45, 16, 29}, { 116, 16, 30}, { 77, 16, 31}, { 132, 16, 32}, { 97, 16, 33}, { 147, 16, 34}, { 114, 16, 35}, { 27, 16, 36}, { 124, 16, 37}, { 33, 16, 38}, { 130, 16, 39}, { 39, 16, 40}, { 139, 16, 41}, { 42, 16, 42}, { 146, 16, 43}, { 14, 16, 44}, { 25, 16, 45}, { 29, 16, 46}, { 32, 16, 47}, { 37, 16, 48}, { 38, 16, 49}, { 40, 16, 50}, { 41, 16, 51}, { 43, 16, 52}, { 44, 16, 53}, { 48, 16, 54}, { 54, 16, 55}, { 59, 16, 56}, { 65, 16, 57}, { 70, 16, 58}, { 72, 16, 59}, { 73, 16, 60}, { 74, 16, 61}, { 75, 16, 62}, { 76, 16, 63}, { 83, 16, 64}, { 88, 16, 65}, { 89, 16, 66}, { 90, 16, 67}, { 91, 16, 68}, { 92, 16, 69}, { 93, 16, 70}, { 94, 16, 71}, { 95, 16, 72}, { 96, 16, 73}, { 99, 16, 74}, { 104, 16, 75}, { 105, 16, 76}, { 106, 16, 77}, { 108, 16, 78}, { 109, 16, 79}, { 110, 16, 80}, { 111, 16, 81}, { 112, 16, 82}, { 113, 
16, 83}, { 119, 16, 84}, { 125, 16, 85}, { 134, 16, 86}, { 140, 16, 87}, { 148, 16, 88}, { 15, 16, 89}, { 49, 16, 90}, { 84, 16, 91}, { 100, 16, 92}, { 120, 16, 93}, { 126, 16, 94}, { 135, 16, 95}, { 141, 16, 96}, { 149, 16, 97}, { 16, 16, 98}, { 50, 16, 99}, { 85, 16, 100}, { 101, 16, 101}, { 121, 16, 102}, { 127, 16, 103}, { 136, 16, 104}, { 142, 16, 105}, { 150, 16, 106}, { 17, 16, 107}, { 51, 16, 108}, { 86, 16, 109}, { 102, 16, 110}, { 122, 16, 111}, { 128, 16, 112}, { 137, 16, 113}, { 143, 16, 114}, { 151, 16, 115}, { 18, 16, 116}, { 52, 16, 117}, { 57, 16, 118}, { 87, 16, 119}, { 103, 16, 120}, { 107, 16, 121}, { 123, 16, 122}, { 129, 16, 123}, { 138, 16, 124}, { 144, 16, 125}, { 152, 16, 126}, { 19, 16, 127}, { 20, 16, 128}, { 21, 16, 129}, { 22, 16, 130}, { 887, 17, 0}, {1053, 17, 1}, { 736, 17, 2}, {1242, 17, 3}, { 737, 17, 4}, {1203, 17, 5}, { 259, 17, 6}, {1204, 17, 7}, {1208, 17, 8}, {1206, 17, 9}, {1207, 17, 10}, { 260, 17, 11}, {1205, 17, 12}, { 980, 17, 13}, { 963, 18, 0}, { 247, 18, 1}, {1241, 18, 2}, { 216, 18, 3}, { 923, 18, 4}, {1240, 18, 5}, {1036, 18, 6}, { 654, 18, 7}, {1245, 18, 8}, {1244, 18, 9}, {1243, 18, 10}, { 408, 18, 11}, { 402, 18, 12}, { 403, 18, 13}, { 413, 18, 14}, { 410, 18, 15}, { 409, 18, 16}, { 412, 18, 17}, { 411, 18, 18}, { 407, 18, 19}, { 404, 18, 20}, { 405, 18, 21}, { 869, 18, 22}, {1201, 18, 23}, {1202, 18, 24}, { 546, 18, 25}, { 290, 18, 26}, {1048, 18, 27}, {1157, 18, 28}, { 406, 18, 29}, { 911, 18, 30}, { 672, 18, 31}, { 926, 18, 32}, { 924, 18, 33}, { 266, 18, 34}, }; /* property values: 5648 bytes. */ /* Codepoints which expand on full case-folding. 
*/ RE_UINT16 re_expand_on_folding[] = { 223, 304, 329, 496, 912, 944, 1415, 7830, 7831, 7832, 7833, 7834, 7838, 8016, 8018, 8020, 8022, 8064, 8065, 8066, 8067, 8068, 8069, 8070, 8071, 8072, 8073, 8074, 8075, 8076, 8077, 8078, 8079, 8080, 8081, 8082, 8083, 8084, 8085, 8086, 8087, 8088, 8089, 8090, 8091, 8092, 8093, 8094, 8095, 8096, 8097, 8098, 8099, 8100, 8101, 8102, 8103, 8104, 8105, 8106, 8107, 8108, 8109, 8110, 8111, 8114, 8115, 8116, 8118, 8119, 8124, 8130, 8131, 8132, 8134, 8135, 8140, 8146, 8147, 8150, 8151, 8162, 8163, 8164, 8166, 8167, 8178, 8179, 8180, 8182, 8183, 8188, 64256, 64257, 64258, 64259, 64260, 64261, 64262, 64275, 64276, 64277, 64278, 64279, }; /* expand_on_folding: 208 bytes. */ /* General_Category. */ static RE_UINT8 re_general_category_stage_1[] = { 0, 1, 2, 3, 4, 5, 6, 7, 7, 8, 7, 7, 7, 7, 7, 7, 7, 7, 7, 9, 10, 11, 7, 7, 7, 7, 12, 13, 14, 14, 14, 15, 16, 17, 18, 19, 20, 21, 22, 21, 23, 21, 21, 21, 21, 24, 21, 21, 21, 21, 21, 21, 21, 21, 25, 26, 21, 21, 27, 28, 21, 29, 30, 31, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 32, 7, 33, 34, 7, 35, 21, 21, 21, 21, 21, 36, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 
21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 37, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 38, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 38, }; static RE_UINT8 re_general_category_stage_2[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 34, 35, 36, 37, 38, 39, 34, 34, 34, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 64, 65, 66, 67, 68, 69, 70, 71, 69, 72, 73, 69, 69, 64, 74, 64, 64, 75, 76, 77, 78, 79, 80, 81, 82, 69, 83, 84, 85, 86, 87, 88, 89, 69, 69, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 90, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 91, 92, 34, 34, 34, 34, 34, 34, 34, 34, 93, 34, 34, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 106, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 34, 34, 109, 110, 111, 112, 34, 
34, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 123, 34, 34, 130, 123, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 123, 123, 141, 123, 123, 123, 142, 143, 144, 145, 146, 147, 148, 123, 123, 149, 123, 150, 151, 152, 153, 123, 123, 154, 123, 123, 123, 155, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 34, 34, 34, 34, 34, 34, 34, 156, 157, 34, 158, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 34, 34, 34, 34, 34, 34, 34, 34, 159, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 34, 34, 34, 34, 160, 123, 123, 123, 34, 34, 34, 34, 161, 162, 163, 164, 123, 123, 123, 123, 123, 123, 165, 166, 167, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 168, 169, 123, 123, 123, 123, 123, 123, 69, 170, 171, 172, 173, 123, 174, 123, 175, 176, 177, 178, 179, 180, 181, 182, 69, 69, 69, 69, 183, 184, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 34, 185, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 186, 187, 123, 123, 188, 189, 190, 191, 192, 123, 69, 193, 69, 69, 194, 195, 69, 196, 197, 198, 199, 200, 201, 202, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 203, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 204, 34, 205, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 206, 123, 123, 34, 34, 34, 34, 207, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 208, 123, 209, 210, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 211, }; static RE_UINT16 re_general_category_stage_3[] = { 0, 0, 1, 2, 3, 4, 5, 6, 0, 0, 7, 8, 9, 10, 11, 12, 13, 13, 13, 14, 15, 13, 13, 16, 17, 18, 19, 20, 21, 22, 13, 23, 13, 13, 13, 24, 25, 11, 11, 11, 11, 26, 11, 27, 28, 
29, 30, 31, 32, 32, 32, 32, 32, 32, 32, 33, 34, 35, 36, 11, 37, 38, 13, 39, 9, 9, 9, 11, 11, 11, 13, 13, 40, 13, 13, 13, 41, 13, 13, 13, 13, 13, 13, 42, 9, 43, 44, 11, 45, 46, 32, 47, 48, 49, 50, 51, 52, 53, 49, 49, 54, 32, 55, 56, 49, 49, 49, 49, 49, 57, 58, 59, 60, 61, 49, 32, 62, 49, 49, 49, 49, 49, 63, 64, 65, 49, 66, 67, 49, 68, 69, 70, 49, 71, 72, 72, 72, 72, 49, 73, 72, 72, 74, 32, 75, 49, 49, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 82, 83, 90, 91, 92, 93, 94, 95, 96, 83, 97, 98, 99, 87, 100, 101, 82, 83, 102, 103, 104, 87, 105, 106, 107, 108, 109, 110, 111, 93, 112, 113, 114, 83, 115, 116, 117, 87, 118, 119, 114, 83, 120, 121, 122, 87, 123, 119, 114, 49, 124, 125, 126, 87, 127, 128, 129, 49, 130, 131, 132, 93, 133, 134, 49, 49, 135, 136, 137, 72, 72, 138, 139, 140, 141, 142, 143, 72, 72, 144, 145, 146, 147, 148, 49, 149, 150, 151, 152, 32, 153, 154, 155, 72, 72, 49, 49, 156, 157, 158, 159, 160, 161, 162, 163, 9, 9, 164, 49, 49, 165, 49, 49, 49, 49, 49, 49, 49, 49, 49, 49, 49, 49, 166, 167, 49, 49, 166, 49, 49, 168, 169, 170, 49, 49, 49, 169, 49, 49, 49, 171, 172, 173, 49, 174, 9, 9, 9, 9, 9, 175, 176, 49, 49, 49, 49, 49, 49, 49, 49, 49, 49, 49, 49, 49, 177, 49, 178, 179, 49, 49, 49, 49, 180, 181, 182, 183, 49, 184, 49, 185, 182, 186, 49, 49, 49, 187, 188, 189, 190, 191, 192, 190, 49, 49, 193, 49, 49, 194, 49, 49, 195, 49, 49, 49, 49, 196, 49, 197, 198, 199, 200, 49, 201, 73, 49, 49, 202, 49, 203, 204, 205, 205, 49, 206, 49, 49, 49, 207, 208, 209, 190, 190, 210, 211, 72, 72, 72, 72, 212, 49, 49, 213, 214, 158, 215, 216, 217, 49, 218, 65, 49, 49, 219, 220, 49, 49, 221, 222, 223, 65, 49, 224, 72, 72, 72, 72, 225, 226, 227, 228, 11, 11, 229, 27, 27, 27, 230, 231, 11, 232, 27, 27, 32, 32, 32, 233, 13, 13, 13, 13, 13, 13, 13, 13, 13, 234, 13, 13, 13, 13, 13, 13, 235, 236, 235, 235, 236, 237, 235, 238, 239, 239, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 72, 257, 258, 259, 260, 261, 262, 263, 264, 265, 
266, 266, 267, 268, 269, 205, 270, 271, 205, 272, 273, 273, 273, 273, 273, 273, 273, 273, 274, 205, 275, 205, 205, 205, 205, 276, 205, 277, 273, 278, 205, 279, 280, 281, 205, 205, 282, 72, 281, 72, 265, 265, 265, 283, 205, 205, 205, 205, 284, 265, 205, 205, 205, 205, 205, 205, 205, 205, 205, 205, 205, 285, 286, 205, 205, 287, 205, 205, 205, 205, 205, 205, 288, 205, 205, 205, 205, 205, 205, 205, 289, 290, 265, 291, 205, 205, 292, 273, 293, 273, 294, 295, 273, 273, 273, 296, 273, 297, 205, 205, 205, 273, 298, 205, 205, 299, 205, 300, 205, 301, 302, 303, 304, 72, 9, 9, 305, 11, 11, 306, 307, 308, 13, 13, 13, 13, 13, 13, 309, 310, 11, 11, 311, 49, 49, 49, 312, 313, 49, 314, 315, 315, 315, 315, 32, 32, 316, 317, 318, 319, 320, 72, 72, 72, 205, 321, 205, 205, 205, 205, 205, 322, 205, 205, 205, 205, 205, 323, 72, 324, 325, 326, 327, 328, 134, 49, 49, 49, 49, 329, 176, 49, 49, 49, 49, 330, 331, 49, 201, 134, 49, 49, 49, 49, 197, 332, 49, 50, 205, 205, 322, 49, 205, 333, 334, 205, 335, 336, 205, 205, 334, 205, 205, 336, 205, 205, 205, 333, 49, 49, 49, 196, 205, 205, 205, 205, 49, 49, 49, 49, 49, 196, 72, 72, 49, 337, 49, 49, 49, 49, 49, 49, 149, 205, 205, 205, 282, 49, 49, 224, 338, 49, 339, 72, 13, 13, 340, 341, 13, 342, 49, 49, 49, 49, 343, 344, 31, 345, 346, 347, 13, 13, 13, 348, 349, 350, 351, 352, 72, 72, 72, 353, 354, 49, 355, 356, 49, 49, 49, 357, 358, 49, 49, 359, 360, 190, 32, 361, 65, 49, 362, 49, 363, 364, 49, 149, 75, 49, 49, 365, 366, 367, 368, 369, 49, 49, 370, 371, 372, 373, 49, 374, 49, 49, 49, 375, 376, 377, 378, 379, 380, 381, 315, 11, 11, 382, 383, 11, 11, 11, 11, 11, 49, 49, 384, 190, 49, 49, 385, 49, 386, 49, 49, 202, 387, 387, 387, 387, 387, 387, 387, 387, 388, 388, 388, 388, 388, 388, 388, 388, 49, 49, 49, 49, 49, 49, 201, 49, 49, 49, 49, 49, 49, 203, 72, 72, 389, 390, 391, 392, 393, 49, 49, 49, 49, 49, 49, 394, 395, 396, 49, 49, 49, 49, 49, 397, 72, 49, 49, 49, 49, 398, 49, 49, 194, 72, 72, 399, 32, 400, 32, 401, 402, 403, 404, 405, 49, 49, 49, 49, 
49, 49, 49, 406, 407, 2, 3, 4, 5, 408, 409, 410, 49, 411, 49, 197, 412, 413, 414, 415, 416, 49, 170, 417, 201, 201, 72, 72, 49, 49, 49, 49, 49, 49, 49, 50, 418, 265, 265, 419, 266, 266, 266, 420, 421, 324, 422, 72, 72, 205, 205, 423, 72, 72, 72, 72, 72, 72, 72, 72, 49, 149, 49, 49, 49, 99, 424, 425, 49, 49, 426, 49, 427, 49, 49, 428, 49, 429, 49, 49, 430, 431, 72, 72, 9, 9, 432, 11, 11, 49, 49, 49, 49, 201, 190, 72, 72, 72, 72, 72, 49, 49, 194, 49, 49, 49, 433, 72, 49, 49, 49, 314, 49, 196, 194, 72, 434, 49, 49, 435, 49, 436, 49, 437, 49, 197, 438, 72, 72, 72, 49, 439, 49, 440, 49, 441, 72, 72, 72, 72, 49, 49, 49, 442, 265, 443, 265, 265, 444, 445, 49, 446, 447, 448, 49, 449, 49, 450, 72, 72, 451, 49, 452, 453, 49, 49, 49, 454, 49, 455, 49, 456, 49, 457, 458, 72, 72, 72, 72, 72, 49, 49, 49, 49, 459, 72, 72, 72, 9, 9, 9, 460, 11, 11, 11, 461, 72, 72, 72, 72, 72, 72, 265, 462, 463, 49, 49, 464, 465, 443, 466, 467, 217, 49, 49, 468, 469, 49, 459, 190, 470, 49, 471, 472, 473, 49, 49, 474, 217, 49, 49, 475, 476, 477, 478, 479, 49, 96, 480, 481, 72, 72, 72, 72, 482, 483, 484, 49, 49, 485, 486, 190, 487, 82, 83, 97, 488, 489, 490, 491, 49, 49, 49, 492, 493, 190, 72, 72, 49, 49, 494, 495, 496, 497, 72, 72, 49, 49, 49, 498, 499, 190, 72, 72, 49, 49, 500, 501, 190, 72, 72, 72, 49, 502, 503, 504, 72, 72, 72, 72, 72, 72, 9, 9, 11, 11, 146, 505, 72, 72, 72, 72, 49, 49, 49, 459, 49, 203, 72, 72, 72, 72, 72, 72, 266, 266, 266, 266, 266, 266, 506, 507, 49, 49, 49, 49, 385, 72, 72, 72, 49, 49, 197, 72, 72, 72, 72, 72, 49, 49, 49, 49, 314, 72, 72, 72, 49, 49, 49, 459, 49, 197, 367, 72, 72, 72, 72, 72, 72, 49, 201, 508, 49, 49, 49, 509, 510, 511, 512, 513, 49, 72, 72, 72, 72, 72, 72, 72, 49, 49, 49, 49, 73, 514, 515, 516, 467, 517, 72, 72, 72, 72, 72, 72, 518, 72, 72, 72, 72, 72, 72, 72, 49, 49, 49, 49, 49, 49, 50, 149, 459, 519, 520, 72, 72, 72, 72, 72, 205, 205, 205, 205, 205, 205, 205, 323, 205, 205, 521, 205, 205, 205, 522, 523, 524, 205, 525, 205, 205, 205, 526, 72, 205, 205, 
205, 205, 527, 72, 72, 72, 205, 205, 205, 205, 205, 282, 265, 528, 9, 529, 11, 530, 531, 532, 235, 9, 533, 534, 535, 536, 537, 9, 529, 11, 538, 539, 11, 540, 541, 542, 543, 9, 544, 11, 9, 529, 11, 530, 531, 11, 235, 9, 533, 543, 9, 544, 11, 9, 529, 11, 545, 9, 546, 547, 548, 549, 11, 550, 9, 551, 552, 553, 554, 11, 555, 9, 556, 11, 557, 558, 558, 558, 32, 32, 32, 559, 32, 32, 560, 561, 562, 563, 46, 72, 72, 72, 72, 72, 49, 49, 49, 49, 564, 565, 72, 72, 566, 49, 567, 568, 569, 570, 571, 572, 573, 202, 574, 202, 72, 72, 72, 575, 205, 205, 324, 205, 205, 205, 205, 205, 205, 322, 333, 576, 576, 576, 205, 323, 173, 205, 333, 205, 205, 205, 324, 205, 205, 281, 72, 72, 72, 72, 577, 205, 578, 205, 205, 281, 526, 303, 72, 72, 205, 205, 205, 205, 205, 205, 205, 579, 205, 205, 205, 205, 205, 205, 205, 321, 205, 205, 580, 205, 205, 205, 205, 205, 205, 205, 205, 205, 205, 422, 581, 322, 205, 205, 205, 205, 205, 205, 205, 322, 205, 205, 205, 205, 205, 582, 72, 72, 324, 205, 205, 205, 583, 174, 205, 205, 583, 205, 584, 72, 72, 72, 72, 72, 72, 526, 72, 72, 72, 72, 72, 72, 582, 72, 72, 72, 422, 72, 72, 72, 49, 49, 49, 49, 49, 314, 72, 72, 49, 49, 49, 73, 49, 49, 49, 49, 49, 201, 49, 49, 49, 49, 49, 49, 49, 49, 518, 72, 72, 72, 72, 72, 49, 201, 72, 72, 72, 72, 72, 72, 585, 72, 586, 586, 586, 586, 586, 586, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 72, 388, 388, 388, 388, 388, 388, 388, 587, }; static RE_UINT8 re_general_category_stage_4[] = { 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 2, 4, 5, 6, 2, 7, 7, 7, 7, 7, 2, 8, 9, 10, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 12, 13, 14, 15, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 17, 18, 19, 1, 20, 20, 21, 22, 23, 24, 25, 26, 27, 15, 2, 28, 29, 27, 30, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 31, 11, 11, 11, 32, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 33, 16, 16, 16, 16, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 34, 34, 34, 34, 34, 34, 34, 34, 16, 32, 32, 32, 32, 32, 32, 32, 11, 34, 34, 16, 34, 32, 32, 
11, 34, 11, 16, 11, 11, 34, 32, 11, 32, 16, 11, 34, 32, 32, 32, 11, 34, 16, 32, 11, 34, 11, 34, 34, 32, 35, 32, 16, 36, 36, 37, 34, 38, 37, 34, 34, 34, 34, 34, 34, 34, 34, 16, 32, 34, 38, 32, 11, 32, 32, 32, 32, 32, 32, 16, 16, 16, 11, 34, 32, 34, 34, 11, 32, 32, 32, 32, 32, 16, 16, 39, 16, 16, 16, 16, 16, 40, 40, 40, 40, 40, 40, 40, 40, 40, 41, 41, 40, 40, 40, 40, 40, 40, 41, 41, 41, 41, 41, 41, 41, 40, 40, 42, 41, 41, 41, 42, 42, 41, 41, 41, 41, 41, 41, 41, 41, 43, 43, 43, 43, 43, 43, 43, 43, 32, 32, 42, 32, 44, 45, 16, 10, 44, 44, 41, 46, 11, 47, 47, 11, 34, 11, 11, 11, 11, 11, 11, 11, 11, 48, 11, 11, 11, 11, 16, 16, 16, 16, 16, 16, 16, 16, 16, 34, 16, 11, 32, 16, 32, 32, 32, 32, 16, 16, 32, 49, 34, 32, 34, 11, 32, 50, 43, 43, 51, 32, 32, 32, 11, 34, 34, 34, 34, 34, 34, 16, 48, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 47, 52, 2, 2, 2, 53, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 54, 55, 56, 57, 58, 43, 43, 43, 43, 43, 43, 43, 43, 43, 43, 43, 43, 43, 43, 59, 60, 61, 43, 60, 44, 44, 44, 44, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 62, 44, 44, 36, 63, 64, 44, 44, 44, 44, 44, 65, 65, 65, 8, 9, 66, 2, 67, 43, 43, 43, 43, 43, 61, 68, 2, 69, 36, 36, 36, 36, 70, 43, 43, 7, 7, 7, 7, 7, 2, 2, 36, 71, 36, 36, 36, 36, 36, 36, 36, 36, 36, 72, 43, 43, 43, 73, 50, 43, 43, 74, 75, 76, 43, 43, 36, 7, 7, 7, 7, 7, 36, 77, 78, 2, 2, 2, 2, 2, 2, 2, 79, 70, 36, 36, 36, 36, 36, 36, 36, 43, 43, 43, 43, 43, 80, 81, 36, 36, 36, 36, 43, 43, 43, 43, 43, 71, 44, 44, 44, 44, 44, 44, 44, 7, 7, 7, 7, 7, 36, 36, 36, 36, 36, 36, 36, 36, 70, 43, 43, 43, 43, 40, 21, 2, 82, 44, 44, 36, 36, 36, 43, 43, 75, 43, 43, 43, 43, 75, 43, 75, 43, 43, 44, 2, 2, 2, 2, 2, 2, 2, 64, 36, 36, 36, 36, 70, 43, 44, 64, 44, 44, 44, 44, 44, 44, 44, 44, 36, 36, 62, 44, 44, 44, 44, 44, 44, 58, 43, 43, 43, 43, 43, 43, 43, 83, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 83, 71, 84, 85, 43, 43, 43, 83, 84, 85, 84, 70, 43, 43, 43, 36, 36, 36, 36, 36, 43, 2, 7, 7, 7, 7, 7, 86, 36, 36, 36, 36, 36, 36, 36, 70, 84, 
81, 36, 36, 36, 62, 81, 62, 81, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 62, 36, 36, 36, 62, 62, 44, 36, 36, 44, 71, 84, 85, 43, 80, 87, 88, 87, 85, 62, 44, 44, 44, 87, 44, 44, 36, 81, 36, 43, 44, 7, 7, 7, 7, 7, 36, 20, 27, 27, 27, 57, 44, 44, 58, 83, 81, 36, 36, 62, 44, 81, 62, 36, 81, 62, 36, 44, 80, 84, 85, 80, 44, 58, 80, 58, 43, 44, 58, 44, 44, 44, 81, 36, 62, 62, 44, 44, 44, 7, 7, 7, 7, 7, 43, 36, 70, 44, 44, 44, 44, 44, 58, 83, 81, 36, 36, 36, 36, 81, 36, 81, 36, 36, 36, 36, 36, 36, 62, 36, 81, 36, 36, 44, 71, 84, 85, 43, 43, 58, 83, 87, 85, 44, 62, 44, 44, 44, 44, 44, 44, 44, 66, 44, 44, 44, 81, 44, 44, 44, 58, 84, 81, 36, 36, 36, 62, 81, 62, 36, 81, 36, 36, 44, 71, 85, 85, 43, 80, 87, 88, 87, 85, 44, 44, 44, 44, 83, 44, 44, 36, 81, 78, 27, 27, 27, 44, 44, 44, 44, 44, 71, 81, 36, 36, 62, 44, 36, 62, 36, 36, 44, 81, 62, 62, 36, 44, 81, 62, 44, 36, 62, 44, 36, 36, 36, 36, 36, 36, 44, 44, 84, 83, 88, 44, 84, 88, 84, 85, 44, 62, 44, 44, 87, 44, 44, 44, 44, 27, 89, 67, 67, 57, 90, 44, 44, 83, 84, 81, 36, 36, 36, 62, 36, 62, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 44, 81, 43, 83, 84, 88, 43, 80, 43, 43, 44, 44, 44, 58, 80, 36, 62, 44, 44, 44, 44, 44, 44, 27, 27, 27, 89, 58, 84, 81, 36, 36, 36, 62, 36, 36, 36, 81, 36, 36, 44, 71, 85, 84, 84, 88, 83, 88, 84, 43, 44, 44, 44, 87, 88, 44, 44, 44, 62, 81, 62, 44, 44, 44, 44, 44, 44, 36, 36, 36, 36, 36, 62, 81, 84, 85, 43, 80, 84, 88, 84, 85, 62, 44, 44, 44, 87, 44, 44, 44, 81, 27, 27, 27, 44, 56, 36, 36, 36, 44, 84, 81, 36, 36, 36, 36, 36, 36, 36, 36, 62, 44, 36, 36, 36, 36, 81, 36, 36, 36, 36, 81, 44, 36, 36, 36, 62, 44, 80, 44, 87, 84, 43, 80, 80, 84, 84, 84, 84, 44, 84, 64, 44, 44, 44, 44, 44, 81, 36, 36, 36, 36, 36, 36, 36, 70, 36, 43, 43, 43, 80, 44, 91, 36, 36, 36, 75, 43, 43, 43, 61, 7, 7, 7, 7, 7, 2, 44, 44, 81, 62, 62, 81, 62, 62, 81, 44, 44, 44, 36, 36, 81, 36, 36, 36, 81, 36, 81, 81, 44, 36, 81, 36, 70, 36, 43, 43, 43, 58, 71, 44, 36, 36, 62, 82, 43, 43, 43, 44, 7, 7, 7, 7, 7, 44, 36, 36, 77, 67, 2, 
2, 2, 2, 2, 2, 2, 92, 92, 67, 43, 67, 67, 67, 7, 7, 7, 7, 7, 27, 27, 27, 27, 27, 50, 50, 50, 4, 4, 84, 36, 36, 36, 36, 81, 36, 36, 36, 36, 36, 36, 36, 36, 36, 62, 44, 58, 43, 43, 43, 43, 43, 43, 83, 43, 43, 61, 43, 36, 36, 70, 43, 43, 43, 43, 43, 58, 43, 43, 43, 43, 43, 43, 43, 43, 43, 80, 67, 67, 67, 67, 76, 67, 67, 90, 67, 2, 2, 92, 67, 21, 64, 44, 44, 36, 36, 36, 36, 36, 93, 85, 43, 83, 43, 43, 43, 85, 83, 85, 71, 7, 7, 7, 7, 7, 2, 2, 2, 36, 36, 36, 84, 43, 36, 36, 43, 71, 84, 94, 93, 84, 84, 84, 36, 70, 43, 71, 36, 36, 36, 36, 36, 36, 83, 85, 83, 84, 84, 85, 93, 7, 7, 7, 7, 7, 84, 85, 67, 11, 11, 11, 48, 44, 44, 48, 44, 36, 36, 36, 36, 36, 63, 69, 36, 36, 36, 36, 36, 62, 36, 36, 44, 36, 36, 36, 62, 62, 36, 36, 44, 62, 36, 36, 44, 36, 36, 36, 62, 62, 36, 36, 44, 36, 36, 36, 36, 36, 36, 36, 62, 36, 36, 36, 36, 36, 36, 36, 36, 36, 62, 58, 43, 2, 2, 2, 2, 95, 27, 27, 27, 27, 27, 27, 27, 27, 27, 96, 44, 67, 67, 67, 67, 67, 44, 44, 44, 11, 11, 11, 44, 16, 16, 16, 44, 97, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 63, 72, 98, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 99, 100, 44, 36, 36, 36, 36, 36, 63, 2, 101, 102, 36, 36, 36, 62, 44, 44, 44, 36, 36, 36, 36, 36, 36, 62, 36, 36, 43, 80, 44, 44, 44, 44, 44, 36, 43, 61, 64, 44, 44, 44, 44, 36, 43, 44, 44, 44, 44, 44, 44, 62, 43, 44, 44, 44, 44, 44, 44, 36, 36, 43, 85, 43, 43, 43, 84, 84, 84, 84, 83, 85, 43, 43, 43, 43, 43, 2, 86, 2, 66, 70, 44, 7, 7, 7, 7, 7, 44, 44, 44, 27, 27, 27, 27, 27, 44, 44, 44, 2, 2, 2, 103, 2, 60, 43, 68, 36, 104, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 44, 44, 44, 44, 36, 36, 36, 36, 70, 62, 44, 44, 36, 36, 36, 44, 44, 44, 44, 44, 36, 36, 36, 36, 36, 36, 36, 62, 43, 83, 84, 85, 83, 84, 44, 44, 84, 83, 84, 84, 85, 43, 44, 44, 90, 44, 2, 7, 7, 7, 7, 7, 36, 36, 36, 36, 36, 36, 36, 44, 36, 36, 36, 36, 36, 36, 44, 44, 36, 36, 36, 36, 36, 44, 44, 44, 7, 7, 7, 7, 7, 96, 44, 67, 67, 67, 67, 67, 67, 67, 67, 67, 36, 36, 36, 70, 83, 85, 44, 2, 36, 36, 93, 83, 43, 43, 43, 80, 83, 83, 85, 
43, 43, 43, 83, 84, 84, 85, 43, 43, 43, 43, 80, 58, 2, 2, 2, 86, 2, 2, 2, 44, 43, 43, 43, 43, 43, 43, 43, 105, 43, 43, 94, 36, 36, 36, 36, 36, 36, 36, 83, 43, 43, 83, 83, 84, 84, 83, 94, 36, 36, 36, 44, 44, 92, 67, 67, 67, 67, 50, 43, 43, 43, 43, 67, 67, 67, 67, 90, 44, 43, 94, 36, 36, 36, 36, 36, 36, 93, 43, 43, 84, 43, 85, 43, 36, 36, 36, 36, 83, 43, 84, 85, 85, 43, 84, 44, 44, 44, 44, 2, 2, 36, 36, 84, 84, 84, 84, 43, 43, 43, 43, 84, 43, 44, 54, 2, 2, 7, 7, 7, 7, 7, 44, 81, 36, 36, 36, 36, 36, 40, 40, 40, 2, 2, 2, 2, 2, 44, 44, 44, 44, 43, 61, 43, 43, 43, 43, 43, 43, 83, 43, 43, 43, 71, 36, 70, 36, 36, 84, 71, 62, 43, 44, 44, 44, 16, 16, 16, 16, 16, 16, 40, 40, 40, 40, 40, 40, 40, 45, 16, 16, 16, 16, 16, 16, 45, 16, 16, 16, 16, 16, 16, 16, 16, 106, 40, 40, 43, 43, 43, 44, 44, 44, 43, 43, 32, 32, 32, 16, 16, 16, 16, 32, 16, 16, 16, 16, 11, 11, 11, 11, 16, 16, 16, 44, 11, 11, 11, 44, 16, 16, 16, 16, 48, 48, 48, 48, 16, 16, 16, 16, 16, 16, 16, 44, 16, 16, 16, 16, 107, 107, 107, 107, 16, 16, 108, 16, 11, 11, 109, 110, 41, 16, 108, 16, 11, 11, 109, 41, 16, 16, 44, 16, 11, 11, 111, 41, 16, 16, 16, 16, 11, 11, 112, 41, 44, 16, 108, 16, 11, 11, 109, 113, 114, 114, 114, 114, 114, 115, 65, 65, 116, 116, 116, 2, 117, 118, 117, 118, 2, 2, 2, 2, 119, 65, 65, 120, 2, 2, 2, 2, 121, 122, 2, 123, 124, 2, 125, 126, 2, 2, 2, 2, 2, 9, 124, 2, 2, 2, 2, 127, 65, 65, 68, 65, 65, 65, 65, 65, 128, 44, 27, 27, 27, 8, 125, 129, 27, 27, 27, 27, 27, 8, 125, 100, 40, 40, 40, 40, 40, 40, 82, 44, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 130, 43, 43, 43, 43, 43, 43, 131, 51, 132, 51, 132, 43, 43, 43, 43, 43, 80, 44, 44, 44, 44, 44, 44, 44, 67, 133, 67, 134, 67, 34, 11, 16, 11, 32, 134, 67, 49, 11, 11, 67, 67, 67, 133, 133, 133, 11, 11, 135, 11, 11, 35, 36, 39, 67, 16, 11, 8, 8, 49, 16, 16, 26, 67, 136, 27, 27, 27, 27, 27, 27, 27, 27, 101, 101, 101, 101, 101, 101, 101, 101, 101, 137, 138, 101, 139, 67, 44, 44, 8, 8, 140, 67, 67, 8, 67, 67, 140, 26, 67, 140, 67, 67, 67, 140, 
67, 67, 67, 67, 67, 67, 67, 8, 67, 140, 140, 67, 67, 67, 67, 67, 67, 67, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 67, 67, 67, 67, 4, 4, 67, 67, 8, 67, 67, 67, 141, 142, 67, 67, 67, 67, 67, 67, 67, 67, 140, 67, 67, 67, 67, 67, 67, 26, 8, 8, 8, 8, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 8, 8, 8, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 90, 44, 44, 67, 67, 67, 90, 44, 44, 44, 44, 27, 27, 27, 27, 27, 27, 67, 67, 67, 67, 67, 67, 67, 27, 27, 27, 67, 67, 67, 26, 67, 67, 67, 67, 26, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 8, 8, 8, 8, 67, 67, 67, 67, 67, 67, 67, 26, 67, 67, 67, 67, 4, 4, 4, 4, 4, 4, 4, 27, 27, 27, 27, 27, 27, 27, 67, 67, 67, 67, 67, 67, 8, 8, 125, 143, 8, 8, 8, 8, 8, 8, 8, 4, 4, 4, 4, 4, 8, 125, 144, 144, 144, 144, 144, 144, 144, 144, 144, 144, 143, 8, 8, 8, 8, 8, 8, 8, 4, 4, 8, 8, 8, 8, 8, 8, 8, 8, 4, 8, 8, 8, 140, 26, 8, 8, 140, 67, 67, 67, 44, 67, 67, 67, 67, 67, 67, 67, 67, 44, 67, 67, 67, 67, 67, 67, 67, 67, 67, 44, 56, 67, 67, 67, 67, 67, 90, 67, 67, 67, 67, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 67, 67, 11, 11, 11, 11, 11, 11, 11, 47, 16, 16, 16, 16, 16, 16, 16, 108, 32, 11, 32, 34, 34, 34, 34, 11, 32, 32, 34, 16, 16, 16, 40, 11, 32, 32, 136, 67, 67, 134, 34, 145, 43, 32, 44, 44, 54, 2, 95, 2, 16, 16, 16, 53, 44, 44, 53, 44, 36, 36, 36, 36, 44, 44, 44, 52, 64, 44, 44, 44, 44, 44, 44, 58, 36, 36, 36, 62, 44, 44, 44, 44, 36, 36, 36, 62, 36, 36, 36, 62, 2, 117, 117, 2, 121, 122, 117, 2, 2, 2, 2, 6, 2, 103, 117, 2, 117, 4, 4, 4, 4, 2, 2, 86, 2, 2, 2, 2, 2, 116, 2, 2, 103, 146, 44, 44, 44, 44, 44, 44, 67, 67, 67, 67, 67, 56, 67, 67, 67, 67, 44, 44, 44, 44, 44, 44, 67, 67, 67, 44, 44, 44, 44, 44, 67, 67, 67, 67, 67, 67, 44, 44, 1, 2, 147, 148, 4, 4, 4, 4, 4, 67, 4, 4, 4, 4, 149, 150, 151, 101, 101, 101, 101, 43, 43, 84, 152, 40, 40, 67, 101, 153, 63, 67, 36, 36, 36, 62, 58, 154, 155, 69, 36, 36, 36, 36, 36, 63, 40, 69, 44, 44, 81, 36, 36, 36, 36, 36, 67, 27, 27, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 90, 27, 27, 
27, 27, 27, 67, 67, 67, 67, 67, 67, 67, 27, 27, 27, 27, 156, 27, 27, 27, 27, 27, 27, 27, 36, 36, 104, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 157, 2, 7, 7, 7, 7, 7, 36, 44, 44, 32, 32, 32, 32, 32, 32, 32, 70, 51, 158, 43, 43, 43, 43, 43, 86, 32, 32, 32, 32, 32, 32, 40, 43, 36, 36, 36, 101, 101, 101, 101, 101, 43, 2, 2, 2, 44, 44, 44, 44, 41, 41, 41, 155, 40, 40, 40, 40, 41, 32, 32, 32, 32, 32, 32, 32, 16, 32, 32, 32, 32, 32, 32, 32, 45, 16, 16, 16, 34, 34, 34, 32, 32, 32, 32, 32, 42, 159, 34, 35, 32, 32, 16, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 11, 11, 44, 11, 11, 32, 32, 44, 44, 44, 44, 44, 44, 44, 81, 40, 35, 36, 36, 36, 71, 36, 71, 36, 70, 36, 36, 36, 93, 85, 83, 67, 67, 44, 44, 27, 27, 27, 67, 160, 44, 44, 44, 36, 36, 2, 2, 44, 44, 44, 44, 84, 36, 36, 36, 36, 36, 36, 36, 36, 36, 84, 84, 84, 84, 84, 84, 84, 84, 80, 44, 44, 44, 44, 2, 43, 36, 36, 36, 2, 72, 72, 44, 36, 36, 36, 43, 43, 43, 43, 2, 36, 36, 36, 70, 43, 43, 43, 43, 43, 84, 44, 44, 44, 44, 44, 54, 36, 70, 84, 43, 43, 84, 83, 84, 161, 2, 2, 2, 2, 2, 2, 52, 7, 7, 7, 7, 7, 44, 44, 2, 36, 36, 70, 69, 36, 36, 36, 36, 7, 7, 7, 7, 7, 36, 36, 62, 36, 36, 36, 36, 70, 43, 43, 83, 85, 83, 85, 80, 44, 44, 44, 44, 36, 70, 36, 36, 36, 36, 83, 44, 7, 7, 7, 7, 7, 44, 2, 2, 69, 36, 36, 77, 67, 93, 83, 36, 71, 43, 71, 70, 71, 36, 36, 43, 70, 62, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 81, 104, 2, 36, 36, 36, 36, 36, 93, 43, 84, 2, 104, 162, 80, 44, 44, 44, 44, 81, 36, 36, 62, 81, 36, 36, 62, 81, 36, 36, 62, 44, 44, 44, 44, 16, 16, 16, 16, 16, 110, 40, 40, 16, 16, 16, 44, 44, 44, 44, 44, 36, 93, 85, 84, 83, 161, 85, 44, 36, 36, 44, 44, 44, 44, 44, 44, 36, 36, 36, 62, 44, 81, 36, 36, 163, 163, 163, 163, 163, 163, 163, 163, 164, 164, 164, 164, 164, 164, 164, 164, 16, 16, 16, 108, 44, 44, 44, 44, 44, 53, 16, 16, 44, 44, 81, 71, 36, 36, 36, 36, 165, 36, 36, 36, 36, 36, 36, 62, 36, 36, 62, 62, 36, 81, 62, 36, 36, 36, 36, 36, 36, 41, 41, 41, 41, 41, 41, 41, 41, 44, 44, 44, 44, 44, 44, 44, 44, 81, 36, 36, 36, 36, 36, 
36, 36, 36, 36, 36, 36, 36, 36, 144, 44, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 160, 44, 2, 2, 2, 166, 126, 44, 44, 44, 6, 167, 168, 144, 144, 144, 144, 144, 144, 144, 126, 166, 126, 2, 123, 169, 2, 64, 2, 2, 149, 144, 144, 126, 2, 170, 8, 171, 66, 2, 44, 44, 36, 36, 62, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 62, 79, 54, 2, 3, 2, 4, 5, 6, 2, 16, 16, 16, 16, 16, 17, 18, 125, 126, 4, 2, 36, 36, 36, 36, 36, 69, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 40, 44, 36, 36, 36, 44, 36, 36, 36, 44, 36, 36, 36, 44, 36, 62, 44, 20, 172, 57, 130, 26, 8, 140, 90, 44, 44, 44, 44, 79, 65, 67, 44, 36, 36, 36, 36, 36, 36, 81, 36, 36, 36, 36, 36, 36, 62, 36, 81, 2, 64, 44, 173, 27, 27, 27, 27, 27, 27, 44, 56, 67, 67, 67, 67, 101, 101, 139, 27, 89, 67, 67, 67, 67, 67, 67, 67, 67, 27, 90, 44, 90, 44, 44, 44, 44, 44, 44, 44, 67, 67, 67, 67, 67, 67, 50, 44, 174, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 44, 44, 27, 27, 44, 44, 44, 44, 44, 44, 148, 36, 36, 36, 36, 175, 44, 44, 36, 36, 36, 43, 43, 80, 44, 44, 36, 36, 36, 36, 36, 36, 36, 54, 36, 36, 44, 44, 36, 36, 36, 36, 176, 101, 101, 44, 44, 44, 44, 44, 11, 11, 11, 11, 16, 16, 16, 16, 36, 36, 44, 44, 44, 44, 44, 54, 36, 36, 36, 44, 62, 36, 36, 36, 36, 36, 36, 81, 62, 44, 62, 81, 36, 36, 36, 54, 27, 27, 27, 27, 36, 36, 36, 77, 156, 27, 27, 27, 44, 44, 44, 173, 27, 27, 27, 27, 36, 62, 36, 44, 44, 173, 27, 27, 36, 36, 36, 27, 27, 27, 44, 54, 36, 36, 36, 36, 36, 44, 44, 54, 36, 36, 36, 36, 44, 44, 27, 36, 44, 27, 27, 27, 27, 27, 27, 27, 70, 43, 58, 80, 44, 44, 43, 43, 36, 36, 81, 36, 81, 36, 36, 36, 36, 36, 44, 44, 43, 80, 44, 58, 27, 27, 27, 27, 44, 44, 44, 44, 2, 2, 2, 2, 64, 44, 44, 44, 36, 36, 36, 36, 36, 36, 177, 30, 36, 36, 36, 36, 36, 36, 177, 27, 36, 36, 36, 36, 78, 36, 36, 36, 36, 36, 70, 80, 44, 173, 27, 27, 2, 2, 2, 64, 44, 44, 44, 44, 36, 36, 36, 44, 54, 2, 2, 2, 36, 36, 36, 44, 27, 27, 27, 27, 36, 62, 44, 44, 27, 27, 27, 27, 36, 44, 44, 44, 54, 2, 64, 44, 44, 44, 44, 44, 173, 27, 27, 
27, 36, 36, 36, 36, 62, 44, 44, 44, 11, 47, 44, 44, 44, 44, 44, 44, 16, 108, 44, 44, 44, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 96, 85, 94, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 43, 43, 43, 43, 43, 43, 43, 61, 2, 2, 2, 44, 27, 27, 27, 7, 7, 7, 7, 7, 44, 44, 44, 44, 44, 44, 44, 58, 84, 85, 43, 83, 85, 61, 178, 2, 2, 44, 44, 44, 44, 44, 44, 44, 43, 71, 36, 36, 36, 36, 36, 36, 36, 36, 36, 70, 43, 43, 85, 43, 43, 43, 80, 7, 7, 7, 7, 7, 2, 2, 44, 44, 44, 44, 44, 44, 36, 70, 2, 62, 44, 44, 44, 44, 36, 93, 84, 43, 43, 43, 43, 83, 94, 36, 63, 2, 2, 43, 61, 44, 7, 7, 7, 7, 7, 63, 63, 2, 173, 27, 27, 27, 27, 27, 27, 27, 27, 27, 96, 44, 44, 44, 44, 44, 36, 36, 36, 36, 36, 36, 84, 85, 43, 84, 83, 43, 2, 2, 2, 44, 36, 36, 36, 62, 62, 36, 36, 81, 36, 36, 36, 36, 36, 36, 36, 81, 36, 36, 36, 36, 63, 44, 44, 44, 36, 36, 36, 36, 36, 36, 36, 70, 84, 85, 43, 43, 43, 80, 44, 44, 43, 84, 81, 36, 36, 36, 62, 81, 83, 84, 88, 87, 88, 87, 84, 44, 62, 44, 44, 87, 44, 44, 81, 36, 36, 84, 44, 43, 43, 43, 80, 44, 43, 43, 80, 44, 44, 44, 44, 44, 84, 85, 43, 43, 83, 83, 84, 85, 83, 43, 36, 72, 44, 44, 44, 44, 36, 36, 36, 36, 36, 36, 36, 93, 84, 43, 43, 44, 84, 84, 43, 85, 61, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 36, 36, 43, 44, 84, 85, 43, 43, 43, 83, 85, 85, 61, 2, 62, 44, 44, 44, 44, 44, 36, 36, 36, 36, 36, 70, 85, 84, 43, 43, 43, 85, 44, 44, 44, 44, 36, 36, 36, 36, 36, 44, 58, 43, 84, 43, 43, 85, 43, 43, 44, 44, 7, 7, 7, 7, 7, 27, 2, 92, 27, 96, 44, 44, 44, 44, 44, 81, 101, 101, 101, 101, 101, 101, 101, 175, 2, 2, 64, 44, 44, 44, 44, 44, 43, 43, 61, 44, 44, 44, 44, 44, 43, 43, 43, 61, 2, 2, 67, 67, 40, 40, 92, 44, 44, 44, 44, 44, 7, 7, 7, 7, 7, 173, 27, 27, 27, 81, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 44, 44, 81, 36, 93, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 88, 43, 74, 40, 40, 40, 40, 40, 40, 36, 44, 44, 44, 44, 44, 44, 44, 36, 36, 36, 36, 36, 44, 50, 61, 65, 65, 44, 44, 44, 44, 44, 44, 67, 67, 67, 90, 56, 67, 67, 67, 67, 67, 179, 85, 43, 
67, 179, 84, 84, 180, 65, 65, 65, 181, 43, 43, 43, 76, 50, 43, 43, 43, 67, 67, 67, 67, 67, 67, 67, 43, 43, 67, 67, 67, 67, 67, 90, 44, 44, 44, 67, 43, 76, 44, 44, 44, 44, 44, 27, 44, 44, 44, 44, 44, 44, 44, 11, 11, 11, 11, 11, 16, 16, 16, 16, 16, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 16, 16, 16, 108, 16, 16, 16, 16, 16, 11, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 47, 11, 44, 47, 48, 47, 48, 11, 47, 11, 11, 11, 11, 16, 16, 53, 53, 16, 16, 16, 53, 16, 16, 16, 16, 16, 16, 16, 11, 48, 11, 47, 48, 11, 11, 11, 47, 11, 11, 11, 47, 16, 16, 16, 16, 16, 11, 48, 11, 47, 11, 11, 47, 47, 44, 11, 11, 11, 47, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 11, 11, 11, 11, 11, 16, 16, 16, 16, 16, 16, 16, 16, 44, 11, 11, 11, 11, 31, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 33, 16, 16, 16, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 31, 16, 16, 16, 16, 33, 16, 16, 16, 11, 11, 11, 11, 31, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 33, 16, 16, 16, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 31, 16, 16, 16, 16, 33, 16, 16, 16, 11, 11, 11, 11, 31, 16, 16, 16, 16, 33, 16, 16, 16, 32, 44, 7, 7, 7, 7, 7, 7, 7, 7, 7, 43, 43, 43, 76, 67, 50, 43, 43, 43, 43, 43, 43, 43, 43, 76, 67, 67, 67, 50, 67, 67, 67, 67, 67, 67, 67, 76, 21, 2, 2, 44, 44, 44, 44, 44, 44, 44, 58, 43, 43, 36, 36, 62, 173, 27, 27, 27, 27, 43, 43, 43, 80, 44, 44, 44, 44, 36, 36, 81, 36, 36, 36, 36, 36, 81, 62, 62, 81, 81, 36, 36, 36, 36, 62, 36, 36, 81, 81, 44, 44, 44, 62, 44, 81, 81, 81, 81, 36, 81, 62, 62, 81, 81, 81, 81, 81, 81, 62, 62, 81, 36, 62, 36, 36, 36, 62, 36, 36, 81, 36, 62, 62, 36, 36, 36, 36, 36, 81, 36, 36, 81, 36, 81, 36, 36, 81, 36, 36, 8, 44, 44, 44, 44, 44, 44, 44, 56, 67, 67, 67, 67, 67, 67, 67, 44, 44, 44, 67, 67, 67, 67, 67, 67, 90, 44, 44, 44, 44, 44, 44, 67, 67, 67, 67, 67, 25, 41, 41, 67, 67, 56, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 90, 44, 67, 67, 90, 44, 44, 44, 44, 44, 67, 67, 67, 67, 44, 44, 44, 44, 67, 67, 67, 67, 67, 67, 67, 44, 79, 44, 
44, 44, 44, 44, 44, 44, 65, 65, 65, 65, 65, 65, 65, 65, 164, 164, 164, 164, 164, 164, 164, 44, }; static RE_UINT8 re_general_category_stage_5[] = { 15, 15, 12, 23, 23, 23, 25, 23, 20, 21, 23, 24, 23, 19, 9, 9, 24, 24, 24, 23, 23, 1, 1, 1, 1, 20, 23, 21, 26, 22, 26, 2, 2, 2, 2, 20, 24, 21, 24, 15, 25, 25, 27, 23, 26, 27, 5, 28, 24, 16, 27, 26, 27, 24, 11, 11, 26, 11, 5, 29, 11, 23, 1, 24, 1, 2, 2, 24, 2, 1, 2, 5, 5, 5, 1, 3, 3, 2, 5, 2, 4, 4, 26, 26, 4, 26, 6, 6, 0, 0, 4, 2, 1, 23, 1, 0, 0, 1, 24, 1, 27, 6, 7, 7, 0, 4, 0, 2, 0, 23, 19, 0, 0, 27, 27, 25, 0, 6, 19, 6, 23, 6, 6, 23, 5, 0, 5, 23, 23, 0, 16, 16, 23, 25, 27, 27, 16, 0, 4, 5, 5, 6, 6, 5, 23, 5, 6, 16, 6, 4, 4, 6, 6, 27, 5, 27, 27, 5, 0, 16, 6, 0, 0, 5, 4, 0, 6, 8, 8, 8, 8, 6, 23, 4, 0, 8, 8, 0, 11, 27, 27, 0, 0, 25, 23, 27, 5, 8, 8, 5, 23, 11, 11, 0, 19, 5, 12, 5, 5, 20, 21, 0, 10, 10, 10, 5, 19, 23, 5, 4, 7, 0, 2, 4, 3, 3, 2, 0, 3, 26, 2, 26, 0, 26, 1, 26, 26, 0, 12, 12, 12, 16, 19, 19, 28, 29, 20, 28, 13, 14, 16, 12, 23, 28, 29, 23, 23, 22, 22, 23, 24, 20, 21, 23, 23, 12, 11, 4, 21, 4, 25, 0, 6, 7, 7, 6, 1, 27, 27, 1, 27, 2, 2, 27, 10, 1, 2, 10, 10, 11, 24, 27, 27, 20, 21, 27, 21, 24, 21, 20, 2, 6, 20, 0, 27, 4, 5, 10, 19, 20, 21, 21, 27, 10, 19, 4, 10, 4, 6, 26, 26, 4, 27, 11, 4, 23, 7, 23, 26, 1, 25, 27, 8, 23, 4, 8, 18, 18, 17, 17, 5, 24, 23, 20, 19, 22, 22, 20, 22, 22, 24, 19, 24, 0, 24, 26, 0, 11, 6, 11, 10, 0, 23, 10, 5, 11, 23, 16, 27, 8, 8, 16, 16, 6, }; /* General_Category: 9628 bytes. 
*/

/* Looks up the General_Category entry for a code point using a five-stage
 * table. Each stage consumes one field of the code point's bits (from the
 * top: the bits above the low 11, then 4, 3, 3 and 1 bits). The value read
 * from one stage selects which block of the next stage to index.
 */
RE_UINT32 re_get_general_category(RE_UINT32 ch) {
    RE_UINT32 code;
    RE_UINT32 f;
    RE_UINT32 pos;
    RE_UINT32 value;

    /* Stage 1: indexed by the bits above the low 11; selects a block of 16. */
    f = ch >> 11;
    code = ch ^ (f << 11);
    pos = (RE_UINT32)re_general_category_stage_1[f] << 4;

    /* Stage 2: indexed by the next 4 bits; selects a block of 8. */
    f = code >> 7;
    code ^= f << 7;
    pos = (RE_UINT32)re_general_category_stage_2[pos + f] << 3;

    /* Stage 3: indexed by the next 3 bits; selects a block of 8. */
    f = code >> 4;
    code ^= f << 4;
    pos = (RE_UINT32)re_general_category_stage_3[pos + f] << 3;

    /* Stage 4: indexed by the next 3 bits; selects a pair in stage 5. */
    f = code >> 1;
    code ^= f << 1;
    pos = (RE_UINT32)re_general_category_stage_4[pos + f] << 1;

    /* Stage 5: indexed by the remaining low bit. */
    value = re_general_category_stage_5[pos + code];

    return value;
}

/* Block. */

static RE_UINT8 re_block_stage_1[] = {
    0, 1, 2, 3, 4, 5, 5, 5, 5, 5, 6, 7, 7, 8, 9, 10,
    11, 12, 13, 14, 15, 16, 17, 16, 16, 16, 16, 18, 16, 19, 20, 21,
    22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 23, 24, 25, 16, 16, 26,
    16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
    16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
    16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
    16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
    16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
    16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
    16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
    16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
    16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
    16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
    16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
    27, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
    28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28,
    29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29,
};

static RE_UINT8 re_block_stage_2[] = {
    0, 1, 2, 3, 4, 5, 6, 7, 8, 8, 9, 10, 11, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 28, 29, 30, 31, 31, 32, 32, 32, 33, 34, 34, 34, 34, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 50, 51, 51, 52, 53, 54, 55, 56, 56, 57, 57, 58, 59, 60,
61, 62, 62, 63, 64, 65, 65, 66, 67, 68, 68, 69, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 82, 83, 83, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 85, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 87, 87, 87, 87, 87, 87, 87, 87, 87, 88, 89, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 103, 104, 104, 104, 104, 104, 104, 104, 105, 106, 106, 106, 106, 106, 106, 106, 106, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 108, 108, 108, 108, 109, 110, 110, 110, 110, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 119, 126, 126, 126, 119, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 119, 119, 137, 119, 119, 119, 138, 139, 140, 141, 142, 143, 144, 119, 119, 145, 119, 146, 147, 148, 149, 119, 119, 150, 119, 119, 119, 151, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 152, 152, 152, 152, 152, 152, 152, 152, 153, 154, 155, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 156, 156, 156, 156, 156, 156, 156, 156, 157, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 158, 158, 158, 158, 158, 119, 119, 119, 119, 119, 119, 119, 119, 119, 
119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 159, 159, 159, 159, 160, 161, 162, 163, 119, 119, 119, 119, 119, 119, 164, 165, 166, 166, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 167, 168, 119, 119, 119, 119, 119, 119, 169, 169, 170, 170, 171, 119, 172, 119, 173, 173, 173, 173, 173, 173, 173, 173, 174, 174, 174, 174, 174, 175, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 176, 177, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 178, 178, 119, 119, 179, 180, 181, 181, 182, 182, 183, 183, 183, 183, 183, 183, 184, 185, 186, 187, 188, 188, 189, 189, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 191, 192, 192, 192, 192, 192, 192, 192, 192, 192, 192, 192, 192, 192, 192, 192, 192, 192, 192, 192, 192, 192, 192, 192, 192, 192, 192, 192, 192, 192, 192, 192, 192, 193, 194, 195, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 197, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 198, 198, 198, 198, 199, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 200, 119, 201, 202, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 203, 203, 203, 203, 203, 203, 
203, 203, 203, 203, 203, 203, 203, 203, 203, 203, 203, 203, 203, 203, 203, 203, 203, 203, 203, 203, 203, 203, 203, 203, 203, 203, 204, 204, 204, 204, 204, 204, 204, 204, 204, 204, 204, 204, 204, 204, 204, 204, 204, 204, 204, 204, 204, 204, 204, 204, 204, 204, 204, 204, 204, 204, 204, 204, }; static RE_UINT16 re_block_stage_3[] = { 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 9, 9, 9, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11, 11, 11, 11, 12, 12, 12, 12, 12, 12, 12, 12, 13, 13, 13, 13, 13, 14, 14, 14, 15, 15, 15, 15, 16, 16, 16, 16, 17, 17, 17, 17, 18, 18, 19, 19, 19, 19, 20, 20, 20, 20, 20, 20, 21, 21, 21, 21, 21, 21, 21, 21, 22, 22, 22, 22, 22, 22, 22, 22, 23, 23, 23, 23, 23, 23, 23, 23, 24, 24, 24, 24, 24, 24, 24, 24, 25, 25, 25, 25, 25, 25, 25, 25, 26, 26, 26, 26, 26, 26, 26, 26, 27, 27, 27, 27, 27, 27, 27, 27, 28, 28, 28, 28, 28, 28, 28, 28, 29, 29, 29, 29, 29, 29, 29, 29, 30, 30, 30, 30, 30, 30, 30, 30, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 33, 33, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 35, 35, 35, 35, 35, 35, 36, 36, 36, 36, 36, 36, 36, 36, 37, 37, 37, 37, 37, 37, 37, 37, 38, 38, 39, 39, 39, 39, 39, 39, 40, 40, 40, 40, 40, 40, 40, 40, 41, 41, 42, 42, 42, 42, 42, 42, 43, 43, 44, 44, 45, 45, 46, 46, 47, 47, 47, 47, 47, 47, 47, 47, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 49, 49, 49, 49, 49, 50, 50, 50, 50, 50, 51, 51, 51, 52, 52, 52, 52, 52, 52, 53, 53, 54, 54, 55, 55, 55, 55, 55, 55, 55, 55, 55, 56, 56, 56, 56, 56, 57, 57, 57, 57, 57, 57, 57, 57, 58, 58, 58, 58, 59, 59, 59, 59, 60, 60, 60, 60, 60, 61, 61, 61, 19, 19, 19, 19, 62, 63, 63, 63, 64, 64, 64, 64, 64, 64, 64, 64, 65, 65, 65, 65, 66, 66, 66, 66, 67, 67, 67, 67, 67, 67, 67, 67, 68, 68, 68, 68, 68, 68, 68, 68, 69, 69, 69, 69, 69, 69, 69, 70, 70, 70, 71, 71, 71, 72, 72, 72, 73, 73, 73, 
73, 73, 74, 74, 74, 74, 75, 75, 75, 75, 75, 75, 75, 76, 76, 76, 76, 76, 76, 76, 76, 77, 77, 77, 77, 77, 77, 77, 77, 78, 78, 78, 78, 79, 79, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 81, 81, 81, 81, 81, 81, 81, 81, 82, 82, 83, 83, 83, 83, 83, 83, 84, 84, 84, 84, 84, 84, 84, 84, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 86, 86, 86, 87, 88, 88, 88, 88, 88, 88, 88, 88, 89, 89, 89, 89, 89, 89, 89, 89, 90, 90, 90, 90, 90, 90, 90, 90, 91, 91, 91, 91, 91, 91, 91, 91, 92, 92, 92, 92, 92, 92, 92, 92, 93, 93, 93, 93, 93, 93, 94, 94, 95, 95, 95, 95, 95, 95, 95, 95, 96, 96, 96, 97, 97, 97, 97, 97, 98, 98, 98, 98, 98, 98, 99, 99, 100, 100, 100, 100, 100, 100, 100, 100, 101, 101, 101, 101, 101, 101, 101, 101, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 19, 103, 104, 104, 104, 104, 105, 105, 105, 105, 105, 105, 106, 106, 106, 106, 106, 106, 107, 107, 107, 108, 108, 108, 108, 108, 108, 109, 110, 110, 111, 111, 111, 112, 113, 113, 113, 113, 113, 113, 113, 113, 114, 114, 114, 114, 114, 114, 114, 114, 115, 115, 115, 115, 115, 115, 115, 115, 115, 115, 115, 115, 116, 116, 116, 116, 117, 117, 117, 117, 117, 117, 117, 117, 118, 118, 118, 118, 118, 118, 118, 118, 118, 119, 119, 119, 119, 120, 120, 120, 121, 121, 121, 121, 121, 121, 121, 121, 121, 121, 121, 121, 122, 122, 122, 122, 122, 122, 123, 123, 123, 123, 123, 123, 124, 124, 125, 125, 125, 125, 125, 125, 125, 125, 125, 125, 125, 125, 125, 125, 126, 126, 126, 127, 128, 128, 128, 128, 129, 129, 129, 129, 129, 129, 130, 130, 131, 131, 131, 132, 132, 132, 133, 133, 134, 134, 134, 134, 134, 134, 135, 135, 136, 136, 136, 136, 136, 136, 137, 137, 138, 138, 138, 138, 138, 138, 139, 139, 140, 140, 140, 141, 141, 141, 141, 142, 142, 142, 142, 142, 143, 143, 143, 143, 144, 144, 144, 144, 144, 144, 144, 144, 144, 144, 144, 145, 145, 145, 145, 145, 146, 146, 146, 146, 146, 146, 146, 146, 147, 147, 147, 147, 147, 147, 147, 147, 148, 148, 148, 148, 148, 148, 148, 148, 149, 149, 149, 149, 149, 149, 149, 149, 150, 150, 
150, 150, 150, 150, 150, 150, 151, 151, 151, 151, 151, 152, 152, 152, 152, 152, 152, 152, 152, 152, 152, 152, 153, 154, 155, 156, 156, 157, 157, 158, 158, 158, 158, 158, 158, 158, 158, 158, 159, 159, 159, 159, 159, 159, 159, 159, 159, 159, 159, 159, 159, 159, 159, 160, 161, 161, 161, 161, 161, 161, 161, 161, 162, 162, 162, 162, 162, 162, 162, 162, 163, 163, 163, 163, 164, 164, 164, 164, 164, 165, 165, 165, 165, 166, 166, 166, 19, 19, 19, 19, 19, 19, 19, 19, 167, 167, 168, 168, 168, 168, 169, 169, 170, 170, 170, 171, 171, 172, 172, 172, 173, 173, 174, 174, 174, 174, 19, 19, 175, 175, 175, 175, 175, 176, 176, 176, 177, 177, 177, 19, 19, 19, 19, 19, 178, 178, 178, 179, 179, 179, 179, 19, 180, 180, 180, 180, 180, 180, 180, 180, 181, 181, 181, 181, 182, 182, 183, 183, 184, 184, 184, 19, 19, 19, 185, 185, 186, 186, 187, 187, 19, 19, 19, 19, 188, 188, 189, 189, 189, 189, 189, 189, 190, 190, 190, 190, 190, 190, 191, 191, 192, 192, 19, 19, 193, 193, 193, 193, 194, 194, 194, 194, 195, 195, 196, 196, 197, 197, 197, 19, 19, 19, 19, 19, 198, 198, 198, 198, 198, 19, 19, 19, 199, 199, 199, 199, 199, 199, 199, 199, 19, 19, 19, 19, 19, 19, 200, 200, 201, 201, 201, 201, 201, 201, 201, 201, 202, 202, 202, 202, 202, 203, 203, 203, 204, 204, 204, 204, 204, 205, 205, 205, 206, 206, 206, 206, 206, 206, 207, 207, 208, 208, 208, 208, 208, 19, 19, 19, 209, 209, 209, 210, 210, 210, 210, 210, 211, 211, 211, 211, 211, 211, 211, 211, 212, 212, 212, 212, 212, 212, 19, 19, 213, 213, 213, 213, 213, 213, 213, 213, 214, 214, 214, 214, 214, 214, 19, 19, 215, 215, 215, 215, 215, 19, 19, 19, 216, 216, 216, 216, 19, 19, 19, 19, 19, 19, 217, 217, 217, 217, 217, 217, 19, 19, 19, 19, 218, 218, 218, 218, 219, 219, 219, 219, 219, 219, 219, 219, 220, 220, 220, 220, 220, 220, 220, 220, 221, 221, 221, 221, 221, 221, 221, 221, 221, 221, 221, 221, 221, 19, 19, 19, 222, 222, 222, 222, 222, 222, 222, 222, 222, 222, 222, 19, 19, 19, 19, 19, 223, 223, 223, 223, 223, 223, 223, 223, 224, 224, 224, 224, 224, 224, 224, 
224, 224, 224, 224, 224, 225, 225, 225, 19, 19, 19, 19, 19, 19, 226, 226, 226, 227, 227, 227, 227, 227, 227, 227, 227, 227, 19, 19, 19, 19, 19, 19, 19, 228, 228, 228, 228, 228, 228, 228, 228, 228, 228, 19, 19, 19, 19, 19, 19, 229, 229, 229, 229, 229, 229, 229, 229, 230, 230, 230, 230, 230, 230, 230, 230, 230, 230, 231, 19, 19, 19, 19, 19, 232, 232, 232, 232, 232, 232, 232, 232, 233, 233, 233, 233, 233, 233, 233, 233, 234, 234, 234, 234, 234, 19, 19, 19, 235, 235, 235, 235, 235, 235, 236, 236, 237, 237, 237, 237, 237, 237, 237, 237, 238, 238, 238, 238, 238, 238, 238, 238, 238, 238, 238, 19, 19, 19, 19, 19, 239, 239, 239, 239, 239, 239, 239, 239, 239, 239, 239, 239, 239, 239, 19, 19, 240, 240, 240, 240, 240, 240, 240, 240, 241, 241, 241, 242, 242, 242, 242, 242, 242, 242, 243, 243, 243, 243, 243, 243, 244, 244, 244, 244, 244, 244, 244, 244, 245, 245, 245, 245, 245, 245, 245, 245, 246, 246, 246, 246, 246, 246, 246, 246, 247, 247, 247, 247, 247, 248, 248, 248, 249, 249, 249, 249, 249, 249, 249, 249, 250, 250, 250, 250, 250, 250, 250, 250, 251, 251, 251, 251, 251, 251, 251, 251, 252, 252, 252, 252, 252, 252, 252, 252, 253, 253, 253, 253, 253, 253, 253, 253, 254, 254, 254, 254, 254, 254, 254, 254, 254, 254, 254, 254, 254, 254, 19, 19, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 257, 257, 257, 257, 257, 257, 257, 257, 257, 257, 257, 257, 257, 257, 257, 257, 257, 19, 19, 19, 19, 19, 258, 258, 258, 258, 258, 258, 258, 258, 258, 258, 19, 19, 19, 19, 19, 19, 259, 259, 259, 259, 259, 259, 259, 259, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 19, 261, 261, 261, 261, 261, 261, 261, 261, 262, 262, 262, 262, 262, 262, 262, 262, }; static RE_UINT16 re_block_stage_4[] = { 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8, 9, 9, 9, 9, 10, 10, 10, 10, 11, 11, 11, 11, 12, 12, 12, 12, 13, 13, 13, 13, 14, 14, 14, 14, 
15, 15, 15, 15, 16, 16, 16, 16, 17, 17, 17, 17, 18, 18, 18, 18, 19, 19, 19, 19, 20, 20, 20, 20, 21, 21, 21, 21, 22, 22, 22, 22, 23, 23, 23, 23, 24, 24, 24, 24, 25, 25, 25, 25, 26, 26, 26, 26, 27, 27, 27, 27, 28, 28, 28, 28, 29, 29, 29, 29, 30, 30, 30, 30, 31, 31, 31, 31, 32, 32, 32, 32, 33, 33, 33, 33, 34, 34, 34, 34, 35, 35, 35, 35, 36, 36, 36, 36, 37, 37, 37, 37, 38, 38, 38, 38, 39, 39, 39, 39, 40, 40, 40, 40, 41, 41, 41, 41, 42, 42, 42, 42, 43, 43, 43, 43, 44, 44, 44, 44, 45, 45, 45, 45, 46, 46, 46, 46, 47, 47, 47, 47, 48, 48, 48, 48, 49, 49, 49, 49, 50, 50, 50, 50, 51, 51, 51, 51, 52, 52, 52, 52, 53, 53, 53, 53, 54, 54, 54, 54, 55, 55, 55, 55, 56, 56, 56, 56, 57, 57, 57, 57, 58, 58, 58, 58, 59, 59, 59, 59, 60, 60, 60, 60, 61, 61, 61, 61, 62, 62, 62, 62, 63, 63, 63, 63, 64, 64, 64, 64, 65, 65, 65, 65, 66, 66, 66, 66, 67, 67, 67, 67, 68, 68, 68, 68, 69, 69, 69, 69, 70, 70, 70, 70, 71, 71, 71, 71, 72, 72, 72, 72, 73, 73, 73, 73, 74, 74, 74, 74, 75, 75, 75, 75, 76, 76, 76, 76, 77, 77, 77, 77, 78, 78, 78, 78, 79, 79, 79, 79, 80, 80, 80, 80, 81, 81, 81, 81, 82, 82, 82, 82, 83, 83, 83, 83, 84, 84, 84, 84, 85, 85, 85, 85, 86, 86, 86, 86, 87, 87, 87, 87, 88, 88, 88, 88, 89, 89, 89, 89, 90, 90, 90, 90, 91, 91, 91, 91, 92, 92, 92, 92, 93, 93, 93, 93, 94, 94, 94, 94, 95, 95, 95, 95, 96, 96, 96, 96, 97, 97, 97, 97, 98, 98, 98, 98, 99, 99, 99, 99, 100, 100, 100, 100, 101, 101, 101, 101, 102, 102, 102, 102, 103, 103, 103, 103, 104, 104, 104, 104, 105, 105, 105, 105, 106, 106, 106, 106, 107, 107, 107, 107, 108, 108, 108, 108, 109, 109, 109, 109, 110, 110, 110, 110, 111, 111, 111, 111, 112, 112, 112, 112, 113, 113, 113, 113, 114, 114, 114, 114, 115, 115, 115, 115, 116, 116, 116, 116, 117, 117, 117, 117, 118, 118, 118, 118, 119, 119, 119, 119, 120, 120, 120, 120, 121, 121, 121, 121, 122, 122, 122, 122, 123, 123, 123, 123, 124, 124, 124, 124, 125, 125, 125, 125, 126, 126, 126, 126, 127, 127, 127, 127, 128, 128, 128, 128, 129, 129, 129, 129, 130, 130, 130, 130, 131, 131, 131, 131, 
132, 132, 132, 132, 133, 133, 133, 133, 134, 134, 134, 134, 135, 135, 135, 135, 136, 136, 136, 136, 137, 137, 137, 137, 138, 138, 138, 138, 139, 139, 139, 139, 140, 140, 140, 140, 141, 141, 141, 141, 142, 142, 142, 142, 143, 143, 143, 143, 144, 144, 144, 144, 145, 145, 145, 145, 146, 146, 146, 146, 147, 147, 147, 147, 148, 148, 148, 148, 149, 149, 149, 149, 150, 150, 150, 150, 151, 151, 151, 151, 152, 152, 152, 152, 153, 153, 153, 153, 154, 154, 154, 154, 155, 155, 155, 155, 156, 156, 156, 156, 157, 157, 157, 157, 158, 158, 158, 158, 159, 159, 159, 159, 160, 160, 160, 160, 161, 161, 161, 161, 162, 162, 162, 162, 163, 163, 163, 163, 164, 164, 164, 164, 165, 165, 165, 165, 166, 166, 166, 166, 167, 167, 167, 167, 168, 168, 168, 168, 169, 169, 169, 169, 170, 170, 170, 170, 171, 171, 171, 171, 172, 172, 172, 172, 173, 173, 173, 173, 174, 174, 174, 174, 175, 175, 175, 175, 176, 176, 176, 176, 177, 177, 177, 177, 178, 178, 178, 178, 179, 179, 179, 179, 180, 180, 180, 180, 181, 181, 181, 181, 182, 182, 182, 182, 183, 183, 183, 183, 184, 184, 184, 184, 185, 185, 185, 185, 186, 186, 186, 186, 187, 187, 187, 187, 188, 188, 188, 188, 189, 189, 189, 189, 190, 190, 190, 190, 191, 191, 191, 191, 192, 192, 192, 192, 193, 193, 193, 193, 194, 194, 194, 194, 195, 195, 195, 195, 196, 196, 196, 196, 197, 197, 197, 197, 198, 198, 198, 198, 199, 199, 199, 199, 200, 200, 200, 200, 201, 201, 201, 201, 202, 202, 202, 202, 203, 203, 203, 203, 204, 204, 204, 204, 205, 205, 205, 205, 206, 206, 206, 206, 207, 207, 207, 207, 208, 208, 208, 208, 209, 209, 209, 209, 210, 210, 210, 210, 211, 211, 211, 211, 212, 212, 212, 212, 213, 213, 213, 213, 214, 214, 214, 214, 215, 215, 215, 215, 216, 216, 216, 216, 217, 217, 217, 217, 218, 218, 218, 218, 219, 219, 219, 219, 220, 220, 220, 220, 221, 221, 221, 221, 222, 222, 222, 222, 223, 223, 223, 223, 224, 224, 224, 224, 225, 225, 225, 225, 226, 226, 226, 226, 227, 227, 227, 227, 228, 228, 228, 228, 229, 229, 229, 229, 230, 230, 230, 230, 231, 231, 231, 231, 
232, 232, 232, 232, 233, 233, 233, 233, 234, 234, 234, 234, 235, 235, 235, 235, 236, 236, 236, 236, 237, 237, 237, 237, 238, 238, 238, 238, 239, 239, 239, 239, 240, 240, 240, 240, 241, 241, 241, 241, 242, 242, 242, 242, 243, 243, 243, 243, 244, 244, 244, 244, 245, 245, 245, 245, 246, 246, 246, 246, 247, 247, 247, 247, 248, 248, 248, 248, 249, 249, 249, 249, 250, 250, 250, 250, 251, 251, 251, 251, 252, 252, 252, 252, 253, 253, 253, 253, 254, 254, 254, 254, 255, 255, 255, 255, 256, 256, 256, 256, 257, 257, 257, 257, 258, 258, 258, 258, 259, 259, 259, 259, 260, 260, 260, 260, 261, 261, 261, 261, 262, 262, 262, 262, }; static RE_UINT16 re_block_stage_5[] = { 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8, 9, 9, 9, 9, 10, 10, 10, 10, 11, 11, 11, 11, 12, 12, 12, 12, 13, 13, 13, 13, 14, 14, 14, 14, 15, 15, 15, 15, 16, 16, 16, 16, 17, 17, 17, 17, 18, 18, 18, 18, 19, 19, 19, 19, 0, 0, 0, 0, 20, 20, 20, 20, 21, 21, 21, 21, 22, 22, 22, 22, 23, 23, 23, 23, 24, 24, 24, 24, 25, 25, 25, 25, 26, 26, 26, 26, 27, 27, 27, 27, 28, 28, 28, 28, 29, 29, 29, 29, 30, 30, 30, 30, 31, 31, 31, 31, 32, 32, 32, 32, 33, 33, 33, 33, 34, 34, 34, 34, 35, 35, 35, 35, 36, 36, 36, 36, 37, 37, 37, 37, 38, 38, 38, 38, 39, 39, 39, 39, 40, 40, 40, 40, 41, 41, 41, 41, 42, 42, 42, 42, 43, 43, 43, 43, 44, 44, 44, 44, 45, 45, 45, 45, 46, 46, 46, 46, 47, 47, 47, 47, 48, 48, 48, 48, 49, 49, 49, 49, 50, 50, 50, 50, 51, 51, 51, 51, 52, 52, 52, 52, 53, 53, 53, 53, 54, 54, 54, 54, 55, 55, 55, 55, 56, 56, 56, 56, 57, 57, 57, 57, 58, 58, 58, 58, 59, 59, 59, 59, 60, 60, 60, 60, 61, 61, 61, 61, 62, 62, 62, 62, 63, 63, 63, 63, 64, 64, 64, 64, 65, 65, 65, 65, 66, 66, 66, 66, 67, 67, 67, 67, 68, 68, 68, 68, 69, 69, 69, 69, 70, 70, 70, 70, 71, 71, 71, 71, 72, 72, 72, 72, 73, 73, 73, 73, 74, 74, 74, 74, 75, 75, 75, 75, 76, 76, 76, 76, 77, 77, 77, 77, 78, 78, 78, 78, 79, 79, 79, 79, 80, 80, 80, 80, 81, 81, 81, 81, 82, 82, 82, 82, 83, 83, 83, 83, 84, 84, 84, 84, 85, 85, 85, 85, 
86, 86, 86, 86, 87, 87, 87, 87, 88, 88, 88, 88, 89, 89, 89, 89, 90, 90, 90, 90, 91, 91, 91, 91, 92, 92, 92, 92, 93, 93, 93, 93, 94, 94, 94, 94, 95, 95, 95, 95, 96, 96, 96, 96, 97, 97, 97, 97, 98, 98, 98, 98, 99, 99, 99, 99, 100, 100, 100, 100, 101, 101, 101, 101, 102, 102, 102, 102, 103, 103, 103, 103, 104, 104, 104, 104, 105, 105, 105, 105, 106, 106, 106, 106, 107, 107, 107, 107, 108, 108, 108, 108, 109, 109, 109, 109, 110, 110, 110, 110, 111, 111, 111, 111, 112, 112, 112, 112, 113, 113, 113, 113, 114, 114, 114, 114, 115, 115, 115, 115, 116, 116, 116, 116, 117, 117, 117, 117, 118, 118, 118, 118, 119, 119, 119, 119, 120, 120, 120, 120, 121, 121, 121, 121, 122, 122, 122, 122, 123, 123, 123, 123, 124, 124, 124, 124, 125, 125, 125, 125, 126, 126, 126, 126, 127, 127, 127, 127, 128, 128, 128, 128, 129, 129, 129, 129, 130, 130, 130, 130, 131, 131, 131, 131, 132, 132, 132, 132, 133, 133, 133, 133, 134, 134, 134, 134, 135, 135, 135, 135, 136, 136, 136, 136, 137, 137, 137, 137, 138, 138, 138, 138, 139, 139, 139, 139, 140, 140, 140, 140, 141, 141, 141, 141, 142, 142, 142, 142, 143, 143, 143, 143, 144, 144, 144, 144, 145, 145, 145, 145, 146, 146, 146, 146, 147, 147, 147, 147, 148, 148, 148, 148, 149, 149, 149, 149, 150, 150, 150, 150, 151, 151, 151, 151, 152, 152, 152, 152, 153, 153, 153, 153, 154, 154, 154, 154, 155, 155, 155, 155, 156, 156, 156, 156, 157, 157, 157, 157, 158, 158, 158, 158, 159, 159, 159, 159, 160, 160, 160, 160, 161, 161, 161, 161, 162, 162, 162, 162, 163, 163, 163, 163, 164, 164, 164, 164, 165, 165, 165, 165, 166, 166, 166, 166, 167, 167, 167, 167, 168, 168, 168, 168, 169, 169, 169, 169, 170, 170, 170, 170, 171, 171, 171, 171, 172, 172, 172, 172, 173, 173, 173, 173, 174, 174, 174, 174, 175, 175, 175, 175, 176, 176, 176, 176, 177, 177, 177, 177, 178, 178, 178, 178, 179, 179, 179, 179, 180, 180, 180, 180, 181, 181, 181, 181, 182, 182, 182, 182, 183, 183, 183, 183, 184, 184, 184, 184, 185, 185, 185, 185, 186, 186, 186, 186, 187, 187, 187, 187, 188, 188, 188, 
188, 189, 189, 189, 189, 190, 190, 190, 190, 191, 191, 191, 191, 192, 192, 192, 192, 193, 193, 193, 193, 194, 194, 194, 194, 195, 195, 195, 195, 196, 196, 196, 196, 197, 197, 197, 197, 198, 198, 198, 198, 199, 199, 199, 199, 200, 200, 200, 200, 201, 201, 201, 201, 202, 202, 202, 202, 203, 203, 203, 203, 204, 204, 204, 204, 205, 205, 205, 205, 206, 206, 206, 206, 207, 207, 207, 207, 208, 208, 208, 208, 209, 209, 209, 209, 210, 210, 210, 210, 211, 211, 211, 211, 212, 212, 212, 212, 213, 213, 213, 213, 214, 214, 214, 214, 215, 215, 215, 215, 216, 216, 216, 216, 217, 217, 217, 217, 218, 218, 218, 218, 219, 219, 219, 219, 220, 220, 220, 220, 221, 221, 221, 221, 222, 222, 222, 222, 223, 223, 223, 223, 224, 224, 224, 224, 225, 225, 225, 225, 226, 226, 226, 226, 227, 227, 227, 227, 228, 228, 228, 228, 229, 229, 229, 229, 230, 230, 230, 230, 231, 231, 231, 231, 232, 232, 232, 232, 233, 233, 233, 233, 234, 234, 234, 234, 235, 235, 235, 235, 236, 236, 236, 236, 237, 237, 237, 237, 238, 238, 238, 238, 239, 239, 239, 239, 240, 240, 240, 240, 241, 241, 241, 241, 242, 242, 242, 242, 243, 243, 243, 243, 244, 244, 244, 244, 245, 245, 245, 245, 246, 246, 246, 246, 247, 247, 247, 247, 248, 248, 248, 248, 249, 249, 249, 249, 250, 250, 250, 250, 251, 251, 251, 251, 252, 252, 252, 252, 253, 253, 253, 253, 254, 254, 254, 254, 255, 255, 255, 255, 256, 256, 256, 256, 257, 257, 257, 257, 258, 258, 258, 258, 259, 259, 259, 259, 260, 260, 260, 260, 261, 261, 261, 261, 262, 262, 262, 262,
};

/* Block: 8720 bytes. */

RE_UINT32 re_get_block(RE_UINT32 ch) {
    RE_UINT32 code;
    RE_UINT32 f;
    RE_UINT32 pos;
    RE_UINT32 value;

    /* Stage 1 is indexed by the bits of ch above the low 12. */
    f = ch >> 12;
    code = ch ^ (f << 12);
    pos = (RE_UINT32)re_block_stage_1[f] << 5;
    /* Stage 2 is indexed by the next 5 bits. */
    f = code >> 7;
    code ^= f << 7;
    pos = (RE_UINT32)re_block_stage_2[pos + f] << 3;
    /* Stage 3 is indexed by the next 3 bits. */
    f = code >> 4;
    code ^= f << 4;
    pos = (RE_UINT32)re_block_stage_3[pos + f] << 2;
    /* Stage 4 is indexed by the next 2 bits. */
    f = code >> 2;
    code ^= f << 2;
    pos = (RE_UINT32)re_block_stage_4[pos + f] << 2;
    /* Stage 5 is indexed by the remaining low 2 bits. */
    value = re_block_stage_5[pos + code];

    return value;
}

/* Script.
*/ static RE_UINT8 re_script_stage_1[] = { 0, 1, 2, 3, 4, 5, 6, 7, 7, 8, 7, 7, 7, 7, 7, 7, 7, 7, 7, 9, 10, 11, 12, 12, 12, 12, 13, 14, 14, 14, 14, 15, 16, 17, 18, 19, 20, 14, 21, 14, 22, 14, 14, 14, 14, 23, 14, 14, 14, 14, 14, 14, 14, 14, 24, 25, 14, 14, 26, 27, 14, 28, 29, 30, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 31, 7, 32, 33, 7, 34, 14, 14, 14, 14, 14, 35, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 36, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 
14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, }; static RE_UINT8 re_script_stage_2[] = { 0, 1, 2, 2, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 32, 33, 34, 35, 36, 37, 37, 37, 37, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 2, 2, 53, 54, 55, 56, 57, 58, 59, 59, 59, 60, 61, 59, 59, 59, 59, 59, 59, 59, 62, 62, 59, 59, 59, 59, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 59, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 80, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 81, 82, 82, 82, 82, 82, 82, 82, 82, 82, 83, 84, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 97, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 71, 71, 99, 100, 101, 102, 103, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 98, 114, 115, 116, 117, 118, 119, 98, 120, 120, 121, 98, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 98, 98, 132, 98, 98, 98, 133, 134, 135, 136, 137, 138, 139, 98, 98, 140, 98, 141, 142, 143, 144, 98, 98, 145, 98, 98, 98, 146, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 147, 147, 147, 147, 147, 147, 147, 148, 149, 147, 150, 98, 98, 98, 98, 98, 151, 151, 151, 151, 151, 151, 151, 151, 152, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 153, 153, 153, 153, 154, 98, 98, 98, 155, 155, 155, 155, 156, 157, 158, 159, 98, 98, 98, 98, 98, 98, 160, 161, 162, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 163, 164, 98, 98, 98, 98, 98, 98, 59, 165, 166, 167, 168, 98, 169, 98, 
170, 171, 172, 59, 59, 173, 59, 174, 175, 175, 175, 175, 175, 176, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 177, 178, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 179, 180, 98, 98, 181, 182, 183, 184, 185, 98, 59, 59, 59, 59, 186, 187, 59, 188, 189, 190, 191, 192, 193, 194, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 195, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 196, 71, 197, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 198, 98, 98, 71, 71, 71, 71, 199, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 200, 98, 201, 202, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, }; static RE_UINT16 re_script_stage_3[] = { 0, 0, 0, 0, 1, 2, 1, 2, 0, 0, 3, 3, 4, 5, 4, 5, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 6, 0, 0, 7, 0, 8, 8, 8, 8, 8, 8, 8, 9, 10, 11, 12, 11, 11, 11, 13, 11, 14, 14, 14, 14, 14, 14, 14, 14, 15, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 16, 17, 18, 16, 17, 19, 20, 21, 21, 22, 21, 23, 24, 25, 26, 27, 27, 28, 29, 27, 30, 27, 27, 27, 27, 27, 31, 27, 27, 32, 33, 33, 33, 34, 27, 27, 27, 35, 35, 35, 36, 37, 37, 37, 38, 39, 39, 40, 41, 42, 43, 44, 44, 44, 44, 27, 45, 44, 44, 46, 27, 47, 47, 47, 47, 47, 48, 49, 47, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 123, 124, 123, 125, 44, 44, 126, 127, 128, 129, 130, 131, 44, 44, 132, 132, 132, 132, 133, 132, 134, 135, 132, 133, 132, 136, 136, 137, 44, 44, 138, 138, 138, 138, 138, 138, 138, 138, 138, 138, 139, 139, 140, 139, 139, 141, 142, 142, 142, 142, 142, 142, 142, 142, 143, 143, 143, 143, 144, 145, 143, 143, 144, 143, 143, 146, 147, 148, 143, 143, 143, 147, 143, 143, 143, 149, 143, 150, 143, 151, 152, 152, 152, 152, 152, 
153, 154, 154, 154, 154, 154, 154, 154, 154, 155, 156, 157, 157, 157, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 168, 168, 168, 168, 169, 170, 170, 171, 172, 173, 173, 173, 173, 173, 174, 173, 173, 175, 154, 154, 154, 154, 176, 177, 178, 179, 179, 180, 181, 182, 183, 184, 184, 185, 184, 186, 187, 168, 168, 188, 189, 190, 190, 190, 191, 190, 192, 193, 193, 194, 195, 44, 44, 44, 44, 196, 196, 196, 196, 197, 196, 196, 198, 199, 199, 199, 199, 200, 200, 200, 201, 202, 202, 202, 203, 204, 205, 205, 205, 44, 44, 44, 44, 206, 207, 208, 209, 4, 4, 210, 4, 4, 211, 212, 213, 4, 4, 4, 214, 8, 8, 8, 215, 11, 216, 11, 11, 216, 217, 11, 218, 11, 11, 11, 219, 219, 220, 11, 221, 222, 0, 0, 0, 0, 0, 223, 224, 225, 226, 0, 225, 44, 8, 8, 227, 0, 0, 228, 229, 230, 0, 4, 4, 231, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 232, 0, 0, 233, 44, 232, 44, 0, 0, 234, 234, 234, 234, 234, 234, 234, 234, 0, 0, 0, 0, 0, 0, 0, 235, 0, 236, 0, 237, 238, 239, 240, 44, 241, 241, 242, 241, 241, 242, 4, 4, 243, 243, 243, 243, 243, 243, 243, 244, 139, 139, 140, 245, 245, 245, 246, 247, 143, 248, 249, 249, 249, 249, 14, 14, 0, 0, 0, 0, 250, 44, 44, 44, 251, 252, 251, 251, 251, 251, 251, 253, 251, 251, 251, 251, 251, 251, 251, 251, 251, 251, 251, 251, 251, 254, 44, 255, 256, 0, 257, 258, 259, 260, 260, 260, 260, 261, 262, 263, 263, 263, 263, 264, 265, 266, 267, 268, 142, 142, 142, 142, 269, 0, 266, 270, 0, 0, 271, 263, 142, 269, 0, 0, 0, 0, 142, 272, 0, 0, 0, 0, 0, 263, 263, 273, 263, 263, 263, 263, 263, 274, 0, 0, 251, 251, 251, 254, 0, 0, 0, 0, 251, 251, 251, 251, 251, 254, 44, 44, 275, 275, 275, 275, 275, 275, 275, 275, 276, 275, 275, 275, 277, 278, 278, 278, 279, 279, 279, 279, 279, 279, 279, 279, 279, 279, 280, 44, 14, 14, 14, 14, 14, 14, 281, 281, 281, 281, 281, 282, 0, 0, 283, 4, 4, 4, 4, 4, 284, 4, 285, 286, 44, 44, 44, 287, 288, 288, 289, 290, 291, 291, 291, 292, 293, 293, 293, 293, 294, 295, 47, 296, 297, 297, 298, 299, 299, 300, 142, 301, 302, 302, 302, 
302, 303, 304, 138, 305, 306, 306, 306, 307, 308, 309, 138, 138, 310, 310, 310, 310, 311, 312, 313, 314, 315, 316, 249, 4, 4, 317, 318, 152, 152, 152, 152, 152, 313, 313, 319, 320, 142, 142, 321, 142, 322, 142, 142, 323, 44, 44, 44, 44, 44, 44, 44, 44, 251, 251, 251, 251, 251, 251, 324, 251, 251, 251, 251, 251, 251, 325, 44, 44, 326, 327, 21, 328, 329, 27, 27, 27, 27, 27, 27, 27, 330, 46, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 331, 44, 27, 27, 27, 27, 332, 27, 27, 333, 44, 44, 334, 8, 290, 335, 0, 0, 336, 337, 338, 27, 27, 27, 27, 27, 27, 27, 339, 340, 0, 1, 2, 1, 2, 341, 262, 263, 342, 142, 269, 343, 344, 345, 346, 347, 348, 349, 350, 351, 351, 44, 44, 348, 348, 348, 348, 348, 348, 348, 352, 353, 0, 0, 354, 11, 11, 11, 11, 355, 255, 356, 44, 44, 0, 0, 357, 358, 359, 360, 360, 360, 361, 362, 255, 363, 363, 364, 365, 366, 367, 367, 368, 369, 370, 371, 371, 372, 373, 44, 44, 374, 374, 374, 374, 374, 375, 375, 375, 376, 377, 378, 44, 44, 44, 44, 44, 379, 379, 380, 381, 381, 381, 382, 44, 383, 383, 383, 383, 383, 383, 383, 383, 383, 383, 383, 384, 383, 385, 386, 44, 387, 388, 388, 389, 390, 391, 392, 392, 393, 394, 395, 44, 44, 44, 396, 397, 398, 399, 400, 401, 44, 44, 44, 44, 402, 402, 403, 404, 403, 405, 403, 403, 406, 407, 408, 409, 410, 411, 412, 412, 413, 413, 44, 44, 414, 414, 415, 416, 417, 417, 417, 418, 419, 420, 421, 422, 423, 424, 425, 44, 44, 44, 44, 44, 426, 426, 426, 426, 427, 44, 44, 44, 428, 428, 428, 429, 428, 428, 428, 430, 44, 44, 44, 44, 44, 44, 27, 431, 432, 432, 432, 432, 433, 434, 432, 435, 436, 436, 436, 436, 437, 438, 439, 440, 441, 441, 441, 442, 443, 444, 444, 445, 446, 446, 446, 446, 447, 446, 448, 449, 450, 451, 450, 452, 44, 44, 44, 44, 453, 454, 455, 456, 456, 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466, 467, 467, 467, 467, 468, 469, 44, 44, 470, 470, 470, 471, 470, 472, 44, 44, 473, 473, 473, 473, 474, 475, 44, 44, 476, 476, 476, 477, 478, 44, 44, 44, 479, 480, 481, 479, 44, 44, 44, 44, 44, 44, 482, 482, 482, 482, 
482, 483, 44, 44, 44, 44, 484, 484, 484, 485, 486, 486, 486, 486, 486, 486, 486, 486, 486, 487, 44, 44, 44, 44, 44, 44, 486, 486, 486, 486, 486, 486, 488, 489, 486, 486, 486, 486, 490, 44, 44, 44, 491, 491, 491, 491, 491, 491, 491, 491, 491, 491, 492, 44, 44, 44, 44, 44, 493, 493, 493, 493, 493, 493, 493, 493, 493, 493, 493, 493, 494, 44, 44, 44, 281, 281, 281, 281, 281, 281, 281, 281, 281, 281, 281, 495, 496, 497, 498, 44, 44, 44, 44, 44, 44, 499, 500, 501, 502, 502, 502, 502, 503, 504, 505, 506, 502, 44, 44, 44, 44, 44, 44, 44, 507, 507, 507, 507, 508, 507, 507, 509, 510, 507, 44, 44, 44, 44, 44, 44, 511, 44, 44, 44, 44, 44, 44, 44, 512, 512, 512, 512, 512, 512, 513, 514, 515, 516, 271, 44, 44, 44, 44, 44, 0, 0, 0, 0, 0, 0, 0, 517, 0, 0, 518, 0, 0, 0, 519, 520, 521, 0, 522, 0, 0, 0, 523, 44, 11, 11, 11, 11, 524, 44, 44, 44, 0, 0, 0, 0, 0, 233, 0, 239, 0, 0, 0, 0, 0, 223, 0, 0, 0, 525, 526, 527, 528, 0, 0, 0, 529, 530, 0, 531, 532, 533, 0, 0, 0, 0, 236, 0, 0, 0, 0, 0, 0, 0, 0, 0, 534, 0, 0, 0, 535, 535, 535, 535, 535, 535, 535, 535, 536, 537, 538, 44, 44, 44, 44, 44, 539, 539, 539, 539, 539, 539, 539, 539, 539, 539, 539, 539, 540, 541, 44, 44, 542, 27, 543, 544, 545, 546, 547, 548, 549, 550, 551, 550, 44, 44, 44, 330, 0, 0, 255, 0, 0, 0, 0, 0, 0, 271, 225, 340, 340, 340, 0, 517, 552, 0, 225, 0, 0, 0, 255, 0, 0, 232, 44, 44, 44, 44, 553, 0, 554, 0, 0, 232, 523, 239, 44, 44, 0, 0, 0, 0, 0, 0, 0, 555, 0, 0, 528, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 556, 552, 271, 0, 0, 0, 0, 0, 0, 0, 271, 0, 0, 0, 0, 0, 557, 44, 44, 255, 0, 0, 0, 558, 290, 0, 0, 558, 0, 559, 44, 44, 44, 44, 44, 44, 523, 44, 44, 44, 44, 44, 44, 557, 44, 44, 44, 556, 44, 44, 44, 251, 251, 251, 251, 251, 560, 44, 44, 251, 251, 251, 561, 251, 251, 251, 251, 251, 324, 251, 251, 251, 251, 251, 251, 251, 251, 562, 44, 44, 44, 44, 44, 251, 324, 44, 44, 44, 44, 44, 44, 563, 44, 0, 0, 0, 0, 0, 0, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 44, }; static RE_UINT16 re_script_stage_4[] = { 0, 0, 0, 0, 1, 2, 2, 2, 2, 2, 
3, 0, 0, 0, 4, 0, 2, 2, 2, 2, 2, 3, 2, 2, 2, 2, 5, 0, 2, 5, 6, 0, 7, 7, 7, 7, 8, 9, 10, 11, 12, 13, 14, 15, 8, 8, 8, 8, 16, 8, 8, 8, 17, 18, 18, 18, 19, 19, 19, 19, 19, 20, 19, 19, 21, 22, 22, 22, 22, 22, 22, 22, 22, 23, 21, 22, 22, 22, 24, 21, 25, 26, 26, 26, 26, 26, 26, 26, 26, 26, 12, 12, 26, 26, 27, 12, 26, 28, 12, 12, 29, 30, 29, 31, 29, 29, 32, 33, 29, 29, 29, 29, 31, 29, 34, 7, 7, 35, 29, 29, 36, 29, 29, 29, 29, 29, 29, 30, 37, 37, 37, 38, 37, 37, 37, 37, 37, 37, 39, 40, 41, 41, 41, 41, 42, 12, 12, 12, 43, 43, 43, 43, 43, 43, 44, 12, 45, 45, 45, 45, 45, 45, 45, 46, 45, 45, 45, 47, 48, 48, 48, 48, 48, 48, 48, 49, 12, 12, 12, 12, 29, 50, 12, 12, 51, 29, 29, 29, 52, 52, 52, 52, 53, 52, 52, 52, 52, 54, 52, 52, 55, 56, 55, 57, 57, 55, 55, 55, 55, 55, 58, 55, 59, 60, 61, 55, 55, 57, 57, 62, 12, 63, 12, 64, 55, 60, 55, 55, 55, 55, 55, 12, 65, 65, 66, 67, 68, 69, 69, 69, 69, 69, 70, 69, 70, 71, 72, 70, 66, 67, 68, 72, 73, 12, 65, 74, 12, 75, 69, 69, 69, 72, 12, 12, 76, 76, 77, 78, 78, 77, 77, 77, 77, 77, 79, 77, 79, 76, 80, 77, 77, 78, 78, 80, 81, 12, 12, 12, 77, 82, 77, 77, 80, 12, 83, 12, 84, 84, 85, 86, 86, 85, 85, 85, 85, 85, 87, 85, 87, 84, 88, 85, 85, 86, 86, 88, 12, 89, 12, 90, 85, 89, 85, 85, 85, 85, 12, 12, 91, 92, 93, 91, 94, 95, 96, 94, 97, 98, 93, 91, 99, 99, 95, 91, 93, 91, 94, 95, 98, 97, 12, 12, 12, 91, 99, 99, 99, 99, 93, 12, 100, 101, 100, 102, 102, 100, 100, 100, 100, 100, 102, 100, 100, 100, 103, 101, 100, 102, 102, 103, 12, 104, 105, 12, 100, 106, 100, 100, 12, 12, 100, 100, 107, 107, 108, 109, 109, 108, 108, 108, 108, 108, 109, 108, 108, 107, 110, 108, 108, 109, 109, 110, 12, 111, 12, 112, 108, 113, 108, 108, 111, 12, 12, 12, 114, 114, 115, 116, 116, 115, 115, 115, 115, 115, 115, 115, 115, 115, 117, 114, 115, 116, 116, 117, 12, 118, 12, 118, 115, 119, 115, 115, 115, 120, 114, 115, 121, 122, 123, 123, 123, 124, 121, 123, 123, 123, 123, 123, 125, 123, 123, 126, 123, 124, 127, 128, 123, 129, 123, 123, 12, 121, 123, 123, 121, 130, 12, 12, 131, 132, 
132, 132, 132, 132, 132, 132, 132, 132, 133, 134, 132, 132, 132, 12, 135, 136, 137, 138, 12, 139, 140, 139, 140, 141, 142, 140, 139, 139, 143, 144, 139, 137, 139, 144, 139, 139, 144, 139, 145, 145, 145, 145, 145, 145, 146, 145, 145, 145, 145, 147, 146, 145, 145, 145, 145, 145, 145, 148, 145, 149, 150, 12, 151, 151, 151, 151, 152, 152, 152, 152, 152, 153, 12, 154, 152, 152, 155, 152, 156, 156, 156, 156, 157, 157, 157, 157, 157, 157, 158, 159, 157, 160, 158, 159, 158, 159, 157, 160, 158, 159, 157, 157, 157, 160, 157, 157, 157, 157, 160, 161, 157, 157, 157, 162, 157, 157, 159, 12, 163, 163, 163, 163, 163, 164, 163, 164, 165, 165, 165, 165, 166, 166, 166, 166, 166, 166, 166, 167, 168, 168, 168, 168, 168, 168, 169, 170, 168, 168, 171, 12, 172, 172, 172, 173, 172, 174, 12, 12, 175, 175, 175, 175, 175, 176, 12, 12, 177, 177, 177, 177, 177, 12, 12, 12, 178, 178, 178, 179, 179, 12, 12, 12, 180, 180, 180, 180, 180, 180, 180, 181, 180, 180, 181, 12, 182, 183, 184, 185, 184, 184, 186, 12, 184, 184, 184, 184, 184, 184, 12, 12, 184, 184, 185, 12, 165, 187, 12, 12, 188, 188, 188, 188, 188, 188, 188, 189, 188, 188, 188, 12, 190, 188, 188, 188, 191, 191, 191, 191, 191, 191, 191, 192, 191, 193, 12, 12, 194, 194, 194, 194, 194, 194, 194, 12, 194, 194, 195, 12, 194, 194, 196, 197, 198, 198, 198, 198, 198, 198, 198, 199, 200, 200, 200, 200, 200, 200, 200, 201, 200, 200, 200, 202, 200, 200, 203, 12, 200, 200, 200, 203, 7, 7, 7, 204, 205, 205, 205, 205, 205, 205, 205, 12, 205, 205, 205, 206, 207, 207, 207, 207, 208, 208, 208, 208, 208, 12, 12, 208, 209, 209, 209, 209, 209, 209, 210, 209, 209, 209, 211, 212, 213, 213, 213, 213, 207, 207, 12, 12, 214, 7, 7, 7, 215, 7, 216, 217, 0, 218, 219, 12, 2, 220, 221, 2, 2, 2, 2, 222, 223, 220, 224, 2, 2, 2, 225, 2, 2, 2, 2, 226, 7, 219, 12, 7, 8, 227, 8, 227, 8, 8, 228, 228, 8, 8, 8, 227, 8, 15, 8, 8, 8, 10, 8, 229, 10, 15, 8, 14, 0, 0, 0, 230, 0, 231, 0, 0, 232, 0, 0, 233, 0, 0, 0, 234, 2, 2, 2, 235, 236, 12, 12, 12, 0, 237, 238, 0, 4, 0, 0, 0, 0, 
0, 0, 4, 2, 2, 5, 12, 0, 0, 234, 12, 0, 234, 12, 12, 239, 239, 239, 239, 0, 240, 0, 0, 0, 241, 0, 0, 0, 0, 241, 242, 0, 0, 231, 0, 241, 12, 12, 12, 12, 12, 12, 0, 243, 243, 243, 243, 243, 243, 243, 244, 18, 18, 18, 18, 18, 12, 245, 18, 246, 246, 246, 246, 246, 246, 12, 247, 248, 12, 12, 247, 157, 160, 12, 12, 157, 160, 157, 160, 234, 12, 12, 12, 249, 249, 249, 249, 249, 249, 250, 249, 249, 12, 12, 12, 249, 251, 12, 12, 0, 0, 0, 12, 0, 252, 0, 0, 253, 249, 254, 255, 0, 0, 249, 0, 256, 257, 257, 257, 257, 257, 257, 257, 257, 258, 259, 260, 261, 262, 262, 262, 262, 262, 262, 262, 262, 262, 263, 261, 12, 264, 265, 265, 265, 265, 265, 265, 265, 265, 265, 266, 267, 156, 156, 156, 156, 156, 156, 268, 265, 265, 269, 12, 0, 12, 12, 12, 156, 156, 156, 270, 262, 262, 262, 271, 262, 262, 0, 0, 272, 272, 272, 272, 272, 272, 272, 273, 272, 274, 12, 12, 275, 275, 275, 275, 276, 276, 276, 276, 276, 276, 276, 12, 277, 277, 277, 277, 277, 277, 12, 12, 238, 2, 2, 2, 2, 2, 233, 2, 2, 2, 2, 278, 2, 2, 12, 12, 12, 279, 2, 2, 280, 280, 280, 280, 280, 280, 280, 12, 0, 0, 241, 12, 281, 281, 281, 281, 281, 281, 12, 12, 282, 282, 282, 282, 282, 283, 12, 284, 282, 282, 285, 12, 52, 52, 52, 286, 287, 287, 287, 287, 287, 287, 287, 288, 289, 289, 289, 289, 289, 12, 12, 290, 156, 156, 156, 291, 292, 292, 292, 292, 292, 292, 292, 293, 292, 292, 294, 295, 151, 151, 151, 296, 297, 297, 297, 297, 297, 298, 12, 12, 297, 297, 297, 299, 297, 297, 299, 297, 300, 300, 300, 300, 301, 12, 12, 12, 12, 12, 302, 300, 303, 303, 303, 303, 303, 304, 12, 12, 161, 160, 161, 160, 161, 160, 12, 12, 2, 2, 3, 2, 2, 305, 12, 12, 303, 303, 303, 306, 303, 303, 306, 12, 156, 12, 12, 12, 156, 268, 307, 156, 156, 156, 156, 12, 249, 249, 249, 251, 249, 249, 251, 12, 2, 308, 12, 12, 309, 22, 12, 25, 26, 27, 26, 310, 311, 312, 26, 26, 313, 12, 12, 12, 29, 29, 29, 314, 315, 29, 29, 29, 29, 29, 12, 12, 29, 29, 29, 313, 7, 7, 7, 316, 234, 0, 0, 0, 0, 234, 0, 12, 29, 317, 29, 29, 29, 29, 29, 318, 242, 0, 0, 0, 0, 319, 262, 262, 
262, 262, 262, 320, 321, 156, 321, 156, 321, 156, 321, 291, 0, 234, 0, 234, 12, 12, 242, 241, 322, 322, 322, 323, 322, 322, 322, 322, 322, 324, 322, 322, 322, 322, 324, 325, 322, 322, 322, 326, 322, 322, 324, 12, 234, 134, 0, 0, 0, 134, 0, 0, 8, 8, 8, 327, 327, 12, 12, 12, 0, 0, 0, 328, 329, 329, 329, 329, 329, 329, 329, 330, 331, 331, 331, 331, 332, 12, 12, 12, 216, 0, 0, 0, 333, 333, 333, 333, 333, 12, 12, 12, 334, 334, 334, 334, 334, 334, 335, 12, 336, 336, 336, 336, 336, 336, 337, 12, 338, 338, 338, 338, 338, 338, 338, 339, 340, 340, 340, 340, 340, 12, 340, 340, 340, 341, 12, 12, 342, 342, 342, 342, 343, 343, 343, 343, 344, 344, 344, 344, 344, 344, 344, 345, 344, 344, 345, 12, 346, 346, 346, 346, 346, 346, 12, 12, 347, 347, 347, 347, 347, 12, 12, 348, 349, 349, 349, 349, 349, 350, 12, 12, 349, 351, 12, 12, 349, 349, 12, 12, 352, 353, 354, 352, 352, 352, 352, 352, 352, 355, 356, 357, 358, 358, 358, 358, 358, 359, 358, 358, 360, 360, 360, 360, 361, 361, 361, 361, 361, 361, 361, 362, 12, 363, 361, 361, 364, 364, 364, 364, 365, 366, 367, 364, 368, 368, 368, 368, 368, 368, 368, 369, 370, 370, 370, 370, 370, 370, 371, 372, 373, 373, 373, 373, 374, 374, 374, 374, 374, 374, 12, 374, 375, 374, 374, 374, 376, 377, 12, 376, 376, 378, 378, 376, 376, 376, 376, 376, 376, 12, 379, 380, 376, 376, 12, 12, 376, 376, 381, 12, 382, 382, 382, 382, 383, 383, 383, 383, 384, 384, 384, 384, 384, 385, 386, 384, 384, 385, 12, 12, 387, 387, 387, 387, 387, 388, 389, 387, 390, 390, 390, 390, 390, 391, 390, 390, 392, 392, 392, 392, 393, 12, 392, 392, 394, 394, 394, 394, 395, 12, 396, 397, 12, 12, 396, 394, 398, 398, 398, 398, 398, 398, 399, 12, 400, 400, 400, 400, 401, 12, 12, 12, 401, 12, 402, 400, 29, 29, 29, 403, 404, 404, 404, 404, 404, 404, 404, 405, 406, 404, 404, 404, 12, 12, 12, 407, 408, 408, 408, 408, 409, 12, 12, 12, 410, 410, 410, 410, 410, 410, 411, 12, 410, 410, 412, 12, 413, 413, 413, 413, 413, 414, 413, 413, 413, 12, 12, 12, 415, 415, 415, 415, 415, 416, 12, 12, 417, 417, 
417, 417, 417, 417, 417, 418, 122, 123, 123, 123, 123, 130, 12, 12, 419, 419, 419, 419, 420, 419, 419, 419, 419, 419, 419, 421, 422, 423, 424, 425, 422, 422, 422, 425, 422, 422, 426, 12, 427, 427, 427, 427, 427, 427, 428, 12, 427, 427, 429, 12, 430, 431, 430, 432, 432, 430, 430, 430, 430, 430, 433, 430, 433, 431, 434, 430, 430, 432, 432, 434, 435, 436, 12, 431, 430, 437, 430, 435, 430, 435, 12, 12, 438, 438, 438, 438, 438, 438, 12, 12, 438, 438, 439, 12, 440, 440, 440, 440, 440, 441, 440, 440, 440, 440, 440, 441, 442, 442, 442, 442, 442, 443, 12, 12, 442, 442, 444, 12, 445, 445, 445, 445, 445, 445, 12, 12, 445, 445, 446, 12, 447, 447, 447, 447, 447, 447, 448, 449, 447, 447, 447, 12, 450, 450, 450, 450, 451, 12, 12, 452, 453, 453, 453, 453, 453, 453, 454, 12, 455, 455, 455, 455, 455, 455, 456, 12, 455, 455, 455, 457, 455, 458, 12, 12, 455, 12, 12, 12, 459, 459, 459, 459, 459, 459, 459, 460, 461, 461, 461, 461, 461, 462, 12, 12, 277, 277, 463, 12, 464, 464, 464, 464, 464, 464, 464, 465, 464, 464, 466, 467, 468, 468, 468, 468, 468, 468, 468, 469, 468, 469, 12, 12, 470, 470, 470, 470, 470, 471, 12, 12, 470, 470, 472, 470, 472, 470, 470, 470, 470, 470, 12, 473, 474, 474, 474, 474, 474, 475, 12, 12, 474, 474, 474, 476, 12, 12, 12, 477, 478, 12, 12, 12, 479, 479, 479, 479, 479, 479, 480, 12, 479, 479, 479, 481, 479, 479, 481, 12, 479, 479, 482, 479, 0, 241, 12, 12, 0, 234, 242, 0, 0, 483, 230, 0, 0, 0, 483, 7, 214, 484, 7, 0, 0, 0, 485, 230, 0, 0, 486, 12, 8, 227, 12, 12, 0, 0, 0, 231, 487, 488, 242, 231, 0, 0, 489, 242, 0, 242, 0, 0, 0, 489, 234, 242, 0, 231, 0, 231, 0, 0, 489, 234, 0, 490, 240, 0, 231, 0, 0, 0, 0, 0, 0, 240, 491, 491, 491, 491, 491, 491, 491, 12, 12, 12, 492, 491, 493, 491, 491, 491, 494, 494, 494, 494, 494, 495, 494, 494, 494, 496, 12, 12, 29, 497, 29, 29, 498, 499, 497, 29, 403, 29, 500, 12, 501, 51, 500, 497, 498, 499, 500, 500, 498, 499, 403, 29, 403, 29, 497, 502, 29, 29, 503, 29, 29, 29, 29, 12, 497, 497, 503, 29, 0, 0, 0, 486, 12, 240, 0, 0, 504, 
12, 12, 12, 0, 0, 489, 0, 486, 12, 12, 12, 0, 486, 12, 12, 0, 0, 12, 12, 0, 0, 0, 241, 249, 505, 12, 12, 249, 506, 12, 12, 251, 12, 12, 12, 507, 12, 12, 12, }; static RE_UINT8 re_script_stage_5[] = { 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 2, 1, 2, 1, 1, 1, 1, 1, 35, 35, 41, 41, 41, 41, 3, 3, 3, 3, 1, 3, 3, 3, 0, 0, 3, 3, 3, 3, 1, 3, 0, 0, 0, 0, 3, 1, 3, 1, 3, 3, 3, 0, 3, 0, 3, 3, 3, 3, 0, 3, 3, 3, 55, 55, 55, 55, 55, 55, 4, 4, 4, 4, 4, 41, 41, 4, 0, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 0, 0, 1, 5, 0, 0, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 0, 6, 0, 0, 0, 7, 7, 7, 7, 7, 1, 7, 7, 1, 7, 7, 7, 7, 7, 7, 1, 1, 0, 7, 1, 7, 7, 7, 41, 41, 41, 7, 7, 41, 7, 7, 7, 8, 8, 8, 8, 8, 8, 0, 8, 8, 8, 8, 0, 0, 8, 8, 8, 9, 9, 9, 9, 9, 9, 0, 0, 66, 66, 66, 66, 66, 66, 66, 0, 82, 82, 82, 82, 82, 82, 0, 0, 82, 82, 82, 0, 95, 95, 95, 95, 0, 0, 95, 0, 7, 0, 0, 0, 0, 0, 0, 7, 10, 10, 10, 10, 10, 41, 41, 10, 1, 1, 10, 10, 11, 11, 11, 11, 0, 11, 11, 11, 11, 0, 0, 11, 11, 0, 11, 11, 11, 0, 11, 0, 0, 0, 11, 11, 11, 11, 0, 0, 11, 11, 11, 0, 0, 0, 0, 11, 11, 11, 0, 11, 0, 12, 12, 12, 12, 12, 12, 0, 0, 0, 0, 12, 12, 0, 0, 12, 12, 12, 12, 12, 12, 0, 12, 12, 0, 12, 12, 0, 12, 12, 0, 0, 0, 12, 0, 0, 12, 0, 12, 0, 0, 0, 12, 12, 0, 13, 13, 13, 13, 13, 13, 13, 13, 13, 0, 13, 13, 0, 13, 13, 13, 13, 0, 0, 13, 0, 0, 0, 0, 0, 13, 13, 0, 13, 0, 0, 0, 14, 14, 14, 14, 14, 14, 14, 14, 0, 0, 14, 14, 0, 14, 14, 14, 14, 0, 0, 0, 0, 14, 14, 14, 14, 0, 14, 0, 0, 15, 15, 0, 15, 15, 15, 15, 15, 15, 0, 15, 0, 15, 15, 15, 15, 0, 0, 0, 15, 15, 0, 0, 0, 0, 15, 15, 0, 0, 0, 15, 15, 15, 15, 16, 16, 16, 16, 0, 16, 16, 16, 16, 0, 16, 16, 16, 16, 0, 0, 0, 16, 16, 0, 16, 16, 16, 0, 0, 0, 16, 16, 0, 17, 17, 17, 17, 17, 17, 17, 17, 0, 17, 17, 17, 17, 0, 0, 0, 17, 17, 0, 0, 0, 17, 0, 0, 0, 17, 17, 0, 18, 18, 18, 18, 18, 18, 18, 18, 0, 18, 18, 18, 18, 18, 0, 0, 0, 0, 18, 0, 0, 18, 18, 18, 18, 0, 0, 0, 0, 19, 19, 0, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 0, 19, 19, 0, 19, 0, 19, 0, 0, 0, 0, 19, 0, 0, 0, 0, 19, 19, 0, 19, 0, 19, 0, 
0, 0, 0, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 0, 0, 0, 0, 1, 0, 21, 21, 0, 21, 0, 0, 21, 21, 0, 21, 0, 0, 21, 0, 0, 21, 21, 21, 21, 0, 21, 21, 21, 0, 21, 0, 21, 0, 0, 21, 21, 21, 21, 0, 21, 21, 21, 0, 0, 22, 22, 22, 22, 0, 22, 22, 22, 22, 0, 0, 0, 22, 0, 22, 22, 22, 1, 1, 1, 1, 22, 22, 0, 23, 23, 23, 23, 24, 24, 24, 24, 24, 24, 0, 24, 0, 24, 0, 0, 24, 24, 24, 1, 25, 25, 25, 25, 26, 26, 26, 26, 26, 0, 26, 26, 26, 26, 0, 0, 26, 26, 26, 0, 0, 26, 26, 26, 26, 0, 0, 0, 27, 27, 27, 27, 27, 27, 0, 0, 28, 28, 28, 28, 29, 29, 29, 29, 29, 0, 0, 0, 30, 30, 30, 30, 30, 30, 30, 1, 1, 1, 30, 30, 30, 0, 0, 0, 42, 42, 42, 42, 42, 0, 42, 42, 42, 0, 0, 0, 43, 43, 43, 43, 43, 1, 1, 0, 44, 44, 44, 44, 45, 45, 45, 45, 45, 0, 45, 45, 31, 31, 31, 31, 31, 31, 0, 0, 32, 32, 1, 1, 32, 1, 32, 32, 32, 32, 32, 32, 32, 32, 32, 0, 32, 32, 0, 0, 28, 28, 0, 0, 46, 46, 46, 46, 46, 46, 46, 0, 46, 0, 0, 0, 47, 47, 47, 47, 47, 47, 0, 0, 47, 0, 0, 0, 56, 56, 56, 56, 56, 56, 0, 0, 56, 56, 56, 0, 0, 0, 56, 56, 54, 54, 54, 54, 0, 0, 54, 54, 78, 78, 78, 78, 78, 78, 78, 0, 78, 0, 0, 78, 78, 78, 0, 0, 41, 41, 41, 0, 62, 62, 62, 62, 62, 0, 0, 0, 67, 67, 67, 67, 93, 93, 93, 93, 68, 68, 68, 68, 0, 0, 0, 68, 68, 68, 0, 0, 0, 68, 68, 68, 69, 69, 69, 69, 41, 41, 41, 1, 41, 1, 41, 41, 41, 1, 1, 1, 1, 41, 1, 1, 41, 1, 1, 0, 41, 41, 0, 0, 2, 2, 3, 3, 3, 3, 3, 4, 2, 3, 3, 3, 3, 3, 2, 2, 3, 3, 3, 2, 4, 2, 2, 2, 2, 2, 2, 3, 3, 3, 0, 0, 0, 3, 0, 3, 0, 3, 3, 3, 41, 41, 1, 1, 1, 0, 1, 1, 1, 2, 0, 0, 1, 1, 1, 2, 1, 1, 1, 0, 2, 0, 0, 0, 41, 0, 0, 0, 1, 1, 3, 1, 1, 1, 2, 2, 53, 53, 53, 53, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 57, 57, 57, 57, 57, 57, 57, 0, 0, 55, 55, 55, 58, 58, 58, 58, 0, 0, 0, 58, 58, 0, 0, 0, 36, 36, 36, 36, 36, 36, 0, 36, 36, 36, 0, 0, 1, 36, 1, 36, 1, 36, 36, 36, 36, 36, 41, 41, 41, 41, 25, 25, 0, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 0, 0, 41, 41, 1, 1, 33, 33, 33, 1, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 1, 0, 35, 35, 35, 35, 35, 35, 35, 35, 35, 0, 0, 0, 25, 25, 25, 25, 25, 25, 0, 35, 35, 35, 0, 
25, 25, 25, 1, 34, 34, 34, 0, 37, 37, 37, 37, 37, 0, 0, 0, 37, 37, 37, 0, 83, 83, 83, 83, 70, 70, 70, 70, 84, 84, 84, 84, 2, 2, 0, 0, 0, 0, 0, 2, 59, 59, 59, 59, 65, 65, 65, 65, 71, 71, 71, 71, 71, 0, 0, 0, 0, 0, 71, 71, 71, 71, 0, 0, 10, 10, 0, 0, 72, 72, 72, 72, 72, 72, 1, 72, 73, 73, 73, 73, 0, 0, 0, 73, 25, 0, 0, 0, 85, 85, 85, 85, 85, 85, 0, 1, 85, 85, 0, 0, 0, 0, 85, 85, 23, 23, 23, 0, 77, 77, 77, 77, 77, 77, 77, 0, 77, 77, 0, 0, 79, 79, 79, 79, 79, 79, 79, 0, 0, 0, 0, 79, 86, 86, 86, 86, 86, 86, 86, 0, 2, 3, 0, 0, 86, 86, 0, 0, 0, 0, 0, 25, 2, 2, 2, 0, 0, 0, 0, 5, 6, 0, 6, 0, 6, 6, 0, 6, 6, 0, 6, 6, 7, 7, 0, 0, 7, 7, 1, 1, 0, 0, 7, 7, 41, 41, 4, 4, 7, 0, 7, 7, 7, 0, 0, 1, 1, 1, 34, 34, 34, 34, 1, 1, 0, 0, 25, 25, 48, 48, 48, 48, 0, 48, 48, 48, 48, 48, 48, 0, 48, 48, 0, 48, 48, 48, 0, 0, 3, 0, 0, 0, 1, 41, 0, 0, 74, 74, 74, 74, 74, 0, 0, 0, 75, 75, 75, 75, 75, 0, 0, 0, 38, 38, 38, 38, 39, 39, 39, 39, 39, 39, 39, 0, 120, 120, 120, 120, 120, 120, 120, 0, 49, 49, 49, 49, 49, 49, 0, 49, 60, 60, 60, 60, 60, 60, 0, 0, 40, 40, 40, 40, 50, 50, 50, 50, 51, 51, 51, 51, 51, 51, 0, 0, 106, 106, 106, 106, 103, 103, 103, 103, 0, 0, 0, 103, 110, 110, 110, 110, 110, 110, 110, 0, 110, 110, 0, 0, 52, 52, 52, 52, 52, 52, 0, 0, 52, 0, 52, 52, 52, 52, 0, 52, 52, 0, 0, 0, 52, 0, 0, 52, 87, 87, 87, 87, 87, 87, 0, 87, 118, 118, 118, 118, 117, 117, 117, 117, 117, 117, 117, 0, 0, 0, 0, 117, 128, 128, 128, 128, 128, 128, 128, 0, 128, 128, 0, 0, 0, 0, 0, 128, 64, 64, 64, 64, 0, 0, 0, 64, 76, 76, 76, 76, 76, 76, 0, 0, 0, 0, 0, 76, 98, 98, 98, 98, 97, 97, 97, 97, 0, 0, 97, 97, 61, 61, 61, 61, 0, 61, 61, 0, 0, 61, 61, 61, 61, 61, 61, 0, 0, 0, 0, 61, 61, 0, 0, 0, 88, 88, 88, 88, 116, 116, 116, 116, 112, 112, 112, 112, 112, 112, 112, 0, 0, 0, 0, 112, 80, 80, 80, 80, 80, 80, 0, 0, 0, 80, 80, 80, 89, 89, 89, 89, 89, 89, 0, 0, 90, 90, 90, 90, 90, 90, 90, 0, 121, 121, 121, 121, 121, 121, 0, 0, 0, 121, 121, 121, 121, 0, 0, 0, 91, 91, 91, 91, 91, 0, 0, 0, 130, 130, 130, 130, 130, 130, 130, 0, 0, 
0, 130, 130, 7, 7, 7, 0, 94, 94, 94, 94, 94, 94, 0, 0, 0, 0, 94, 94, 0, 0, 0, 94, 92, 92, 92, 92, 92, 92, 0, 0, 101, 101, 101, 101, 101, 0, 0, 0, 101, 101, 0, 0, 96, 96, 96, 96, 96, 0, 96, 96, 111, 111, 111, 111, 111, 111, 111, 0, 100, 100, 100, 100, 100, 100, 0, 0, 109, 109, 109, 109, 109, 109, 0, 109, 109, 109, 0, 0, 129, 129, 129, 129, 129, 129, 129, 0, 129, 0, 129, 129, 129, 129, 0, 129, 129, 129, 0, 0, 123, 123, 123, 123, 123, 123, 123, 0, 123, 123, 0, 0, 107, 107, 107, 107, 0, 107, 107, 107, 107, 0, 0, 107, 107, 0, 107, 107, 107, 107, 0, 0, 107, 0, 0, 0, 0, 0, 0, 107, 0, 0, 107, 107, 124, 124, 124, 124, 124, 124, 0, 0, 122, 122, 122, 122, 122, 122, 0, 0, 114, 114, 114, 114, 114, 0, 0, 0, 114, 114, 0, 0, 102, 102, 102, 102, 102, 102, 0, 0, 126, 126, 126, 126, 126, 126, 0, 0, 0, 126, 126, 126, 125, 125, 125, 125, 125, 125, 125, 0, 0, 0, 0, 125, 119, 119, 119, 119, 119, 0, 0, 0, 63, 63, 63, 63, 63, 63, 0, 0, 63, 63, 63, 0, 63, 0, 0, 0, 81, 81, 81, 81, 81, 81, 81, 0, 127, 127, 127, 127, 127, 127, 127, 0, 84, 0, 0, 0, 115, 115, 115, 115, 115, 115, 115, 0, 115, 115, 0, 0, 0, 0, 115, 115, 104, 104, 104, 104, 104, 104, 0, 0, 108, 108, 108, 108, 108, 108, 0, 0, 108, 108, 0, 108, 0, 108, 108, 108, 99, 99, 99, 99, 99, 0, 0, 0, 99, 99, 99, 0, 0, 0, 0, 99, 34, 33, 0, 0, 105, 105, 105, 105, 105, 105, 105, 0, 105, 0, 0, 0, 105, 105, 0, 0, 1, 1, 1, 41, 1, 41, 41, 41, 1, 1, 41, 41, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 131, 131, 131, 131, 0, 0, 0, 131, 0, 131, 131, 131, 113, 113, 113, 113, 113, 0, 0, 113, 113, 113, 113, 0, 0, 7, 7, 7, 0, 7, 7, 0, 7, 0, 0, 7, 0, 7, 0, 7, 0, 0, 7, 0, 7, 0, 7, 0, 7, 7, 0, 7, 33, 1, 1, 0, 36, 36, 36, 0, 36, 0, 0, 0, 0, 1, 0, 0, }; /* Script: 10928 bytes. 
   Lookup layout: re_get_script() below walks the five stage tables as a multi-stage trie, consuming the code point in 10-, 4-, 3-, 2- and 2-bit fields (21 bits in total) from the top down. Each stage entry is a block index that is shifted left by the width of the next field and added to that field's value to index the next stage; re_script_stage_5 holds the script property values themselves.
*/ RE_UINT32 re_get_script(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 11; code = ch ^ (f << 11); pos = (RE_UINT32)re_script_stage_1[f] << 4; f = code >> 7; code ^= f << 7; pos = (RE_UINT32)re_script_stage_2[pos + f] << 3; f = code >> 4; code ^= f << 4; pos = (RE_UINT32)re_script_stage_3[pos + f] << 2; f = code >> 2; code ^= f << 2; pos = (RE_UINT32)re_script_stage_4[pos + f] << 2; value = re_script_stage_5[pos + code]; return value; } /* Word_Break. */ static RE_UINT8 re_word_break_stage_1[] = { 0, 1, 2, 3, 4, 4, 4, 4, 4, 4, 5, 6, 6, 7, 4, 8, 9, 10, 11, 12, 13, 4, 14, 4, 4, 4, 4, 15, 4, 16, 17, 18, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 19, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, }; static RE_UINT8 re_word_break_stage_2[] = { 0, 1, 2, 2, 2, 3, 4, 5, 2, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 2, 2, 31, 32, 33, 34, 35, 2, 2, 2, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 2, 50, 2, 2, 51, 52, 53, 54, 55, 56, 57, 57, 57, 57, 57, 58, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 59, 60, 61, 62, 63, 57, 57, 57, 64, 65, 66, 67, 57, 68, 69, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 
57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 2, 2, 2, 2, 2, 2, 2, 2, 2, 70, 2, 2, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 83, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 84, 85, 2, 2, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 57, 96, 97, 98, 2, 99, 100, 57, 2, 2, 101, 57, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 57, 57, 57, 57, 57, 57, 112, 113, 114, 115, 116, 117, 118, 57, 57, 119, 57, 120, 121, 122, 123, 57, 57, 124, 57, 57, 57, 125, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 2, 2, 2, 2, 2, 2, 2, 126, 127, 2, 128, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 2, 2, 2, 2, 2, 2, 2, 2, 129, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 2, 2, 2, 2, 130, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 2, 2, 2, 2, 131, 132, 133, 134, 57, 57, 57, 57, 57, 57, 135, 136, 137, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 138, 139, 57, 57, 57, 57, 57, 57, 57, 57, 140, 141, 142, 57, 57, 57, 143, 144, 145, 2, 2, 146, 147, 148, 57, 57, 57, 57, 149, 150, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 2, 151, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 152, 153, 57, 57, 57, 57, 154, 155, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 156, 57, 157, 158, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, }; static RE_UINT8 re_word_break_stage_3[] = { 0, 1, 0, 0, 2, 3, 4, 5, 6, 7, 
7, 8, 6, 7, 7, 9, 10, 0, 0, 0, 0, 11, 12, 13, 7, 7, 14, 7, 7, 7, 14, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 15, 7, 16, 0, 17, 18, 0, 0, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 20, 21, 22, 23, 7, 7, 24, 7, 7, 7, 7, 7, 7, 7, 7, 7, 25, 7, 26, 27, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 6, 7, 7, 7, 14, 28, 6, 7, 7, 7, 7, 29, 30, 19, 19, 19, 19, 31, 32, 0, 33, 33, 33, 34, 35, 0, 36, 37, 19, 38, 7, 7, 7, 7, 7, 39, 19, 19, 4, 40, 41, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 42, 43, 44, 45, 4, 46, 0, 47, 48, 7, 7, 7, 19, 19, 19, 49, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 50, 19, 51, 0, 4, 52, 7, 7, 7, 39, 53, 54, 7, 7, 50, 55, 56, 57, 0, 0, 7, 7, 7, 58, 0, 0, 0, 0, 0, 0, 0, 0, 7, 7, 17, 0, 0, 0, 0, 0, 59, 19, 19, 19, 60, 7, 7, 7, 7, 7, 7, 61, 19, 19, 62, 7, 63, 4, 6, 7, 64, 65, 66, 7, 7, 67, 68, 69, 70, 71, 72, 73, 63, 4, 74, 0, 75, 76, 66, 7, 7, 67, 77, 78, 79, 80, 81, 82, 83, 4, 84, 0, 75, 25, 24, 7, 7, 67, 85, 69, 31, 86, 87, 0, 63, 4, 0, 28, 75, 65, 66, 7, 7, 67, 85, 69, 70, 80, 88, 73, 63, 4, 28, 0, 89, 90, 91, 92, 93, 90, 7, 94, 95, 96, 97, 0, 83, 4, 0, 0, 98, 20, 67, 7, 7, 67, 7, 99, 100, 96, 101, 9, 63, 4, 0, 0, 75, 20, 67, 7, 7, 67, 102, 69, 100, 96, 101, 103, 63, 4, 104, 0, 75, 20, 67, 7, 7, 7, 7, 105, 100, 106, 72, 107, 63, 4, 0, 108, 109, 7, 14, 108, 7, 7, 24, 110, 14, 111, 112, 19, 83, 4, 113, 0, 0, 0, 0, 0, 0, 0, 114, 115, 72, 116, 4, 117, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 114, 118, 0, 119, 4, 117, 0, 0, 0, 0, 87, 0, 0, 120, 4, 117, 121, 122, 7, 6, 7, 7, 7, 17, 30, 19, 100, 123, 19, 30, 19, 19, 19, 124, 125, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 59, 19, 116, 4, 117, 88, 126, 127, 119, 128, 0, 129, 31, 4, 130, 7, 7, 7, 7, 25, 131, 7, 7, 7, 7, 7, 132, 7, 7, 7, 7, 7, 7, 7, 7, 7, 91, 14, 91, 7, 7, 7, 7, 7, 91, 7, 7, 7, 7, 91, 14, 91, 7, 14, 7, 7, 7, 7, 7, 7, 7, 91, 7, 7, 7, 7, 7, 7, 7, 7, 133, 0, 0, 0, 0, 7, 7, 0, 0, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 134, 134, 6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 
7, 7, 7, 7, 7, 7, 7, 7, 7, 65, 7, 7, 6, 7, 7, 9, 7, 7, 7, 7, 7, 7, 7, 7, 7, 90, 7, 87, 7, 20, 135, 0, 7, 7, 135, 0, 7, 7, 136, 0, 7, 20, 137, 0, 0, 0, 0, 0, 0, 0, 138, 19, 19, 19, 139, 140, 4, 117, 0, 0, 0, 141, 4, 117, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 0, 7, 7, 7, 7, 7, 142, 7, 7, 7, 7, 7, 7, 7, 7, 134, 0, 7, 7, 7, 14, 19, 139, 19, 139, 83, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 117, 0, 0, 0, 0, 7, 7, 143, 139, 0, 0, 0, 0, 0, 0, 144, 116, 19, 19, 19, 70, 4, 117, 4, 117, 0, 0, 19, 116, 0, 0, 0, 0, 0, 0, 0, 0, 145, 7, 7, 7, 7, 7, 146, 19, 145, 147, 4, 117, 0, 59, 139, 0, 148, 7, 7, 7, 62, 149, 4, 52, 7, 7, 7, 7, 50, 19, 139, 0, 7, 7, 7, 7, 146, 19, 19, 0, 4, 150, 4, 52, 7, 7, 7, 134, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 151, 19, 19, 152, 153, 120, 7, 7, 7, 7, 7, 7, 7, 7, 19, 19, 19, 19, 19, 19, 119, 138, 7, 7, 134, 134, 7, 7, 7, 7, 134, 134, 7, 154, 7, 7, 7, 134, 7, 7, 7, 7, 7, 7, 20, 155, 156, 17, 157, 147, 7, 17, 156, 17, 0, 158, 0, 159, 160, 161, 0, 162, 163, 0, 164, 0, 165, 166, 28, 107, 0, 0, 7, 17, 0, 0, 0, 0, 0, 0, 19, 19, 19, 19, 167, 0, 168, 108, 110, 169, 18, 170, 7, 171, 172, 173, 0, 0, 7, 7, 7, 7, 7, 87, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 174, 7, 7, 7, 7, 7, 7, 74, 0, 0, 7, 7, 7, 7, 7, 14, 7, 7, 7, 7, 7, 14, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 17, 175, 176, 0, 7, 7, 7, 7, 25, 131, 7, 7, 7, 7, 7, 7, 7, 107, 0, 72, 7, 7, 14, 0, 14, 14, 14, 14, 14, 14, 14, 14, 19, 19, 19, 19, 0, 0, 0, 0, 0, 107, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 131, 0, 0, 0, 0, 129, 177, 93, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 178, 179, 179, 179, 179, 179, 179, 179, 179, 179, 179, 179, 180, 172, 7, 7, 7, 7, 134, 6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 14, 0, 0, 7, 7, 7, 9, 0, 0, 0, 0, 0, 0, 179, 179, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 179, 179, 179, 179, 179, 181, 179, 179, 179, 179, 179, 179, 179, 179, 179, 179, 179, 0, 0, 0, 0, 0, 7, 17, 0, 0, 0, 0, 0, 0, 0, 0, 7, 7, 7, 7, 7, 134, 7, 17, 7, 7, 4, 182, 0, 0, 7, 7, 7, 
7, 7, 143, 151, 183, 7, 7, 7, 50, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 120, 0, 0, 0, 107, 7, 108, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 66, 7, 7, 7, 134, 7, 0, 0, 0, 0, 0, 0, 0, 107, 7, 184, 185, 7, 7, 39, 0, 0, 0, 7, 7, 7, 7, 7, 7, 147, 0, 27, 7, 7, 7, 7, 7, 146, 19, 124, 0, 4, 117, 19, 19, 27, 186, 4, 52, 7, 7, 50, 119, 7, 7, 143, 19, 139, 0, 7, 7, 7, 17, 60, 7, 7, 7, 7, 7, 39, 19, 167, 107, 4, 117, 140, 0, 4, 117, 7, 7, 7, 7, 7, 62, 116, 0, 185, 187, 4, 117, 0, 0, 0, 188, 0, 0, 0, 0, 0, 0, 127, 189, 81, 0, 0, 0, 7, 39, 190, 0, 191, 191, 191, 0, 14, 14, 7, 7, 7, 7, 7, 132, 134, 0, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 39, 192, 4, 117, 7, 7, 7, 7, 147, 0, 7, 7, 14, 193, 7, 7, 7, 7, 7, 147, 14, 0, 193, 194, 33, 195, 196, 197, 198, 33, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 74, 0, 0, 0, 193, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 134, 0, 0, 7, 7, 7, 7, 7, 7, 7, 7, 108, 7, 7, 7, 7, 7, 7, 0, 0, 0, 0, 0, 7, 147, 19, 19, 199, 0, 19, 19, 200, 0, 0, 201, 202, 0, 0, 0, 20, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 203, 204, 3, 0, 205, 6, 7, 7, 8, 6, 7, 7, 9, 206, 179, 179, 179, 179, 179, 179, 207, 7, 7, 7, 14, 108, 108, 108, 208, 0, 0, 0, 209, 7, 102, 7, 7, 14, 7, 7, 210, 7, 134, 7, 134, 0, 0, 0, 0, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 9, 0, 0, 0, 0, 0, 0, 0, 0, 7, 7, 7, 7, 7, 7, 17, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 140, 7, 7, 7, 17, 7, 7, 7, 7, 7, 7, 87, 0, 167, 0, 0, 0, 7, 7, 7, 7, 0, 0, 7, 7, 7, 9, 7, 7, 7, 7, 50, 115, 7, 7, 7, 134, 7, 7, 7, 7, 147, 7, 169, 0, 0, 0, 0, 0, 7, 7, 7, 134, 4, 117, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 7, 7, 7, 7, 0, 7, 7, 7, 7, 7, 7, 147, 0, 0, 0, 7, 7, 7, 7, 7, 7, 14, 0, 7, 7, 134, 0, 7, 0, 0, 0, 134, 67, 7, 7, 7, 7, 25, 211, 7, 7, 134, 0, 7, 7, 14, 0, 7, 7, 7, 14, 0, 0, 0, 0, 0, 0, 0, 0, 7, 7, 212, 0, 7, 7, 134, 0, 7, 7, 7, 74, 0, 0, 0, 0, 0, 0, 0, 0, 7, 7, 7, 7, 7, 7, 7, 174, 0, 0, 0, 0, 0, 0, 0, 0, 213, 138, 102, 6, 7, 7, 147, 79, 0, 0, 0, 0, 7, 7, 7, 17, 7, 7, 7, 17, 0, 0, 0, 0, 7, 6, 7, 7, 214, 0, 0, 0, 7, 7, 7, 7, 7, 7, 
134, 0, 7, 7, 134, 0, 7, 7, 9, 0, 7, 7, 74, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 7, 7, 7, 7, 7, 7, 7, 7, 87, 0, 0, 0, 0, 0, 0, 7, 7, 7, 7, 7, 7, 9, 0, 7, 7, 7, 7, 7, 7, 9, 0, 148, 7, 7, 7, 7, 7, 7, 19, 116, 0, 0, 0, 83, 4, 0, 72, 148, 7, 7, 7, 7, 7, 19, 215, 0, 0, 7, 7, 7, 87, 4, 117, 148, 7, 7, 7, 143, 19, 216, 4, 0, 0, 7, 7, 7, 7, 217, 0, 148, 7, 7, 7, 7, 7, 39, 19, 218, 219, 4, 220, 0, 0, 0, 0, 7, 7, 24, 7, 7, 146, 19, 0, 0, 0, 0, 0, 0, 0, 0, 0, 14, 170, 7, 25, 7, 87, 7, 7, 7, 7, 7, 143, 19, 115, 4, 117, 98, 65, 66, 7, 7, 67, 85, 69, 70, 80, 97, 172, 221, 124, 124, 0, 7, 7, 7, 7, 7, 7, 19, 19, 222, 0, 4, 117, 0, 0, 0, 0, 7, 7, 7, 7, 7, 143, 119, 19, 167, 0, 0, 187, 0, 0, 0, 0, 7, 7, 7, 7, 7, 7, 19, 19, 223, 0, 4, 117, 0, 0, 0, 0, 7, 7, 7, 7, 7, 39, 19, 0, 4, 117, 0, 0, 0, 0, 0, 0, 0, 0, 0, 144, 19, 139, 4, 117, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 7, 7, 7, 7, 7, 7, 7, 4, 117, 0, 107, 0, 0, 0, 0, 0, 0, 0, 0, 7, 7, 7, 7, 7, 7, 7, 87, 7, 7, 7, 74, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 14, 0, 0, 7, 7, 7, 7, 7, 7, 7, 7, 147, 0, 0, 0, 0, 0, 0, 0, 7, 7, 7, 7, 7, 14, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 7, 7, 7, 7, 7, 7, 7, 14, 0, 0, 0, 0, 0, 0, 0, 7, 7, 7, 7, 7, 7, 7, 87, 7, 7, 7, 14, 4, 117, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 7, 7, 134, 124, 0, 7, 7, 7, 7, 7, 7, 116, 0, 147, 0, 4, 117, 193, 7, 7, 172, 7, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 7, 7, 7, 7, 7, 7, 7, 17, 0, 62, 19, 19, 19, 19, 116, 0, 72, 148, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 224, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 9, 7, 17, 7, 87, 7, 225, 226, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 144, 227, 228, 229, 230, 139, 0, 0, 0, 231, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 219, 0, 0, 0, 0, 0, 0, 0, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 20, 7, 7, 7, 7, 7, 7, 7, 7, 20, 232, 233, 7, 234, 102, 7, 7, 7, 7, 7, 7, 7, 25, 235, 20, 20, 7, 7, 7, 236, 155, 108, 67, 7, 7, 7, 7, 7, 7, 
7, 7, 7, 134, 7, 7, 7, 67, 7, 7, 132, 7, 7, 7, 132, 7, 7, 20, 7, 7, 7, 20, 7, 7, 14, 7, 7, 7, 14, 7, 7, 7, 67, 7, 7, 7, 67, 7, 7, 132, 237, 4, 4, 4, 4, 4, 4, 19, 19, 19, 19, 19, 19, 116, 59, 19, 19, 19, 19, 19, 124, 140, 0, 238, 0, 0, 59, 30, 19, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 7, 7, 7, 7, 7, 7, 7, 17, 0, 116, 0, 0, 0, 0, 0, 102, 7, 7, 7, 239, 6, 132, 240, 168, 241, 239, 154, 239, 132, 132, 82, 7, 24, 7, 147, 242, 24, 7, 147, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 7, 7, 74, 7, 7, 7, 74, 7, 7, 7, 74, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 243, 244, 244, 244, 245, 0, 0, 0, 166, 166, 166, 166, 166, 166, 166, 166, 166, 166, 166, 166, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 0, 0, }; static RE_UINT8 re_word_break_stage_4[] = { 0, 0, 1, 2, 3, 4, 0, 5, 6, 6, 7, 0, 8, 9, 9, 9, 10, 11, 10, 0, 0, 12, 13, 14, 0, 15, 13, 0, 9, 10, 16, 17, 16, 18, 9, 19, 0, 20, 21, 21, 9, 22, 17, 23, 0, 24, 10, 22, 25, 9, 9, 25, 26, 21, 27, 9, 28, 0, 29, 0, 30, 21, 21, 31, 32, 31, 33, 33, 34, 0, 35, 36, 37, 38, 0, 39, 40, 41, 42, 21, 43, 44, 45, 9, 9, 46, 21, 47, 21, 48, 49, 27, 50, 51, 0, 52, 53, 9, 40, 8, 9, 54, 55, 0, 50, 9, 21, 16, 56, 0, 57, 21, 21, 58, 58, 59, 58, 0, 60, 21, 21, 9, 54, 61, 58, 21, 54, 62, 58, 8, 9, 51, 51, 9, 22, 9, 20, 17, 16, 61, 21, 63, 63, 64, 0, 60, 0, 25, 16, 0, 30, 8, 10, 65, 22, 66, 16, 49, 40, 60, 63, 59, 67, 0, 8, 20, 0, 62, 27, 68, 22, 8, 31, 59, 19, 0, 0, 69, 70, 8, 10, 17, 22, 16, 66, 22, 65, 19, 16, 69, 40, 69, 49, 59, 19, 60, 21, 8, 16, 46, 21, 49, 0, 32, 9, 8, 0, 13, 66, 0, 10, 46, 49, 64, 0, 65, 17, 9, 69, 8, 9, 28, 71, 60, 21, 72, 69, 0, 67, 21, 40, 0, 21, 40, 73, 0, 31, 74, 21, 59, 59, 0, 0, 75, 67, 69, 9, 58, 21, 74, 0, 71, 59, 69, 49, 63, 30, 74, 69, 21, 76, 59, 0, 28, 10, 9, 10, 30, 9, 16, 54, 74, 54, 0, 77, 0, 0, 21, 21, 0, 0, 67, 60, 78, 79, 0, 9, 42, 0, 30, 21, 45, 9, 21, 9, 0, 80, 9, 21, 27, 73, 8, 40, 21, 45, 53, 54, 81, 82, 82, 9, 20, 17, 22, 9, 17, 0, 83, 84, 0, 0, 
85, 86, 87, 0, 11, 88, 89, 0, 88, 37, 90, 37, 37, 74, 0, 13, 65, 8, 16, 22, 25, 16, 9, 0, 8, 16, 13, 0, 17, 65, 42, 27, 0, 91, 92, 93, 94, 95, 95, 96, 95, 95, 96, 50, 0, 21, 97, 98, 98, 42, 9, 65, 28, 9, 59, 60, 59, 74, 69, 17, 99, 8, 10, 40, 59, 65, 9, 0, 100, 101, 33, 33, 34, 33, 102, 103, 101, 104, 89, 11, 88, 0, 105, 5, 106, 9, 107, 0, 108, 109, 0, 0, 110, 95, 111, 17, 19, 112, 0, 10, 25, 19, 51, 10, 16, 58, 32, 9, 99, 40, 14, 21, 113, 42, 13, 45, 19, 69, 74, 114, 19, 54, 69, 21, 25, 74, 19, 94, 0, 16, 32, 37, 0, 59, 30, 115, 37, 116, 21, 40, 30, 69, 59, 13, 66, 8, 22, 25, 8, 10, 8, 25, 10, 9, 62, 0, 74, 66, 51, 82, 0, 82, 8, 8, 8, 0, 117, 118, 118, 14, 0, }; static RE_UINT8 re_word_break_stage_5[] = { 0, 0, 0, 0, 0, 0, 5, 6, 6, 4, 0, 0, 0, 0, 1, 0, 0, 0, 0, 2, 13, 0, 14, 0, 15, 15, 15, 15, 15, 15, 12, 13, 0, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 0, 0, 0, 0, 16, 0, 6, 0, 0, 0, 0, 11, 0, 0, 9, 0, 0, 0, 11, 0, 12, 11, 11, 0, 0, 0, 0, 11, 11, 0, 0, 0, 12, 11, 0, 0, 0, 11, 0, 11, 0, 7, 7, 7, 7, 11, 0, 11, 11, 11, 11, 13, 11, 0, 0, 11, 12, 11, 11, 0, 11, 11, 11, 0, 7, 7, 7, 11, 11, 0, 11, 0, 0, 0, 13, 0, 0, 0, 7, 7, 7, 7, 7, 0, 7, 0, 7, 7, 0, 3, 3, 3, 3, 3, 3, 3, 0, 3, 3, 3, 11, 12, 0, 0, 0, 9, 9, 9, 9, 9, 9, 0, 0, 13, 13, 0, 0, 7, 7, 7, 0, 9, 0, 0, 0, 11, 11, 11, 7, 15, 15, 0, 15, 13, 0, 11, 11, 7, 11, 11, 11, 0, 11, 7, 7, 7, 9, 0, 7, 7, 11, 11, 7, 7, 0, 7, 7, 15, 15, 11, 11, 11, 0, 0, 11, 0, 0, 0, 9, 11, 7, 11, 11, 11, 11, 7, 7, 7, 11, 0, 0, 13, 0, 11, 0, 7, 7, 11, 7, 11, 7, 7, 7, 7, 7, 0, 0, 0, 0, 0, 7, 7, 11, 7, 7, 0, 0, 15, 15, 7, 0, 0, 7, 7, 7, 11, 0, 0, 0, 0, 11, 0, 11, 11, 0, 0, 7, 0, 0, 11, 7, 0, 0, 0, 0, 7, 7, 0, 0, 7, 11, 0, 0, 7, 0, 7, 0, 7, 0, 15, 15, 0, 0, 7, 0, 0, 0, 0, 7, 0, 7, 15, 15, 7, 7, 11, 0, 7, 7, 7, 7, 9, 0, 11, 7, 11, 0, 7, 7, 7, 11, 7, 11, 11, 0, 0, 11, 0, 11, 7, 7, 9, 9, 14, 14, 0, 0, 14, 0, 0, 12, 6, 6, 9, 9, 9, 9, 9, 0, 16, 0, 0, 0, 13, 0, 0, 0, 9, 0, 9, 9, 0, 10, 10, 10, 10, 10, 0, 0, 0, 7, 7, 10, 10, 0, 0, 0, 10, 10, 10, 10, 10, 10, 
10, 0, 7, 7, 0, 11, 11, 11, 7, 11, 11, 7, 7, 0, 0, 3, 7, 3, 3, 0, 3, 3, 3, 0, 3, 0, 3, 3, 0, 3, 13, 0, 0, 12, 0, 16, 16, 16, 13, 12, 0, 0, 11, 0, 0, 9, 0, 0, 0, 14, 0, 0, 12, 13, 0, 0, 10, 10, 10, 10, 7, 7, 0, 9, 9, 9, 7, 0, 15, 15, 15, 15, 11, 0, 7, 7, 7, 9, 9, 9, 9, 7, 0, 0, 8, 8, 8, 8, 8, 8, }; /* Word_Break: 4424 bytes. */ RE_UINT32 re_get_word_break(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 12; code = ch ^ (f << 12); pos = (RE_UINT32)re_word_break_stage_1[f] << 5; f = code >> 7; code ^= f << 7; pos = (RE_UINT32)re_word_break_stage_2[pos + f] << 4; f = code >> 3; code ^= f << 3; pos = (RE_UINT32)re_word_break_stage_3[pos + f] << 1; f = code >> 2; code ^= f << 2; pos = (RE_UINT32)re_word_break_stage_4[pos + f] << 2; value = re_word_break_stage_5[pos + code]; return value; } /* Grapheme_Cluster_Break. */ static RE_UINT8 re_grapheme_cluster_break_stage_1[] = { 0, 1, 2, 2, 2, 3, 4, 5, 6, 2, 2, 7, 2, 8, 9, 10, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 11, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, }; static RE_UINT8 re_grapheme_cluster_break_stage_2[] = { 0, 1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1, 17, 1, 1, 1, 18, 19, 20, 21, 22, 23, 24, 1, 1, 25, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 26, 27, 1, 1, 28, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 29, 1, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 34, 35, 36, 37, 38, 39, 40, 34, 35, 36, 37, 38, 39, 40, 34, 35, 36, 37, 38, 39, 40, 34, 35, 36, 37, 38, 39, 40, 34, 35, 36, 37, 38, 39, 40, 34, 41, 42, 42, 42, 42, 42, 42, 42, 42, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 43, 1, 1, 44, 45, 1, 46, 47, 48, 1, 1, 1, 1, 1, 1, 49, 1, 1, 1, 1, 1, 50, 51, 52, 53, 54, 55, 56, 57, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 58, 59, 1, 1, 1, 60, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 61, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 62, 63, 1, 1, 1, 1, 1, 1, 1, 64, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 65, 1, 1, 1, 1, 1, 1, 1, 1, 66, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 42, 67, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_grapheme_cluster_break_stage_3[] = { 0, 1, 2, 2, 2, 2, 2, 3, 1, 1, 4, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 5, 5, 5, 5, 5, 5, 5, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 6, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 7, 5, 8, 9, 2, 2, 2, 10, 11, 2, 2, 12, 5, 2, 13, 2, 2, 2, 2, 2, 14, 15, 2, 3, 16, 2, 5, 17, 2, 2, 2, 2, 2, 18, 13, 2, 2, 12, 19, 2, 20, 21, 2, 2, 22, 2, 2, 2, 2, 2, 2, 2, 2, 23, 5, 24, 2, 2, 25, 26, 27, 28, 2, 29, 2, 2, 30, 31, 32, 28, 2, 33, 2, 2, 34, 35, 16, 2, 36, 33, 2, 2, 34, 37, 2, 28, 2, 29, 2, 2, 38, 31, 39, 28, 2, 40, 2, 2, 41, 42, 32, 2, 2, 43, 2, 2, 44, 45, 46, 28, 2, 29, 2, 2, 47, 48, 46, 28, 2, 29, 2, 2, 41, 49, 32, 28, 2, 50, 2, 2, 2, 51, 52, 2, 50, 2, 2, 2, 53, 54, 2, 2, 2, 2, 2, 2, 55, 56, 2, 2, 2, 2, 57, 2, 58, 2, 2, 2, 59, 60, 61, 5, 62, 63, 2, 2, 2, 2, 2, 64, 65, 2, 66, 13, 67, 68, 69, 2, 2, 2, 2, 2, 2, 70, 70, 70, 70, 70, 70, 71, 71, 71, 71, 72, 73, 73, 73, 73, 73, 2, 2, 2, 2, 2, 64, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 74, 2, 74, 2, 28, 2, 28, 2, 2, 2, 75, 76, 77, 2, 2, 78, 2, 2, 2, 2, 2, 2, 2, 2, 2, 79, 2, 2, 2, 2, 2, 2, 2, 80, 81, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 82, 2, 2, 2, 83, 84, 85, 2, 2, 2, 86, 2, 2, 2, 2, 87, 2, 2, 88, 89, 2, 12, 19, 90, 2, 91, 2, 2, 2, 92, 93, 2, 2, 94, 95, 2, 2, 2, 2, 2, 2, 2, 2, 2, 96, 97, 98, 2, 
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 5, 5, 5, 99, 100, 2, 101, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 5, 5, 13, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 102, 103, 2, 2, 2, 2, 2, 2, 2, 102, 2, 2, 2, 2, 2, 2, 5, 5, 2, 2, 104, 2, 2, 2, 2, 2, 2, 105, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 102, 106, 2, 44, 2, 2, 2, 2, 2, 103, 107, 2, 108, 2, 2, 2, 2, 2, 109, 2, 2, 110, 111, 2, 5, 103, 2, 2, 112, 2, 113, 93, 70, 114, 24, 2, 2, 115, 116, 2, 117, 2, 2, 2, 118, 119, 120, 2, 2, 121, 2, 2, 2, 122, 16, 2, 123, 124, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 125, 2, 126, 127, 128, 129, 128, 130, 128, 126, 127, 128, 129, 128, 130, 128, 126, 127, 128, 129, 128, 130, 128, 126, 127, 128, 129, 128, 130, 128, 126, 127, 128, 129, 128, 130, 128, 126, 127, 128, 129, 128, 130, 128, 126, 127, 128, 129, 128, 130, 128, 126, 127, 128, 129, 128, 130, 128, 126, 127, 128, 129, 128, 130, 128, 126, 127, 128, 129, 128, 130, 128, 126, 127, 128, 129, 128, 130, 128, 126, 127, 128, 129, 128, 130, 128, 126, 127, 128, 129, 128, 130, 128, 126, 127, 128, 129, 128, 130, 128, 126, 127, 128, 129, 128, 130, 128, 126, 127, 128, 129, 128, 130, 128, 128, 129, 128, 130, 128, 126, 127, 128, 129, 128, 131, 71, 132, 73, 73, 133, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 134, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 5, 2, 5, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 44, 2, 2, 2, 2, 2, 135, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 69, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 13, 2, 2, 2, 2, 2, 2, 2, 2, 136, 2, 2, 2, 2, 2, 2, 2, 2, 137, 2, 2, 138, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 46, 2, 139, 2, 2, 140, 141, 2, 2, 102, 90, 2, 2, 142, 2, 2, 2, 2, 143, 2, 144, 145, 2, 2, 2, 146, 90, 2, 2, 147, 148, 2, 2, 2, 2, 2, 149, 150, 2, 2, 2, 2, 2, 2, 2, 2, 2, 102, 151, 2, 93, 2, 2, 30, 152, 32, 153, 145, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 154, 155, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 102, 156, 13, 157, 2, 2, 2, 2, 2, 158, 13, 2, 2, 2, 2, 2, 159, 160, 2, 2, 2, 2, 2, 64, 161, 2, 2, 2, 2, 
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 145, 2, 2, 2, 141, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 162, 163, 164, 102, 143, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 165, 166, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 167, 168, 169, 2, 170, 2, 2, 2, 2, 2, 2, 2, 2, 2, 74, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 5, 5, 5, 171, 5, 5, 62, 117, 172, 12, 7, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 141, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 173, 174, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 1, }; static RE_UINT8 re_grapheme_cluster_break_stage_4[] = { 0, 0, 1, 2, 0, 0, 0, 0, 3, 3, 3, 3, 3, 3, 3, 4, 3, 3, 3, 5, 6, 6, 6, 6, 7, 6, 8, 3, 9, 6, 6, 6, 6, 6, 6, 10, 11, 10, 3, 3, 0, 12, 3, 3, 6, 6, 13, 14, 3, 3, 7, 6, 15, 3, 3, 3, 3, 16, 6, 17, 6, 18, 19, 8, 20, 3, 3, 3, 6, 6, 13, 3, 3, 16, 6, 6, 6, 3, 3, 3, 3, 16, 10, 6, 6, 9, 9, 8, 3, 3, 9, 3, 7, 6, 6, 6, 21, 3, 3, 3, 3, 3, 22, 23, 24, 6, 25, 26, 9, 6, 3, 3, 16, 3, 3, 3, 27, 3, 3, 3, 3, 3, 3, 28, 24, 29, 30, 31, 3, 7, 3, 3, 32, 3, 3, 3, 3, 3, 3, 23, 33, 7, 18, 8, 8, 20, 3, 3, 24, 10, 34, 31, 3, 3, 3, 19, 3, 16, 3, 3, 35, 3, 3, 3, 3, 3, 3, 22, 36, 37, 38, 31, 25, 3, 3, 3, 3, 3, 3, 16, 25, 39, 19, 8, 3, 11, 3, 3, 3, 3, 3, 40, 41, 42, 38, 8, 24, 23, 38, 31, 37, 3, 3, 3, 3, 3, 35, 7, 43, 44, 45, 46, 47, 6, 13, 3, 3, 7, 6, 13, 47, 6, 10, 15, 3, 3, 6, 8, 3, 3, 8, 3, 3, 48, 20, 37, 9, 6, 6, 21, 6, 19, 3, 9, 6, 6, 9, 6, 6, 6, 6, 15, 3, 35, 3, 3, 3, 3, 3, 9, 49, 6, 32, 33, 3, 37, 8, 16, 9, 15, 3, 3, 35, 33, 3, 20, 3, 3, 3, 20, 50, 50, 50, 50, 51, 51, 51, 51, 51, 51, 52, 52, 52, 52, 52, 52, 16, 15, 3, 3, 3, 53, 6, 54, 45, 41, 24, 6, 6, 3, 3, 20, 3, 3, 7, 55, 3, 3, 20, 3, 21, 46, 25, 3, 41, 45, 24, 3, 3, 7, 56, 3, 3, 57, 6, 13, 44, 9, 6, 25, 46, 6, 6, 18, 6, 6, 6, 13, 6, 58, 3, 3, 3, 49, 21, 25, 41, 58, 3, 3, 59, 3, 3, 3, 60, 54, 53, 8, 3, 22, 54, 61, 54, 3, 3, 3, 3, 45, 45, 6, 6, 43, 3, 3, 13, 6, 6, 6, 49, 6, 15, 20, 37, 15, 8, 3, 6, 8, 3, 6, 3, 3, 4, 62, 3, 3, 0, 63, 3, 3, 3, 7, 8, 
3, 3, 3, 3, 3, 16, 6, 3, 3, 11, 3, 13, 6, 6, 8, 35, 35, 7, 3, 64, 65, 3, 3, 66, 3, 3, 3, 3, 45, 45, 45, 45, 15, 3, 3, 3, 16, 6, 8, 3, 7, 6, 6, 50, 50, 50, 67, 7, 43, 54, 25, 58, 3, 3, 3, 3, 20, 3, 3, 3, 3, 9, 21, 65, 33, 3, 3, 7, 3, 3, 68, 3, 3, 3, 15, 19, 18, 15, 16, 3, 3, 64, 54, 3, 69, 3, 3, 64, 26, 36, 31, 70, 71, 71, 71, 71, 71, 71, 70, 71, 71, 71, 71, 71, 71, 70, 71, 71, 70, 71, 71, 71, 3, 3, 3, 51, 72, 73, 52, 52, 52, 52, 3, 3, 3, 3, 35, 0, 0, 0, 3, 3, 16, 13, 3, 9, 11, 3, 6, 3, 3, 13, 7, 74, 3, 3, 3, 3, 3, 6, 6, 6, 13, 3, 3, 46, 21, 33, 5, 13, 3, 3, 3, 3, 7, 6, 24, 6, 15, 3, 3, 7, 3, 3, 3, 64, 43, 6, 21, 58, 3, 16, 15, 3, 3, 3, 46, 54, 49, 3, 3, 46, 6, 13, 3, 25, 30, 30, 66, 37, 16, 6, 15, 56, 6, 75, 61, 49, 3, 3, 3, 43, 8, 45, 53, 3, 3, 3, 8, 46, 6, 21, 61, 3, 3, 7, 26, 6, 53, 3, 3, 43, 53, 6, 3, 76, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 77, 3, 3, 3, 11, 0, 3, 3, 3, 3, 78, 8, 60, 79, 0, 80, 6, 13, 9, 6, 3, 3, 3, 16, 8, 6, 13, 7, 6, 3, 15, 3, 3, 3, 81, 82, 82, 82, 82, 82, 82, }; static RE_UINT8 re_grapheme_cluster_break_stage_5[] = { 3, 3, 3, 3, 3, 3, 2, 3, 3, 1, 3, 3, 0, 0, 0, 0, 0, 0, 0, 3, 0, 3, 0, 0, 4, 4, 4, 4, 0, 0, 0, 4, 4, 4, 0, 0, 0, 4, 4, 4, 4, 4, 0, 4, 0, 4, 4, 0, 3, 3, 0, 0, 4, 4, 4, 0, 3, 0, 0, 0, 4, 0, 0, 0, 0, 0, 4, 4, 4, 3, 0, 4, 4, 0, 0, 4, 4, 0, 4, 4, 0, 4, 0, 0, 4, 4, 4, 6, 0, 0, 4, 6, 4, 0, 6, 6, 6, 4, 4, 4, 4, 6, 6, 6, 6, 4, 6, 6, 0, 4, 6, 6, 4, 0, 4, 6, 4, 0, 0, 6, 6, 0, 0, 6, 6, 4, 0, 0, 0, 4, 4, 6, 6, 4, 4, 0, 4, 6, 0, 6, 0, 0, 4, 0, 4, 6, 6, 0, 0, 0, 6, 6, 6, 0, 6, 6, 6, 0, 4, 4, 4, 0, 6, 4, 6, 6, 4, 6, 6, 0, 4, 6, 6, 6, 4, 4, 4, 0, 4, 0, 6, 6, 6, 6, 6, 6, 6, 4, 0, 4, 0, 6, 0, 4, 0, 4, 4, 6, 4, 4, 7, 7, 7, 7, 8, 8, 8, 8, 9, 9, 9, 9, 4, 4, 6, 4, 4, 4, 6, 6, 4, 4, 3, 0, 4, 6, 6, 4, 0, 6, 4, 6, 6, 0, 0, 0, 4, 4, 6, 0, 0, 6, 4, 4, 6, 4, 6, 4, 4, 4, 3, 3, 3, 3, 3, 0, 0, 0, 0, 6, 6, 4, 4, 6, 6, 6, 0, 0, 7, 0, 0, 0, 4, 6, 0, 0, 0, 6, 4, 0, 10, 11, 11, 11, 11, 11, 11, 11, 8, 8, 8, 0, 0, 0, 0, 9, 6, 4, 6, 0, 4, 6, 4, 6, 0, 6, 6, 6, 6, 6, 
6, 0, 0, 4, 6, 4, 4, 4, 4, 3, 3, 3, 3, 4, 0, 0, 5, 5, 5, 5, 5, 5, }; /* Grapheme_Cluster_Break: 2640 bytes. */ RE_UINT32 re_get_grapheme_cluster_break(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 13; code = ch ^ (f << 13); pos = (RE_UINT32)re_grapheme_cluster_break_stage_1[f] << 5; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_grapheme_cluster_break_stage_2[pos + f] << 4; f = code >> 4; code ^= f << 4; pos = (RE_UINT32)re_grapheme_cluster_break_stage_3[pos + f] << 2; f = code >> 2; code ^= f << 2; pos = (RE_UINT32)re_grapheme_cluster_break_stage_4[pos + f] << 2; value = re_grapheme_cluster_break_stage_5[pos + code]; return value; } /* Sentence_Break. */ static RE_UINT8 re_sentence_break_stage_1[] = { 0, 1, 2, 3, 4, 5, 5, 5, 5, 6, 7, 5, 5, 8, 9, 10, 11, 12, 13, 14, 15, 9, 16, 9, 9, 9, 9, 17, 9, 18, 19, 20, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 21, 22, 23, 9, 9, 24, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 25, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, }; static RE_UINT8 re_sentence_break_stage_2[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 17, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 33, 33, 36, 33, 37, 33, 33, 38, 39, 40, 33, 41, 42, 33, 33, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 43, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 
17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 44, 17, 17, 17, 17, 45, 17, 46, 47, 48, 49, 50, 51, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 52, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 17, 53, 54, 17, 55, 56, 57, 58, 59, 60, 61, 62, 63, 17, 64, 65, 66, 67, 68, 69, 33, 33, 33, 70, 71, 72, 73, 74, 75, 76, 77, 78, 33, 79, 33, 33, 33, 33, 33, 17, 17, 17, 80, 81, 82, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 17, 17, 17, 17, 83, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 17, 17, 84, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 17, 17, 85, 86, 33, 33, 33, 87, 88, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 89, 33, 33, 33, 33, 90, 91, 33, 92, 93, 94, 95, 33, 33, 96, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 97, 33, 33, 33, 33, 33, 98, 33, 33, 99, 33, 33, 33, 33, 100, 33, 33, 33, 33, 33, 33, 33, 33, 33, 17, 17, 17, 17, 17, 17, 101, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 102, 103, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 104, 33, 33, 33, 33, 33, 33, 33, 33, 33, 17, 17, 105, 33, 33, 33, 33, 33, 106, 107, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, }; static RE_UINT16 re_sentence_break_stage_3[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 8, 16, 17, 18, 19, 20, 21, 22, 23, 23, 23, 24, 25, 26, 27, 28, 29, 30, 18, 8, 31, 8, 32, 8, 8, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 41, 41, 44, 45, 46, 47, 48, 41, 41, 49, 50, 51, 52, 53, 54, 55, 55, 56, 55, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 71, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 85, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 55, 99, 100, 101, 55, 102, 103, 104, 105, 106, 107, 108, 55, 41, 109, 110, 111, 112, 29, 113, 114, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 115, 41, 116, 117, 118, 41, 119, 41, 120, 121, 122, 29, 29, 
123, 96, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 124, 125, 41, 41, 126, 127, 128, 129, 130, 41, 131, 132, 133, 134, 41, 41, 135, 41, 136, 41, 137, 138, 139, 140, 141, 41, 142, 143, 55, 144, 41, 145, 146, 147, 148, 55, 55, 149, 131, 150, 151, 152, 153, 41, 154, 41, 155, 156, 157, 55, 55, 158, 159, 18, 18, 18, 18, 18, 18, 23, 160, 8, 8, 8, 8, 161, 8, 8, 8, 162, 163, 164, 165, 163, 166, 167, 168, 169, 170, 171, 172, 173, 55, 174, 175, 176, 177, 178, 30, 179, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 180, 181, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 182, 30, 183, 55, 55, 184, 185, 55, 55, 186, 187, 55, 55, 55, 55, 188, 55, 189, 190, 29, 191, 192, 193, 8, 8, 8, 194, 18, 195, 41, 196, 197, 198, 198, 23, 199, 200, 201, 55, 55, 55, 55, 55, 202, 203, 96, 41, 204, 96, 41, 114, 205, 206, 41, 41, 207, 208, 55, 209, 41, 41, 41, 41, 41, 137, 55, 55, 41, 41, 41, 41, 41, 41, 137, 55, 41, 41, 41, 41, 210, 55, 209, 211, 212, 213, 8, 214, 215, 41, 41, 216, 217, 218, 8, 219, 220, 221, 55, 222, 223, 224, 41, 225, 226, 131, 227, 228, 50, 229, 230, 231, 58, 232, 233, 234, 41, 235, 236, 237, 41, 238, 239, 240, 241, 242, 243, 244, 18, 18, 41, 245, 41, 41, 41, 41, 41, 246, 247, 248, 41, 41, 41, 249, 41, 41, 250, 55, 251, 252, 253, 41, 41, 254, 255, 41, 41, 256, 209, 41, 257, 41, 258, 259, 260, 261, 262, 263, 41, 41, 41, 264, 265, 2, 266, 267, 268, 138, 269, 270, 271, 272, 273, 55, 41, 41, 41, 208, 55, 55, 41, 56, 55, 55, 55, 274, 55, 55, 55, 55, 231, 41, 275, 276, 41, 209, 277, 278, 279, 41, 280, 55, 29, 281, 282, 41, 279, 133, 55, 55, 41, 283, 41, 284, 55, 55, 55, 55, 41, 197, 137, 258, 55, 55, 55, 55, 285, 286, 137, 197, 138, 55, 55, 287, 137, 250, 55, 55, 41, 288, 55, 55, 289, 290, 291, 231, 231, 55, 104, 292, 41, 137, 137, 293, 254, 55, 55, 55, 41, 41, 294, 55, 29, 295, 18, 296, 152, 297, 298, 299, 152, 300, 301, 302, 152, 303, 304, 305, 152, 232, 306, 55, 307, 308, 55, 55, 309, 310, 311, 312, 313, 71, 314, 315, 55, 55, 55, 55, 55, 55, 55, 55, 41, 47, 316, 55, 55, 55, 55, 55, 41, 
317, 318, 55, 41, 47, 319, 55, 41, 320, 133, 55, 321, 322, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 29, 18, 323, 55, 55, 55, 55, 55, 55, 41, 324, 41, 41, 41, 41, 250, 55, 55, 55, 41, 41, 41, 207, 41, 41, 41, 41, 41, 41, 284, 55, 55, 55, 55, 55, 41, 207, 55, 55, 55, 55, 55, 55, 41, 41, 325, 55, 55, 55, 55, 55, 41, 324, 138, 326, 55, 55, 209, 327, 41, 328, 329, 330, 122, 55, 55, 55, 41, 41, 331, 332, 333, 55, 55, 55, 334, 55, 55, 55, 55, 55, 55, 55, 41, 41, 41, 335, 336, 337, 55, 55, 55, 55, 55, 338, 339, 340, 55, 55, 55, 55, 341, 55, 55, 55, 55, 55, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 342, 343, 355, 345, 356, 357, 358, 349, 359, 360, 361, 362, 363, 364, 191, 365, 366, 367, 368, 23, 369, 23, 370, 371, 372, 55, 55, 41, 41, 41, 41, 41, 41, 373, 55, 374, 375, 376, 377, 378, 379, 55, 55, 55, 380, 381, 381, 382, 55, 55, 55, 55, 55, 55, 383, 55, 55, 55, 55, 41, 41, 41, 41, 41, 41, 197, 55, 41, 56, 41, 41, 41, 41, 41, 41, 279, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 334, 55, 55, 279, 55, 55, 55, 55, 55, 55, 55, 384, 385, 385, 385, 55, 55, 55, 55, 23, 23, 23, 23, 23, 23, 23, 386, }; static RE_UINT8 re_sentence_break_stage_4[] = { 0, 0, 1, 2, 0, 0, 0, 0, 3, 4, 5, 6, 7, 7, 8, 9, 10, 11, 11, 11, 11, 11, 12, 13, 14, 15, 15, 15, 15, 15, 16, 13, 0, 17, 0, 0, 0, 0, 0, 0, 18, 0, 19, 20, 0, 21, 19, 0, 11, 11, 11, 11, 11, 22, 11, 23, 15, 15, 15, 15, 15, 24, 15, 15, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 26, 26, 26, 26, 27, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 28, 29, 30, 31, 32, 33, 28, 31, 34, 28, 25, 31, 29, 31, 32, 26, 35, 34, 36, 28, 31, 26, 26, 26, 26, 27, 25, 25, 25, 25, 30, 31, 25, 25, 25, 25, 25, 25, 25, 15, 33, 30, 26, 23, 25, 25, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 37, 15, 15, 15, 15, 15, 15, 15, 15, 38, 36, 39, 40, 36, 36, 41, 0, 0, 0, 15, 42, 0, 43, 0, 0, 0, 0, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 25, 45, 46, 47, 0, 48, 22, 49, 32, 11, 11, 11, 50, 11, 11, 15, 15, 15, 
15, 15, 15, 15, 15, 51, 33, 34, 25, 25, 25, 25, 25, 25, 15, 52, 30, 32, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 15, 15, 15, 15, 53, 44, 54, 25, 25, 25, 25, 25, 28, 26, 26, 29, 25, 25, 25, 25, 25, 25, 25, 25, 10, 11, 11, 11, 11, 11, 11, 11, 11, 22, 55, 56, 14, 15, 15, 15, 15, 15, 15, 15, 15, 15, 57, 0, 58, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 59, 60, 59, 0, 0, 36, 36, 36, 36, 36, 36, 61, 0, 36, 0, 0, 0, 62, 63, 0, 64, 44, 44, 65, 66, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 67, 44, 44, 44, 44, 44, 7, 7, 68, 69, 70, 36, 36, 36, 36, 36, 36, 36, 36, 71, 44, 72, 44, 73, 74, 75, 7, 7, 76, 77, 78, 0, 0, 79, 80, 36, 36, 36, 36, 36, 36, 36, 44, 44, 44, 44, 44, 44, 65, 81, 36, 36, 36, 36, 36, 82, 44, 44, 83, 0, 0, 0, 7, 7, 76, 36, 36, 36, 36, 36, 36, 36, 67, 44, 44, 41, 84, 0, 36, 36, 36, 36, 36, 82, 85, 44, 44, 86, 86, 87, 0, 0, 0, 0, 36, 36, 36, 36, 36, 36, 86, 0, 0, 0, 0, 0, 0, 0, 0, 0, 36, 36, 36, 36, 36, 88, 0, 0, 89, 44, 44, 44, 44, 44, 44, 44, 44, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 82, 90, 44, 44, 44, 44, 86, 44, 36, 36, 82, 91, 7, 7, 81, 36, 36, 36, 86, 81, 36, 77, 77, 36, 36, 36, 36, 36, 92, 36, 43, 40, 41, 90, 44, 93, 93, 94, 0, 89, 0, 95, 82, 96, 7, 7, 41, 0, 0, 0, 58, 81, 61, 97, 77, 36, 36, 36, 36, 36, 92, 36, 92, 98, 41, 74, 65, 89, 93, 87, 99, 0, 81, 43, 0, 96, 7, 7, 75, 100, 0, 0, 58, 81, 36, 95, 95, 36, 36, 36, 36, 36, 92, 36, 92, 81, 41, 90, 44, 59, 59, 87, 88, 0, 0, 0, 82, 96, 7, 7, 0, 0, 55, 0, 58, 81, 36, 77, 77, 36, 36, 36, 44, 93, 93, 87, 0, 101, 0, 95, 82, 96, 7, 7, 55, 0, 0, 0, 102, 81, 61, 40, 92, 41, 98, 92, 97, 88, 61, 40, 36, 36, 41, 101, 65, 101, 74, 87, 88, 89, 0, 0, 0, 96, 7, 7, 0, 0, 0, 0, 44, 81, 36, 92, 92, 36, 36, 36, 36, 36, 92, 36, 36, 36, 41, 103, 44, 74, 74, 87, 0, 60, 61, 0, 82, 96, 7, 7, 0, 0, 0, 0, 58, 81, 36, 92, 92, 36, 36, 36, 36, 36, 92, 36, 36, 81, 41, 90, 44, 74, 74, 87, 0, 60, 0, 104, 82, 96, 7, 7, 98, 0, 0, 0, 36, 36, 36, 36, 36, 36, 61, 103, 44, 74, 74, 94, 0, 89, 0, 97, 82, 96, 7, 7, 0, 0, 40, 
36, 101, 81, 36, 36, 36, 61, 40, 36, 36, 36, 36, 36, 95, 36, 36, 55, 36, 61, 105, 89, 44, 106, 44, 44, 0, 96, 7, 7, 101, 0, 0, 0, 81, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 80, 44, 65, 0, 36, 67, 44, 65, 7, 7, 107, 0, 98, 77, 43, 55, 0, 36, 81, 36, 81, 108, 40, 81, 80, 44, 59, 83, 36, 43, 44, 87, 7, 7, 107, 36, 88, 0, 0, 0, 0, 0, 87, 0, 7, 7, 107, 0, 0, 109, 110, 111, 36, 36, 81, 36, 36, 36, 36, 36, 36, 36, 36, 88, 58, 44, 44, 44, 44, 74, 36, 86, 44, 44, 58, 44, 44, 44, 44, 44, 44, 44, 44, 112, 0, 105, 0, 0, 0, 0, 0, 0, 36, 36, 67, 44, 44, 44, 44, 113, 7, 7, 114, 0, 36, 82, 75, 82, 90, 73, 44, 75, 86, 70, 36, 36, 82, 44, 44, 85, 7, 7, 115, 87, 11, 50, 0, 116, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 61, 36, 36, 36, 92, 41, 36, 61, 92, 41, 36, 36, 92, 41, 36, 36, 36, 36, 36, 36, 36, 36, 92, 41, 36, 61, 92, 41, 36, 36, 36, 61, 36, 36, 36, 36, 36, 36, 92, 41, 36, 36, 36, 36, 36, 36, 36, 36, 61, 58, 117, 9, 118, 0, 0, 0, 0, 0, 36, 36, 36, 36, 0, 0, 0, 0, 11, 11, 11, 11, 11, 119, 15, 39, 36, 36, 36, 120, 36, 36, 36, 36, 121, 36, 36, 36, 36, 36, 122, 123, 36, 36, 61, 40, 36, 36, 88, 0, 36, 36, 36, 92, 82, 112, 0, 0, 36, 36, 36, 36, 82, 124, 0, 0, 36, 36, 36, 36, 82, 0, 0, 0, 36, 36, 36, 92, 125, 0, 0, 0, 36, 36, 36, 36, 36, 44, 44, 44, 44, 44, 44, 44, 44, 97, 0, 100, 7, 7, 107, 0, 0, 0, 0, 0, 126, 0, 127, 128, 7, 7, 107, 0, 36, 36, 36, 36, 36, 36, 0, 0, 36, 36, 129, 0, 36, 36, 36, 36, 36, 36, 36, 36, 36, 41, 0, 0, 36, 36, 36, 36, 36, 36, 36, 61, 44, 44, 44, 0, 44, 44, 44, 0, 0, 91, 7, 7, 36, 36, 36, 36, 36, 36, 36, 41, 36, 88, 0, 0, 36, 36, 36, 0, 36, 36, 36, 36, 36, 36, 41, 0, 7, 7, 107, 0, 36, 36, 36, 36, 36, 67, 44, 0, 36, 36, 36, 36, 36, 86, 44, 65, 44, 44, 44, 44, 44, 44, 44, 93, 7, 7, 107, 0, 7, 7, 107, 0, 0, 97, 130, 0, 44, 44, 44, 65, 44, 70, 36, 36, 36, 36, 36, 36, 44, 70, 36, 0, 7, 7, 114, 131, 0, 0, 89, 44, 44, 0, 0, 0, 113, 36, 36, 36, 36, 36, 36, 36, 86, 44, 44, 75, 7, 7, 76, 36, 36, 82, 44, 44, 44, 0, 0, 0, 36, 44, 44, 44, 44, 44, 9, 118, 7, 7, 107, 81, 
7, 7, 76, 36, 36, 36, 36, 36, 36, 36, 36, 132, 0, 0, 0, 0, 65, 44, 44, 44, 44, 44, 70, 80, 82, 133, 87, 0, 44, 44, 44, 44, 44, 87, 0, 44, 25, 25, 25, 25, 25, 34, 15, 27, 15, 15, 11, 11, 15, 39, 11, 119, 15, 15, 11, 11, 15, 15, 11, 11, 15, 39, 11, 119, 15, 15, 134, 134, 15, 15, 11, 11, 15, 15, 15, 39, 15, 15, 11, 11, 15, 135, 11, 136, 46, 135, 11, 137, 15, 46, 11, 0, 15, 15, 11, 137, 46, 135, 11, 137, 138, 138, 139, 140, 141, 142, 143, 143, 0, 144, 145, 146, 0, 0, 147, 148, 0, 149, 148, 0, 0, 0, 0, 150, 62, 151, 62, 62, 21, 0, 0, 152, 0, 0, 0, 147, 15, 15, 15, 42, 0, 0, 0, 0, 44, 44, 44, 44, 44, 44, 44, 44, 112, 0, 0, 0, 48, 153, 154, 155, 23, 116, 10, 119, 0, 156, 49, 157, 11, 38, 158, 33, 0, 159, 39, 160, 0, 0, 0, 0, 161, 38, 88, 0, 0, 0, 0, 0, 0, 0, 143, 0, 0, 0, 0, 0, 0, 0, 147, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 162, 11, 11, 15, 15, 39, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 143, 123, 0, 143, 143, 143, 5, 0, 0, 0, 147, 0, 0, 0, 0, 0, 0, 0, 163, 143, 143, 0, 0, 0, 0, 4, 143, 143, 143, 143, 143, 123, 0, 0, 0, 0, 0, 0, 0, 143, 0, 0, 0, 0, 0, 0, 0, 0, 5, 11, 11, 11, 22, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 24, 31, 164, 26, 32, 25, 29, 15, 33, 25, 42, 153, 165, 54, 0, 0, 0, 15, 166, 0, 21, 36, 36, 36, 36, 36, 36, 0, 97, 0, 0, 0, 89, 36, 36, 36, 36, 36, 61, 0, 0, 36, 61, 36, 61, 36, 61, 36, 61, 143, 143, 143, 5, 0, 0, 0, 5, 143, 143, 5, 167, 0, 0, 0, 118, 168, 0, 0, 0, 0, 0, 0, 0, 169, 81, 143, 143, 5, 143, 143, 170, 81, 36, 82, 44, 81, 41, 36, 88, 36, 36, 36, 36, 36, 61, 60, 81, 0, 81, 36, 36, 36, 36, 36, 36, 36, 36, 36, 41, 81, 36, 36, 36, 36, 36, 36, 61, 0, 0, 0, 0, 36, 36, 36, 36, 36, 36, 61, 0, 0, 0, 0, 0, 36, 36, 36, 36, 36, 36, 36, 88, 0, 0, 0, 0, 36, 36, 36, 36, 36, 36, 36, 171, 36, 36, 36, 172, 36, 36, 36, 36, 7, 7, 76, 0, 0, 0, 0, 0, 25, 25, 25, 173, 65, 44, 44, 174, 25, 25, 25, 25, 25, 25, 25, 175, 36, 36, 36, 36, 176, 9, 0, 0, 0, 0, 0, 0, 0, 97, 36, 36, 177, 25, 25, 25, 27, 25, 25, 25, 25, 25, 25, 25, 15, 15, 26, 30, 25, 25, 178, 179, 25, 27, 25, 25, 25, 
25, 31, 119, 11, 25, 0, 0, 0, 0, 0, 0, 0, 97, 180, 36, 181, 181, 67, 36, 36, 36, 36, 36, 67, 44, 0, 0, 0, 0, 0, 0, 36, 36, 36, 36, 36, 131, 0, 0, 75, 36, 36, 36, 36, 36, 36, 36, 44, 112, 0, 131, 7, 7, 107, 0, 44, 44, 44, 44, 75, 36, 97, 55, 36, 82, 44, 176, 36, 36, 36, 36, 36, 67, 44, 44, 44, 0, 0, 0, 36, 36, 36, 36, 36, 36, 36, 88, 36, 36, 36, 36, 67, 44, 44, 44, 112, 0, 148, 97, 7, 7, 107, 0, 36, 80, 36, 36, 7, 7, 76, 61, 36, 36, 86, 44, 44, 65, 0, 0, 67, 36, 36, 87, 7, 7, 107, 182, 36, 36, 36, 36, 36, 61, 183, 75, 36, 36, 36, 36, 90, 73, 70, 82, 129, 0, 0, 0, 0, 0, 97, 41, 36, 36, 67, 44, 184, 185, 0, 0, 81, 61, 81, 61, 81, 61, 0, 0, 36, 61, 36, 61, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 24, 15, 15, 39, 0, 0, 15, 15, 15, 15, 67, 44, 186, 87, 7, 7, 107, 0, 36, 0, 0, 0, 36, 36, 36, 36, 36, 61, 97, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 0, 36, 36, 36, 41, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 41, 0, 15, 24, 0, 0, 187, 15, 0, 188, 36, 36, 92, 36, 36, 61, 36, 43, 95, 92, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 41, 0, 0, 0, 0, 0, 0, 0, 97, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 189, 36, 36, 36, 36, 40, 36, 36, 36, 36, 36, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 36, 36, 36, 0, 44, 44, 44, 44, 190, 4, 123, 0, 44, 44, 44, 44, 191, 170, 143, 143, 143, 192, 123, 0, 6, 193, 194, 195, 141, 0, 0, 0, 36, 92, 36, 36, 36, 36, 36, 36, 36, 36, 36, 196, 57, 0, 5, 6, 0, 0, 197, 9, 14, 15, 15, 15, 15, 15, 16, 198, 199, 200, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 82, 40, 36, 40, 36, 40, 36, 40, 88, 0, 0, 0, 0, 0, 0, 201, 0, 36, 36, 36, 81, 36, 36, 36, 36, 36, 61, 36, 36, 36, 36, 61, 95, 36, 36, 36, 41, 36, 36, 36, 41, 0, 0, 0, 0, 0, 0, 0, 99, 36, 36, 36, 36, 88, 0, 0, 0, 112, 0, 0, 0, 0, 0, 0, 0, 36, 36, 61, 0, 36, 36, 36, 36, 36, 36, 36, 36, 36, 82, 65, 0, 36, 36, 36, 36, 36, 36, 36, 41, 36, 0, 36, 36, 81, 41, 0, 0, 11, 11, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 36, 36, 36, 36, 36, 36, 0, 0, 36, 36, 36, 36, 36, 0, 0, 0, 0, 0, 0, 0, 36, 41, 92, 36, 36, 36, 36, 36, 36, 
36, 36, 36, 36, 95, 88, 77, 36, 36, 36, 36, 61, 41, 0, 0, 36, 36, 36, 36, 36, 36, 0, 40, 86, 60, 0, 44, 36, 81, 81, 36, 36, 36, 36, 36, 36, 0, 65, 89, 0, 0, 0, 0, 0, 131, 0, 0, 36, 185, 0, 0, 0, 0, 0, 0, 36, 36, 36, 36, 61, 0, 0, 0, 36, 36, 88, 0, 0, 0, 0, 0, 11, 11, 11, 11, 22, 0, 0, 0, 15, 15, 15, 15, 24, 0, 0, 0, 36, 36, 36, 36, 36, 36, 44, 44, 44, 186, 118, 0, 0, 0, 0, 0, 0, 96, 7, 7, 0, 0, 0, 89, 36, 36, 36, 36, 44, 44, 65, 202, 148, 0, 0, 0, 36, 36, 36, 36, 36, 36, 88, 0, 7, 7, 107, 0, 36, 67, 44, 44, 44, 203, 7, 7, 182, 0, 0, 0, 36, 36, 36, 36, 36, 36, 36, 36, 67, 104, 0, 0, 70, 204, 101, 205, 7, 7, 206, 172, 36, 36, 36, 36, 95, 36, 36, 36, 36, 36, 36, 44, 44, 44, 207, 118, 36, 61, 92, 95, 36, 36, 36, 95, 36, 36, 208, 0, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 67, 44, 44, 65, 0, 7, 7, 107, 0, 44, 81, 36, 77, 77, 36, 36, 36, 44, 93, 93, 87, 88, 89, 0, 81, 82, 101, 44, 112, 44, 112, 0, 0, 44, 95, 0, 0, 7, 7, 107, 0, 36, 36, 36, 67, 44, 87, 44, 44, 209, 0, 182, 130, 130, 130, 36, 87, 124, 88, 0, 0, 7, 7, 107, 0, 36, 36, 67, 44, 44, 44, 0, 0, 36, 36, 36, 36, 36, 36, 41, 58, 44, 44, 44, 0, 7, 7, 107, 78, 7, 7, 107, 0, 0, 0, 0, 97, 36, 36, 36, 36, 36, 36, 88, 0, 36, 61, 0, 0, 0, 0, 0, 0, 7, 7, 107, 131, 0, 0, 0, 0, 36, 36, 36, 41, 44, 205, 0, 0, 36, 36, 36, 36, 44, 186, 118, 0, 36, 118, 0, 0, 7, 7, 107, 0, 97, 36, 36, 36, 36, 36, 0, 81, 36, 88, 0, 0, 86, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 65, 0, 0, 0, 89, 113, 36, 36, 36, 41, 0, 0, 0, 0, 0, 0, 0, 36, 36, 61, 0, 36, 36, 36, 88, 36, 36, 88, 0, 36, 36, 41, 210, 62, 0, 0, 0, 0, 0, 0, 0, 0, 58, 87, 58, 211, 62, 212, 44, 65, 58, 44, 0, 0, 0, 0, 0, 0, 0, 101, 87, 0, 0, 0, 0, 101, 112, 0, 0, 0, 0, 0, 0, 11, 11, 11, 11, 11, 11, 155, 15, 15, 15, 15, 15, 15, 11, 11, 11, 11, 11, 11, 155, 15, 135, 15, 15, 15, 15, 11, 11, 11, 11, 11, 11, 155, 15, 15, 15, 15, 15, 15, 49, 48, 213, 10, 49, 11, 155, 166, 14, 15, 14, 15, 15, 11, 11, 11, 11, 11, 11, 155, 15, 15, 15, 15, 15, 15, 50, 22, 10, 11, 49, 11, 214, 15, 15, 15, 15, 
15, 15, 50, 22, 11, 156, 162, 11, 214, 15, 15, 15, 15, 15, 15, 11, 11, 11, 11, 11, 11, 155, 15, 15, 15, 15, 15, 15, 11, 11, 11, 155, 15, 15, 15, 15, 155, 15, 15, 15, 15, 15, 15, 11, 11, 11, 11, 11, 11, 155, 15, 15, 15, 15, 15, 15, 11, 11, 11, 11, 15, 39, 11, 11, 11, 11, 11, 11, 214, 15, 15, 15, 15, 15, 24, 15, 33, 11, 11, 11, 11, 11, 22, 15, 15, 15, 15, 15, 15, 135, 15, 11, 11, 11, 11, 11, 11, 214, 15, 15, 15, 15, 15, 24, 15, 33, 11, 11, 15, 15, 135, 15, 11, 11, 11, 11, 11, 11, 214, 15, 15, 15, 15, 15, 24, 15, 27, 96, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 44, 44, 44, 44, 44, 65, 89, 44, 44, 44, 44, 112, 0, 99, 0, 0, 0, 112, 118, 0, 0, 0, 89, 44, 58, 44, 44, 44, 0, 0, 0, 0, 36, 88, 0, 0, 44, 65, 0, 0, 36, 81, 36, 36, 36, 36, 36, 36, 98, 77, 81, 36, 61, 36, 108, 0, 104, 97, 108, 81, 98, 77, 108, 108, 98, 77, 61, 36, 61, 36, 81, 43, 36, 36, 95, 36, 36, 36, 36, 0, 81, 81, 95, 36, 36, 36, 36, 0, 0, 0, 0, 0, 11, 11, 11, 11, 11, 11, 119, 0, 11, 11, 11, 11, 11, 11, 119, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 163, 123, 0, 20, 0, 0, 0, 0, 0, 0, 0, 62, 62, 62, 62, 62, 62, 62, 62, 44, 44, 44, 44, 0, 0, 0, 0, }; static RE_UINT8 re_sentence_break_stage_5[] = { 0, 0, 0, 0, 0, 6, 2, 6, 6, 1, 0, 0, 6, 12, 13, 0, 0, 0, 0, 13, 13, 13, 0, 0, 14, 14, 11, 0, 10, 10, 10, 10, 10, 10, 14, 0, 0, 0, 0, 12, 0, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 13, 0, 13, 0, 0, 0, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 13, 0, 4, 0, 0, 6, 0, 0, 0, 0, 0, 7, 13, 0, 5, 0, 0, 0, 7, 0, 0, 8, 8, 8, 0, 8, 8, 8, 7, 7, 7, 7, 0, 8, 7, 8, 7, 7, 8, 7, 8, 7, 7, 8, 7, 8, 8, 7, 8, 7, 8, 7, 7, 7, 8, 8, 7, 8, 7, 8, 8, 7, 8, 8, 8, 7, 7, 8, 8, 8, 7, 7, 7, 8, 7, 7, 9, 9, 9, 9, 9, 9, 7, 7, 7, 7, 9, 9, 9, 7, 7, 0, 0, 0, 0, 9, 9, 9, 9, 0, 0, 7, 0, 0, 0, 9, 0, 9, 0, 3, 3, 3, 3, 9, 0, 8, 7, 0, 0, 7, 7, 7, 7, 0, 8, 0, 0, 8, 0, 8, 0, 8, 8, 8, 8, 0, 8, 7, 7, 7, 8, 8, 7, 0, 8, 8, 7, 0, 3, 3, 3, 8, 7, 0, 9, 0, 0, 0, 14, 0, 0, 0, 12, 0, 0, 0, 3, 3, 3, 3, 3, 0, 3, 0, 3, 3, 0, 9, 9, 9, 0, 5, 5, 5, 5, 5, 5, 0, 0, 14, 14, 0, 0, 3, 3, 3, 0, 5, 0, 0, 12, 9, 9, 9, 3, 10, 
10, 0, 10, 10, 0, 9, 9, 3, 9, 9, 9, 12, 9, 3, 3, 3, 5, 0, 3, 3, 9, 9, 3, 3, 0, 3, 3, 3, 3, 9, 9, 10, 10, 9, 9, 9, 0, 0, 9, 12, 12, 12, 0, 0, 0, 0, 5, 9, 3, 9, 9, 0, 9, 9, 9, 9, 9, 3, 3, 3, 9, 0, 0, 14, 12, 9, 0, 3, 3, 9, 3, 9, 3, 3, 3, 3, 3, 0, 0, 9, 0, 0, 0, 0, 0, 0, 3, 3, 9, 3, 3, 12, 12, 10, 10, 9, 0, 9, 9, 3, 0, 0, 3, 3, 3, 9, 0, 9, 9, 0, 9, 0, 0, 10, 10, 0, 0, 0, 9, 0, 9, 9, 0, 0, 3, 0, 0, 9, 3, 0, 0, 0, 0, 3, 3, 0, 0, 3, 9, 0, 9, 3, 3, 0, 0, 9, 0, 0, 0, 3, 0, 3, 0, 3, 0, 10, 10, 0, 0, 0, 9, 0, 9, 0, 3, 0, 3, 0, 3, 13, 13, 13, 13, 3, 3, 3, 0, 0, 0, 3, 3, 3, 9, 10, 10, 12, 12, 10, 10, 3, 3, 0, 8, 0, 0, 0, 0, 12, 0, 12, 0, 0, 0, 8, 8, 0, 0, 9, 0, 12, 9, 6, 9, 9, 9, 9, 9, 9, 13, 13, 0, 0, 0, 3, 12, 12, 0, 9, 0, 3, 3, 0, 0, 14, 12, 14, 12, 0, 3, 3, 3, 5, 0, 9, 3, 9, 0, 12, 12, 12, 12, 0, 0, 12, 12, 9, 9, 12, 12, 3, 9, 9, 0, 0, 8, 0, 8, 7, 0, 7, 7, 8, 0, 7, 0, 8, 0, 0, 0, 6, 6, 6, 6, 6, 6, 6, 5, 3, 3, 5, 5, 0, 0, 0, 14, 14, 0, 0, 0, 13, 13, 13, 13, 11, 0, 0, 0, 4, 4, 5, 5, 5, 5, 5, 6, 0, 13, 13, 0, 12, 12, 0, 0, 0, 13, 13, 12, 0, 0, 0, 6, 5, 0, 5, 5, 0, 13, 13, 7, 0, 0, 0, 8, 0, 0, 7, 8, 8, 8, 7, 7, 8, 0, 8, 0, 8, 8, 0, 7, 9, 7, 0, 0, 0, 8, 7, 7, 0, 0, 7, 0, 9, 9, 9, 8, 0, 0, 8, 8, 0, 0, 13, 13, 8, 7, 7, 8, 7, 8, 7, 3, 7, 7, 0, 7, 0, 0, 12, 9, 0, 0, 13, 0, 6, 14, 12, 0, 0, 13, 13, 13, 9, 9, 0, 12, 9, 0, 12, 12, 8, 7, 9, 3, 3, 3, 0, 9, 7, 7, 3, 3, 3, 3, 0, 12, 0, 0, 8, 7, 9, 0, 0, 8, 7, 8, 7, 9, 7, 7, 7, 9, 9, 9, 3, 9, 0, 12, 12, 12, 0, 0, 9, 3, 12, 12, 9, 9, 9, 3, 3, 0, 3, 3, 3, 12, 0, 0, 0, 7, 0, 9, 3, 9, 9, 9, 13, 13, 14, 14, 0, 14, 0, 14, 14, 0, 13, 0, 0, 13, 0, 14, 12, 12, 14, 13, 13, 13, 13, 13, 13, 0, 9, 0, 0, 5, 0, 0, 14, 0, 0, 13, 0, 13, 13, 12, 13, 13, 14, 0, 9, 9, 0, 5, 5, 5, 0, 5, 12, 12, 3, 0, 10, 10, 9, 12, 12, 0, 3, 12, 0, 0, 10, 10, 9, 0, 12, 12, 0, 12, 9, 12, 0, 0, 3, 0, 12, 12, 0, 3, 3, 12, 3, 3, 3, 5, 5, 5, 5, 3, 0, 8, 8, 0, 8, 0, 7, 7, }; /* Sentence_Break: 6372 bytes. 
   The lookup below is a five-stage table walk. ch is split, most significant
   bits first, into bits 12 and up, 11-8, 7-5, 4-2 and 1-0. Each of stages 1
   to 4 yields the block index for the next stage, and
   re_sentence_break_stage_5 holds the Sentence_Break value itself.
*/ RE_UINT32 re_get_sentence_break(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 12; code = ch ^ (f << 12); pos = (RE_UINT32)re_sentence_break_stage_1[f] << 4; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_sentence_break_stage_2[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_sentence_break_stage_3[pos + f] << 3; f = code >> 2; code ^= f << 2; pos = (RE_UINT32)re_sentence_break_stage_4[pos + f] << 2; value = re_sentence_break_stage_5[pos + code]; return value; } /* Math. */ static RE_UINT8 re_math_stage_1[] = { 0, 1, 2, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, }; static RE_UINT8 re_math_stage_2[] = { 0, 1, 1, 1, 2, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 5, 1, 1, 6, 1, 1, }; static RE_UINT8 re_math_stage_3[] = { 0, 1, 1, 2, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 4, 5, 6, 7, 1, 8, 9, 10, 1, 6, 6, 11, 1, 1, 1, 1, 1, 1, 1, 12, 1, 1, 13, 14, 1, 1, 1, 1, 15, 16, 17, 18, 1, 1, 1, 1, 1, 1, 19, 1, }; static RE_UINT8 re_math_stage_4[] = { 0, 1, 2, 3, 0, 4, 5, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 7, 8, 0, 0, 0, 0, 0, 0, 0, 9, 10, 11, 12, 13, 0, 14, 15, 16, 17, 18, 0, 19, 20, 21, 22, 23, 23, 23, 23, 23, 23, 23, 23, 24, 25, 0, 26, 27, 28, 29, 30, 0, 0, 0, 0, 0, 31, 32, 33, 34, 0, 35, 36, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 23, 23, 0, 19, 37, 0, 0, 0, 0, 0, 0, 38, 0, 0, 0, 0, 0, 0, 0, 0, 0, 39, 0, 0, 0, 0, 1, 3, 3, 0, 0, 0, 0, 40, 23, 23, 41, 23, 42, 43, 44, 23, 45, 46, 47, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 48, 23, 23, 23, 23, 23, 23, 23, 23, 49, 23, 44, 50, 51, 52, 53, 54, 0, 55, }; static RE_UINT8 re_math_stage_5[] = { 0, 0, 0, 0, 0, 8, 0, 112, 0, 0, 0, 64, 0, 0, 0, 80, 0, 16, 2, 0, 0, 0, 128, 0, 0, 0, 39, 0, 0, 0, 115, 0, 192, 1, 0, 0, 0, 0, 64, 0, 0, 0, 28, 0, 17, 0, 4, 0, 30, 0, 0, 124, 0, 124, 0, 0, 0, 0, 255, 31, 98, 248, 0, 0, 132, 
252, 47, 63, 16, 179, 251, 241, 255, 11, 0, 0, 0, 0, 255, 255, 255, 126, 195, 240, 255, 255, 255, 47, 48, 0, 240, 255, 255, 255, 255, 255, 0, 15, 0, 0, 3, 0, 0, 0, 0, 0, 0, 16, 0, 0, 0, 248, 255, 255, 191, 0, 0, 0, 1, 240, 7, 0, 0, 0, 3, 192, 255, 240, 195, 140, 15, 0, 148, 31, 0, 255, 96, 0, 0, 0, 5, 0, 0, 0, 15, 224, 0, 0, 159, 31, 0, 0, 0, 2, 0, 0, 126, 1, 0, 0, 4, 30, 0, 0, 255, 255, 223, 255, 255, 255, 255, 223, 100, 222, 255, 235, 239, 255, 255, 255, 191, 231, 223, 223, 255, 255, 255, 123, 95, 252, 253, 255, 63, 255, 255, 255, 255, 207, 255, 255, 150, 254, 247, 10, 132, 234, 150, 170, 150, 247, 247, 94, 255, 251, 255, 15, 238, 251, 255, 15, 0, 0, 3, 0, }; /* Math: 538 bytes. */ RE_UINT32 re_get_math(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 15; code = ch ^ (f << 15); pos = (RE_UINT32)re_math_stage_1[f] << 4; f = code >> 11; code ^= f << 11; pos = (RE_UINT32)re_math_stage_2[pos + f] << 3; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_math_stage_3[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_math_stage_4[pos + f] << 5; pos += code; value = (re_math_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Alphabetic. 
   Binary property: re_alphabetic_stage_5 is a bit set with one bit per code
   point. re_get_alphabetic() walks stages 1 to 4 using bits 16 and up, 15-11,
   10-8 and 7-5 of ch, adds the low 5 bits to form a bit position pos, and
   then tests bit (pos & 0x7) of byte re_alphabetic_stage_5[pos >> 3].
*/ static RE_UINT8 re_alphabetic_stage_1[] = { 0, 1, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, }; static RE_UINT8 re_alphabetic_stage_2[] = { 0, 1, 2, 3, 4, 5, 6, 7, 7, 8, 7, 7, 7, 7, 7, 7, 7, 7, 7, 9, 10, 11, 7, 7, 7, 7, 12, 13, 13, 13, 13, 14, 15, 16, 17, 18, 19, 13, 20, 13, 21, 13, 13, 13, 13, 22, 13, 13, 13, 13, 13, 13, 13, 13, 23, 24, 13, 13, 25, 13, 13, 26, 27, 13, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 28, 7, 29, 30, 7, 31, 13, 13, 13, 13, 13, 32, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, }; static RE_UINT8 re_alphabetic_stage_3[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1, 17, 18, 19, 1, 20, 21, 22, 23, 24, 25, 26, 27, 1, 28, 29, 30, 31, 31, 32, 31, 31, 31, 31, 31, 31, 31, 33, 34, 35, 31, 36, 37, 31, 31, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 38, 1, 1, 1, 1, 1, 1, 1, 1, 1, 39, 1, 1, 1, 1, 40, 1, 41, 42, 43, 44, 45, 46, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 47, 31, 31, 31, 31, 31, 31, 31, 31, 31, 1, 48, 49, 1, 50, 51, 52, 53, 54, 55, 56, 57, 58, 1, 59, 60, 61, 62, 63, 64, 31, 31, 31, 65, 66, 67, 68, 69, 70, 71, 72, 73, 31, 74, 31, 31, 31, 31, 31, 1, 1, 1, 75, 76, 77, 31, 31, 1, 1, 1, 1, 78, 31, 31, 31, 31, 31, 31, 31, 1, 1, 79, 31, 1, 1, 80, 81, 31, 31, 31, 82, 83, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 84, 31, 31, 31, 31, 31, 31, 31, 85, 86, 87, 88, 89, 31, 31, 31, 31, 31, 90, 31, 31, 91, 31, 31, 31, 31, 31, 31, 1, 1, 1, 1, 1, 1, 92, 1, 1, 1, 1, 1, 1, 1, 1, 93, 94, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 95, 31, 1, 1, 96, 31, 31, 31, 31, 31, }; static RE_UINT8 re_alphabetic_stage_4[] = { 0, 0, 1, 1, 0, 2, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 6, 0, 0, 7, 8, 9, 10, 4, 11, 4, 4, 4, 4, 12, 4, 4, 4, 4, 13, 14, 15, 16, 17, 18, 19, 20, 4, 21, 22, 4, 4, 23, 24, 25, 4, 26, 4, 4, 27, 28, 29, 30, 31, 32, 0, 0, 33, 0, 34, 4, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 47, 51, 52, 53, 54, 55, 0, 
56, 57, 58, 59, 60, 61, 62, 63, 60, 64, 65, 66, 67, 68, 69, 70, 15, 71, 72, 0, 73, 74, 75, 0, 76, 0, 77, 78, 79, 80, 0, 0, 4, 81, 25, 82, 83, 4, 84, 85, 4, 4, 86, 4, 87, 88, 89, 4, 90, 4, 91, 0, 92, 4, 4, 93, 15, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 94, 1, 4, 4, 95, 96, 97, 97, 98, 4, 99, 100, 0, 0, 4, 4, 101, 4, 102, 4, 103, 104, 105, 25, 106, 4, 107, 108, 0, 109, 4, 104, 110, 0, 111, 0, 0, 4, 112, 113, 0, 4, 114, 4, 115, 4, 103, 116, 117, 0, 0, 0, 118, 4, 4, 4, 4, 4, 4, 0, 119, 93, 4, 120, 117, 4, 121, 122, 123, 0, 0, 0, 124, 125, 0, 0, 0, 126, 127, 128, 4, 129, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 130, 4, 108, 4, 131, 104, 4, 4, 4, 4, 132, 4, 84, 4, 133, 134, 135, 135, 4, 0, 136, 0, 0, 0, 0, 0, 0, 137, 138, 15, 4, 139, 15, 4, 85, 140, 141, 4, 4, 142, 71, 0, 25, 4, 4, 4, 4, 4, 103, 0, 0, 4, 4, 4, 4, 4, 4, 103, 0, 4, 4, 4, 4, 31, 0, 25, 117, 143, 144, 4, 145, 4, 4, 4, 92, 146, 147, 4, 4, 148, 149, 0, 146, 150, 16, 4, 97, 4, 4, 59, 151, 28, 102, 152, 80, 4, 153, 136, 154, 4, 134, 155, 156, 4, 104, 157, 158, 159, 160, 85, 161, 4, 4, 4, 162, 4, 4, 4, 4, 4, 163, 164, 109, 4, 4, 4, 165, 4, 4, 166, 0, 167, 168, 169, 4, 4, 27, 170, 4, 4, 117, 25, 4, 171, 4, 16, 172, 0, 0, 0, 173, 4, 4, 4, 80, 0, 1, 1, 174, 4, 104, 175, 0, 176, 177, 178, 0, 4, 4, 4, 71, 0, 0, 4, 33, 0, 0, 0, 0, 0, 0, 0, 0, 80, 4, 179, 0, 4, 25, 102, 71, 117, 4, 180, 0, 4, 4, 4, 4, 117, 0, 0, 0, 4, 181, 4, 59, 0, 0, 0, 0, 4, 134, 103, 16, 0, 0, 0, 0, 182, 183, 103, 134, 104, 0, 0, 184, 103, 166, 0, 0, 4, 185, 0, 0, 186, 97, 0, 80, 80, 0, 77, 187, 4, 103, 103, 152, 27, 0, 0, 0, 4, 4, 129, 0, 4, 152, 4, 152, 4, 4, 188, 0, 147, 32, 25, 129, 4, 152, 25, 189, 4, 4, 190, 0, 191, 192, 0, 0, 193, 194, 4, 129, 38, 47, 195, 59, 0, 0, 0, 0, 0, 0, 0, 0, 4, 4, 196, 0, 0, 0, 0, 0, 4, 197, 198, 0, 4, 104, 199, 0, 4, 103, 0, 0, 200, 162, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 4, 201, 0, 0, 0, 0, 0, 0, 4, 32, 4, 4, 4, 4, 166, 0, 0, 0, 4, 4, 4, 142, 4, 4, 4, 4, 4, 4, 59, 0, 0, 0, 0, 0, 4, 142, 0, 0, 0, 0, 0, 0, 4, 4, 202, 
0, 0, 0, 0, 0, 4, 32, 104, 0, 0, 0, 25, 155, 4, 134, 59, 203, 92, 0, 0, 0, 4, 4, 204, 104, 170, 0, 0, 0, 205, 0, 0, 0, 0, 0, 0, 0, 4, 4, 4, 206, 207, 0, 0, 0, 4, 4, 208, 4, 209, 210, 211, 4, 212, 213, 214, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 215, 216, 85, 208, 208, 131, 131, 217, 217, 218, 0, 4, 4, 4, 4, 4, 4, 187, 0, 211, 219, 220, 221, 222, 223, 0, 0, 0, 25, 224, 224, 108, 0, 0, 0, 4, 4, 4, 4, 4, 4, 134, 0, 4, 33, 4, 4, 4, 4, 4, 4, 117, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 205, 0, 0, 117, 0, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_alphabetic_stage_5[] = { 0, 0, 0, 0, 254, 255, 255, 7, 0, 4, 32, 4, 255, 255, 127, 255, 255, 255, 255, 255, 195, 255, 3, 0, 31, 80, 0, 0, 32, 0, 0, 0, 0, 0, 223, 188, 64, 215, 255, 255, 251, 255, 255, 255, 255, 255, 191, 255, 3, 252, 255, 255, 255, 255, 254, 255, 255, 255, 127, 2, 254, 255, 255, 255, 255, 0, 0, 0, 0, 0, 255, 191, 182, 0, 255, 255, 255, 7, 7, 0, 0, 0, 255, 7, 255, 255, 255, 254, 0, 192, 255, 255, 255, 255, 239, 31, 254, 225, 0, 156, 0, 0, 255, 255, 0, 224, 255, 255, 255, 255, 3, 0, 0, 252, 255, 255, 255, 7, 48, 4, 255, 255, 255, 252, 255, 31, 0, 0, 255, 255, 255, 1, 255, 255, 31, 0, 248, 3, 255, 255, 255, 255, 255, 239, 255, 223, 225, 255, 15, 0, 254, 255, 239, 159, 249, 255, 255, 253, 197, 227, 159, 89, 128, 176, 15, 0, 3, 0, 238, 135, 249, 255, 255, 253, 109, 195, 135, 25, 2, 94, 0, 0, 63, 0, 238, 191, 251, 255, 255, 253, 237, 227, 191, 27, 1, 0, 15, 0, 0, 2, 238, 159, 249, 255, 159, 25, 192, 176, 15, 0, 2, 0, 236, 199, 61, 214, 24, 199, 255, 195, 199, 29, 129, 0, 239, 223, 253, 255, 255, 253, 255, 227, 223, 29, 96, 7, 15, 0, 0, 0, 238, 223, 253, 255, 255, 253, 239, 227, 223, 29, 96, 64, 15, 0, 6, 0, 255, 255, 255, 231, 223, 93, 128, 128, 15, 0, 0, 252, 236, 255, 127, 252, 255, 255, 251, 47, 127, 128, 95, 255, 0, 0, 12, 0, 255, 255, 255, 7, 127, 32, 0, 0, 150, 37, 240, 254, 174, 236, 255, 59, 95, 32, 0, 240, 1, 0, 0, 0, 255, 254, 255, 255, 255, 31, 254, 255, 3, 255, 255, 254, 255, 255, 255, 31, 255, 255, 127, 249, 231, 193, 
255, 255, 127, 64, 0, 48, 191, 32, 255, 255, 255, 255, 255, 247, 255, 61, 127, 61, 255, 61, 255, 255, 255, 255, 61, 127, 61, 255, 127, 255, 255, 255, 61, 255, 255, 255, 255, 135, 255, 255, 0, 0, 255, 255, 63, 63, 255, 159, 255, 255, 255, 199, 255, 1, 255, 223, 15, 0, 255, 255, 15, 0, 255, 223, 13, 0, 255, 255, 207, 255, 255, 1, 128, 16, 255, 255, 255, 0, 255, 7, 255, 255, 255, 255, 63, 0, 255, 255, 255, 127, 255, 15, 255, 1, 255, 63, 31, 0, 255, 15, 255, 255, 255, 3, 0, 0, 255, 255, 255, 15, 254, 255, 31, 0, 128, 0, 0, 0, 255, 255, 239, 255, 239, 15, 0, 0, 255, 243, 0, 252, 191, 255, 3, 0, 0, 224, 0, 252, 255, 255, 255, 63, 0, 222, 111, 0, 128, 255, 31, 0, 63, 63, 255, 170, 255, 255, 223, 95, 220, 31, 207, 15, 255, 31, 220, 31, 0, 0, 2, 128, 0, 0, 255, 31, 132, 252, 47, 62, 80, 189, 255, 243, 224, 67, 0, 0, 255, 1, 0, 0, 0, 0, 192, 255, 255, 127, 255, 255, 31, 120, 12, 0, 255, 128, 0, 0, 255, 255, 127, 0, 127, 127, 127, 127, 0, 128, 0, 0, 224, 0, 0, 0, 254, 3, 62, 31, 255, 255, 127, 224, 224, 255, 255, 255, 255, 63, 254, 255, 255, 127, 0, 0, 255, 31, 255, 255, 0, 12, 0, 0, 255, 127, 240, 143, 0, 0, 128, 255, 252, 255, 255, 255, 255, 249, 255, 255, 255, 63, 255, 0, 187, 247, 255, 255, 0, 0, 252, 40, 255, 255, 7, 0, 255, 255, 247, 255, 223, 255, 0, 124, 255, 63, 0, 0, 255, 255, 127, 196, 5, 0, 0, 56, 255, 255, 60, 0, 126, 126, 126, 0, 127, 127, 255, 255, 63, 0, 255, 255, 255, 7, 0, 0, 15, 0, 255, 255, 127, 248, 255, 255, 255, 63, 255, 255, 255, 255, 255, 3, 127, 0, 248, 224, 255, 253, 127, 95, 219, 255, 255, 255, 0, 0, 248, 255, 255, 255, 252, 255, 0, 0, 255, 15, 0, 0, 223, 255, 192, 255, 255, 255, 252, 252, 252, 28, 255, 239, 255, 255, 127, 255, 255, 183, 255, 63, 255, 63, 255, 255, 1, 0, 15, 255, 62, 0, 255, 0, 255, 255, 63, 253, 255, 255, 255, 255, 191, 145, 255, 255, 55, 0, 255, 255, 255, 192, 111, 240, 239, 254, 31, 0, 0, 0, 63, 0, 0, 0, 255, 255, 71, 0, 30, 0, 0, 20, 255, 255, 251, 255, 255, 255, 159, 0, 127, 189, 255, 191, 255, 1, 255, 255, 159, 25, 129, 224, 
179, 0, 0, 0, 255, 255, 63, 127, 0, 0, 0, 63, 17, 0, 0, 0, 255, 255, 255, 227, 0, 0, 0, 128, 127, 0, 0, 0, 248, 255, 255, 224, 31, 0, 255, 255, 3, 0, 0, 0, 255, 7, 255, 31, 255, 1, 255, 67, 255, 255, 223, 255, 255, 255, 255, 223, 100, 222, 255, 235, 239, 255, 255, 255, 191, 231, 223, 223, 255, 255, 255, 123, 95, 252, 253, 255, 63, 255, 255, 255, 253, 255, 255, 247, 255, 253, 255, 255, 247, 15, 0, 0, 150, 254, 247, 10, 132, 234, 150, 170, 150, 247, 247, 94, 255, 251, 255, 15, 238, 251, 255, 15, 255, 3, 255, 255, }; /* Alphabetic: 2085 bytes. */ RE_UINT32 re_get_alphabetic(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_alphabetic_stage_1[f] << 5; f = code >> 11; code ^= f << 11; pos = (RE_UINT32)re_alphabetic_stage_2[pos + f] << 3; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_alphabetic_stage_3[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_alphabetic_stage_4[pos + f] << 5; pos += code; value = (re_alphabetic_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Lowercase. 
   Binary property: re_lowercase_stage_5 is a bit set with one bit per code
   point. re_get_lowercase() walks stages 1 to 4 using bits 16 and up, 15-12,
   11-8 and 7-5 of ch, adds the low 5 bits to form a bit position pos, and
   then tests bit (pos & 0x7) of byte re_lowercase_stage_5[pos >> 3].
*/ static RE_UINT8 re_lowercase_stage_1[] = { 0, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, }; static RE_UINT8 re_lowercase_stage_2[] = { 0, 1, 2, 3, 3, 3, 3, 3, 3, 3, 4, 3, 3, 3, 3, 5, 6, 7, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 8, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, }; static RE_UINT8 re_lowercase_stage_3[] = { 0, 1, 2, 3, 4, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 6, 6, 6, 6, 6, 6, 6, 6, 6, 8, 9, 10, 11, 12, 6, 6, 13, 6, 6, 6, 6, 6, 6, 6, 14, 15, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 16, 17, 6, 6, 6, 18, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 19, 6, 6, 6, 20, 6, 6, 6, 6, 21, 6, 6, 6, 6, 6, 6, 6, 22, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 23, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 24, 25, 26, 27, 6, 6, 6, 6, 6, 6, 6, 6, }; static RE_UINT8 re_lowercase_stage_4[] = { 0, 0, 0, 1, 0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 5, 13, 14, 15, 16, 17, 18, 19, 0, 0, 20, 21, 22, 23, 24, 25, 0, 26, 15, 5, 27, 5, 28, 5, 5, 29, 0, 30, 31, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 32, 15, 15, 15, 15, 15, 15, 0, 0, 5, 5, 5, 5, 33, 5, 5, 5, 34, 35, 36, 37, 35, 38, 39, 40, 0, 0, 0, 41, 42, 0, 0, 0, 43, 44, 45, 26, 46, 0, 0, 0, 0, 0, 0, 0, 0, 0, 26, 47, 0, 26, 48, 49, 5, 5, 5, 50, 15, 51, 0, 0, 0, 0, 0, 0, 0, 0, 5, 52, 53, 0, 0, 0, 0, 54, 5, 55, 56, 57, 0, 58, 0, 26, 59, 60, 15, 15, 0, 0, 61, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 62, 63, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 15, 64, 0, 0, 0, 0, 0, 0, 15, 0, 65, 66, 67, 31, 68, 69, 70, 71, 72, 73, 74, 75, 76, 65, 66, 77, 31, 68, 78, 63, 71, 79, 80, 81, 82, 78, 83, 26, 84, 71, 85, 0, }; static RE_UINT8 re_lowercase_stage_5[] = { 0, 0, 0, 0, 254, 255, 255, 7, 0, 4, 32, 4, 0, 0, 0, 128, 255, 255, 127, 255, 170, 170, 170, 170, 170, 170, 170, 85, 85, 171, 170, 170, 170, 170, 170, 212, 41, 49, 36, 78, 42, 45, 81, 230, 64, 82, 85, 181, 170, 170, 41, 170, 170, 170, 250, 147, 133, 170, 255, 255, 255, 255, 255, 255, 255, 255, 239, 255, 255, 255, 255, 1, 3, 0, 0, 0, 31, 0, 0, 0, 
32, 0, 0, 0, 0, 0, 138, 60, 0, 0, 1, 0, 0, 240, 255, 255, 255, 127, 227, 170, 170, 170, 47, 25, 0, 0, 255, 255, 2, 168, 170, 170, 84, 213, 170, 170, 170, 170, 0, 0, 254, 255, 255, 255, 255, 0, 0, 0, 0, 0, 0, 63, 170, 170, 234, 191, 255, 0, 63, 0, 255, 0, 255, 0, 63, 0, 255, 0, 255, 0, 255, 63, 255, 0, 223, 64, 220, 0, 207, 0, 255, 0, 220, 0, 0, 0, 2, 128, 0, 0, 255, 31, 0, 196, 8, 0, 0, 128, 16, 50, 192, 67, 0, 0, 16, 0, 0, 0, 255, 3, 0, 0, 255, 255, 255, 127, 98, 21, 218, 63, 26, 80, 8, 0, 191, 32, 0, 0, 170, 42, 0, 0, 170, 170, 170, 58, 168, 170, 171, 170, 170, 170, 255, 149, 170, 80, 186, 170, 170, 2, 160, 0, 0, 0, 0, 7, 255, 255, 255, 247, 63, 0, 255, 255, 127, 0, 248, 0, 0, 255, 255, 255, 255, 255, 0, 0, 255, 255, 7, 0, 0, 0, 0, 252, 255, 255, 15, 0, 0, 192, 223, 255, 252, 255, 255, 15, 0, 0, 192, 235, 239, 255, 0, 0, 0, 252, 255, 255, 15, 0, 0, 192, 255, 255, 255, 0, 0, 0, 252, 255, 255, 15, 0, 0, 192, 255, 255, 255, 0, 192, 255, 255, 0, 0, 192, 255, 63, 0, 0, 0, 252, 255, 255, 247, 3, 0, 0, 240, 255, 255, 223, 15, 255, 127, 63, 0, 255, 253, 0, 0, 247, 11, 0, 0, }; /* Lowercase: 777 bytes. */ RE_UINT32 re_get_lowercase(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_lowercase_stage_1[f] << 4; f = code >> 12; code ^= f << 12; pos = (RE_UINT32)re_lowercase_stage_2[pos + f] << 4; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_lowercase_stage_3[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_lowercase_stage_4[pos + f] << 5; pos += code; value = (re_lowercase_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Uppercase. 
*/ static RE_UINT8 re_uppercase_stage_1[] = { 0, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, }; static RE_UINT8 re_uppercase_stage_2[] = { 0, 1, 2, 3, 4, 5, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 6, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 7, 8, 9, 1, 10, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 11, 1, 1, 1, 12, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_uppercase_stage_3[] = { 0, 1, 2, 3, 4, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 6, 6, 8, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 9, 10, 6, 11, 6, 6, 12, 6, 6, 6, 6, 6, 6, 6, 13, 6, 6, 6, 6, 6, 6, 6, 6, 6, 14, 15, 6, 6, 6, 6, 6, 6, 6, 16, 6, 6, 6, 6, 17, 6, 6, 6, 6, 6, 6, 6, 18, 6, 6, 6, 19, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 20, 21, 22, 23, 6, 24, 6, 6, 6, 6, 6, 6, }; static RE_UINT8 re_uppercase_stage_4[] = { 0, 0, 1, 0, 0, 0, 2, 0, 3, 4, 5, 6, 7, 8, 9, 10, 3, 11, 12, 0, 0, 0, 0, 0, 0, 0, 0, 13, 14, 15, 16, 17, 18, 19, 0, 3, 20, 3, 21, 3, 3, 22, 23, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 18, 24, 0, 0, 0, 0, 0, 0, 18, 18, 25, 3, 3, 3, 3, 26, 3, 3, 3, 27, 28, 29, 30, 0, 31, 32, 33, 34, 35, 36, 19, 37, 0, 0, 0, 0, 0, 0, 0, 0, 38, 19, 0, 18, 39, 0, 40, 3, 3, 3, 41, 0, 0, 3, 42, 43, 0, 0, 0, 0, 44, 3, 45, 46, 47, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 18, 48, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 18, 49, 0, 0, 0, 0, 0, 0, 0, 18, 0, 0, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 50, 51, 52, 53, 63, 25, 56, 57, 53, 64, 65, 66, 67, 38, 39, 56, 68, 69, 0, 0, 56, 70, 70, 57, 0, 0, 0, }; static RE_UINT8 re_uppercase_stage_5[] = { 0, 0, 0, 0, 254, 255, 255, 7, 255, 255, 127, 127, 85, 85, 85, 85, 85, 85, 85, 170, 170, 84, 85, 85, 85, 85, 85, 43, 214, 206, 219, 177, 213, 210, 174, 17, 144, 164, 170, 74, 85, 85, 210, 85, 85, 85, 5, 108, 122, 85, 0, 0, 0, 0, 69, 128, 64, 215, 254, 255, 251, 15, 0, 0, 0, 128, 28, 85, 85, 85, 144, 230, 255, 255, 255, 255, 255, 255, 0, 0, 1, 84, 85, 85, 171, 42, 85, 85, 85, 85, 254, 255, 255, 255, 127, 0, 191, 
32, 0, 0, 255, 255, 63, 0, 85, 85, 21, 64, 0, 255, 0, 63, 0, 255, 0, 255, 0, 63, 0, 170, 0, 255, 0, 0, 0, 0, 0, 15, 0, 15, 0, 15, 0, 31, 0, 15, 132, 56, 39, 62, 80, 61, 15, 192, 32, 0, 0, 0, 8, 0, 0, 0, 0, 0, 192, 255, 255, 127, 0, 0, 157, 234, 37, 192, 5, 40, 4, 0, 85, 21, 0, 0, 85, 85, 85, 5, 84, 85, 84, 85, 85, 85, 0, 106, 85, 40, 69, 85, 85, 61, 95, 0, 255, 0, 0, 0, 255, 255, 7, 0, 255, 255, 255, 3, 0, 0, 240, 255, 255, 63, 0, 0, 0, 255, 255, 255, 3, 0, 0, 208, 100, 222, 63, 0, 0, 0, 255, 255, 255, 3, 0, 0, 176, 231, 223, 31, 0, 0, 0, 123, 95, 252, 1, 0, 0, 240, 255, 255, 63, 0, 0, 0, 3, 0, 0, 240, 1, 0, 0, 0, 252, 255, 255, 7, 0, 0, 0, 240, 255, 255, 31, 0, 255, 1, 0, 0, 0, 4, 0, 0, 255, 3, 255, 255, }; /* Uppercase: 701 bytes. */ RE_UINT32 re_get_uppercase(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_uppercase_stage_1[f] << 5; f = code >> 11; code ^= f << 11; pos = (RE_UINT32)re_uppercase_stage_2[pos + f] << 3; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_uppercase_stage_3[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_uppercase_stage_4[pos + f] << 5; pos += code; value = (re_uppercase_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Cased. 
*/ static RE_UINT8 re_cased_stage_1[] = { 0, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, }; static RE_UINT8 re_cased_stage_2[] = { 0, 1, 2, 3, 4, 5, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 6, 7, 1, 1, 1, 1, 1, 1, 1, 1, 1, 8, 9, 10, 1, 11, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 12, 1, 1, 1, 13, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_cased_stage_3[] = { 0, 1, 2, 3, 4, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 6, 6, 8, 6, 6, 6, 6, 6, 6, 6, 6, 6, 9, 10, 11, 12, 13, 6, 6, 14, 6, 6, 6, 6, 6, 6, 6, 15, 16, 6, 6, 6, 6, 6, 6, 6, 6, 17, 18, 6, 6, 6, 19, 6, 6, 6, 6, 6, 6, 6, 20, 6, 6, 6, 21, 6, 6, 6, 6, 22, 6, 6, 6, 6, 6, 6, 6, 23, 6, 6, 6, 24, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 25, 26, 27, 28, 6, 29, 6, 6, 6, 6, 6, 6, }; static RE_UINT8 re_cased_stage_4[] = { 0, 0, 1, 1, 0, 2, 3, 3, 4, 4, 4, 4, 4, 5, 6, 4, 4, 4, 4, 4, 7, 8, 9, 10, 0, 0, 11, 12, 13, 14, 4, 15, 4, 4, 4, 4, 16, 4, 4, 4, 4, 17, 18, 19, 20, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 21, 0, 0, 0, 0, 0, 0, 4, 4, 22, 4, 4, 4, 4, 4, 4, 0, 0, 4, 4, 4, 4, 4, 4, 4, 4, 22, 4, 23, 24, 4, 25, 26, 27, 0, 0, 0, 28, 29, 0, 0, 0, 30, 31, 32, 4, 33, 0, 0, 0, 0, 0, 0, 0, 0, 34, 4, 35, 4, 36, 37, 4, 4, 4, 4, 38, 4, 21, 0, 0, 0, 0, 0, 0, 0, 0, 4, 39, 24, 0, 0, 0, 0, 40, 4, 4, 41, 42, 0, 43, 0, 44, 5, 45, 4, 4, 0, 0, 46, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 4, 4, 47, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 48, 4, 48, 0, 0, 0, 0, 0, 4, 4, 0, 4, 4, 49, 4, 50, 51, 52, 4, 53, 54, 55, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 56, 57, 5, 49, 49, 36, 36, 58, 58, 59, 0, 0, 44, 60, 60, 35, 0, 0, 0, }; static RE_UINT8 re_cased_stage_5[] = { 0, 0, 0, 0, 254, 255, 255, 7, 0, 4, 32, 4, 255, 255, 127, 255, 255, 255, 255, 255, 255, 255, 255, 247, 240, 255, 255, 255, 255, 255, 239, 255, 255, 255, 255, 1, 3, 0, 0, 0, 31, 0, 0, 0, 32, 0, 0, 0, 0, 0, 207, 188, 64, 215, 255, 255, 251, 255, 255, 255, 255, 255, 191, 255, 3, 252, 255, 255, 255, 255, 254, 
255, 255, 255, 127, 0, 254, 255, 255, 255, 255, 0, 0, 0, 191, 32, 0, 0, 255, 255, 63, 63, 63, 63, 255, 170, 255, 255, 255, 63, 255, 255, 223, 95, 220, 31, 207, 15, 255, 31, 220, 31, 0, 0, 2, 128, 0, 0, 255, 31, 132, 252, 47, 62, 80, 189, 31, 242, 224, 67, 0, 0, 24, 0, 0, 0, 0, 0, 192, 255, 255, 3, 0, 0, 255, 127, 255, 255, 255, 255, 255, 127, 31, 120, 12, 0, 255, 63, 0, 0, 252, 255, 255, 255, 255, 120, 255, 255, 255, 63, 255, 0, 0, 0, 0, 7, 0, 0, 255, 255, 63, 0, 255, 255, 127, 0, 248, 0, 255, 255, 0, 0, 255, 255, 7, 0, 255, 255, 223, 255, 255, 255, 255, 223, 100, 222, 255, 235, 239, 255, 255, 255, 191, 231, 223, 223, 255, 255, 255, 123, 95, 252, 253, 255, 63, 255, 255, 255, 253, 255, 255, 247, 255, 253, 255, 255, 247, 15, 0, 0, 255, 3, 255, 255, }; /* Cased: 709 bytes. */ RE_UINT32 re_get_cased(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_cased_stage_1[f] << 5; f = code >> 11; code ^= f << 11; pos = (RE_UINT32)re_cased_stage_2[pos + f] << 3; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_cased_stage_3[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_cased_stage_4[pos + f] << 5; pos += code; value = (re_cased_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Case_Ignorable. 
*/ static RE_UINT8 re_case_ignorable_stage_1[] = { 0, 1, 2, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 4, 4, 4, 4, 4, }; static RE_UINT8 re_case_ignorable_stage_2[] = { 0, 1, 2, 3, 4, 5, 6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 8, 9, 7, 7, 7, 7, 7, 7, 7, 7, 7, 10, 11, 12, 13, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 14, 7, 7, 7, 7, 7, 7, 7, 7, 7, 15, 7, 7, 16, 17, 7, 18, 19, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 20, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, }; static RE_UINT8 re_case_ignorable_stage_3[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1, 1, 17, 1, 1, 1, 18, 19, 20, 21, 22, 23, 24, 1, 25, 26, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 27, 28, 29, 1, 30, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 31, 1, 1, 1, 32, 1, 33, 34, 35, 36, 37, 38, 1, 1, 1, 1, 1, 1, 1, 39, 1, 1, 40, 41, 1, 42, 43, 44, 1, 1, 1, 1, 1, 1, 45, 1, 1, 1, 1, 1, 46, 47, 48, 49, 50, 51, 52, 53, 1, 1, 54, 55, 1, 1, 1, 56, 1, 1, 1, 1, 57, 1, 1, 1, 1, 58, 59, 1, 1, 1, 1, 1, 1, 1, 60, 1, 1, 1, 1, 1, 61, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 62, 1, 1, 1, 1, 63, 64, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_case_ignorable_stage_4[] = { 0, 1, 2, 3, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5, 6, 6, 6, 6, 6, 7, 8, 0, 0, 0, 0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 10, 0, 11, 12, 13, 14, 15, 0, 16, 17, 0, 0, 18, 19, 20, 5, 21, 0, 0, 22, 0, 23, 24, 25, 26, 0, 0, 0, 0, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 33, 37, 38, 36, 33, 39, 35, 32, 40, 41, 35, 42, 0, 43, 0, 3, 44, 45, 35, 32, 40, 46, 35, 32, 0, 34, 35, 0, 0, 47, 0, 0, 48, 49, 0, 0, 50, 51, 0, 52, 53, 0, 54, 55, 56, 57, 0, 0, 58, 59, 60, 61, 0, 0, 33, 0, 0, 62, 0, 0, 0, 0, 0, 63, 63, 64, 64, 0, 65, 66, 0, 67, 0, 68, 0, 0, 69, 0, 0, 0, 70, 0, 0, 0, 0, 0, 0, 71, 0, 72, 73, 0, 74, 0, 0, 75, 76, 42, 77, 78, 79, 0, 80, 0, 81, 0, 82, 0, 0, 83, 84, 0, 85, 6, 86, 87, 6, 6, 88, 0, 0, 0, 0, 0, 89, 90, 91, 92, 93, 0, 94, 95, 0, 5, 96, 0, 0, 0, 97, 0, 0, 0, 98, 0, 0, 0, 99, 0, 0, 0, 6, 0, 100, 0, 0, 0, 0, 0, 0, 101, 
102, 0, 0, 103, 0, 0, 104, 105, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 82, 106, 0, 0, 107, 108, 0, 0, 109, 6, 78, 0, 17, 110, 0, 0, 52, 111, 112, 0, 0, 0, 0, 113, 114, 0, 115, 116, 0, 28, 117, 100, 112, 0, 118, 119, 120, 0, 121, 122, 123, 0, 0, 87, 0, 0, 0, 0, 124, 2, 0, 0, 0, 0, 125, 78, 0, 126, 127, 128, 0, 0, 0, 0, 129, 1, 2, 3, 17, 44, 0, 0, 130, 0, 0, 0, 0, 0, 0, 0, 131, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 132, 0, 0, 0, 0, 133, 134, 0, 0, 0, 0, 0, 112, 32, 135, 136, 129, 78, 137, 0, 0, 28, 138, 0, 139, 78, 140, 141, 0, 0, 142, 0, 0, 0, 0, 129, 143, 78, 33, 3, 144, 0, 0, 0, 0, 0, 0, 0, 0, 0, 145, 146, 0, 0, 0, 0, 0, 0, 147, 148, 0, 0, 149, 3, 0, 0, 150, 0, 0, 62, 151, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 152, 0, 153, 75, 0, 0, 0, 0, 0, 0, 0, 0, 0, 154, 0, 0, 0, 0, 0, 0, 0, 155, 75, 0, 0, 0, 0, 0, 156, 157, 158, 0, 0, 0, 0, 159, 0, 0, 0, 0, 0, 6, 160, 6, 161, 162, 163, 0, 0, 0, 0, 0, 0, 0, 0, 153, 0, 0, 0, 0, 0, 0, 0, 0, 87, 32, 6, 6, 6, 0, 0, 0, 0, 6, 6, 6, 6, 6, 6, 6, 127, }; static RE_UINT8 re_case_ignorable_stage_5[] = { 0, 0, 0, 0, 128, 64, 0, 4, 0, 0, 0, 64, 1, 0, 0, 0, 0, 161, 144, 1, 0, 0, 255, 255, 255, 255, 255, 255, 255, 255, 48, 4, 176, 0, 0, 0, 248, 3, 0, 0, 0, 0, 0, 2, 0, 0, 254, 255, 255, 255, 255, 191, 182, 0, 0, 0, 0, 0, 16, 0, 63, 0, 255, 23, 1, 248, 255, 255, 0, 0, 1, 0, 0, 0, 192, 191, 255, 61, 0, 0, 0, 128, 2, 0, 255, 7, 0, 0, 192, 255, 1, 0, 0, 248, 63, 4, 0, 0, 192, 255, 255, 63, 0, 0, 0, 0, 0, 14, 248, 255, 255, 255, 7, 0, 0, 0, 0, 0, 0, 20, 254, 33, 254, 0, 12, 0, 2, 0, 2, 0, 0, 0, 0, 0, 0, 16, 30, 32, 0, 0, 12, 0, 0, 0, 6, 0, 0, 0, 134, 57, 2, 0, 0, 0, 35, 0, 190, 33, 0, 0, 0, 0, 0, 144, 30, 32, 64, 0, 4, 0, 0, 0, 1, 32, 0, 0, 0, 0, 0, 192, 193, 61, 96, 0, 64, 48, 0, 0, 0, 4, 92, 0, 0, 0, 242, 7, 192, 127, 0, 0, 0, 0, 242, 27, 64, 63, 0, 0, 0, 0, 0, 3, 0, 0, 160, 2, 0, 0, 254, 127, 223, 224, 255, 254, 255, 255, 255, 31, 64, 0, 0, 0, 0, 224, 253, 102, 0, 0, 0, 195, 1, 0, 30, 0, 100, 32, 0, 32, 0, 0, 0, 224, 0, 0, 28, 0, 0, 0, 12, 0, 0, 
0, 176, 63, 64, 254, 143, 32, 0, 120, 0, 0, 8, 0, 0, 0, 0, 2, 0, 0, 135, 1, 4, 14, 0, 0, 128, 9, 0, 0, 64, 127, 229, 31, 248, 159, 128, 0, 255, 127, 15, 0, 0, 0, 0, 0, 208, 23, 0, 248, 15, 0, 3, 0, 0, 0, 60, 59, 0, 0, 64, 163, 3, 0, 0, 240, 207, 0, 0, 0, 0, 63, 0, 0, 247, 255, 253, 33, 16, 3, 0, 240, 255, 255, 255, 7, 0, 1, 0, 0, 0, 248, 255, 255, 63, 240, 0, 0, 0, 160, 3, 224, 0, 224, 0, 224, 0, 96, 0, 248, 0, 3, 144, 124, 0, 0, 223, 255, 2, 128, 0, 0, 255, 31, 255, 255, 1, 0, 0, 0, 0, 48, 0, 128, 3, 0, 0, 128, 0, 128, 0, 128, 0, 0, 32, 0, 0, 0, 0, 60, 62, 8, 0, 0, 0, 126, 0, 0, 0, 112, 0, 0, 32, 0, 0, 16, 0, 0, 0, 128, 247, 191, 0, 0, 0, 240, 0, 0, 3, 0, 0, 7, 0, 0, 68, 8, 0, 0, 96, 0, 0, 0, 16, 0, 0, 0, 255, 255, 3, 0, 192, 63, 0, 0, 128, 255, 3, 0, 0, 0, 200, 19, 0, 126, 102, 0, 8, 16, 0, 0, 0, 0, 1, 16, 0, 0, 157, 193, 2, 0, 0, 32, 0, 48, 88, 0, 32, 33, 0, 0, 0, 0, 252, 255, 255, 255, 8, 0, 255, 255, 0, 0, 0, 0, 36, 0, 0, 0, 0, 128, 8, 0, 0, 14, 0, 0, 0, 32, 0, 0, 192, 7, 110, 240, 0, 0, 0, 0, 0, 135, 0, 0, 0, 255, 127, 0, 0, 0, 0, 0, 120, 38, 128, 239, 31, 0, 0, 0, 8, 0, 0, 0, 192, 127, 0, 28, 0, 0, 0, 128, 211, 0, 248, 7, 0, 0, 192, 31, 31, 0, 0, 0, 248, 133, 13, 0, 0, 0, 0, 0, 60, 176, 1, 0, 0, 48, 0, 0, 248, 167, 0, 40, 191, 0, 188, 15, 0, 0, 0, 0, 31, 0, 0, 0, 127, 0, 0, 128, 255, 255, 0, 0, 0, 96, 128, 3, 248, 255, 231, 15, 0, 0, 0, 60, 0, 0, 28, 0, 0, 0, 255, 255, 127, 248, 255, 31, 32, 0, 16, 0, 0, 248, 254, 255, 0, 0, }; /* Case_Ignorable: 1474 bytes. 
Lookup: the code point is split into fields of 6, 4, 3, 3 and 5 bits, high to low. Stages 1 to 4 each map one field to a block in the next stage; the last 5 bits pick a bit in the 32-bit block of the stage 5 bitset.
*/ RE_UINT32 re_get_case_ignorable(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 15; code = ch ^ (f << 15); pos = (RE_UINT32)re_case_ignorable_stage_1[f] << 4; f = code >> 11; code ^= f << 11; pos = (RE_UINT32)re_case_ignorable_stage_2[pos + f] << 3; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_case_ignorable_stage_3[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_case_ignorable_stage_4[pos + f] << 5; pos += code; value = (re_case_ignorable_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Changes_When_Lowercased. */ static RE_UINT8 re_changes_when_lowercased_stage_1[] = { 0, 1, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, }; static RE_UINT8 re_changes_when_lowercased_stage_2[] = { 0, 1, 2, 3, 4, 5, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 6, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 7, 8, 9, 1, 10, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_changes_when_lowercased_stage_3[] = { 0, 1, 2, 3, 4, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 6, 6, 8, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 9, 10, 6, 11, 6, 6, 12, 6, 6, 6, 6, 6, 6, 6, 13, 6, 6, 6, 6, 6, 6, 6, 6, 6, 14, 15, 6, 6, 6, 6, 6, 6, 6, 16, 6, 6, 6, 6, 17, 6, 6, 6, 6, 6, 6, 6, 18, 6, 6, 6, 19, 6, 6, 6, 6, 6, 6, 6, }; static RE_UINT8 re_changes_when_lowercased_stage_4[] = { 0, 0, 1, 0, 0, 0, 2, 0, 3, 4, 5, 6, 7, 8, 9, 10, 3, 11, 12, 0, 0, 0, 0, 0, 0, 0, 0, 13, 14, 15, 16, 17, 18, 19, 0, 3, 20, 3, 21, 3, 3, 22, 23, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 18, 24, 0, 0, 0, 0, 0, 0, 18, 18, 25, 3, 3, 3, 3, 26, 3, 3, 3, 27, 28, 29, 30, 28, 31, 32, 33, 0, 34, 0, 19, 35, 0, 0, 0, 0, 0, 0, 0, 0, 36, 19, 0, 18, 37, 0, 38, 3, 3, 3, 39, 0, 0, 3, 40, 41, 0, 0, 0, 0, 42, 3, 43, 44, 45, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 18, 46, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 18, 47, 0, 0, 0, 0, 0, 0, 0, 18, 0, 0, }; static RE_UINT8 re_changes_when_lowercased_stage_5[] = { 0, 0, 
0, 0, 254, 255, 255, 7, 255, 255, 127, 127, 85, 85, 85, 85, 85, 85, 85, 170, 170, 84, 85, 85, 85, 85, 85, 43, 214, 206, 219, 177, 213, 210, 174, 17, 176, 173, 170, 74, 85, 85, 214, 85, 85, 85, 5, 108, 122, 85, 0, 0, 0, 0, 69, 128, 64, 215, 254, 255, 251, 15, 0, 0, 0, 128, 0, 85, 85, 85, 144, 230, 255, 255, 255, 255, 255, 255, 0, 0, 1, 84, 85, 85, 171, 42, 85, 85, 85, 85, 254, 255, 255, 255, 127, 0, 191, 32, 0, 0, 255, 255, 63, 0, 85, 85, 21, 64, 0, 255, 0, 63, 0, 255, 0, 255, 0, 63, 0, 170, 0, 255, 0, 0, 0, 255, 0, 31, 0, 31, 0, 15, 0, 31, 0, 31, 64, 12, 4, 0, 8, 0, 0, 0, 0, 0, 192, 255, 255, 127, 0, 0, 157, 234, 37, 192, 5, 40, 4, 0, 85, 21, 0, 0, 85, 85, 85, 5, 84, 85, 84, 85, 85, 85, 0, 106, 85, 40, 69, 85, 85, 61, 95, 0, 255, 0, 0, 0, 255, 255, 7, 0, }; /* Changes_When_Lowercased: 538 bytes. */ RE_UINT32 re_get_changes_when_lowercased(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 15; code = ch ^ (f << 15); pos = (RE_UINT32)re_changes_when_lowercased_stage_1[f] << 4; f = code >> 11; code ^= f << 11; pos = (RE_UINT32)re_changes_when_lowercased_stage_2[pos + f] << 3; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_changes_when_lowercased_stage_3[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_changes_when_lowercased_stage_4[pos + f] << 5; pos += code; value = (re_changes_when_lowercased_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Changes_When_Uppercased. 
*/ static RE_UINT8 re_changes_when_uppercased_stage_1[] = { 0, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, }; static RE_UINT8 re_changes_when_uppercased_stage_2[] = { 0, 1, 2, 3, 3, 3, 3, 3, 3, 3, 4, 3, 3, 3, 3, 5, 6, 7, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, }; static RE_UINT8 re_changes_when_uppercased_stage_3[] = { 0, 1, 2, 3, 4, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 6, 6, 6, 6, 6, 6, 6, 6, 6, 8, 9, 10, 6, 11, 6, 6, 12, 6, 6, 6, 6, 6, 6, 6, 13, 14, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 15, 16, 6, 6, 6, 17, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 18, 6, 6, 6, 19, 6, 6, 6, 6, 20, 6, 6, 6, 6, 6, 6, 6, 21, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 22, 6, 6, 6, 6, 6, 6, 6, }; static RE_UINT8 re_changes_when_uppercased_stage_4[] = { 0, 0, 0, 1, 0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 5, 13, 14, 15, 16, 0, 0, 0, 0, 0, 17, 18, 19, 20, 21, 22, 0, 23, 24, 5, 25, 5, 26, 5, 5, 27, 0, 28, 29, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 30, 0, 0, 0, 31, 0, 0, 0, 0, 5, 5, 5, 5, 32, 5, 5, 5, 33, 34, 35, 36, 24, 37, 38, 39, 0, 0, 40, 23, 41, 0, 0, 0, 0, 0, 0, 0, 0, 0, 23, 42, 0, 23, 43, 44, 5, 5, 5, 45, 24, 46, 0, 0, 0, 0, 0, 0, 0, 0, 5, 47, 48, 0, 0, 0, 0, 49, 5, 50, 51, 52, 0, 0, 0, 0, 53, 23, 24, 24, 0, 0, 54, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 55, 56, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 24, 57, 0, 0, 0, 0, 0, 0, 24, 0, }; static RE_UINT8 re_changes_when_uppercased_stage_5[] = { 0, 0, 0, 0, 254, 255, 255, 7, 0, 0, 32, 0, 0, 0, 0, 128, 255, 255, 127, 255, 170, 170, 170, 170, 170, 170, 170, 84, 85, 171, 170, 170, 170, 170, 170, 212, 41, 17, 36, 70, 42, 33, 81, 162, 96, 91, 85, 181, 170, 170, 45, 170, 168, 170, 10, 144, 133, 170, 223, 26, 107, 155, 38, 32, 137, 31, 4, 96, 32, 0, 0, 0, 0, 0, 138, 56, 0, 0, 1, 0, 0, 240, 255, 255, 255, 127, 227, 170, 170, 170, 47, 9, 0, 0, 255, 255, 255, 255, 255, 255, 2, 168, 170, 170, 84, 213, 170, 170, 170, 170, 0, 0, 254, 255, 255, 
255, 255, 0, 0, 0, 0, 0, 0, 63, 0, 0, 0, 34, 170, 170, 234, 15, 255, 0, 63, 0, 255, 0, 255, 0, 63, 0, 255, 0, 255, 0, 255, 63, 255, 255, 223, 80, 220, 16, 207, 0, 255, 0, 220, 16, 0, 64, 0, 0, 16, 0, 0, 0, 255, 3, 0, 0, 255, 255, 255, 127, 98, 21, 72, 0, 10, 80, 8, 0, 191, 32, 0, 0, 170, 42, 0, 0, 170, 170, 170, 10, 168, 170, 168, 170, 170, 170, 0, 148, 170, 16, 138, 170, 170, 2, 160, 0, 0, 0, 8, 0, 127, 0, 248, 0, 0, 255, 255, 255, 255, 255, 0, 0, 255, 255, 7, 0, }; /* Changes_When_Uppercased: 609 bytes. */ RE_UINT32 re_get_changes_when_uppercased(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_changes_when_uppercased_stage_1[f] << 4; f = code >> 12; code ^= f << 12; pos = (RE_UINT32)re_changes_when_uppercased_stage_2[pos + f] << 4; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_changes_when_uppercased_stage_3[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_changes_when_uppercased_stage_4[pos + f] << 5; pos += code; value = (re_changes_when_uppercased_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Changes_When_Titlecased. 
*/ static RE_UINT8 re_changes_when_titlecased_stage_1[] = { 0, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, }; static RE_UINT8 re_changes_when_titlecased_stage_2[] = { 0, 1, 2, 3, 3, 3, 3, 3, 3, 3, 4, 3, 3, 3, 3, 5, 6, 7, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, }; static RE_UINT8 re_changes_when_titlecased_stage_3[] = { 0, 1, 2, 3, 4, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 6, 6, 6, 6, 6, 6, 6, 6, 6, 8, 9, 10, 6, 11, 6, 6, 12, 6, 6, 6, 6, 6, 6, 6, 13, 14, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 15, 16, 6, 6, 6, 17, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 18, 6, 6, 6, 19, 6, 6, 6, 6, 20, 6, 6, 6, 6, 6, 6, 6, 21, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 22, 6, 6, 6, 6, 6, 6, 6, }; static RE_UINT8 re_changes_when_titlecased_stage_4[] = { 0, 0, 0, 1, 0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 5, 13, 14, 15, 16, 0, 0, 0, 0, 0, 17, 18, 19, 20, 21, 22, 0, 23, 24, 5, 25, 5, 26, 5, 5, 27, 0, 28, 29, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 30, 0, 0, 0, 31, 0, 0, 0, 0, 5, 5, 5, 5, 32, 5, 5, 5, 33, 34, 35, 36, 34, 37, 38, 39, 0, 0, 40, 23, 41, 0, 0, 0, 0, 0, 0, 0, 0, 0, 23, 42, 0, 23, 43, 44, 5, 5, 5, 45, 24, 46, 0, 0, 0, 0, 0, 0, 0, 0, 5, 47, 48, 0, 0, 0, 0, 49, 5, 50, 51, 52, 0, 0, 0, 0, 53, 23, 24, 24, 0, 0, 54, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 55, 56, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 24, 57, 0, 0, 0, 0, 0, 0, 24, 0, }; static RE_UINT8 re_changes_when_titlecased_stage_5[] = { 0, 0, 0, 0, 254, 255, 255, 7, 0, 0, 32, 0, 0, 0, 0, 128, 255, 255, 127, 255, 170, 170, 170, 170, 170, 170, 170, 84, 85, 171, 170, 170, 170, 170, 170, 212, 41, 17, 36, 70, 42, 33, 81, 162, 208, 86, 85, 181, 170, 170, 43, 170, 168, 170, 10, 144, 133, 170, 223, 26, 107, 155, 38, 32, 137, 31, 4, 96, 32, 0, 0, 0, 0, 0, 138, 56, 0, 0, 1, 0, 0, 240, 255, 255, 255, 127, 227, 170, 170, 170, 47, 9, 0, 0, 255, 255, 255, 255, 255, 255, 2, 168, 170, 170, 84, 213, 170, 170, 170, 170, 0, 0, 254, 255, 255, 
255, 255, 0, 0, 0, 0, 0, 0, 63, 0, 0, 0, 34, 170, 170, 234, 15, 255, 0, 63, 0, 255, 0, 255, 0, 63, 0, 255, 0, 255, 0, 255, 63, 255, 0, 223, 64, 220, 0, 207, 0, 255, 0, 220, 0, 0, 64, 0, 0, 16, 0, 0, 0, 255, 3, 0, 0, 255, 255, 255, 127, 98, 21, 72, 0, 10, 80, 8, 0, 191, 32, 0, 0, 170, 42, 0, 0, 170, 170, 170, 10, 168, 170, 168, 170, 170, 170, 0, 148, 170, 16, 138, 170, 170, 2, 160, 0, 0, 0, 8, 0, 127, 0, 248, 0, 0, 255, 255, 255, 255, 255, 0, 0, 255, 255, 7, 0, }; /* Changes_When_Titlecased: 609 bytes. */ RE_UINT32 re_get_changes_when_titlecased(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_changes_when_titlecased_stage_1[f] << 4; f = code >> 12; code ^= f << 12; pos = (RE_UINT32)re_changes_when_titlecased_stage_2[pos + f] << 4; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_changes_when_titlecased_stage_3[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_changes_when_titlecased_stage_4[pos + f] << 5; pos += code; value = (re_changes_when_titlecased_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Changes_When_Casefolded. 
*/ static RE_UINT8 re_changes_when_casefolded_stage_1[] = { 0, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, }; static RE_UINT8 re_changes_when_casefolded_stage_2[] = { 0, 1, 2, 3, 3, 3, 3, 3, 3, 3, 4, 3, 3, 3, 3, 5, 6, 7, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, }; static RE_UINT8 re_changes_when_casefolded_stage_3[] = { 0, 1, 2, 3, 4, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 6, 6, 8, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 9, 10, 6, 11, 6, 6, 12, 6, 6, 6, 6, 6, 6, 6, 13, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 14, 15, 6, 6, 6, 16, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 17, 6, 6, 6, 18, 6, 6, 6, 6, 19, 6, 6, 6, 6, 6, 6, 6, 20, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 21, 6, 6, 6, 6, 6, 6, 6, }; static RE_UINT8 re_changes_when_casefolded_stage_4[] = { 0, 0, 1, 0, 0, 2, 3, 0, 4, 5, 6, 7, 8, 9, 10, 11, 4, 12, 13, 0, 0, 0, 0, 0, 0, 0, 14, 15, 16, 17, 18, 19, 20, 21, 0, 4, 22, 4, 23, 4, 4, 24, 25, 0, 26, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 20, 27, 0, 0, 0, 0, 0, 0, 0, 0, 28, 4, 4, 4, 4, 29, 4, 4, 4, 30, 31, 32, 33, 20, 34, 35, 36, 0, 37, 0, 21, 38, 0, 0, 0, 0, 0, 0, 0, 0, 39, 21, 0, 20, 40, 0, 41, 4, 4, 4, 42, 0, 0, 4, 43, 44, 0, 0, 0, 0, 45, 4, 46, 47, 48, 0, 0, 0, 0, 0, 49, 20, 20, 0, 0, 50, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 20, 51, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 20, 52, 0, 0, 0, 0, 0, 0, 0, 20, 0, 0, }; static RE_UINT8 re_changes_when_casefolded_stage_5[] = { 0, 0, 0, 0, 254, 255, 255, 7, 0, 0, 32, 0, 255, 255, 127, 255, 85, 85, 85, 85, 85, 85, 85, 170, 170, 86, 85, 85, 85, 85, 85, 171, 214, 206, 219, 177, 213, 210, 174, 17, 176, 173, 170, 74, 85, 85, 214, 85, 85, 85, 5, 108, 122, 85, 0, 0, 32, 0, 0, 0, 0, 0, 69, 128, 64, 215, 254, 255, 251, 15, 0, 0, 4, 128, 99, 85, 85, 85, 179, 230, 255, 255, 255, 255, 255, 255, 0, 0, 1, 84, 85, 85, 171, 42, 85, 85, 85, 85, 254, 255, 255, 255, 127, 0, 128, 0, 0, 0, 191, 32, 0, 0, 0, 0, 0, 63, 85, 85, 21, 76, 0, 255, 0, 63, 0, 255, 0, 255, 0, 
63, 0, 170, 0, 255, 0, 0, 255, 255, 156, 31, 156, 31, 0, 15, 0, 31, 156, 31, 64, 12, 4, 0, 8, 0, 0, 0, 0, 0, 192, 255, 255, 127, 0, 0, 157, 234, 37, 192, 5, 40, 4, 0, 85, 21, 0, 0, 85, 85, 85, 5, 84, 85, 84, 85, 85, 85, 0, 106, 85, 40, 69, 85, 85, 61, 95, 0, 0, 0, 255, 255, 127, 0, 248, 0, 255, 0, 0, 0, 255, 255, 7, 0, }; /* Changes_When_Casefolded: 581 bytes. */ RE_UINT32 re_get_changes_when_casefolded(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_changes_when_casefolded_stage_1[f] << 4; f = code >> 12; code ^= f << 12; pos = (RE_UINT32)re_changes_when_casefolded_stage_2[pos + f] << 4; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_changes_when_casefolded_stage_3[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_changes_when_casefolded_stage_4[pos + f] << 5; pos += code; value = (re_changes_when_casefolded_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Changes_When_Casemapped. 
*/ static RE_UINT8 re_changes_when_casemapped_stage_1[] = { 0, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, }; static RE_UINT8 re_changes_when_casemapped_stage_2[] = { 0, 1, 2, 3, 3, 3, 3, 3, 3, 3, 4, 3, 3, 3, 3, 5, 6, 7, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, }; static RE_UINT8 re_changes_when_casemapped_stage_3[] = { 0, 1, 2, 3, 4, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 6, 6, 8, 6, 6, 6, 6, 6, 6, 6, 6, 6, 9, 10, 11, 6, 12, 6, 6, 13, 6, 6, 6, 6, 6, 6, 6, 14, 15, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 16, 17, 6, 6, 6, 18, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 19, 6, 6, 6, 20, 6, 6, 6, 6, 21, 6, 6, 6, 6, 6, 6, 6, 22, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 23, 6, 6, 6, 6, 6, 6, 6, }; static RE_UINT8 re_changes_when_casemapped_stage_4[] = { 0, 0, 1, 1, 0, 2, 3, 3, 4, 5, 4, 4, 6, 7, 8, 4, 4, 9, 10, 11, 12, 0, 0, 0, 0, 0, 13, 14, 15, 16, 17, 18, 4, 4, 4, 4, 19, 4, 4, 4, 4, 20, 21, 22, 23, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 24, 0, 0, 0, 0, 0, 0, 4, 4, 25, 0, 0, 0, 26, 0, 0, 0, 0, 4, 4, 4, 4, 27, 4, 4, 4, 25, 4, 28, 29, 4, 30, 31, 32, 0, 33, 34, 4, 35, 0, 0, 0, 0, 0, 0, 0, 0, 36, 4, 37, 4, 38, 39, 40, 4, 4, 4, 41, 4, 24, 0, 0, 0, 0, 0, 0, 0, 0, 4, 42, 43, 0, 0, 0, 0, 44, 4, 45, 46, 47, 0, 0, 0, 0, 48, 49, 4, 4, 0, 0, 50, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 4, 4, 51, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 52, 4, 52, 0, 0, 0, 0, 0, 4, 4, 0, }; static RE_UINT8 re_changes_when_casemapped_stage_5[] = { 0, 0, 0, 0, 254, 255, 255, 7, 0, 0, 32, 0, 255, 255, 127, 255, 255, 255, 255, 255, 255, 255, 255, 254, 255, 223, 255, 247, 255, 243, 255, 179, 240, 255, 255, 255, 253, 255, 15, 252, 255, 255, 223, 26, 107, 155, 38, 32, 137, 31, 4, 96, 32, 0, 0, 0, 0, 0, 207, 184, 64, 215, 255, 255, 251, 255, 255, 255, 255, 255, 227, 255, 255, 255, 191, 239, 3, 252, 255, 255, 255, 255, 254, 255, 255, 255, 127, 0, 254, 255, 255, 255, 255, 0, 0, 0, 191, 32, 0, 0, 255, 255, 63, 63, 0, 0, 0, 34, 255, 
255, 255, 79, 63, 63, 255, 170, 255, 255, 255, 63, 255, 255, 223, 95, 220, 31, 207, 15, 255, 31, 220, 31, 64, 12, 4, 0, 0, 64, 0, 0, 24, 0, 0, 0, 0, 0, 192, 255, 255, 3, 0, 0, 255, 127, 255, 255, 255, 255, 255, 127, 255, 255, 109, 192, 15, 120, 12, 0, 255, 63, 0, 0, 255, 255, 255, 15, 252, 255, 252, 255, 255, 255, 0, 254, 255, 56, 207, 255, 255, 63, 255, 0, 0, 0, 8, 0, 0, 0, 255, 255, 127, 0, 248, 0, 255, 255, 0, 0, 255, 255, 7, 0, }; /* Changes_When_Casemapped: 597 bytes. */ RE_UINT32 re_get_changes_when_casemapped(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_changes_when_casemapped_stage_1[f] << 4; f = code >> 12; code ^= f << 12; pos = (RE_UINT32)re_changes_when_casemapped_stage_2[pos + f] << 4; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_changes_when_casemapped_stage_3[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_changes_when_casemapped_stage_4[pos + f] << 5; pos += code; value = (re_changes_when_casemapped_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* ID_Start. 
*/ static RE_UINT8 re_id_start_stage_1[] = { 0, 1, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, }; static RE_UINT8 re_id_start_stage_2[] = { 0, 1, 2, 3, 4, 5, 6, 7, 7, 8, 7, 7, 7, 7, 7, 7, 7, 7, 7, 9, 10, 11, 7, 7, 7, 7, 12, 13, 13, 13, 13, 14, 15, 16, 17, 18, 19, 13, 20, 13, 21, 13, 13, 13, 13, 22, 13, 13, 13, 13, 13, 13, 13, 13, 23, 24, 13, 13, 25, 13, 13, 26, 13, 13, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 27, 7, 28, 29, 7, 30, 13, 13, 13, 13, 13, 31, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, }; static RE_UINT8 re_id_start_stage_3[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1, 17, 18, 19, 1, 20, 21, 22, 23, 24, 25, 26, 27, 1, 28, 29, 30, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 33, 31, 31, 34, 35, 31, 31, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 36, 1, 1, 1, 1, 1, 1, 1, 1, 1, 37, 1, 1, 1, 1, 38, 1, 39, 40, 41, 42, 43, 44, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 45, 31, 31, 31, 31, 31, 31, 31, 31, 31, 1, 46, 47, 1, 48, 49, 50, 51, 52, 53, 54, 55, 56, 1, 57, 58, 59, 60, 61, 62, 31, 31, 31, 63, 64, 65, 66, 67, 68, 69, 70, 71, 31, 72, 31, 31, 31, 31, 31, 1, 1, 1, 73, 74, 75, 31, 31, 1, 1, 1, 1, 76, 31, 31, 31, 31, 31, 31, 31, 1, 1, 77, 31, 1, 1, 78, 79, 31, 31, 31, 80, 81, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 82, 31, 31, 31, 31, 31, 31, 31, 83, 84, 85, 86, 87, 31, 31, 31, 31, 31, 88, 31, 1, 1, 1, 1, 1, 1, 89, 1, 1, 1, 1, 1, 1, 1, 1, 90, 91, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 92, 31, 1, 1, 93, 31, 31, 31, 31, 31, }; static RE_UINT8 re_id_start_stage_4[] = { 0, 0, 1, 1, 0, 2, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 6, 0, 0, 0, 7, 8, 9, 4, 10, 4, 4, 4, 4, 11, 4, 4, 4, 4, 12, 13, 14, 15, 0, 16, 17, 0, 4, 18, 19, 4, 4, 20, 21, 22, 23, 24, 4, 4, 25, 26, 27, 28, 29, 30, 0, 0, 31, 0, 0, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 45, 49, 50, 51, 52, 46, 0, 53, 54, 55, 56, 53, 57, 58, 59, 53, 60, 61, 
62, 63, 64, 65, 0, 14, 66, 65, 0, 67, 68, 69, 0, 70, 0, 71, 72, 73, 0, 0, 0, 4, 74, 75, 76, 77, 4, 78, 79, 4, 4, 80, 4, 81, 82, 83, 4, 84, 4, 85, 0, 23, 4, 4, 86, 14, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 87, 1, 4, 4, 88, 89, 90, 90, 91, 4, 92, 93, 0, 0, 4, 4, 94, 4, 95, 4, 96, 97, 0, 16, 98, 4, 99, 100, 0, 101, 4, 31, 0, 0, 102, 0, 0, 103, 92, 104, 0, 105, 106, 4, 107, 4, 108, 109, 110, 0, 0, 0, 111, 4, 4, 4, 4, 4, 4, 0, 0, 86, 4, 112, 110, 4, 113, 114, 115, 0, 0, 0, 116, 117, 0, 0, 0, 118, 119, 120, 4, 121, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 122, 97, 4, 4, 4, 4, 123, 4, 78, 4, 124, 101, 125, 125, 0, 126, 127, 14, 4, 128, 14, 4, 79, 103, 129, 4, 4, 130, 85, 0, 16, 4, 4, 4, 4, 4, 96, 0, 0, 4, 4, 4, 4, 4, 4, 96, 0, 4, 4, 4, 4, 72, 0, 16, 110, 131, 132, 4, 133, 110, 4, 4, 23, 134, 135, 4, 4, 136, 137, 0, 134, 138, 139, 4, 92, 135, 92, 0, 140, 26, 141, 65, 142, 32, 143, 144, 145, 4, 121, 146, 147, 4, 148, 149, 150, 151, 152, 79, 141, 4, 4, 4, 139, 4, 4, 4, 4, 4, 153, 154, 155, 4, 4, 4, 156, 4, 4, 157, 0, 158, 159, 160, 4, 4, 90, 161, 4, 4, 110, 16, 4, 162, 4, 15, 163, 0, 0, 0, 164, 4, 4, 4, 142, 0, 1, 1, 165, 4, 97, 166, 0, 167, 168, 169, 0, 4, 4, 4, 85, 0, 0, 4, 31, 0, 0, 0, 0, 0, 0, 0, 0, 142, 4, 170, 0, 4, 16, 171, 96, 110, 4, 172, 0, 4, 4, 4, 4, 110, 0, 0, 0, 4, 173, 4, 108, 0, 0, 0, 0, 4, 101, 96, 15, 0, 0, 0, 0, 174, 175, 96, 101, 97, 0, 0, 176, 96, 157, 0, 0, 4, 177, 0, 0, 178, 92, 0, 142, 142, 0, 71, 179, 4, 96, 96, 143, 90, 0, 0, 0, 4, 4, 121, 0, 4, 143, 4, 143, 105, 94, 0, 0, 105, 23, 16, 121, 105, 65, 16, 180, 105, 143, 181, 0, 182, 183, 0, 0, 184, 185, 97, 0, 48, 45, 186, 56, 0, 0, 0, 0, 0, 0, 0, 0, 4, 23, 187, 0, 0, 0, 0, 0, 4, 130, 188, 0, 4, 23, 189, 0, 4, 18, 0, 0, 157, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 4, 190, 0, 0, 0, 0, 0, 0, 4, 30, 4, 4, 4, 4, 157, 0, 0, 0, 4, 4, 4, 130, 4, 4, 4, 4, 4, 4, 108, 0, 0, 0, 0, 0, 4, 130, 0, 0, 0, 0, 0, 0, 4, 4, 65, 0, 0, 0, 0, 0, 4, 30, 97, 0, 0, 0, 16, 191, 4, 23, 108, 192, 23, 0, 0, 0, 4, 4, 193, 0, 161, 0, 0, 0, 56, 0, 
0, 0, 0, 0, 0, 0, 4, 4, 4, 194, 195, 0, 0, 0, 4, 4, 196, 4, 197, 198, 199, 4, 200, 201, 202, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 203, 204, 79, 196, 196, 122, 122, 205, 205, 146, 0, 4, 4, 4, 4, 4, 4, 179, 0, 199, 206, 207, 208, 209, 210, 0, 0, 4, 4, 4, 4, 4, 4, 101, 0, 4, 31, 4, 4, 4, 4, 4, 4, 110, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 56, 0, 0, 110, 0, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_id_start_stage_5[] = { 0, 0, 0, 0, 254, 255, 255, 7, 0, 4, 32, 4, 255, 255, 127, 255, 255, 255, 255, 255, 195, 255, 3, 0, 31, 80, 0, 0, 0, 0, 223, 188, 64, 215, 255, 255, 251, 255, 255, 255, 255, 255, 191, 255, 3, 252, 255, 255, 255, 255, 254, 255, 255, 255, 127, 2, 254, 255, 255, 255, 255, 0, 0, 0, 0, 0, 255, 255, 255, 7, 7, 0, 255, 7, 0, 0, 0, 192, 254, 255, 255, 255, 47, 0, 96, 192, 0, 156, 0, 0, 253, 255, 255, 255, 0, 0, 0, 224, 255, 255, 63, 0, 2, 0, 0, 252, 255, 255, 255, 7, 48, 4, 255, 255, 63, 4, 16, 1, 0, 0, 255, 255, 255, 1, 255, 255, 31, 0, 240, 255, 255, 255, 255, 255, 255, 35, 0, 0, 1, 255, 3, 0, 254, 255, 225, 159, 249, 255, 255, 253, 197, 35, 0, 64, 0, 176, 3, 0, 3, 0, 224, 135, 249, 255, 255, 253, 109, 3, 0, 0, 0, 94, 0, 0, 28, 0, 224, 191, 251, 255, 255, 253, 237, 35, 0, 0, 1, 0, 3, 0, 0, 2, 224, 159, 249, 255, 0, 0, 0, 176, 3, 0, 2, 0, 232, 199, 61, 214, 24, 199, 255, 3, 224, 223, 253, 255, 255, 253, 255, 35, 0, 0, 0, 7, 3, 0, 0, 0, 255, 253, 239, 35, 0, 0, 0, 64, 3, 0, 6, 0, 255, 255, 255, 39, 0, 64, 0, 128, 3, 0, 0, 252, 224, 255, 127, 252, 255, 255, 251, 47, 127, 0, 0, 0, 255, 255, 13, 0, 150, 37, 240, 254, 174, 236, 13, 32, 95, 0, 0, 240, 1, 0, 0, 0, 255, 254, 255, 255, 255, 31, 0, 0, 0, 31, 0, 0, 255, 7, 0, 128, 0, 0, 63, 60, 98, 192, 225, 255, 3, 64, 0, 0, 191, 32, 255, 255, 255, 255, 255, 247, 255, 61, 127, 61, 255, 61, 255, 255, 255, 255, 61, 127, 61, 255, 127, 255, 255, 255, 61, 255, 255, 255, 255, 7, 255, 255, 63, 63, 255, 159, 255, 255, 255, 199, 255, 1, 255, 223, 3, 0, 255, 255, 3, 0, 255, 223, 1, 0, 255, 255, 15, 0, 0, 0, 128, 16, 255, 255, 255, 0, 255, 
5, 255, 255, 255, 255, 63, 0, 255, 255, 255, 127, 255, 63, 31, 0, 255, 15, 255, 255, 255, 3, 0, 0, 255, 255, 127, 0, 128, 0, 0, 0, 224, 255, 255, 255, 224, 15, 0, 0, 248, 255, 255, 255, 1, 192, 0, 252, 63, 0, 0, 0, 15, 0, 0, 0, 0, 224, 0, 252, 255, 255, 255, 63, 0, 222, 99, 0, 63, 63, 255, 170, 255, 255, 223, 95, 220, 31, 207, 15, 255, 31, 220, 31, 0, 0, 2, 128, 0, 0, 255, 31, 132, 252, 47, 63, 80, 253, 255, 243, 224, 67, 0, 0, 255, 1, 0, 0, 255, 127, 255, 255, 31, 120, 12, 0, 255, 128, 0, 0, 127, 127, 127, 127, 224, 0, 0, 0, 254, 3, 62, 31, 255, 255, 127, 248, 255, 63, 254, 255, 255, 127, 0, 0, 255, 31, 255, 255, 0, 12, 0, 0, 255, 127, 0, 128, 0, 0, 128, 255, 252, 255, 255, 255, 255, 249, 255, 255, 255, 63, 255, 0, 187, 247, 255, 255, 7, 0, 0, 0, 0, 0, 252, 40, 63, 0, 255, 255, 255, 255, 255, 31, 255, 255, 7, 0, 0, 128, 0, 0, 223, 255, 0, 124, 247, 15, 0, 0, 255, 255, 127, 196, 255, 255, 98, 62, 5, 0, 0, 56, 255, 7, 28, 0, 126, 126, 126, 0, 127, 127, 255, 255, 15, 0, 255, 255, 127, 248, 255, 255, 255, 255, 255, 15, 255, 63, 255, 255, 255, 255, 255, 3, 127, 0, 248, 160, 255, 253, 127, 95, 219, 255, 255, 255, 0, 0, 248, 255, 255, 255, 252, 255, 0, 0, 255, 15, 0, 0, 223, 255, 192, 255, 255, 255, 252, 252, 252, 28, 255, 239, 255, 255, 127, 255, 255, 183, 255, 63, 255, 63, 255, 255, 1, 0, 255, 7, 255, 255, 15, 255, 62, 0, 255, 0, 255, 255, 63, 253, 255, 255, 255, 255, 191, 145, 255, 255, 55, 0, 255, 255, 255, 192, 1, 0, 239, 254, 31, 0, 0, 0, 255, 255, 71, 0, 30, 0, 0, 20, 255, 255, 251, 255, 255, 15, 0, 0, 127, 189, 255, 191, 255, 1, 255, 255, 0, 0, 1, 224, 176, 0, 0, 0, 0, 0, 0, 15, 16, 0, 0, 0, 0, 0, 0, 128, 255, 63, 0, 0, 248, 255, 255, 224, 31, 0, 1, 0, 255, 7, 255, 31, 255, 1, 255, 3, 255, 255, 223, 255, 255, 255, 255, 223, 100, 222, 255, 235, 239, 255, 255, 255, 191, 231, 223, 223, 255, 255, 255, 123, 95, 252, 253, 255, 63, 255, 255, 255, 253, 255, 255, 247, 255, 253, 255, 255, 150, 254, 247, 10, 132, 234, 150, 170, 150, 247, 247, 94, 255, 251, 255, 15, 238, 
251, 255, 15, }; /* ID_Start: 1997 bytes. */ RE_UINT32 re_get_id_start(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_id_start_stage_1[f] << 5; f = code >> 11; code ^= f << 11; pos = (RE_UINT32)re_id_start_stage_2[pos + f] << 3; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_id_start_stage_3[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_id_start_stage_4[pos + f] << 5; pos += code; value = (re_id_start_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* ID_Continue. */ static RE_UINT8 re_id_continue_stage_1[] = { 0, 1, 2, 3, 4, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 6, 6, 6, 6, 6, }; static RE_UINT8 re_id_continue_stage_2[] = { 0, 1, 2, 3, 4, 5, 6, 7, 7, 8, 7, 7, 7, 7, 7, 7, 7, 7, 7, 9, 10, 11, 7, 7, 7, 7, 12, 13, 13, 13, 13, 14, 15, 16, 17, 18, 19, 13, 20, 13, 21, 13, 13, 13, 13, 22, 13, 13, 13, 13, 13, 13, 13, 13, 23, 24, 13, 13, 25, 26, 13, 27, 13, 13, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 28, 7, 29, 30, 7, 31, 13, 13, 13, 13, 13, 32, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 33, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, }; static RE_UINT8 re_id_continue_stage_3[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1, 17, 18, 19, 1, 20, 21, 22, 23, 24, 25, 26, 27, 1, 28, 29, 30, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 33, 31, 31, 34, 35, 31, 31, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 36, 1, 1, 1, 1, 1, 1, 1, 1, 1, 37, 1, 1, 1, 1, 38, 1, 39, 40, 41, 42, 43, 44, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 45, 31, 31, 31, 31, 31, 31, 31, 31, 31, 1, 46, 47, 1, 48, 49, 50, 51, 52, 53, 54, 55, 56, 1, 57, 58, 59, 60, 61, 62, 31, 31, 31, 63, 64, 65, 66, 67, 68, 69, 70, 71, 31, 72, 31, 31, 31, 31, 31, 1, 1, 1, 73, 74, 75, 31, 31, 1, 1, 1, 1, 76, 31, 31, 31, 31, 31, 31, 31, 1, 1, 77, 31, 1, 1, 78, 79, 31, 31, 31, 80, 81, 31, 31, 31, 31, 31, 31, 31, 
31, 31, 31, 31, 82, 31, 31, 31, 31, 83, 84, 31, 85, 86, 87, 88, 31, 31, 89, 31, 31, 31, 31, 31, 90, 31, 31, 31, 31, 31, 91, 31, 1, 1, 1, 1, 1, 1, 92, 1, 1, 1, 1, 1, 1, 1, 1, 93, 94, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 95, 31, 1, 1, 96, 31, 31, 31, 31, 31, 31, 97, 31, 31, 31, 31, 31, 31, }; static RE_UINT8 re_id_continue_stage_4[] = { 0, 1, 2, 3, 0, 4, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 8, 6, 6, 6, 9, 10, 11, 6, 12, 6, 6, 6, 6, 13, 6, 6, 6, 6, 14, 15, 16, 17, 18, 19, 20, 21, 6, 6, 22, 6, 6, 23, 24, 25, 6, 26, 6, 6, 27, 6, 28, 6, 29, 30, 0, 0, 31, 0, 32, 6, 6, 6, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 43, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 57, 61, 62, 63, 64, 65, 66, 67, 16, 68, 69, 0, 70, 71, 72, 0, 73, 74, 75, 76, 77, 78, 79, 0, 6, 6, 80, 6, 81, 6, 82, 83, 6, 6, 84, 6, 85, 86, 87, 6, 88, 6, 61, 89, 90, 6, 6, 91, 16, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 92, 3, 6, 6, 93, 94, 31, 95, 96, 6, 6, 97, 98, 99, 6, 6, 100, 6, 101, 6, 102, 103, 104, 105, 106, 6, 107, 108, 0, 30, 6, 103, 109, 110, 111, 0, 0, 6, 6, 112, 113, 6, 6, 6, 95, 6, 100, 114, 81, 0, 0, 115, 116, 6, 6, 6, 6, 6, 6, 6, 117, 91, 6, 118, 81, 6, 119, 120, 121, 0, 122, 123, 124, 125, 0, 125, 126, 127, 128, 129, 6, 130, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 131, 103, 6, 6, 6, 6, 132, 6, 82, 6, 133, 134, 135, 135, 6, 136, 137, 16, 6, 138, 16, 6, 83, 139, 140, 6, 6, 141, 68, 0, 25, 6, 6, 6, 6, 6, 102, 0, 0, 6, 6, 6, 6, 6, 6, 102, 0, 6, 6, 6, 6, 142, 0, 25, 81, 143, 144, 6, 145, 6, 6, 6, 27, 146, 147, 6, 6, 148, 149, 0, 146, 6, 150, 6, 95, 6, 6, 151, 152, 6, 153, 95, 78, 6, 6, 154, 103, 6, 134, 155, 156, 6, 6, 157, 158, 159, 160, 83, 161, 6, 6, 6, 162, 6, 6, 6, 6, 6, 163, 164, 30, 6, 6, 6, 153, 6, 6, 165, 0, 166, 167, 168, 6, 6, 27, 169, 6, 6, 81, 25, 6, 170, 6, 150, 171, 90, 172, 173, 174, 6, 6, 6, 78, 1, 2, 3, 105, 6, 103, 175, 0, 176, 177, 178, 0, 6, 6, 6, 68, 0, 0, 6, 31, 0, 0, 0, 179, 0, 0, 0, 0, 78, 6, 180, 181, 6, 25, 101, 68, 81, 6, 182, 0, 6, 6, 6, 6, 81, 
98, 0, 0, 6, 183, 6, 184, 0, 0, 0, 0, 6, 134, 102, 150, 0, 0, 0, 0, 185, 186, 102, 134, 103, 0, 0, 187, 102, 165, 0, 0, 6, 188, 0, 0, 189, 190, 0, 78, 78, 0, 75, 191, 6, 102, 102, 192, 27, 0, 0, 0, 6, 6, 130, 0, 6, 192, 6, 192, 6, 6, 191, 193, 6, 68, 25, 194, 6, 195, 25, 196, 6, 6, 197, 0, 198, 100, 0, 0, 199, 200, 6, 201, 34, 43, 202, 203, 0, 0, 0, 0, 0, 0, 0, 0, 6, 6, 204, 0, 0, 0, 0, 0, 6, 205, 206, 0, 6, 6, 207, 0, 6, 100, 98, 0, 208, 112, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 6, 209, 0, 0, 0, 0, 0, 0, 6, 210, 6, 6, 6, 6, 165, 0, 0, 0, 6, 6, 6, 141, 6, 6, 6, 6, 6, 6, 184, 0, 0, 0, 0, 0, 6, 141, 0, 0, 0, 0, 0, 0, 6, 6, 191, 0, 0, 0, 0, 0, 6, 210, 103, 98, 0, 0, 25, 106, 6, 134, 211, 212, 90, 0, 0, 0, 6, 6, 213, 103, 214, 0, 0, 0, 215, 0, 0, 0, 0, 0, 0, 0, 6, 6, 6, 216, 217, 0, 0, 0, 0, 0, 0, 218, 219, 220, 0, 0, 0, 0, 221, 0, 0, 0, 0, 0, 6, 6, 195, 6, 222, 223, 224, 6, 225, 226, 227, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 228, 229, 83, 195, 195, 131, 131, 230, 230, 231, 6, 6, 232, 6, 233, 234, 235, 0, 0, 6, 6, 6, 6, 6, 6, 236, 0, 224, 237, 238, 239, 240, 241, 0, 0, 6, 6, 6, 6, 6, 6, 134, 0, 6, 31, 6, 6, 6, 6, 6, 6, 81, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 215, 0, 0, 81, 0, 0, 0, 0, 0, 0, 0, 6, 6, 6, 6, 6, 6, 6, 90, }; static RE_UINT8 re_id_continue_stage_5[] = { 0, 0, 0, 0, 0, 0, 255, 3, 254, 255, 255, 135, 254, 255, 255, 7, 0, 4, 160, 4, 255, 255, 127, 255, 255, 255, 255, 255, 195, 255, 3, 0, 31, 80, 0, 0, 255, 255, 223, 188, 192, 215, 255, 255, 251, 255, 255, 255, 255, 255, 191, 255, 251, 252, 255, 255, 255, 255, 254, 255, 255, 255, 127, 2, 254, 255, 255, 255, 255, 0, 254, 255, 255, 255, 255, 191, 182, 0, 255, 255, 255, 7, 7, 0, 0, 0, 255, 7, 255, 195, 255, 255, 255, 255, 239, 159, 255, 253, 255, 159, 0, 0, 255, 255, 255, 231, 255, 255, 255, 255, 3, 0, 255, 255, 63, 4, 255, 63, 0, 0, 255, 255, 255, 15, 255, 255, 31, 0, 248, 255, 255, 255, 207, 255, 254, 255, 239, 159, 249, 255, 255, 253, 197, 243, 159, 121, 128, 176, 207, 255, 3, 0, 238, 135, 249, 255, 255, 253, 109, 211, 
135, 57, 2, 94, 192, 255, 63, 0, 238, 191, 251, 255, 255, 253, 237, 243, 191, 59, 1, 0, 207, 255, 0, 2, 238, 159, 249, 255, 159, 57, 192, 176, 207, 255, 2, 0, 236, 199, 61, 214, 24, 199, 255, 195, 199, 61, 129, 0, 192, 255, 0, 0, 239, 223, 253, 255, 255, 253, 255, 227, 223, 61, 96, 7, 207, 255, 0, 0, 238, 223, 253, 255, 255, 253, 239, 243, 223, 61, 96, 64, 207, 255, 6, 0, 255, 255, 255, 231, 223, 125, 128, 128, 207, 255, 0, 252, 236, 255, 127, 252, 255, 255, 251, 47, 127, 132, 95, 255, 192, 255, 12, 0, 255, 255, 255, 7, 255, 127, 255, 3, 150, 37, 240, 254, 174, 236, 255, 59, 95, 63, 255, 243, 1, 0, 0, 3, 255, 3, 160, 194, 255, 254, 255, 255, 255, 31, 254, 255, 223, 255, 255, 254, 255, 255, 255, 31, 64, 0, 0, 0, 255, 3, 255, 255, 255, 255, 255, 63, 191, 32, 255, 255, 255, 255, 255, 247, 255, 61, 127, 61, 255, 61, 255, 255, 255, 255, 61, 127, 61, 255, 127, 255, 255, 255, 61, 255, 0, 254, 3, 0, 255, 255, 0, 0, 255, 255, 63, 63, 255, 159, 255, 255, 255, 199, 255, 1, 255, 223, 31, 0, 255, 255, 15, 0, 255, 223, 13, 0, 255, 255, 143, 48, 255, 3, 0, 0, 0, 56, 255, 3, 255, 255, 255, 0, 255, 7, 255, 255, 255, 255, 63, 0, 255, 255, 255, 127, 255, 15, 255, 15, 192, 255, 255, 255, 255, 63, 31, 0, 255, 15, 255, 255, 255, 3, 255, 7, 255, 255, 255, 159, 255, 3, 255, 3, 128, 0, 255, 63, 255, 15, 255, 3, 0, 248, 15, 0, 255, 227, 255, 255, 0, 0, 247, 255, 255, 255, 127, 3, 255, 255, 63, 240, 63, 63, 255, 170, 255, 255, 223, 95, 220, 31, 207, 15, 255, 31, 220, 31, 0, 0, 0, 128, 1, 0, 16, 0, 0, 0, 2, 128, 0, 0, 255, 31, 226, 255, 1, 0, 132, 252, 47, 63, 80, 253, 255, 243, 224, 67, 0, 0, 255, 1, 0, 0, 255, 127, 255, 255, 31, 248, 15, 0, 255, 128, 0, 128, 255, 255, 127, 0, 127, 127, 127, 127, 224, 0, 0, 0, 254, 255, 62, 31, 255, 255, 127, 254, 224, 255, 255, 255, 255, 63, 254, 255, 255, 127, 0, 0, 255, 31, 0, 0, 255, 31, 255, 255, 255, 15, 0, 0, 255, 255, 240, 191, 0, 0, 128, 255, 252, 255, 255, 255, 255, 249, 255, 255, 255, 63, 255, 0, 255, 0, 0, 0, 31, 0, 255, 3, 255, 255, 255, 40, 
255, 63, 255, 255, 1, 128, 255, 3, 255, 63, 255, 3, 255, 255, 127, 252, 7, 0, 0, 56, 255, 255, 124, 0, 126, 126, 126, 0, 127, 127, 255, 255, 63, 0, 255, 255, 255, 55, 255, 3, 15, 0, 255, 255, 127, 248, 255, 255, 255, 255, 255, 3, 127, 0, 248, 224, 255, 253, 127, 95, 219, 255, 255, 255, 0, 0, 248, 255, 255, 255, 252, 255, 0, 0, 255, 15, 255, 255, 24, 0, 0, 224, 0, 0, 0, 0, 223, 255, 252, 252, 252, 28, 255, 239, 255, 255, 127, 255, 255, 183, 255, 63, 255, 63, 0, 0, 0, 32, 255, 255, 1, 0, 1, 0, 0, 0, 15, 255, 62, 0, 255, 0, 255, 255, 15, 0, 0, 0, 63, 253, 255, 255, 255, 255, 191, 145, 255, 255, 55, 0, 255, 255, 255, 192, 111, 240, 239, 254, 255, 255, 15, 135, 127, 0, 0, 0, 255, 255, 7, 0, 192, 255, 0, 128, 255, 1, 255, 3, 255, 255, 223, 255, 255, 255, 79, 0, 31, 28, 255, 23, 255, 255, 251, 255, 127, 189, 255, 191, 255, 1, 255, 255, 255, 7, 255, 3, 159, 57, 129, 224, 207, 31, 31, 0, 191, 0, 255, 3, 255, 255, 63, 255, 1, 0, 0, 63, 17, 0, 255, 3, 255, 255, 255, 227, 255, 3, 0, 128, 255, 255, 255, 1, 15, 0, 255, 3, 248, 255, 255, 224, 31, 0, 255, 255, 0, 128, 255, 255, 3, 0, 0, 0, 255, 7, 255, 31, 255, 1, 255, 99, 224, 227, 7, 248, 231, 15, 0, 0, 0, 60, 0, 0, 28, 0, 0, 0, 255, 255, 255, 223, 100, 222, 255, 235, 239, 255, 255, 255, 191, 231, 223, 223, 255, 255, 255, 123, 95, 252, 253, 255, 63, 255, 255, 255, 253, 255, 255, 247, 255, 253, 255, 255, 247, 207, 255, 255, 255, 255, 127, 248, 255, 31, 32, 0, 16, 0, 0, 248, 254, 255, 0, 0, 31, 0, 127, 0, 150, 254, 247, 10, 132, 234, 150, 170, 150, 247, 247, 94, 255, 251, 255, 15, 238, 251, 255, 15, }; /* ID_Continue: 2186 bytes. 
*/ RE_UINT32 re_get_id_continue(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 15; code = ch ^ (f << 15); pos = (RE_UINT32)re_id_continue_stage_1[f] << 4; f = code >> 11; code ^= f << 11; pos = (RE_UINT32)re_id_continue_stage_2[pos + f] << 3; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_id_continue_stage_3[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_id_continue_stage_4[pos + f] << 5; pos += code; value = (re_id_continue_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* XID_Start. */ static RE_UINT8 re_xid_start_stage_1[] = { 0, 1, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, }; static RE_UINT8 re_xid_start_stage_2[] = { 0, 1, 2, 3, 4, 5, 6, 7, 7, 8, 7, 7, 7, 7, 7, 7, 7, 7, 7, 9, 10, 11, 7, 7, 7, 7, 12, 13, 13, 13, 13, 14, 15, 16, 17, 18, 19, 13, 20, 13, 21, 13, 13, 13, 13, 22, 13, 13, 13, 13, 13, 13, 13, 13, 23, 24, 13, 13, 25, 13, 13, 26, 13, 13, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 27, 7, 28, 29, 7, 30, 13, 13, 13, 13, 13, 31, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, }; static RE_UINT8 re_xid_start_stage_3[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1, 17, 18, 19, 1, 20, 21, 22, 23, 24, 25, 26, 27, 1, 28, 29, 30, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 33, 31, 31, 34, 35, 31, 31, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 36, 1, 1, 1, 1, 1, 1, 1, 1, 1, 37, 1, 1, 1, 1, 38, 1, 39, 40, 41, 42, 43, 44, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 45, 31, 31, 31, 31, 31, 31, 31, 31, 31, 1, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 1, 58, 59, 60, 61, 62, 63, 31, 31, 31, 64, 65, 66, 67, 68, 69, 70, 71, 72, 31, 73, 31, 31, 31, 31, 31, 1, 1, 1, 74, 75, 76, 31, 31, 1, 1, 1, 1, 77, 31, 31, 31, 31, 31, 31, 31, 1, 1, 78, 31, 1, 1, 79, 80, 31, 31, 31, 81, 82, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 83, 31, 31, 31, 31, 31, 31, 31, 84, 85, 86, 87, 88, 31, 31, 31, 
31, 31, 89, 31, 1, 1, 1, 1, 1, 1, 90, 1, 1, 1, 1, 1, 1, 1, 1, 91, 92, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 93, 31, 1, 1, 94, 31, 31, 31, 31, 31, }; static RE_UINT8 re_xid_start_stage_4[] = { 0, 0, 1, 1, 0, 2, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 6, 0, 0, 0, 7, 8, 9, 4, 10, 4, 4, 4, 4, 11, 4, 4, 4, 4, 12, 13, 14, 15, 0, 16, 17, 0, 4, 18, 19, 4, 4, 20, 21, 22, 23, 24, 4, 4, 25, 26, 27, 28, 29, 30, 0, 0, 31, 0, 0, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 45, 49, 50, 51, 52, 46, 0, 53, 54, 55, 56, 53, 57, 58, 59, 53, 60, 61, 62, 63, 64, 65, 0, 14, 66, 65, 0, 67, 68, 69, 0, 70, 0, 71, 72, 73, 0, 0, 0, 4, 74, 75, 76, 77, 4, 78, 79, 4, 4, 80, 4, 81, 82, 83, 4, 84, 4, 85, 0, 23, 4, 4, 86, 14, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 87, 1, 4, 4, 88, 89, 90, 90, 91, 4, 92, 93, 0, 0, 4, 4, 94, 4, 95, 4, 96, 97, 0, 16, 98, 4, 99, 100, 0, 101, 4, 31, 0, 0, 102, 0, 0, 103, 92, 104, 0, 105, 106, 4, 107, 4, 108, 109, 110, 0, 0, 0, 111, 4, 4, 4, 4, 4, 4, 0, 0, 86, 4, 112, 110, 4, 113, 114, 115, 0, 0, 0, 116, 117, 0, 0, 0, 118, 119, 120, 4, 121, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 122, 97, 4, 4, 4, 4, 123, 4, 78, 4, 124, 101, 125, 125, 0, 126, 127, 14, 4, 128, 14, 4, 79, 103, 129, 4, 4, 130, 85, 0, 16, 4, 4, 4, 4, 4, 96, 0, 0, 4, 4, 4, 4, 4, 4, 96, 0, 4, 4, 4, 4, 72, 0, 16, 110, 131, 132, 4, 133, 110, 4, 4, 23, 134, 135, 4, 4, 136, 137, 0, 134, 138, 139, 4, 92, 135, 92, 0, 140, 26, 141, 65, 142, 32, 143, 144, 145, 4, 121, 146, 147, 4, 148, 149, 150, 151, 152, 79, 141, 4, 4, 4, 139, 4, 4, 4, 4, 4, 153, 154, 155, 4, 4, 4, 156, 4, 4, 157, 0, 158, 159, 160, 4, 4, 90, 161, 4, 4, 4, 110, 32, 4, 4, 4, 4, 4, 110, 16, 4, 162, 4, 15, 163, 0, 0, 0, 164, 4, 4, 4, 142, 0, 1, 1, 165, 110, 97, 166, 0, 167, 168, 169, 0, 4, 4, 4, 85, 0, 0, 4, 31, 0, 0, 0, 0, 0, 0, 0, 0, 142, 4, 170, 0, 4, 16, 171, 96, 110, 4, 172, 0, 4, 4, 4, 4, 110, 0, 0, 0, 4, 173, 4, 108, 0, 0, 0, 0, 4, 101, 96, 15, 0, 0, 0, 0, 174, 175, 96, 101, 97, 0, 0, 176, 96, 157, 0, 0, 4, 177, 0, 0, 178, 
92, 0, 142, 142, 0, 71, 179, 4, 96, 96, 143, 90, 0, 0, 0, 4, 4, 121, 0, 4, 143, 4, 143, 105, 94, 0, 0, 105, 23, 16, 121, 105, 65, 16, 180, 105, 143, 181, 0, 182, 183, 0, 0, 184, 185, 97, 0, 48, 45, 186, 56, 0, 0, 0, 0, 0, 0, 0, 0, 4, 23, 187, 0, 0, 0, 0, 0, 4, 130, 188, 0, 4, 23, 189, 0, 4, 18, 0, 0, 157, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 4, 190, 0, 0, 0, 0, 0, 0, 4, 30, 4, 4, 4, 4, 157, 0, 0, 0, 4, 4, 4, 130, 4, 4, 4, 4, 4, 4, 108, 0, 0, 0, 0, 0, 4, 130, 0, 0, 0, 0, 0, 0, 4, 4, 65, 0, 0, 0, 0, 0, 4, 30, 97, 0, 0, 0, 16, 191, 4, 23, 108, 192, 23, 0, 0, 0, 4, 4, 193, 0, 161, 0, 0, 0, 56, 0, 0, 0, 0, 0, 0, 0, 4, 4, 4, 194, 195, 0, 0, 0, 4, 4, 196, 4, 197, 198, 199, 4, 200, 201, 202, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 203, 204, 79, 196, 196, 122, 122, 205, 205, 146, 0, 4, 4, 4, 4, 4, 4, 179, 0, 199, 206, 207, 208, 209, 210, 0, 0, 4, 4, 4, 4, 4, 4, 101, 0, 4, 31, 4, 4, 4, 4, 4, 4, 110, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 56, 0, 0, 110, 0, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_xid_start_stage_5[] = { 0, 0, 0, 0, 254, 255, 255, 7, 0, 4, 32, 4, 255, 255, 127, 255, 255, 255, 255, 255, 195, 255, 3, 0, 31, 80, 0, 0, 0, 0, 223, 184, 64, 215, 255, 255, 251, 255, 255, 255, 255, 255, 191, 255, 3, 252, 255, 255, 255, 255, 254, 255, 255, 255, 127, 2, 254, 255, 255, 255, 255, 0, 0, 0, 0, 0, 255, 255, 255, 7, 7, 0, 255, 7, 0, 0, 0, 192, 254, 255, 255, 255, 47, 0, 96, 192, 0, 156, 0, 0, 253, 255, 255, 255, 0, 0, 0, 224, 255, 255, 63, 0, 2, 0, 0, 252, 255, 255, 255, 7, 48, 4, 255, 255, 63, 4, 16, 1, 0, 0, 255, 255, 255, 1, 255, 255, 31, 0, 240, 255, 255, 255, 255, 255, 255, 35, 0, 0, 1, 255, 3, 0, 254, 255, 225, 159, 249, 255, 255, 253, 197, 35, 0, 64, 0, 176, 3, 0, 3, 0, 224, 135, 249, 255, 255, 253, 109, 3, 0, 0, 0, 94, 0, 0, 28, 0, 224, 191, 251, 255, 255, 253, 237, 35, 0, 0, 1, 0, 3, 0, 0, 2, 224, 159, 249, 255, 0, 0, 0, 176, 3, 0, 2, 0, 232, 199, 61, 214, 24, 199, 255, 3, 224, 223, 253, 255, 255, 253, 255, 35, 0, 0, 0, 7, 3, 0, 0, 0, 255, 253, 239, 35, 0, 0, 0, 64, 3, 0, 6, 0, 
255, 255, 255, 39, 0, 64, 0, 128, 3, 0, 0, 252, 224, 255, 127, 252, 255, 255, 251, 47, 127, 0, 0, 0, 255, 255, 5, 0, 150, 37, 240, 254, 174, 236, 5, 32, 95, 0, 0, 240, 1, 0, 0, 0, 255, 254, 255, 255, 255, 31, 0, 0, 0, 31, 0, 0, 255, 7, 0, 128, 0, 0, 63, 60, 98, 192, 225, 255, 3, 64, 0, 0, 191, 32, 255, 255, 255, 255, 255, 247, 255, 61, 127, 61, 255, 61, 255, 255, 255, 255, 61, 127, 61, 255, 127, 255, 255, 255, 61, 255, 255, 255, 255, 7, 255, 255, 63, 63, 255, 159, 255, 255, 255, 199, 255, 1, 255, 223, 3, 0, 255, 255, 3, 0, 255, 223, 1, 0, 255, 255, 15, 0, 0, 0, 128, 16, 255, 255, 255, 0, 255, 5, 255, 255, 255, 255, 63, 0, 255, 255, 255, 127, 255, 63, 31, 0, 255, 15, 255, 255, 255, 3, 0, 0, 255, 255, 127, 0, 128, 0, 0, 0, 224, 255, 255, 255, 224, 15, 0, 0, 248, 255, 255, 255, 1, 192, 0, 252, 63, 0, 0, 0, 15, 0, 0, 0, 0, 224, 0, 252, 255, 255, 255, 63, 0, 222, 99, 0, 63, 63, 255, 170, 255, 255, 223, 95, 220, 31, 207, 15, 255, 31, 220, 31, 0, 0, 2, 128, 0, 0, 255, 31, 132, 252, 47, 63, 80, 253, 255, 243, 224, 67, 0, 0, 255, 1, 0, 0, 255, 127, 255, 255, 31, 120, 12, 0, 255, 128, 0, 0, 127, 127, 127, 127, 224, 0, 0, 0, 254, 3, 62, 31, 255, 255, 127, 224, 255, 63, 254, 255, 255, 127, 0, 0, 255, 31, 255, 255, 0, 12, 0, 0, 255, 127, 0, 128, 0, 0, 128, 255, 252, 255, 255, 255, 255, 249, 255, 255, 255, 63, 255, 0, 187, 247, 255, 255, 7, 0, 0, 0, 0, 0, 252, 40, 63, 0, 255, 255, 255, 255, 255, 31, 255, 255, 7, 0, 0, 128, 0, 0, 223, 255, 0, 124, 247, 15, 0, 0, 255, 255, 127, 196, 255, 255, 98, 62, 5, 0, 0, 56, 255, 7, 28, 0, 126, 126, 126, 0, 127, 127, 255, 255, 15, 0, 255, 255, 127, 248, 255, 255, 255, 255, 255, 15, 255, 63, 255, 255, 255, 255, 255, 3, 127, 0, 248, 160, 255, 253, 127, 95, 219, 255, 255, 255, 0, 0, 248, 255, 255, 255, 252, 255, 0, 0, 255, 3, 0, 0, 138, 170, 192, 255, 255, 255, 252, 252, 252, 28, 255, 239, 255, 255, 127, 255, 255, 183, 255, 63, 255, 63, 255, 255, 1, 0, 255, 7, 255, 255, 15, 255, 62, 0, 255, 0, 255, 255, 63, 253, 255, 255, 255, 255, 191, 145, 
255, 255, 55, 0, 255, 255, 255, 192, 1, 0, 239, 254, 31, 0, 0, 0, 255, 255, 71, 0, 30, 0, 0, 20, 255, 255, 251, 255, 255, 15, 0, 0, 127, 189, 255, 191, 255, 1, 255, 255, 0, 0, 1, 224, 176, 0, 0, 0, 0, 0, 0, 15, 16, 0, 0, 0, 0, 0, 0, 128, 255, 63, 0, 0, 248, 255, 255, 224, 31, 0, 1, 0, 255, 7, 255, 31, 255, 1, 255, 3, 255, 255, 223, 255, 255, 255, 255, 223, 100, 222, 255, 235, 239, 255, 255, 255, 191, 231, 223, 223, 255, 255, 255, 123, 95, 252, 253, 255, 63, 255, 255, 255, 253, 255, 255, 247, 255, 253, 255, 255, 150, 254, 247, 10, 132, 234, 150, 170, 150, 247, 247, 94, 255, 251, 255, 15, 238, 251, 255, 15, }; /* XID_Start: 2005 bytes. */ RE_UINT32 re_get_xid_start(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_xid_start_stage_1[f] << 5; f = code >> 11; code ^= f << 11; pos = (RE_UINT32)re_xid_start_stage_2[pos + f] << 3; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_xid_start_stage_3[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_xid_start_stage_4[pos + f] << 5; pos += code; value = (re_xid_start_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* XID_Continue. 
*/ static RE_UINT8 re_xid_continue_stage_1[] = { 0, 1, 2, 3, 4, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 6, 6, 6, 6, 6, }; static RE_UINT8 re_xid_continue_stage_2[] = { 0, 1, 2, 3, 4, 5, 6, 7, 7, 8, 7, 7, 7, 7, 7, 7, 7, 7, 7, 9, 10, 11, 7, 7, 7, 7, 12, 13, 13, 13, 13, 14, 15, 16, 17, 18, 19, 13, 20, 13, 21, 13, 13, 13, 13, 22, 13, 13, 13, 13, 13, 13, 13, 13, 23, 24, 13, 13, 25, 26, 13, 27, 13, 13, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 28, 7, 29, 30, 7, 31, 13, 13, 13, 13, 13, 32, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 33, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, }; static RE_UINT8 re_xid_continue_stage_3[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1, 17, 18, 19, 1, 20, 21, 22, 23, 24, 25, 26, 27, 1, 28, 29, 30, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 33, 31, 31, 34, 35, 31, 31, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 36, 1, 1, 1, 1, 1, 1, 1, 1, 1, 37, 1, 1, 1, 1, 38, 1, 39, 40, 41, 42, 43, 44, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 45, 31, 31, 31, 31, 31, 31, 31, 31, 31, 1, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 1, 58, 59, 60, 61, 62, 63, 31, 31, 31, 64, 65, 66, 67, 68, 69, 70, 71, 72, 31, 73, 31, 31, 31, 31, 31, 1, 1, 1, 74, 75, 76, 31, 31, 1, 1, 1, 1, 77, 31, 31, 31, 31, 31, 31, 31, 1, 1, 78, 31, 1, 1, 79, 80, 31, 31, 31, 81, 82, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 83, 31, 31, 31, 31, 84, 85, 31, 86, 87, 88, 89, 31, 31, 90, 31, 31, 31, 31, 31, 91, 31, 31, 31, 31, 31, 92, 31, 1, 1, 1, 1, 1, 1, 93, 1, 1, 1, 1, 1, 1, 1, 1, 94, 95, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 96, 31, 1, 1, 97, 31, 31, 31, 31, 31, 31, 98, 31, 31, 31, 31, 31, 31, }; static RE_UINT8 re_xid_continue_stage_4[] = { 0, 1, 2, 3, 0, 4, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 8, 6, 6, 6, 9, 10, 11, 6, 12, 6, 6, 6, 6, 13, 6, 6, 6, 6, 14, 15, 16, 17, 18, 19, 20, 21, 6, 6, 22, 6, 6, 23, 24, 25, 6, 26, 6, 6, 27, 6, 28, 6, 29, 30, 0, 0, 31, 0, 32, 6, 6, 
6, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 43, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 57, 61, 62, 63, 64, 65, 66, 67, 16, 68, 69, 0, 70, 71, 72, 0, 73, 74, 75, 76, 77, 78, 79, 0, 6, 6, 80, 6, 81, 6, 82, 83, 6, 6, 84, 6, 85, 86, 87, 6, 88, 6, 61, 89, 90, 6, 6, 91, 16, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 92, 3, 6, 6, 93, 94, 31, 95, 96, 6, 6, 97, 98, 99, 6, 6, 100, 6, 101, 6, 102, 103, 104, 105, 106, 6, 107, 108, 0, 30, 6, 103, 109, 110, 111, 0, 0, 6, 6, 112, 113, 6, 6, 6, 95, 6, 100, 114, 81, 0, 0, 115, 116, 6, 6, 6, 6, 6, 6, 6, 117, 91, 6, 118, 81, 6, 119, 120, 121, 0, 122, 123, 124, 125, 0, 125, 126, 127, 128, 129, 6, 130, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 131, 103, 6, 6, 6, 6, 132, 6, 82, 6, 133, 134, 135, 135, 6, 136, 137, 16, 6, 138, 16, 6, 83, 139, 140, 6, 6, 141, 68, 0, 25, 6, 6, 6, 6, 6, 102, 0, 0, 6, 6, 6, 6, 6, 6, 102, 0, 6, 6, 6, 6, 142, 0, 25, 81, 143, 144, 6, 145, 6, 6, 6, 27, 146, 147, 6, 6, 148, 149, 0, 146, 6, 150, 6, 95, 6, 6, 151, 152, 6, 153, 95, 78, 6, 6, 154, 103, 6, 134, 155, 156, 6, 6, 157, 158, 159, 160, 83, 161, 6, 6, 6, 162, 6, 6, 6, 6, 6, 163, 164, 30, 6, 6, 6, 153, 6, 6, 165, 0, 166, 167, 168, 6, 6, 27, 169, 6, 6, 6, 81, 170, 6, 6, 6, 6, 6, 81, 25, 6, 171, 6, 150, 1, 90, 172, 173, 174, 6, 6, 6, 78, 1, 2, 3, 105, 6, 103, 175, 0, 176, 177, 178, 0, 6, 6, 6, 68, 0, 0, 6, 31, 0, 0, 0, 179, 0, 0, 0, 0, 78, 6, 180, 181, 6, 25, 101, 68, 81, 6, 182, 0, 6, 6, 6, 6, 81, 98, 0, 0, 6, 183, 6, 184, 0, 0, 0, 0, 6, 134, 102, 150, 0, 0, 0, 0, 185, 186, 102, 134, 103, 0, 0, 187, 102, 165, 0, 0, 6, 188, 0, 0, 189, 190, 0, 78, 78, 0, 75, 191, 6, 102, 102, 192, 27, 0, 0, 0, 6, 6, 130, 0, 6, 192, 6, 192, 6, 6, 191, 193, 6, 68, 25, 194, 6, 195, 25, 196, 6, 6, 197, 0, 198, 100, 0, 0, 199, 200, 6, 201, 34, 43, 202, 203, 0, 0, 0, 0, 0, 0, 0, 0, 6, 6, 204, 0, 0, 0, 0, 0, 6, 205, 206, 0, 6, 6, 207, 0, 6, 100, 98, 0, 208, 112, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 6, 209, 0, 0, 0, 0, 0, 0, 6, 210, 6, 6, 6, 6, 165, 0, 0, 0, 6, 6, 6, 141, 6, 
6, 6, 6, 6, 6, 184, 0, 0, 0, 0, 0, 6, 141, 0, 0, 0, 0, 0, 0, 6, 6, 191, 0, 0, 0, 0, 0, 6, 210, 103, 98, 0, 0, 25, 106, 6, 134, 211, 212, 90, 0, 0, 0, 6, 6, 213, 103, 214, 0, 0, 0, 215, 0, 0, 0, 0, 0, 0, 0, 6, 6, 6, 216, 217, 0, 0, 0, 0, 0, 0, 218, 219, 220, 0, 0, 0, 0, 221, 0, 0, 0, 0, 0, 6, 6, 195, 6, 222, 223, 224, 6, 225, 226, 227, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 228, 229, 83, 195, 195, 131, 131, 230, 230, 231, 6, 6, 232, 6, 233, 234, 235, 0, 0, 6, 6, 6, 6, 6, 6, 236, 0, 224, 237, 238, 239, 240, 241, 0, 0, 6, 6, 6, 6, 6, 6, 134, 0, 6, 31, 6, 6, 6, 6, 6, 6, 81, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 215, 0, 0, 81, 0, 0, 0, 0, 0, 0, 0, 6, 6, 6, 6, 6, 6, 6, 90, }; static RE_UINT8 re_xid_continue_stage_5[] = { 0, 0, 0, 0, 0, 0, 255, 3, 254, 255, 255, 135, 254, 255, 255, 7, 0, 4, 160, 4, 255, 255, 127, 255, 255, 255, 255, 255, 195, 255, 3, 0, 31, 80, 0, 0, 255, 255, 223, 184, 192, 215, 255, 255, 251, 255, 255, 255, 255, 255, 191, 255, 251, 252, 255, 255, 255, 255, 254, 255, 255, 255, 127, 2, 254, 255, 255, 255, 255, 0, 254, 255, 255, 255, 255, 191, 182, 0, 255, 255, 255, 7, 7, 0, 0, 0, 255, 7, 255, 195, 255, 255, 255, 255, 239, 159, 255, 253, 255, 159, 0, 0, 255, 255, 255, 231, 255, 255, 255, 255, 3, 0, 255, 255, 63, 4, 255, 63, 0, 0, 255, 255, 255, 15, 255, 255, 31, 0, 248, 255, 255, 255, 207, 255, 254, 255, 239, 159, 249, 255, 255, 253, 197, 243, 159, 121, 128, 176, 207, 255, 3, 0, 238, 135, 249, 255, 255, 253, 109, 211, 135, 57, 2, 94, 192, 255, 63, 0, 238, 191, 251, 255, 255, 253, 237, 243, 191, 59, 1, 0, 207, 255, 0, 2, 238, 159, 249, 255, 159, 57, 192, 176, 207, 255, 2, 0, 236, 199, 61, 214, 24, 199, 255, 195, 199, 61, 129, 0, 192, 255, 0, 0, 239, 223, 253, 255, 255, 253, 255, 227, 223, 61, 96, 7, 207, 255, 0, 0, 238, 223, 253, 255, 255, 253, 239, 243, 223, 61, 96, 64, 207, 255, 6, 0, 255, 255, 255, 231, 223, 125, 128, 128, 207, 255, 0, 252, 236, 255, 127, 252, 255, 255, 251, 47, 127, 132, 95, 255, 192, 255, 12, 0, 255, 255, 255, 7, 255, 127, 255, 3, 150, 37, 240, 
254, 174, 236, 255, 59, 95, 63, 255, 243, 1, 0, 0, 3, 255, 3, 160, 194, 255, 254, 255, 255, 255, 31, 254, 255, 223, 255, 255, 254, 255, 255, 255, 31, 64, 0, 0, 0, 255, 3, 255, 255, 255, 255, 255, 63, 191, 32, 255, 255, 255, 255, 255, 247, 255, 61, 127, 61, 255, 61, 255, 255, 255, 255, 61, 127, 61, 255, 127, 255, 255, 255, 61, 255, 0, 254, 3, 0, 255, 255, 0, 0, 255, 255, 63, 63, 255, 159, 255, 255, 255, 199, 255, 1, 255, 223, 31, 0, 255, 255, 15, 0, 255, 223, 13, 0, 255, 255, 143, 48, 255, 3, 0, 0, 0, 56, 255, 3, 255, 255, 255, 0, 255, 7, 255, 255, 255, 255, 63, 0, 255, 255, 255, 127, 255, 15, 255, 15, 192, 255, 255, 255, 255, 63, 31, 0, 255, 15, 255, 255, 255, 3, 255, 7, 255, 255, 255, 159, 255, 3, 255, 3, 128, 0, 255, 63, 255, 15, 255, 3, 0, 248, 15, 0, 255, 227, 255, 255, 0, 0, 247, 255, 255, 255, 127, 3, 255, 255, 63, 240, 63, 63, 255, 170, 255, 255, 223, 95, 220, 31, 207, 15, 255, 31, 220, 31, 0, 0, 0, 128, 1, 0, 16, 0, 0, 0, 2, 128, 0, 0, 255, 31, 226, 255, 1, 0, 132, 252, 47, 63, 80, 253, 255, 243, 224, 67, 0, 0, 255, 1, 0, 0, 255, 127, 255, 255, 31, 248, 15, 0, 255, 128, 0, 128, 255, 255, 127, 0, 127, 127, 127, 127, 224, 0, 0, 0, 254, 255, 62, 31, 255, 255, 127, 230, 224, 255, 255, 255, 255, 63, 254, 255, 255, 127, 0, 0, 255, 31, 0, 0, 255, 31, 255, 255, 255, 15, 0, 0, 255, 255, 240, 191, 0, 0, 128, 255, 252, 255, 255, 255, 255, 249, 255, 255, 255, 63, 255, 0, 255, 0, 0, 0, 31, 0, 255, 3, 255, 255, 255, 40, 255, 63, 255, 255, 1, 128, 255, 3, 255, 63, 255, 3, 255, 255, 127, 252, 7, 0, 0, 56, 255, 255, 124, 0, 126, 126, 126, 0, 127, 127, 255, 255, 63, 0, 255, 255, 255, 55, 255, 3, 15, 0, 255, 255, 127, 248, 255, 255, 255, 255, 255, 3, 127, 0, 248, 224, 255, 253, 127, 95, 219, 255, 255, 255, 0, 0, 248, 255, 240, 255, 255, 255, 255, 255, 252, 255, 255, 255, 24, 0, 0, 224, 0, 0, 0, 0, 138, 170, 252, 252, 252, 28, 255, 239, 255, 255, 127, 255, 255, 183, 255, 63, 255, 63, 0, 0, 0, 32, 255, 255, 1, 0, 1, 0, 0, 0, 15, 255, 62, 0, 255, 0, 255, 255, 15, 0, 0, 0, 63, 
253, 255, 255, 255, 255, 191, 145, 255, 255, 55, 0, 255, 255, 255, 192, 111, 240, 239, 254, 255, 255, 15, 135, 127, 0, 0, 0, 255, 255, 7, 0, 192, 255, 0, 128, 255, 1, 255, 3, 255, 255, 223, 255, 255, 255, 79, 0, 31, 28, 255, 23, 255, 255, 251, 255, 127, 189, 255, 191, 255, 1, 255, 255, 255, 7, 255, 3, 159, 57, 129, 224, 207, 31, 31, 0, 191, 0, 255, 3, 255, 255, 63, 255, 1, 0, 0, 63, 17, 0, 255, 3, 255, 255, 255, 227, 255, 3, 0, 128, 255, 255, 255, 1, 15, 0, 255, 3, 248, 255, 255, 224, 31, 0, 255, 255, 0, 128, 255, 255, 3, 0, 0, 0, 255, 7, 255, 31, 255, 1, 255, 99, 224, 227, 7, 248, 231, 15, 0, 0, 0, 60, 0, 0, 28, 0, 0, 0, 255, 255, 255, 223, 100, 222, 255, 235, 239, 255, 255, 255, 191, 231, 223, 223, 255, 255, 255, 123, 95, 252, 253, 255, 63, 255, 255, 255, 253, 255, 255, 247, 255, 253, 255, 255, 247, 207, 255, 255, 255, 255, 127, 248, 255, 31, 32, 0, 16, 0, 0, 248, 254, 255, 0, 0, 31, 0, 127, 0, 150, 254, 247, 10, 132, 234, 150, 170, 150, 247, 247, 94, 255, 251, 255, 15, 238, 251, 255, 15, }; /* XID_Continue: 2194 bytes. */ RE_UINT32 re_get_xid_continue(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 15; code = ch ^ (f << 15); pos = (RE_UINT32)re_xid_continue_stage_1[f] << 4; f = code >> 11; code ^= f << 11; pos = (RE_UINT32)re_xid_continue_stage_2[pos + f] << 3; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_xid_continue_stage_3[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_xid_continue_stage_4[pos + f] << 5; pos += code; value = (re_xid_continue_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Default_Ignorable_Code_Point. 
*/ static RE_UINT8 re_default_ignorable_code_point_stage_1[] = { 0, 1, 2, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 4, 2, 2, 2, 2, 2, }; static RE_UINT8 re_default_ignorable_code_point_stage_2[] = { 0, 1, 2, 3, 4, 1, 5, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 6, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 7, 1, 1, 8, 1, 1, 1, 1, 1, 9, 9, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_default_ignorable_code_point_stage_3[] = { 0, 1, 1, 2, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 4, 1, 1, 1, 1, 1, 5, 6, 1, 1, 1, 1, 1, 1, 1, 7, 1, 1, 1, 1, 1, 1, 1, 1, 8, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 9, 10, 1, 1, 1, 1, 11, 1, 1, 1, 1, 12, 1, 1, 1, 1, 1, 1, 13, 13, 13, 13, 13, 13, 13, 13, }; static RE_UINT8 re_default_ignorable_code_point_stage_4[] = { 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 8, 9, 0, 10, 0, 0, 0, 0, 0, 0, 0, 11, 0, 0, 0, 0, 10, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 5, 0, 12, 0, 0, 0, 0, 0, 13, 0, 0, 0, 0, 0, 14, 0, 0, 0, 0, 15, 15, 15, 15, 15, 15, 15, 15, }; static RE_UINT8 re_default_ignorable_code_point_stage_5[] = { 0, 0, 0, 0, 0, 32, 0, 0, 0, 128, 0, 0, 0, 0, 0, 16, 0, 0, 0, 128, 1, 0, 0, 0, 0, 0, 48, 0, 0, 120, 0, 0, 0, 248, 0, 0, 0, 124, 0, 0, 255, 255, 0, 0, 16, 0, 0, 0, 0, 0, 255, 1, 15, 0, 0, 0, 0, 0, 248, 7, 255, 255, 255, 255, }; /* Default_Ignorable_Code_Point: 370 bytes. 
*/ RE_UINT32 re_get_default_ignorable_code_point(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 15; code = ch ^ (f << 15); pos = (RE_UINT32)re_default_ignorable_code_point_stage_1[f] << 4; f = code >> 11; code ^= f << 11; pos = (RE_UINT32)re_default_ignorable_code_point_stage_2[pos + f] << 3; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_default_ignorable_code_point_stage_3[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_default_ignorable_code_point_stage_4[pos + f] << 5; pos += code; value = (re_default_ignorable_code_point_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Grapheme_Extend. */ static RE_UINT8 re_grapheme_extend_stage_1[] = { 0, 1, 2, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 4, 4, 4, 4, 4, }; static RE_UINT8 re_grapheme_extend_stage_2[] = { 0, 1, 2, 3, 4, 5, 6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 8, 9, 7, 7, 7, 7, 7, 7, 7, 7, 7, 10, 11, 12, 13, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 14, 7, 7, 7, 7, 7, 7, 7, 7, 7, 15, 7, 7, 16, 17, 7, 18, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 19, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, }; static RE_UINT8 re_grapheme_extend_stage_3[] = { 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 0, 0, 15, 0, 0, 0, 16, 17, 18, 19, 20, 21, 22, 0, 0, 23, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 24, 25, 0, 0, 26, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 27, 0, 28, 29, 30, 31, 0, 0, 0, 0, 0, 0, 0, 32, 0, 0, 33, 34, 0, 35, 36, 37, 0, 0, 0, 0, 0, 0, 38, 0, 0, 0, 0, 0, 39, 40, 41, 42, 43, 44, 45, 46, 0, 0, 47, 48, 0, 0, 0, 49, 0, 0, 0, 0, 50, 0, 0, 0, 0, 51, 52, 0, 0, 0, 0, 0, 0, 0, 53, 0, 0, 0, 0, 0, 54, 0, 0, 0, 0, 0, 0, 0, 0, 55, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_grapheme_extend_stage_4[] = { 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 4, 5, 6, 0, 7, 0, 8, 9, 0, 0, 10, 11, 12, 13, 14, 0, 0, 15, 0, 16, 17, 18, 19, 0, 0, 0, 0, 20, 21, 22, 23, 24, 
25, 26, 27, 24, 28, 29, 30, 31, 28, 29, 32, 24, 25, 33, 34, 24, 35, 36, 37, 0, 38, 39, 40, 24, 25, 41, 42, 24, 25, 36, 27, 24, 0, 0, 43, 0, 0, 44, 45, 0, 0, 46, 47, 0, 48, 49, 0, 50, 51, 52, 53, 0, 0, 54, 55, 56, 57, 0, 0, 0, 0, 0, 58, 0, 0, 0, 0, 0, 59, 59, 60, 60, 0, 61, 62, 0, 63, 0, 0, 0, 0, 64, 0, 0, 0, 65, 0, 0, 0, 0, 0, 0, 66, 0, 67, 68, 0, 69, 0, 0, 70, 71, 35, 16, 72, 73, 0, 74, 0, 75, 0, 0, 0, 0, 76, 77, 0, 0, 0, 0, 0, 0, 1, 78, 79, 0, 0, 0, 0, 0, 13, 80, 0, 0, 0, 0, 0, 0, 0, 81, 0, 0, 0, 82, 0, 0, 0, 1, 0, 83, 0, 0, 84, 0, 0, 0, 0, 0, 0, 85, 39, 0, 0, 86, 87, 88, 0, 0, 0, 0, 89, 90, 0, 91, 92, 0, 21, 93, 0, 94, 0, 95, 96, 29, 0, 97, 25, 98, 0, 0, 0, 0, 0, 0, 0, 99, 36, 0, 0, 0, 0, 0, 0, 0, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 39, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 100, 0, 0, 0, 0, 0, 0, 0, 38, 0, 0, 0, 101, 0, 0, 0, 0, 102, 103, 0, 0, 0, 0, 0, 88, 25, 104, 105, 82, 72, 106, 0, 0, 21, 107, 0, 108, 72, 109, 110, 0, 0, 111, 0, 0, 0, 0, 82, 112, 72, 26, 113, 114, 0, 0, 0, 0, 0, 0, 0, 0, 0, 115, 116, 0, 0, 0, 0, 0, 0, 117, 118, 0, 0, 119, 38, 0, 0, 120, 0, 0, 58, 121, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 122, 0, 123, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 124, 0, 0, 0, 0, 0, 0, 0, 125, 0, 0, 0, 0, 0, 0, 126, 127, 128, 0, 0, 0, 0, 129, 0, 0, 0, 0, 0, 1, 130, 1, 131, 132, 133, 0, 0, 0, 0, 0, 0, 0, 0, 123, 0, 1, 1, 1, 1, 1, 1, 1, 2, }; static RE_UINT8 re_grapheme_extend_stage_5[] = { 0, 0, 0, 0, 255, 255, 255, 255, 255, 255, 0, 0, 248, 3, 0, 0, 0, 0, 254, 255, 255, 255, 255, 191, 182, 0, 0, 0, 0, 0, 255, 7, 0, 248, 255, 255, 0, 0, 1, 0, 0, 0, 192, 159, 159, 61, 0, 0, 0, 0, 2, 0, 0, 0, 255, 255, 255, 7, 0, 0, 192, 255, 1, 0, 0, 248, 15, 0, 0, 0, 192, 251, 239, 62, 0, 0, 0, 0, 0, 14, 248, 255, 255, 255, 7, 0, 0, 0, 0, 0, 0, 20, 254, 33, 254, 0, 12, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 80, 30, 32, 128, 0, 6, 0, 0, 0, 0, 0, 0, 16, 134, 57, 2, 0, 0, 0, 35, 0, 190, 33, 0, 0, 0, 0, 0, 208, 30, 32, 192, 0, 4, 0, 0, 0, 0, 0, 0, 64, 1, 32, 128, 0, 1, 0, 0, 0, 0, 0, 0, 192, 193, 61, 96, 0, 0, 
0, 0, 144, 68, 48, 96, 0, 0, 132, 92, 128, 0, 0, 242, 7, 128, 127, 0, 0, 0, 0, 242, 27, 0, 63, 0, 0, 0, 0, 0, 3, 0, 0, 160, 2, 0, 0, 254, 127, 223, 224, 255, 254, 255, 255, 255, 31, 64, 0, 0, 0, 0, 224, 253, 102, 0, 0, 0, 195, 1, 0, 30, 0, 100, 32, 0, 32, 0, 0, 0, 224, 0, 0, 28, 0, 0, 0, 12, 0, 0, 0, 176, 63, 64, 254, 15, 32, 0, 56, 0, 0, 0, 2, 0, 0, 135, 1, 4, 14, 0, 0, 128, 9, 0, 0, 64, 127, 229, 31, 248, 159, 0, 0, 255, 127, 15, 0, 0, 0, 0, 0, 208, 23, 3, 0, 0, 0, 60, 59, 0, 0, 64, 163, 3, 0, 0, 240, 207, 0, 0, 0, 247, 255, 253, 33, 16, 3, 255, 255, 63, 240, 0, 48, 0, 0, 255, 255, 1, 0, 0, 128, 3, 0, 0, 0, 0, 128, 0, 252, 0, 0, 0, 0, 0, 6, 0, 128, 247, 63, 0, 0, 3, 0, 68, 8, 0, 0, 96, 0, 0, 0, 16, 0, 0, 0, 255, 255, 3, 0, 192, 63, 0, 0, 128, 255, 3, 0, 0, 0, 200, 19, 32, 0, 0, 0, 0, 126, 102, 0, 8, 16, 0, 0, 0, 0, 157, 193, 0, 48, 64, 0, 32, 33, 0, 0, 0, 0, 0, 32, 0, 0, 192, 7, 110, 240, 0, 0, 0, 0, 0, 135, 0, 0, 0, 255, 127, 0, 0, 0, 0, 0, 120, 6, 128, 239, 31, 0, 0, 0, 8, 0, 0, 0, 192, 127, 0, 28, 0, 0, 0, 128, 211, 0, 248, 7, 0, 0, 1, 0, 128, 0, 192, 31, 31, 0, 0, 0, 249, 165, 13, 0, 0, 0, 0, 128, 60, 176, 1, 0, 0, 48, 0, 0, 248, 167, 0, 40, 191, 0, 188, 15, 0, 0, 0, 0, 31, 0, 0, 0, 127, 0, 0, 128, 7, 0, 0, 0, 0, 96, 160, 195, 7, 248, 231, 15, 0, 0, 0, 60, 0, 0, 28, 0, 0, 0, 255, 255, 127, 248, 255, 31, 32, 0, 16, 0, 0, 248, 254, 255, 0, 0, }; /* Grapheme_Extend: 1274 bytes. */ RE_UINT32 re_get_grapheme_extend(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 15; code = ch ^ (f << 15); pos = (RE_UINT32)re_grapheme_extend_stage_1[f] << 4; f = code >> 11; code ^= f << 11; pos = (RE_UINT32)re_grapheme_extend_stage_2[pos + f] << 3; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_grapheme_extend_stage_3[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_grapheme_extend_stage_4[pos + f] << 5; pos += code; value = (re_grapheme_extend_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Grapheme_Base. 
*/ static RE_UINT8 re_grapheme_base_stage_1[] = { 0, 1, 2, 3, 4, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, }; static RE_UINT8 re_grapheme_base_stage_2[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 13, 13, 13, 13, 13, 14, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 15, 13, 16, 17, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 18, 19, 19, 19, 19, 19, 19, 19, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 19, 29, 30, 19, 19, 13, 31, 19, 19, 19, 32, 19, 19, 19, 19, 19, 19, 19, 19, 33, 34, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 35, 19, 19, 36, 19, 19, 19, 19, 37, 38, 39, 19, 19, 19, 40, 41, 42, 43, 44, 19, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 45, 13, 13, 13, 46, 47, 13, 13, 13, 13, 48, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 49, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, }; static RE_UINT8 re_grapheme_base_stage_3[] = { 0, 1, 2, 2, 2, 2, 3, 4, 2, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 2, 2, 30, 31, 32, 33, 2, 2, 2, 2, 2, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 2, 47, 2, 2, 48, 49, 50, 51, 2, 52, 2, 2, 2, 53, 54, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 55, 56, 57, 58, 59, 60, 61, 62, 2, 63, 64, 65, 66, 67, 68, 69, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 70, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 71, 2, 72, 2, 2, 73, 74, 2, 75, 76, 77, 78, 79, 80, 81, 82, 83, 2, 2, 2, 2, 2, 2, 2, 84, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 2, 2, 86, 87, 88, 89, 2, 2, 90, 91, 92, 93, 94, 95, 96, 53, 97, 98, 85, 99, 100, 101, 2, 102, 103, 85, 2, 2, 104, 85, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 85, 85, 115, 85, 85, 85, 116, 117, 118, 119, 120, 121, 122, 85, 85, 123, 85, 124, 125, 126, 127, 85, 85, 128, 85, 85, 85, 
129, 85, 85, 2, 2, 2, 2, 2, 2, 2, 130, 131, 2, 132, 85, 85, 85, 85, 85, 133, 85, 85, 85, 85, 85, 85, 85, 2, 2, 2, 2, 134, 85, 85, 85, 2, 2, 2, 2, 135, 136, 137, 138, 85, 85, 85, 85, 85, 85, 139, 140, 141, 85, 85, 85, 85, 85, 85, 85, 142, 143, 85, 85, 85, 85, 85, 85, 2, 144, 145, 146, 147, 85, 148, 85, 149, 150, 151, 2, 2, 152, 2, 153, 2, 2, 2, 2, 154, 155, 85, 85, 2, 156, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 157, 158, 85, 85, 159, 160, 161, 162, 163, 85, 2, 2, 2, 2, 164, 165, 2, 166, 167, 168, 169, 170, 171, 172, 85, 85, 85, 85, 2, 2, 2, 2, 2, 173, 2, 2, 2, 2, 2, 2, 2, 2, 174, 2, 175, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 176, 85, 85, 2, 2, 2, 2, 177, 85, 85, 85, }; static RE_UINT8 re_grapheme_base_stage_4[] = { 0, 0, 1, 1, 1, 1, 1, 2, 0, 0, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 4, 5, 1, 6, 1, 1, 1, 1, 1, 7, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 8, 1, 9, 8, 1, 10, 0, 0, 11, 12, 1, 13, 14, 15, 16, 1, 1, 13, 0, 1, 8, 1, 1, 1, 1, 1, 17, 18, 1, 19, 20, 1, 0, 21, 1, 1, 1, 1, 1, 22, 23, 1, 1, 13, 24, 1, 25, 26, 2, 1, 27, 0, 0, 0, 0, 1, 14, 0, 0, 0, 0, 28, 1, 1, 29, 30, 31, 32, 1, 33, 34, 35, 36, 37, 38, 39, 40, 41, 34, 35, 42, 43, 44, 15, 45, 46, 6, 35, 47, 48, 43, 39, 49, 50, 34, 35, 51, 52, 38, 39, 53, 54, 55, 56, 57, 58, 43, 15, 13, 59, 20, 35, 60, 61, 62, 39, 63, 64, 20, 35, 65, 66, 11, 39, 67, 64, 20, 1, 68, 69, 70, 39, 71, 72, 73, 1, 74, 75, 76, 15, 45, 8, 1, 1, 77, 78, 40, 0, 0, 79, 80, 81, 82, 83, 84, 0, 0, 1, 4, 1, 85, 86, 1, 87, 70, 88, 0, 0, 89, 90, 13, 0, 0, 1, 1, 87, 91, 1, 92, 8, 93, 94, 3, 1, 1, 95, 1, 1, 1, 1, 1, 1, 1, 96, 97, 1, 1, 96, 1, 1, 98, 99, 100, 1, 1, 1, 99, 1, 1, 1, 13, 1, 87, 1, 101, 1, 1, 1, 1, 1, 102, 1, 87, 1, 1, 1, 1, 1, 103, 3, 104, 1, 105, 1, 104, 3, 43, 1, 1, 1, 106, 107, 108, 101, 101, 13, 101, 1, 1, 1, 1, 1, 53, 1, 1, 109, 1, 1, 1, 1, 22, 1, 2, 110, 111, 112, 1, 19, 14, 1, 1, 40, 1, 101, 113, 1, 1, 1, 114, 1, 1, 1, 115, 116, 117, 101, 101, 19, 0, 0, 0, 0, 0, 118, 1, 1, 119, 120, 1, 13, 108, 121, 1, 122, 1, 1, 1, 123, 
124, 1, 1, 40, 125, 126, 1, 1, 1, 0, 0, 0, 0, 53, 127, 128, 129, 1, 1, 1, 1, 0, 0, 0, 0, 1, 102, 1, 1, 102, 130, 1, 19, 1, 1, 1, 131, 131, 132, 1, 133, 13, 1, 134, 1, 1, 1, 0, 32, 2, 87, 1, 2, 0, 0, 0, 0, 40, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 13, 1, 1, 75, 0, 13, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 135, 1, 136, 1, 126, 35, 104, 137, 0, 1, 1, 2, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 138, 1, 1, 95, 1, 1, 1, 134, 43, 1, 75, 139, 139, 139, 139, 0, 0, 1, 1, 1, 1, 117, 0, 0, 0, 1, 140, 1, 1, 1, 1, 1, 141, 1, 1, 1, 1, 1, 22, 0, 40, 1, 1, 101, 1, 8, 1, 1, 1, 1, 142, 1, 1, 1, 1, 1, 1, 143, 1, 19, 8, 1, 1, 1, 1, 2, 1, 1, 13, 1, 1, 141, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 22, 1, 1, 1, 1, 1, 1, 1, 1, 1, 22, 0, 0, 87, 1, 1, 1, 75, 1, 1, 1, 1, 1, 40, 0, 1, 1, 2, 144, 1, 19, 1, 1, 1, 1, 1, 145, 1, 1, 19, 53, 0, 0, 0, 146, 147, 1, 148, 101, 1, 1, 1, 53, 1, 1, 1, 1, 149, 101, 0, 150, 1, 1, 151, 1, 75, 152, 1, 87, 28, 1, 1, 153, 154, 155, 131, 2, 1, 1, 156, 157, 158, 84, 1, 159, 1, 1, 1, 160, 161, 162, 163, 22, 164, 165, 139, 1, 1, 1, 22, 1, 1, 1, 1, 1, 1, 1, 166, 101, 1, 1, 141, 1, 142, 1, 1, 40, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 19, 1, 1, 1, 1, 1, 1, 101, 0, 0, 75, 167, 1, 168, 169, 1, 1, 1, 1, 1, 1, 1, 104, 28, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 121, 1, 1, 53, 0, 0, 19, 0, 101, 0, 1, 1, 170, 171, 131, 1, 1, 1, 1, 1, 1, 1, 87, 8, 1, 1, 1, 1, 1, 1, 1, 1, 19, 1, 2, 172, 173, 139, 174, 159, 1, 100, 175, 19, 19, 0, 0, 176, 1, 1, 177, 1, 1, 1, 1, 87, 40, 43, 0, 0, 1, 1, 87, 1, 87, 1, 1, 1, 43, 8, 40, 1, 1, 141, 1, 13, 1, 1, 22, 1, 154, 1, 1, 178, 22, 0, 0, 1, 19, 101, 0, 0, 0, 0, 0, 1, 1, 53, 1, 1, 1, 179, 0, 1, 1, 1, 75, 1, 22, 53, 0, 180, 1, 1, 181, 1, 182, 1, 1, 1, 2, 146, 0, 0, 0, 1, 183, 1, 184, 1, 57, 0, 0, 0, 0, 1, 1, 1, 185, 1, 121, 1, 1, 43, 186, 1, 141, 53, 103, 1, 1, 1, 1, 0, 0, 1, 1, 187, 75, 1, 1, 1, 71, 1, 136, 1, 188, 1, 189, 190, 0, 0, 0, 0, 0, 1, 1, 1, 1, 103, 0, 0, 0, 1, 1, 1, 117, 1, 1, 1, 7, 0, 0, 0, 0, 0, 0, 1, 2, 20, 1, 1, 53, 191, 
121, 1, 0, 121, 1, 1, 192, 104, 1, 103, 101, 28, 1, 193, 15, 141, 1, 1, 194, 121, 1, 1, 195, 60, 1, 8, 14, 1, 6, 2, 196, 0, 0, 0, 0, 197, 154, 101, 1, 1, 2, 117, 101, 50, 34, 35, 198, 199, 200, 141, 0, 1, 1, 1, 201, 202, 101, 0, 0, 1, 1, 2, 203, 8, 40, 0, 0, 1, 1, 1, 204, 61, 101, 0, 0, 1, 1, 205, 206, 101, 0, 0, 0, 1, 101, 207, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 208, 0, 0, 0, 0, 1, 1, 1, 103, 1, 101, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 14, 1, 1, 1, 1, 141, 0, 0, 0, 1, 1, 2, 0, 0, 0, 0, 0, 1, 1, 1, 1, 75, 0, 0, 0, 1, 1, 1, 103, 1, 2, 155, 0, 0, 0, 0, 0, 0, 1, 19, 209, 1, 1, 1, 146, 22, 140, 6, 210, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 14, 1, 1, 2, 0, 28, 0, 0, 0, 0, 0, 0, 104, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 13, 87, 103, 211, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 22, 1, 1, 9, 1, 1, 1, 212, 0, 213, 1, 155, 1, 1, 1, 103, 0, 1, 1, 1, 1, 214, 0, 0, 0, 1, 1, 1, 1, 1, 75, 1, 104, 1, 1, 1, 1, 1, 131, 1, 1, 1, 3, 215, 29, 216, 1, 1, 1, 217, 218, 1, 219, 220, 20, 1, 1, 1, 1, 136, 1, 1, 1, 1, 1, 1, 1, 1, 1, 163, 1, 1, 1, 0, 0, 0, 221, 0, 0, 21, 131, 222, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 223, 0, 0, 0, 216, 1, 224, 225, 226, 227, 228, 229, 140, 40, 230, 40, 0, 0, 0, 104, 1, 1, 40, 1, 1, 1, 1, 1, 1, 141, 2, 8, 8, 8, 1, 22, 87, 1, 2, 1, 1, 1, 40, 1, 1, 13, 0, 0, 0, 0, 15, 1, 117, 1, 1, 13, 103, 104, 0, 0, 1, 1, 1, 1, 1, 1, 1, 140, 1, 1, 216, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 43, 87, 141, 1, 1, 1, 1, 1, 1, 1, 141, 1, 1, 1, 1, 1, 14, 0, 0, 40, 1, 1, 1, 53, 101, 1, 1, 53, 1, 19, 0, 0, 0, 0, 0, 0, 103, 0, 0, 0, 0, 0, 0, 14, 0, 0, 0, 43, 0, 0, 0, 1, 1, 1, 1, 1, 75, 0, 0, 1, 1, 1, 14, 1, 1, 1, 1, 1, 19, 1, 1, 1, 1, 1, 1, 1, 1, 104, 0, 0, 0, 0, 0, 1, 19, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_grapheme_base_stage_5[] = { 0, 0, 255, 255, 255, 127, 255, 223, 255, 252, 240, 215, 251, 255, 7, 252, 254, 255, 127, 254, 255, 230, 0, 64, 73, 0, 255, 7, 31, 0, 192, 255, 0, 200, 63, 64, 96, 194, 255, 63, 253, 255, 0, 224, 63, 0, 2, 0, 240, 7, 63, 4, 16, 1, 255, 65, 248, 255, 255, 235, 1, 
222, 1, 255, 243, 255, 237, 159, 249, 255, 255, 253, 197, 163, 129, 89, 0, 176, 195, 255, 255, 15, 232, 135, 109, 195, 1, 0, 0, 94, 28, 0, 232, 191, 237, 227, 1, 26, 3, 2, 236, 159, 237, 35, 129, 25, 255, 0, 232, 199, 61, 214, 24, 199, 255, 131, 198, 29, 238, 223, 255, 35, 30, 0, 0, 7, 0, 255, 236, 223, 239, 99, 155, 13, 6, 0, 255, 167, 193, 93, 0, 128, 63, 254, 236, 255, 127, 252, 251, 47, 127, 0, 3, 127, 13, 128, 127, 128, 150, 37, 240, 254, 174, 236, 13, 32, 95, 0, 255, 243, 95, 253, 255, 254, 255, 31, 32, 31, 0, 192, 191, 223, 2, 153, 255, 60, 225, 255, 155, 223, 191, 32, 255, 61, 127, 61, 61, 127, 61, 255, 127, 255, 255, 3, 63, 63, 255, 1, 3, 0, 99, 0, 79, 192, 191, 1, 240, 31, 255, 5, 120, 14, 251, 1, 241, 255, 255, 199, 127, 198, 191, 0, 26, 224, 7, 0, 240, 255, 47, 232, 251, 15, 252, 255, 195, 196, 191, 92, 12, 240, 48, 248, 255, 227, 8, 0, 2, 222, 111, 0, 255, 170, 223, 255, 207, 239, 220, 127, 255, 128, 207, 255, 63, 255, 0, 240, 12, 254, 127, 127, 255, 251, 15, 0, 127, 248, 224, 255, 8, 192, 252, 0, 128, 255, 187, 247, 159, 15, 15, 192, 252, 63, 63, 192, 12, 128, 55, 236, 255, 191, 255, 195, 255, 129, 25, 0, 247, 47, 255, 239, 98, 62, 5, 0, 0, 248, 255, 207, 126, 126, 126, 0, 223, 30, 248, 160, 127, 95, 219, 255, 247, 255, 127, 15, 252, 252, 252, 28, 0, 48, 255, 183, 135, 255, 143, 255, 15, 255, 15, 128, 63, 253, 191, 145, 191, 255, 55, 248, 255, 143, 255, 240, 239, 254, 31, 248, 7, 255, 3, 30, 0, 254, 128, 63, 135, 217, 127, 16, 119, 0, 63, 128, 44, 63, 127, 189, 237, 163, 158, 57, 1, 224, 6, 90, 242, 0, 3, 79, 7, 88, 255, 215, 64, 0, 67, 0, 7, 128, 32, 0, 255, 224, 255, 147, 95, 60, 24, 240, 35, 0, 100, 222, 239, 255, 191, 231, 223, 223, 255, 123, 95, 252, 128, 7, 239, 15, 159, 255, 150, 254, 247, 10, 132, 234, 150, 170, 150, 247, 247, 94, 238, 251, }; /* Grapheme_Base: 2544 bytes. 
*/ RE_UINT32 re_get_grapheme_base(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 15; code = ch ^ (f << 15); pos = (RE_UINT32)re_grapheme_base_stage_1[f] << 5; f = code >> 10; code ^= f << 10; pos = (RE_UINT32)re_grapheme_base_stage_2[pos + f] << 3; f = code >> 7; code ^= f << 7; pos = (RE_UINT32)re_grapheme_base_stage_3[pos + f] << 3; f = code >> 4; code ^= f << 4; pos = (RE_UINT32)re_grapheme_base_stage_4[pos + f] << 4; pos += code; value = (re_grapheme_base_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Grapheme_Link. */ static RE_UINT8 re_grapheme_link_stage_1[] = { 0, 1, 2, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_grapheme_link_stage_2[] = { 0, 0, 1, 2, 3, 4, 5, 0, 0, 0, 0, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 8, 0, 9, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_grapheme_link_stage_3[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 3, 0, 0, 4, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 7, 0, 0, 0, 0, 8, 0, 9, 10, 0, 0, 11, 0, 0, 0, 0, 0, 12, 9, 13, 14, 0, 15, 0, 16, 0, 0, 0, 0, 17, 0, 0, 0, 18, 19, 20, 14, 21, 22, 1, 0, 0, 23, 0, 17, 17, 24, 25, 0, }; static RE_UINT8 re_grapheme_link_stage_4[] = { 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 3, 0, 0, 4, 0, 0, 0, 0, 5, 0, 0, 6, 6, 0, 0, 0, 0, 7, 0, 0, 0, 0, 8, 0, 0, 4, 0, 0, 9, 0, 10, 0, 0, 0, 11, 12, 0, 0, 0, 0, 0, 13, 0, 0, 0, 8, 0, 0, 0, 0, 14, 0, 0, 0, 1, 0, 11, 0, 0, 0, 0, 12, 11, 0, 15, 0, 0, 0, 16, 0, 0, 0, 17, 0, 0, 0, 0, 0, 2, 0, 0, 18, 0, 0, 14, 0, 0, 0, 19, 0, 0, }; static RE_UINT8 re_grapheme_link_stage_5[] = { 0, 0, 0, 0, 0, 32, 0, 0, 0, 4, 0, 0, 0, 0, 0, 4, 16, 0, 0, 0, 0, 0, 0, 6, 0, 0, 16, 0, 0, 0, 4, 0, 1, 0, 0, 0, 0, 12, 0, 0, 0, 0, 12, 0, 0, 0, 0, 128, 64, 0, 0, 0, 
0, 0, 8, 0, 0, 0, 64, 0, 0, 0, 0, 2, 0, 0, 24, 0, 0, 0, 32, 0, 4, 0, 0, 0, 0, 8, 0, 0, }; /* Grapheme_Link: 404 bytes. */ RE_UINT32 re_get_grapheme_link(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 14; code = ch ^ (f << 14); pos = (RE_UINT32)re_grapheme_link_stage_1[f] << 4; f = code >> 10; code ^= f << 10; pos = (RE_UINT32)re_grapheme_link_stage_2[pos + f] << 3; f = code >> 7; code ^= f << 7; pos = (RE_UINT32)re_grapheme_link_stage_3[pos + f] << 2; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_grapheme_link_stage_4[pos + f] << 5; pos += code; value = (re_grapheme_link_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* White_Space. */ static RE_UINT8 re_white_space_stage_1[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_white_space_stage_2[] = { 0, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, }; static RE_UINT8 re_white_space_stage_3[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_white_space_stage_4[] = { 0, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 4, 5, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_white_space_stage_5[] = { 0, 62, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 32, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 255, 7, 0, 0, 0, 131, 0, 0, 0, 0, 0, 128, 0, 0, 0, 0, }; /* White_Space: 169 bytes. 
*/ RE_UINT32 re_get_white_space(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_white_space_stage_1[f] << 3; f = code >> 13; code ^= f << 13; pos = (RE_UINT32)re_white_space_stage_2[pos + f] << 4; f = code >> 9; code ^= f << 9; pos = (RE_UINT32)re_white_space_stage_3[pos + f] << 3; f = code >> 6; code ^= f << 6; pos = (RE_UINT32)re_white_space_stage_4[pos + f] << 6; pos += code; value = (re_white_space_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Bidi_Control. */ static RE_UINT8 re_bidi_control_stage_1[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_bidi_control_stage_2[] = { 0, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_bidi_control_stage_3[] = { 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_bidi_control_stage_4[] = { 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 2, 3, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_bidi_control_stage_5[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 16, 0, 0, 0, 0, 0, 192, 0, 0, 0, 124, 0, 0, 0, 0, 0, 0, 192, 3, 0, 0, }; /* Bidi_Control: 129 bytes. */ RE_UINT32 re_get_bidi_control(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_bidi_control_stage_1[f] << 4; f = code >> 12; code ^= f << 12; pos = (RE_UINT32)re_bidi_control_stage_2[pos + f] << 3; f = code >> 9; code ^= f << 9; pos = (RE_UINT32)re_bidi_control_stage_3[pos + f] << 3; f = code >> 6; code ^= f << 6; pos = (RE_UINT32)re_bidi_control_stage_4[pos + f] << 6; pos += code; value = (re_bidi_control_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Join_Control. 
*/ static RE_UINT8 re_join_control_stage_1[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_join_control_stage_2[] = { 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_join_control_stage_3[] = { 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_join_control_stage_4[] = { 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_join_control_stage_5[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 48, 0, 0, 0, 0, 0, 0, }; /* Join_Control: 97 bytes. */ RE_UINT32 re_get_join_control(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_join_control_stage_1[f] << 4; f = code >> 12; code ^= f << 12; pos = (RE_UINT32)re_join_control_stage_2[pos + f] << 3; f = code >> 9; code ^= f << 9; pos = (RE_UINT32)re_join_control_stage_3[pos + f] << 3; f = code >> 6; code ^= f << 6; pos = (RE_UINT32)re_join_control_stage_4[pos + f] << 6; pos += code; value = (re_join_control_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Dash. 
*/ static RE_UINT8 re_dash_stage_1[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_dash_stage_2[] = { 0, 1, 2, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, }; static RE_UINT8 re_dash_stage_3[] = { 0, 1, 2, 1, 1, 1, 1, 1, 1, 1, 3, 1, 4, 1, 1, 1, 5, 6, 1, 1, 1, 1, 1, 7, 8, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 9, }; static RE_UINT8 re_dash_stage_4[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 3, 1, 1, 1, 1, 1, 1, 1, 4, 1, 1, 1, 1, 1, 1, 1, 5, 6, 7, 1, 1, 1, 1, 1, 8, 1, 1, 1, 1, 1, 1, 1, 9, 3, 1, 1, 1, 1, 1, 1, 10, 1, 11, 1, 1, 1, 1, 1, 12, 13, 1, 1, 14, 1, 1, 1, }; static RE_UINT8 re_dash_stage_5[] = { 0, 0, 0, 0, 0, 32, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 64, 1, 0, 0, 0, 0, 0, 0, 0, 64, 0, 0, 0, 0, 0, 0, 0, 0, 0, 63, 0, 0, 0, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 8, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 128, 4, 0, 0, 0, 12, 0, 0, 0, 16, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 0, 0, 0, 0, 1, 8, 0, 0, 0, 0, 32, 0, 0, 0, 0, 0, 0, }; /* Dash: 297 bytes. */ RE_UINT32 re_get_dash(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_dash_stage_1[f] << 4; f = code >> 12; code ^= f << 12; pos = (RE_UINT32)re_dash_stage_2[pos + f] << 3; f = code >> 9; code ^= f << 9; pos = (RE_UINT32)re_dash_stage_3[pos + f] << 3; f = code >> 6; code ^= f << 6; pos = (RE_UINT32)re_dash_stage_4[pos + f] << 6; pos += code; value = (re_dash_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Hyphen. 
*/ static RE_UINT8 re_hyphen_stage_1[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_hyphen_stage_2[] = { 0, 1, 2, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, }; static RE_UINT8 re_hyphen_stage_3[] = { 0, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 4, 1, 1, 1, 1, 1, 1, 5, 6, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 7, }; static RE_UINT8 re_hyphen_stage_4[] = { 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 3, 1, 1, 1, 1, 1, 1, 1, 4, 1, 1, 1, 1, 1, 1, 1, 5, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 6, 1, 1, 1, 1, 1, 7, 1, 1, 8, 9, 1, 1, }; static RE_UINT8 re_hyphen_stage_5[] = { 0, 0, 0, 0, 0, 32, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 64, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 128, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 32, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 32, 0, 0, 0, }; /* Hyphen: 241 bytes. */ RE_UINT32 re_get_hyphen(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_hyphen_stage_1[f] << 4; f = code >> 12; code ^= f << 12; pos = (RE_UINT32)re_hyphen_stage_2[pos + f] << 3; f = code >> 9; code ^= f << 9; pos = (RE_UINT32)re_hyphen_stage_3[pos + f] << 3; f = code >> 6; code ^= f << 6; pos = (RE_UINT32)re_hyphen_stage_4[pos + f] << 6; pos += code; value = (re_hyphen_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Quotation_Mark. 
*/ static RE_UINT8 re_quotation_mark_stage_1[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_quotation_mark_stage_2[] = { 0, 1, 2, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_quotation_mark_stage_3[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 3, 4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 5, }; static RE_UINT8 re_quotation_mark_stage_4[] = { 0, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 4, 1, 1, 1, 1, 1, 1, 5, 1, 1, 1, 1, 1, 1, 1, 1, 6, 1, 1, 7, 8, 1, 1, }; static RE_UINT8 re_quotation_mark_stage_5[] = { 0, 0, 0, 0, 132, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 0, 8, 0, 0, 0, 255, 0, 0, 0, 6, 4, 0, 0, 0, 0, 0, 0, 0, 0, 240, 0, 224, 0, 0, 0, 0, 30, 0, 0, 0, 0, 0, 0, 0, 132, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 12, 0, 0, 0, }; /* Quotation_Mark: 209 bytes. */ RE_UINT32 re_get_quotation_mark(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_quotation_mark_stage_1[f] << 4; f = code >> 12; code ^= f << 12; pos = (RE_UINT32)re_quotation_mark_stage_2[pos + f] << 3; f = code >> 9; code ^= f << 9; pos = (RE_UINT32)re_quotation_mark_stage_3[pos + f] << 3; f = code >> 6; code ^= f << 6; pos = (RE_UINT32)re_quotation_mark_stage_4[pos + f] << 6; pos += code; value = (re_quotation_mark_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Terminal_Punctuation. 
*/ static RE_UINT8 re_terminal_punctuation_stage_1[] = { 0, 1, 2, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, }; static RE_UINT8 re_terminal_punctuation_stage_2[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 9, 10, 11, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 12, 13, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 14, 15, 9, 16, 9, 17, 18, 9, 9, 9, 19, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 20, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 21, 9, 9, 9, 9, 9, 9, 22, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, }; static RE_UINT8 re_terminal_punctuation_stage_3[] = { 0, 1, 1, 1, 1, 1, 2, 3, 1, 1, 1, 4, 5, 6, 7, 8, 9, 1, 10, 1, 1, 1, 1, 1, 1, 1, 1, 1, 11, 1, 12, 1, 13, 1, 1, 1, 1, 1, 14, 1, 1, 1, 1, 1, 15, 16, 17, 18, 19, 1, 20, 1, 1, 21, 22, 1, 23, 1, 1, 1, 1, 1, 1, 1, 24, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 25, 1, 1, 1, 26, 1, 1, 1, 1, 1, 1, 1, 1, 27, 1, 1, 28, 29, 1, 1, 30, 31, 32, 33, 34, 35, 1, 36, 1, 1, 1, 1, 37, 1, 38, 1, 1, 1, 1, 1, 1, 1, 1, 39, 40, 1, 41, 1, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 1, 1, 1, 1, 1, 52, 53, 1, 54, 1, 55, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 56, 57, 58, 1, 1, 41, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 59, 1, 1, }; static RE_UINT8 re_terminal_punctuation_stage_4[] = { 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 3, 0, 0, 0, 4, 0, 5, 0, 6, 0, 0, 0, 0, 0, 7, 0, 8, 0, 0, 0, 0, 0, 0, 9, 0, 10, 2, 0, 0, 0, 0, 11, 0, 0, 12, 0, 13, 0, 0, 0, 0, 0, 14, 0, 0, 0, 0, 15, 0, 0, 0, 16, 0, 0, 0, 17, 0, 18, 0, 0, 0, 0, 19, 0, 20, 0, 0, 0, 0, 0, 11, 0, 0, 21, 0, 0, 0, 0, 22, 0, 0, 23, 0, 24, 0, 25, 26, 0, 0, 27, 28, 0, 29, 0, 0, 0, 0, 0, 0, 24, 30, 0, 0, 0, 0, 0, 0, 31, 0, 0, 0, 32, 0, 0, 33, 0, 0, 34, 0, 0, 0, 0, 26, 0, 0, 0, 35, 0, 0, 0, 36, 37, 0, 0, 0, 38, 0, 0, 39, 0, 1, 0, 0, 40, 36, 0, 41, 0, 0, 0, 42, 0, 36, 0, 0, 0, 0, 0, 32, 0, 0, 0, 0, 43, 0, 
44, 0, 0, 45, 0, 0, 0, 0, 0, 46, 0, 0, 24, 47, 0, 0, 0, 48, 0, 0, 0, 49, 0, 0, 50, 0, 0, 0, 4, 0, 0, 0, 0, 51, 0, 0, 0, 29, 0, 0, 52, 0, 0, 0, 0, 0, 53, 0, 0, 0, 33, 0, 0, 0, 54, 0, 55, 56, 0, 57, 0, 0, 0, }; static RE_UINT8 re_terminal_punctuation_stage_5[] = { 0, 0, 0, 0, 2, 80, 0, 140, 0, 0, 0, 64, 128, 0, 0, 0, 0, 2, 0, 0, 8, 0, 0, 0, 0, 16, 0, 136, 0, 0, 16, 0, 255, 23, 0, 0, 0, 0, 0, 3, 0, 0, 255, 127, 48, 0, 0, 0, 0, 0, 0, 12, 0, 225, 7, 0, 0, 12, 0, 0, 254, 1, 0, 0, 0, 96, 0, 0, 0, 56, 0, 0, 0, 0, 96, 0, 0, 0, 112, 4, 60, 3, 0, 0, 0, 15, 0, 0, 0, 0, 0, 236, 0, 0, 0, 248, 0, 0, 0, 192, 0, 0, 0, 48, 128, 3, 0, 0, 0, 64, 0, 16, 2, 0, 0, 0, 6, 0, 0, 0, 0, 224, 0, 0, 0, 0, 248, 0, 0, 0, 192, 0, 0, 192, 0, 0, 0, 128, 0, 0, 0, 0, 0, 224, 0, 0, 0, 128, 0, 0, 3, 0, 0, 8, 0, 0, 0, 0, 247, 0, 18, 0, 0, 0, 0, 0, 1, 0, 0, 0, 128, 0, 0, 0, 63, 0, 0, 0, 0, 252, 0, 0, 0, 30, 128, 63, 0, 0, 3, 0, 0, 0, 14, 0, 0, 0, 96, 32, 0, 192, 0, 0, 0, 31, 60, 254, 255, 0, 0, 0, 0, 112, 0, 0, 31, 0, 0, 0, 32, 0, 0, 0, 128, 3, 16, 0, 0, 0, 128, 7, 0, 0, }; /* Terminal_Punctuation: 850 bytes. */ RE_UINT32 re_get_terminal_punctuation(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 15; code = ch ^ (f << 15); pos = (RE_UINT32)re_terminal_punctuation_stage_1[f] << 5; f = code >> 10; code ^= f << 10; pos = (RE_UINT32)re_terminal_punctuation_stage_2[pos + f] << 3; f = code >> 7; code ^= f << 7; pos = (RE_UINT32)re_terminal_punctuation_stage_3[pos + f] << 2; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_terminal_punctuation_stage_4[pos + f] << 5; pos += code; value = (re_terminal_punctuation_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Other_Math. 
*/ static RE_UINT8 re_other_math_stage_1[] = { 0, 1, 2, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, }; static RE_UINT8 re_other_math_stage_2[] = { 0, 1, 1, 1, 2, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 5, 1, 1, 6, 1, 1, }; static RE_UINT8 re_other_math_stage_3[] = { 0, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 4, 1, 5, 1, 6, 7, 8, 1, 9, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 10, 11, 1, 1, 1, 1, 12, 13, 14, 15, 1, 1, 1, 1, 1, 1, 16, 1, }; static RE_UINT8 re_other_math_stage_4[] = { 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 3, 4, 5, 6, 7, 8, 0, 9, 10, 11, 12, 13, 0, 14, 15, 16, 17, 18, 0, 0, 0, 0, 19, 20, 21, 0, 0, 0, 0, 0, 22, 23, 24, 25, 0, 26, 27, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25, 28, 0, 0, 0, 0, 29, 0, 30, 31, 0, 0, 0, 32, 0, 0, 0, 0, 0, 33, 0, 0, 0, 0, 0, 0, 34, 34, 35, 34, 36, 37, 38, 34, 39, 40, 41, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 42, 43, 44, 35, 35, 45, 45, 46, 46, 47, 34, 38, 48, 49, 50, 51, 52, 0, 0, }; static RE_UINT8 re_other_math_stage_5[] = { 0, 0, 0, 0, 0, 0, 0, 64, 0, 0, 39, 0, 0, 0, 51, 0, 0, 0, 64, 0, 0, 0, 28, 0, 1, 0, 0, 0, 30, 0, 0, 96, 0, 96, 0, 0, 0, 0, 255, 31, 98, 248, 0, 0, 132, 252, 47, 62, 16, 179, 251, 241, 224, 3, 0, 0, 0, 0, 224, 243, 182, 62, 195, 240, 255, 63, 235, 47, 48, 0, 0, 0, 0, 15, 0, 0, 0, 0, 176, 0, 0, 0, 1, 0, 4, 0, 0, 0, 3, 192, 127, 240, 193, 140, 15, 0, 148, 31, 0, 0, 96, 0, 0, 0, 5, 0, 0, 0, 15, 96, 0, 0, 192, 255, 0, 0, 248, 255, 255, 1, 0, 0, 0, 15, 0, 0, 0, 48, 10, 1, 0, 0, 0, 0, 0, 80, 255, 255, 255, 255, 255, 255, 223, 255, 255, 255, 255, 223, 100, 222, 255, 235, 239, 255, 255, 255, 191, 231, 223, 223, 255, 255, 255, 123, 95, 252, 253, 255, 63, 255, 255, 255, 253, 255, 255, 247, 255, 255, 255, 247, 255, 127, 255, 255, 255, 253, 255, 255, 247, 207, 255, 255, 150, 254, 247, 10, 132, 234, 150, 170, 150, 247, 247, 94, 
255, 251, 255, 15, 238, 251, 255, 15, }; /* Other_Math: 502 bytes. */ RE_UINT32 re_get_other_math(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 15; code = ch ^ (f << 15); pos = (RE_UINT32)re_other_math_stage_1[f] << 4; f = code >> 11; code ^= f << 11; pos = (RE_UINT32)re_other_math_stage_2[pos + f] << 3; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_other_math_stage_3[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_other_math_stage_4[pos + f] << 5; pos += code; value = (re_other_math_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Hex_Digit. */ static RE_UINT8 re_hex_digit_stage_1[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_hex_digit_stage_2[] = { 0, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_hex_digit_stage_3[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, }; static RE_UINT8 re_hex_digit_stage_4[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, }; static RE_UINT8 re_hex_digit_stage_5[] = { 0, 0, 0, 0, 0, 0, 255, 3, 126, 0, 0, 0, 126, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 255, 3, 126, 0, 0, 0, 126, 0, 0, 0, 0, 0, 0, 0, }; /* Hex_Digit: 129 bytes. */ RE_UINT32 re_get_hex_digit(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_hex_digit_stage_1[f] << 3; f = code >> 13; code ^= f << 13; pos = (RE_UINT32)re_hex_digit_stage_2[pos + f] << 3; f = code >> 10; code ^= f << 10; pos = (RE_UINT32)re_hex_digit_stage_3[pos + f] << 3; f = code >> 7; code ^= f << 7; pos = (RE_UINT32)re_hex_digit_stage_4[pos + f] << 7; pos += code; value = (re_hex_digit_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* ASCII_Hex_Digit. 
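Note on the lookup scheme: the ASCII_Hex_Digit lookup that follows, like the other re_get_... functions in this file, walks a five-stage table. Each of the first four stages consumes a slice of the code point's bits and yields the index of a block in the next stage; the fifth stage is a bitmap holding one bit per code point. The sketch below restates that walk with bit masks instead of the XOR updates used in this file. It is an illustration only, not part of the generated code: the demo_ names, the use of stdint.h types in place of RE_UINT8 and RE_UINT32, and the range guard are assumptions made so the example stands alone. The table values are copied from the ASCII_Hex_Digit tables.

```c
#include <assert.h>
#include <stdint.h>

// Copies of the ASCII_Hex_Digit stage tables, renamed (hypothetical demo_
// prefix) so that this sketch compiles on its own.
static const uint8_t demo_stage_1[] = {
    0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
};
static const uint8_t demo_stage_2[] = {
    0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
};
static const uint8_t demo_stage_3[] = {
    0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
};
static const uint8_t demo_stage_4[] = {
    0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
};
static const uint8_t demo_stage_5[] = {
    0, 0, 0, 0, 0, 0, 255, 3, 126, 0, 0, 0, 126, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
};

// Returns 1 if ch has the ASCII_Hex_Digit property, otherwise 0.
uint32_t demo_is_ascii_hex_digit(uint32_t ch) {
    uint32_t pos;

    // Guard added for this sketch only: stage 1 has one entry per plane
    // (17 planes), so larger values would index past the end of the table.
    if (ch > 0x10FFFF)
        return 0;

    // Bits 16 and up select an 8-entry block of stage 2.
    pos = (uint32_t)demo_stage_1[ch >> 16] << 3;
    // Bits 13..15 select an 8-entry block of stage 3.
    pos = (uint32_t)demo_stage_2[pos + ((ch >> 13) & 0x7)] << 3;
    // Bits 10..12 select an 8-entry block of stage 4.
    pos = (uint32_t)demo_stage_3[pos + ((ch >> 10) & 0x7)] << 3;
    // Bits 7..9 select a 128-bit block of the stage 5 bitmap.
    pos = (uint32_t)demo_stage_4[pos + ((ch >> 7) & 0x7)] << 7;
    // Bits 0..6 select the bit inside that block.
    pos += ch & 0x7F;

    return (demo_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1;
}

// Small self-check covering the three ASCII ranges and their neighbours.
void demo_self_check(void) {
    assert(demo_is_ascii_hex_digit('0') == 1);
    assert(demo_is_ascii_hex_digit('F') == 1);
    assert(demo_is_ascii_hex_digit('f') == 1);
    assert(demo_is_ascii_hex_digit('g') == 0);
}
```

Because identical blocks are stored once and shared (every all-zero block of 128 code points maps to block 1 here), the whole property fits in the 97 bytes reported for these tables. The shift amounts differ from property to property, so the masks above apply to this table layout only.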
*/ static RE_UINT8 re_ascii_hex_digit_stage_1[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_ascii_hex_digit_stage_2[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_ascii_hex_digit_stage_3[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_ascii_hex_digit_stage_4[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_ascii_hex_digit_stage_5[] = { 0, 0, 0, 0, 0, 0, 255, 3, 126, 0, 0, 0, 126, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, }; /* ASCII_Hex_Digit: 97 bytes. */ RE_UINT32 re_get_ascii_hex_digit(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_ascii_hex_digit_stage_1[f] << 3; f = code >> 13; code ^= f << 13; pos = (RE_UINT32)re_ascii_hex_digit_stage_2[pos + f] << 3; f = code >> 10; code ^= f << 10; pos = (RE_UINT32)re_ascii_hex_digit_stage_3[pos + f] << 3; f = code >> 7; code ^= f << 7; pos = (RE_UINT32)re_ascii_hex_digit_stage_4[pos + f] << 7; pos += code; value = (re_ascii_hex_digit_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Other_Alphabetic. 
*/ static RE_UINT8 re_other_alphabetic_stage_1[] = { 0, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, }; static RE_UINT8 re_other_alphabetic_stage_2[] = { 0, 1, 2, 3, 4, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 8, 6, 6, 6, 6, 6, 6, 6, 6, 6, 9, 10, 11, 12, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 13, 6, 6, 6, 6, 6, 6, 6, 6, 6, 14, 6, 6, 6, 6, 6, 6, 15, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, }; static RE_UINT8 re_other_alphabetic_stage_3[] = { 0, 0, 0, 1, 0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 0, 0, 14, 0, 0, 0, 15, 16, 17, 18, 19, 20, 21, 0, 0, 0, 0, 0, 0, 22, 0, 0, 0, 0, 0, 0, 0, 0, 23, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 24, 0, 25, 26, 27, 28, 0, 0, 0, 0, 0, 0, 0, 29, 0, 0, 0, 0, 0, 0, 0, 30, 0, 0, 0, 0, 0, 0, 31, 0, 0, 0, 0, 0, 32, 33, 34, 35, 36, 37, 38, 39, 0, 0, 0, 40, 0, 0, 0, 41, 0, 0, 0, 0, 42, 0, 0, 0, 0, 43, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_other_alphabetic_stage_4[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 3, 0, 4, 0, 5, 6, 0, 0, 7, 8, 9, 10, 0, 0, 0, 11, 0, 0, 12, 13, 0, 0, 0, 0, 0, 14, 15, 16, 17, 18, 19, 20, 21, 18, 19, 20, 22, 23, 19, 20, 24, 18, 19, 20, 25, 18, 26, 20, 27, 0, 15, 20, 28, 18, 19, 20, 28, 18, 19, 20, 29, 18, 18, 0, 30, 31, 0, 32, 33, 0, 0, 34, 33, 0, 0, 0, 0, 35, 36, 37, 0, 0, 0, 38, 39, 40, 41, 0, 0, 0, 0, 0, 42, 0, 0, 0, 0, 0, 31, 31, 31, 31, 0, 43, 44, 0, 0, 0, 0, 0, 0, 45, 0, 0, 0, 46, 0, 0, 0, 0, 0, 0, 47, 0, 48, 49, 0, 0, 0, 0, 50, 51, 15, 0, 52, 53, 0, 54, 0, 55, 0, 0, 0, 0, 0, 31, 0, 0, 0, 0, 0, 0, 0, 56, 0, 0, 0, 0, 0, 43, 57, 58, 0, 0, 0, 0, 0, 0, 0, 57, 0, 0, 0, 59, 20, 0, 0, 0, 0, 60, 0, 0, 61, 62, 15, 0, 0, 63, 64, 0, 15, 62, 0, 0, 0, 65, 66, 0, 0, 67, 0, 68, 0, 0, 0, 0, 0, 0, 0, 69, 70, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 71, 0, 0, 0, 0, 72, 0, 0, 0, 0, 0, 0, 0, 52, 73, 74, 0, 26, 75, 0, 0, 52, 64, 0, 0, 52, 76, 0, 0, 0, 77, 0, 0, 0, 0, 42, 44, 15, 20, 21, 18, 0, 0, 0, 0, 0, 0, 0, 0, 0, 10, 61, 0, 0, 0, 0, 0, 0, 78, 
79, 0, 0, 80, 81, 0, 0, 82, 0, 0, 83, 84, 0, 0, 0, 0, 0, 0, 0, 85, 0, 0, 0, 0, 0, 0, 0, 0, 35, 86, 0, 0, 0, 0, 0, 0, 0, 0, 70, 0, 0, 0, 0, 10, 87, 87, 58, 0, 0, 0, }; static RE_UINT8 re_other_alphabetic_stage_5[] = { 0, 0, 0, 0, 32, 0, 0, 0, 0, 0, 255, 191, 182, 0, 0, 0, 0, 0, 255, 7, 0, 248, 255, 254, 0, 0, 1, 0, 0, 0, 192, 31, 158, 33, 0, 0, 0, 0, 2, 0, 0, 0, 255, 255, 192, 255, 1, 0, 0, 0, 192, 248, 239, 30, 0, 0, 248, 3, 255, 255, 15, 0, 0, 0, 0, 0, 0, 204, 255, 223, 224, 0, 12, 0, 0, 0, 14, 0, 0, 0, 0, 0, 0, 192, 159, 25, 128, 0, 135, 25, 2, 0, 0, 0, 35, 0, 191, 27, 0, 0, 159, 25, 192, 0, 4, 0, 0, 0, 199, 29, 128, 0, 223, 29, 96, 0, 223, 29, 128, 0, 0, 128, 95, 255, 0, 0, 12, 0, 0, 0, 242, 7, 0, 32, 0, 0, 0, 0, 242, 27, 0, 0, 254, 255, 3, 224, 255, 254, 255, 255, 255, 31, 0, 248, 127, 121, 0, 0, 192, 195, 133, 1, 30, 0, 124, 0, 0, 48, 0, 0, 0, 128, 0, 0, 192, 255, 255, 1, 0, 0, 0, 2, 0, 0, 255, 15, 255, 1, 0, 0, 128, 15, 0, 0, 224, 127, 254, 255, 31, 0, 31, 0, 0, 0, 0, 0, 224, 255, 7, 0, 0, 0, 254, 51, 0, 0, 128, 255, 3, 0, 240, 255, 63, 0, 128, 255, 31, 0, 255, 255, 255, 255, 255, 3, 0, 0, 0, 0, 240, 15, 248, 0, 0, 0, 3, 0, 0, 0, 0, 0, 240, 255, 192, 7, 0, 0, 128, 255, 7, 0, 0, 254, 127, 0, 8, 48, 0, 0, 0, 0, 157, 65, 0, 248, 32, 0, 248, 7, 0, 0, 0, 0, 0, 64, 0, 0, 192, 7, 110, 240, 0, 0, 0, 0, 0, 255, 63, 0, 0, 0, 0, 0, 255, 1, 0, 0, 248, 255, 0, 240, 159, 0, 0, 128, 63, 127, 0, 0, 0, 48, 0, 0, 255, 127, 1, 0, 0, 0, 0, 248, 63, 0, 0, 0, 0, 224, 255, 7, 0, 0, 0, 0, 127, 0, 255, 255, 255, 127, 255, 3, 255, 255, }; /* Other_Alphabetic: 945 bytes. 
*/ RE_UINT32 re_get_other_alphabetic(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_other_alphabetic_stage_1[f] << 5; f = code >> 11; code ^= f << 11; pos = (RE_UINT32)re_other_alphabetic_stage_2[pos + f] << 3; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_other_alphabetic_stage_3[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_other_alphabetic_stage_4[pos + f] << 5; pos += code; value = (re_other_alphabetic_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Ideographic. */ static RE_UINT8 re_ideographic_stage_1[] = { 0, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_ideographic_stage_2[] = { 0, 0, 0, 0, 0, 0, 1, 2, 2, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 6, 2, 7, 8, 2, 9, 0, 0, 0, 0, 0, 10, }; static RE_UINT8 re_ideographic_stage_3[] = { 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 4, 0, 2, 5, 0, 0, 0, 0, 0, 2, 2, 2, 2, 2, 2, 6, 2, 2, 2, 2, 2, 2, 2, 2, 7, 8, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 9, 0, 2, 2, 10, 0, 0, 0, 0, 0, }; static RE_UINT8 re_ideographic_stage_4[] = { 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 0, 0, 0, 0, 0, 0, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 0, 0, 3, 3, 3, 3, 3, 3, 4, 0, 3, 3, 3, 5, 3, 3, 6, 0, 3, 3, 3, 3, 3, 3, 7, 0, 3, 8, 3, 3, 3, 3, 3, 3, 9, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 10, 0, 0, 9, 0, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_ideographic_stage_5[] = { 0, 0, 0, 0, 192, 0, 0, 0, 254, 3, 0, 7, 255, 255, 255, 255, 255, 255, 63, 0, 255, 63, 255, 255, 255, 255, 255, 3, 255, 255, 127, 0, 255, 255, 31, 0, 255, 255, 255, 63, 3, 0, 0, 0, }; /* Ideographic: 333 bytes. 
*/ RE_UINT32 re_get_ideographic(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_ideographic_stage_1[f] << 5; f = code >> 11; code ^= f << 11; pos = (RE_UINT32)re_ideographic_stage_2[pos + f] << 3; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_ideographic_stage_3[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_ideographic_stage_4[pos + f] << 5; pos += code; value = (re_ideographic_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Diacritic. */ static RE_UINT8 re_diacritic_stage_1[] = { 0, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, }; static RE_UINT8 re_diacritic_stage_2[] = { 0, 1, 2, 3, 4, 5, 6, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 7, 8, 4, 4, 4, 4, 4, 4, 4, 4, 4, 9, 10, 11, 12, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 13, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 14, 4, 4, 15, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, }; static RE_UINT8 re_diacritic_stage_3[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1, 1, 1, 1, 1, 1, 17, 1, 18, 19, 20, 21, 22, 1, 23, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 24, 1, 25, 1, 26, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 27, 28, 29, 30, 31, 32, 1, 1, 1, 1, 1, 1, 1, 33, 1, 1, 34, 35, 1, 1, 36, 1, 1, 1, 1, 1, 1, 1, 37, 1, 1, 1, 1, 1, 38, 39, 40, 41, 42, 43, 44, 45, 1, 1, 46, 1, 1, 1, 1, 47, 1, 48, 1, 1, 1, 1, 1, 1, 49, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_diacritic_stage_4[] = { 0, 0, 1, 2, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 5, 5, 5, 5, 6, 7, 8, 0, 0, 0, 0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 10, 0, 11, 12, 13, 0, 0, 0, 14, 0, 0, 0, 15, 16, 0, 4, 17, 0, 0, 18, 0, 19, 20, 0, 0, 0, 0, 0, 0, 21, 0, 22, 23, 24, 0, 22, 25, 0, 0, 22, 25, 0, 0, 22, 25, 0, 0, 22, 25, 0, 0, 0, 25, 0, 0, 0, 25, 0, 0, 22, 25, 0, 0, 0, 25, 0, 0, 0, 26, 0, 0, 0, 27, 0, 0, 0, 28, 0, 20, 29, 0, 0, 30, 0, 31, 0, 0, 32, 0, 0, 33, 0, 0, 0, 0, 0, 0, 0, 0, 0, 34, 0, 0, 35, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 36, 0, 37, 0, 0, 0, 38, 39, 40, 0, 41, 0, 0, 0, 42, 0, 43, 0, 0, 4, 44, 0, 45, 5, 17, 0, 0, 46, 47, 0, 0, 0, 0, 0, 48, 49, 50, 0, 0, 0, 0, 0, 0, 0, 51, 0, 52, 0, 0, 0, 0, 0, 0, 0, 53, 0, 0, 54, 0, 0, 22, 0, 0, 0, 55, 56, 0, 0, 57, 58, 59, 0, 0, 60, 0, 0, 20, 0, 0, 0, 0, 0, 0, 39, 61, 0, 62, 63, 0, 0, 63, 2, 64, 0, 0, 0, 65, 0, 15, 66, 67, 0, 0, 68, 0, 0, 0, 0, 69, 1, 0, 0, 0, 0, 0, 0, 0, 0, 70, 0, 0, 0, 0, 0, 0, 0, 1, 2, 71, 72, 0, 0, 73, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 74, 0, 0, 0, 0, 0, 75, 0, 0, 0, 76, 0, 63, 0, 0, 77, 0, 0, 78, 0, 0, 0, 0, 0, 79, 0, 22, 25, 80, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 81, 0, 0, 0, 0, 0, 0, 15, 2, 0, 0, 15, 0, 0, 0, 42, 0, 0, 0, 82, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 83, 0, 0, 0, 0, 84, 0, 0, 0, 0, 0, 0, 85, 86, 87, 0, 0, 0, 0, 0, 0, 0, 0, 88, 0, }; static RE_UINT8 re_diacritic_stage_5[] = { 0, 0, 0, 0, 0, 0, 0, 64, 1, 0, 0, 0, 0, 129, 144, 1, 0, 0, 255, 255, 255, 255, 255, 255, 255, 127, 255, 224, 7, 0, 48, 4, 48, 0, 0, 0, 248, 0, 0, 0, 0, 0, 0, 2, 0, 0, 254, 255, 251, 255, 255, 191, 22, 0, 0, 0, 0, 248, 135, 1, 0, 0, 0, 128, 97, 28, 0, 0, 255, 7, 0, 0, 192, 255, 1, 0, 0, 248, 63, 0, 0, 0, 0, 3, 248, 255, 255, 127, 0, 0, 0, 16, 0, 32, 30, 0, 0, 0, 2, 0, 0, 32, 0, 0, 0, 4, 0, 0, 128, 95, 0, 0, 0, 31, 0, 0, 0, 0, 160, 194, 220, 0, 0, 0, 64, 0, 0, 0, 0, 0, 128, 6, 128, 191, 0, 12, 0, 254, 15, 32, 0, 0, 0, 14, 0, 0, 224, 159, 0, 0, 255, 63, 0, 0, 16, 0, 16, 0, 0, 0, 0, 248, 15, 0, 0, 12, 0, 0, 0, 0, 192, 0, 0, 0, 0, 63, 255, 33, 16, 3, 0, 240, 255, 255, 240, 255, 0, 0, 0, 0, 32, 224, 0, 0, 0, 160, 3, 224, 0, 224, 0, 224, 0, 96, 0, 128, 3, 0, 0, 128, 0, 0, 0, 252, 0, 0, 0, 0, 0, 30, 0, 128, 0, 176, 0, 0, 0, 48, 0, 0, 3, 0, 0, 0, 128, 255, 3, 0, 0, 0, 0, 1, 0, 0, 255, 255, 3, 0, 0, 120, 0, 0, 0, 0, 8, 0, 32, 0, 0, 0, 0, 0, 0, 56, 7, 0, 0, 0, 0, 0, 64, 0, 0, 0, 0, 248, 0, 48, 0, 0, 255, 255, 0, 0, 0, 0, 1, 0, 0, 0, 0, 192, 8, 0, 0, 0, 96, 0, 0, 0, 0, 0, 0, 6, 0, 0, 24, 0, 1, 28, 0, 0, 0, 0, 96, 0, 0, 6, 0, 
0, 192, 31, 31, 0, 12, 0, 0, 0, 0, 8, 0, 0, 0, 0, 31, 0, 0, 128, 255, 255, 128, 227, 7, 248, 231, 15, 0, 0, 0, 60, 0, 0, 0, 0, 127, 0, }; /* Diacritic: 997 bytes. */ RE_UINT32 re_get_diacritic(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_diacritic_stage_1[f] << 5; f = code >> 11; code ^= f << 11; pos = (RE_UINT32)re_diacritic_stage_2[pos + f] << 3; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_diacritic_stage_3[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_diacritic_stage_4[pos + f] << 5; pos += code; value = (re_diacritic_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Extender. */ static RE_UINT8 re_extender_stage_1[] = { 0, 1, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, }; static RE_UINT8 re_extender_stage_2[] = { 0, 1, 2, 3, 2, 2, 4, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 5, 6, 2, 2, 2, 2, 2, 2, 2, 2, 2, 7, 2, 2, 8, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 9, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, }; static RE_UINT8 re_extender_stage_3[] = { 0, 1, 2, 1, 1, 1, 3, 4, 1, 1, 1, 1, 1, 1, 5, 1, 1, 1, 1, 1, 1, 1, 1, 1, 6, 1, 7, 1, 8, 1, 1, 1, 9, 1, 1, 1, 1, 1, 1, 1, 10, 1, 1, 1, 1, 1, 11, 1, 1, 12, 13, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 14, 1, 1, 1, 15, 1, 16, 1, 1, 1, 1, 1, 17, 1, 1, 1, 1, }; static RE_UINT8 re_extender_stage_4[] = { 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 5, 0, 0, 0, 5, 0, 6, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 0, 0, 0, 9, 0, 10, 0, 0, 0, 0, 11, 12, 0, 0, 13, 0, 0, 14, 15, 0, 0, 0, 0, 0, 0, 0, 16, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 17, 5, 0, 0, 0, 18, 0, 0, 19, 20, 0, 0, 0, 18, 0, 0, 0, 0, 0, 0, 19, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 21, 0, 0, 0, 22, 0, 0, 0, 0, 0, }; static RE_UINT8 re_extender_stage_5[] = { 0, 0, 0, 0, 0, 0, 128, 0, 0, 0, 3, 0, 1, 0, 0, 0, 0, 0, 0, 4, 64, 0, 0, 
0, 0, 4, 0, 0, 8, 0, 0, 0, 128, 0, 0, 0, 0, 0, 64, 0, 0, 0, 0, 8, 32, 0, 0, 0, 0, 0, 62, 0, 0, 0, 0, 96, 0, 0, 0, 112, 0, 0, 32, 0, 0, 16, 0, 0, 0, 128, 0, 0, 0, 0, 1, 0, 0, 0, 0, 32, 0, 0, 24, 0, 192, 1, 0, 0, 12, 0, 0, 0, }; /* Extender: 414 bytes. */ RE_UINT32 re_get_extender(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 15; code = ch ^ (f << 15); pos = (RE_UINT32)re_extender_stage_1[f] << 4; f = code >> 11; code ^= f << 11; pos = (RE_UINT32)re_extender_stage_2[pos + f] << 3; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_extender_stage_3[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_extender_stage_4[pos + f] << 5; pos += code; value = (re_extender_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Other_Lowercase. */ static RE_UINT8 re_other_lowercase_stage_1[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_other_lowercase_stage_2[] = { 0, 1, 2, 3, 3, 3, 3, 3, 3, 3, 4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, }; static RE_UINT8 re_other_lowercase_stage_3[] = { 0, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 2, 4, 2, 5, 2, 2, 2, 6, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 7, 2, 8, 2, 2, }; static RE_UINT8 re_other_lowercase_stage_4[] = { 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 2, 3, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5, 6, 7, 0, 0, 8, 9, 0, 0, 10, 0, 0, 0, 0, 0, 11, 0, 0, 0, 0, 0, 12, 0, 0, 0, 0, 0, 0, 0, 0, 13, 0, 0, 14, 0, 15, 0, 0, 0, 0, 0, 16, 0, 0, }; static RE_UINT8 re_other_lowercase_stage_5[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 4, 0, 0, 0, 0, 0, 0, 255, 1, 3, 0, 0, 0, 31, 0, 0, 0, 32, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 240, 255, 255, 255, 255, 255, 255, 255, 7, 0, 1, 0, 0, 0, 248, 255, 255, 255, 255, 0, 0, 0, 0, 0, 0, 2, 128, 0, 0, 255, 31, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 255, 255, 0, 0, 255, 255, 255, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 48, 0, 0, 0, 48, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 240, 0, 
0, 0, 0, }; /* Other_Lowercase: 297 bytes. */ RE_UINT32 re_get_other_lowercase(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_other_lowercase_stage_1[f] << 4; f = code >> 12; code ^= f << 12; pos = (RE_UINT32)re_other_lowercase_stage_2[pos + f] << 3; f = code >> 9; code ^= f << 9; pos = (RE_UINT32)re_other_lowercase_stage_3[pos + f] << 3; f = code >> 6; code ^= f << 6; pos = (RE_UINT32)re_other_lowercase_stage_4[pos + f] << 6; pos += code; value = (re_other_lowercase_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Other_Uppercase. */ static RE_UINT8 re_other_uppercase_stage_1[] = { 0, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_other_uppercase_stage_2[] = { 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, }; static RE_UINT8 re_other_uppercase_stage_3[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 2, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_other_uppercase_stage_4[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 1, 0, 0, 3, 4, 4, 5, 0, 0, 0, }; static RE_UINT8 re_other_uppercase_stage_5[] = { 0, 0, 0, 0, 255, 255, 0, 0, 0, 0, 192, 255, 0, 0, 255, 255, 255, 3, 255, 255, 255, 3, 0, 0, }; /* Other_Uppercase: 162 bytes. 
*/ RE_UINT32 re_get_other_uppercase(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 15; code = ch ^ (f << 15); pos = (RE_UINT32)re_other_uppercase_stage_1[f] << 4; f = code >> 11; code ^= f << 11; pos = (RE_UINT32)re_other_uppercase_stage_2[pos + f] << 3; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_other_uppercase_stage_3[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_other_uppercase_stage_4[pos + f] << 5; pos += code; value = (re_other_uppercase_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Noncharacter_Code_Point. */ static RE_UINT8 re_noncharacter_code_point_stage_1[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_noncharacter_code_point_stage_2[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, }; static RE_UINT8 re_noncharacter_code_point_stage_3[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 0, 0, 0, 0, 0, 0, 0, 2, }; static RE_UINT8 re_noncharacter_code_point_stage_4[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 2, }; static RE_UINT8 re_noncharacter_code_point_stage_5[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 255, 255, 255, 255, 0, 0, 0, 0, 0, 0, 0, 0, 0, 192, }; /* Noncharacter_Code_Point: 121 bytes. */ RE_UINT32 re_get_noncharacter_code_point(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_noncharacter_code_point_stage_1[f] << 4; f = code >> 12; code ^= f << 12; pos = (RE_UINT32)re_noncharacter_code_point_stage_2[pos + f] << 3; f = code >> 9; code ^= f << 9; pos = (RE_UINT32)re_noncharacter_code_point_stage_3[pos + f] << 3; f = code >> 6; code ^= f << 6; pos = (RE_UINT32)re_noncharacter_code_point_stage_4[pos + f] << 6; pos += code; value = (re_noncharacter_code_point_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Other_Grapheme_Extend. 
*/ static RE_UINT8 re_other_grapheme_extend_stage_1[] = { 0, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, }; static RE_UINT8 re_other_grapheme_extend_stage_2[] = { 0, 1, 2, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 4, 1, 5, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 6, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_other_grapheme_extend_stage_3[] = { 0, 0, 0, 0, 1, 2, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 0, 7, 8, 0, 0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_other_grapheme_extend_stage_4[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 0, 0, 0, 0, 1, 2, 1, 2, 0, 0, 0, 3, 1, 2, 0, 4, 5, 0, 0, 0, 0, 0, 0, 0, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 1, 2, 0, 0, 0, 0, 8, 0, 0, 0, 9, 0, 0, 0, 0, 0, 0, 10, 0, 0, }; static RE_UINT8 re_other_grapheme_extend_stage_5[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 64, 0, 0, 128, 0, 0, 0, 0, 0, 4, 0, 96, 0, 0, 0, 0, 0, 0, 128, 0, 128, 0, 0, 0, 0, 0, 48, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 192, 0, 0, 0, 0, 0, 192, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 32, 0, 0, 0, 0, 0, 128, 0, 0, 0, 0, 0, 0, 32, 192, 7, 0, }; /* Other_Grapheme_Extend: 289 bytes. */ RE_UINT32 re_get_other_grapheme_extend(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_other_grapheme_extend_stage_1[f] << 4; f = code >> 12; code ^= f << 12; pos = (RE_UINT32)re_other_grapheme_extend_stage_2[pos + f] << 3; f = code >> 9; code ^= f << 9; pos = (RE_UINT32)re_other_grapheme_extend_stage_3[pos + f] << 3; f = code >> 6; code ^= f << 6; pos = (RE_UINT32)re_other_grapheme_extend_stage_4[pos + f] << 6; pos += code; value = (re_other_grapheme_extend_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* IDS_Binary_Operator. 
*/ static RE_UINT8 re_ids_binary_operator_stage_1[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_ids_binary_operator_stage_2[] = { 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_ids_binary_operator_stage_3[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, }; static RE_UINT8 re_ids_binary_operator_stage_4[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, }; static RE_UINT8 re_ids_binary_operator_stage_5[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 243, 15, }; /* IDS_Binary_Operator: 97 bytes. */ RE_UINT32 re_get_ids_binary_operator(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_ids_binary_operator_stage_1[f] << 4; f = code >> 12; code ^= f << 12; pos = (RE_UINT32)re_ids_binary_operator_stage_2[pos + f] << 3; f = code >> 9; code ^= f << 9; pos = (RE_UINT32)re_ids_binary_operator_stage_3[pos + f] << 3; f = code >> 6; code ^= f << 6; pos = (RE_UINT32)re_ids_binary_operator_stage_4[pos + f] << 6; pos += code; value = (re_ids_binary_operator_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* IDS_Trinary_Operator. */ static RE_UINT8 re_ids_trinary_operator_stage_1[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_ids_trinary_operator_stage_2[] = { 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_ids_trinary_operator_stage_3[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, }; static RE_UINT8 re_ids_trinary_operator_stage_4[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, }; static RE_UINT8 re_ids_trinary_operator_stage_5[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 12, 0, }; /* IDS_Trinary_Operator: 97 bytes. 
*/ RE_UINT32 re_get_ids_trinary_operator(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_ids_trinary_operator_stage_1[f] << 4; f = code >> 12; code ^= f << 12; pos = (RE_UINT32)re_ids_trinary_operator_stage_2[pos + f] << 3; f = code >> 9; code ^= f << 9; pos = (RE_UINT32)re_ids_trinary_operator_stage_3[pos + f] << 3; f = code >> 6; code ^= f << 6; pos = (RE_UINT32)re_ids_trinary_operator_stage_4[pos + f] << 6; pos += code; value = (re_ids_trinary_operator_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Radical. */ static RE_UINT8 re_radical_stage_1[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_radical_stage_2[] = { 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_radical_stage_3[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, }; static RE_UINT8 re_radical_stage_4[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 3, 2, 2, 2, 2, 2, 2, 4, 0, }; static RE_UINT8 re_radical_stage_5[] = { 0, 0, 0, 0, 255, 255, 255, 251, 255, 255, 255, 255, 255, 255, 15, 0, 255, 255, 63, 0, }; /* Radical: 117 bytes. */ RE_UINT32 re_get_radical(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_radical_stage_1[f] << 4; f = code >> 12; code ^= f << 12; pos = (RE_UINT32)re_radical_stage_2[pos + f] << 3; f = code >> 9; code ^= f << 9; pos = (RE_UINT32)re_radical_stage_3[pos + f] << 4; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_radical_stage_4[pos + f] << 5; pos += code; value = (re_radical_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Unified_Ideograph. 
*/ static RE_UINT8 re_unified_ideograph_stage_1[] = { 0, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_unified_ideograph_stage_2[] = { 0, 0, 0, 1, 2, 3, 3, 3, 3, 4, 0, 0, 0, 0, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 6, 7, 8, 0, 0, 0, }; static RE_UINT8 re_unified_ideograph_stage_3[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 0, 0, 0, 0, 0, 4, 0, 0, 1, 1, 1, 5, 1, 1, 1, 1, 1, 1, 1, 6, 7, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 8, }; static RE_UINT8 re_unified_ideograph_stage_4[] = { 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 0, 1, 1, 1, 1, 1, 1, 1, 3, 4, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 5, 1, 1, 1, 1, 1, 1, 1, 1, 6, 1, 1, 1, 7, 1, 1, 1, 1, 1, 1, 1, 1, 1, 8, 0, 0, 0, 0, 0, }; static RE_UINT8 re_unified_ideograph_stage_5[] = { 0, 0, 0, 0, 0, 0, 0, 0, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 63, 0, 255, 255, 63, 0, 0, 0, 0, 0, 0, 192, 26, 128, 154, 3, 0, 0, 255, 255, 127, 0, 0, 0, 0, 0, 255, 255, 255, 255, 255, 255, 31, 0, 255, 255, 255, 63, 255, 255, 255, 255, 255, 255, 255, 255, 3, 0, 0, 0, }; /* Unified_Ideograph: 281 bytes. */ RE_UINT32 re_get_unified_ideograph(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_unified_ideograph_stage_1[f] << 4; f = code >> 12; code ^= f << 12; pos = (RE_UINT32)re_unified_ideograph_stage_2[pos + f] << 3; f = code >> 9; code ^= f << 9; pos = (RE_UINT32)re_unified_ideograph_stage_3[pos + f] << 3; f = code >> 6; code ^= f << 6; pos = (RE_UINT32)re_unified_ideograph_stage_4[pos + f] << 6; pos += code; value = (re_unified_ideograph_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Other_Default_Ignorable_Code_Point. 
*/ static RE_UINT8 re_other_default_ignorable_code_point_stage_1[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, }; static RE_UINT8 re_other_default_ignorable_code_point_stage_2[] = { 0, 1, 2, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 6, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, }; static RE_UINT8 re_other_default_ignorable_code_point_stage_3[] = { 0, 1, 0, 0, 0, 0, 0, 0, 2, 0, 0, 3, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 7, 8, 8, 8, 8, 8, 8, 8, }; static RE_UINT8 re_other_default_ignorable_code_point_stage_4[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 6, 7, 8, 0, 9, 9, 0, 0, 0, 10, 9, 9, 9, 9, 9, 9, 9, 9, }; static RE_UINT8 re_other_default_ignorable_code_point_stage_5[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 128, 0, 0, 0, 0, 0, 0, 0, 0, 0, 128, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 48, 0, 0, 0, 0, 0, 32, 0, 0, 0, 0, 0, 0, 0, 16, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 255, 1, 253, 255, 255, 255, 0, 0, 0, 0, 255, 255, 255, 255, 255, 255, 255, 255, 0, 0, 0, 0, 0, 0, 255, 255, }; /* Other_Default_Ignorable_Code_Point: 281 bytes. */ RE_UINT32 re_get_other_default_ignorable_code_point(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_other_default_ignorable_code_point_stage_1[f] << 4; f = code >> 12; code ^= f << 12; pos = (RE_UINT32)re_other_default_ignorable_code_point_stage_2[pos + f] << 3; f = code >> 9; code ^= f << 9; pos = (RE_UINT32)re_other_default_ignorable_code_point_stage_3[pos + f] << 3; f = code >> 6; code ^= f << 6; pos = (RE_UINT32)re_other_default_ignorable_code_point_stage_4[pos + f] << 6; pos += code; value = (re_other_default_ignorable_code_point_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Deprecated. 
*/ static RE_UINT8 re_deprecated_stage_1[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, }; static RE_UINT8 re_deprecated_stage_2[] = { 0, 1, 2, 3, 4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 5, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, }; static RE_UINT8 re_deprecated_stage_3[] = { 0, 1, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 5, 0, 0, 6, 0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_deprecated_stage_4[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 5, 0, 0, 0, 0, 0, 6, 0, 0, 0, 0, 0, 0, 7, 0, 0, 8, 0, 0, 0, 0, }; static RE_UINT8 re_deprecated_stage_5[] = { 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 8, 0, 0, 0, 128, 2, 24, 0, 0, 0, 0, 252, 0, 0, 0, 6, 0, 0, 2, 0, 0, 0, 0, 0, 0, 128, }; /* Deprecated: 230 bytes. */ RE_UINT32 re_get_deprecated(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 15; code = ch ^ (f << 15); pos = (RE_UINT32)re_deprecated_stage_1[f] << 4; f = code >> 11; code ^= f << 11; pos = (RE_UINT32)re_deprecated_stage_2[pos + f] << 3; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_deprecated_stage_3[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_deprecated_stage_4[pos + f] << 5; pos += code; value = (re_deprecated_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Soft_Dotted. 
*/ static RE_UINT8 re_soft_dotted_stage_1[] = { 0, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_soft_dotted_stage_2[] = { 0, 1, 1, 2, 3, 4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 5, 1, 1, 1, 1, 1, }; static RE_UINT8 re_soft_dotted_stage_3[] = { 0, 1, 2, 3, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 7, 5, 8, 9, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 10, 5, 5, 5, 5, 5, 5, 5, 11, 12, 13, 5, }; static RE_UINT8 re_soft_dotted_stage_4[] = { 0, 0, 0, 1, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 3, 4, 5, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 0, 0, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 9, 10, 11, 0, 0, 0, 12, 0, 0, 0, 0, 13, 0, 0, 0, 0, 14, 0, 0, 0, 0, 0, 0, 15, 0, 0, 0, 0, 0, 0, 0, 0, 16, 0, 0, 0, 0, 0, 17, 18, 0, 19, 20, 0, 21, 0, 22, 23, 0, 24, 0, 17, 18, 0, 19, 20, 0, 21, 0, 0, 0, }; static RE_UINT8 re_soft_dotted_stage_5[] = { 0, 0, 0, 0, 0, 6, 0, 0, 0, 128, 0, 0, 0, 2, 0, 0, 0, 1, 0, 0, 0, 0, 0, 32, 0, 0, 4, 0, 0, 0, 8, 0, 0, 0, 64, 1, 4, 0, 0, 0, 0, 0, 64, 0, 16, 1, 0, 0, 0, 32, 0, 0, 0, 8, 0, 0, 0, 0, 2, 0, 0, 3, 0, 0, 0, 0, 0, 16, 12, 0, 0, 0, 0, 0, 192, 0, 0, 12, 0, 0, 0, 0, 0, 192, 0, 0, 12, 0, 192, 0, 0, 0, 0, 0, 0, 12, 0, 192, 0, 0, }; /* Soft_Dotted: 342 bytes. */ RE_UINT32 re_get_soft_dotted(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 15; code = ch ^ (f << 15); pos = (RE_UINT32)re_soft_dotted_stage_1[f] << 4; f = code >> 11; code ^= f << 11; pos = (RE_UINT32)re_soft_dotted_stage_2[pos + f] << 3; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_soft_dotted_stage_3[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_soft_dotted_stage_4[pos + f] << 5; pos += code; value = (re_soft_dotted_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Logical_Order_Exception. 
*/ static RE_UINT8 re_logical_order_exception_stage_1[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_logical_order_exception_stage_2[] = { 0, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, }; static RE_UINT8 re_logical_order_exception_stage_3[] = { 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, }; static RE_UINT8 re_logical_order_exception_stage_4[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0, 0, 0, }; static RE_UINT8 re_logical_order_exception_stage_5[] = { 0, 0, 0, 0, 0, 0, 0, 0, 31, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 224, 4, 0, 0, 0, 0, 0, 0, 96, 26, }; /* Logical_Order_Exception: 145 bytes. */ RE_UINT32 re_get_logical_order_exception(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_logical_order_exception_stage_1[f] << 4; f = code >> 12; code ^= f << 12; pos = (RE_UINT32)re_logical_order_exception_stage_2[pos + f] << 3; f = code >> 9; code ^= f << 9; pos = (RE_UINT32)re_logical_order_exception_stage_3[pos + f] << 3; f = code >> 6; code ^= f << 6; pos = (RE_UINT32)re_logical_order_exception_stage_4[pos + f] << 6; pos += code; value = (re_logical_order_exception_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Other_ID_Start. 
*/ static RE_UINT8 re_other_id_start_stage_1[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_other_id_start_stage_2[] = { 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_other_id_start_stage_3[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_other_id_start_stage_4[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, }; static RE_UINT8 re_other_id_start_stage_5[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 64, 0, 0, 0, 0, 0, 24, 0, 0, 0, 0, }; /* Other_ID_Start: 113 bytes. */ RE_UINT32 re_get_other_id_start(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_other_id_start_stage_1[f] << 3; f = code >> 13; code ^= f << 13; pos = (RE_UINT32)re_other_id_start_stage_2[pos + f] << 4; f = code >> 9; code ^= f << 9; pos = (RE_UINT32)re_other_id_start_stage_3[pos + f] << 3; f = code >> 6; code ^= f << 6; pos = (RE_UINT32)re_other_id_start_stage_4[pos + f] << 6; pos += code; value = (re_other_id_start_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Other_ID_Continue. */ static RE_UINT8 re_other_id_continue_stage_1[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_other_id_continue_stage_2[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_other_id_continue_stage_3[] = { 0, 1, 2, 2, 2, 2, 2, 2, 2, 3, 2, 2, 4, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, }; static RE_UINT8 re_other_id_continue_stage_4[] = { 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, }; static RE_UINT8 re_other_id_continue_stage_5[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 128, 0, 128, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 254, 3, 0, 0, 0, 0, 4, 0, 0, 0, 0, }; /* Other_ID_Continue: 145 bytes. 
*/ RE_UINT32 re_get_other_id_continue(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_other_id_continue_stage_1[f] << 3; f = code >> 13; code ^= f << 13; pos = (RE_UINT32)re_other_id_continue_stage_2[pos + f] << 4; f = code >> 9; code ^= f << 9; pos = (RE_UINT32)re_other_id_continue_stage_3[pos + f] << 3; f = code >> 6; code ^= f << 6; pos = (RE_UINT32)re_other_id_continue_stage_4[pos + f] << 6; pos += code; value = (re_other_id_continue_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* STerm. */ static RE_UINT8 re_sterm_stage_1[] = { 0, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, }; static RE_UINT8 re_sterm_stage_2[] = { 0, 1, 2, 3, 4, 5, 6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 8, 9, 7, 7, 7, 7, 7, 7, 7, 7, 7, 10, 7, 11, 12, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 13, 7, 7, 7, 7, 7, 7, 7, 7, 7, 14, 7, 7, 7, 15, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, }; static RE_UINT8 re_sterm_stage_3[] = { 0, 1, 1, 1, 1, 2, 3, 4, 1, 5, 1, 1, 1, 1, 1, 1, 6, 1, 1, 7, 1, 1, 8, 9, 10, 11, 12, 13, 14, 1, 1, 1, 15, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 16, 1, 17, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 18, 1, 19, 1, 20, 21, 22, 23, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 24, 25, 1, 1, 26, 1, 1, 1, 1, 1, 27, 28, 29, 1, 1, 30, 31, 32, 1, 1, 33, 34, 1, 1, 1, 1, 1, 1, 1, 1, 35, 1, 1, 1, 1, 1, 36, 1, 1, 1, 1, 1, }; static RE_UINT8 re_sterm_stage_4[] = { 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0, 0, 0, 4, 0, 5, 0, 0, 0, 0, 0, 0, 6, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 8, 0, 0, 0, 0, 0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 0, 0, 10, 0, 0, 0, 0, 0, 11, 0, 0, 0, 0, 0, 0, 12, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 13, 0, 0, 0, 0, 14, 0, 0, 0, 0, 0, 0, 15, 0, 16, 0, 0, 0, 0, 0, 17, 18, 0, 0, 0, 0, 0, 0, 19, 0, 0, 0, 0, 0, 0, 20, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 21, 0, 0, 0, 
0, 0, 0, 22, 0, 0, 0, 23, 0, 0, 21, 0, 0, 24, 0, 0, 0, 0, 25, 0, 0, 0, 26, 0, 0, 0, 0, 27, 0, 0, 0, 0, 0, 0, 0, 28, 0, 0, 29, 0, 0, 0, 0, 0, 1, 0, 0, 30, 0, 0, 0, 0, 0, 0, 23, 0, 0, 0, 0, 0, 0, 0, 31, 0, 0, 16, 32, 0, 0, 0, 33, 0, 0, 0, 34, 0, 0, 35, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 36, 0, 0, 0, 37, 0, 0, 0, 0, 0, 0, 38, 0, 0, 0, 0, 0, 0, 0, 0, 0, 21, 0, 0, 0, 39, 0, 40, 41, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 42, 0, 0, 0, }; static RE_UINT8 re_sterm_stage_5[] = { 0, 0, 0, 0, 2, 64, 0, 128, 0, 2, 0, 0, 0, 0, 0, 128, 0, 0, 16, 0, 7, 0, 0, 0, 0, 0, 0, 2, 48, 0, 0, 0, 0, 12, 0, 0, 132, 1, 0, 0, 0, 64, 0, 0, 0, 0, 96, 0, 8, 2, 0, 0, 0, 15, 0, 0, 0, 0, 0, 204, 0, 0, 0, 24, 0, 0, 0, 192, 0, 0, 0, 48, 128, 3, 0, 0, 0, 64, 0, 16, 4, 0, 0, 0, 0, 192, 0, 0, 0, 0, 136, 0, 0, 0, 192, 0, 0, 128, 0, 0, 0, 3, 0, 0, 0, 0, 0, 224, 0, 0, 3, 0, 0, 8, 0, 0, 0, 0, 196, 0, 2, 0, 0, 0, 128, 1, 0, 0, 3, 0, 0, 0, 14, 0, 0, 0, 96, 32, 0, 192, 0, 0, 0, 27, 12, 254, 255, 0, 6, 0, 0, 0, 0, 0, 0, 112, 0, 0, 32, 0, 0, 0, 128, 1, 16, 0, 0, 0, 0, 1, 0, 0, }; /* STerm: 709 bytes. */ RE_UINT32 re_get_sterm(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_sterm_stage_1[f] << 5; f = code >> 11; code ^= f << 11; pos = (RE_UINT32)re_sterm_stage_2[pos + f] << 3; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_sterm_stage_3[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_sterm_stage_4[pos + f] << 5; pos += code; value = (re_sterm_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Variation_Selector. 
*/ static RE_UINT8 re_variation_selector_stage_1[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, }; static RE_UINT8 re_variation_selector_stage_2[] = { 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_variation_selector_stage_3[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 3, 0, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_variation_selector_stage_4[] = { 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 3, 3, 4, }; static RE_UINT8 re_variation_selector_stage_5[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 56, 0, 0, 0, 0, 0, 0, 255, 255, 0, 0, 0, 0, 0, 0, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 0, 0, }; /* Variation_Selector: 169 bytes. */ RE_UINT32 re_get_variation_selector(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_variation_selector_stage_1[f] << 4; f = code >> 12; code ^= f << 12; pos = (RE_UINT32)re_variation_selector_stage_2[pos + f] << 3; f = code >> 9; code ^= f << 9; pos = (RE_UINT32)re_variation_selector_stage_3[pos + f] << 3; f = code >> 6; code ^= f << 6; pos = (RE_UINT32)re_variation_selector_stage_4[pos + f] << 6; pos += code; value = (re_variation_selector_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Pattern_White_Space. 
*/ static RE_UINT8 re_pattern_white_space_stage_1[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_pattern_white_space_stage_2[] = { 0, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_pattern_white_space_stage_3[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_pattern_white_space_stage_4[] = { 0, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_pattern_white_space_stage_5[] = { 0, 62, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 32, 0, 0, 0, 0, 0, 0, 0, 0, 192, 0, 0, 0, 3, 0, 0, }; /* Pattern_White_Space: 129 bytes. */ RE_UINT32 re_get_pattern_white_space(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_pattern_white_space_stage_1[f] << 4; f = code >> 12; code ^= f << 12; pos = (RE_UINT32)re_pattern_white_space_stage_2[pos + f] << 3; f = code >> 9; code ^= f << 9; pos = (RE_UINT32)re_pattern_white_space_stage_3[pos + f] << 3; f = code >> 6; code ^= f << 6; pos = (RE_UINT32)re_pattern_white_space_stage_4[pos + f] << 6; pos += code; value = (re_pattern_white_space_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Pattern_Syntax. 
*/ static RE_UINT8 re_pattern_syntax_stage_1[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_pattern_syntax_stage_2[] = { 0, 1, 1, 1, 2, 3, 4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 5, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_pattern_syntax_stage_3[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 3, 4, 4, 5, 4, 4, 6, 4, 4, 4, 4, 1, 1, 7, 1, 8, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 9, 10, 1, }; static RE_UINT8 re_pattern_syntax_stage_4[] = { 0, 1, 2, 2, 0, 3, 4, 4, 0, 0, 0, 0, 0, 0, 0, 0, 5, 6, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 0, 0, 0, 0, 0, 8, 8, 8, 9, 10, 8, 8, 8, 8, 8, 8, 8, 0, 0, 0, 0, 11, 12, 0, 0, 0, 0, 0, 0, 0, 13, 0, 0, 0, 0, 0, 0, 0, 0, 14, 0, 0, 0, 0, 0, }; static RE_UINT8 re_pattern_syntax_stage_5[] = { 0, 0, 0, 0, 254, 255, 0, 252, 1, 0, 0, 120, 254, 90, 67, 136, 0, 0, 128, 0, 0, 0, 255, 255, 255, 0, 255, 127, 254, 255, 239, 127, 255, 255, 255, 255, 255, 255, 63, 0, 0, 0, 240, 255, 14, 255, 255, 255, 1, 0, 1, 0, 0, 0, 0, 192, 96, 0, 0, 0, }; /* Pattern_Syntax: 277 bytes. */ RE_UINT32 re_get_pattern_syntax(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_pattern_syntax_stage_1[f] << 5; f = code >> 11; code ^= f << 11; pos = (RE_UINT32)re_pattern_syntax_stage_2[pos + f] << 3; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_pattern_syntax_stage_3[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_pattern_syntax_stage_4[pos + f] << 5; pos += code; value = (re_pattern_syntax_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Hangul_Syllable_Type. 
*/ static RE_UINT8 re_hangul_syllable_type_stage_1[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_hangul_syllable_type_stage_2[] = { 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 3, 4, 5, 6, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_hangul_syllable_type_stage_3[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 4, 5, 6, 7, 8, 9, 10, 4, 5, 6, 7, 8, 9, 10, 4, 5, 6, 7, 8, 9, 10, 4, 5, 6, 7, 8, 9, 10, 4, 5, 6, 7, 8, 9, 10, 4, 5, 6, 7, 8, 9, 10, 4, 5, 6, 7, 8, 9, 10, 4, 5, 6, 7, 8, 9, 10, 4, 5, 6, 7, 8, 9, 10, 4, 5, 6, 7, 8, 9, 10, 4, 5, 6, 7, 8, 9, 10, 4, 5, 6, 7, 8, 9, 10, 4, 5, 6, 11, }; static RE_UINT8 re_hangul_syllable_type_stage_4[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 4, 5, 6, 6, 7, 6, 6, 6, 5, 6, 6, 7, 6, 6, 6, 5, 6, 6, 7, 6, 6, 6, 5, 6, 6, 7, 6, 6, 6, 5, 6, 6, 7, 6, 6, 6, 5, 6, 6, 7, 6, 6, 6, 5, 6, 6, 7, 6, 6, 6, 5, 6, 6, 7, 6, 6, 6, 5, 6, 6, 7, 6, 6, 6, 5, 6, 6, 7, 6, 6, 6, 5, 6, 6, 7, 6, 6, 6, 5, 6, 6, 7, 6, 6, 6, 5, 6, 6, 7, 6, 6, 6, 5, 6, 6, 7, 6, 6, 6, 5, 6, 6, 7, 6, 6, 6, 5, 6, 6, 7, 6, 6, 6, 6, 5, 6, 6, 8, 0, 2, 2, 9, 10, 3, 3, 3, 3, 3, 11, }; static RE_UINT8 re_hangul_syllable_type_stage_5[] = { 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 1, 1, 1, 1, 1, 0, 0, 0, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 5, 5, 5, 5, 5, 5, 5, 0, 0, 0, 0, 2, 2, 2, 2, 2, 2, 2, 0, 0, 0, 0, 3, 3, 3, 3, 3, 3, 3, 3, 3, 0, 0, 0, 0, }; /* Hangul_Syllable_Type: 497 bytes. 
*/ RE_UINT32 re_get_hangul_syllable_type(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_hangul_syllable_type_stage_1[f] << 5; f = code >> 11; code ^= f << 11; pos = (RE_UINT32)re_hangul_syllable_type_stage_2[pos + f] << 4; f = code >> 7; code ^= f << 7; pos = (RE_UINT32)re_hangul_syllable_type_stage_3[pos + f] << 4; f = code >> 3; code ^= f << 3; pos = (RE_UINT32)re_hangul_syllable_type_stage_4[pos + f] << 3; value = re_hangul_syllable_type_stage_5[pos + code]; return value; } /* Bidi_Class. */ static RE_UINT8 re_bidi_class_stage_1[] = { 0, 1, 2, 3, 4, 5, 5, 5, 5, 5, 6, 5, 5, 5, 5, 7, 8, 9, 5, 5, 5, 5, 10, 5, 5, 5, 5, 11, 5, 12, 13, 14, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 15, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 15, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 15, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 15, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 15, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 15, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 15, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 15, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 15, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 15, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 15, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 15, 16, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 15, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 15, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 15, }; static RE_UINT8 re_bidi_class_stage_2[] = { 0, 1, 2, 2, 2, 3, 4, 5, 2, 6, 2, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 2, 2, 2, 2, 30, 31, 32, 2, 2, 2, 2, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 2, 46, 2, 2, 2, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 53, 53, 53, 58, 53, 53, 2, 2, 53, 53, 53, 53, 59, 60, 2, 61, 62, 63, 64, 65, 53, 66, 67, 68, 2, 69, 70, 71, 72, 73, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 74, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 75, 2, 2, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 2, 86, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 87, 88, 88, 88, 89, 90, 91, 92, 93, 94, 2, 2, 95, 96, 2, 97, 98, 2, 2, 2, 2, 2, 2, 2, 2, 2, 99, 99, 100, 99, 101, 102, 103, 99, 99, 99, 99, 99, 104, 99, 99, 99, 105, 106, 107, 108, 109, 110, 111, 2, 2, 112, 2, 113, 114, 115, 116, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 117, 118, 2, 2, 2, 2, 2, 2, 2, 2, 119, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 120, 2, 2, 2, 2, 2, 2, 2, 2, 121, 122, 123, 2, 124, 2, 2, 2, 2, 2, 2, 125, 126, 127, 2, 2, 2, 2, 128, 129, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 99, 130, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 88, 131, 99, 99, 132, 133, 134, 2, 2, 2, 53, 53, 53, 53, 135, 136, 53, 137, 138, 139, 140, 141, 142, 143, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 144, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 144, 145, 145, 146, 147, 145, 145, 145, 145, 145, 145, 145, 145, 145, 145, 145, 145, 145, 145, 145, 145, 145, 145, 145, 145, 145, 145, 145, 145, 145, 145, 145, 145, }; static RE_UINT8 re_bidi_class_stage_3[] = { 0, 1, 2, 3, 4, 5, 4, 6, 7, 8, 9, 10, 11, 12, 11, 12, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 13, 14, 14, 15, 16, 17, 17, 17, 17, 17, 17, 17, 18, 19, 11, 11, 11, 11, 11, 11, 20, 21, 11, 11, 11, 11, 11, 11, 11, 22, 23, 17, 24, 25, 26, 26, 26, 27, 28, 29, 29, 30, 17, 31, 32, 29, 29, 29, 29, 29, 33, 34, 35, 29, 36, 29, 17, 28, 29, 29, 29, 29, 29, 37, 32, 26, 26, 38, 39, 26, 40, 41, 26, 26, 42, 26, 26, 26, 26, 29, 29, 29, 29, 43, 17, 44, 11, 11, 45, 46, 47, 48, 11, 49, 11, 11, 50, 51, 11, 48, 52, 53, 11, 11, 50, 
54, 49, 11, 55, 53, 11, 11, 50, 56, 11, 48, 57, 49, 11, 11, 58, 51, 59, 48, 11, 60, 11, 11, 11, 61, 11, 11, 62, 63, 11, 11, 64, 65, 66, 48, 67, 49, 11, 11, 50, 68, 11, 48, 11, 49, 11, 11, 11, 51, 11, 48, 11, 11, 11, 11, 11, 69, 70, 11, 11, 11, 11, 11, 71, 72, 11, 11, 11, 11, 11, 11, 73, 74, 11, 11, 11, 11, 75, 11, 76, 11, 11, 11, 77, 78, 79, 17, 80, 59, 11, 11, 11, 11, 11, 81, 82, 11, 83, 63, 84, 85, 86, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 81, 11, 11, 11, 87, 11, 11, 11, 11, 11, 11, 4, 11, 11, 11, 11, 11, 11, 11, 88, 89, 11, 11, 11, 11, 11, 11, 11, 90, 11, 90, 11, 48, 11, 48, 11, 11, 11, 91, 92, 93, 11, 87, 94, 11, 11, 11, 11, 11, 11, 11, 11, 11, 95, 11, 11, 11, 11, 11, 11, 11, 96, 97, 98, 11, 11, 11, 11, 11, 11, 11, 11, 99, 16, 16, 11, 100, 11, 11, 11, 101, 102, 103, 11, 11, 11, 104, 11, 11, 11, 11, 105, 11, 11, 106, 60, 11, 107, 105, 108, 11, 109, 11, 11, 11, 110, 108, 11, 11, 111, 112, 11, 11, 11, 11, 11, 11, 11, 11, 11, 113, 114, 115, 11, 11, 11, 11, 17, 17, 17, 116, 11, 11, 11, 117, 118, 119, 119, 120, 121, 16, 122, 123, 124, 125, 126, 127, 128, 11, 129, 129, 129, 17, 17, 63, 130, 131, 132, 133, 134, 16, 11, 11, 135, 16, 16, 16, 16, 16, 16, 16, 16, 136, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 137, 11, 11, 11, 5, 16, 138, 16, 16, 16, 16, 16, 139, 16, 16, 140, 11, 139, 11, 16, 16, 141, 142, 11, 11, 11, 11, 143, 16, 16, 16, 144, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 145, 16, 146, 16, 147, 148, 149, 150, 11, 11, 11, 11, 11, 11, 11, 151, 152, 11, 11, 11, 11, 11, 11, 11, 153, 11, 11, 11, 11, 11, 11, 17, 17, 16, 16, 16, 16, 154, 11, 11, 11, 16, 155, 16, 16, 16, 16, 16, 156, 16, 16, 16, 16, 16, 137, 11, 157, 158, 16, 159, 160, 11, 11, 11, 11, 11, 161, 4, 11, 11, 11, 11, 162, 11, 11, 11, 11, 16, 16, 156, 11, 11, 120, 11, 11, 11, 16, 11, 163, 11, 11, 11, 164, 150, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 165, 11, 11, 11, 11, 11, 99, 11, 166, 11, 11, 11, 11, 16, 16, 16, 16, 11, 16, 16, 16, 140, 11, 11, 11, 119, 11, 11, 11, 11, 
11, 153, 167, 11, 64, 11, 11, 11, 11, 11, 108, 16, 16, 149, 11, 11, 11, 11, 11, 168, 11, 11, 11, 11, 11, 11, 11, 169, 11, 170, 171, 11, 11, 11, 172, 11, 11, 11, 11, 173, 11, 17, 108, 11, 11, 174, 11, 175, 108, 11, 11, 44, 11, 11, 176, 11, 11, 177, 11, 11, 11, 178, 179, 180, 11, 11, 50, 11, 11, 11, 181, 49, 11, 68, 59, 11, 11, 11, 11, 11, 11, 182, 11, 11, 183, 184, 26, 26, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 185, 29, 29, 29, 29, 29, 29, 29, 29, 29, 8, 8, 186, 17, 87, 17, 16, 16, 187, 188, 29, 29, 29, 29, 29, 29, 29, 29, 189, 190, 3, 4, 5, 4, 5, 137, 11, 11, 11, 11, 11, 11, 11, 191, 192, 193, 11, 11, 11, 16, 16, 16, 16, 194, 157, 4, 11, 11, 11, 11, 86, 11, 11, 11, 11, 11, 11, 195, 142, 11, 11, 11, 11, 11, 11, 11, 196, 26, 26, 26, 26, 26, 26, 26, 26, 26, 197, 26, 26, 26, 26, 26, 26, 198, 26, 26, 199, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 200, 26, 26, 26, 26, 201, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 202, 203, 49, 11, 11, 204, 205, 14, 137, 153, 108, 11, 11, 206, 11, 11, 11, 11, 44, 11, 207, 208, 11, 11, 11, 209, 108, 11, 11, 210, 211, 11, 11, 11, 11, 11, 153, 212, 11, 11, 11, 11, 11, 11, 11, 11, 11, 153, 213, 11, 108, 11, 11, 50, 63, 11, 214, 208, 11, 11, 11, 215, 216, 11, 11, 11, 11, 11, 11, 217, 63, 68, 11, 11, 11, 11, 11, 218, 63, 11, 11, 11, 11, 11, 219, 220, 11, 11, 11, 11, 11, 81, 221, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 208, 11, 11, 11, 205, 11, 11, 11, 11, 153, 44, 11, 11, 11, 11, 11, 11, 11, 222, 223, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 224, 225, 226, 11, 227, 11, 11, 11, 11, 11, 16, 16, 16, 16, 228, 11, 11, 11, 16, 16, 16, 16, 16, 140, 11, 11, 11, 11, 11, 11, 11, 162, 11, 11, 11, 229, 11, 11, 166, 11, 11, 11, 230, 11, 11, 11, 231, 232, 232, 232, 17, 17, 17, 233, 17, 17, 80, 177, 173, 107, 234, 11, 11, 11, 11, 11, 26, 26, 26, 26, 26, 235, 26, 26, 29, 29, 29, 29, 29, 29, 29, 236, 16, 16, 157, 16, 16, 16, 16, 16, 16, 156, 237, 164, 164, 164, 16, 137, 238, 11, 11, 11, 11, 11, 133, 11, 16, 16, 16, 16, 16, 16, 16, 155, 
16, 16, 239, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 4, 194, 156, 16, 16, 16, 16, 16, 16, 16, 156, 16, 16, 16, 16, 16, 240, 11, 11, 157, 16, 16, 16, 241, 87, 16, 16, 241, 16, 242, 11, 11, 11, 11, 11, 11, 243, 11, 11, 11, 11, 11, 11, 240, 11, 11, 11, 4, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 244, 8, 8, 8, 8, 8, 8, 8, 8, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 8, }; static RE_UINT8 re_bidi_class_stage_4[] = { 0, 0, 1, 2, 0, 0, 0, 3, 4, 5, 6, 7, 8, 8, 9, 10, 11, 12, 12, 12, 12, 12, 13, 10, 12, 12, 13, 14, 0, 15, 0, 0, 0, 0, 0, 0, 16, 5, 17, 18, 19, 20, 21, 10, 12, 12, 12, 12, 12, 13, 12, 12, 12, 12, 22, 12, 23, 10, 10, 10, 12, 24, 10, 17, 10, 10, 10, 10, 25, 25, 25, 25, 12, 26, 12, 27, 12, 17, 12, 12, 12, 27, 12, 12, 28, 25, 29, 12, 12, 12, 27, 30, 31, 25, 25, 25, 25, 25, 25, 32, 33, 32, 34, 34, 34, 34, 34, 34, 35, 36, 37, 38, 25, 25, 39, 40, 40, 40, 40, 40, 40, 40, 41, 25, 35, 35, 42, 43, 44, 40, 40, 40, 40, 45, 25, 46, 25, 47, 48, 49, 8, 8, 50, 40, 51, 40, 40, 40, 40, 45, 25, 25, 34, 34, 52, 25, 25, 53, 54, 34, 34, 55, 32, 25, 25, 31, 31, 56, 34, 34, 31, 34, 41, 25, 25, 25, 57, 12, 12, 12, 12, 12, 58, 59, 60, 25, 59, 61, 60, 25, 12, 12, 62, 12, 12, 12, 61, 12, 12, 12, 12, 12, 12, 59, 60, 59, 12, 61, 63, 12, 64, 12, 65, 12, 12, 12, 65, 28, 66, 29, 29, 61, 12, 12, 60, 67, 59, 61, 68, 12, 12, 12, 12, 12, 12, 66, 12, 58, 12, 12, 58, 12, 12, 12, 59, 12, 12, 61, 13, 10, 69, 12, 59, 12, 12, 12, 12, 12, 12, 62, 59, 62, 70, 29, 12, 65, 12, 12, 12, 12, 10, 71, 12, 12, 12, 29, 12, 12, 58, 12, 62, 72, 12, 12, 61, 25, 57, 64, 12, 28, 25, 57, 61, 25, 67, 59, 12, 12, 25, 29, 12, 12, 29, 12, 12, 73, 74, 26, 60, 25, 25, 57, 25, 70, 12, 60, 25, 25, 60, 25, 25, 25, 25, 59, 12, 12, 12, 60, 70, 25, 65, 65, 12, 12, 29, 62, 60, 59, 12, 12, 58, 65, 12, 61, 12, 12, 12, 61, 10, 10, 26, 12, 75, 12, 12, 12, 12, 12, 13, 11, 62, 59, 12, 12, 12, 67, 25, 29, 12, 58, 60, 25, 25, 12, 64, 61, 10, 10, 76, 77, 12, 12, 61, 12, 57, 28, 59, 12, 58, 12, 60, 12, 11, 26, 12, 12, 12, 12, 
12, 23, 12, 28, 66, 12, 12, 58, 25, 57, 72, 60, 25, 59, 28, 25, 25, 66, 25, 25, 25, 57, 25, 12, 12, 12, 12, 70, 57, 59, 12, 12, 28, 25, 29, 12, 12, 12, 62, 29, 67, 29, 12, 58, 29, 73, 12, 12, 12, 25, 25, 62, 12, 12, 57, 25, 25, 25, 70, 25, 59, 61, 12, 59, 29, 12, 25, 29, 12, 25, 12, 12, 12, 78, 26, 12, 12, 24, 12, 12, 12, 24, 12, 12, 12, 22, 79, 79, 80, 81, 10, 10, 82, 83, 84, 85, 10, 10, 10, 86, 10, 10, 10, 10, 10, 87, 0, 88, 89, 0, 90, 8, 91, 71, 8, 8, 91, 71, 84, 84, 84, 84, 17, 71, 26, 12, 12, 20, 11, 23, 10, 78, 92, 93, 12, 12, 23, 12, 10, 11, 23, 26, 12, 12, 24, 12, 94, 10, 10, 10, 10, 26, 12, 12, 10, 20, 10, 10, 10, 10, 71, 12, 10, 71, 12, 12, 10, 10, 8, 8, 8, 8, 8, 12, 12, 12, 23, 10, 10, 10, 10, 24, 10, 23, 10, 10, 10, 26, 10, 10, 10, 10, 26, 24, 10, 10, 20, 10, 26, 12, 12, 12, 12, 12, 12, 10, 12, 24, 71, 28, 29, 12, 24, 10, 12, 12, 12, 28, 71, 12, 12, 12, 10, 10, 17, 10, 10, 12, 12, 12, 10, 10, 10, 12, 95, 11, 10, 10, 11, 12, 62, 29, 11, 23, 12, 24, 12, 12, 96, 11, 12, 12, 13, 12, 12, 12, 12, 71, 24, 10, 10, 10, 12, 13, 71, 12, 12, 12, 12, 13, 97, 25, 25, 98, 12, 12, 11, 12, 58, 58, 28, 12, 12, 65, 10, 12, 12, 12, 99, 12, 12, 10, 12, 12, 12, 59, 12, 12, 12, 62, 25, 29, 12, 28, 25, 25, 28, 62, 29, 59, 12, 61, 12, 12, 12, 12, 60, 57, 65, 65, 12, 12, 28, 12, 12, 59, 70, 66, 59, 62, 12, 61, 59, 61, 12, 12, 12, 100, 34, 34, 101, 34, 40, 40, 40, 102, 40, 40, 40, 103, 104, 105, 10, 106, 107, 71, 108, 12, 40, 40, 40, 109, 30, 5, 6, 7, 5, 110, 10, 71, 0, 0, 111, 112, 92, 12, 12, 12, 10, 10, 10, 11, 113, 8, 8, 8, 12, 62, 57, 12, 34, 34, 34, 114, 31, 33, 34, 25, 34, 34, 115, 52, 34, 33, 34, 34, 34, 34, 116, 10, 35, 35, 35, 35, 35, 35, 35, 117, 12, 12, 25, 25, 25, 57, 12, 12, 28, 57, 65, 12, 12, 28, 25, 60, 25, 59, 12, 12, 28, 12, 12, 12, 12, 62, 25, 57, 12, 12, 62, 59, 29, 70, 12, 12, 28, 25, 57, 12, 12, 62, 25, 59, 28, 25, 72, 28, 70, 12, 12, 12, 62, 29, 12, 67, 28, 25, 57, 73, 12, 12, 28, 61, 25, 67, 12, 12, 62, 67, 25, 12, 12, 12, 12, 65, 0, 12, 12, 12, 12, 28, 
29, 12, 118, 0, 119, 25, 57, 60, 25, 12, 12, 12, 62, 29, 120, 121, 12, 12, 12, 92, 12, 12, 12, 12, 92, 12, 13, 12, 12, 122, 8, 8, 8, 8, 25, 57, 28, 25, 60, 25, 25, 25, 25, 115, 34, 34, 123, 40, 40, 40, 10, 10, 10, 71, 8, 8, 124, 11, 10, 24, 10, 10, 10, 11, 12, 12, 10, 10, 12, 12, 10, 10, 10, 26, 10, 10, 11, 12, 12, 12, 12, 125, }; static RE_UINT8 re_bidi_class_stage_5[] = { 11, 11, 11, 11, 11, 8, 7, 8, 9, 7, 11, 11, 7, 7, 7, 8, 9, 10, 10, 4, 4, 4, 10, 10, 10, 10, 10, 3, 6, 3, 6, 6, 2, 2, 2, 2, 2, 2, 6, 10, 10, 10, 10, 10, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 10, 10, 10, 10, 11, 11, 7, 11, 11, 6, 10, 4, 4, 10, 10, 0, 10, 10, 11, 10, 10, 4, 4, 2, 2, 10, 0, 10, 10, 10, 2, 0, 10, 0, 10, 10, 0, 0, 0, 10, 10, 0, 10, 10, 10, 12, 12, 12, 12, 10, 10, 0, 0, 0, 0, 10, 0, 0, 0, 0, 12, 12, 12, 0, 0, 0, 10, 10, 4, 1, 12, 12, 12, 12, 12, 1, 12, 1, 12, 12, 1, 1, 1, 1, 1, 5, 5, 5, 5, 5, 5, 10, 10, 13, 4, 4, 13, 6, 13, 10, 10, 12, 12, 12, 13, 13, 13, 13, 13, 13, 13, 13, 12, 5, 5, 4, 5, 5, 13, 13, 13, 12, 13, 13, 13, 13, 13, 12, 12, 12, 5, 10, 12, 12, 13, 13, 12, 12, 10, 12, 12, 12, 12, 13, 13, 2, 2, 13, 13, 13, 12, 13, 13, 1, 1, 1, 12, 1, 1, 10, 10, 10, 10, 1, 1, 1, 1, 12, 12, 12, 12, 1, 1, 12, 12, 12, 0, 0, 0, 12, 0, 12, 0, 0, 0, 0, 12, 12, 12, 0, 12, 0, 0, 0, 0, 12, 12, 0, 0, 4, 4, 0, 0, 0, 4, 0, 12, 12, 0, 12, 0, 0, 12, 12, 12, 0, 12, 0, 4, 0, 0, 10, 4, 10, 0, 12, 0, 12, 12, 10, 10, 10, 0, 12, 0, 12, 0, 0, 12, 0, 12, 0, 12, 10, 10, 9, 0, 0, 0, 10, 10, 10, 12, 12, 12, 11, 0, 0, 10, 0, 10, 9, 9, 9, 9, 9, 9, 9, 11, 11, 11, 0, 1, 9, 7, 16, 17, 18, 14, 15, 6, 4, 4, 4, 4, 4, 10, 10, 10, 6, 10, 10, 10, 10, 10, 10, 9, 11, 11, 19, 20, 21, 22, 11, 11, 2, 0, 0, 0, 2, 2, 3, 3, 0, 10, 0, 0, 0, 0, 4, 0, 10, 10, 3, 4, 9, 10, 10, 10, 0, 12, 12, 10, 12, 12, 12, 10, 12, 12, 10, 10, 4, 4, 0, 0, 0, 1, 12, 1, 1, 3, 1, 1, 13, 13, 10, 10, 13, 10, 13, 13, 6, 10, 6, 0, 10, 6, 10, 10, 10, 10, 10, 4, 10, 10, 3, 3, 10, 4, 4, 10, 13, 13, 13, 11, 10, 4, 4, 0, 11, 10, 10, 10, 10, 10, 11, 11, 12, 2, 2, 2, 1, 1, 1, 
10, 12, 12, 12, 1, 1, 10, 10, 10, 5, 5, 5, 1, 0, 0, 0, 11, 11, 11, 11, 12, 10, 10, 12, 12, 12, 10, 0, 0, 0, 0, 2, 2, 10, 10, 13, 13, 2, 2, 2, 10, 0, 0, 11, 11, }; /* Bidi_Class: 3484 bytes. */ RE_UINT32 re_get_bidi_class(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 12; code = ch ^ (f << 12); pos = (RE_UINT32)re_bidi_class_stage_1[f] << 5; f = code >> 7; code ^= f << 7; pos = (RE_UINT32)re_bidi_class_stage_2[pos + f] << 3; f = code >> 4; code ^= f << 4; pos = (RE_UINT32)re_bidi_class_stage_3[pos + f] << 2; f = code >> 2; code ^= f << 2; pos = (RE_UINT32)re_bidi_class_stage_4[pos + f] << 2; value = re_bidi_class_stage_5[pos + code]; return value; } /* Canonical_Combining_Class. */ static RE_UINT8 re_canonical_combining_class_stage_1[] = { 0, 1, 2, 2, 2, 3, 2, 4, 5, 2, 2, 6, 2, 7, 8, 9, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, }; static RE_UINT8 re_canonical_combining_class_stage_2[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 10, 11, 12, 13, 0, 14, 0, 0, 0, 0, 0, 15, 0, 16, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 17, 18, 19, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 20, 0, 21, 22, 23, 0, 0, 0, 24, 0, 0, 25, 26, 27, 28, 0, 0, 0, 0, 0, 0, 0, 0, 0, 29, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 30, 0, 0, 0, 0, 0, 0, 0, 0, 0, 31, 32, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 33, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_canonical_combining_class_stage_3[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 0, 0, 0, 0, 0, 0, 0, 0, 5, 0, 0, 0, 0, 0, 0, 0, 6, 7, 8, 0, 9, 0, 10, 11, 0, 0, 12, 13, 
14, 15, 16, 0, 0, 0, 0, 17, 18, 19, 20, 0, 0, 0, 0, 21, 0, 22, 23, 0, 0, 22, 24, 0, 0, 22, 24, 0, 0, 22, 24, 0, 0, 22, 24, 0, 0, 0, 24, 0, 0, 0, 25, 0, 0, 22, 24, 0, 0, 0, 24, 0, 0, 0, 26, 0, 0, 27, 28, 0, 0, 29, 30, 0, 31, 32, 0, 33, 34, 0, 35, 0, 0, 36, 0, 0, 37, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 38, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 39, 39, 0, 0, 0, 0, 40, 0, 0, 0, 0, 0, 0, 41, 0, 0, 0, 42, 0, 0, 0, 0, 0, 0, 43, 0, 0, 44, 0, 45, 0, 0, 0, 46, 47, 48, 0, 49, 0, 50, 0, 51, 0, 0, 0, 0, 52, 53, 0, 0, 0, 0, 0, 0, 54, 55, 0, 0, 0, 0, 0, 0, 56, 57, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 58, 0, 0, 0, 59, 0, 0, 0, 60, 0, 61, 0, 0, 62, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 63, 64, 0, 0, 65, 0, 0, 0, 0, 0, 0, 0, 0, 66, 0, 0, 0, 0, 0, 47, 67, 0, 68, 69, 0, 0, 70, 71, 0, 0, 0, 0, 0, 0, 72, 73, 74, 0, 0, 0, 0, 0, 0, 0, 24, 0, 0, 0, 0, 0, 0, 0, 0, 75, 0, 0, 0, 0, 0, 0, 0, 0, 76, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 77, 0, 0, 0, 0, 0, 0, 0, 78, 0, 0, 0, 79, 0, 0, 0, 0, 80, 81, 0, 0, 0, 0, 0, 82, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 66, 59, 0, 83, 0, 0, 84, 85, 0, 70, 0, 0, 86, 0, 0, 87, 0, 0, 0, 0, 0, 88, 0, 22, 24, 89, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 90, 0, 0, 0, 0, 0, 0, 59, 91, 0, 0, 59, 0, 0, 0, 92, 0, 0, 0, 93, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 94, 0, 95, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 96, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 97, 98, 99, 0, 0, 0, 0, 100, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 101, 0, 0, 0, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_canonical_combining_class_stage_4[] = { 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 3, 4, 5, 6, 7, 4, 4, 8, 9, 10, 1, 11, 12, 13, 14, 15, 16, 17, 18, 1, 1, 1, 0, 0, 0, 0, 19, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 20, 21, 22, 1, 23, 4, 21, 24, 25, 26, 27, 28, 29, 30, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 31, 0, 0, 0, 32, 33, 34, 35, 1, 36, 0, 0, 0, 0, 37, 0, 0, 0, 0, 0, 0, 0, 0, 38, 1, 39, 14, 39, 40, 41, 0, 
0, 0, 0, 0, 0, 0, 0, 42, 0, 0, 0, 0, 0, 0, 0, 43, 36, 44, 45, 21, 45, 46, 0, 0, 0, 0, 0, 0, 0, 19, 1, 21, 0, 0, 0, 0, 0, 0, 0, 0, 38, 47, 1, 1, 48, 48, 49, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 50, 0, 51, 21, 43, 52, 53, 21, 35, 1, 0, 0, 0, 0, 0, 0, 0, 54, 0, 0, 0, 55, 56, 57, 0, 0, 0, 0, 0, 55, 0, 0, 0, 0, 0, 0, 0, 55, 0, 58, 0, 0, 0, 0, 59, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 60, 0, 0, 0, 61, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 62, 0, 0, 0, 63, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 64, 0, 0, 0, 0, 0, 0, 65, 66, 0, 0, 0, 0, 0, 67, 68, 69, 70, 71, 72, 0, 0, 0, 0, 0, 0, 0, 73, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 74, 75, 0, 0, 0, 0, 76, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 48, 0, 0, 0, 0, 0, 77, 0, 0, 0, 0, 0, 0, 59, 0, 0, 78, 0, 0, 79, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 80, 0, 0, 0, 0, 0, 0, 19, 81, 0, 77, 0, 0, 0, 0, 48, 1, 82, 0, 0, 0, 0, 1, 52, 15, 41, 0, 0, 0, 0, 0, 54, 0, 0, 0, 77, 0, 0, 0, 0, 0, 0, 0, 0, 19, 10, 1, 0, 0, 0, 0, 0, 83, 0, 0, 0, 0, 0, 0, 84, 0, 0, 83, 0, 0, 0, 0, 0, 0, 0, 0, 74, 0, 0, 0, 0, 0, 0, 85, 9, 12, 4, 86, 8, 87, 76, 0, 57, 49, 0, 21, 1, 21, 88, 89, 1, 1, 1, 1, 1, 1, 1, 1, 49, 0, 90, 0, 0, 0, 0, 91, 1, 92, 57, 78, 93, 94, 4, 57, 0, 0, 0, 0, 0, 0, 19, 49, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 95, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 96, 97, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 98, 0, 0, 0, 0, 19, 0, 1, 1, 49, 0, 0, 0, 0, 0, 0, 0, 38, 0, 0, 0, 0, 49, 0, 0, 0, 0, 59, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 49, 0, 0, 0, 0, 0, 51, 64, 0, 0, 0, 0, 0, 0, 0, 0, 95, 0, 0, 0, 0, 0, 0, 0, 74, 0, 0, 0, 77, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 99, 100, 57, 38, 78, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 59, 0, 0, 0, 0, 0, 0, 0, 0, 0, 101, 1, 14, 4, 12, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 76, 81, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 38, 85, 0, 0, 0, 0, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 103, 95, 0, 104, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 105, 0, 85, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 95, 77, 0, 0, 77, 0, 84, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 105, 0, 0, 0, 0, 106, 0, 0, 0, 0, 0, 0, 38, 1, 57, 1, 57, 0, 0, 107, 0, 0, 0, 0, 0, 0, 
0, 54, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 107, 0, 0, 0, 0, 95, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 87, 0, 0, 0, 0, 0, 0, 1, 85, 0, 0, 0, 0, 0, 0, 0, 0, 0, 108, 0, 109, 110, 111, 112, 0, 51, 4, 113, 48, 23, 0, 0, 0, 0, 0, 0, 0, 38, 49, 0, 0, 0, 0, 38, 57, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 113, 0, 0, }; static RE_UINT8 re_canonical_combining_class_stage_5[] = { 0, 0, 0, 0, 50, 50, 50, 50, 50, 51, 45, 45, 45, 45, 51, 43, 45, 45, 45, 45, 45, 41, 41, 45, 45, 45, 45, 41, 41, 45, 45, 45, 1, 1, 1, 1, 1, 45, 45, 45, 45, 50, 50, 50, 50, 54, 50, 45, 45, 45, 50, 50, 50, 45, 45, 0, 50, 50, 50, 45, 45, 45, 45, 50, 51, 45, 45, 50, 52, 53, 53, 52, 53, 53, 52, 50, 0, 0, 0, 50, 0, 45, 50, 50, 50, 50, 45, 50, 50, 50, 46, 45, 50, 50, 45, 45, 50, 46, 49, 50, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 14, 15, 16, 17, 0, 18, 0, 19, 20, 0, 50, 45, 0, 13, 25, 26, 27, 0, 0, 0, 0, 22, 23, 24, 25, 26, 27, 28, 29, 50, 50, 45, 45, 50, 45, 50, 50, 45, 30, 0, 0, 0, 0, 0, 50, 50, 50, 0, 0, 50, 50, 0, 45, 50, 50, 45, 0, 0, 0, 31, 0, 0, 50, 45, 50, 50, 45, 45, 50, 45, 45, 50, 45, 50, 45, 50, 50, 0, 50, 50, 0, 50, 0, 50, 50, 50, 50, 50, 0, 0, 0, 45, 45, 45, 0, 0, 0, 45, 50, 45, 45, 45, 22, 23, 24, 50, 2, 0, 0, 0, 0, 4, 0, 0, 0, 50, 45, 50, 50, 0, 0, 0, 0, 32, 33, 0, 0, 0, 4, 0, 34, 34, 4, 0, 35, 35, 35, 35, 36, 36, 0, 0, 37, 37, 37, 37, 45, 45, 0, 0, 0, 45, 0, 45, 0, 43, 0, 0, 0, 38, 39, 0, 40, 0, 0, 0, 0, 0, 39, 39, 39, 39, 0, 0, 39, 0, 50, 50, 4, 0, 50, 50, 0, 0, 45, 0, 0, 0, 0, 2, 0, 4, 4, 0, 0, 45, 0, 0, 4, 0, 0, 0, 0, 50, 0, 0, 0, 49, 0, 0, 0, 46, 50, 45, 45, 0, 0, 0, 50, 0, 0, 45, 0, 0, 4, 4, 0, 0, 2, 0, 50, 50, 50, 0, 50, 0, 1, 1, 1, 0, 0, 0, 50, 53, 42, 45, 41, 50, 50, 50, 52, 45, 50, 45, 50, 50, 1, 1, 1, 1, 1, 50, 0, 1, 1, 50, 45, 50, 1, 1, 0, 0, 0, 4, 0, 0, 44, 49, 51, 46, 47, 47, 0, 3, 3, 0, 50, 0, 50, 50, 45, 0, 0, 50, 0, 0, 21, 0, 0, 45, 0, 50, 50, 1, 45, 0, 0, 50, 45, 0, 0, 4, 2, 0, 0, 2, 4, 0, 0, 0, 4, 2, 0, 0, 1, 0, 0, 43, 43, 1, 1, 1, 0, 0, 0, 48, 43, 43, 43, 43, 43, 0, 45, 45, 45, 0, }; /* 
Canonical_Combining_Class: 2112 bytes. */ RE_UINT32 re_get_canonical_combining_class(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 13; code = ch ^ (f << 13); pos = (RE_UINT32)re_canonical_combining_class_stage_1[f] << 4; f = code >> 9; code ^= f << 9; pos = (RE_UINT32)re_canonical_combining_class_stage_2[pos + f] << 4; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_canonical_combining_class_stage_3[pos + f] << 3; f = code >> 2; code ^= f << 2; pos = (RE_UINT32)re_canonical_combining_class_stage_4[pos + f] << 2; value = re_canonical_combining_class_stage_5[pos + code]; return value; } /* Decomposition_Type. */ static RE_UINT8 re_decomposition_type_stage_1[] = { 0, 1, 2, 2, 2, 3, 4, 5, 6, 2, 2, 2, 2, 2, 7, 8, 2, 2, 2, 2, 2, 2, 2, 9, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, }; static RE_UINT8 re_decomposition_type_stage_2[] = { 0, 1, 2, 3, 4, 5, 6, 7, 7, 8, 9, 10, 11, 12, 13, 14, 15, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 16, 7, 17, 18, 19, 20, 21, 22, 23, 24, 7, 7, 7, 7, 7, 25, 7, 26, 27, 28, 29, 30, 31, 32, 33, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 34, 35, 7, 7, 7, 36, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 38, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 37, 39, 40, 41, 42, 43, 44, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 45, 46, 7, 47, 48, 49, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 50, 7, 7, 51, 
52, 53, 54, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 55, 7, 7, 56, 57, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 37, 37, 58, 7, 7, 7, 7, 7, }; static RE_UINT8 re_decomposition_type_stage_3[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 3, 5, 6, 7, 8, 9, 10, 11, 8, 12, 0, 0, 13, 14, 15, 16, 17, 18, 6, 19, 20, 21, 0, 0, 0, 0, 0, 0, 0, 22, 0, 23, 24, 0, 0, 0, 0, 0, 25, 0, 0, 26, 27, 14, 28, 14, 29, 30, 0, 31, 32, 33, 0, 33, 0, 32, 0, 34, 0, 0, 0, 0, 35, 36, 37, 38, 0, 0, 0, 0, 0, 0, 0, 0, 39, 0, 0, 0, 0, 0, 0, 0, 0, 0, 40, 0, 0, 0, 0, 41, 0, 0, 0, 0, 42, 43, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 33, 44, 0, 45, 0, 0, 0, 0, 0, 0, 46, 47, 0, 0, 0, 0, 0, 48, 0, 49, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 50, 51, 0, 0, 0, 52, 0, 0, 53, 0, 0, 0, 0, 0, 0, 0, 54, 0, 0, 0, 0, 0, 0, 0, 55, 0, 0, 0, 0, 0, 0, 0, 53, 0, 0, 0, 0, 0, 0, 0, 0, 56, 0, 0, 0, 0, 0, 57, 0, 0, 0, 0, 0, 0, 0, 57, 0, 58, 0, 0, 59, 0, 0, 0, 60, 61, 33, 62, 63, 60, 61, 33, 0, 0, 0, 0, 0, 0, 64, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 65, 66, 67, 0, 68, 69, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 70, 71, 72, 73, 74, 75, 0, 76, 73, 73, 0, 0, 0, 0, 6, 6, 6, 6, 6, 6, 6, 6, 6, 77, 6, 6, 6, 6, 6, 78, 6, 79, 6, 6, 79, 80, 6, 81, 6, 6, 6, 82, 83, 84, 6, 85, 86, 87, 88, 89, 90, 91, 0, 92, 93, 94, 95, 0, 0, 0, 0, 0, 96, 97, 98, 99, 100, 101, 102, 102, 103, 104, 105, 0, 106, 0, 0, 0, 107, 0, 108, 109, 110, 0, 111, 112, 112, 0, 113, 0, 0, 0, 114, 0, 0, 0, 115, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 116, 117, 102, 102, 102, 118, 116, 116, 119, 0, 120, 0, 0, 0, 0, 0, 0, 121, 0, 0, 0, 0, 0, 122, 0, 0, 0, 0, 0, 0, 0, 0, 0, 123, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 124, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 125, 0, 0, 0, 0, 0, 57, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 126, 0, 0, 127, 0, 0, 128, 129, 130, 131, 132, 0, 133, 129, 130, 131, 132, 0, 134, 
0, 0, 0, 135, 102, 102, 102, 102, 136, 137, 0, 0, 0, 0, 0, 0, 102, 136, 102, 102, 138, 139, 116, 140, 116, 116, 116, 116, 141, 116, 116, 140, 142, 142, 142, 142, 142, 143, 102, 144, 142, 142, 142, 142, 142, 142, 102, 145, 0, 0, 0, 0, 0, 0, 0, 0, 0, 146, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 147, 0, 0, 0, 0, 0, 0, 0, 148, 0, 0, 0, 0, 0, 149, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 21, 0, 0, 0, 0, 0, 81, 150, 151, 6, 6, 6, 81, 6, 6, 6, 6, 6, 6, 78, 0, 0, 152, 153, 154, 155, 156, 157, 158, 158, 159, 158, 160, 161, 0, 162, 163, 164, 165, 165, 165, 165, 165, 165, 166, 167, 167, 168, 169, 169, 169, 170, 171, 172, 165, 173, 174, 175, 0, 176, 177, 178, 179, 180, 167, 181, 182, 0, 0, 183, 0, 184, 0, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 194, 195, 196, 197, 198, 198, 198, 198, 198, 199, 200, 200, 200, 200, 201, 202, 203, 204, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 205, 206, 0, 0, 0, 0, 0, 0, 0, 207, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 46, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 208, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 104, 0, 0, 0, 0, 0, 0, 0, 0, 0, 207, 209, 0, 0, 0, 0, 210, 14, 0, 0, 0, 211, 211, 211, 211, 211, 212, 211, 211, 211, 213, 214, 215, 216, 211, 211, 211, 217, 218, 211, 219, 220, 221, 211, 211, 211, 211, 211, 211, 211, 211, 211, 211, 211, 211, 211, 211, 211, 211, 211, 211, 211, 211, 222, 211, 211, 211, 211, 211, 211, 211, 211, 211, 211, 211, 211, 211, 211, 211, 211, 211, 223, 211, 211, 211, 216, 211, 224, 225, 226, 227, 228, 229, 230, 231, 232, 231, 0, 0, 0, 0, 233, 102, 234, 142, 142, 0, 235, 0, 0, 236, 0, 0, 0, 0, 0, 0, 237, 142, 142, 238, 239, 240, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 81, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_decomposition_type_stage_4[] = { 0, 0, 0, 0, 1, 0, 2, 3, 4, 5, 6, 7, 8, 9, 8, 8, 10, 11, 10, 12, 10, 11, 10, 9, 8, 8, 8, 8, 13, 8, 8, 8, 8, 12, 8, 8, 14, 8, 10, 15, 16, 8, 17, 8, 12, 8, 8, 8, 8, 
8, 8, 15, 12, 0, 0, 18, 19, 0, 0, 0, 0, 20, 20, 21, 8, 8, 8, 22, 8, 13, 8, 8, 23, 12, 8, 8, 8, 8, 8, 13, 0, 13, 8, 8, 8, 0, 0, 0, 24, 24, 25, 0, 0, 0, 20, 5, 24, 25, 0, 0, 9, 19, 0, 0, 0, 19, 26, 27, 0, 21, 11, 22, 0, 0, 13, 8, 0, 0, 13, 11, 28, 29, 0, 0, 30, 5, 31, 0, 9, 18, 0, 11, 0, 0, 32, 0, 0, 13, 0, 0, 33, 0, 0, 0, 8, 13, 13, 8, 13, 8, 13, 8, 8, 12, 12, 0, 0, 3, 0, 0, 13, 11, 0, 0, 0, 34, 35, 0, 36, 0, 0, 0, 18, 0, 0, 0, 32, 19, 0, 0, 0, 0, 8, 8, 0, 0, 18, 19, 0, 0, 0, 9, 18, 27, 0, 0, 0, 0, 10, 27, 0, 0, 37, 19, 0, 0, 0, 12, 0, 19, 0, 0, 0, 0, 13, 19, 0, 0, 19, 0, 19, 18, 22, 0, 0, 0, 27, 11, 3, 0, 0, 0, 0, 0, 0, 5, 0, 0, 0, 1, 18, 0, 0, 32, 27, 18, 0, 19, 18, 38, 17, 0, 32, 0, 0, 0, 0, 27, 0, 0, 0, 0, 0, 25, 0, 27, 36, 36, 27, 0, 0, 0, 0, 0, 18, 32, 9, 0, 0, 0, 0, 0, 0, 39, 24, 24, 39, 24, 24, 24, 24, 40, 24, 24, 24, 24, 41, 42, 43, 0, 0, 0, 25, 0, 0, 0, 44, 24, 8, 8, 45, 0, 8, 8, 12, 0, 8, 12, 8, 12, 8, 8, 46, 46, 8, 8, 8, 12, 8, 22, 8, 47, 21, 22, 8, 8, 8, 13, 8, 10, 13, 22, 8, 48, 49, 50, 30, 0, 51, 3, 0, 0, 0, 30, 0, 52, 3, 53, 0, 54, 0, 3, 5, 0, 0, 3, 0, 3, 55, 24, 24, 24, 42, 42, 42, 43, 42, 42, 42, 56, 0, 0, 35, 0, 57, 34, 58, 59, 59, 60, 61, 62, 63, 64, 65, 66, 66, 67, 68, 59, 69, 61, 62, 0, 70, 70, 70, 70, 20, 20, 20, 20, 0, 0, 71, 0, 0, 0, 13, 0, 0, 0, 0, 27, 0, 0, 0, 10, 0, 19, 32, 19, 0, 36, 0, 72, 35, 0, 0, 0, 32, 37, 32, 0, 36, 0, 0, 10, 12, 12, 12, 0, 0, 0, 0, 8, 8, 0, 13, 12, 0, 0, 33, 0, 73, 73, 73, 73, 73, 20, 20, 20, 20, 74, 73, 73, 73, 73, 75, 0, 0, 0, 0, 35, 0, 30, 0, 0, 0, 0, 0, 19, 0, 0, 0, 76, 0, 0, 0, 44, 0, 0, 0, 3, 20, 5, 0, 0, 77, 0, 0, 0, 0, 26, 30, 0, 0, 0, 0, 36, 36, 36, 36, 36, 36, 46, 32, 0, 9, 22, 33, 12, 0, 19, 3, 78, 0, 37, 11, 79, 34, 20, 20, 20, 20, 20, 20, 30, 4, 24, 24, 24, 20, 73, 0, 0, 80, 73, 73, 73, 73, 73, 73, 75, 20, 20, 20, 81, 81, 81, 81, 81, 81, 81, 20, 20, 82, 81, 81, 81, 20, 20, 20, 83, 0, 0, 0, 55, 25, 0, 0, 0, 0, 0, 55, 0, 0, 0, 0, 24, 36, 10, 8, 11, 36, 33, 13, 8, 20, 30, 0, 0, 3, 20, 0, 46, 59, 59, 84, 
8, 8, 11, 8, 36, 9, 22, 8, 15, 85, 86, 86, 86, 86, 86, 86, 86, 86, 85, 85, 85, 87, 85, 86, 86, 88, 0, 0, 0, 89, 90, 91, 92, 85, 87, 86, 85, 85, 85, 93, 87, 94, 94, 94, 94, 94, 95, 95, 95, 95, 95, 95, 95, 95, 96, 97, 97, 97, 97, 97, 97, 97, 97, 97, 98, 99, 99, 99, 99, 99, 100, 94, 94, 101, 95, 95, 95, 95, 95, 95, 102, 97, 99, 99, 103, 104, 97, 105, 106, 107, 105, 108, 105, 104, 96, 95, 105, 96, 109, 110, 97, 111, 106, 112, 105, 95, 106, 113, 95, 96, 106, 0, 0, 94, 94, 94, 114, 115, 115, 116, 0, 115, 115, 115, 115, 115, 117, 118, 20, 119, 120, 120, 120, 120, 119, 120, 0, 121, 122, 123, 123, 124, 91, 125, 126, 90, 125, 127, 127, 127, 127, 126, 91, 125, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 126, 125, 126, 91, 128, 129, 130, 130, 130, 130, 130, 130, 130, 131, 132, 132, 132, 132, 132, 132, 132, 132, 132, 132, 133, 134, 132, 134, 132, 134, 132, 134, 135, 130, 136, 132, 133, 0, 0, 27, 19, 0, 0, 18, 0, 0, 0, 0, 13, 0, 0, 18, 36, 8, 19, 0, 0, 0, 0, 18, 8, 59, 59, 59, 59, 59, 137, 59, 59, 59, 59, 59, 137, 138, 139, 61, 137, 59, 59, 66, 61, 59, 61, 59, 59, 59, 66, 140, 61, 59, 137, 59, 137, 59, 59, 66, 140, 59, 141, 142, 59, 137, 59, 59, 59, 59, 62, 59, 59, 59, 59, 59, 142, 139, 143, 61, 59, 140, 59, 144, 0, 138, 145, 144, 61, 139, 143, 144, 144, 139, 143, 140, 59, 140, 59, 61, 141, 59, 59, 66, 59, 59, 59, 59, 0, 61, 61, 66, 59, 20, 20, 30, 0, 20, 20, 146, 75, 0, 0, 4, 0, 147, 0, 0, 0, 148, 0, 0, 0, 81, 81, 148, 0, 20, 20, 35, 0, 149, 0, 0, 0, }; static RE_UINT8 re_decomposition_type_stage_5[] = { 0, 0, 0, 0, 4, 0, 0, 0, 2, 0, 10, 0, 0, 0, 0, 2, 0, 0, 10, 10, 2, 2, 0, 0, 2, 10, 10, 0, 17, 17, 17, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 2, 2, 1, 1, 1, 2, 2, 0, 0, 1, 1, 2, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 2, 2, 2, 2, 2, 1, 1, 1, 1, 0, 1, 1, 1, 2, 2, 2, 10, 10, 10, 10, 10, 0, 0, 0, 0, 0, 2, 0, 0, 0, 1, 0, 2, 2, 2, 1, 1, 2, 2, 0, 2, 2, 2, 0, 0, 2, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 2, 2, 2, 2, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 
1, 2, 10, 10, 10, 0, 10, 10, 0, 10, 10, 10, 11, 11, 11, 11, 11, 11, 11, 11, 11, 0, 0, 0, 0, 10, 1, 1, 2, 1, 0, 1, 0, 1, 1, 2, 1, 2, 1, 1, 2, 0, 1, 1, 2, 2, 2, 2, 2, 4, 0, 4, 0, 0, 0, 0, 0, 4, 2, 0, 2, 2, 2, 0, 2, 0, 10, 10, 0, 0, 11, 0, 0, 0, 2, 2, 3, 2, 0, 2, 3, 3, 3, 3, 3, 3, 0, 3, 2, 0, 0, 3, 3, 3, 3, 3, 0, 0, 10, 2, 10, 0, 3, 0, 1, 0, 3, 0, 1, 1, 3, 3, 0, 3, 3, 2, 2, 2, 2, 3, 0, 2, 3, 0, 0, 0, 17, 17, 17, 17, 0, 17, 0, 0, 2, 2, 0, 2, 9, 9, 9, 9, 2, 2, 9, 9, 9, 9, 9, 0, 11, 10, 0, 0, 13, 0, 0, 0, 2, 0, 1, 12, 0, 0, 1, 12, 16, 9, 9, 9, 16, 16, 16, 16, 2, 16, 16, 16, 2, 2, 2, 16, 3, 3, 1, 1, 8, 7, 8, 7, 5, 6, 8, 7, 8, 7, 5, 6, 8, 7, 0, 0, 0, 0, 0, 8, 7, 5, 6, 8, 7, 8, 7, 8, 7, 8, 8, 7, 5, 8, 7, 5, 8, 8, 8, 8, 7, 7, 7, 7, 7, 7, 7, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 8, 8, 8, 8, 7, 7, 7, 7, 5, 5, 5, 7, 8, 0, 0, 5, 7, 5, 5, 7, 5, 7, 7, 5, 5, 7, 7, 5, 5, 7, 5, 5, 7, 7, 5, 7, 7, 5, 7, 5, 5, 5, 7, 0, 0, 5, 5, 5, 7, 7, 7, 5, 7, 5, 7, 8, 0, 0, 0, 12, 12, 12, 12, 12, 12, 0, 0, 12, 0, 0, 12, 12, 2, 2, 2, 15, 15, 15, 0, 15, 15, 15, 15, 8, 6, 8, 0, 8, 0, 8, 6, 8, 6, 8, 6, 8, 8, 7, 8, 7, 8, 7, 5, 6, 8, 7, 8, 6, 8, 7, 5, 7, 0, 0, 0, 0, 13, 13, 13, 13, 13, 13, 13, 13, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 0, 0, 0, 14, 14, 14, 0, 0, 0, 13, 13, 13, 0, 3, 0, 3, 3, 0, 0, 3, 0, 0, 3, 3, 0, 3, 3, 3, 0, 3, 0, 3, 0, 0, 0, 3, 3, 3, 0, 0, 3, 0, 3, 0, 3, 0, 0, 0, 3, 2, 2, 2, 9, 16, 0, 0, 0, 16, 16, 16, 0, 9, 9, 0, 0, }; /* Decomposition_Type: 2964 bytes. 
*/ RE_UINT32 re_get_decomposition_type(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 13; code = ch ^ (f << 13); pos = (RE_UINT32)re_decomposition_type_stage_1[f] << 5; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_decomposition_type_stage_2[pos + f] << 4; f = code >> 4; code ^= f << 4; pos = (RE_UINT32)re_decomposition_type_stage_3[pos + f] << 2; f = code >> 2; code ^= f << 2; pos = (RE_UINT32)re_decomposition_type_stage_4[pos + f] << 2; value = re_decomposition_type_stage_5[pos + code]; return value; } /* East_Asian_Width. */ static RE_UINT8 re_east_asian_width_stage_1[] = { 0, 1, 2, 3, 4, 5, 5, 5, 5, 5, 6, 5, 5, 7, 8, 9, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 11, 10, 10, 10, 12, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 13, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 13, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 14, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 15, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 15, }; static RE_UINT8 re_east_asian_width_stage_2[] = { 0, 1, 2, 3, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 7, 8, 9, 10, 11, 12, 13, 14, 5, 15, 5, 16, 5, 5, 17, 18, 19, 20, 21, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 23, 22, 22, 22, 22, 22, 22, 22, 22, 
22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 24, 5, 5, 5, 5, 25, 5, 5, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 26, 5, 5, 5, 5, 5, 5, 5, 5, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 22, 22, 5, 5, 5, 28, 29, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 30, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 31, 32, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 33, 5, 34, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 35, }; static RE_UINT8 re_east_asian_width_stage_3[] = { 0, 0, 1, 1, 1, 1, 1, 2, 0, 0, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 11, 0, 0, 0, 0, 0, 15, 16, 0, 0, 0, 0, 0, 0, 0, 9, 9, 0, 0, 0, 0, 0, 17, 18, 0, 0, 19, 19, 19, 19, 19, 19, 19, 0, 0, 20, 21, 20, 21, 0, 0, 0, 9, 19, 19, 19, 19, 9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 22, 22, 22, 22, 22, 22, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 23, 24, 25, 0, 0, 0, 26, 27, 0, 28, 0, 0, 0, 0, 0, 29, 30, 31, 0, 0, 32, 33, 34, 35, 34, 0, 36, 0, 37, 38, 0, 39, 40, 41, 42, 43, 44, 45, 0, 46, 47, 48, 49, 0, 0, 0, 0, 0, 44, 50, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 19, 19, 19, 19, 19, 19, 19, 19, 51, 19, 19, 19, 19, 19, 33, 19, 19, 52, 19, 53, 21, 54, 55, 56, 57, 0, 58, 59, 0, 0, 60, 0, 61, 0, 0, 62, 0, 62, 63, 19, 64, 19, 0, 0, 0, 65, 0, 38, 0, 66, 0, 0, 0, 0, 0, 0, 67, 0, 0, 0, 0, 0, 0, 0, 0, 0, 68, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 69, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 22, 70, 22, 22, 22, 22, 22, 71, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 72, 0, 73, 74, 22, 22, 75, 76, 22, 22, 22, 22, 77, 22, 22, 22, 22, 22, 22, 78, 22, 79, 76, 22, 22, 22, 22, 75, 22, 22, 80, 22, 22, 71, 22, 22, 75, 22, 22, 81, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 75, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 0, 
0, 0, 0, 22, 22, 22, 22, 22, 22, 22, 22, 82, 22, 22, 22, 83, 0, 0, 0, 0, 0, 0, 0, 0, 0, 22, 82, 0, 0, 0, 0, 0, 0, 0, 0, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 71, 0, 0, 0, 0, 0, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 84, 0, 22, 22, 85, 86, 0, 0, 0, 0, 0, 0, 0, 0, 0, 87, 88, 88, 88, 88, 88, 89, 90, 90, 90, 90, 91, 92, 93, 94, 65, 95, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 96, 19, 97, 19, 19, 19, 34, 19, 19, 96, 0, 0, 0, 0, 0, 0, 98, 22, 22, 80, 99, 95, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 79, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 0, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 97, }; static RE_UINT8 re_east_asian_width_stage_4[] = { 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 7, 0, 10, 0, 0, 11, 12, 11, 13, 14, 10, 9, 14, 8, 12, 9, 5, 15, 0, 0, 0, 16, 0, 12, 0, 0, 13, 12, 0, 17, 0, 11, 12, 9, 11, 7, 15, 13, 0, 0, 0, 0, 0, 0, 10, 5, 5, 5, 11, 0, 18, 17, 15, 11, 0, 7, 16, 7, 7, 7, 7, 17, 7, 7, 7, 19, 7, 14, 0, 20, 20, 20, 20, 18, 9, 14, 14, 9, 7, 0, 0, 8, 15, 12, 10, 0, 11, 0, 12, 17, 11, 0, 0, 0, 0, 21, 11, 12, 15, 15, 0, 12, 10, 0, 0, 22, 10, 12, 0, 12, 11, 12, 9, 7, 7, 7, 0, 7, 7, 14, 0, 0, 0, 15, 0, 0, 0, 14, 0, 10, 11, 0, 0, 0, 12, 0, 0, 8, 12, 18, 12, 15, 15, 10, 17, 18, 16, 7, 5, 0, 7, 0, 14, 0, 0, 11, 11, 10, 0, 0, 0, 14, 7, 13, 13, 13, 13, 0, 0, 0, 15, 15, 0, 0, 15, 0, 0, 0, 0, 0, 12, 0, 0, 23, 0, 7, 7, 19, 7, 7, 0, 0, 0, 13, 14, 0, 0, 13, 13, 0, 14, 14, 13, 18, 13, 14, 0, 0, 0, 13, 14, 0, 12, 0, 22, 15, 13, 0, 14, 0, 5, 5, 0, 0, 0, 19, 19, 9, 19, 0, 0, 0, 13, 0, 7, 7, 19, 19, 0, 7, 7, 0, 0, 0, 15, 0, 13, 7, 7, 0, 24, 1, 25, 0, 26, 0, 0, 0, 17, 14, 0, 20, 20, 27, 20, 20, 0, 0, 0, 20, 28, 0, 0, 20, 20, 20, 0, 29, 20, 20, 20, 20, 20, 20, 30, 31, 20, 20, 20, 20, 30, 31, 20, 0, 31, 20, 20, 20, 20, 20, 28, 20, 20, 30, 0, 20, 20, 7, 7, 20, 20, 20, 32, 20, 30, 0, 0, 20, 20, 28, 0, 30, 20, 20, 20, 20, 30, 20, 0, 33, 34, 34, 34, 
34, 34, 34, 34, 35, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 37, 38, 36, 38, 36, 38, 36, 38, 39, 34, 40, 36, 37, 28, 0, 0, 0, 7, 7, 9, 0, 7, 7, 7, 14, 30, 0, 0, 0, 20, 20, 32, 0, }; static RE_UINT8 re_east_asian_width_stage_5[] = { 0, 0, 0, 0, 5, 5, 5, 5, 5, 5, 5, 0, 0, 1, 5, 5, 1, 5, 5, 1, 1, 0, 1, 0, 5, 1, 1, 5, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 3, 3, 3, 3, 0, 2, 0, 0, 0, 1, 1, 0, 0, 3, 3, 0, 0, 0, 5, 5, 5, 5, 0, 0, 0, 5, 5, 0, 3, 3, 0, 3, 3, 3, 0, 0, 4, 3, 3, 3, 3, 3, 3, 0, 0, 3, 3, 3, 3, 0, 0, 0, 0, 4, 4, 4, 4, 4, 4, 4, 4, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0, 0, 0, 2, 2, 2, 0, 0, 0, 4, 4, 4, 0, }; /* East_Asian_Width: 1668 bytes. */ RE_UINT32 re_get_east_asian_width(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 12; code = ch ^ (f << 12); pos = (RE_UINT32)re_east_asian_width_stage_1[f] << 4; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_east_asian_width_stage_2[pos + f] << 4; f = code >> 4; code ^= f << 4; pos = (RE_UINT32)re_east_asian_width_stage_3[pos + f] << 2; f = code >> 2; code ^= f << 2; pos = (RE_UINT32)re_east_asian_width_stage_4[pos + f] << 2; value = re_east_asian_width_stage_5[pos + code]; return value; } /* Joining_Group. 
*/ static RE_UINT8 re_joining_group_stage_1[] = { 0, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_joining_group_stage_2[] = { 0, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, }; static RE_UINT8 re_joining_group_stage_3[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_joining_group_stage_4[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 0, 0, 0, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 0, 0, 21, 0, 22, 0, 0, 23, 24, 25, 26, 0, 0, 0, 27, 28, 29, 30, 31, 32, 33, 0, 0, 0, 0, 34, 35, 36, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 37, 38, 39, 40, 41, 42, 0, 0, }; static RE_UINT8 re_joining_group_stage_5[] = { 0, 0, 0, 0, 0, 0, 0, 0, 45, 0, 3, 3, 43, 3, 45, 3, 4, 41, 4, 4, 13, 13, 13, 6, 6, 31, 31, 35, 35, 33, 33, 39, 39, 1, 1, 11, 11, 55, 55, 55, 0, 9, 29, 19, 22, 24, 26, 16, 43, 45, 45, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 29, 0, 3, 3, 3, 0, 3, 43, 43, 45, 4, 4, 4, 4, 4, 4, 4, 4, 13, 13, 13, 13, 13, 13, 13, 6, 6, 6, 6, 6, 6, 6, 6, 6, 31, 31, 31, 31, 31, 31, 31, 31, 31, 35, 35, 35, 33, 33, 39, 1, 9, 9, 9, 9, 9, 9, 29, 29, 11, 38, 11, 19, 19, 19, 11, 11, 11, 11, 11, 11, 22, 22, 22, 22, 26, 26, 26, 26, 56, 21, 13, 41, 17, 17, 14, 43, 43, 43, 43, 43, 43, 43, 43, 55, 47, 55, 43, 45, 45, 46, 46, 0, 41, 0, 0, 0, 0, 0, 0, 0, 0, 6, 31, 0, 0, 35, 33, 1, 0, 0, 21, 2, 0, 5, 12, 12, 7, 7, 15, 44, 50, 18, 42, 42, 48, 49, 20, 23, 25, 27, 36, 10, 8, 28, 32, 34, 30, 7, 37, 40, 5, 12, 7, 0, 0, 0, 0, 0, 51, 52, 53, 4, 4, 4, 4, 4, 4, 4, 13, 13, 6, 6, 31, 35, 1, 1, 1, 9, 9, 11, 11, 11, 24, 24, 26, 26, 26, 22, 31, 31, 35, 13, 13, 35, 31, 13, 3, 3, 55, 55, 45, 43, 43, 54, 54, 13, 35, 35, 19, 4, 4, 13, 
39, 9, 29, 22, 24, 45, 45, 31, 43, 57, 0, 6, 33, 11, 58, 31, 1, 19, 0, 0, 0, 59, 61, 61, 65, 65, 62, 0, 83, 0, 85, 85, 0, 0, 66, 80, 84, 68, 68, 68, 69, 63, 81, 70, 71, 77, 60, 60, 73, 73, 76, 74, 74, 74, 75, 0, 0, 78, 0, 0, 0, 0, 0, 0, 72, 64, 79, 82, 67, }; /* Joining_Group: 586 bytes. */ RE_UINT32 re_get_joining_group(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 15; code = ch ^ (f << 15); pos = (RE_UINT32)re_joining_group_stage_1[f] << 4; f = code >> 11; code ^= f << 11; pos = (RE_UINT32)re_joining_group_stage_2[pos + f] << 4; f = code >> 7; code ^= f << 7; pos = (RE_UINT32)re_joining_group_stage_3[pos + f] << 4; f = code >> 3; code ^= f << 3; pos = (RE_UINT32)re_joining_group_stage_4[pos + f] << 3; value = re_joining_group_stage_5[pos + code]; return value; } /* Joining_Type. */ static RE_UINT8 re_joining_type_stage_1[] = { 0, 1, 2, 3, 4, 4, 4, 4, 4, 4, 5, 4, 4, 4, 4, 6, 7, 8, 4, 4, 4, 4, 9, 4, 4, 4, 4, 10, 4, 11, 12, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 13, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, }; static RE_UINT8 re_joining_type_stage_2[] = { 0, 1, 0, 0, 0, 0, 2, 0, 0, 3, 0, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 0, 0, 0, 0, 27, 0, 0, 0, 0, 0, 0, 0, 28, 29, 30, 31, 32, 0, 33, 34, 35, 36, 37, 38, 0, 39, 0, 0, 0, 0, 40, 41, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 42, 43, 44, 0, 0, 0, 0, 45, 46, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 47, 48, 0, 0, 49, 50, 51, 52, 53, 54, 0, 55, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 56, 0, 0, 0, 0, 0, 57, 43, 0, 58, 0, 0, 0, 59, 0, 60, 61, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 62, 63, 0, 64, 0, 0, 0, 0, 0, 0, 0, 0, 65, 66, 67, 68, 69, 70, 71, 0, 0, 72, 0, 73, 74, 75, 76, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 77, 78, 0, 0, 0, 0, 0, 0, 0, 0, 79, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 80, 0, 0, 0, 0, 0, 0, 0, 0, 81, 82, 83, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 84, 85, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 86, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 87, 0, 88, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_joining_type_stage_3[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 2, 2, 2, 2, 2, 2, 2, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 4, 2, 5, 6, 0, 0, 0, 0, 7, 8, 9, 10, 2, 11, 12, 13, 14, 15, 15, 16, 17, 18, 19, 20, 21, 22, 2, 23, 24, 25, 26, 0, 0, 27, 28, 29, 15, 30, 31, 0, 32, 33, 0, 34, 35, 0, 0, 0, 0, 36, 37, 0, 0, 38, 2, 39, 0, 0, 40, 41, 42, 43, 0, 44, 0, 0, 45, 46, 0, 43, 0, 47, 0, 0, 45, 48, 44, 0, 49, 47, 0, 0, 45, 50, 0, 43, 0, 44, 0, 0, 51, 46, 52, 43, 0, 53, 0, 0, 0, 54, 0, 0, 0, 28, 0, 0, 55, 56, 57, 43, 0, 44, 0, 0, 51, 58, 0, 43, 0, 44, 0, 0, 0, 46, 0, 43, 0, 0, 0, 0, 0, 59, 60, 0, 0, 0, 0, 0, 61, 62, 0, 0, 0, 0, 0, 0, 63, 64, 0, 0, 0, 0, 65, 0, 66, 0, 0, 0, 67, 68, 69, 2, 70, 52, 0, 0, 0, 0, 0, 71, 72, 0, 73, 28, 74, 75, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 71, 0, 0, 0, 76, 0, 76, 
0, 43, 0, 43, 0, 0, 0, 77, 78, 79, 0, 0, 80, 0, 15, 15, 15, 15, 15, 81, 82, 15, 83, 0, 0, 0, 0, 0, 0, 0, 84, 85, 0, 0, 0, 0, 0, 86, 0, 0, 0, 87, 88, 89, 0, 0, 0, 90, 0, 0, 0, 0, 91, 0, 0, 92, 53, 0, 93, 91, 94, 0, 95, 0, 0, 0, 96, 94, 0, 0, 97, 98, 0, 0, 0, 0, 0, 0, 0, 0, 0, 99, 100, 101, 0, 0, 0, 0, 2, 2, 2, 102, 103, 0, 104, 0, 0, 0, 105, 0, 0, 0, 0, 0, 0, 2, 2, 28, 0, 0, 0, 0, 0, 0, 20, 94, 0, 0, 0, 0, 0, 0, 0, 20, 0, 0, 0, 0, 0, 0, 2, 2, 0, 0, 106, 0, 0, 0, 0, 0, 0, 107, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 20, 108, 0, 55, 0, 0, 0, 0, 0, 94, 109, 0, 57, 0, 15, 15, 15, 110, 0, 0, 0, 0, 111, 0, 2, 94, 0, 0, 112, 0, 113, 94, 0, 0, 39, 0, 0, 114, 0, 0, 115, 0, 0, 0, 116, 117, 118, 0, 0, 45, 0, 0, 0, 119, 44, 0, 120, 52, 0, 0, 0, 0, 0, 0, 121, 0, 0, 122, 0, 0, 0, 0, 0, 0, 2, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 123, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 28, 0, 0, 0, 0, 0, 0, 0, 0, 124, 125, 0, 0, 126, 0, 0, 0, 0, 0, 0, 0, 0, 127, 128, 129, 0, 130, 131, 132, 0, 0, 0, 0, 0, 44, 0, 0, 133, 134, 0, 0, 20, 94, 0, 0, 135, 0, 0, 0, 0, 39, 0, 136, 137, 0, 0, 0, 138, 94, 0, 0, 139, 140, 0, 0, 0, 0, 0, 20, 141, 0, 0, 0, 0, 0, 0, 0, 0, 0, 20, 142, 0, 94, 0, 0, 45, 28, 0, 143, 137, 0, 0, 0, 144, 145, 0, 0, 0, 0, 0, 0, 146, 28, 120, 0, 0, 0, 0, 0, 147, 28, 0, 0, 0, 0, 0, 148, 149, 0, 0, 0, 0, 0, 71, 150, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 137, 0, 0, 0, 134, 0, 0, 0, 0, 20, 39, 0, 0, 0, 0, 0, 0, 0, 151, 91, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 152, 38, 153, 0, 106, 0, 0, 0, 0, 0, 0, 0, 0, 0, 76, 0, 0, 0, 2, 2, 2, 154, 2, 2, 70, 115, 111, 93, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 134, 0, 0, 44, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, }; static RE_UINT8 re_joining_type_stage_4[] = { 0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 2, 2, 3, 2, 4, 0, 5, 2, 2, 2, 2, 2, 2, 6, 7, 6, 0, 0, 2, 2, 8, 9, 10, 11, 12, 13, 14, 15, 15, 15, 16, 15, 17, 2, 0, 0, 0, 18, 19, 20, 15, 15, 15, 15, 21, 21, 21, 21, 22, 15, 15, 15, 15, 15, 23, 21, 21, 24, 25, 26, 2, 27, 2, 27, 28, 29, 0, 0, 18, 30, 0, 0, 0, 3, 31, 32, 
22, 33, 15, 15, 34, 23, 2, 2, 8, 35, 15, 15, 32, 15, 15, 15, 13, 36, 24, 36, 22, 15, 0, 37, 2, 2, 9, 0, 0, 0, 0, 0, 18, 15, 15, 15, 38, 2, 2, 0, 39, 0, 0, 37, 6, 2, 2, 5, 5, 4, 36, 25, 12, 15, 15, 40, 5, 0, 15, 15, 25, 41, 42, 43, 0, 0, 3, 2, 2, 2, 8, 0, 0, 0, 0, 0, 44, 9, 5, 2, 9, 1, 5, 2, 0, 0, 37, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 9, 5, 9, 0, 1, 7, 0, 0, 0, 7, 3, 27, 4, 4, 1, 0, 0, 5, 6, 9, 1, 0, 0, 0, 27, 0, 44, 0, 0, 44, 0, 0, 0, 9, 0, 0, 1, 0, 0, 0, 37, 9, 37, 28, 4, 0, 7, 0, 0, 0, 44, 0, 4, 0, 0, 44, 0, 37, 45, 0, 0, 1, 2, 8, 0, 0, 3, 2, 8, 1, 2, 6, 9, 0, 0, 2, 4, 0, 0, 4, 0, 0, 46, 1, 0, 5, 2, 2, 8, 2, 28, 0, 5, 2, 2, 5, 2, 2, 2, 2, 9, 0, 0, 0, 5, 28, 2, 7, 7, 0, 0, 4, 37, 5, 9, 0, 0, 44, 7, 0, 1, 37, 9, 0, 0, 0, 6, 2, 4, 0, 44, 5, 2, 2, 0, 0, 1, 0, 47, 48, 4, 15, 15, 0, 0, 0, 47, 15, 15, 15, 15, 49, 0, 8, 3, 9, 0, 44, 0, 5, 0, 0, 3, 27, 0, 0, 44, 2, 8, 45, 5, 2, 9, 3, 2, 2, 27, 2, 2, 2, 8, 2, 0, 0, 0, 0, 28, 8, 9, 0, 0, 3, 2, 4, 0, 0, 0, 37, 4, 6, 4, 0, 44, 4, 46, 0, 0, 0, 2, 2, 37, 0, 0, 8, 2, 2, 2, 28, 2, 9, 1, 0, 9, 4, 0, 2, 4, 0, 2, 0, 0, 3, 50, 0, 0, 37, 8, 2, 9, 37, 2, 0, 0, 37, 4, 0, 0, 7, 0, 8, 2, 2, 4, 44, 44, 3, 0, 51, 0, 0, 0, 0, 9, 0, 0, 0, 37, 2, 4, 0, 3, 2, 2, 3, 37, 4, 9, 0, 1, 0, 0, 0, 0, 5, 8, 7, 7, 0, 0, 3, 0, 0, 9, 28, 27, 9, 37, 0, 0, 0, 4, 0, 1, 9, 1, 0, 0, 0, 44, 0, 0, 5, 0, 0, 37, 8, 0, 5, 7, 0, 2, 0, 0, 8, 3, 15, 52, 53, 54, 14, 55, 15, 12, 56, 57, 47, 13, 24, 22, 12, 58, 56, 0, 0, 0, 0, 0, 20, 59, 0, 0, 2, 2, 2, 8, 0, 0, 3, 8, 7, 1, 0, 3, 2, 5, 2, 9, 0, 0, 3, 0, 0, 0, 0, 37, 2, 8, 0, 0, 37, 9, 4, 28, 0, 0, 3, 2, 8, 0, 0, 37, 2, 9, 3, 2, 45, 3, 28, 0, 0, 0, 37, 4, 0, 6, 3, 2, 8, 46, 0, 0, 3, 1, 2, 6, 0, 0, 37, 6, 2, 0, 0, 0, 0, 7, 0, 3, 4, 0, 8, 5, 2, 0, 2, 8, 3, 2, }; static RE_UINT8 re_joining_type_stage_5[] = { 0, 0, 0, 0, 0, 5, 0, 0, 5, 5, 5, 5, 0, 0, 0, 5, 5, 5, 0, 0, 0, 5, 5, 5, 5, 5, 0, 5, 0, 5, 5, 0, 5, 5, 5, 0, 5, 0, 0, 0, 2, 0, 3, 3, 3, 3, 2, 3, 2, 3, 2, 2, 2, 2, 2, 3, 3, 3, 3, 2, 2, 2, 2, 2, 1, 2, 2, 2, 3, 2, 2, 5, 0, 0, 2, 
2, 5, 3, 3, 3, 0, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 2, 3, 2, 2, 3, 2, 3, 2, 3, 2, 2, 3, 3, 0, 3, 5, 5, 5, 0, 0, 5, 5, 0, 5, 5, 5, 5, 3, 3, 2, 0, 0, 2, 3, 5, 2, 2, 2, 3, 3, 3, 2, 2, 3, 2, 3, 2, 3, 2, 0, 3, 2, 2, 3, 2, 2, 2, 0, 0, 5, 5, 2, 2, 2, 5, 0, 0, 1, 0, 3, 2, 0, 0, 3, 0, 3, 2, 2, 3, 3, 2, 2, 0, 0, 0, 0, 0, 5, 0, 5, 0, 5, 0, 0, 5, 0, 5, 0, 0, 0, 2, 0, 0, 1, 5, 2, 5, 2, 0, 0, 1, 5, 5, 2, 2, 4, 0, 2, 3, 0, 3, 0, 3, 3, 0, 0, 4, 3, 3, 2, 2, 2, 4, 2, 3, 0, 0, 3, 5, 5, 0, 3, 2, 3, 3, 3, 2, 2, 0, };

/* Joining_Type: 2292 bytes. */

RE_UINT32 re_get_joining_type(RE_UINT32 ch) {
    RE_UINT32 code;
    RE_UINT32 f;
    RE_UINT32 pos;
    RE_UINT32 value;

    /* Stage 1: index by the bits above bit 11; each entry selects a block of 32 stage-2 entries. */
    f = ch >> 12;
    code = ch ^ (f << 12);
    pos = (RE_UINT32)re_joining_type_stage_1[f] << 5;
    /* Stage 2: index by bits 11..7; each entry selects a block of 8 stage-3 entries. */
    f = code >> 7;
    code ^= f << 7;
    pos = (RE_UINT32)re_joining_type_stage_2[pos + f] << 3;
    /* Stage 3: index by bits 6..4; each entry selects a block of 4 stage-4 entries. */
    f = code >> 4;
    code ^= f << 4;
    pos = (RE_UINT32)re_joining_type_stage_3[pos + f] << 2;
    /* Stage 4: index by bits 3..2; each entry selects a block of 4 stage-5 entries. */
    f = code >> 2;
    code ^= f << 2;
    pos = (RE_UINT32)re_joining_type_stage_4[pos + f] << 2;
    /* Stage 5: the remaining bits 1..0 pick the property value. */
    value = re_joining_type_stage_5[pos + code];

    return value;
}

/* Line_Break.
*/ static RE_UINT8 re_line_break_stage_1[] = { 0, 1, 2, 3, 4, 5, 5, 5, 5, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 10, 17, 10, 10, 10, 10, 18, 10, 19, 20, 21, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 22, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 22, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 23, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, }; static RE_UINT8 re_line_break_stage_2[] = { 0, 1, 2, 2, 2, 3, 4, 5, 2, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 2, 2, 2, 2, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 2, 51, 2, 2, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 2, 2, 2, 70, 2, 2, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 87, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 88, 79, 79, 79, 79, 79, 79, 79, 79, 89, 2, 2, 90, 91, 2, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 
106, 107, 101, 102, 103, 104, 105, 106, 107, 101, 102, 103, 104, 105, 106, 107, 101, 102, 103, 104, 105, 106, 107, 101, 102, 103, 104, 105, 106, 107, 101, 102, 103, 104, 105, 106, 107, 101, 102, 103, 104, 105, 106, 107, 101, 102, 103, 104, 105, 106, 107, 101, 102, 103, 104, 105, 106, 107, 101, 102, 103, 104, 105, 106, 107, 101, 102, 103, 104, 105, 106, 107, 101, 102, 103, 104, 105, 106, 107, 101, 102, 103, 108, 109, 109, 109, 109, 109, 109, 109, 109, 109, 109, 109, 109, 109, 109, 109, 109, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 79, 79, 79, 79, 111, 112, 2, 2, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 110, 123, 124, 125, 2, 126, 127, 110, 2, 2, 128, 110, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 110, 110, 139, 110, 110, 110, 140, 141, 142, 143, 144, 145, 146, 110, 110, 147, 110, 148, 149, 150, 151, 110, 110, 152, 110, 110, 110, 153, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 2, 2, 2, 2, 2, 2, 2, 154, 155, 2, 156, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 2, 2, 2, 2, 157, 158, 159, 2, 160, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 2, 2, 2, 161, 162, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 2, 2, 2, 2, 163, 164, 165, 166, 110, 110, 110, 110, 110, 110, 167, 168, 169, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 170, 171, 110, 110, 110, 110, 110, 110, 2, 172, 173, 174, 175, 110, 176, 110, 177, 178, 179, 2, 2, 180, 2, 181, 2, 2, 2, 2, 182, 183, 110, 110, 110, 110, 110, 
110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 2, 184, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 185, 186, 110, 110, 187, 188, 189, 190, 191, 110, 79, 192, 79, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 204, 205, 110, 206, 207, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, }; static RE_UINT16 re_line_break_stage_3[] = { 0, 1, 2, 3, 4, 5, 4, 6, 7, 1, 8, 9, 4, 10, 4, 10, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 11, 12, 4, 4, 1, 1, 1, 1, 13, 14, 15, 16, 17, 4, 18, 4, 4, 4, 4, 4, 19, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 20, 4, 21, 20, 4, 22, 23, 1, 24, 25, 26, 27, 28, 29, 30, 4, 4, 31, 1, 32, 33, 4, 4, 4, 4, 4, 34, 35, 36, 37, 38, 4, 1, 39, 4, 4, 4, 4, 4, 40, 41, 36, 4, 31, 42, 4, 43, 44, 45, 4, 46, 47, 47, 47, 47, 4, 48, 47, 47, 49, 1, 50, 4, 4, 51, 1, 52, 53, 4, 54, 55, 56, 57, 58, 59, 60, 61, 62, 55, 56, 63, 64, 65, 66, 67, 68, 18, 56, 69, 70, 71, 60, 72, 73, 55, 56, 69, 74, 75, 60, 76, 77, 78, 79, 80, 81, 82, 66, 83, 84, 85, 56, 86, 87, 88, 60, 89, 90, 85, 56, 91, 87, 92, 60, 93, 90, 85, 4, 94, 95, 96, 60, 97, 98, 99, 4, 100, 101, 102, 66, 103, 104, 105, 105, 106, 107, 108, 47, 47, 109, 110, 111, 112, 113, 114, 47, 47, 115, 116, 36, 117, 118, 4, 119, 120, 121, 122, 1, 123, 124, 125, 47, 47, 105, 105, 105, 105, 126, 105, 105, 105, 105, 127, 4, 4, 128, 4, 4, 4, 129, 129, 129, 129, 129, 129, 130, 130, 130, 130, 131, 132, 132, 132, 132, 132, 4, 4, 4, 4, 133, 134, 4, 4, 133, 4, 4, 135, 136, 137, 4, 4, 4, 136, 4, 4, 4, 138, 139, 119, 4, 140, 4, 4, 4, 4, 4, 141, 142, 4, 4, 4, 4, 4, 4, 4, 142, 143, 4, 4, 4, 4, 144, 145, 146, 147, 4, 148, 4, 149, 146, 150, 105, 105, 105, 105, 105, 151, 152, 140, 153, 152, 4, 4, 4, 4, 4, 76, 4, 4, 154, 
4, 4, 4, 4, 155, 4, 45, 156, 156, 157, 105, 158, 159, 105, 105, 160, 105, 161, 162, 4, 4, 4, 163, 105, 105, 105, 164, 105, 165, 152, 152, 158, 166, 47, 47, 47, 47, 167, 4, 4, 168, 169, 170, 171, 172, 173, 4, 174, 36, 4, 4, 40, 175, 4, 4, 168, 176, 177, 36, 4, 178, 47, 47, 47, 47, 76, 179, 180, 181, 4, 4, 4, 4, 1, 1, 1, 182, 4, 141, 4, 4, 141, 183, 4, 184, 4, 4, 4, 185, 185, 186, 4, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 119, 197, 198, 199, 1, 1, 200, 201, 202, 203, 4, 4, 204, 205, 206, 207, 206, 4, 4, 4, 208, 4, 4, 209, 210, 211, 212, 213, 214, 215, 4, 216, 217, 218, 219, 4, 4, 220, 4, 221, 222, 223, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 224, 4, 4, 225, 47, 226, 47, 227, 227, 227, 227, 227, 227, 227, 227, 227, 228, 227, 227, 227, 227, 205, 227, 227, 229, 227, 230, 231, 232, 233, 234, 235, 4, 236, 237, 4, 238, 239, 4, 240, 241, 4, 242, 4, 243, 244, 245, 246, 247, 248, 4, 4, 4, 4, 249, 250, 251, 227, 252, 4, 4, 253, 4, 254, 4, 255, 256, 4, 4, 4, 221, 4, 257, 4, 4, 4, 4, 4, 258, 4, 259, 4, 260, 4, 261, 56, 262, 263, 47, 4, 4, 45, 4, 4, 45, 4, 4, 4, 4, 4, 4, 4, 4, 264, 265, 4, 4, 128, 4, 4, 4, 266, 267, 4, 225, 268, 268, 268, 268, 1, 1, 269, 270, 271, 272, 273, 47, 47, 47, 274, 275, 274, 274, 274, 274, 274, 276, 274, 274, 274, 274, 274, 274, 274, 274, 274, 274, 274, 274, 274, 277, 47, 278, 279, 280, 281, 282, 283, 274, 284, 274, 285, 286, 287, 274, 284, 274, 285, 288, 289, 274, 290, 291, 274, 274, 274, 274, 292, 274, 274, 293, 274, 274, 276, 294, 274, 292, 274, 274, 295, 274, 274, 274, 274, 274, 274, 274, 274, 274, 274, 292, 274, 274, 274, 274, 4, 4, 4, 4, 274, 296, 274, 274, 274, 274, 274, 274, 297, 274, 274, 274, 298, 4, 4, 178, 299, 4, 300, 47, 4, 4, 264, 301, 4, 302, 4, 4, 4, 4, 4, 303, 4, 4, 184, 76, 47, 47, 47, 304, 305, 4, 306, 307, 4, 4, 4, 308, 309, 4, 4, 168, 310, 152, 1, 311, 36, 4, 312, 4, 313, 314, 129, 315, 50, 4, 4, 316, 317, 318, 105, 319, 4, 4, 320, 321, 322, 323, 105, 105, 105, 105, 105, 105, 324, 325, 31, 326, 327, 328, 268, 4, 4, 4, 155, 4, 
4, 4, 4, 4, 4, 4, 329, 152, 330, 331, 332, 333, 332, 334, 332, 330, 331, 332, 333, 332, 334, 332, 330, 331, 332, 333, 332, 334, 332, 330, 331, 332, 333, 332, 334, 332, 330, 331, 332, 333, 332, 334, 332, 330, 331, 332, 333, 332, 334, 332, 330, 331, 332, 333, 332, 334, 332, 330, 331, 332, 333, 332, 334, 332, 333, 332, 335, 130, 336, 132, 132, 337, 338, 338, 338, 338, 338, 338, 338, 338, 47, 47, 47, 47, 47, 47, 47, 47, 225, 339, 340, 341, 342, 4, 4, 4, 4, 4, 4, 4, 262, 343, 4, 4, 4, 4, 4, 344, 47, 4, 4, 4, 4, 345, 4, 4, 76, 47, 47, 346, 1, 347, 1, 348, 349, 350, 351, 185, 4, 4, 4, 4, 4, 4, 4, 352, 353, 354, 274, 355, 274, 356, 357, 358, 4, 359, 4, 45, 360, 361, 362, 363, 364, 4, 137, 365, 184, 184, 47, 47, 4, 4, 4, 4, 4, 4, 4, 226, 366, 4, 4, 367, 4, 4, 4, 4, 119, 368, 71, 47, 47, 4, 4, 369, 4, 119, 4, 4, 4, 71, 33, 368, 4, 4, 370, 4, 226, 4, 4, 371, 4, 372, 4, 4, 373, 374, 47, 47, 4, 184, 152, 47, 47, 47, 47, 47, 4, 4, 76, 4, 4, 4, 375, 47, 4, 4, 4, 225, 4, 155, 76, 47, 376, 4, 4, 377, 4, 378, 4, 4, 4, 45, 304, 47, 47, 47, 4, 379, 4, 380, 4, 381, 47, 47, 47, 47, 4, 4, 4, 382, 4, 345, 4, 4, 383, 384, 4, 385, 76, 386, 4, 4, 4, 4, 47, 47, 4, 4, 387, 388, 4, 4, 4, 389, 4, 260, 4, 390, 4, 391, 392, 47, 47, 47, 47, 47, 4, 4, 4, 4, 145, 47, 47, 47, 4, 4, 4, 393, 4, 4, 4, 394, 47, 47, 47, 47, 47, 47, 4, 45, 173, 4, 4, 395, 396, 345, 397, 398, 173, 4, 4, 399, 400, 4, 145, 152, 173, 4, 313, 401, 402, 4, 4, 403, 173, 4, 4, 316, 404, 405, 20, 48, 4, 18, 406, 407, 47, 47, 47, 47, 408, 37, 409, 4, 4, 264, 410, 152, 411, 55, 56, 69, 74, 412, 413, 414, 4, 4, 4, 1, 415, 152, 47, 47, 4, 4, 264, 416, 417, 418, 47, 47, 4, 4, 4, 1, 419, 152, 47, 47, 4, 4, 31, 420, 152, 47, 47, 47, 105, 421, 160, 422, 47, 47, 47, 47, 47, 47, 4, 4, 4, 4, 36, 423, 47, 47, 47, 47, 4, 4, 4, 145, 4, 140, 47, 47, 47, 47, 47, 47, 4, 4, 4, 4, 4, 4, 45, 424, 4, 4, 4, 4, 370, 47, 47, 47, 4, 4, 4, 4, 4, 425, 4, 4, 426, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 427, 4, 4, 45, 47, 47, 47, 47, 47, 4, 4, 4, 4, 428, 4, 
4, 4, 4, 4, 4, 4, 225, 47, 47, 47, 4, 4, 4, 145, 4, 45, 429, 47, 47, 47, 47, 47, 47, 4, 184, 430, 4, 4, 4, 431, 432, 433, 18, 434, 4, 47, 47, 47, 47, 47, 47, 47, 4, 4, 4, 4, 48, 435, 1, 166, 398, 173, 47, 47, 47, 47, 47, 47, 436, 47, 47, 47, 47, 47, 47, 47, 4, 4, 4, 4, 4, 4, 226, 119, 145, 437, 438, 47, 47, 47, 47, 47, 4, 4, 4, 4, 4, 4, 4, 155, 4, 4, 21, 4, 4, 4, 439, 1, 440, 4, 441, 4, 4, 4, 145, 47, 4, 4, 4, 4, 442, 47, 47, 47, 4, 4, 4, 4, 4, 225, 4, 262, 4, 4, 4, 4, 4, 185, 4, 4, 4, 146, 443, 444, 445, 4, 4, 4, 446, 447, 4, 448, 449, 85, 4, 4, 4, 4, 260, 4, 4, 4, 4, 4, 4, 4, 4, 4, 450, 451, 451, 451, 1, 1, 1, 452, 1, 1, 453, 454, 455, 456, 23, 47, 47, 47, 47, 47, 4, 4, 4, 4, 457, 321, 47, 47, 445, 4, 458, 459, 460, 461, 462, 463, 464, 368, 465, 368, 47, 47, 47, 262, 274, 274, 278, 274, 274, 274, 274, 274, 274, 276, 292, 291, 291, 291, 274, 277, 466, 227, 467, 227, 227, 227, 468, 227, 227, 469, 47, 47, 47, 47, 470, 471, 472, 274, 274, 293, 473, 436, 47, 47, 274, 474, 274, 475, 274, 274, 274, 476, 274, 274, 477, 478, 274, 274, 274, 274, 479, 480, 481, 482, 483, 274, 274, 275, 274, 274, 484, 274, 274, 485, 274, 486, 274, 274, 274, 274, 274, 4, 4, 487, 274, 274, 274, 274, 274, 488, 297, 276, 4, 4, 4, 4, 4, 4, 4, 370, 4, 4, 4, 4, 4, 48, 47, 47, 368, 4, 4, 4, 76, 140, 4, 4, 76, 4, 184, 47, 47, 47, 47, 47, 47, 473, 47, 47, 47, 47, 47, 47, 489, 47, 47, 47, 488, 47, 47, 47, 274, 274, 274, 274, 274, 274, 274, 290, 490, 47, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 47, }; static RE_UINT8 re_line_break_stage_4[] = { 0, 0, 0, 0, 1, 2, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 5, 6, 7, 8, 9, 10, 11, 12, 12, 12, 12, 12, 13, 14, 15, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 16, 17, 14, 14, 14, 14, 14, 14, 16, 18, 19, 0, 0, 20, 0, 0, 0, 0, 0, 21, 22, 23, 24, 25, 26, 27, 14, 22, 28, 29, 28, 28, 26, 28, 30, 14, 14, 14, 24, 14, 14, 14, 14, 14, 14, 14, 24, 31, 28, 31, 14, 25, 14, 14, 14, 28, 28, 24, 32, 0, 0, 0, 0, 0, 0, 0, 33, 0, 0, 0, 0, 0, 0, 34, 34, 34, 35, 
0, 0, 0, 0, 0, 0, 14, 14, 14, 14, 36, 14, 14, 37, 36, 36, 14, 14, 14, 38, 38, 14, 14, 39, 14, 14, 14, 14, 14, 14, 14, 19, 0, 0, 0, 14, 14, 14, 39, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 38, 39, 14, 14, 14, 14, 14, 14, 14, 40, 41, 39, 9, 42, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 43, 19, 44, 0, 45, 36, 36, 36, 36, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 47, 36, 36, 46, 48, 38, 36, 36, 36, 36, 36, 14, 14, 14, 14, 49, 50, 13, 14, 0, 0, 0, 0, 0, 51, 52, 53, 14, 14, 14, 14, 14, 19, 0, 0, 12, 12, 12, 12, 12, 54, 55, 14, 44, 14, 14, 14, 14, 14, 14, 14, 14, 14, 56, 0, 0, 0, 44, 19, 0, 0, 44, 19, 44, 0, 0, 14, 12, 12, 12, 12, 12, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 39, 19, 14, 14, 14, 14, 14, 14, 14, 0, 0, 0, 0, 0, 52, 39, 14, 14, 14, 14, 0, 0, 0, 0, 0, 44, 36, 36, 36, 36, 36, 36, 36, 0, 0, 14, 14, 57, 38, 36, 36, 14, 14, 14, 0, 0, 19, 0, 0, 0, 0, 19, 0, 19, 0, 0, 36, 14, 14, 14, 14, 14, 14, 14, 38, 14, 14, 14, 14, 19, 0, 36, 38, 36, 36, 36, 36, 36, 36, 36, 36, 14, 14, 38, 36, 36, 36, 36, 36, 36, 42, 0, 0, 0, 0, 0, 0, 0, 0, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 0, 44, 0, 19, 0, 0, 0, 14, 14, 14, 14, 14, 0, 58, 12, 12, 12, 12, 12, 19, 0, 39, 14, 14, 14, 38, 39, 38, 39, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 38, 14, 14, 14, 38, 38, 36, 14, 14, 36, 44, 0, 0, 0, 52, 42, 52, 42, 0, 38, 36, 36, 36, 42, 36, 36, 14, 39, 14, 0, 36, 12, 12, 12, 12, 12, 14, 50, 14, 14, 49, 9, 36, 36, 42, 0, 39, 14, 14, 38, 36, 39, 38, 14, 39, 38, 14, 36, 52, 0, 0, 52, 36, 42, 52, 42, 0, 36, 42, 36, 36, 36, 39, 14, 38, 38, 36, 36, 36, 12, 12, 12, 12, 12, 0, 14, 19, 36, 36, 36, 36, 36, 42, 0, 39, 14, 14, 14, 14, 39, 38, 14, 39, 14, 14, 36, 44, 0, 0, 0, 0, 42, 0, 42, 0, 36, 38, 36, 36, 36, 36, 36, 36, 36, 9, 36, 36, 36, 39, 36, 36, 36, 42, 0, 39, 14, 14, 14, 38, 39, 0, 0, 52, 42, 52, 42, 0, 36, 36, 36, 36, 0, 36, 36, 14, 39, 14, 14, 14, 14, 36, 36, 36, 36, 36, 44, 39, 14, 14, 38, 36, 14, 38, 14, 14, 36, 39, 38, 38, 14, 36, 39, 38, 36, 14, 38, 36, 14, 14, 14, 14, 14, 14, 36, 36, 
0, 0, 52, 36, 0, 52, 0, 0, 36, 38, 36, 36, 42, 36, 36, 36, 36, 14, 14, 14, 14, 9, 38, 36, 36, 0, 0, 39, 14, 14, 14, 38, 14, 38, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 36, 39, 0, 0, 0, 52, 0, 52, 0, 0, 36, 36, 36, 42, 52, 14, 38, 36, 36, 36, 36, 36, 36, 14, 14, 14, 14, 42, 0, 39, 14, 14, 14, 38, 14, 14, 14, 39, 14, 14, 36, 44, 0, 36, 36, 42, 52, 36, 36, 36, 38, 39, 38, 36, 36, 36, 36, 36, 36, 14, 14, 14, 14, 14, 38, 39, 0, 0, 0, 52, 0, 52, 0, 0, 38, 36, 36, 36, 42, 36, 36, 36, 39, 14, 14, 14, 36, 59, 14, 14, 14, 36, 0, 39, 14, 14, 14, 14, 14, 14, 14, 14, 38, 36, 14, 14, 14, 14, 39, 14, 14, 14, 14, 39, 36, 14, 14, 14, 38, 36, 52, 36, 42, 0, 0, 52, 52, 0, 0, 0, 0, 36, 0, 38, 36, 36, 36, 36, 36, 60, 61, 61, 61, 61, 61, 61, 61, 61, 61, 61, 61, 61, 61, 61, 61, 61, 61, 61, 61, 61, 62, 36, 63, 61, 61, 61, 61, 61, 61, 61, 64, 12, 12, 12, 12, 12, 58, 36, 36, 60, 62, 62, 60, 62, 62, 60, 36, 36, 36, 61, 61, 60, 61, 61, 61, 60, 61, 60, 60, 36, 61, 60, 61, 61, 61, 61, 61, 61, 60, 61, 36, 61, 61, 62, 62, 61, 61, 61, 36, 12, 12, 12, 12, 12, 36, 61, 61, 32, 65, 29, 65, 66, 67, 68, 53, 53, 69, 56, 14, 0, 14, 14, 14, 14, 14, 43, 19, 19, 70, 70, 0, 14, 14, 14, 14, 39, 14, 14, 14, 14, 14, 14, 14, 14, 14, 38, 36, 42, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 14, 14, 19, 0, 0, 0, 0, 0, 42, 0, 0, 0, 0, 0, 0, 0, 0, 0, 52, 58, 14, 14, 14, 44, 14, 14, 38, 14, 65, 71, 14, 14, 72, 73, 36, 36, 12, 12, 12, 12, 12, 58, 14, 14, 12, 12, 12, 12, 12, 61, 61, 61, 14, 14, 14, 39, 36, 36, 39, 36, 74, 74, 74, 74, 74, 74, 74, 74, 75, 75, 75, 75, 75, 75, 75, 75, 75, 75, 75, 75, 76, 76, 76, 76, 76, 76, 76, 76, 76, 76, 76, 76, 14, 14, 14, 14, 38, 14, 14, 36, 14, 14, 14, 38, 38, 14, 14, 36, 38, 14, 14, 36, 14, 14, 14, 38, 38, 14, 14, 36, 14, 14, 14, 14, 14, 14, 14, 38, 14, 14, 14, 14, 14, 14, 14, 14, 14, 38, 42, 0, 27, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 36, 36, 36, 14, 14, 14, 36, 14, 14, 14, 36, 77, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 16, 78, 36, 14, 14, 14, 14, 14, 27, 58, 14, 14, 
14, 14, 14, 38, 36, 36, 36, 14, 14, 14, 14, 14, 14, 38, 14, 14, 0, 52, 36, 36, 36, 36, 36, 14, 0, 1, 41, 36, 36, 36, 36, 14, 0, 36, 36, 36, 36, 36, 36, 38, 0, 36, 36, 36, 36, 36, 36, 61, 61, 58, 79, 77, 80, 61, 36, 12, 12, 12, 12, 12, 36, 36, 36, 14, 53, 58, 29, 53, 19, 0, 73, 14, 14, 14, 14, 19, 38, 36, 36, 14, 14, 14, 36, 36, 36, 36, 36, 0, 0, 0, 0, 0, 0, 36, 36, 38, 36, 53, 12, 12, 12, 12, 12, 61, 61, 61, 61, 61, 61, 61, 36, 61, 61, 62, 36, 36, 36, 36, 36, 61, 61, 61, 61, 61, 61, 36, 36, 61, 61, 61, 61, 61, 36, 36, 36, 12, 12, 12, 12, 12, 62, 36, 61, 14, 14, 14, 19, 0, 0, 36, 14, 61, 61, 61, 61, 61, 61, 61, 62, 61, 61, 61, 61, 61, 61, 62, 42, 0, 0, 0, 0, 0, 0, 0, 52, 0, 0, 44, 14, 14, 14, 14, 14, 14, 14, 0, 0, 0, 0, 0, 0, 0, 0, 44, 14, 14, 14, 36, 36, 12, 12, 12, 12, 12, 58, 27, 58, 77, 14, 14, 14, 14, 19, 0, 0, 0, 0, 14, 14, 14, 14, 38, 36, 0, 44, 14, 14, 14, 14, 14, 14, 19, 0, 0, 0, 0, 0, 0, 14, 0, 0, 36, 36, 36, 36, 14, 14, 0, 0, 0, 0, 36, 81, 58, 58, 12, 12, 12, 12, 12, 36, 39, 14, 14, 14, 14, 14, 14, 14, 14, 58, 0, 44, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 44, 14, 19, 14, 14, 0, 44, 38, 0, 36, 36, 36, 0, 0, 0, 36, 36, 36, 0, 0, 14, 14, 14, 14, 39, 39, 39, 39, 14, 14, 14, 14, 14, 14, 14, 36, 14, 14, 38, 14, 14, 14, 14, 14, 14, 14, 36, 14, 14, 14, 39, 14, 36, 14, 38, 14, 14, 14, 32, 38, 58, 58, 58, 82, 58, 83, 0, 0, 82, 58, 84, 25, 85, 86, 85, 86, 28, 14, 87, 88, 89, 0, 0, 33, 50, 50, 50, 50, 7, 90, 91, 14, 14, 14, 92, 93, 91, 14, 14, 14, 14, 14, 14, 77, 58, 58, 27, 58, 94, 14, 38, 0, 0, 0, 0, 0, 14, 36, 25, 14, 14, 14, 16, 95, 24, 28, 25, 14, 14, 14, 16, 78, 23, 23, 23, 6, 23, 23, 23, 23, 23, 23, 23, 22, 23, 6, 23, 22, 23, 23, 23, 23, 23, 23, 23, 23, 52, 36, 36, 36, 36, 36, 36, 36, 14, 49, 24, 14, 49, 14, 14, 14, 14, 24, 14, 96, 14, 14, 14, 14, 24, 25, 14, 14, 14, 24, 14, 14, 14, 14, 28, 14, 14, 24, 14, 25, 28, 28, 28, 28, 28, 28, 14, 14, 28, 28, 28, 28, 28, 14, 14, 14, 14, 14, 14, 14, 24, 14, 36, 36, 14, 25, 25, 14, 14, 14, 14, 14, 25, 28, 14, 24, 25, 24, 14, 24, 
24, 23, 24, 14, 14, 25, 24, 28, 25, 24, 24, 24, 28, 28, 25, 25, 14, 14, 28, 28, 14, 14, 28, 14, 14, 14, 14, 14, 25, 14, 25, 14, 14, 25, 14, 14, 14, 14, 14, 14, 28, 14, 28, 28, 14, 28, 14, 28, 14, 28, 14, 28, 14, 14, 14, 14, 14, 14, 24, 14, 24, 14, 14, 14, 14, 14, 24, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 24, 14, 14, 14, 14, 14, 14, 14, 97, 14, 14, 14, 14, 70, 70, 14, 14, 14, 25, 14, 14, 14, 98, 14, 14, 14, 14, 14, 14, 16, 99, 14, 14, 98, 98, 14, 14, 14, 38, 36, 36, 14, 14, 14, 38, 36, 36, 36, 36, 14, 14, 14, 14, 14, 38, 36, 36, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 25, 28, 28, 25, 14, 14, 14, 14, 14, 14, 28, 28, 14, 14, 14, 14, 14, 28, 24, 28, 28, 28, 14, 14, 14, 14, 28, 14, 28, 14, 14, 28, 14, 28, 14, 14, 28, 25, 24, 14, 28, 28, 14, 14, 14, 14, 14, 14, 14, 14, 28, 28, 14, 14, 14, 14, 24, 98, 98, 24, 25, 24, 14, 14, 28, 14, 14, 98, 28, 100, 98, 98, 98, 14, 14, 14, 14, 101, 98, 14, 14, 25, 25, 14, 14, 14, 14, 14, 14, 28, 24, 28, 24, 102, 25, 28, 24, 14, 14, 14, 14, 14, 14, 14, 101, 14, 14, 14, 14, 14, 14, 14, 28, 14, 14, 14, 14, 14, 14, 101, 98, 98, 98, 98, 98, 102, 28, 103, 101, 98, 103, 102, 28, 98, 28, 102, 103, 98, 24, 14, 14, 28, 102, 28, 28, 103, 98, 98, 103, 98, 102, 103, 98, 98, 98, 100, 14, 98, 98, 98, 14, 14, 14, 14, 24, 14, 7, 85, 85, 5, 53, 14, 14, 70, 70, 70, 70, 70, 70, 70, 28, 28, 28, 28, 28, 28, 28, 14, 14, 14, 14, 14, 14, 14, 14, 16, 99, 14, 14, 14, 14, 14, 14, 14, 70, 70, 70, 70, 70, 14, 16, 104, 104, 104, 104, 104, 104, 104, 104, 104, 104, 99, 14, 14, 14, 14, 14, 14, 14, 14, 14, 70, 14, 14, 14, 24, 28, 28, 14, 14, 14, 14, 14, 36, 14, 14, 14, 14, 14, 14, 14, 14, 36, 14, 14, 14, 14, 14, 14, 14, 14, 14, 36, 39, 14, 14, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 14, 14, 14, 14, 14, 14, 14, 14, 14, 19, 0, 14, 36, 36, 105, 58, 77, 106, 14, 14, 14, 14, 36, 36, 36, 39, 41, 36, 36, 36, 36, 36, 36, 42, 14, 14, 14, 38, 14, 14, 14, 38, 85, 85, 85, 85, 85, 85, 85, 58, 58, 58, 58, 27, 107, 14, 85, 14, 85, 70, 70, 70, 70, 
58, 58, 56, 58, 27, 77, 14, 14, 108, 58, 77, 58, 109, 36, 36, 36, 36, 36, 36, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 110, 98, 98, 98, 98, 36, 36, 36, 36, 36, 36, 98, 98, 98, 36, 36, 36, 36, 36, 98, 98, 98, 98, 98, 98, 36, 36, 18, 111, 112, 98, 70, 70, 70, 70, 70, 98, 70, 70, 70, 70, 113, 114, 98, 98, 98, 98, 98, 0, 0, 0, 98, 98, 115, 98, 98, 112, 116, 98, 117, 118, 118, 118, 118, 98, 98, 98, 98, 118, 98, 98, 98, 98, 98, 98, 98, 118, 118, 118, 98, 98, 98, 119, 98, 98, 118, 120, 42, 121, 91, 116, 122, 118, 118, 118, 118, 98, 98, 98, 98, 98, 118, 119, 98, 112, 123, 116, 36, 36, 110, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 36, 110, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 124, 98, 98, 98, 98, 98, 124, 36, 36, 125, 125, 125, 125, 125, 125, 125, 125, 98, 98, 98, 98, 28, 28, 28, 28, 98, 98, 112, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 124, 36, 98, 98, 98, 124, 36, 36, 36, 36, 14, 14, 14, 14, 14, 14, 27, 106, 12, 12, 12, 12, 12, 14, 36, 36, 0, 44, 0, 0, 0, 0, 0, 14, 14, 14, 14, 14, 14, 14, 14, 0, 0, 27, 58, 58, 36, 36, 36, 36, 36, 36, 36, 39, 14, 14, 14, 14, 14, 44, 14, 44, 14, 19, 14, 14, 14, 19, 0, 0, 14, 14, 36, 36, 14, 14, 14, 14, 126, 36, 36, 36, 14, 14, 65, 53, 36, 36, 36, 36, 0, 14, 14, 14, 14, 14, 14, 14, 0, 0, 52, 36, 36, 36, 36, 58, 0, 14, 14, 14, 14, 14, 29, 36, 14, 14, 14, 0, 0, 0, 0, 58, 14, 14, 14, 19, 0, 0, 0, 0, 0, 0, 36, 36, 36, 36, 36, 39, 74, 74, 74, 74, 74, 74, 127, 36, 14, 19, 0, 0, 0, 0, 0, 0, 44, 14, 14, 27, 58, 14, 14, 39, 12, 12, 12, 12, 12, 36, 36, 14, 12, 12, 12, 12, 12, 61, 61, 62, 14, 14, 14, 14, 19, 0, 0, 0, 0, 0, 0, 52, 36, 36, 36, 36, 14, 19, 14, 14, 14, 14, 0, 36, 12, 12, 12, 12, 12, 36, 27, 58, 61, 62, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 60, 61, 61, 58, 14, 19, 52, 36, 36, 36, 36, 39, 14, 14, 38, 39, 14, 14, 38, 39, 14, 14, 38, 36, 36, 36, 36, 14, 19, 0, 0, 0, 1, 0, 36, 128, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 128, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 
129, 129, 129, 128, 129, 129, 129, 129, 129, 128, 129, 129, 129, 129, 129, 129, 129, 36, 36, 36, 36, 36, 36, 75, 75, 75, 130, 36, 131, 76, 76, 76, 76, 76, 76, 76, 76, 36, 36, 132, 132, 132, 132, 132, 132, 132, 132, 36, 39, 14, 14, 36, 36, 133, 134, 46, 46, 46, 46, 48, 46, 46, 46, 46, 46, 46, 47, 46, 46, 47, 47, 46, 133, 47, 46, 46, 46, 46, 46, 36, 39, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 104, 36, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 126, 36, 135, 136, 57, 137, 138, 36, 36, 36, 98, 98, 139, 104, 104, 104, 104, 104, 104, 104, 111, 139, 111, 98, 98, 98, 111, 78, 91, 53, 139, 104, 104, 111, 98, 98, 98, 124, 140, 141, 36, 36, 14, 14, 14, 14, 14, 14, 38, 142, 105, 98, 6, 98, 70, 98, 111, 111, 98, 98, 98, 98, 98, 91, 98, 143, 98, 98, 98, 98, 98, 139, 144, 98, 98, 98, 98, 98, 98, 139, 144, 139, 114, 70, 93, 145, 125, 125, 125, 125, 146, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 91, 36, 14, 14, 14, 36, 14, 14, 14, 36, 14, 14, 14, 36, 14, 38, 36, 22, 98, 140, 147, 14, 14, 14, 38, 36, 36, 36, 36, 42, 0, 148, 36, 14, 14, 14, 14, 14, 14, 39, 14, 14, 14, 14, 14, 14, 38, 14, 39, 58, 41, 36, 39, 14, 14, 14, 14, 14, 14, 36, 39, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 36, 36, 14, 14, 14, 14, 14, 14, 19, 36, 14, 14, 36, 36, 36, 36, 36, 36, 14, 14, 14, 0, 0, 52, 36, 36, 14, 14, 14, 14, 14, 14, 14, 81, 14, 14, 36, 36, 14, 14, 14, 14, 77, 14, 14, 36, 36, 36, 36, 36, 14, 14, 36, 36, 36, 36, 36, 39, 14, 14, 14, 36, 38, 14, 14, 14, 14, 14, 14, 39, 38, 36, 38, 39, 14, 14, 14, 81, 14, 14, 14, 14, 14, 38, 14, 36, 36, 39, 14, 14, 14, 14, 14, 14, 14, 14, 36, 81, 14, 14, 14, 14, 14, 36, 36, 39, 14, 14, 14, 14, 36, 36, 14, 14, 19, 0, 42, 52, 36, 36, 0, 0, 14, 14, 39, 14, 39, 14, 14, 14, 14, 14, 36, 36, 0, 52, 36, 42, 58, 58, 58, 58, 38, 36, 36, 36, 14, 14, 19, 52, 36, 39, 14, 14, 58, 58, 58, 149, 36, 36, 36, 36, 14, 14, 14, 36, 81, 58, 58, 58, 14, 38, 36, 36, 14, 14, 14, 14, 14, 36, 36, 36, 39, 14, 38, 36, 36, 36, 36, 36, 39, 14, 14, 14, 14, 38, 36, 36, 
36, 36, 36, 36, 14, 38, 36, 36, 36, 14, 14, 14, 14, 14, 14, 14, 0, 0, 0, 0, 0, 0, 0, 1, 77, 14, 14, 36, 14, 14, 14, 12, 12, 12, 12, 12, 36, 36, 36, 36, 36, 36, 36, 42, 0, 0, 0, 0, 0, 44, 14, 58, 58, 36, 36, 36, 36, 36, 36, 36, 0, 0, 52, 12, 12, 12, 12, 12, 58, 58, 36, 36, 36, 36, 36, 36, 14, 19, 32, 38, 36, 36, 36, 36, 44, 14, 27, 77, 77, 0, 44, 36, 12, 12, 12, 12, 12, 32, 27, 58, 14, 14, 14, 14, 14, 14, 0, 0, 0, 0, 0, 0, 58, 27, 77, 36, 14, 14, 14, 38, 38, 14, 14, 39, 14, 14, 14, 14, 27, 36, 36, 36, 0, 0, 0, 0, 0, 52, 36, 36, 0, 0, 39, 14, 14, 14, 38, 39, 38, 36, 36, 42, 36, 36, 39, 14, 14, 0, 36, 0, 0, 0, 52, 36, 0, 0, 52, 36, 36, 36, 36, 36, 0, 0, 14, 14, 36, 36, 36, 36, 0, 0, 0, 36, 0, 0, 0, 0, 150, 58, 53, 14, 27, 58, 58, 58, 58, 58, 58, 58, 14, 14, 0, 36, 1, 77, 38, 36, 36, 36, 36, 36, 0, 0, 0, 0, 36, 36, 36, 36, 61, 61, 61, 61, 61, 36, 60, 61, 12, 12, 12, 12, 12, 61, 58, 151, 14, 38, 36, 36, 36, 36, 36, 39, 58, 58, 41, 36, 36, 36, 36, 36, 14, 14, 14, 14, 152, 70, 114, 14, 14, 99, 14, 70, 70, 14, 14, 14, 14, 14, 14, 14, 16, 114, 14, 14, 14, 14, 14, 14, 14, 14, 14, 70, 12, 12, 12, 12, 12, 36, 36, 58, 0, 0, 1, 36, 36, 36, 36, 36, 0, 0, 0, 1, 58, 14, 14, 14, 14, 14, 77, 36, 36, 36, 36, 36, 12, 12, 12, 12, 12, 39, 14, 14, 14, 14, 14, 14, 36, 36, 39, 14, 19, 0, 0, 0, 0, 0, 0, 0, 98, 36, 36, 36, 36, 36, 36, 36, 14, 14, 14, 14, 14, 36, 19, 1, 0, 0, 36, 36, 36, 36, 36, 36, 14, 14, 19, 0, 0, 14, 19, 0, 0, 44, 19, 0, 0, 0, 14, 14, 14, 14, 14, 14, 14, 0, 0, 14, 14, 0, 44, 36, 36, 36, 36, 36, 36, 38, 39, 38, 39, 14, 38, 14, 14, 14, 14, 14, 14, 39, 39, 14, 14, 14, 39, 14, 14, 14, 14, 14, 14, 14, 14, 39, 14, 38, 39, 14, 14, 14, 38, 14, 14, 14, 38, 14, 14, 14, 14, 14, 14, 39, 14, 38, 14, 14, 38, 38, 36, 14, 14, 14, 14, 14, 14, 14, 14, 14, 36, 12, 12, 12, 12, 12, 12, 12, 12, 12, 0, 0, 0, 44, 14, 19, 0, 0, 0, 0, 0, 0, 0, 0, 44, 14, 14, 14, 19, 14, 14, 14, 14, 14, 14, 14, 44, 27, 58, 77, 36, 36, 36, 36, 36, 36, 36, 42, 0, 0, 14, 14, 38, 39, 14, 14, 14, 14, 39, 38, 38, 39, 39, 
14, 14, 14, 14, 38, 14, 14, 39, 39, 36, 36, 36, 38, 36, 39, 39, 39, 39, 14, 39, 38, 38, 39, 39, 39, 39, 39, 39, 38, 38, 39, 14, 38, 14, 14, 14, 38, 14, 14, 39, 14, 38, 38, 14, 14, 14, 14, 14, 39, 14, 14, 39, 14, 39, 14, 14, 39, 14, 14, 28, 28, 28, 28, 28, 28, 153, 36, 28, 28, 28, 28, 28, 28, 28, 38, 28, 28, 28, 28, 28, 14, 36, 36, 28, 28, 28, 28, 28, 153, 36, 36, 36, 36, 36, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 98, 124, 36, 36, 36, 36, 36, 36, 98, 98, 98, 98, 124, 36, 36, 36, 98, 98, 98, 98, 98, 98, 14, 98, 98, 98, 100, 101, 98, 98, 101, 98, 98, 98, 98, 98, 98, 100, 14, 14, 101, 101, 101, 98, 98, 98, 98, 100, 100, 101, 98, 98, 98, 98, 98, 98, 14, 14, 14, 101, 98, 98, 98, 98, 98, 98, 98, 100, 14, 14, 14, 14, 14, 14, 101, 98, 98, 98, 98, 98, 98, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 98, 98, 98, 98, 98, 110, 98, 98, 98, 98, 98, 98, 98, 14, 14, 14, 14, 98, 98, 98, 98, 14, 14, 14, 98, 98, 98, 14, 14, 14, 85, 155, 91, 14, 14, 124, 36, 36, 36, 36, 36, 36, 36, 98, 98, 124, 36, 36, 36, 36, 36, 42, 36, 36, 36, 36, 36, 36, 36, }; static RE_UINT8 re_line_break_stage_5[] = { 16, 16, 16, 18, 22, 20, 20, 21, 19, 6, 3, 12, 9, 10, 12, 3, 1, 36, 12, 9, 8, 15, 8, 7, 11, 11, 8, 8, 12, 12, 12, 6, 12, 1, 9, 36, 18, 2, 12, 16, 16, 29, 4, 1, 10, 9, 9, 9, 12, 25, 25, 12, 25, 3, 12, 18, 25, 25, 17, 12, 25, 1, 17, 25, 12, 17, 16, 4, 4, 4, 4, 16, 0, 0, 8, 12, 12, 0, 0, 12, 0, 8, 18, 0, 0, 16, 18, 16, 16, 12, 6, 16, 37, 37, 37, 0, 37, 12, 12, 10, 10, 10, 16, 6, 16, 0, 6, 6, 10, 11, 11, 12, 6, 12, 8, 6, 18, 18, 0, 10, 0, 24, 24, 24, 24, 0, 0, 9, 24, 12, 17, 17, 4, 17, 17, 18, 4, 6, 4, 12, 1, 2, 18, 17, 12, 4, 4, 0, 31, 31, 32, 32, 33, 33, 18, 12, 2, 0, 5, 24, 18, 9, 0, 18, 18, 4, 18, 28, 26, 25, 3, 3, 1, 3, 14, 14, 14, 18, 20, 20, 3, 25, 5, 5, 8, 1, 2, 5, 30, 12, 2, 25, 9, 12, 12, 14, 13, 13, 2, 12, 13, 12, 12, 13, 13, 25, 25, 13, 2, 1, 0, 6, 6, 18, 1, 18, 26, 26, 1, 0, 0, 13, 2, 13, 13, 5, 5, 1, 2, 2, 13, 16, 5, 13, 0, 38, 13, 38, 38, 13, 38, 0, 16, 5, 5, 
38, 38, 5, 13, 0, 38, 38, 10, 12, 31, 0, 34, 35, 35, 35, 32, 0, 0, 33, 27, 27, 0, 37, 16, 37, 8, 2, 2, 8, 6, 1, 2, 14, 13, 1, 13, 9, 10, 13, 0, 30, 13, 6, 13, 2, 12, 38, 38, 12, 9, 0, 23, 25, 14, 0, 16, 17, 18, 24, 1, 1, 25, 0, 39, 39, 3, 5, }; /* Line_Break: 8608 bytes. */ RE_UINT32 re_get_line_break(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 12; code = ch ^ (f << 12); pos = (RE_UINT32)re_line_break_stage_1[f] << 5; f = code >> 7; code ^= f << 7; pos = (RE_UINT32)re_line_break_stage_2[pos + f] << 3; f = code >> 4; code ^= f << 4; pos = (RE_UINT32)re_line_break_stage_3[pos + f] << 3; f = code >> 1; code ^= f << 1; pos = (RE_UINT32)re_line_break_stage_4[pos + f] << 1; value = re_line_break_stage_5[pos + code]; return value; } /* Numeric_Type. */ static RE_UINT8 re_numeric_type_stage_1[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 11, 11, 11, 12, 13, 14, 15, 11, 11, 11, 16, 11, 11, 11, 11, 11, 11, 17, 18, 19, 20, 11, 21, 22, 11, 11, 23, 11, 11, 11, 11, 11, 11, 11, 11, 24, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, }; static RE_UINT8 re_numeric_type_stage_2[] = { 0, 1, 1, 1, 1, 1, 2, 3, 1, 
4, 5, 6, 7, 8, 9, 10, 11, 1, 1, 12, 1, 1, 13, 14, 15, 16, 17, 18, 19, 1, 1, 1, 20, 21, 1, 1, 22, 1, 1, 23, 1, 1, 1, 1, 24, 1, 1, 1, 25, 26, 27, 1, 28, 1, 1, 1, 29, 1, 1, 30, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 31, 32, 1, 33, 1, 34, 1, 1, 35, 1, 36, 1, 1, 1, 1, 1, 37, 38, 1, 1, 39, 40, 1, 1, 1, 41, 1, 1, 1, 1, 1, 1, 1, 42, 1, 1, 1, 43, 1, 1, 44, 1, 1, 1, 1, 1, 1, 1, 1, 1, 45, 1, 1, 1, 46, 1, 1, 1, 1, 1, 1, 1, 47, 48, 1, 1, 1, 1, 1, 1, 1, 1, 49, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 50, 1, 51, 52, 53, 54, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 55, 1, 1, 1, 1, 1, 15, 1, 56, 57, 58, 59, 1, 1, 1, 60, 61, 62, 63, 64, 1, 65, 1, 66, 67, 54, 1, 68, 1, 69, 70, 71, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 72, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 73, 74, 1, 1, 1, 1, 1, 1, 1, 75, 1, 1, 1, 76, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 77, 1, 1, 1, 1, 1, 1, 1, 1, 78, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 79, 80, 1, 1, 1, 1, 1, 1, 1, 81, 82, 83, 1, 1, 1, 1, 1, 1, 1, 84, 1, 1, 1, 1, 1, 85, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 86, 1, 1, 1, 1, 1, 1, 87, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 84, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_numeric_type_stage_3[] = { 0, 1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 4, 0, 0, 0, 5, 0, 0, 0, 4, 0, 0, 0, 4, 0, 0, 0, 6, 0, 0, 0, 7, 0, 0, 0, 8, 0, 0, 0, 4, 0, 0, 0, 9, 0, 0, 0, 4, 0, 0, 1, 0, 0, 0, 1, 0, 0, 10, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 1, 0, 0, 0, 0, 0, 0, 11, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 12, 0, 0, 0, 0, 0, 0, 0, 13, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 14, 0, 0, 0, 0, 0, 15, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 15, 0, 0, 0, 0, 0, 0, 0, 0, 16, 17, 0, 0, 0, 0, 0, 18, 19, 20, 0, 0, 0, 0, 0, 0, 21, 22, 0, 0, 23, 0, 0, 0, 24, 25, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 26, 27, 28, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 29, 0, 0, 0, 0, 30, 31, 0, 30, 32, 0, 0, 
33, 0, 0, 0, 34, 0, 0, 0, 0, 35, 0, 0, 0, 0, 0, 0, 0, 0, 36, 0, 0, 0, 0, 0, 37, 0, 26, 0, 38, 39, 40, 41, 36, 0, 0, 42, 0, 0, 0, 0, 43, 0, 44, 45, 0, 0, 0, 0, 0, 0, 46, 0, 0, 0, 47, 0, 0, 0, 0, 0, 0, 0, 48, 0, 0, 0, 0, 0, 0, 0, 0, 49, 0, 0, 0, 50, 0, 0, 0, 51, 52, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 53, 0, 0, 54, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 55, 0, 44, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 56, 0, 0, 0, 0, 0, 0, 53, 0, 0, 0, 0, 0, 0, 0, 0, 44, 0, 0, 0, 0, 54, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 57, 0, 0, 0, 42, 0, 0, 0, 0, 0, 0, 0, 58, 59, 60, 0, 0, 0, 56, 0, 3, 0, 0, 0, 0, 0, 61, 0, 62, 0, 0, 0, 0, 1, 0, 3, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 63, 0, 55, 64, 26, 65, 66, 19, 67, 68, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 69, 0, 70, 71, 0, 0, 0, 72, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 73, 74, 0, 75, 0, 76, 77, 0, 0, 0, 0, 78, 79, 19, 0, 0, 80, 81, 82, 0, 0, 83, 0, 0, 73, 73, 0, 84, 0, 0, 0, 0, 0, 0, 0, 0, 0, 85, 0, 0, 0, 86, 0, 0, 0, 0, 0, 0, 87, 88, 0, 0, 0, 1, 0, 89, 0, 0, 0, 0, 1, 90, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 3, 0, 0, 91, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 92, 19, 19, 19, 93, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 94, 95, 0, 0, 0, 0, 0, 0, 0, 96, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 97, 98, 0, 0, 0, 0, 0, 0, 75, 0, 99, 0, 0, 0, 0, 0, 0, 0, 58, 0, 0, 43, 0, 0, 0, 100, 0, 58, 0, 0, 0, 0, 0, 0, 0, 35, 0, 0, 101, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 102, 103, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 42, 0, 0, 0, 0, 0, 0, 0, 60, 0, 0, 0, 48, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 36, 0, 0, 0, 0, }; static RE_UINT8 re_numeric_type_stage_4[] = { 0, 0, 0, 0, 0, 0, 1, 2, 0, 0, 3, 4, 1, 2, 0, 0, 5, 1, 0, 0, 5, 1, 6, 7, 5, 1, 8, 0, 5, 1, 9, 0, 5, 1, 0, 10, 5, 1, 11, 0, 1, 12, 13, 0, 0, 14, 15, 16, 0, 17, 18, 0, 1, 2, 19, 7, 0, 0, 1, 20, 1, 2, 1, 2, 0, 0, 21, 22, 23, 22, 0, 0, 0, 0, 19, 19, 19, 19, 19, 19, 24, 7, 0, 0, 23, 25, 26, 27, 19, 23, 25, 13, 0, 28, 29, 30, 0, 0, 31, 32, 23, 33, 34, 0, 0, 0, 0, 35, 36, 0, 0, 0, 37, 7, 0, 9, 0, 0, 38, 0, 19, 7, 
0, 0, 0, 19, 37, 19, 0, 0, 37, 19, 35, 0, 0, 0, 39, 0, 0, 0, 0, 40, 0, 0, 0, 35, 0, 0, 41, 42, 0, 0, 0, 43, 44, 0, 0, 0, 0, 36, 18, 0, 0, 36, 0, 18, 0, 0, 0, 0, 18, 0, 43, 0, 0, 0, 45, 0, 0, 0, 0, 46, 0, 0, 47, 43, 0, 0, 48, 0, 0, 0, 0, 0, 0, 39, 0, 0, 42, 42, 0, 0, 0, 40, 0, 0, 0, 17, 0, 49, 18, 0, 0, 0, 0, 45, 0, 43, 0, 0, 0, 0, 40, 0, 0, 0, 45, 0, 0, 45, 39, 0, 42, 0, 0, 0, 45, 43, 0, 0, 0, 0, 0, 18, 17, 19, 0, 0, 0, 0, 11, 0, 0, 39, 39, 18, 0, 0, 50, 0, 36, 19, 19, 19, 19, 19, 13, 0, 19, 19, 19, 18, 0, 51, 0, 0, 37, 19, 19, 13, 13, 0, 0, 0, 42, 40, 0, 0, 0, 0, 52, 0, 0, 0, 0, 19, 0, 0, 0, 37, 36, 19, 0, 0, 0, 0, 0, 53, 0, 0, 17, 13, 0, 0, 0, 54, 19, 19, 8, 19, 55, 0, 0, 0, 0, 0, 0, 56, 0, 0, 0, 57, 0, 53, 0, 0, 0, 37, 0, 0, 0, 0, 0, 8, 23, 25, 19, 10, 0, 0, 58, 59, 60, 1, 0, 0, 0, 0, 5, 1, 37, 19, 16, 0, 0, 0, 1, 61, 1, 12, 9, 0, 19, 10, 0, 0, 0, 0, 1, 62, 7, 0, 0, 0, 19, 19, 7, 0, 0, 5, 1, 1, 1, 1, 1, 1, 23, 63, 0, 0, 40, 0, 0, 0, 39, 43, 0, 43, 0, 40, 0, 35, 0, 0, 0, 42, }; static RE_UINT8 re_numeric_type_stage_5[] = { 0, 0, 0, 0, 0, 0, 0, 0, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 0, 0, 0, 0, 0, 2, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 3, 3, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 3, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 3, 3, 2, 0, 0, 0, 0, 0, 2, 0, 0, 0, 2, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 0, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 0, 0, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 0, 0, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 0, 0, 0, 0, 0, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 
0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 2, 2, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 3, 3, 3, 1, 1, 0, 0, 0, 0, 3, 3, 0, 1, 1, 1, 1, 1, 2, 2, 2, 1, 1, 0, 0, 0, }; /* Numeric_Type: 2304 bytes. */ RE_UINT32 re_get_numeric_type(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 12; code = ch ^ (f << 12); pos = (RE_UINT32)re_numeric_type_stage_1[f] << 4; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_numeric_type_stage_2[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_numeric_type_stage_3[pos + f] << 2; f = code >> 3; code ^= f << 3; pos = (RE_UINT32)re_numeric_type_stage_4[pos + f] << 3; value = re_numeric_type_stage_5[pos + code]; return value; } /* Numeric_Value. 
*/ static RE_UINT8 re_numeric_value_stage_1[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 11, 11, 11, 12, 13, 14, 15, 11, 11, 11, 16, 11, 11, 11, 11, 11, 11, 17, 18, 19, 20, 11, 21, 22, 11, 11, 23, 11, 11, 11, 11, 11, 11, 11, 11, 24, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, }; static RE_UINT8 re_numeric_value_stage_2[] = { 0, 1, 1, 1, 1, 1, 2, 3, 1, 4, 5, 6, 7, 8, 9, 10, 11, 1, 1, 12, 1, 1, 13, 14, 15, 16, 17, 18, 19, 1, 1, 1, 20, 21, 1, 1, 22, 1, 1, 23, 1, 1, 1, 1, 24, 1, 1, 1, 25, 26, 27, 1, 28, 1, 1, 1, 29, 1, 1, 30, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 31, 32, 1, 33, 1, 34, 1, 1, 35, 1, 36, 1, 1, 1, 1, 1, 37, 38, 1, 1, 39, 40, 1, 1, 1, 41, 1, 1, 1, 1, 1, 1, 1, 42, 1, 1, 1, 43, 1, 1, 44, 1, 1, 1, 1, 1, 1, 1, 1, 1, 45, 1, 1, 1, 46, 1, 1, 1, 1, 1, 1, 1, 47, 48, 1, 1, 1, 1, 1, 1, 1, 1, 49, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 50, 1, 51, 52, 53, 54, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 55, 1, 1, 1, 1, 1, 15, 1, 56, 57, 58, 59, 1, 1, 1, 60, 61, 62, 63, 64, 1, 65, 1, 66, 67, 54, 1, 68, 1, 69, 70, 71, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 72, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 73, 74, 1, 1, 1, 1, 1, 1, 1, 75, 1, 1, 1, 76, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 77, 1, 1, 1, 1, 1, 1, 1, 1, 78, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 79, 80, 1, 1, 1, 1, 1, 1, 1, 81, 82, 83, 1, 1, 1, 1, 1, 1, 1, 84, 1, 1, 1, 1, 1, 85, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 86, 1, 1, 1, 1, 1, 1, 87, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 88, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_numeric_value_stage_3[] = { 0, 1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 4, 0, 0, 0, 5, 0, 0, 0, 4, 0, 0, 0, 4, 0, 0, 0, 6, 0, 0, 0, 7, 0, 0, 0, 8, 0, 0, 0, 4, 0, 0, 0, 9, 0, 0, 0, 4, 0, 0, 1, 0, 0, 0, 1, 0, 0, 10, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 1, 0, 0, 0, 0, 0, 0, 11, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 12, 0, 0, 0, 0, 0, 0, 0, 13, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 14, 0, 0, 0, 0, 0, 13, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 13, 0, 0, 0, 0, 0, 0, 0, 0, 15, 3, 0, 0, 0, 0, 0, 16, 17, 18, 0, 0, 0, 0, 0, 0, 19, 20, 0, 0, 21, 0, 0, 0, 22, 23, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 24, 25, 26, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 27, 0, 0, 0, 0, 28, 29, 0, 28, 30, 0, 0, 31, 0, 0, 0, 32, 0, 0, 0, 0, 33, 0, 0, 0, 0, 0, 0, 0, 0, 34, 0, 0, 0, 0, 0, 35, 0, 36, 0, 37, 38, 39, 40, 41, 0, 0, 42, 0, 0, 0, 0, 43, 0, 44, 45, 0, 0, 0, 0, 0, 0, 46, 0, 0, 0, 47, 0, 0, 0, 0, 0, 0, 0, 48, 0, 0, 0, 0, 0, 0, 0, 0, 49, 0, 0, 0, 50, 0, 0, 0, 51, 52, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 53, 0, 0, 54, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 55, 0, 56, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 57, 0, 0, 0, 0, 0, 0, 58, 0, 0, 0, 0, 0, 0, 0, 0, 59, 0, 0, 0, 0, 60, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 61, 0, 0, 0, 62, 0, 0, 0, 0, 0, 0, 0, 63, 64, 65, 0, 0, 0, 66, 0, 3, 0, 0, 0, 0, 0, 67, 0, 68, 0, 0, 0, 0, 1, 0, 3, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 69, 0, 70, 71, 72, 73, 74, 75, 76, 77, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 78, 0, 79, 80, 0, 0, 0, 81, 
0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 82, 83, 0, 84, 0, 85, 86, 0, 0, 0, 0, 87, 88, 89, 0, 0, 90, 91, 92, 0, 0, 93, 0, 0, 94, 94, 0, 95, 0, 0, 0, 0, 0, 0, 0, 0, 0, 96, 0, 0, 0, 97, 0, 0, 0, 0, 0, 0, 98, 99, 0, 0, 0, 1, 0, 100, 0, 0, 0, 0, 1, 101, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 3, 0, 0, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 103, 104, 105, 106, 107, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 108, 109, 0, 0, 0, 0, 0, 0, 0, 110, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 111, 112, 0, 0, 0, 0, 0, 0, 113, 0, 114, 0, 0, 0, 0, 0, 0, 0, 115, 0, 0, 116, 0, 0, 0, 117, 0, 118, 0, 0, 0, 0, 0, 0, 0, 119, 0, 0, 120, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 121, 122, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 62, 0, 0, 0, 0, 0, 0, 0, 123, 0, 0, 0, 124, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 125, 0, 0, 0, 0, 0, 0, 0, 0, 126, 0, 0, 0, }; static RE_UINT8 re_numeric_value_stage_4[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 0, 0, 0, 0, 0, 4, 0, 5, 6, 1, 2, 3, 0, 0, 0, 0, 0, 0, 7, 8, 9, 0, 0, 0, 0, 0, 7, 8, 9, 0, 10, 11, 0, 0, 7, 8, 9, 12, 13, 0, 0, 0, 7, 8, 9, 14, 0, 0, 0, 0, 7, 8, 9, 0, 0, 1, 15, 0, 7, 8, 9, 16, 17, 0, 0, 1, 2, 18, 19, 20, 0, 0, 0, 0, 0, 21, 2, 22, 23, 24, 25, 0, 0, 0, 26, 27, 0, 0, 0, 1, 2, 3, 0, 1, 2, 3, 0, 0, 0, 0, 0, 1, 2, 28, 0, 0, 0, 0, 0, 29, 2, 3, 0, 0, 0, 0, 0, 30, 31, 32, 33, 34, 35, 36, 37, 34, 35, 36, 37, 38, 39, 40, 0, 0, 0, 0, 0, 34, 35, 36, 41, 42, 34, 35, 36, 41, 42, 34, 35, 36, 41, 42, 0, 0, 0, 43, 44, 45, 46, 2, 47, 0, 0, 0, 0, 0, 48, 49, 50, 34, 35, 51, 49, 50, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 52, 0, 53, 0, 0, 0, 0, 0, 0, 21, 2, 3, 0, 0, 0, 54, 0, 0, 0, 0, 0, 48, 55, 0, 0, 34, 35, 56, 0, 0, 0, 0, 0, 0, 0, 57, 58, 59, 60, 61, 62, 0, 0, 0, 0, 63, 64, 65, 66, 0, 67, 0, 0, 0, 0, 0, 0, 68, 0, 0, 0, 0, 0, 0, 0, 0, 0, 69, 0, 0, 0, 0, 0, 0, 0, 0, 70, 0, 0, 0, 0, 71, 72, 73, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 74, 0, 0, 0, 75, 0, 76, 0, 0, 0, 0, 0, 0, 0, 0, 0, 77, 78, 0, 0, 0, 0, 0, 0, 79, 0, 0, 80, 0, 0, 0, 0, 0, 0, 0, 0, 67, 0, 0, 0, 0, 0, 0, 0, 0, 81, 0, 0, 0, 0, 82, 0, 0, 0, 0, 0, 
0, 0, 83, 0, 0, 0, 0, 0, 0, 0, 0, 84, 85, 0, 0, 0, 0, 86, 87, 0, 88, 0, 0, 0, 0, 89, 80, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 90, 0, 0, 0, 0, 0, 5, 0, 5, 0, 0, 0, 0, 0, 0, 0, 91, 0, 0, 0, 0, 0, 0, 0, 0, 92, 0, 0, 0, 15, 75, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 93, 0, 0, 0, 94, 0, 0, 0, 0, 0, 0, 0, 0, 95, 0, 0, 0, 0, 95, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 96, 0, 0, 0, 0, 0, 0, 0, 0, 0, 97, 0, 98, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25, 0, 0, 0, 0, 0, 0, 0, 99, 68, 0, 0, 0, 0, 0, 0, 0, 75, 0, 0, 0, 100, 0, 0, 0, 0, 0, 0, 0, 0, 101, 0, 81, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 102, 0, 0, 0, 0, 0, 0, 103, 0, 0, 0, 48, 49, 104, 0, 0, 0, 0, 0, 0, 0, 0, 105, 106, 0, 0, 0, 0, 107, 0, 108, 0, 75, 0, 0, 0, 0, 0, 103, 0, 0, 0, 0, 0, 0, 0, 109, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 110, 0, 111, 8, 9, 57, 58, 112, 113, 114, 115, 116, 117, 118, 0, 0, 0, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 122, 131, 132, 0, 0, 0, 133, 0, 0, 0, 0, 0, 21, 2, 22, 23, 24, 134, 135, 0, 136, 0, 0, 0, 0, 0, 0, 0, 137, 0, 138, 0, 0, 0, 0, 0, 0, 0, 0, 0, 139, 140, 0, 0, 0, 0, 0, 0, 0, 0, 141, 142, 0, 0, 0, 0, 0, 0, 21, 143, 0, 111, 144, 145, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 111, 145, 0, 0, 0, 0, 0, 146, 147, 0, 0, 0, 0, 0, 0, 0, 0, 148, 34, 35, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 34, 163, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 164, 0, 0, 0, 0, 0, 0, 0, 165, 0, 0, 111, 145, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 34, 163, 0, 0, 21, 166, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 167, 168, 34, 35, 149, 150, 169, 152, 170, 171, 0, 0, 0, 0, 48, 49, 50, 172, 173, 174, 8, 9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 8, 9, 21, 2, 22, 23, 24, 175, 0, 0, 0, 0, 0, 0, 1, 2, 22, 0, 1, 2, 22, 23, 176, 0, 0, 0, 8, 9, 49, 177, 35, 178, 2, 179, 180, 181, 9, 182, 183, 182, 184, 185, 186, 187, 188, 189, 144, 190, 191, 192, 193, 194, 195, 196, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 197, 198, 199, 0, 0, 0, 0, 0, 0, 0, 34, 35, 149, 150, 200, 0, 0, 0, 0, 0, 0, 7, 8, 9, 1, 2, 201, 8, 9, 1, 2, 201, 8, 9, 0, 111, 8, 9, 0, 0, 0, 0, 202, 49, 104, 29, 0, 
0, 0, 0, 70, 0, 0, 0, 0, 0, 0, 0, 0, 203, 0, 0, 0, 0, 0, 0, 98, 0, 0, 0, 0, 0, 0, 0, 67, 0, 0, 0, 0, 0, 0, 0, 0, 0, 91, 0, 0, 0, 0, 0, 204, 0, 0, 88, 0, 0, 0, 88, 0, 0, 101, 0, 0, 0, 0, 73, 0, 0, 0, 0, 0, 0, 73, 0, 0, 0, 0, 0, 0, 0, 80, 0, 0, 0, 0, 0, 0, 0, 107, 0, 0, 0, 0, 205, 0, 0, 0, 0, 0, 0, 0, 0, 206, 0, 0, 0, }; static RE_UINT8 re_numeric_value_stage_5[] = { 0, 0, 0, 0, 2, 27, 29, 31, 33, 35, 37, 39, 41, 43, 0, 0, 0, 0, 29, 31, 0, 27, 0, 0, 12, 17, 22, 0, 0, 0, 2, 27, 29, 31, 33, 35, 37, 39, 41, 43, 3, 7, 10, 12, 22, 50, 0, 0, 0, 0, 12, 17, 22, 3, 7, 10, 44, 89, 98, 0, 27, 29, 31, 0, 44, 89, 98, 12, 17, 22, 0, 0, 41, 43, 17, 28, 30, 32, 34, 36, 38, 40, 42, 1, 0, 27, 29, 31, 41, 43, 44, 54, 64, 74, 84, 85, 86, 87, 88, 89, 107, 0, 0, 0, 0, 0, 51, 52, 53, 0, 0, 0, 41, 43, 27, 0, 2, 0, 0, 0, 8, 6, 5, 13, 21, 11, 15, 19, 23, 9, 24, 7, 14, 20, 25, 27, 27, 29, 31, 33, 35, 37, 39, 41, 43, 44, 45, 46, 84, 89, 93, 98, 98, 102, 107, 0, 0, 37, 84, 111, 116, 2, 0, 0, 47, 48, 49, 50, 51, 52, 53, 54, 0, 0, 2, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 27, 29, 31, 41, 43, 44, 2, 0, 0, 27, 29, 31, 33, 35, 37, 39, 41, 43, 44, 43, 44, 27, 29, 0, 17, 0, 0, 0, 0, 0, 2, 44, 54, 64, 0, 31, 33, 0, 0, 43, 44, 0, 0, 44, 54, 64, 74, 84, 85, 86, 87, 0, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 0, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 0, 35, 0, 0, 0, 0, 0, 29, 0, 0, 35, 0, 0, 39, 0, 0, 27, 0, 0, 39, 0, 0, 0, 107, 0, 31, 0, 0, 0, 43, 0, 0, 29, 0, 0, 0, 35, 0, 33, 0, 0, 0, 0, 128, 44, 0, 0, 0, 0, 0, 0, 98, 31, 0, 0, 0, 89, 0, 0, 0, 128, 0, 0, 0, 0, 0, 130, 0, 0, 29, 0, 41, 0, 37, 0, 0, 0, 44, 0, 98, 54, 64, 0, 0, 74, 0, 0, 0, 0, 31, 31, 31, 0, 0, 0, 33, 0, 0, 27, 0, 0, 0, 43, 54, 0, 0, 44, 0, 41, 0, 0, 0, 0, 0, 39, 0, 0, 0, 43, 0, 0, 0, 89, 0, 0, 0, 33, 0, 0, 0, 29, 0, 0, 98, 0, 0, 0, 0, 37, 0, 37, 0, 0, 0, 0, 0, 2, 0, 39, 41, 43, 2, 12, 17, 22, 3, 7, 10, 0, 0, 0, 0, 0, 31, 0, 0, 0, 44, 0, 37, 0, 37, 0, 44, 0, 0, 0, 0, 0, 27, 88, 89, 90, 91, 92, 93, 
94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 12, 17, 27, 35, 84, 93, 102, 111, 35, 44, 84, 89, 93, 98, 102, 35, 44, 84, 89, 93, 98, 107, 111, 44, 27, 27, 27, 29, 29, 29, 29, 35, 44, 44, 44, 44, 44, 64, 84, 84, 84, 84, 89, 91, 93, 93, 93, 93, 84, 17, 17, 21, 22, 0, 0, 0, 0, 0, 2, 12, 90, 91, 92, 93, 94, 95, 96, 97, 27, 35, 44, 84, 0, 88, 0, 0, 0, 0, 97, 0, 0, 27, 29, 44, 54, 89, 0, 0, 27, 29, 31, 44, 54, 89, 98, 107, 33, 35, 44, 54, 29, 31, 33, 33, 35, 44, 54, 89, 0, 0, 27, 44, 54, 89, 29, 31, 26, 17, 0, 0, 43, 44, 54, 64, 74, 84, 85, 86, 0, 0, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 119, 120, 122, 123, 124, 125, 126, 4, 9, 12, 13, 16, 17, 18, 21, 22, 24, 44, 54, 89, 98, 0, 27, 84, 0, 0, 27, 44, 54, 33, 44, 54, 89, 0, 0, 27, 35, 44, 84, 89, 98, 87, 88, 89, 90, 95, 96, 97, 17, 12, 13, 21, 0, 54, 64, 74, 84, 85, 86, 87, 88, 89, 98, 2, 27, 98, 0, 0, 0, 86, 87, 88, 0, 39, 41, 43, 33, 43, 27, 29, 31, 41, 43, 27, 29, 31, 33, 35, 29, 31, 31, 33, 35, 27, 29, 31, 31, 33, 35, 118, 121, 33, 35, 31, 31, 33, 33, 33, 33, 37, 39, 39, 39, 41, 41, 43, 43, 43, 43, 29, 31, 33, 35, 37, 27, 35, 35, 29, 31, 27, 29, 13, 21, 24, 13, 21, 7, 12, 9, 12, 12, 17, 13, 21, 74, 84, 33, 35, 37, 39, 41, 43, 0, 41, 43, 0, 44, 89, 107, 127, 128, 129, 130, 0, 0, 87, 88, 0, 0, 41, 43, 2, 27, 2, 2, 27, 29, 33, 0, 0, 0, 0, 0, 0, 64, 0, 33, 0, 0, 43, 0, 0, 0, }; /* Numeric_Value: 3228 bytes. 
*/ RE_UINT32 re_get_numeric_value(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 12; code = ch ^ (f << 12); pos = (RE_UINT32)re_numeric_value_stage_1[f] << 4; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_numeric_value_stage_2[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_numeric_value_stage_3[pos + f] << 3; f = code >> 2; code ^= f << 2; pos = (RE_UINT32)re_numeric_value_stage_4[pos + f] << 2; value = re_numeric_value_stage_5[pos + code]; return value; } /* Bidi_Mirrored. */ static RE_UINT8 re_bidi_mirrored_stage_1[] = { 0, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, }; static RE_UINT8 re_bidi_mirrored_stage_2[] = { 0, 1, 2, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 6, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, }; static RE_UINT8 re_bidi_mirrored_stage_3[] = { 0, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 3, 1, 1, 1, 1, 4, 5, 1, 6, 7, 8, 1, 9, 10, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 11, 1, 1, 1, 12, 1, 1, 1, 1, }; static RE_UINT8 re_bidi_mirrored_stage_4[] = { 0, 1, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 3, 3, 3, 3, 3, 5, 3, 3, 3, 3, 3, 6, 7, 8, 3, 3, 9, 3, 3, 10, 11, 12, 13, 14, 3, 3, 3, 3, 3, 3, 3, 3, 15, 3, 16, 3, 3, 3, 3, 3, 3, 17, 18, 19, 20, 21, 22, 3, 3, 3, 3, 23, 3, 3, 3, 3, 3, 3, 3, 24, 3, 3, 3, 3, 3, 3, 3, 3, 25, 3, 3, 26, 27, 3, 3, 3, 3, 3, 28, 29, 30, 31, 32, }; static RE_UINT8 re_bidi_mirrored_stage_5[] = { 0, 0, 0, 0, 0, 3, 0, 80, 0, 0, 0, 40, 0, 0, 0, 40, 0, 0, 0, 0, 0, 8, 0, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 60, 0, 0, 0, 24, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 96, 0, 0, 0, 0, 0, 0, 96, 0, 96, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 30, 63, 98, 188, 87, 248, 15, 250, 255, 31, 60, 128, 245, 207, 255, 255, 255, 159, 7, 1, 204, 255, 255, 193, 0, 62, 195, 255, 255, 63, 255, 255, 0, 15, 0, 0, 3, 6, 0, 0, 0, 0, 0, 0, 0, 255, 63, 0, 121, 59, 120, 112, 252, 255, 0, 0, 248, 255, 255, 249, 255, 
/* Note: re_bidi_mirrored_stage_5 is bit-packed. Each byte holds the
 * Bidi_Mirrored flag for 8 consecutive code points, least significant bit
 * first; re_get_bidi_mirrored() picks the byte with (pos >> 3) and the bit
 * with (pos & 0x7). */
255, 0, 1, 63, 194, 55, 31, 58, 3, 240, 51, 0, 252, 255, 223, 83, 122, 48, 112, 0, 0, 128, 1, 48, 188, 25, 254, 255, 255, 255, 255, 207, 191, 255, 255, 255, 255, 127, 80, 124, 112, 136, 47, 60, 54, 0, 48, 255, 3, 0, 0, 0, 255, 243, 15, 0, 0, 0, 0, 0, 0, 0, 126, 48, 0, 0, 0, 0, 3, 0, 80, 0, 0, 0, 40, 0, 0, 0, 168, 13, 0, 0, 0, 0, 0, 0, 8, 0, 0, 0, 0, 0, 0, 32, 0, 0, 0, 0, 0, 0, 128, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 8, 0, 0, 0, 0, 0, 0, 0, }; /* Bidi_Mirrored: 489 bytes. */ RE_UINT32 re_get_bidi_mirrored(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_bidi_mirrored_stage_1[f] << 4; f = code >> 12; code ^= f << 12; pos = (RE_UINT32)re_bidi_mirrored_stage_2[pos + f] << 3; f = code >> 9; code ^= f << 9; pos = (RE_UINT32)re_bidi_mirrored_stage_3[pos + f] << 3; f = code >> 6; code ^= f << 6; pos = (RE_UINT32)re_bidi_mirrored_stage_4[pos + f] << 6; pos += code; value = (re_bidi_mirrored_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Indic_Positional_Category. 
*/ static RE_UINT8 re_indic_positional_category_stage_1[] = { 0, 1, 1, 1, 1, 2, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_indic_positional_category_stage_2[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 0, 0, 0, 0, 0, 0, 9, 0, 10, 11, 12, 13, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 14, 15, 16, 17, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 18, 0, 0, 0, 0, 0, 19, 20, 21, 22, 23, 24, 25, 26, 0, 0, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_indic_positional_category_stage_3[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 2, 3, 4, 5, 0, 6, 0, 0, 7, 8, 9, 5, 0, 10, 0, 0, 7, 11, 0, 0, 12, 10, 0, 0, 7, 13, 0, 5, 0, 6, 0, 0, 14, 15, 16, 5, 0, 17, 0, 0, 18, 19, 9, 0, 0, 20, 0, 0, 21, 22, 23, 5, 0, 6, 0, 0, 14, 24, 25, 5, 0, 6, 0, 0, 18, 26, 9, 5, 0, 27, 0, 0, 0, 28, 29, 0, 27, 0, 0, 0, 30, 31, 0, 0, 0, 0, 0, 0, 32, 33, 0, 0, 0, 0, 34, 0, 35, 0, 0, 0, 36, 37, 38, 39, 40, 41, 0, 0, 0, 0, 0, 42, 43, 0, 44, 45, 46, 47, 48, 0, 0, 0, 0, 0, 0, 0, 49, 0, 49, 0, 50, 0, 50, 0, 0, 0, 51, 52, 53, 0, 0, 0, 0, 54, 55, 0, 0, 0, 0, 0, 0, 0, 56, 57, 0, 0, 0, 0, 58, 0, 0, 0, 59, 60, 61, 0, 0, 0, 0, 0, 0, 0, 0, 62, 0, 0, 63, 64, 0, 65, 66, 67, 0, 68, 0, 0, 0, 69, 70, 0, 0, 71, 72, 0, 0, 0, 0, 0, 0, 0, 0, 0, 73, 74, 75, 76, 0, 77, 0, 0, 0, 0, 0, 78, 0, 0, 79, 80, 0, 81, 82, 0, 0, 83, 0, 84, 70, 0, 0, 1, 0, 0, 85, 86, 0, 87, 0, 0, 0, 88, 89, 90, 0, 0, 91, 0, 0, 0, 92, 93, 0, 94, 95, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 96, 0, 97, 0, 0, 98, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 99, 0, 0, 100, 101, 0, 0, 0, 67, 0, 0, 102, 0, 0, 0, 0, 103, 0, 104, 105, 0, 0, 0, 106, 67, 0, 0, 107, 108, 0, 0, 0, 0, 0, 109, 110, 0, 0, 0, 0, 0, 0, 0, 0, 0, 111, 112, 0, 6, 0, 0, 18, 113, 9, 114, 115, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 116, 117, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 118, 119, 120, 121, 0, 0, 0, 0, 0, 122, 123, 0, 0, 0, 0, 0, 124, 125, 0, 0, 0, 0, 0, 126, 127, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_indic_positional_category_stage_4[] = { 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 3, 4, 5, 6, 7, 1, 2, 8, 5, 9, 10, 7, 1, 6, 0, 0, 0, 0, 0, 6, 0, 0, 0, 0, 0, 0, 10, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 4, 5, 6, 3, 11, 12, 13, 14, 0, 0, 0, 0, 15, 0, 0, 0, 0, 10, 2, 0, 0, 0, 0, 0, 0, 5, 3, 0, 10, 16, 10, 17, 0, 1, 0, 18, 0, 0, 0, 0, 0, 5, 6, 7, 10, 19, 15, 5, 0, 0, 0, 0, 0, 0, 0, 3, 20, 5, 6, 3, 11, 21, 13, 22, 0, 0, 0, 0, 19, 0, 0, 0, 0, 0, 16, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 2, 23, 0, 24, 12, 25, 26, 0, 2, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 8, 23, 1, 27, 1, 1, 0, 0, 0, 10, 3, 0, 0, 0, 0, 28, 8, 23, 19, 29, 30, 1, 0, 0, 0, 15, 23, 0, 0, 0, 0, 8, 5, 3, 24, 12, 25, 26, 0, 0, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 16, 0, 15, 8, 1, 3, 3, 4, 31, 32, 33, 20, 8, 1, 1, 6, 3, 0, 0, 34, 34, 35, 10, 1, 1, 1, 16, 20, 8, 1, 1, 6, 10, 3, 0, 34, 34, 36, 0, 1, 1, 1, 0, 0, 0, 0, 0, 6, 0, 0, 0, 0, 0, 18, 18, 10, 0, 0, 4, 18, 37, 6, 38, 38, 1, 1, 2, 37, 1, 3, 1, 0, 0, 18, 6, 6, 6, 6, 6, 18, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 3, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 15, 20, 17, 39, 1, 1, 17, 23, 2, 18, 3, 0, 0, 0, 8, 6, 0, 0, 6, 3, 8, 23, 15, 8, 8, 8, 0, 10, 1, 16, 0, 0, 0, 0, 0, 0, 40, 41, 2, 8, 8, 5, 15, 0, 0, 0, 0, 0, 8, 20, 0, 0, 17, 3, 0, 0, 0, 0, 0, 0, 17, 0, 0, 0, 0, 0, 0, 0, 0, 0, 20, 1, 17, 6, 42, 43, 24, 25, 2, 20, 1, 1, 1, 1, 10, 0, 0, 0, 0, 10, 0, 1, 40, 44, 45, 2, 8, 0, 0, 8, 40, 8, 8, 5, 17, 0, 0, 8, 8, 46, 34, 8, 35, 8, 8, 23, 0, 0, 0, 8, 0, 
0, 0, 0, 0, 0, 10, 39, 20, 0, 0, 0, 0, 11, 40, 1, 17, 6, 3, 15, 2, 20, 1, 17, 7, 40, 24, 24, 41, 1, 1, 1, 1, 16, 18, 1, 1, 23, 0, 0, 0, 0, 0, 0, 0, 2, 1, 6, 47, 48, 24, 25, 19, 23, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 10, 7, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 23, 0, 0, 0, 0, 0, 0, 15, 6, 17, 9, 1, 23, 6, 0, 0, 0, 0, 2, 1, 8, 20, 20, 1, 8, 0, 0, 0, 0, 0, 0, 0, 0, 8, 4, 49, 8, 7, 1, 1, 1, 24, 17, 0, 0, 0, 0, 1, 16, 50, 6, 6, 1, 6, 6, 2, 51, 51, 51, 52, 0, 18, 0, 0, 0, 16, 0, 0, 0, 0, 0, 0, 0, 0, 16, 0, 10, 0, 0, 0, 15, 5, 2, 0, 0, 0, 0, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 8, 8, 8, 8, 8, 8, 8, 3, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 18, 6, 0, 0, 0, 0, 18, 6, 17, 6, 7, 0, 10, 8, 1, 6, 24, 2, 8, 53, 0, 0, 0, 0, 0, 0, 0, 0, 0, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 10, 1, 17, 54, 41, 40, 55, 3, 0, 0, 0, 0, 0, 10, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 15, 2, 0, 2, 1, 56, 57, 58, 46, 35, 1, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 11, 7, 9, 0, 0, 15, 0, 0, 0, 0, 0, 0, 15, 20, 8, 40, 23, 5, 0, 59, 6, 10, 52, 0, 0, 6, 7, 0, 0, 0, 0, 17, 3, 0, 0, 20, 23, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 6, 6, 6, 1, 1, 16, 0, 0, 0, 0, 4, 5, 7, 2, 5, 3, 0, 0, 1, 16, 0, 0, 0, 0, 0, 0, 0, 0, 0, 10, 1, 6, 41, 38, 17, 3, 16, 0, 0, 0, 0, 0, 0, 18, 0, 0, 0, 0, 0, 0, 0, 15, 9, 6, 6, 6, 1, 19, 23, 0, 0, 0, 0, 10, 3, 0, 0, 0, 0, 0, 0, 0, 8, 5, 1, 30, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 10, 4, 5, 7, 1, 17, 3, 0, 0, 2, 8, 23, 11, 12, 13, 33, 0, 0, 8, 0, 1, 1, 1, 16, 0, 1, 1, 16, 0, 0, 0, 0, 0, 4, 5, 6, 6, 39, 60, 33, 26, 2, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 15, 9, 6, 6, 0, 49, 32, 1, 5, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 0, 8, 5, 6, 6, 7, 2, 20, 5, 16, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 10, 20, 9, 6, 1, 1, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 18, 10, 8, 1, 6, 41, 7, 1, 0, 0, }; static RE_UINT8 re_indic_positional_category_stage_5[] = { 0, 0, 5, 5, 5, 1, 6, 0, 1, 2, 1, 6, 6, 6, 6, 5, 1, 1, 2, 1, 0, 5, 0, 2, 2, 0, 0, 4, 4, 6, 0, 1, 5, 0, 5, 6, 0, 6, 5, 8, 1, 5, 9, 0, 10, 6, 
1, 0, 2, 2, 4, 4, 4, 5, 7, 0, 8, 1, 8, 0, 8, 8, 9, 2, 4, 10, 4, 1, 3, 3, 3, 1, 3, 0, 5, 7, 7, 7, 6, 2, 6, 1, 2, 5, 9, 10, 4, 2, 1, 8, 8, 5, 1, 3, 6, 11, 7, 12, 2, 9, 13, 6, 13, 13, 13, 0, 11, 0, 5, 2, 2, 6, 6, 3, 3, 5, 5, 3, 0, 13, 5, 9, }; /* Indic_Positional_Category: 1842 bytes. */ RE_UINT32 re_get_indic_positional_category(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 13; code = ch ^ (f << 13); pos = (RE_UINT32)re_indic_positional_category_stage_1[f] << 5; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_indic_positional_category_stage_2[pos + f] << 4; f = code >> 4; code ^= f << 4; pos = (RE_UINT32)re_indic_positional_category_stage_3[pos + f] << 3; f = code >> 1; code ^= f << 1; pos = (RE_UINT32)re_indic_positional_category_stage_4[pos + f] << 1; value = re_indic_positional_category_stage_5[pos + code]; return value; } /* Indic_Syllabic_Category. */ static RE_UINT8 re_indic_syllabic_category_stage_1[] = { 0, 1, 2, 2, 2, 3, 2, 2, 4, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, }; static RE_UINT8 re_indic_syllabic_category_stage_2[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 1, 1, 1, 1, 1, 10, 1, 11, 12, 13, 14, 1, 1, 1, 15, 1, 1, 1, 1, 16, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 17, 18, 19, 20, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 21, 1, 1, 1, 1, 1, 22, 23, 24, 25, 26, 27, 28, 29, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_indic_syllabic_category_stage_3[] = { 0, 
0, 1, 2, 0, 0, 0, 0, 0, 0, 3, 4, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 12, 20, 21, 15, 16, 22, 23, 24, 25, 26, 27, 28, 16, 29, 30, 0, 12, 31, 14, 15, 16, 29, 32, 33, 12, 34, 35, 36, 37, 38, 39, 40, 25, 0, 41, 42, 16, 43, 44, 45, 12, 0, 46, 42, 16, 47, 44, 48, 12, 49, 46, 42, 8, 50, 51, 52, 12, 53, 54, 55, 8, 56, 57, 58, 25, 59, 60, 8, 61, 62, 63, 2, 0, 0, 64, 65, 66, 67, 68, 69, 0, 0, 0, 0, 70, 71, 72, 8, 73, 74, 75, 76, 77, 78, 79, 0, 0, 0, 8, 8, 80, 81, 82, 83, 84, 85, 86, 87, 0, 0, 0, 0, 0, 0, 88, 89, 90, 89, 90, 91, 88, 92, 8, 8, 93, 94, 95, 96, 2, 0, 97, 61, 98, 99, 25, 8, 100, 101, 8, 8, 102, 103, 104, 2, 0, 0, 8, 105, 8, 8, 106, 107, 108, 109, 2, 2, 0, 0, 0, 0, 0, 0, 110, 90, 8, 111, 112, 2, 0, 0, 113, 8, 114, 115, 8, 8, 116, 117, 8, 8, 118, 119, 120, 0, 0, 0, 0, 0, 0, 0, 0, 121, 122, 123, 124, 125, 0, 0, 0, 0, 0, 126, 127, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 128, 0, 0, 0, 129, 8, 130, 0, 8, 131, 132, 133, 134, 135, 8, 136, 137, 2, 138, 122, 139, 8, 140, 8, 141, 142, 0, 0, 143, 8, 8, 144, 145, 2, 146, 147, 148, 8, 149, 150, 151, 2, 8, 152, 8, 8, 8, 153, 154, 0, 155, 156, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 157, 158, 159, 2, 160, 161, 8, 162, 163, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 164, 90, 8, 165, 166, 167, 168, 169, 170, 8, 8, 171, 0, 0, 0, 0, 172, 8, 173, 174, 0, 175, 8, 176, 177, 178, 8, 179, 180, 2, 181, 182, 183, 184, 185, 186, 0, 0, 0, 0, 187, 188, 189, 190, 8, 191, 192, 2, 193, 15, 16, 29, 32, 40, 194, 195, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 196, 8, 8, 197, 198, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 199, 8, 200, 201, 202, 203, 0, 0, 199, 8, 8, 204, 205, 2, 0, 0, 190, 8, 206, 207, 2, 0, 0, 0, 8, 208, 209, 210, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_indic_syllabic_category_stage_4[] = { 0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 3, 0, 4, 0, 0, 0, 5, 0, 0, 0, 0, 6, 0, 0, 7, 8, 8, 8, 8, 9, 10, 10, 10, 10, 10, 10, 10, 10, 11, 12, 13, 13, 13, 14, 15, 
16, 10, 10, 17, 18, 2, 2, 19, 8, 10, 10, 20, 21, 8, 22, 22, 9, 10, 10, 10, 10, 23, 10, 24, 25, 26, 12, 13, 27, 27, 28, 0, 29, 0, 30, 26, 0, 0, 0, 20, 21, 31, 32, 23, 33, 26, 34, 35, 29, 27, 36, 0, 0, 37, 24, 0, 18, 2, 2, 38, 39, 0, 0, 20, 21, 8, 40, 40, 9, 10, 10, 23, 37, 26, 12, 13, 41, 41, 36, 0, 0, 42, 0, 13, 27, 27, 36, 0, 43, 0, 30, 42, 0, 0, 0, 44, 21, 31, 19, 45, 46, 33, 23, 47, 48, 49, 25, 10, 10, 26, 43, 35, 43, 50, 36, 0, 29, 0, 0, 7, 21, 8, 45, 45, 9, 10, 10, 10, 10, 26, 51, 13, 50, 50, 36, 0, 52, 49, 0, 20, 21, 8, 45, 10, 37, 26, 12, 0, 52, 0, 53, 54, 0, 0, 0, 10, 10, 49, 51, 13, 50, 50, 55, 0, 29, 0, 32, 0, 0, 56, 57, 58, 21, 8, 8, 8, 31, 25, 10, 30, 10, 10, 42, 10, 49, 59, 29, 13, 60, 13, 13, 43, 0, 0, 0, 37, 10, 10, 10, 10, 10, 10, 49, 13, 13, 61, 0, 13, 41, 62, 63, 33, 64, 24, 42, 0, 10, 37, 10, 37, 65, 25, 33, 13, 13, 41, 66, 13, 67, 62, 68, 2, 2, 3, 10, 2, 2, 2, 2, 2, 69, 70, 0, 10, 10, 37, 10, 10, 10, 10, 48, 16, 13, 13, 71, 72, 73, 74, 75, 76, 76, 77, 76, 76, 76, 76, 76, 76, 76, 76, 78, 0, 79, 0, 0, 80, 8, 81, 13, 13, 82, 83, 84, 2, 2, 3, 85, 86, 17, 87, 88, 89, 90, 91, 92, 93, 94, 10, 10, 95, 96, 62, 97, 2, 2, 98, 99, 100, 10, 10, 23, 11, 101, 0, 0, 100, 10, 10, 10, 11, 0, 0, 0, 102, 0, 0, 0, 103, 8, 8, 8, 8, 43, 13, 13, 13, 71, 104, 105, 106, 0, 0, 107, 108, 10, 10, 10, 13, 13, 109, 0, 110, 111, 112, 0, 113, 114, 114, 115, 116, 117, 0, 0, 10, 10, 10, 0, 13, 13, 13, 13, 118, 111, 119, 0, 10, 120, 13, 0, 10, 10, 10, 80, 100, 121, 111, 122, 123, 13, 13, 13, 13, 91, 124, 125, 126, 127, 8, 8, 10, 128, 13, 13, 13, 129, 10, 0, 130, 8, 131, 10, 132, 13, 133, 134, 2, 2, 135, 136, 10, 137, 13, 13, 138, 0, 0, 0, 10, 139, 13, 118, 111, 140, 0, 0, 2, 2, 3, 37, 141, 142, 142, 142, 143, 0, 0, 0, 144, 145, 143, 0, 0, 0, 0, 146, 147, 4, 0, 0, 0, 148, 0, 0, 5, 148, 0, 0, 0, 0, 0, 4, 40, 149, 150, 10, 120, 13, 0, 0, 10, 10, 10, 151, 152, 153, 154, 10, 155, 0, 0, 0, 156, 8, 8, 8, 131, 10, 10, 10, 10, 157, 13, 13, 13, 158, 0, 0, 142, 142, 142, 142, 2, 2, 159, 10, 
151, 114, 160, 119, 10, 120, 13, 161, 162, 0, 0, 0, 163, 8, 9, 100, 164, 13, 13, 165, 158, 0, 0, 0, 10, 166, 10, 10, 2, 2, 159, 49, 8, 131, 10, 10, 10, 10, 93, 13, 167, 168, 0, 0, 111, 111, 111, 169, 37, 0, 170, 92, 13, 13, 13, 96, 171, 0, 0, 0, 131, 10, 120, 13, 0, 172, 0, 0, 10, 10, 10, 86, 173, 10, 174, 111, 175, 13, 35, 176, 93, 52, 0, 71, 10, 37, 37, 10, 10, 0, 177, 178, 2, 2, 0, 0, 179, 180, 8, 8, 10, 10, 13, 13, 13, 181, 0, 0, 182, 183, 183, 183, 183, 184, 2, 2, 0, 0, 0, 185, 186, 8, 8, 9, 13, 13, 187, 0, 186, 100, 10, 10, 10, 120, 13, 13, 188, 189, 2, 2, 114, 190, 10, 10, 164, 0, 0, 0, 186, 8, 8, 8, 9, 10, 10, 10, 120, 13, 13, 13, 191, 0, 192, 67, 193, 2, 2, 2, 2, 194, 0, 0, 8, 8, 10, 10, 30, 10, 10, 10, 10, 10, 10, 13, 13, 195, 0, 0, 8, 49, 23, 30, 10, 10, 10, 30, 10, 10, 48, 0, 8, 8, 131, 10, 10, 10, 10, 150, 13, 13, 196, 0, 7, 21, 8, 22, 17, 197, 142, 145, 142, 145, 0, 0, 21, 8, 8, 100, 13, 13, 13, 198, 199, 107, 0, 0, 8, 8, 8, 131, 10, 10, 10, 120, 13, 99, 13, 200, 201, 0, 0, 0, 0, 0, 8, 99, 13, 13, 13, 202, 67, 0, 0, 0, 10, 10, 150, 203, 13, 204, 0, 0, 10, 10, 26, 205, 13, 13, 206, 0, 2, 2, 2, 0, }; static RE_UINT8 re_indic_syllabic_category_stage_5[] = { 0, 0, 0, 0, 0, 11, 0, 0, 33, 33, 33, 33, 33, 33, 0, 0, 11, 0, 0, 0, 0, 0, 28, 28, 0, 0, 0, 11, 1, 1, 1, 2, 8, 8, 8, 8, 8, 12, 12, 12, 12, 12, 12, 12, 12, 12, 9, 9, 4, 3, 9, 9, 9, 9, 9, 9, 9, 5, 9, 9, 0, 26, 26, 0, 0, 9, 9, 9, 8, 8, 9, 9, 0, 0, 33, 33, 0, 0, 8, 8, 0, 1, 1, 2, 0, 8, 8, 8, 8, 0, 0, 8, 12, 0, 12, 12, 12, 0, 12, 0, 0, 0, 12, 12, 12, 12, 0, 0, 9, 0, 0, 9, 9, 5, 13, 0, 0, 0, 0, 9, 12, 12, 0, 12, 8, 8, 8, 0, 0, 0, 0, 8, 0, 12, 12, 0, 4, 0, 9, 9, 9, 9, 9, 0, 9, 5, 0, 0, 0, 12, 12, 12, 1, 25, 11, 11, 0, 19, 0, 0, 8, 8, 0, 8, 9, 9, 0, 9, 0, 12, 0, 0, 0, 0, 9, 9, 0, 0, 1, 22, 8, 0, 8, 8, 8, 12, 0, 0, 0, 0, 0, 12, 12, 0, 0, 0, 12, 12, 12, 0, 9, 0, 9, 9, 0, 3, 9, 9, 0, 9, 9, 0, 0, 0, 12, 0, 0, 14, 14, 0, 9, 5, 16, 0, 0, 0, 13, 13, 13, 13, 13, 13, 0, 0, 1, 2, 0, 0, 5, 0, 9, 0, 9, 0, 9, 9, 6, 0, 24, 
24, 24, 24, 29, 1, 6, 0, 12, 0, 0, 12, 0, 12, 0, 12, 19, 19, 0, 0, 9, 0, 0, 0, 0, 1, 0, 0, 0, 28, 0, 28, 0, 4, 0, 0, 9, 9, 1, 2, 9, 9, 1, 1, 6, 3, 0, 0, 21, 21, 21, 21, 21, 18, 18, 18, 18, 18, 18, 18, 0, 18, 18, 18, 18, 0, 0, 0, 0, 0, 28, 0, 12, 8, 8, 8, 8, 8, 8, 9, 9, 9, 1, 24, 2, 7, 6, 19, 19, 19, 19, 12, 0, 0, 11, 0, 12, 12, 8, 8, 9, 9, 12, 12, 12, 12, 19, 19, 19, 12, 9, 24, 24, 12, 12, 9, 9, 24, 24, 24, 24, 24, 12, 12, 12, 9, 9, 9, 9, 12, 12, 12, 12, 12, 19, 9, 9, 9, 9, 24, 24, 24, 12, 24, 33, 33, 24, 24, 9, 9, 0, 0, 8, 8, 8, 12, 6, 0, 0, 0, 12, 0, 9, 9, 12, 12, 12, 8, 9, 27, 27, 28, 17, 29, 28, 28, 28, 6, 7, 28, 3, 0, 0, 0, 11, 12, 12, 12, 9, 18, 18, 18, 20, 20, 1, 20, 20, 20, 20, 20, 20, 20, 9, 28, 12, 12, 12, 10, 10, 10, 10, 10, 10, 10, 0, 0, 23, 23, 23, 23, 23, 0, 0, 0, 9, 20, 20, 20, 24, 24, 0, 0, 12, 12, 12, 9, 12, 19, 19, 20, 20, 20, 20, 0, 7, 9, 9, 9, 24, 24, 28, 28, 28, 0, 0, 28, 1, 1, 1, 17, 2, 8, 8, 8, 4, 9, 9, 9, 5, 12, 12, 12, 1, 17, 2, 8, 8, 8, 12, 12, 12, 18, 18, 18, 9, 9, 6, 7, 18, 18, 12, 12, 33, 33, 3, 12, 12, 12, 20, 20, 8, 8, 4, 9, 20, 20, 6, 6, 18, 18, 9, 9, 1, 1, 28, 4, 26, 26, 26, 0, 26, 26, 26, 26, 26, 26, 0, 0, 0, 0, 2, 2, 26, 0, 0, 0, 30, 31, 0, 0, 11, 11, 11, 11, 28, 0, 0, 0, 8, 8, 6, 12, 12, 12, 12, 1, 12, 12, 10, 10, 10, 10, 12, 12, 12, 12, 10, 18, 18, 12, 12, 12, 12, 18, 12, 1, 1, 2, 8, 8, 20, 9, 9, 9, 5, 0, 0, 0, 33, 33, 12, 12, 10, 10, 10, 24, 9, 9, 9, 20, 20, 20, 20, 6, 1, 1, 17, 2, 12, 12, 12, 4, 9, 18, 19, 19, 12, 9, 0, 12, 9, 9, 9, 19, 19, 19, 19, 0, 20, 20, 0, 0, 0, 0, 12, 24, 23, 24, 23, 0, 0, 2, 7, 0, 12, 8, 12, 12, 12, 12, 12, 20, 20, 20, 20, 9, 24, 6, 0, 0, 4, 4, 4, 0, 0, 0, 0, 7, 1, 1, 2, 14, 14, 8, 8, 8, 9, 9, 5, 0, 0, 0, 34, 34, 34, 34, 34, 34, 34, 34, 33, 33, 0, 0, 0, 32, 1, 1, 2, 8, 9, 5, 4, 0, 9, 9, 9, 7, 6, 0, 33, 33, 10, 12, 12, 12, 5, 3, 15, 15, 0, 0, 4, 9, 0, 33, 33, 33, 33, 0, 0, 0, 1, 5, 4, 25, 9, 4, 6, 0, 0, 0, 26, 26, 9, 9, 9, 1, 1, 2, 5, 4, 1, 1, 2, 5, 4, 0, 0, 0, 9, 1, 2, 5, 2, 9, 9, 9, 9, 9, 5, 4, 0, 19, 
19, 19, 9, 9, 9, 6, }; /* Indic_Syllabic_Category: 2448 bytes. */ RE_UINT32 re_get_indic_syllabic_category(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 13; code = ch ^ (f << 13); pos = (RE_UINT32)re_indic_syllabic_category_stage_1[f] << 5; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_indic_syllabic_category_stage_2[pos + f] << 4; f = code >> 4; code ^= f << 4; pos = (RE_UINT32)re_indic_syllabic_category_stage_3[pos + f] << 2; f = code >> 2; code ^= f << 2; pos = (RE_UINT32)re_indic_syllabic_category_stage_4[pos + f] << 2; value = re_indic_syllabic_category_stage_5[pos + code]; return value; } /* Alphanumeric. */ static RE_UINT8 re_alphanumeric_stage_1[] = { 0, 1, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, }; static RE_UINT8 re_alphanumeric_stage_2[] = { 0, 1, 2, 3, 4, 5, 6, 7, 7, 8, 7, 7, 7, 7, 7, 7, 7, 7, 7, 9, 10, 11, 7, 7, 7, 7, 12, 13, 13, 13, 13, 14, 15, 16, 17, 18, 19, 13, 20, 13, 21, 13, 13, 13, 13, 22, 13, 13, 13, 13, 13, 13, 13, 13, 23, 24, 13, 13, 25, 13, 13, 26, 27, 13, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 28, 7, 29, 30, 7, 31, 13, 13, 13, 13, 13, 32, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, }; static RE_UINT8 re_alphanumeric_stage_3[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1, 17, 18, 19, 1, 20, 21, 22, 23, 24, 25, 26, 27, 1, 28, 29, 30, 31, 31, 32, 31, 31, 31, 31, 31, 31, 31, 33, 34, 35, 31, 36, 37, 31, 31, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 38, 1, 1, 1, 1, 1, 1, 1, 1, 1, 39, 1, 1, 1, 1, 40, 1, 41, 42, 43, 44, 45, 46, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 47, 31, 31, 31, 31, 31, 31, 31, 31, 31, 1, 48, 49, 1, 50, 51, 52, 53, 54, 55, 56, 57, 58, 1, 59, 60, 61, 62, 63, 64, 31, 31, 31, 65, 66, 67, 68, 69, 70, 71, 72, 73, 31, 74, 31, 31, 31, 31, 31, 1, 1, 1, 75, 76, 77, 31, 31, 1, 1, 1, 1, 78, 31, 31, 31, 31, 31, 31, 31, 1, 1, 79, 31, 1, 1, 80, 81, 31, 31, 31, 82, 83, 
31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 84, 31, 31, 31, 31, 31, 31, 31, 85, 86, 87, 88, 89, 31, 31, 31, 31, 31, 90, 31, 31, 91, 31, 31, 31, 31, 31, 31, 1, 1, 1, 1, 1, 1, 92, 1, 1, 1, 1, 1, 1, 1, 1, 93, 94, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 95, 31, 1, 1, 96, 31, 31, 31, 31, 31, }; static RE_UINT8 re_alphanumeric_stage_4[] = { 0, 1, 2, 2, 0, 3, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 7, 0, 0, 8, 9, 10, 11, 5, 12, 5, 5, 5, 5, 13, 5, 5, 5, 5, 14, 15, 16, 17, 18, 19, 20, 21, 5, 22, 23, 5, 5, 24, 25, 26, 5, 27, 5, 5, 28, 5, 29, 30, 31, 32, 0, 0, 33, 0, 34, 5, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 47, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 61, 65, 66, 67, 68, 69, 70, 71, 16, 72, 73, 0, 74, 75, 76, 0, 77, 78, 79, 80, 81, 82, 0, 0, 5, 83, 84, 85, 86, 5, 87, 88, 5, 5, 89, 5, 90, 91, 92, 5, 93, 5, 94, 0, 95, 5, 5, 96, 16, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 97, 2, 5, 5, 98, 99, 100, 100, 101, 5, 102, 103, 78, 1, 5, 5, 104, 5, 105, 5, 106, 107, 108, 109, 110, 5, 111, 112, 0, 113, 5, 107, 114, 112, 115, 0, 0, 5, 116, 117, 0, 5, 118, 5, 119, 5, 106, 120, 121, 0, 0, 0, 122, 5, 5, 5, 5, 5, 5, 0, 123, 96, 5, 124, 121, 5, 125, 126, 127, 0, 0, 0, 128, 129, 0, 0, 0, 130, 131, 132, 5, 133, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 134, 5, 78, 5, 135, 107, 5, 5, 5, 5, 136, 5, 87, 5, 137, 138, 139, 139, 5, 0, 140, 0, 0, 0, 0, 0, 0, 141, 142, 16, 5, 143, 16, 5, 88, 144, 145, 5, 5, 146, 72, 0, 26, 5, 5, 5, 5, 5, 106, 0, 0, 5, 5, 5, 5, 5, 5, 106, 0, 5, 5, 5, 5, 31, 0, 26, 121, 147, 148, 5, 149, 5, 5, 5, 95, 150, 151, 5, 5, 152, 153, 0, 150, 154, 17, 5, 100, 5, 5, 155, 156, 5, 105, 157, 82, 5, 158, 159, 160, 5, 138, 161, 162, 5, 107, 163, 164, 165, 166, 88, 167, 5, 5, 5, 168, 5, 5, 5, 5, 5, 169, 170, 113, 5, 5, 5, 171, 5, 5, 172, 0, 173, 174, 175, 5, 5, 28, 176, 5, 5, 121, 26, 5, 177, 5, 17, 178, 0, 0, 0, 179, 5, 5, 5, 82, 1, 2, 2, 109, 5, 107, 180, 0, 181, 182, 183, 0, 5, 5, 5, 72, 0, 0, 5, 33, 0, 0, 0, 0, 0, 0, 0, 0, 82, 5, 
184, 0, 5, 26, 105, 72, 121, 5, 185, 0, 5, 5, 5, 5, 121, 78, 0, 0, 5, 186, 5, 187, 0, 0, 0, 0, 5, 138, 106, 17, 0, 0, 0, 0, 188, 189, 106, 138, 107, 0, 0, 190, 106, 172, 0, 0, 5, 191, 0, 0, 192, 100, 0, 82, 82, 0, 79, 193, 5, 106, 106, 157, 28, 0, 0, 0, 5, 5, 133, 0, 5, 157, 5, 157, 5, 5, 194, 56, 151, 32, 26, 195, 5, 196, 26, 197, 5, 5, 198, 0, 199, 200, 0, 0, 201, 202, 5, 195, 38, 47, 203, 187, 0, 0, 0, 0, 0, 0, 0, 0, 5, 5, 204, 0, 0, 0, 0, 0, 5, 205, 206, 0, 5, 107, 207, 0, 5, 106, 78, 0, 208, 168, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5, 5, 209, 0, 0, 0, 0, 0, 0, 5, 32, 5, 5, 5, 5, 172, 0, 0, 0, 5, 5, 5, 146, 5, 5, 5, 5, 5, 5, 187, 0, 0, 0, 0, 0, 5, 146, 0, 0, 0, 0, 0, 0, 5, 5, 210, 0, 0, 0, 0, 0, 5, 32, 107, 78, 0, 0, 26, 211, 5, 138, 155, 212, 95, 0, 0, 0, 5, 5, 213, 107, 176, 0, 0, 0, 214, 0, 0, 0, 0, 0, 0, 0, 5, 5, 5, 215, 216, 0, 0, 0, 5, 5, 217, 5, 218, 219, 220, 5, 221, 222, 223, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 224, 225, 88, 217, 217, 135, 135, 226, 226, 227, 5, 5, 5, 5, 5, 5, 5, 193, 0, 220, 228, 229, 230, 231, 232, 0, 0, 0, 26, 84, 84, 78, 0, 0, 0, 5, 5, 5, 5, 5, 5, 138, 0, 5, 33, 5, 5, 5, 5, 5, 5, 121, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 214, 0, 0, 121, 0, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_alphanumeric_stage_5[] = { 0, 0, 0, 0, 0, 0, 255, 3, 254, 255, 255, 7, 0, 4, 32, 4, 255, 255, 127, 255, 255, 255, 255, 255, 195, 255, 3, 0, 31, 80, 0, 0, 32, 0, 0, 0, 0, 0, 223, 188, 64, 215, 255, 255, 251, 255, 255, 255, 255, 255, 191, 255, 3, 252, 255, 255, 255, 255, 254, 255, 255, 255, 127, 2, 254, 255, 255, 255, 255, 0, 0, 0, 0, 0, 255, 191, 182, 0, 255, 255, 255, 7, 7, 0, 0, 0, 255, 7, 255, 255, 255, 254, 255, 195, 255, 255, 255, 255, 239, 31, 254, 225, 255, 159, 0, 0, 255, 255, 0, 224, 255, 255, 255, 255, 3, 0, 255, 7, 48, 4, 255, 255, 255, 252, 255, 31, 0, 0, 255, 255, 255, 1, 255, 255, 31, 0, 248, 3, 255, 255, 255, 255, 255, 239, 255, 223, 225, 255, 207, 255, 254, 255, 239, 159, 249, 255, 255, 253, 197, 227, 159, 89, 128, 176, 207, 255, 3, 0, 238, 135, 249, 
255, 255, 253, 109, 195, 135, 25, 2, 94, 192, 255, 63, 0, 238, 191, 251, 255, 255, 253, 237, 227, 191, 27, 1, 0, 207, 255, 0, 2, 238, 159, 249, 255, 159, 25, 192, 176, 207, 255, 2, 0, 236, 199, 61, 214, 24, 199, 255, 195, 199, 29, 129, 0, 192, 255, 0, 0, 239, 223, 253, 255, 255, 253, 255, 227, 223, 29, 96, 7, 207, 255, 0, 0, 238, 223, 253, 255, 255, 253, 239, 227, 223, 29, 96, 64, 207, 255, 6, 0, 255, 255, 255, 231, 223, 93, 128, 128, 207, 255, 0, 252, 236, 255, 127, 252, 255, 255, 251, 47, 127, 128, 95, 255, 192, 255, 12, 0, 255, 255, 255, 7, 127, 32, 255, 3, 150, 37, 240, 254, 174, 236, 255, 59, 95, 32, 255, 243, 1, 0, 0, 0, 255, 3, 0, 0, 255, 254, 255, 255, 255, 31, 254, 255, 3, 255, 255, 254, 255, 255, 255, 31, 255, 255, 127, 249, 255, 3, 255, 255, 231, 193, 255, 255, 127, 64, 255, 51, 191, 32, 255, 255, 255, 255, 255, 247, 255, 61, 127, 61, 255, 61, 255, 255, 255, 255, 61, 127, 61, 255, 127, 255, 255, 255, 61, 255, 255, 255, 255, 135, 255, 255, 0, 0, 255, 255, 63, 63, 255, 159, 255, 255, 255, 199, 255, 1, 255, 223, 15, 0, 255, 255, 15, 0, 255, 223, 13, 0, 255, 255, 207, 255, 255, 1, 128, 16, 255, 255, 255, 0, 255, 7, 255, 255, 255, 255, 63, 0, 255, 255, 255, 127, 255, 15, 255, 1, 192, 255, 255, 255, 255, 63, 31, 0, 255, 15, 255, 255, 255, 3, 255, 3, 255, 255, 255, 15, 254, 255, 31, 0, 128, 0, 0, 0, 255, 255, 239, 255, 239, 15, 255, 3, 255, 243, 255, 255, 191, 255, 3, 0, 255, 227, 255, 255, 255, 255, 255, 63, 0, 222, 111, 0, 128, 255, 31, 0, 63, 63, 255, 170, 255, 255, 223, 95, 220, 31, 207, 15, 255, 31, 220, 31, 0, 0, 2, 128, 0, 0, 255, 31, 132, 252, 47, 62, 80, 189, 255, 243, 224, 67, 0, 0, 255, 1, 0, 0, 0, 0, 192, 255, 255, 127, 255, 255, 31, 120, 12, 0, 255, 128, 0, 0, 255, 255, 127, 0, 127, 127, 127, 127, 0, 128, 0, 0, 224, 0, 0, 0, 254, 3, 62, 31, 255, 255, 127, 224, 224, 255, 255, 255, 255, 63, 254, 255, 255, 127, 0, 0, 255, 31, 255, 255, 255, 15, 0, 0, 255, 127, 240, 143, 0, 0, 128, 255, 252, 255, 255, 255, 255, 249, 255, 255, 255, 63, 255, 0, 187, 247, 
255, 255, 15, 0, 255, 3, 0, 0, 252, 40, 255, 255, 7, 0, 255, 255, 247, 255, 0, 128, 255, 3, 223, 255, 255, 127, 255, 63, 255, 3, 255, 255, 127, 196, 5, 0, 0, 56, 255, 255, 60, 0, 126, 126, 126, 0, 127, 127, 255, 255, 63, 0, 255, 255, 255, 7, 255, 3, 15, 0, 255, 255, 127, 248, 255, 255, 255, 63, 255, 255, 255, 255, 255, 3, 127, 0, 248, 224, 255, 253, 127, 95, 219, 255, 255, 255, 0, 0, 248, 255, 255, 255, 252, 255, 0, 0, 255, 15, 0, 0, 223, 255, 252, 252, 252, 28, 255, 239, 255, 255, 127, 255, 255, 183, 255, 63, 255, 63, 255, 255, 1, 0, 15, 255, 62, 0, 255, 0, 255, 255, 15, 0, 0, 0, 63, 253, 255, 255, 255, 255, 191, 145, 255, 255, 55, 0, 255, 255, 255, 192, 111, 240, 239, 254, 31, 0, 0, 0, 63, 0, 0, 0, 255, 1, 255, 3, 255, 255, 199, 255, 255, 255, 71, 0, 30, 0, 255, 23, 255, 255, 251, 255, 255, 255, 159, 0, 127, 189, 255, 191, 255, 1, 255, 255, 159, 25, 129, 224, 179, 0, 255, 3, 255, 255, 63, 127, 0, 0, 0, 63, 17, 0, 255, 3, 255, 255, 255, 227, 255, 3, 0, 128, 127, 0, 0, 0, 255, 63, 0, 0, 248, 255, 255, 224, 31, 0, 255, 255, 3, 0, 0, 0, 255, 7, 255, 31, 255, 1, 255, 67, 255, 255, 223, 255, 255, 255, 255, 223, 100, 222, 255, 235, 239, 255, 255, 255, 191, 231, 223, 223, 255, 255, 255, 123, 95, 252, 253, 255, 63, 255, 255, 255, 253, 255, 255, 247, 255, 253, 255, 255, 247, 207, 255, 255, 150, 254, 247, 10, 132, 234, 150, 170, 150, 247, 247, 94, 255, 251, 255, 15, 238, 251, 255, 15, }; /* Alphanumeric: 2117 bytes. */ RE_UINT32 re_get_alphanumeric(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_alphanumeric_stage_1[f] << 5; f = code >> 11; code ^= f << 11; pos = (RE_UINT32)re_alphanumeric_stage_2[pos + f] << 3; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_alphanumeric_stage_3[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_alphanumeric_stage_4[pos + f] << 5; pos += code; value = (re_alphanumeric_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Any. 
*/ RE_UINT32 re_get_any(RE_UINT32 ch) { return 1; } /* Blank. */ static RE_UINT8 re_blank_stage_1[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_blank_stage_2[] = { 0, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, }; static RE_UINT8 re_blank_stage_3[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_blank_stage_4[] = { 0, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 4, 5, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_blank_stage_5[] = { 0, 2, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 255, 7, 0, 0, 0, 128, 0, 0, 0, 0, 0, 128, 0, 0, 0, 0, }; /* Blank: 169 bytes. */ RE_UINT32 re_get_blank(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_blank_stage_1[f] << 3; f = code >> 13; code ^= f << 13; pos = (RE_UINT32)re_blank_stage_2[pos + f] << 4; f = code >> 9; code ^= f << 9; pos = (RE_UINT32)re_blank_stage_3[pos + f] << 3; f = code >> 6; code ^= f << 6; pos = (RE_UINT32)re_blank_stage_4[pos + f] << 6; pos += code; value = (re_blank_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Graph. 
*/ static RE_UINT8 re_graph_stage_1[] = { 0, 1, 2, 3, 4, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 6, 4, 8, 4, 8, }; static RE_UINT8 re_graph_stage_2[] = { 0, 1, 2, 3, 4, 5, 6, 7, 7, 8, 7, 7, 7, 7, 7, 7, 7, 7, 7, 9, 10, 11, 7, 7, 7, 7, 12, 13, 7, 7, 7, 14, 15, 16, 17, 18, 19, 13, 20, 13, 21, 13, 13, 13, 13, 22, 13, 13, 13, 13, 13, 13, 13, 13, 23, 24, 13, 13, 25, 26, 13, 27, 28, 29, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 30, 7, 31, 32, 7, 33, 13, 13, 13, 13, 13, 34, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 35, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 36, }; static RE_UINT8 re_graph_stage_3[] = { 0, 1, 1, 2, 1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 1, 15, 16, 1, 1, 17, 18, 19, 20, 21, 22, 23, 24, 1, 25, 26, 27, 1, 28, 29, 1, 1, 1, 1, 1, 1, 30, 31, 32, 33, 34, 35, 36, 37, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 38, 1, 1, 1, 1, 1, 1, 1, 1, 1, 39, 1, 1, 1, 1, 40, 1, 41, 42, 43, 44, 45, 46, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 47, 48, 48, 48, 48, 48, 48, 48, 48, 1, 1, 49, 50, 1, 51, 52, 53, 54, 55, 56, 57, 58, 59, 1, 60, 61, 62, 63, 64, 65, 48, 66, 48, 67, 68, 69, 70, 71, 72, 73, 74, 75, 48, 76, 48, 48, 48, 48, 48, 1, 1, 1, 77, 78, 79, 48, 48, 1, 1, 1, 1, 80, 48, 48, 48, 48, 48, 48, 48, 1, 1, 81, 48, 1, 1, 82, 83, 48, 48, 48, 84, 85, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 86, 48, 48, 48, 87, 88, 89, 90, 91, 92, 93, 94, 1, 1, 95, 48, 48, 48, 48, 48, 96, 48, 48, 48, 48, 48, 97, 48, 98, 99, 100, 1, 1, 101, 102, 103, 104, 105, 48, 48, 48, 48, 48, 48, 1, 1, 1, 1, 1, 1, 106, 1, 1, 1, 1, 1, 1, 1, 1, 107, 108, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 109, 48, 1, 1, 110, 48, 48, 48, 48, 48, 111, 112, 48, 48, 48, 48, 48, 48, 1, 1, 1, 1, 1, 1, 1, 113, }; static RE_UINT8 re_graph_stage_4[] = { 0, 1, 2, 3, 0, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 4, 5, 6, 2, 2, 2, 7, 8, 1, 9, 2, 10, 11, 12, 2, 2, 2, 2, 2, 2, 2, 13, 2, 14, 2, 2, 
15, 2, 16, 2, 17, 18, 0, 0, 19, 0, 20, 2, 2, 2, 2, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 30, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 44, 48, 49, 50, 51, 52, 53, 54, 1, 55, 56, 0, 57, 58, 59, 0, 2, 2, 60, 61, 62, 12, 63, 0, 2, 2, 2, 2, 2, 2, 64, 2, 2, 2, 65, 2, 66, 67, 68, 2, 69, 2, 48, 70, 71, 2, 2, 72, 2, 2, 2, 2, 73, 2, 2, 74, 75, 76, 77, 78, 2, 2, 79, 80, 81, 2, 2, 82, 2, 83, 2, 84, 3, 85, 86, 87, 2, 88, 89, 2, 90, 2, 3, 91, 80, 17, 0, 0, 2, 2, 88, 70, 2, 2, 2, 92, 2, 93, 94, 2, 0, 0, 10, 95, 2, 2, 2, 2, 2, 2, 2, 96, 72, 2, 97, 79, 2, 98, 99, 100, 101, 102, 3, 103, 104, 3, 105, 106, 2, 2, 2, 2, 88, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 16, 2, 107, 108, 2, 2, 2, 2, 2, 2, 2, 2, 109, 110, 111, 112, 113, 2, 114, 3, 2, 2, 2, 2, 115, 2, 64, 2, 116, 76, 117, 117, 2, 2, 2, 118, 0, 119, 2, 2, 77, 2, 2, 2, 2, 2, 2, 84, 120, 1, 2, 1, 2, 8, 2, 2, 2, 121, 122, 2, 2, 114, 16, 2, 123, 3, 2, 2, 2, 2, 2, 2, 3, 2, 2, 2, 2, 2, 84, 2, 2, 2, 2, 2, 2, 2, 2, 84, 0, 2, 2, 2, 2, 124, 2, 125, 2, 2, 126, 2, 2, 2, 2, 2, 82, 2, 2, 2, 2, 2, 127, 0, 128, 2, 129, 2, 82, 2, 2, 130, 79, 2, 2, 131, 70, 2, 2, 132, 3, 2, 76, 133, 2, 2, 2, 134, 76, 135, 136, 2, 137, 2, 2, 2, 138, 2, 2, 2, 2, 2, 123, 139, 56, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2, 140, 2, 2, 71, 0, 141, 142, 143, 2, 2, 2, 144, 2, 2, 2, 105, 2, 145, 2, 146, 147, 71, 2, 148, 149, 2, 2, 2, 91, 1, 2, 2, 2, 2, 3, 150, 151, 152, 153, 154, 0, 2, 2, 2, 16, 155, 156, 2, 2, 157, 158, 105, 79, 0, 0, 0, 0, 70, 2, 106, 56, 2, 123, 83, 16, 159, 2, 160, 0, 2, 2, 2, 2, 79, 161, 0, 0, 2, 10, 2, 162, 0, 0, 0, 0, 2, 76, 84, 146, 0, 0, 0, 0, 163, 164, 165, 2, 3, 166, 0, 167, 168, 169, 0, 0, 2, 170, 145, 2, 171, 172, 173, 2, 2, 0, 2, 174, 2, 175, 110, 176, 177, 178, 0, 0, 2, 2, 179, 0, 2, 180, 2, 181, 0, 0, 0, 3, 0, 0, 0, 0, 2, 2, 182, 183, 2, 2, 184, 185, 2, 98, 123, 76, 2, 2, 140, 186, 187, 79, 0, 0, 188, 189, 2, 190, 21, 30, 191, 192, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 193, 0, 0, 0, 0, 0, 2, 110, 79, 0, 2, 2, 194, 0, 2, 82, 161, 0, 111, 88, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 195, 0, 0, 0, 0, 0, 0, 2, 74, 2, 2, 2, 2, 71, 0, 0, 0, 2, 2, 2, 196, 2, 2, 2, 2, 2, 2, 197, 0, 0, 0, 0, 0, 2, 198, 0, 0, 0, 0, 0, 0, 2, 2, 107, 0, 0, 0, 0, 0, 2, 74, 3, 199, 0, 0, 105, 200, 2, 2, 201, 202, 203, 0, 0, 0, 2, 2, 204, 3, 205, 0, 0, 0, 206, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2, 207, 208, 197, 0, 0, 2, 2, 2, 2, 2, 2, 2, 84, 2, 209, 2, 2, 2, 2, 2, 179, 2, 2, 210, 0, 0, 0, 0, 0, 2, 2, 76, 15, 0, 0, 0, 0, 2, 2, 98, 2, 12, 211, 212, 2, 213, 214, 215, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 216, 2, 2, 2, 2, 2, 2, 2, 2, 217, 2, 2, 2, 2, 2, 218, 219, 0, 0, 2, 2, 2, 2, 2, 2, 220, 0, 212, 221, 222, 223, 224, 225, 0, 226, 2, 88, 2, 2, 77, 227, 228, 84, 124, 114, 2, 88, 16, 0, 0, 229, 230, 16, 231, 0, 0, 0, 0, 0, 2, 2, 2, 119, 2, 212, 2, 2, 2, 2, 2, 2, 2, 2, 106, 232, 2, 2, 2, 77, 2, 2, 19, 0, 88, 2, 193, 2, 10, 233, 0, 0, 234, 0, 0, 0, 235, 0, 158, 0, 2, 2, 2, 2, 2, 2, 76, 0, 2, 19, 2, 2, 2, 2, 2, 2, 79, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 206, 0, 0, 79, 0, 0, 0, 0, 0, 0, 0, 236, 2, 2, 2, 0, 0, 0, 0, 2, 2, 2, 2, 2, 2, 2, 203, 2, 2, 2, 2, 2, 2, 2, 79, }; static RE_UINT8 re_graph_stage_5[] = { 0, 0, 0, 0, 254, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 127, 255, 255, 255, 252, 240, 215, 255, 255, 251, 255, 255, 255, 255, 255, 254, 255, 255, 255, 127, 254, 255, 230, 254, 255, 255, 0, 255, 255, 255, 7, 31, 0, 255, 255, 255, 223, 255, 191, 255, 255, 255, 231, 255, 255, 255, 255, 3, 0, 255, 255, 255, 7, 255, 63, 255, 127, 255, 255, 255, 79, 255, 255, 31, 0, 248, 255, 255, 255, 239, 159, 249, 255, 255, 253, 197, 243, 159, 121, 128, 176, 207, 255, 255, 15, 238, 135, 249, 255, 255, 253, 109, 211, 135, 57, 2, 94, 192, 255, 63, 0, 238, 191, 251, 255, 255, 253, 237, 243, 191, 59, 1, 0, 207, 255, 3, 2, 238, 159, 249, 255, 159, 57, 192, 176, 207, 255, 255, 0, 236, 199, 61, 214, 24, 199, 255, 195, 199, 61, 129, 0, 192, 255, 255, 7, 239, 223, 253, 255, 255, 253, 255, 227, 223, 61, 96, 7, 207, 255, 0, 255, 238, 223, 253, 255, 255, 253, 239, 243, 223, 
61, 96, 64, 207, 255, 6, 0, 255, 255, 255, 231, 223, 125, 128, 128, 207, 255, 63, 254, 236, 255, 127, 252, 255, 255, 251, 47, 127, 132, 95, 255, 192, 255, 28, 0, 255, 255, 255, 135, 255, 255, 255, 15, 150, 37, 240, 254, 174, 236, 255, 59, 95, 63, 255, 243, 255, 254, 255, 255, 255, 31, 254, 255, 255, 255, 255, 254, 255, 223, 255, 7, 191, 32, 255, 255, 255, 61, 127, 61, 255, 61, 255, 255, 255, 255, 61, 127, 61, 255, 127, 255, 255, 255, 61, 255, 255, 255, 255, 31, 255, 255, 255, 3, 255, 255, 63, 63, 254, 255, 255, 31, 255, 255, 255, 1, 255, 223, 31, 0, 255, 255, 127, 0, 255, 255, 15, 0, 255, 223, 13, 0, 255, 255, 255, 63, 255, 3, 255, 3, 255, 127, 255, 3, 255, 255, 255, 0, 255, 7, 255, 255, 255, 255, 63, 0, 255, 15, 255, 15, 241, 255, 255, 255, 255, 63, 31, 0, 255, 15, 255, 255, 255, 3, 255, 199, 255, 255, 255, 207, 255, 255, 255, 159, 255, 255, 15, 240, 255, 255, 255, 248, 255, 227, 255, 255, 255, 255, 127, 3, 255, 255, 63, 240, 63, 63, 255, 170, 255, 255, 223, 255, 223, 255, 207, 239, 255, 255, 220, 127, 0, 248, 255, 255, 255, 124, 255, 255, 223, 255, 243, 255, 255, 127, 255, 31, 0, 0, 255, 255, 255, 255, 1, 0, 127, 0, 0, 0, 255, 7, 0, 0, 255, 255, 207, 255, 255, 255, 63, 255, 255, 255, 255, 227, 255, 253, 3, 0, 0, 240, 0, 0, 255, 127, 255, 255, 255, 255, 15, 254, 255, 128, 1, 128, 127, 127, 127, 127, 7, 0, 0, 0, 255, 255, 255, 251, 0, 0, 255, 15, 224, 255, 255, 255, 255, 63, 254, 255, 15, 0, 255, 255, 255, 31, 255, 255, 127, 0, 255, 255, 255, 15, 0, 0, 255, 63, 255, 0, 0, 0, 128, 255, 255, 15, 255, 3, 31, 192, 255, 3, 255, 255, 15, 128, 255, 191, 255, 195, 255, 63, 255, 243, 7, 0, 0, 248, 126, 126, 126, 0, 127, 127, 255, 255, 63, 0, 255, 255, 255, 63, 255, 3, 127, 248, 255, 255, 255, 63, 255, 255, 127, 0, 248, 224, 255, 255, 127, 95, 219, 255, 255, 255, 3, 0, 248, 255, 255, 255, 252, 255, 255, 0, 0, 0, 0, 0, 255, 63, 255, 255, 247, 255, 127, 15, 223, 255, 252, 252, 252, 28, 127, 127, 0, 62, 255, 239, 255, 255, 127, 255, 255, 183, 255, 63, 255, 63, 135, 255, 255, 
255, 255, 255, 143, 255, 255, 31, 255, 15, 1, 0, 0, 0, 255, 255, 255, 191, 15, 255, 63, 0, 255, 3, 0, 0, 15, 128, 0, 0, 63, 253, 255, 255, 255, 255, 191, 145, 255, 255, 191, 255, 128, 255, 0, 0, 255, 255, 55, 248, 255, 255, 255, 143, 255, 255, 255, 131, 255, 255, 255, 240, 111, 240, 239, 254, 255, 255, 15, 135, 255, 0, 255, 1, 127, 248, 127, 0, 255, 255, 63, 254, 255, 255, 7, 255, 255, 255, 3, 30, 0, 254, 0, 0, 255, 1, 0, 0, 255, 255, 7, 0, 255, 255, 7, 252, 255, 63, 252, 255, 255, 255, 0, 128, 3, 0, 255, 255, 255, 1, 255, 3, 254, 255, 31, 0, 255, 255, 251, 255, 127, 189, 255, 191, 255, 3, 255, 255, 255, 7, 255, 3, 159, 57, 129, 224, 207, 31, 31, 0, 255, 0, 255, 3, 31, 0, 255, 3, 255, 255, 7, 128, 255, 127, 31, 0, 15, 0, 0, 0, 255, 127, 0, 0, 255, 195, 0, 0, 255, 63, 63, 0, 63, 0, 255, 251, 251, 255, 255, 224, 255, 255, 0, 0, 31, 0, 255, 255, 0, 128, 255, 255, 3, 0, 0, 0, 255, 7, 255, 31, 255, 1, 255, 243, 127, 254, 255, 255, 63, 0, 0, 0, 100, 222, 255, 235, 239, 255, 255, 255, 191, 231, 223, 223, 255, 255, 255, 123, 95, 252, 253, 255, 63, 255, 255, 255, 255, 207, 255, 255, 255, 15, 0, 248, 254, 255, 0, 0, 159, 255, 127, 0, 150, 254, 247, 10, 132, 234, 150, 170, 150, 247, 247, 94, 255, 251, 255, 15, 238, 251, 255, 15, 0, 0, 3, 0, 255, 127, 254, 255, 254, 255, 254, 255, 192, 255, 255, 255, 7, 0, 255, 255, 255, 1, 3, 0, 255, 31, 15, 0, 255, 63, 0, 0, 0, 0, 255, 1, 31, 0, 0, 0, 2, 0, 0, 0, }; /* Graph: 2334 bytes. */ RE_UINT32 re_get_graph(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 15; code = ch ^ (f << 15); pos = (RE_UINT32)re_graph_stage_1[f] << 4; f = code >> 11; code ^= f << 11; pos = (RE_UINT32)re_graph_stage_2[pos + f] << 3; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_graph_stage_3[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_graph_stage_4[pos + f] << 5; pos += code; value = (re_graph_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Print. 
*/ static RE_UINT8 re_print_stage_1[] = { 0, 1, 2, 3, 4, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 6, 4, 8, 4, 8, }; static RE_UINT8 re_print_stage_2[] = { 0, 1, 2, 3, 4, 5, 6, 7, 7, 8, 7, 7, 7, 7, 7, 7, 7, 7, 7, 9, 10, 11, 7, 7, 7, 7, 12, 13, 7, 7, 7, 14, 15, 16, 17, 18, 19, 13, 20, 13, 21, 13, 13, 13, 13, 22, 13, 13, 13, 13, 13, 13, 13, 13, 23, 24, 13, 13, 25, 26, 13, 27, 28, 29, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 30, 7, 31, 32, 7, 33, 13, 13, 13, 13, 13, 34, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 35, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 36, }; static RE_UINT8 re_print_stage_3[] = { 0, 1, 1, 2, 1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 1, 15, 16, 1, 1, 17, 18, 19, 20, 21, 22, 23, 24, 1, 25, 26, 27, 1, 28, 29, 1, 1, 1, 1, 1, 1, 30, 31, 32, 33, 34, 35, 36, 37, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 38, 1, 1, 1, 1, 1, 1, 1, 1, 1, 39, 1, 1, 1, 1, 40, 1, 41, 42, 43, 44, 45, 46, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 47, 48, 48, 48, 48, 48, 48, 48, 48, 1, 1, 49, 50, 1, 51, 52, 53, 54, 55, 56, 57, 58, 59, 1, 60, 61, 62, 63, 64, 65, 48, 66, 48, 67, 68, 69, 70, 71, 72, 73, 74, 75, 48, 76, 48, 48, 48, 48, 48, 1, 1, 1, 77, 78, 79, 48, 48, 1, 1, 1, 1, 80, 48, 48, 48, 48, 48, 48, 48, 1, 1, 81, 48, 1, 1, 82, 83, 48, 48, 48, 84, 85, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 86, 48, 48, 48, 87, 88, 89, 90, 91, 92, 93, 94, 1, 1, 95, 48, 48, 48, 48, 48, 96, 48, 48, 48, 48, 48, 97, 48, 98, 99, 100, 1, 1, 101, 102, 103, 104, 105, 48, 48, 48, 48, 48, 48, 1, 1, 1, 1, 1, 1, 106, 1, 1, 1, 1, 1, 1, 1, 1, 107, 108, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 109, 48, 1, 1, 110, 48, 48, 48, 48, 48, 111, 112, 48, 48, 48, 48, 48, 48, 1, 1, 1, 1, 1, 1, 1, 113, }; static RE_UINT8 re_print_stage_4[] = { 0, 1, 1, 2, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 4, 5, 1, 1, 1, 6, 7, 8, 9, 1, 10, 11, 12, 1, 1, 1, 1, 1, 1, 1, 13, 1, 14, 1, 1, 
15, 1, 16, 1, 17, 18, 0, 0, 19, 0, 20, 1, 1, 1, 1, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 30, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 44, 48, 49, 50, 51, 52, 53, 54, 8, 55, 56, 0, 57, 58, 59, 0, 1, 1, 60, 61, 62, 12, 63, 0, 1, 1, 1, 1, 1, 1, 64, 1, 1, 1, 65, 1, 66, 67, 68, 1, 69, 1, 48, 70, 71, 1, 1, 72, 1, 1, 1, 1, 70, 1, 1, 73, 74, 75, 76, 77, 1, 1, 78, 79, 80, 1, 1, 81, 1, 82, 1, 83, 2, 84, 85, 86, 1, 87, 88, 1, 89, 1, 2, 90, 79, 17, 0, 0, 1, 1, 87, 70, 1, 1, 1, 91, 1, 92, 93, 1, 0, 0, 10, 94, 1, 1, 1, 1, 1, 1, 1, 95, 72, 1, 96, 78, 1, 97, 98, 99, 1, 100, 1, 101, 102, 2, 103, 104, 1, 1, 1, 1, 87, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 16, 1, 105, 106, 1, 1, 1, 1, 1, 1, 1, 1, 107, 108, 109, 110, 111, 1, 112, 2, 1, 1, 1, 1, 113, 1, 64, 1, 114, 75, 115, 115, 1, 1, 1, 116, 0, 117, 1, 1, 76, 1, 1, 1, 1, 1, 1, 83, 118, 1, 1, 8, 1, 7, 1, 1, 1, 119, 120, 1, 1, 112, 16, 1, 121, 2, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 83, 1, 1, 1, 1, 1, 1, 1, 1, 83, 0, 1, 1, 1, 1, 122, 1, 123, 1, 1, 124, 1, 1, 1, 1, 1, 81, 1, 1, 1, 1, 1, 125, 0, 126, 1, 127, 1, 81, 1, 1, 128, 78, 1, 1, 129, 70, 1, 1, 130, 2, 1, 75, 131, 1, 1, 1, 132, 75, 133, 134, 1, 135, 1, 1, 1, 136, 1, 1, 1, 1, 1, 121, 137, 56, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 138, 1, 1, 71, 0, 139, 140, 141, 1, 1, 1, 142, 1, 1, 1, 103, 1, 143, 1, 144, 145, 71, 1, 146, 147, 1, 1, 1, 90, 8, 1, 1, 1, 1, 2, 148, 149, 150, 151, 152, 0, 1, 1, 1, 16, 153, 154, 1, 1, 155, 156, 103, 78, 0, 0, 0, 0, 70, 1, 104, 56, 1, 121, 82, 16, 157, 1, 158, 0, 1, 1, 1, 1, 78, 159, 0, 0, 1, 10, 1, 160, 0, 0, 0, 0, 1, 75, 83, 144, 0, 0, 0, 0, 161, 162, 163, 1, 2, 164, 0, 165, 166, 167, 0, 0, 1, 168, 143, 1, 169, 170, 171, 1, 1, 0, 1, 172, 1, 173, 108, 174, 175, 176, 0, 0, 1, 1, 177, 0, 1, 178, 1, 179, 0, 0, 0, 2, 0, 0, 0, 0, 1, 1, 180, 181, 1, 1, 182, 183, 1, 97, 121, 75, 1, 1, 138, 184, 185, 78, 0, 0, 186, 187, 1, 188, 21, 30, 189, 190, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 191, 0, 0, 0, 0, 0, 1, 108, 78, 0, 1, 1, 192, 0, 1, 81, 159, 0, 109, 87, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 193, 0, 0, 0, 0, 0, 0, 1, 73, 1, 1, 1, 1, 71, 0, 0, 0, 1, 1, 1, 194, 1, 1, 1, 1, 1, 1, 195, 0, 0, 0, 0, 0, 1, 196, 0, 0, 0, 0, 0, 0, 1, 1, 105, 0, 0, 0, 0, 0, 1, 73, 2, 197, 0, 0, 103, 198, 1, 1, 199, 200, 201, 0, 0, 0, 1, 1, 202, 2, 203, 0, 0, 0, 204, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 205, 206, 195, 0, 0, 1, 1, 1, 1, 1, 1, 1, 83, 1, 207, 1, 1, 1, 1, 1, 177, 1, 1, 208, 0, 0, 0, 0, 0, 1, 1, 75, 15, 0, 0, 0, 0, 1, 1, 97, 1, 12, 209, 210, 1, 211, 212, 213, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 214, 1, 1, 1, 1, 1, 1, 1, 1, 215, 1, 1, 1, 1, 1, 216, 217, 0, 0, 1, 1, 1, 1, 1, 1, 218, 0, 210, 219, 220, 221, 222, 223, 0, 224, 1, 87, 1, 1, 76, 225, 226, 83, 122, 112, 1, 87, 16, 0, 0, 227, 228, 16, 229, 0, 0, 0, 0, 0, 1, 1, 1, 117, 1, 210, 1, 1, 1, 1, 1, 1, 1, 1, 104, 230, 1, 1, 1, 76, 1, 1, 19, 0, 87, 1, 191, 1, 10, 231, 0, 0, 232, 0, 0, 0, 233, 0, 156, 0, 1, 1, 1, 1, 1, 1, 75, 0, 1, 19, 1, 1, 1, 1, 1, 1, 78, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 204, 0, 0, 78, 0, 0, 0, 0, 0, 0, 0, 234, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 201, 1, 1, 1, 1, 1, 1, 1, 78, }; static RE_UINT8 re_print_stage_5[] = { 0, 0, 0, 0, 255, 255, 255, 255, 255, 255, 255, 127, 255, 255, 255, 252, 240, 215, 255, 255, 251, 255, 255, 255, 255, 255, 254, 255, 255, 255, 127, 254, 254, 255, 255, 255, 255, 230, 254, 255, 255, 0, 255, 255, 255, 7, 31, 0, 255, 255, 255, 223, 255, 191, 255, 255, 255, 231, 255, 255, 255, 255, 3, 0, 255, 255, 255, 7, 255, 63, 255, 127, 255, 255, 255, 79, 255, 255, 31, 0, 248, 255, 255, 255, 239, 159, 249, 255, 255, 253, 197, 243, 159, 121, 128, 176, 207, 255, 255, 15, 238, 135, 249, 255, 255, 253, 109, 211, 135, 57, 2, 94, 192, 255, 63, 0, 238, 191, 251, 255, 255, 253, 237, 243, 191, 59, 1, 0, 207, 255, 3, 2, 238, 159, 249, 255, 159, 57, 192, 176, 207, 255, 255, 0, 236, 199, 61, 214, 24, 199, 255, 195, 199, 61, 129, 0, 192, 255, 255, 7, 239, 223, 253, 255, 255, 253, 255, 227, 223, 61, 96, 7, 207, 255, 0, 255, 238, 223, 253, 255, 255, 253, 239, 243, 223, 61, 
96, 64, 207, 255, 6, 0, 255, 255, 255, 231, 223, 125, 128, 128, 207, 255, 63, 254, 236, 255, 127, 252, 255, 255, 251, 47, 127, 132, 95, 255, 192, 255, 28, 0, 255, 255, 255, 135, 255, 255, 255, 15, 150, 37, 240, 254, 174, 236, 255, 59, 95, 63, 255, 243, 255, 254, 255, 255, 255, 31, 254, 255, 255, 255, 255, 254, 255, 223, 255, 7, 191, 32, 255, 255, 255, 61, 127, 61, 255, 61, 255, 255, 255, 255, 61, 127, 61, 255, 127, 255, 255, 255, 61, 255, 255, 255, 255, 31, 255, 255, 255, 3, 255, 255, 63, 63, 255, 255, 255, 1, 255, 223, 31, 0, 255, 255, 127, 0, 255, 255, 15, 0, 255, 223, 13, 0, 255, 255, 255, 63, 255, 3, 255, 3, 255, 127, 255, 3, 255, 255, 255, 0, 255, 7, 255, 255, 255, 255, 63, 0, 255, 15, 255, 15, 241, 255, 255, 255, 255, 63, 31, 0, 255, 15, 255, 255, 255, 3, 255, 199, 255, 255, 255, 207, 255, 255, 255, 159, 255, 255, 15, 240, 255, 255, 255, 248, 255, 227, 255, 255, 255, 255, 127, 3, 255, 255, 63, 240, 63, 63, 255, 170, 255, 255, 223, 255, 223, 255, 207, 239, 255, 255, 220, 127, 255, 252, 255, 255, 223, 255, 243, 255, 255, 127, 255, 31, 0, 0, 255, 255, 255, 255, 1, 0, 127, 0, 0, 0, 255, 7, 0, 0, 255, 255, 207, 255, 255, 255, 63, 255, 255, 255, 255, 227, 255, 253, 3, 0, 0, 240, 0, 0, 255, 127, 255, 255, 255, 255, 15, 254, 255, 128, 1, 128, 127, 127, 127, 127, 7, 0, 0, 0, 255, 255, 255, 251, 0, 0, 255, 15, 224, 255, 255, 255, 255, 63, 254, 255, 15, 0, 255, 255, 255, 31, 255, 255, 127, 0, 255, 255, 255, 15, 0, 0, 255, 63, 255, 0, 0, 0, 128, 255, 255, 15, 255, 3, 31, 192, 255, 3, 255, 255, 15, 128, 255, 191, 255, 195, 255, 63, 255, 243, 7, 0, 0, 248, 126, 126, 126, 0, 127, 127, 255, 255, 63, 0, 255, 255, 255, 63, 255, 3, 127, 248, 255, 255, 255, 63, 255, 255, 127, 0, 248, 224, 255, 255, 127, 95, 219, 255, 255, 255, 3, 0, 248, 255, 255, 255, 252, 255, 255, 0, 0, 0, 0, 0, 255, 63, 255, 255, 247, 255, 127, 15, 223, 255, 252, 252, 252, 28, 127, 127, 0, 62, 255, 239, 255, 255, 127, 255, 255, 183, 255, 63, 255, 63, 135, 255, 255, 255, 255, 255, 143, 255, 255, 31, 255, 15, 
1, 0, 0, 0, 255, 255, 255, 191, 15, 255, 63, 0, 255, 3, 0, 0, 15, 128, 0, 0, 63, 253, 255, 255, 255, 255, 191, 145, 255, 255, 191, 255, 128, 255, 0, 0, 255, 255, 55, 248, 255, 255, 255, 143, 255, 255, 255, 131, 255, 255, 255, 240, 111, 240, 239, 254, 255, 255, 15, 135, 255, 0, 255, 1, 127, 248, 127, 0, 255, 255, 63, 254, 255, 255, 7, 255, 255, 255, 3, 30, 0, 254, 0, 0, 255, 1, 0, 0, 255, 255, 7, 0, 255, 255, 7, 252, 255, 63, 252, 255, 255, 255, 0, 128, 3, 0, 255, 255, 255, 1, 255, 3, 254, 255, 31, 0, 255, 255, 251, 255, 127, 189, 255, 191, 255, 3, 255, 255, 255, 7, 255, 3, 159, 57, 129, 224, 207, 31, 31, 0, 255, 0, 255, 3, 31, 0, 255, 3, 255, 255, 7, 128, 255, 127, 31, 0, 15, 0, 0, 0, 255, 127, 0, 0, 255, 195, 0, 0, 255, 63, 63, 0, 63, 0, 255, 251, 251, 255, 255, 224, 255, 255, 0, 0, 31, 0, 255, 255, 0, 128, 255, 255, 3, 0, 0, 0, 255, 7, 255, 31, 255, 1, 255, 243, 127, 254, 255, 255, 63, 0, 0, 0, 100, 222, 255, 235, 239, 255, 255, 255, 191, 231, 223, 223, 255, 255, 255, 123, 95, 252, 253, 255, 63, 255, 255, 255, 255, 207, 255, 255, 255, 15, 0, 248, 254, 255, 0, 0, 159, 255, 127, 0, 150, 254, 247, 10, 132, 234, 150, 170, 150, 247, 247, 94, 255, 251, 255, 15, 238, 251, 255, 15, 0, 0, 3, 0, 255, 127, 254, 255, 254, 255, 254, 255, 192, 255, 255, 255, 7, 0, 255, 255, 255, 1, 3, 0, 255, 31, 15, 0, 255, 63, 0, 0, 0, 0, 255, 1, 31, 0, 0, 0, 2, 0, 0, 0, }; /* Print: 2326 bytes. */ RE_UINT32 re_get_print(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 15; code = ch ^ (f << 15); pos = (RE_UINT32)re_print_stage_1[f] << 4; f = code >> 11; code ^= f << 11; pos = (RE_UINT32)re_print_stage_2[pos + f] << 3; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_print_stage_3[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_print_stage_4[pos + f] << 5; pos += code; value = (re_print_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Word. 
*/ static RE_UINT8 re_word_stage_1[] = { 0, 1, 2, 3, 4, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 6, 6, 6, 6, 6, }; static RE_UINT8 re_word_stage_2[] = { 0, 1, 2, 3, 4, 5, 6, 7, 7, 8, 7, 7, 7, 7, 7, 7, 7, 7, 7, 9, 10, 11, 7, 7, 7, 7, 12, 13, 13, 13, 13, 14, 15, 16, 17, 18, 19, 13, 20, 13, 21, 13, 13, 13, 13, 22, 13, 13, 13, 13, 13, 13, 13, 13, 23, 24, 13, 13, 25, 26, 13, 27, 28, 13, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 29, 7, 30, 31, 7, 32, 13, 13, 13, 13, 13, 33, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 34, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, }; static RE_UINT8 re_word_stage_3[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1, 17, 18, 19, 1, 20, 21, 22, 23, 24, 25, 26, 27, 1, 28, 29, 30, 31, 31, 32, 31, 31, 31, 31, 31, 31, 31, 33, 34, 35, 31, 36, 37, 31, 31, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 38, 1, 1, 1, 1, 1, 1, 1, 1, 1, 39, 1, 1, 1, 1, 40, 1, 41, 42, 43, 44, 45, 46, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 47, 31, 31, 31, 31, 31, 31, 31, 31, 31, 1, 48, 49, 1, 50, 51, 52, 53, 54, 55, 56, 57, 58, 1, 59, 60, 61, 62, 63, 64, 31, 31, 31, 65, 66, 67, 68, 69, 70, 71, 72, 73, 31, 74, 31, 31, 31, 31, 31, 1, 1, 1, 75, 76, 77, 31, 31, 1, 1, 1, 1, 78, 31, 31, 31, 31, 31, 31, 31, 1, 1, 79, 31, 1, 1, 80, 81, 31, 31, 31, 82, 83, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 84, 31, 31, 31, 31, 85, 86, 31, 87, 88, 89, 90, 31, 31, 91, 31, 31, 31, 31, 31, 92, 31, 31, 31, 31, 31, 93, 31, 31, 94, 31, 31, 31, 31, 31, 31, 1, 1, 1, 1, 1, 1, 95, 1, 1, 1, 1, 1, 1, 1, 1, 96, 97, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 98, 31, 1, 1, 99, 31, 31, 31, 31, 31, 31, 100, 31, 31, 31, 31, 31, 31, }; static RE_UINT8 re_word_stage_4[] = { 0, 1, 2, 3, 0, 4, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 8, 6, 6, 6, 9, 10, 11, 6, 12, 6, 6, 6, 6, 11, 6, 6, 6, 6, 13, 14, 15, 16, 17, 18, 19, 20, 6, 6, 21, 6, 6, 22, 23, 24, 6, 25, 6, 6, 26, 6, 27, 6, 28, 29, 0, 0, 30, 0, 31, 6, 6, 
6, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 42, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 56, 60, 61, 62, 63, 64, 65, 66, 15, 67, 68, 0, 69, 70, 71, 0, 72, 73, 74, 75, 76, 77, 78, 0, 6, 6, 79, 6, 80, 6, 81, 82, 6, 6, 83, 6, 84, 85, 86, 6, 87, 6, 60, 0, 88, 6, 6, 89, 15, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 90, 3, 6, 6, 91, 92, 30, 93, 94, 6, 6, 95, 96, 97, 6, 6, 98, 6, 99, 6, 100, 101, 102, 103, 104, 6, 105, 106, 0, 29, 6, 101, 107, 106, 108, 0, 0, 6, 6, 109, 110, 6, 6, 6, 93, 6, 98, 111, 80, 0, 0, 112, 113, 6, 6, 6, 6, 6, 6, 6, 114, 89, 6, 115, 80, 6, 116, 117, 118, 119, 120, 121, 122, 123, 0, 24, 124, 125, 126, 127, 6, 128, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 129, 6, 96, 6, 130, 101, 6, 6, 6, 6, 131, 6, 81, 6, 132, 133, 134, 134, 6, 0, 135, 0, 0, 0, 0, 0, 0, 136, 137, 15, 6, 138, 15, 6, 82, 139, 140, 6, 6, 141, 67, 0, 24, 6, 6, 6, 6, 6, 100, 0, 0, 6, 6, 6, 6, 6, 6, 100, 0, 6, 6, 6, 6, 142, 0, 24, 80, 143, 144, 6, 145, 6, 6, 6, 26, 146, 147, 6, 6, 148, 149, 0, 146, 6, 150, 6, 93, 6, 6, 151, 152, 6, 153, 93, 77, 6, 6, 154, 101, 6, 133, 155, 156, 6, 6, 157, 158, 159, 160, 82, 161, 6, 6, 6, 162, 6, 6, 6, 6, 6, 163, 164, 29, 6, 6, 6, 153, 6, 6, 165, 0, 166, 167, 168, 6, 6, 26, 169, 6, 6, 80, 24, 6, 170, 6, 150, 171, 88, 172, 173, 174, 6, 6, 6, 77, 1, 2, 3, 103, 6, 101, 175, 0, 176, 177, 178, 0, 6, 6, 6, 67, 0, 0, 6, 30, 0, 0, 0, 179, 0, 0, 0, 0, 77, 6, 124, 180, 6, 24, 99, 67, 80, 6, 181, 0, 6, 6, 6, 6, 80, 96, 0, 0, 6, 182, 6, 183, 0, 0, 0, 0, 6, 133, 100, 150, 0, 0, 0, 0, 184, 185, 100, 133, 101, 0, 0, 186, 100, 165, 0, 0, 6, 187, 0, 0, 188, 189, 0, 77, 77, 0, 74, 190, 6, 100, 100, 191, 26, 0, 0, 0, 6, 6, 128, 0, 6, 191, 6, 191, 6, 6, 190, 192, 6, 67, 24, 193, 6, 194, 24, 195, 6, 6, 196, 0, 197, 98, 0, 0, 198, 199, 6, 200, 33, 42, 201, 202, 0, 0, 0, 0, 0, 0, 0, 0, 6, 6, 203, 0, 0, 0, 0, 0, 6, 204, 205, 0, 6, 6, 206, 0, 6, 98, 96, 0, 207, 109, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 6, 208, 0, 0, 0, 0, 0, 0, 6, 209, 6, 6, 6, 6, 165, 0, 
0, 0, 6, 6, 6, 141, 6, 6, 6, 6, 6, 6, 183, 0, 0, 0, 0, 0, 6, 141, 0, 0, 0, 0, 0, 0, 6, 6, 190, 0, 0, 0, 0, 0, 6, 209, 101, 96, 0, 0, 24, 104, 6, 133, 210, 211, 88, 0, 0, 0, 6, 6, 212, 101, 213, 0, 0, 0, 214, 0, 0, 0, 0, 0, 0, 0, 6, 6, 6, 215, 216, 0, 0, 0, 0, 0, 0, 217, 218, 219, 0, 0, 0, 0, 220, 0, 0, 0, 0, 0, 6, 6, 194, 6, 221, 222, 223, 6, 224, 225, 226, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 227, 228, 82, 194, 194, 130, 130, 229, 229, 230, 6, 6, 231, 6, 232, 233, 234, 0, 0, 6, 6, 6, 6, 6, 6, 235, 0, 223, 236, 237, 238, 239, 240, 0, 0, 0, 24, 79, 79, 96, 0, 0, 0, 6, 6, 6, 6, 6, 6, 133, 0, 6, 30, 6, 6, 6, 6, 6, 6, 80, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 214, 0, 0, 80, 0, 0, 0, 0, 0, 0, 0, 6, 6, 6, 6, 6, 6, 6, 88, }; static RE_UINT8 re_word_stage_5[] = { 0, 0, 0, 0, 0, 0, 255, 3, 254, 255, 255, 135, 254, 255, 255, 7, 0, 4, 32, 4, 255, 255, 127, 255, 255, 255, 255, 255, 195, 255, 3, 0, 31, 80, 0, 0, 255, 255, 223, 188, 64, 215, 255, 255, 251, 255, 255, 255, 255, 255, 191, 255, 255, 255, 254, 255, 255, 255, 127, 2, 254, 255, 255, 255, 255, 0, 254, 255, 255, 255, 255, 191, 182, 0, 255, 255, 255, 7, 7, 0, 0, 0, 255, 7, 255, 195, 255, 255, 255, 255, 239, 159, 255, 253, 255, 159, 0, 0, 255, 255, 255, 231, 255, 255, 255, 255, 3, 0, 255, 255, 63, 4, 255, 63, 0, 0, 255, 255, 255, 15, 255, 255, 31, 0, 248, 255, 255, 255, 207, 255, 254, 255, 239, 159, 249, 255, 255, 253, 197, 243, 159, 121, 128, 176, 207, 255, 3, 0, 238, 135, 249, 255, 255, 253, 109, 211, 135, 57, 2, 94, 192, 255, 63, 0, 238, 191, 251, 255, 255, 253, 237, 243, 191, 59, 1, 0, 207, 255, 0, 2, 238, 159, 249, 255, 159, 57, 192, 176, 207, 255, 2, 0, 236, 199, 61, 214, 24, 199, 255, 195, 199, 61, 129, 0, 192, 255, 0, 0, 239, 223, 253, 255, 255, 253, 255, 227, 223, 61, 96, 7, 207, 255, 0, 0, 238, 223, 253, 255, 255, 253, 239, 243, 223, 61, 96, 64, 207, 255, 6, 0, 255, 255, 255, 231, 223, 125, 128, 128, 207, 255, 0, 252, 236, 255, 127, 252, 255, 255, 251, 47, 127, 132, 95, 255, 192, 255, 12, 0, 255, 255, 255, 7, 255, 127, 
255, 3, 150, 37, 240, 254, 174, 236, 255, 59, 95, 63, 255, 243, 1, 0, 0, 3, 255, 3, 160, 194, 255, 254, 255, 255, 255, 31, 254, 255, 223, 255, 255, 254, 255, 255, 255, 31, 64, 0, 0, 0, 255, 3, 255, 255, 255, 255, 255, 63, 191, 32, 255, 255, 255, 255, 255, 247, 255, 61, 127, 61, 255, 61, 255, 255, 255, 255, 61, 127, 61, 255, 127, 255, 255, 255, 61, 255, 255, 255, 0, 0, 255, 255, 63, 63, 255, 159, 255, 255, 255, 199, 255, 1, 255, 223, 31, 0, 255, 255, 15, 0, 255, 223, 13, 0, 255, 255, 143, 48, 255, 3, 0, 0, 0, 56, 255, 3, 255, 255, 255, 0, 255, 7, 255, 255, 255, 255, 63, 0, 255, 255, 255, 127, 255, 15, 255, 15, 192, 255, 255, 255, 255, 63, 31, 0, 255, 15, 255, 255, 255, 3, 255, 3, 255, 255, 255, 159, 128, 0, 255, 127, 255, 15, 255, 3, 0, 248, 15, 0, 255, 227, 255, 255, 0, 0, 247, 255, 255, 255, 127, 3, 255, 255, 63, 240, 63, 63, 255, 170, 255, 255, 223, 95, 220, 31, 207, 15, 255, 31, 220, 31, 0, 48, 0, 0, 0, 0, 0, 128, 1, 0, 16, 0, 0, 0, 2, 128, 0, 0, 255, 31, 255, 255, 1, 0, 132, 252, 47, 62, 80, 189, 255, 243, 224, 67, 0, 0, 255, 1, 0, 0, 0, 0, 192, 255, 255, 127, 255, 255, 31, 248, 15, 0, 255, 128, 0, 128, 255, 255, 127, 0, 127, 127, 127, 127, 0, 128, 0, 0, 224, 0, 0, 0, 254, 255, 62, 31, 255, 255, 127, 230, 224, 255, 255, 255, 255, 63, 254, 255, 255, 127, 0, 0, 255, 31, 0, 0, 255, 31, 255, 255, 255, 15, 0, 0, 255, 255, 247, 191, 0, 0, 128, 255, 252, 255, 255, 255, 255, 249, 255, 255, 255, 63, 255, 0, 255, 0, 0, 0, 31, 0, 255, 3, 255, 255, 255, 40, 255, 63, 255, 255, 1, 128, 255, 3, 255, 63, 255, 3, 255, 255, 127, 252, 7, 0, 0, 56, 255, 255, 124, 0, 126, 126, 126, 0, 127, 127, 255, 255, 63, 0, 255, 255, 255, 55, 255, 3, 15, 0, 255, 255, 127, 248, 255, 255, 255, 255, 255, 3, 127, 0, 248, 224, 255, 253, 127, 95, 219, 255, 255, 255, 0, 0, 248, 255, 255, 255, 252, 255, 0, 0, 255, 15, 255, 255, 24, 0, 0, 224, 0, 0, 0, 0, 223, 255, 252, 252, 252, 28, 255, 239, 255, 255, 127, 255, 255, 183, 255, 63, 255, 63, 0, 0, 0, 32, 1, 0, 0, 0, 15, 255, 62, 0, 255, 0, 255, 255, 15, 
0, 0, 0, 63, 253, 255, 255, 255, 255, 191, 145, 255, 255, 55, 0, 255, 255, 255, 192, 111, 240, 239, 254, 255, 255, 15, 135, 127, 0, 0, 0, 255, 255, 7, 0, 192, 255, 0, 128, 255, 1, 255, 3, 255, 255, 223, 255, 255, 255, 79, 0, 31, 28, 255, 23, 255, 255, 251, 255, 127, 189, 255, 191, 255, 1, 255, 255, 255, 7, 255, 3, 159, 57, 129, 224, 207, 31, 31, 0, 191, 0, 255, 3, 255, 255, 63, 255, 1, 0, 0, 63, 17, 0, 255, 3, 255, 255, 255, 227, 255, 3, 0, 128, 255, 255, 255, 1, 15, 0, 255, 3, 248, 255, 255, 224, 31, 0, 255, 255, 0, 128, 255, 255, 3, 0, 0, 0, 255, 7, 255, 31, 255, 1, 255, 99, 224, 227, 7, 248, 231, 15, 0, 0, 0, 60, 0, 0, 28, 0, 0, 0, 255, 255, 255, 223, 100, 222, 255, 235, 239, 255, 255, 255, 191, 231, 223, 223, 255, 255, 255, 123, 95, 252, 253, 255, 63, 255, 255, 255, 253, 255, 255, 247, 255, 253, 255, 255, 247, 207, 255, 255, 255, 255, 127, 248, 255, 31, 32, 0, 16, 0, 0, 248, 254, 255, 0, 0, 31, 0, 127, 0, 150, 254, 247, 10, 132, 234, 150, 170, 150, 247, 247, 94, 255, 251, 255, 15, 238, 251, 255, 15, }; /* Word: 2214 bytes. */ RE_UINT32 re_get_word(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 15; code = ch ^ (f << 15); pos = (RE_UINT32)re_word_stage_1[f] << 4; f = code >> 11; code ^= f << 11; pos = (RE_UINT32)re_word_stage_2[pos + f] << 3; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_word_stage_3[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_word_stage_4[pos + f] << 5; pos += code; value = (re_word_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* XDigit. 
*/ static RE_UINT8 re_xdigit_stage_1[] = { 0, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, }; static RE_UINT8 re_xdigit_stage_2[] = { 0, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 2, 2, 2, 2, 4, 5, 6, 2, 2, 2, 2, 7, 2, 2, 2, 2, 2, 2, 8, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, }; static RE_UINT8 re_xdigit_stage_3[] = { 0, 1, 1, 1, 1, 1, 2, 3, 1, 4, 4, 4, 4, 4, 5, 6, 7, 1, 1, 1, 1, 1, 1, 8, 9, 10, 11, 12, 13, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 6, 1, 14, 15, 16, 17, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 18, 1, 1, 1, 1, 19, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 20, 21, 17, 1, 14, 1, 22, 23, 8, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 24, 16, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 25, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_xdigit_stage_4[] = { 0, 1, 2, 2, 2, 2, 2, 2, 2, 3, 2, 0, 2, 2, 2, 4, 2, 5, 2, 5, 2, 6, 2, 6, 3, 2, 2, 2, 2, 4, 6, 2, 2, 2, 2, 3, 6, 2, 2, 2, 2, 7, 2, 6, 2, 2, 8, 2, 2, 6, 0, 2, 2, 8, 2, 2, 2, 2, 2, 6, 4, 2, 2, 9, 2, 6, 2, 2, 2, 2, 2, 0, 10, 11, 2, 2, 2, 2, 3, 2, 2, 5, 2, 0, 12, 2, 2, 6, 2, 6, 2, 4, 0, 2, 2, 2, 2, 3, 2, 2, 2, 2, 2, 13, }; static RE_UINT8 re_xdigit_stage_5[] = { 0, 0, 0, 0, 0, 0, 255, 3, 126, 0, 0, 0, 126, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 255, 3, 0, 0, 255, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 192, 255, 0, 0, 0, 0, 255, 3, 0, 0, 0, 0, 192, 255, 0, 0, 0, 0, 0, 0, 255, 3, 255, 3, 0, 0, 0, 0, 0, 0, 255, 3, 0, 0, 255, 3, 0, 0, 255, 3, 126, 0, 0, 0, 126, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 192, 255, 0, 192, 255, 255, 255, 255, 255, 255, }; /* XDigit: 425 bytes. 
*/ RE_UINT32 re_get_xdigit(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_xdigit_stage_1[f] << 4; f = code >> 12; code ^= f << 12; pos = (RE_UINT32)re_xdigit_stage_2[pos + f] << 4; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_xdigit_stage_3[pos + f] << 2; f = code >> 6; code ^= f << 6; pos = (RE_UINT32)re_xdigit_stage_4[pos + f] << 6; pos += code; value = (re_xdigit_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Posix_Digit. */ static RE_UINT8 re_posix_digit_stage_1[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_posix_digit_stage_2[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_posix_digit_stage_3[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_posix_digit_stage_4[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_posix_digit_stage_5[] = { 0, 0, 0, 0, 0, 0, 255, 3, 0, 0, 0, 0, 0, 0, 0, 0, }; /* Posix_Digit: 97 bytes. */ RE_UINT32 re_get_posix_digit(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_posix_digit_stage_1[f] << 4; f = code >> 12; code ^= f << 12; pos = (RE_UINT32)re_posix_digit_stage_2[pos + f] << 3; f = code >> 9; code ^= f << 9; pos = (RE_UINT32)re_posix_digit_stage_3[pos + f] << 3; f = code >> 6; code ^= f << 6; pos = (RE_UINT32)re_posix_digit_stage_4[pos + f] << 6; pos += code; value = (re_posix_digit_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Posix_AlNum. 
*/ static RE_UINT8 re_posix_alnum_stage_1[] = { 0, 1, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, }; static RE_UINT8 re_posix_alnum_stage_2[] = { 0, 1, 2, 3, 4, 5, 6, 7, 7, 8, 7, 7, 7, 7, 7, 7, 7, 7, 7, 9, 10, 11, 7, 7, 7, 7, 12, 13, 13, 13, 13, 14, 15, 16, 17, 18, 19, 13, 20, 13, 21, 13, 13, 13, 13, 22, 13, 13, 13, 13, 13, 13, 13, 13, 23, 24, 13, 13, 25, 13, 13, 26, 27, 13, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 28, 7, 29, 30, 7, 31, 13, 13, 13, 13, 13, 32, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, }; static RE_UINT8 re_posix_alnum_stage_3[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1, 17, 18, 19, 1, 20, 21, 22, 23, 24, 25, 26, 27, 1, 28, 29, 30, 31, 31, 32, 31, 31, 31, 31, 31, 31, 31, 33, 34, 35, 31, 36, 37, 31, 31, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 38, 1, 1, 1, 1, 1, 1, 1, 1, 1, 39, 1, 1, 1, 1, 40, 1, 41, 42, 43, 44, 45, 46, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 47, 31, 31, 31, 31, 31, 31, 31, 31, 31, 1, 48, 49, 1, 50, 51, 52, 53, 54, 55, 56, 57, 58, 1, 59, 60, 61, 62, 63, 64, 31, 31, 31, 65, 66, 67, 68, 69, 70, 71, 72, 73, 31, 74, 31, 31, 31, 31, 31, 1, 1, 1, 75, 76, 77, 31, 31, 1, 1, 1, 1, 78, 31, 31, 31, 31, 31, 31, 31, 1, 1, 79, 31, 1, 1, 80, 81, 31, 31, 31, 82, 83, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 84, 31, 31, 31, 31, 31, 31, 31, 85, 86, 87, 88, 89, 31, 31, 31, 31, 31, 90, 31, 31, 91, 31, 31, 31, 31, 31, 31, 1, 1, 1, 1, 1, 1, 92, 1, 1, 1, 1, 1, 1, 1, 1, 93, 94, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 95, 31, 1, 1, 96, 31, 31, 31, 31, 31, }; static RE_UINT8 re_posix_alnum_stage_4[] = { 0, 1, 2, 2, 0, 3, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 7, 0, 0, 8, 9, 10, 11, 5, 12, 5, 5, 5, 5, 13, 5, 5, 5, 5, 14, 15, 16, 17, 18, 19, 20, 21, 5, 22, 23, 5, 5, 24, 25, 26, 5, 27, 5, 5, 28, 29, 30, 31, 32, 33, 0, 0, 34, 0, 35, 5, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 48, 52, 53, 54, 55, 56, 
0, 57, 58, 59, 60, 61, 62, 63, 64, 61, 65, 66, 67, 68, 69, 70, 71, 16, 72, 73, 0, 74, 75, 76, 0, 77, 0, 78, 79, 80, 81, 0, 0, 5, 82, 26, 83, 84, 5, 85, 86, 5, 5, 87, 5, 88, 89, 90, 5, 91, 5, 92, 0, 93, 5, 5, 94, 16, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 95, 2, 5, 5, 96, 97, 98, 98, 99, 5, 100, 101, 0, 0, 5, 5, 102, 5, 103, 5, 104, 105, 106, 26, 107, 5, 108, 109, 0, 110, 5, 105, 111, 0, 112, 0, 0, 5, 113, 114, 0, 5, 115, 5, 116, 5, 104, 117, 118, 0, 0, 0, 119, 5, 5, 5, 5, 5, 5, 0, 120, 94, 5, 121, 118, 5, 122, 123, 124, 0, 0, 0, 125, 126, 0, 0, 0, 127, 128, 129, 5, 130, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 131, 5, 109, 5, 132, 105, 5, 5, 5, 5, 133, 5, 85, 5, 134, 135, 136, 136, 5, 0, 137, 0, 0, 0, 0, 0, 0, 138, 139, 16, 5, 140, 16, 5, 86, 141, 142, 5, 5, 143, 72, 0, 26, 5, 5, 5, 5, 5, 104, 0, 0, 5, 5, 5, 5, 5, 5, 104, 0, 5, 5, 5, 5, 32, 0, 26, 118, 144, 145, 5, 146, 5, 5, 5, 93, 147, 148, 5, 5, 149, 150, 0, 147, 151, 17, 5, 98, 5, 5, 60, 152, 29, 103, 153, 81, 5, 154, 137, 155, 5, 135, 156, 157, 5, 105, 158, 159, 160, 161, 86, 162, 5, 5, 5, 163, 5, 5, 5, 5, 5, 164, 165, 110, 5, 5, 5, 166, 5, 5, 167, 0, 168, 169, 170, 5, 5, 28, 171, 5, 5, 118, 26, 5, 172, 5, 17, 173, 0, 0, 0, 174, 5, 5, 5, 81, 0, 2, 2, 175, 5, 105, 176, 0, 177, 178, 179, 0, 5, 5, 5, 72, 0, 0, 5, 34, 0, 0, 0, 0, 0, 0, 0, 0, 81, 5, 180, 0, 5, 26, 103, 72, 118, 5, 181, 0, 5, 5, 5, 5, 118, 0, 0, 0, 5, 182, 5, 60, 0, 0, 0, 0, 5, 135, 104, 17, 0, 0, 0, 0, 183, 184, 104, 135, 105, 0, 0, 185, 104, 167, 0, 0, 5, 186, 0, 0, 187, 98, 0, 81, 81, 0, 78, 188, 5, 104, 104, 153, 28, 0, 0, 0, 5, 5, 130, 0, 5, 153, 5, 153, 5, 5, 189, 0, 148, 33, 26, 130, 5, 153, 26, 190, 5, 5, 191, 0, 192, 193, 0, 0, 194, 195, 5, 130, 39, 48, 196, 60, 0, 0, 0, 0, 0, 0, 0, 0, 5, 5, 197, 0, 0, 0, 0, 0, 5, 198, 199, 0, 5, 105, 200, 0, 5, 104, 0, 0, 201, 163, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5, 5, 202, 0, 0, 0, 0, 0, 0, 5, 33, 5, 5, 5, 5, 167, 0, 0, 0, 5, 5, 5, 143, 5, 5, 5, 5, 5, 5, 60, 0, 0, 0, 0, 0, 5, 143, 0, 0, 0, 0, 0, 0, 5, 5, 
203, 0, 0, 0, 0, 0, 5, 33, 105, 0, 0, 0, 26, 156, 5, 135, 60, 204, 93, 0, 0, 0, 5, 5, 205, 105, 171, 0, 0, 0, 206, 0, 0, 0, 0, 0, 0, 0, 5, 5, 5, 207, 208, 0, 0, 0, 5, 5, 209, 5, 210, 211, 212, 5, 213, 214, 215, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 216, 217, 86, 209, 209, 132, 132, 218, 218, 219, 0, 5, 5, 5, 5, 5, 5, 188, 0, 212, 220, 221, 222, 223, 224, 0, 0, 0, 26, 225, 225, 109, 0, 0, 0, 5, 5, 5, 5, 5, 5, 135, 0, 5, 34, 5, 5, 5, 5, 5, 5, 118, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 206, 0, 0, 118, 0, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_posix_alnum_stage_5[] = { 0, 0, 0, 0, 0, 0, 255, 3, 254, 255, 255, 7, 0, 4, 32, 4, 255, 255, 127, 255, 255, 255, 255, 255, 195, 255, 3, 0, 31, 80, 0, 0, 32, 0, 0, 0, 0, 0, 223, 188, 64, 215, 255, 255, 251, 255, 255, 255, 255, 255, 191, 255, 3, 252, 255, 255, 255, 255, 254, 255, 255, 255, 127, 2, 254, 255, 255, 255, 255, 0, 0, 0, 0, 0, 255, 191, 182, 0, 255, 255, 255, 7, 7, 0, 0, 0, 255, 7, 255, 255, 255, 254, 0, 192, 255, 255, 255, 255, 239, 31, 254, 225, 0, 156, 0, 0, 255, 255, 0, 224, 255, 255, 255, 255, 3, 0, 0, 252, 255, 255, 255, 7, 48, 4, 255, 255, 255, 252, 255, 31, 0, 0, 255, 255, 255, 1, 255, 255, 31, 0, 248, 3, 255, 255, 255, 255, 255, 239, 255, 223, 225, 255, 15, 0, 254, 255, 239, 159, 249, 255, 255, 253, 197, 227, 159, 89, 128, 176, 15, 0, 3, 0, 238, 135, 249, 255, 255, 253, 109, 195, 135, 25, 2, 94, 0, 0, 63, 0, 238, 191, 251, 255, 255, 253, 237, 227, 191, 27, 1, 0, 15, 0, 0, 2, 238, 159, 249, 255, 159, 25, 192, 176, 15, 0, 2, 0, 236, 199, 61, 214, 24, 199, 255, 195, 199, 29, 129, 0, 239, 223, 253, 255, 255, 253, 255, 227, 223, 29, 96, 7, 15, 0, 0, 0, 238, 223, 253, 255, 255, 253, 239, 227, 223, 29, 96, 64, 15, 0, 6, 0, 255, 255, 255, 231, 223, 93, 128, 128, 15, 0, 0, 252, 236, 255, 127, 252, 255, 255, 251, 47, 127, 128, 95, 255, 0, 0, 12, 0, 255, 255, 255, 7, 127, 32, 0, 0, 150, 37, 240, 254, 174, 236, 255, 59, 95, 32, 0, 240, 1, 0, 0, 0, 255, 254, 255, 255, 255, 31, 254, 255, 3, 255, 255, 254, 255, 255, 255, 31, 255, 255, 
127, 249, 231, 193, 255, 255, 127, 64, 0, 48, 191, 32, 255, 255, 255, 255, 255, 247, 255, 61, 127, 61, 255, 61, 255, 255, 255, 255, 61, 127, 61, 255, 127, 255, 255, 255, 61, 255, 255, 255, 255, 135, 255, 255, 0, 0, 255, 255, 63, 63, 255, 159, 255, 255, 255, 199, 255, 1, 255, 223, 15, 0, 255, 255, 15, 0, 255, 223, 13, 0, 255, 255, 207, 255, 255, 1, 128, 16, 255, 255, 255, 0, 255, 7, 255, 255, 255, 255, 63, 0, 255, 255, 255, 127, 255, 15, 255, 1, 255, 63, 31, 0, 255, 15, 255, 255, 255, 3, 0, 0, 255, 255, 255, 15, 254, 255, 31, 0, 128, 0, 0, 0, 255, 255, 239, 255, 239, 15, 0, 0, 255, 243, 0, 252, 191, 255, 3, 0, 0, 224, 0, 252, 255, 255, 255, 63, 0, 222, 111, 0, 128, 255, 31, 0, 63, 63, 255, 170, 255, 255, 223, 95, 220, 31, 207, 15, 255, 31, 220, 31, 0, 0, 2, 128, 0, 0, 255, 31, 132, 252, 47, 62, 80, 189, 255, 243, 224, 67, 0, 0, 255, 1, 0, 0, 0, 0, 192, 255, 255, 127, 255, 255, 31, 120, 12, 0, 255, 128, 0, 0, 255, 255, 127, 0, 127, 127, 127, 127, 0, 128, 0, 0, 224, 0, 0, 0, 254, 3, 62, 31, 255, 255, 127, 224, 224, 255, 255, 255, 255, 63, 254, 255, 255, 127, 0, 0, 255, 31, 255, 255, 0, 12, 0, 0, 255, 127, 240, 143, 0, 0, 128, 255, 252, 255, 255, 255, 255, 249, 255, 255, 255, 63, 255, 0, 187, 247, 255, 255, 0, 0, 252, 40, 255, 255, 7, 0, 255, 255, 247, 255, 223, 255, 0, 124, 255, 63, 0, 0, 255, 255, 127, 196, 5, 0, 0, 56, 255, 255, 60, 0, 126, 126, 126, 0, 127, 127, 255, 255, 63, 0, 255, 255, 255, 7, 0, 0, 15, 0, 255, 255, 127, 248, 255, 255, 255, 63, 255, 255, 255, 255, 255, 3, 127, 0, 248, 224, 255, 253, 127, 95, 219, 255, 255, 255, 0, 0, 248, 255, 255, 255, 252, 255, 0, 0, 255, 15, 0, 0, 223, 255, 192, 255, 255, 255, 252, 252, 252, 28, 255, 239, 255, 255, 127, 255, 255, 183, 255, 63, 255, 63, 255, 255, 1, 0, 15, 255, 62, 0, 255, 0, 255, 255, 63, 253, 255, 255, 255, 255, 191, 145, 255, 255, 55, 0, 255, 255, 255, 192, 111, 240, 239, 254, 31, 0, 0, 0, 63, 0, 0, 0, 255, 255, 71, 0, 30, 0, 0, 20, 255, 255, 251, 255, 255, 255, 159, 0, 127, 189, 255, 191, 255, 1, 255, 255, 
159, 25, 129, 224, 179, 0, 0, 0, 255, 255, 63, 127, 0, 0, 0, 63, 17, 0, 0, 0, 255, 255, 255, 227, 0, 0, 0, 128, 127, 0, 0, 0, 248, 255, 255, 224, 31, 0, 255, 255, 3, 0, 0, 0, 255, 7, 255, 31, 255, 1, 255, 67, 255, 255, 223, 255, 255, 255, 255, 223, 100, 222, 255, 235, 239, 255, 255, 255, 191, 231, 223, 223, 255, 255, 255, 123, 95, 252, 253, 255, 63, 255, 255, 255, 253, 255, 255, 247, 255, 253, 255, 255, 247, 15, 0, 0, 150, 254, 247, 10, 132, 234, 150, 170, 150, 247, 247, 94, 255, 251, 255, 15, 238, 251, 255, 15, 255, 3, 255, 255, }; /* Posix_AlNum: 2089 bytes. */ RE_UINT32 re_get_posix_alnum(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_posix_alnum_stage_1[f] << 5; f = code >> 11; code ^= f << 11; pos = (RE_UINT32)re_posix_alnum_stage_2[pos + f] << 3; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_posix_alnum_stage_3[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_posix_alnum_stage_4[pos + f] << 5; pos += code; value = (re_posix_alnum_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Posix_Punct. 
*/ static RE_UINT8 re_posix_punct_stage_1[] = { 0, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, }; static RE_UINT8 re_posix_punct_stage_2[] = { 0, 1, 2, 3, 4, 5, 6, 7, 7, 8, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 9, 10, 7, 7, 7, 7, 7, 7, 7, 7, 7, 11, 12, 13, 14, 7, 15, 7, 7, 7, 7, 7, 7, 7, 7, 16, 7, 7, 7, 7, 7, 7, 7, 7, 7, 17, 7, 7, 18, 19, 7, 20, 21, 22, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, }; static RE_UINT8 re_posix_punct_stage_3[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1, 1, 17, 18, 1, 19, 20, 21, 22, 23, 24, 25, 1, 1, 26, 27, 28, 29, 30, 31, 29, 29, 32, 29, 29, 29, 33, 34, 35, 36, 37, 38, 39, 40, 29, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 41, 1, 1, 1, 1, 1, 1, 42, 1, 43, 44, 45, 46, 47, 48, 1, 1, 1, 1, 1, 1, 1, 49, 1, 50, 51, 52, 1, 53, 1, 54, 1, 55, 1, 1, 56, 57, 58, 59, 1, 1, 1, 1, 60, 61, 62, 1, 63, 64, 65, 66, 1, 1, 1, 1, 67, 1, 1, 1, 1, 1, 68, 69, 1, 1, 1, 1, 1, 1, 1, 1, 70, 1, 1, 1, 71, 72, 73, 74, 1, 1, 75, 76, 29, 29, 77, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 10, 1, 78, 79, 80, 29, 29, 81, 82, 83, 84, 85, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_posix_punct_stage_4[] = { 0, 1, 2, 3, 0, 4, 5, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 7, 0, 0, 0, 8, 9, 0, 0, 10, 0, 0, 0, 0, 11, 0, 0, 0, 0, 0, 12, 0, 13, 14, 15, 16, 17, 0, 0, 18, 0, 0, 19, 20, 21, 0, 0, 0, 0, 0, 0, 22, 0, 23, 14, 0, 0, 0, 0, 0, 0, 0, 0, 24, 0, 0, 0, 25, 0, 0, 0, 0, 0, 0, 0, 26, 0, 0, 0, 27, 0, 0, 0, 28, 0, 0, 0, 29, 0, 0, 0, 0, 0, 0, 0, 30, 0, 0, 0, 31, 0, 29, 32, 0, 0, 0, 0, 0, 33, 34, 0, 0, 35, 36, 37, 0, 0, 0, 38, 0, 36, 0, 0, 39, 0, 0, 0, 40, 41, 0, 0, 0, 42, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 43, 44, 0, 0, 45, 0, 46, 0, 0, 0, 0, 47, 0, 48, 0, 0, 0, 0, 0, 0, 0, 0, 0, 49, 0, 0, 0, 36, 50, 36, 0, 0, 0, 0, 51, 0, 0, 0, 0, 12, 52, 0, 0, 0, 53, 0, 54, 0, 36, 0, 0, 55, 0, 0, 0, 0, 0, 0, 56, 57, 58, 59, 60, 61, 62, 63, 61, 0, 0, 64, 65, 66, 0, 67, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 
50, 50, 68, 50, 69, 48, 0, 53, 70, 0, 0, 50, 50, 50, 70, 71, 50, 50, 50, 50, 50, 50, 72, 73, 74, 75, 76, 0, 0, 0, 0, 0, 0, 0, 77, 0, 0, 0, 27, 0, 0, 0, 0, 50, 78, 79, 0, 80, 50, 50, 81, 50, 50, 50, 50, 50, 50, 70, 82, 83, 84, 0, 0, 44, 42, 0, 39, 0, 0, 0, 0, 85, 0, 50, 86, 61, 87, 88, 50, 87, 89, 50, 61, 0, 0, 0, 0, 0, 0, 50, 50, 0, 0, 0, 0, 59, 50, 69, 36, 90, 0, 0, 91, 0, 0, 0, 92, 93, 94, 0, 0, 95, 0, 0, 0, 0, 96, 0, 97, 0, 0, 98, 99, 0, 98, 29, 0, 0, 0, 100, 0, 0, 0, 53, 101, 0, 0, 36, 26, 0, 0, 39, 0, 0, 0, 0, 102, 0, 103, 0, 0, 0, 104, 94, 0, 0, 36, 0, 0, 0, 0, 0, 105, 41, 59, 106, 107, 0, 0, 0, 0, 1, 2, 2, 108, 0, 0, 0, 109, 79, 110, 0, 111, 112, 42, 59, 113, 0, 0, 0, 0, 29, 0, 27, 0, 0, 0, 0, 114, 0, 0, 0, 0, 0, 0, 5, 115, 0, 0, 0, 0, 29, 29, 0, 0, 0, 0, 0, 0, 0, 0, 116, 29, 0, 0, 117, 118, 0, 111, 0, 0, 119, 0, 0, 0, 0, 0, 120, 0, 0, 121, 94, 0, 0, 0, 86, 122, 0, 0, 123, 0, 0, 124, 0, 0, 0, 103, 0, 0, 0, 0, 0, 0, 0, 0, 125, 0, 0, 0, 0, 0, 0, 0, 126, 0, 0, 0, 127, 0, 0, 0, 0, 0, 0, 53, 0, 0, 0, 0, 0, 0, 0, 0, 0, 128, 0, 0, 0, 0, 0, 0, 0, 98, 0, 0, 0, 129, 0, 110, 130, 0, 0, 0, 0, 0, 0, 0, 0, 0, 131, 0, 0, 0, 50, 50, 50, 50, 50, 50, 50, 70, 50, 132, 50, 133, 134, 135, 50, 40, 50, 50, 136, 0, 0, 0, 0, 0, 50, 50, 93, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 137, 39, 129, 129, 114, 114, 103, 103, 138, 0, 0, 139, 0, 140, 141, 0, 0, 0, 50, 142, 50, 50, 81, 143, 144, 70, 59, 145, 38, 146, 147, 0, 0, 148, 149, 68, 150, 0, 0, 0, 0, 0, 50, 50, 50, 80, 50, 151, 50, 50, 50, 50, 50, 50, 50, 50, 89, 152, 50, 50, 50, 81, 50, 50, 153, 0, 142, 50, 154, 50, 60, 21, 0, 0, 116, 0, 0, 0, 155, 0, 42, 0, }; static RE_UINT8 re_posix_punct_stage_5[] = { 0, 0, 0, 0, 254, 255, 0, 252, 1, 0, 0, 248, 1, 0, 0, 120, 254, 219, 211, 137, 0, 0, 128, 0, 60, 0, 252, 255, 224, 175, 255, 255, 0, 0, 32, 64, 176, 0, 0, 0, 0, 0, 64, 0, 4, 0, 0, 0, 0, 0, 0, 252, 0, 230, 0, 0, 0, 0, 0, 64, 73, 0, 0, 0, 0, 0, 24, 0, 192, 255, 0, 200, 0, 60, 0, 0, 0, 0, 16, 64, 0, 2, 0, 96, 255, 63, 0, 0, 0, 0, 192, 3, 0, 0, 
255, 127, 48, 0, 1, 0, 0, 0, 12, 12, 0, 0, 3, 0, 0, 0, 1, 0, 0, 0, 248, 7, 0, 0, 0, 128, 0, 0, 0, 2, 0, 0, 16, 0, 0, 128, 0, 12, 254, 255, 255, 252, 0, 0, 80, 61, 32, 0, 0, 0, 0, 0, 0, 192, 191, 223, 255, 7, 0, 252, 0, 0, 0, 0, 0, 8, 255, 1, 0, 0, 0, 0, 255, 3, 1, 0, 0, 0, 0, 96, 0, 0, 0, 0, 0, 24, 0, 56, 0, 0, 0, 0, 96, 0, 0, 0, 112, 15, 255, 7, 0, 0, 49, 0, 0, 0, 255, 255, 255, 255, 127, 63, 0, 0, 255, 7, 240, 31, 0, 0, 0, 240, 0, 0, 0, 248, 255, 0, 8, 0, 0, 0, 0, 160, 3, 224, 0, 224, 0, 224, 0, 96, 0, 0, 255, 255, 255, 0, 255, 255, 255, 255, 255, 127, 0, 0, 0, 124, 0, 124, 0, 0, 123, 3, 208, 193, 175, 66, 0, 12, 31, 188, 0, 0, 0, 12, 255, 255, 255, 255, 255, 7, 127, 0, 0, 0, 255, 255, 63, 0, 0, 0, 240, 255, 255, 255, 207, 255, 255, 255, 63, 255, 255, 255, 255, 227, 255, 253, 3, 0, 0, 240, 0, 0, 224, 7, 0, 222, 255, 127, 255, 255, 7, 0, 0, 0, 255, 255, 255, 251, 255, 255, 15, 0, 0, 0, 255, 15, 30, 255, 255, 255, 1, 0, 193, 224, 0, 0, 195, 255, 15, 0, 0, 0, 0, 252, 255, 255, 255, 0, 1, 0, 255, 255, 1, 0, 0, 224, 0, 0, 0, 0, 8, 64, 0, 0, 252, 0, 255, 255, 127, 0, 3, 0, 0, 0, 0, 6, 0, 0, 0, 15, 192, 3, 0, 0, 240, 0, 0, 192, 0, 0, 0, 0, 0, 23, 254, 63, 0, 192, 0, 0, 128, 3, 0, 8, 0, 0, 0, 2, 0, 0, 0, 0, 252, 255, 0, 0, 0, 48, 255, 255, 247, 255, 127, 15, 0, 0, 63, 0, 0, 0, 127, 127, 0, 48, 0, 0, 128, 255, 0, 0, 0, 254, 255, 19, 255, 15, 255, 255, 255, 31, 0, 128, 0, 0, 0, 0, 128, 1, 0, 0, 255, 1, 0, 1, 0, 0, 0, 0, 127, 0, 0, 0, 0, 30, 128, 63, 0, 0, 0, 0, 0, 216, 0, 0, 48, 0, 224, 35, 0, 232, 0, 0, 0, 63, 64, 0, 0, 0, 254, 255, 255, 0, 14, 0, 0, 0, 0, 0, 31, 0, 0, 0, 32, 0, 48, 0, 0, 0, 0, 0, 0, 144, 127, 254, 255, 255, 31, 28, 0, 0, 24, 240, 255, 255, 255, 195, 255, 255, 35, 0, 0, 0, 2, 0, 0, 8, 8, 0, 0, 0, 0, 0, 128, 7, 0, 224, 223, 255, 239, 15, 0, 0, 255, 15, 255, 255, 255, 127, 254, 255, 254, 255, 254, 255, 255, 127, 0, 0, 0, 12, 0, 0, 0, 252, 255, 7, 192, 255, 255, 255, 7, 0, 255, 255, 255, 1, 3, 0, 239, 255, 255, 255, 255, 31, 15, 0, 255, 255, 31, 0, 255, 0, 
255, 3, 31, 0, 0, 0, }; /* Posix_Punct: 1609 bytes. */ RE_UINT32 re_get_posix_punct(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_posix_punct_stage_1[f] << 5; f = code >> 11; code ^= f << 11; pos = (RE_UINT32)re_posix_punct_stage_2[pos + f] << 3; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_posix_punct_stage_3[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_posix_punct_stage_4[pos + f] << 5; pos += code; value = (re_posix_punct_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Posix_XDigit. */ static RE_UINT8 re_posix_xdigit_stage_1[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_posix_xdigit_stage_2[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_posix_xdigit_stage_3[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_posix_xdigit_stage_4[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_posix_xdigit_stage_5[] = { 0, 0, 0, 0, 0, 0, 255, 3, 126, 0, 0, 0, 126, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, }; /* Posix_XDigit: 97 bytes. */ RE_UINT32 re_get_posix_xdigit(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_posix_xdigit_stage_1[f] << 3; f = code >> 13; code ^= f << 13; pos = (RE_UINT32)re_posix_xdigit_stage_2[pos + f] << 3; f = code >> 10; code ^= f << 10; pos = (RE_UINT32)re_posix_xdigit_stage_3[pos + f] << 3; f = code >> 7; code ^= f << 7; pos = (RE_UINT32)re_posix_xdigit_stage_4[pos + f] << 7; pos += code; value = (re_posix_xdigit_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* All_Cases. 
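Sketch of the staged lookup used by the re_get_* functions above (illustrative only, not part of the generated tables). It restates re_get_posix_xdigit with shift-and-mask steps instead of the running XOR on code. The demo_ names are hypothetical; the table contents are copied from the Posix_XDigit tables above. Each stage value selects a block in the next table, and the final table is a bitset holding one bit per code point.

```c
#include <assert.h>
#include <stdint.h>

// Copies of the Posix_XDigit tables; the demo_ prefix is a hypothetical
// naming choice made for this sketch only.
static const uint8_t demo_stage_1[17] = {
    0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
};
static const uint8_t demo_stage_2[16] = {
    0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
};
static const uint8_t demo_stage_3[16] = {
    0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
};
static const uint8_t demo_stage_4[16] = {
    0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
};
static const uint8_t demo_stage_5[32] = {
    0, 0, 0, 0, 0, 0, 255, 3, 126, 0, 0, 0, 126, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
};

// Returns 1 if ch has the property, else 0. ch must be <= 0x10FFFF.
static uint32_t demo_posix_xdigit(uint32_t ch) {
    uint32_t pos;

    // Stage 1: the plane (bits 16..20) picks a block of 8 stage-2 entries.
    pos = (uint32_t)demo_stage_1[ch >> 16] << 3;
    // Stage 2: bits 13..15 index into that block.
    pos = (uint32_t)demo_stage_2[pos + ((ch >> 13) & 0x7)] << 3;
    // Stage 3: bits 10..12.
    pos = (uint32_t)demo_stage_3[pos + ((ch >> 10) & 0x7)] << 3;
    // Stage 4: bits 7..9 pick a block of 128 bits in the bitset.
    pos = (uint32_t)demo_stage_4[pos + ((ch >> 7) & 0x7)] << 7;
    // The low 7 bits select the bit inside that block.
    pos += ch & 0x7F;

    return (demo_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1;
}

// Counts how many code points in [0, 0x10FFFF] have the property.
static uint32_t demo_count_posix_xdigit(void) {
    uint32_t ch;
    uint32_t count = 0;

    for (ch = 0; ch <= 0x10FFFF; ch++)
        count += demo_posix_xdigit(ch);

    return count;
}
```

With these tables only the ASCII hex digits should test true: demo_posix_xdigit('F') is expected to give 1 and demo_posix_xdigit('G') to give 0. Identical blocks share one index, which is why the tables for an ASCII-only property stay this small.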
*/ static RE_UINT8 re_all_cases_stage_1[] = { 0, 1, 2, 2, 2, 3, 2, 4, 5, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, }; static RE_UINT8 re_all_cases_stage_2[] = { 0, 1, 2, 3, 4, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 6, 6, 8, 6, 6, 6, 6, 6, 6, 6, 6, 6, 9, 10, 11, 6, 12, 6, 6, 13, 6, 6, 6, 6, 6, 6, 6, 14, 15, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 16, 17, 6, 6, 6, 18, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 19, 6, 6, 6, 20, 6, 6, 6, 6, 21, 6, 6, 6, 6, 6, 6, 6, 22, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 23, 6, 6, 6, 6, 6, 6, 6, }; static RE_UINT8 re_all_cases_stage_3[] = { 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 0, 0, 0, 0, 0, 0, 9, 0, 10, 11, 12, 13, 14, 15, 16, 17, 18, 18, 18, 18, 18, 18, 19, 20, 21, 22, 18, 18, 18, 18, 18, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 21, 34, 18, 18, 35, 18, 18, 18, 18, 18, 36, 18, 37, 38, 39, 18, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 50, 0, 0, 0, 0, 0, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 18, 18, 18, 64, 65, 66, 66, 11, 11, 11, 11, 15, 15, 15, 15, 67, 67, 18, 18, 18, 18, 68, 69, 18, 18, 18, 18, 18, 18, 70, 71, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 72, 73, 73, 73, 74, 0, 75, 76, 76, 76, 77, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 78, 
78, 78, 78, 79, 80, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 81, 81, 81, 81, 81, 81, 81, 81, 81, 81, 82, 83, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 84, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 85, 18, 18, 18, 18, 18, 86, 87, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 88, 89, 82, 83, 88, 89, 88, 89, 82, 83, 90, 91, 88, 89, 92, 93, 88, 89, 88, 89, 88, 89, 94, 95, 96, 97, 98, 99, 100, 101, 96, 102, 0, 0, 0, 0, 103, 104, 105, 0, 0, 106, 0, 0, 107, 107, 108, 108, 109, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 110, 111, 111, 111, 112, 112, 112, 113, 0, 0, 73, 73, 73, 73, 73, 74, 76, 76, 76, 76, 76, 77, 114, 115, 116, 117, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 37, 118, 119, 0, 120, 120, 120, 120, 121, 122, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 18, 18, 18, 18, 18, 86, 0, 0, 18, 18, 18, 37, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 69, 18, 69, 18, 18, 18, 18, 18, 18, 18, 0, 123, 18, 124, 51, 18, 18, 125, 126, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 127, 0, 0, 0, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 0, 0, 0, 0, 0, 0, 0, 0, 129, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 11, 11, 4, 5, 15, 15, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 130, 130, 130, 130, 130, 131, 131, 131, 131, 131, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 132, 132, 132, 132, 132, 132, 133, 0, 134, 134, 134, 134, 134, 134, 135, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 11, 11, 11, 11, 15, 15, 15, 15, 0, 0, 0, 0, }; static RE_UINT8 re_all_cases_stage_4[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 3, 1, 1, 1, 1, 1, 1, 1, 4, 1, 1, 1, 1, 1, 
1, 1, 0, 0, 0, 0, 0, 0, 5, 5, 5, 5, 5, 5, 5, 5, 6, 5, 7, 5, 5, 5, 5, 5, 5, 5, 8, 5, 5, 5, 5, 5, 5, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 9, 0, 0, 1, 1, 1, 1, 1, 10, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 11, 5, 5, 5, 5, 5, 12, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 0, 5, 5, 5, 5, 5, 5, 5, 13, 14, 15, 14, 15, 14, 15, 14, 15, 16, 17, 14, 15, 14, 15, 14, 15, 0, 14, 15, 14, 15, 14, 15, 14, 15, 14, 15, 14, 15, 14, 15, 14, 15, 0, 14, 15, 14, 15, 14, 15, 18, 14, 15, 14, 15, 14, 15, 19, 20, 21, 14, 15, 14, 15, 22, 14, 15, 23, 23, 14, 15, 0, 24, 25, 26, 14, 15, 23, 27, 28, 29, 30, 14, 15, 31, 0, 29, 32, 33, 34, 14, 15, 14, 15, 14, 15, 35, 14, 15, 35, 0, 0, 14, 15, 35, 14, 15, 36, 36, 14, 15, 14, 15, 37, 14, 15, 0, 0, 14, 15, 0, 38, 0, 0, 0, 0, 39, 40, 41, 39, 40, 41, 39, 40, 41, 14, 15, 14, 15, 14, 15, 14, 15, 42, 14, 15, 0, 39, 40, 41, 14, 15, 43, 44, 45, 0, 14, 15, 14, 15, 14, 15, 14, 15, 14, 15, 0, 0, 0, 0, 0, 0, 46, 14, 15, 47, 48, 49, 49, 14, 15, 50, 51, 52, 14, 15, 53, 54, 55, 56, 57, 0, 58, 58, 0, 59, 0, 60, 61, 0, 0, 0, 58, 62, 0, 63, 0, 64, 65, 0, 66, 67, 0, 68, 69, 0, 0, 67, 0, 70, 71, 0, 0, 72, 0, 0, 0, 0, 0, 0, 0, 73, 0, 0, 74, 0, 0, 74, 0, 0, 0, 75, 74, 76, 77, 77, 78, 0, 0, 0, 0, 0, 79, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 80, 81, 0, 0, 0, 0, 0, 0, 82, 0, 0, 14, 15, 14, 15, 0, 0, 14, 15, 0, 0, 0, 33, 33, 33, 0, 83, 0, 0, 0, 0, 0, 0, 84, 0, 85, 85, 85, 0, 86, 0, 87, 87, 88, 1, 89, 1, 1, 90, 1, 1, 91, 92, 93, 1, 94, 1, 1, 1, 95, 96, 0, 97, 1, 1, 98, 1, 1, 99, 1, 1, 100, 101, 101, 101, 102, 5, 103, 5, 5, 104, 5, 5, 105, 106, 107, 5, 108, 5, 5, 5, 109, 110, 111, 112, 5, 5, 113, 5, 5, 114, 5, 5, 115, 116, 116, 117, 118, 119, 0, 0, 0, 120, 121, 122, 123, 124, 125, 126, 127, 128, 0, 14, 15, 129, 14, 15, 0, 45, 45, 45, 130, 130, 130, 130, 130, 130, 130, 130, 131, 131, 131, 131, 131, 131, 131, 131, 14, 15, 0, 0, 0, 0, 0, 0, 0, 0, 14, 15, 14, 15, 14, 15, 132, 14, 15, 14, 15, 14, 15, 14, 15, 14, 15, 14, 15, 14, 15, 133, 0, 134, 134, 
134, 134, 134, 134, 134, 134, 134, 134, 134, 134, 134, 134, 134, 134, 134, 134, 134, 134, 134, 134, 0, 0, 135, 135, 135, 135, 135, 135, 135, 135, 135, 135, 135, 135, 135, 135, 135, 135, 135, 135, 135, 135, 135, 135, 0, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 0, 136, 0, 0, 0, 0, 0, 136, 0, 0, 137, 137, 137, 137, 137, 137, 137, 137, 117, 117, 117, 117, 117, 117, 0, 0, 122, 122, 122, 122, 122, 122, 0, 0, 0, 138, 0, 0, 0, 139, 0, 0, 140, 141, 14, 15, 14, 15, 14, 15, 14, 15, 14, 15, 14, 15, 0, 0, 0, 0, 0, 142, 0, 0, 143, 0, 117, 117, 117, 117, 117, 117, 117, 117, 122, 122, 122, 122, 122, 122, 122, 122, 0, 117, 0, 117, 0, 117, 0, 117, 0, 122, 0, 122, 0, 122, 0, 122, 144, 144, 145, 145, 145, 145, 146, 146, 147, 147, 148, 148, 149, 149, 0, 0, 117, 117, 0, 150, 0, 0, 0, 0, 122, 122, 151, 151, 152, 0, 153, 0, 0, 0, 0, 150, 0, 0, 0, 0, 154, 154, 154, 154, 152, 0, 0, 0, 117, 117, 0, 155, 0, 0, 0, 0, 122, 122, 156, 156, 0, 0, 0, 0, 117, 117, 0, 157, 0, 125, 0, 0, 122, 122, 158, 158, 129, 0, 0, 0, 159, 159, 160, 160, 152, 0, 0, 0, 0, 0, 0, 0, 0, 0, 161, 0, 0, 0, 162, 163, 0, 0, 0, 0, 0, 0, 164, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 165, 0, 166, 166, 166, 166, 166, 166, 166, 166, 167, 167, 167, 167, 167, 167, 167, 167, 0, 0, 0, 14, 15, 0, 0, 0, 0, 0, 0, 0, 0, 0, 168, 168, 168, 168, 168, 168, 168, 168, 168, 168, 169, 169, 169, 169, 169, 169, 169, 169, 169, 169, 0, 0, 0, 0, 0, 0, 14, 15, 170, 171, 172, 173, 174, 14, 15, 14, 15, 14, 15, 175, 176, 177, 178, 0, 14, 15, 0, 14, 15, 0, 0, 0, 0, 0, 0, 0, 179, 179, 0, 0, 0, 14, 15, 14, 15, 0, 0, 0, 14, 15, 0, 0, 0, 0, 180, 180, 180, 180, 180, 180, 180, 180, 180, 180, 180, 180, 180, 180, 0, 180, 0, 0, 0, 0, 0, 180, 0, 0, 0, 14, 15, 14, 15, 181, 14, 15, 0, 0, 0, 14, 15, 182, 0, 0, 14, 15, 183, 184, 185, 186, 0, 0, 187, 188, 189, 190, 14, 15, 14, 15, 0, 0, 0, 191, 0, 0, 0, 0, 192, 192, 192, 192, 192, 192, 192, 192, 0, 0, 0, 0, 0, 14, 15, 0, 193, 193, 193, 193, 193, 193, 193, 193, 194, 194, 194, 194, 194, 194, 194, 
194, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 0, 0, 0, 0, 0, 115, 115, 115, 115, 115, 115, 115, 115, 115, 115, 115, 0, 0, 0, 0, 0, }; /* All_Cases: 2184 bytes. */ static RE_AllCases re_all_cases_table[] = { {{ 0, 0, 0}}, {{ 32, 0, 0}}, {{ 32, 232, 0}}, {{ 32, 8415, 0}}, {{ 32, 300, 0}}, {{ -32, 0, 0}}, {{ -32, 199, 0}}, {{ -32, 8383, 0}}, {{ -32, 268, 0}}, {{ 743, 775, 0}}, {{ 32, 8294, 0}}, {{ 7615, 0, 0}}, {{ -32, 8262, 0}}, {{ 121, 0, 0}}, {{ 1, 0, 0}}, {{ -1, 0, 0}}, {{ -199, 0, 0}}, {{ -232, 0, 0}}, {{ -121, 0, 0}}, {{ -300, -268, 0}}, {{ 195, 0, 0}}, {{ 210, 0, 0}}, {{ 206, 0, 0}}, {{ 205, 0, 0}}, {{ 79, 0, 0}}, {{ 202, 0, 0}}, {{ 203, 0, 0}}, {{ 207, 0, 0}}, {{ 97, 0, 0}}, {{ 211, 0, 0}}, {{ 209, 0, 0}}, {{ 163, 0, 0}}, {{ 213, 0, 0}}, {{ 130, 0, 0}}, {{ 214, 0, 0}}, {{ 218, 0, 0}}, {{ 217, 0, 0}}, {{ 219, 0, 0}}, {{ 56, 0, 0}}, {{ 1, 2, 0}}, {{ -1, 1, 0}}, {{ -2, -1, 0}}, {{ -79, 0, 0}}, {{ -97, 0, 0}}, {{ -56, 0, 0}}, {{ -130, 0, 0}}, {{ 10795, 0, 0}}, {{ -163, 0, 0}}, {{ 10792, 0, 0}}, {{ 10815, 0, 0}}, {{ -195, 0, 0}}, {{ 69, 0, 0}}, {{ 71, 0, 0}}, {{ 10783, 0, 0}}, {{ 10780, 0, 0}}, {{ 10782, 0, 0}}, {{ -210, 0, 0}}, {{ -206, 0, 0}}, {{ -205, 0, 0}}, {{ -202, 0, 0}}, {{ -203, 0, 0}}, {{ 42319, 0, 0}}, {{ 42315, 0, 0}}, {{ -207, 0, 0}}, {{ 42280, 0, 0}}, {{ 42308, 0, 0}}, {{ -209, 0, 0}}, {{ -211, 0, 0}}, {{ 10743, 0, 0}}, {{ 42305, 0, 0}}, {{ 10749, 0, 0}}, {{ -213, 0, 0}}, {{ -214, 0, 0}}, {{ 10727, 0, 0}}, {{ -218, 0, 0}}, {{ 42282, 0, 0}}, {{ -69, 0, 0}}, {{ -217, 0, 0}}, {{ -71, 0, 0}}, {{ -219, 0, 0}}, {{ 42261, 0, 0}}, {{ 42258, 0, 0}}, {{ 84, 116, 7289}}, {{ 116, 0, 0}}, {{ 38, 0, 0}}, {{ 37, 0, 0}}, {{ 64, 0, 0}}, {{ 63, 0, 0}}, {{ 7235, 0, 0}}, {{ 32, 62, 0}}, {{ 32, 96, 0}}, {{ 32, 57, 92}}, {{ -84, 32, 7205}}, {{ 32, 86, 0}}, {{ -743, 32, 0}}, {{ 32, 54, 0}}, {{ 32, 80, 0}}, {{ 31, 32, 0}}, {{ 32, 47, 0}}, {{ 32, 7549, 0}}, {{ -38, 0, 0}}, {{ -37, 0, 0}}, {{ 7219, 0, 0}}, {{ -32, 30, 0}}, {{ -32, 64, 0}}, {{ -32, 25, 60}}, {{ -116, -32, 
7173}}, {{ -32, 54, 0}}, {{ -775, -32, 0}}, {{ -32, 22, 0}}, {{ -32, 48, 0}}, {{ -31, 1, 0}}, {{ -32, -1, 0}}, {{ -32, 15, 0}}, {{ -32, 7517, 0}}, {{ -64, 0, 0}}, {{ -63, 0, 0}}, {{ 8, 0, 0}}, {{ -62, -30, 0}}, {{ -57, -25, 35}}, {{ -47, -15, 0}}, {{ -54, -22, 0}}, {{ -8, 0, 0}}, {{ -86, -54, 0}}, {{ -80, -48, 0}}, {{ 7, 0, 0}}, {{ -116, 0, 0}}, {{ -92, -60, -35}}, {{ -96, -64, 0}}, {{ -7, 0, 0}}, {{ 80, 0, 0}}, {{ -80, 0, 0}}, {{ 15, 0, 0}}, {{ -15, 0, 0}}, {{ 48, 0, 0}}, {{ -48, 0, 0}}, {{ 7264, 0, 0}}, {{ 38864, 0, 0}}, {{ 35332, 0, 0}}, {{ 3814, 0, 0}}, {{ 1, 59, 0}}, {{ -1, 58, 0}}, {{ -59, -58, 0}}, {{ -7615, 0, 0}}, {{ 74, 0, 0}}, {{ 86, 0, 0}}, {{ 100, 0, 0}}, {{ 128, 0, 0}}, {{ 112, 0, 0}}, {{ 126, 0, 0}}, {{ 9, 0, 0}}, {{ -74, 0, 0}}, {{ -9, 0, 0}}, {{ -7289, -7205, -7173}}, {{ -86, 0, 0}}, {{ -7235, 0, 0}}, {{ -100, 0, 0}}, {{ -7219, 0, 0}}, {{ -112, 0, 0}}, {{ -128, 0, 0}}, {{ -126, 0, 0}}, {{ -7549, -7517, 0}}, {{ -8415, -8383, 0}}, {{ -8294, -8262, 0}}, {{ 28, 0, 0}}, {{ -28, 0, 0}}, {{ 16, 0, 0}}, {{ -16, 0, 0}}, {{ 26, 0, 0}}, {{ -26, 0, 0}}, {{-10743, 0, 0}}, {{ -3814, 0, 0}}, {{-10727, 0, 0}}, {{-10795, 0, 0}}, {{-10792, 0, 0}}, {{-10780, 0, 0}}, {{-10749, 0, 0}}, {{-10783, 0, 0}}, {{-10782, 0, 0}}, {{-10815, 0, 0}}, {{ -7264, 0, 0}}, {{-35332, 0, 0}}, {{-42280, 0, 0}}, {{-42308, 0, 0}}, {{-42319, 0, 0}}, {{-42315, 0, 0}}, {{-42305, 0, 0}}, {{-42258, 0, 0}}, {{-42282, 0, 0}}, {{-42261, 0, 0}}, {{ 928, 0, 0}}, {{ -928, 0, 0}}, {{-38864, 0, 0}}, {{ 40, 0, 0}}, {{ -40, 0, 0}}, }; /* All_Cases: 2340 bytes. 
 *
 * re_get_all_cases() below walks a four-stage lookup: bits 13 and up of ch
 * index stage 1, the next 5 bits index stage 2, the next 5 bits index stage 3,
 * and the low 3 bits index stage 4, whose entry selects a row of
 * re_all_cases_table. Each row holds up to RE_MAX_CASES - 1 signed offsets
 * from ch, terminated early by 0. Worked example for 'K' (75): stages 1 and 2
 * both give 0, stage 3 entry 75 >> 3 = 9 is 2, stage 4 entry
 * (2 << 3) + (75 & 7) = 19 is 3, and row 3 is {32, 8415, 0}, so the cases are
 * 75 ('K'), 107 ('k') and 8490 (U+212A KELVIN SIGN).
*/ int re_get_all_cases(RE_UINT32 ch, RE_UINT32* codepoints) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; RE_AllCases* all_cases; int count; f = ch >> 13; code = ch ^ (f << 13); pos = (RE_UINT32)re_all_cases_stage_1[f] << 5; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_all_cases_stage_2[pos + f] << 5; f = code >> 3; code ^= f << 3; pos = (RE_UINT32)re_all_cases_stage_3[pos + f] << 3; value = re_all_cases_stage_4[pos + code]; all_cases = &re_all_cases_table[value]; codepoints[0] = ch; count = 1; while (count < RE_MAX_CASES && all_cases->diffs[count - 1] != 0) { codepoints[count] = (RE_UINT32)((RE_INT32)ch + all_cases->diffs[count - 1]); ++count; } return count; } /* Simple_Case_Folding. */ static RE_UINT8 re_simple_case_folding_stage_1[] = { 0, 1, 2, 2, 2, 3, 2, 4, 5, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, }; static RE_UINT8 re_simple_case_folding_stage_2[] = { 0, 1, 2, 3, 4, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 6, 6, 8, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 9, 10, 6, 11, 6, 6, 12, 6, 6, 6, 6, 6, 6, 6, 13, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 14, 15, 6, 6, 6, 16, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 17, 6, 6, 6, 6, 18, 6, 6, 6, 6, 6, 6, 6, 19, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 20, 6, 6, 6, 6, 6, 6, 6, }; static RE_UINT8 re_simple_case_folding_stage_3[] = { 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 2, 2, 5, 5, 0, 0, 0, 0, 6, 6, 6, 6, 6, 6, 7, 
8, 8, 7, 6, 6, 6, 6, 6, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 8, 20, 6, 6, 21, 6, 6, 6, 6, 6, 22, 6, 23, 24, 25, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 26, 0, 0, 0, 0, 0, 27, 28, 29, 30, 1, 2, 31, 32, 0, 0, 33, 34, 35, 6, 6, 6, 36, 37, 38, 38, 2, 2, 2, 2, 0, 0, 0, 0, 0, 0, 6, 6, 6, 6, 39, 7, 6, 6, 6, 6, 6, 6, 40, 41, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 42, 43, 43, 43, 44, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 45, 45, 45, 45, 46, 47, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 48, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 49, 50, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 0, 51, 0, 48, 0, 51, 0, 51, 0, 48, 0, 52, 0, 51, 0, 0, 0, 51, 0, 51, 0, 51, 0, 53, 0, 54, 0, 55, 0, 56, 0, 57, 0, 0, 0, 0, 58, 59, 60, 0, 0, 0, 0, 0, 61, 61, 0, 0, 62, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 63, 64, 64, 64, 0, 0, 0, 0, 0, 0, 43, 43, 43, 43, 43, 44, 0, 0, 0, 0, 0, 0, 65, 66, 67, 68, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 23, 69, 33, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 6, 6, 6, 6, 49, 0, 0, 6, 6, 6, 23, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 6, 7, 6, 6, 6, 6, 6, 6, 6, 0, 70, 6, 71, 27, 6, 6, 72, 73, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 74, 74, 74, 74, 74, 74, 74, 74, 74, 74, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 75, 75, 75, 75, 75, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 76, 76, 76, 76, 76, 76, 77, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2, 2, 0, 0, 0, 
0, 0, 0, 0, 0, }; static RE_UINT8 re_simple_case_folding_stage_4[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 3, 0, 3, 0, 3, 0, 3, 0, 0, 0, 3, 0, 3, 0, 3, 0, 0, 3, 0, 3, 0, 3, 0, 3, 4, 3, 0, 3, 0, 3, 0, 5, 0, 6, 3, 0, 3, 0, 7, 3, 0, 8, 8, 3, 0, 0, 9, 10, 11, 3, 0, 8, 12, 0, 13, 14, 3, 0, 0, 0, 13, 15, 0, 16, 3, 0, 3, 0, 3, 0, 17, 3, 0, 17, 0, 0, 3, 0, 17, 3, 0, 18, 18, 3, 0, 3, 0, 19, 3, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 20, 3, 0, 20, 3, 0, 20, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 0, 3, 0, 0, 20, 3, 0, 3, 0, 21, 22, 23, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 0, 0, 0, 0, 0, 0, 24, 3, 0, 25, 26, 0, 0, 3, 0, 27, 28, 29, 3, 0, 0, 0, 0, 0, 0, 30, 0, 0, 3, 0, 3, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 30, 0, 0, 0, 0, 0, 0, 31, 0, 32, 32, 32, 0, 33, 0, 34, 34, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 35, 36, 37, 0, 0, 0, 38, 39, 0, 40, 41, 0, 0, 42, 43, 0, 3, 0, 44, 3, 0, 0, 23, 23, 23, 45, 45, 45, 45, 45, 45, 45, 45, 3, 0, 0, 0, 0, 0, 0, 0, 46, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 0, 0, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 0, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 0, 48, 0, 0, 0, 0, 0, 48, 0, 0, 49, 49, 49, 49, 49, 49, 0, 0, 3, 0, 3, 0, 3, 0, 0, 0, 0, 0, 0, 50, 0, 0, 51, 0, 49, 49, 49, 49, 49, 49, 49, 49, 0, 49, 0, 49, 0, 49, 0, 49, 49, 49, 52, 52, 53, 0, 54, 0, 55, 55, 55, 55, 53, 0, 0, 0, 49, 49, 56, 56, 0, 0, 0, 0, 49, 49, 57, 57, 44, 0, 0, 0, 58, 58, 59, 59, 53, 0, 0, 0, 0, 0, 0, 0, 0, 0, 60, 0, 0, 0, 61, 62, 0, 0, 0, 0, 0, 0, 63, 0, 0, 0, 0, 0, 64, 64, 64, 64, 64, 64, 64, 64, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 65, 65, 65, 65, 65, 65, 65, 65, 65, 65, 3, 0, 66, 67, 68, 0, 0, 3, 0, 3, 0, 3, 0, 69, 70, 71, 72, 0, 3, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 73, 73, 0, 0, 0, 3, 0, 3, 0, 0, 0, 3, 0, 3, 0, 74, 3, 0, 0, 0, 0, 3, 0, 75, 0, 0, 3, 0, 76, 77, 78, 79, 0, 
0, 80, 81, 82, 83, 3, 0, 3, 0, 84, 84, 84, 84, 84, 84, 84, 84, 85, 85, 85, 85, 85, 85, 85, 85, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 0, 0, 0, 0, 0, }; /* Simple_Case_Folding: 1624 bytes. */ static RE_INT32 re_simple_case_folding_table[] = { 0, 32, 775, 1, -121, -268, 210, 206, 205, 79, 202, 203, 207, 211, 209, 213, 214, 218, 217, 219, 2, -97, -56, -130, 10795, -163, 10792, -195, 69, 71, 116, 38, 37, 64, 63, 8, -30, -25, -15, -22, -54, -48, -60, -64, -7, 80, 15, 48, 7264, -8, -58, -7615, -74, -9, -7173, -86, -100, -112, -128, -126, -7517, -8383, -8262, 28, 16, 26, -10743, -3814, -10727, -10780, -10749, -10783, -10782, -10815, -35332, -42280, -42308, -42319, -42315, -42305, -42258, -42282, -42261, 928, -38864, 40, }; /* Simple_Case_Folding: 344 bytes. */ RE_UINT32 re_get_simple_case_folding(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; RE_INT32 diff; f = ch >> 13; code = ch ^ (f << 13); pos = (RE_UINT32)re_simple_case_folding_stage_1[f] << 5; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_simple_case_folding_stage_2[pos + f] << 5; f = code >> 3; code ^= f << 3; pos = (RE_UINT32)re_simple_case_folding_stage_3[pos + f] << 3; value = re_simple_case_folding_stage_4[pos + code]; diff = re_simple_case_folding_table[value]; return (RE_UINT32)((RE_INT32)ch + diff); } /* Full_Case_Folding. 
 *
 * re_get_full_case_folding() uses the same four-stage table lookup as
 * re_get_all_cases(), but each row of re_full_case_folding_table is
 * { diff, { extra code points } }: the first folded code point is ch + diff,
 * and up to RE_MAX_FOLDED - 1 further code points follow, terminated early
 * by 0. For example the row { -108, { 115, 0}} turns U+00DF (223) into
 * 223 - 108 = 115 followed by 115, that is "ss".
*/ static RE_UINT8 re_full_case_folding_stage_1[] = { 0, 1, 2, 2, 2, 3, 2, 4, 5, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, }; static RE_UINT8 re_full_case_folding_stage_2[] = { 0, 1, 2, 3, 4, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 6, 6, 8, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 9, 10, 6, 11, 6, 6, 12, 6, 6, 6, 6, 6, 6, 6, 13, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 14, 15, 6, 6, 6, 16, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 17, 6, 6, 6, 18, 6, 6, 6, 6, 19, 6, 6, 6, 6, 6, 6, 6, 20, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 21, 6, 6, 6, 6, 6, 6, 6, }; static RE_UINT8 re_full_case_folding_stage_3[] = { 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 2, 2, 5, 6, 0, 0, 0, 0, 7, 7, 7, 7, 7, 7, 8, 9, 9, 10, 7, 7, 7, 7, 7, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 9, 22, 7, 7, 23, 7, 7, 7, 7, 7, 24, 7, 25, 26, 27, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 28, 0, 0, 0, 0, 0, 29, 30, 31, 32, 33, 2, 34, 35, 36, 0, 37, 38, 39, 7, 7, 7, 40, 41, 42, 42, 2, 2, 2, 2, 0, 0, 0, 0, 0, 0, 7, 7, 7, 7, 43, 44, 7, 7, 7, 7, 7, 7, 45, 46, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 47, 48, 48, 48, 49, 0, 0, 0, 0, 0, 50, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 51, 51, 51, 51, 52, 53, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 54, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 55, 56, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 0, 57, 0, 54, 0, 57, 0, 57, 0, 54, 58, 59, 0, 57, 0, 0, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 0, 0, 0, 0, 76, 77, 78, 0, 0, 0, 0, 0, 79, 79, 0, 0, 80, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 81, 82, 82, 82, 0, 0, 0, 0, 0, 0, 48, 48, 48, 48, 48, 49, 0, 0, 0, 0, 0, 0, 83, 84, 85, 86, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 25, 87, 37, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 7, 7, 7, 7, 88, 0, 0, 7, 7, 7, 25, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 44, 7, 44, 7, 7, 7, 7, 7, 7, 7, 0, 89, 7, 90, 29, 7, 7, 91, 92, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 93, 93, 93, 93, 93, 93, 93, 93, 93, 93, 0, 0, 0, 0, 0, 0, 0, 0, 94, 0, 95, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 96, 96, 96, 96, 96, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 97, 97, 97, 97, 97, 97, 98, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_full_case_folding_stage_4[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 3, 4, 0, 4, 0, 4, 0, 4, 0, 5, 0, 4, 0, 4, 0, 4, 0, 0, 4, 0, 4, 0, 4, 0, 4, 0, 6, 4, 0, 4, 0, 4, 0, 7, 4, 0, 4, 0, 4, 0, 8, 0, 9, 4, 0, 4, 0, 10, 4, 0, 11, 11, 4, 0, 0, 12, 13, 14, 4, 0, 11, 15, 0, 16, 17, 4, 0, 0, 0, 16, 18, 0, 19, 4, 0, 4, 0, 4, 0, 20, 4, 0, 20, 0, 0, 4, 0, 20, 4, 0, 21, 21, 4, 0, 4, 0, 22, 4, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 23, 4, 0, 23, 4, 0, 23, 4, 0, 
4, 0, 4, 0, 4, 0, 4, 0, 0, 4, 0, 24, 23, 4, 0, 4, 0, 25, 26, 27, 0, 4, 0, 4, 0, 4, 0, 4, 0, 4, 0, 0, 0, 0, 0, 0, 0, 28, 4, 0, 29, 30, 0, 0, 4, 0, 31, 32, 33, 4, 0, 0, 0, 0, 0, 0, 34, 0, 0, 4, 0, 4, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 34, 0, 0, 0, 0, 0, 0, 35, 0, 36, 36, 36, 0, 37, 0, 38, 38, 39, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 40, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 41, 42, 43, 0, 0, 0, 44, 45, 0, 46, 47, 0, 0, 48, 49, 0, 4, 0, 50, 4, 0, 0, 27, 27, 27, 51, 51, 51, 51, 51, 51, 51, 51, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 4, 0, 4, 0, 52, 4, 0, 4, 0, 4, 0, 4, 0, 4, 0, 4, 0, 4, 0, 0, 0, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 0, 0, 0, 0, 0, 0, 0, 0, 54, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 0, 55, 0, 0, 0, 0, 0, 55, 0, 0, 56, 56, 56, 56, 56, 56, 0, 0, 4, 0, 4, 0, 4, 0, 57, 58, 59, 60, 61, 62, 0, 0, 63, 0, 56, 56, 56, 56, 56, 56, 56, 56, 64, 0, 65, 0, 66, 0, 67, 0, 0, 56, 0, 56, 0, 56, 0, 56, 68, 68, 68, 68, 68, 68, 68, 68, 69, 69, 69, 69, 69, 69, 69, 69, 70, 70, 70, 70, 70, 70, 70, 70, 71, 71, 71, 71, 71, 71, 71, 71, 72, 72, 72, 72, 72, 72, 72, 72, 73, 73, 73, 73, 73, 73, 73, 73, 0, 0, 74, 75, 76, 0, 77, 78, 56, 56, 79, 79, 80, 0, 81, 0, 0, 0, 82, 83, 84, 0, 85, 86, 87, 87, 87, 87, 88, 0, 0, 0, 0, 0, 89, 90, 0, 0, 91, 92, 56, 56, 93, 93, 0, 0, 0, 0, 0, 0, 94, 95, 96, 0, 97, 98, 56, 56, 99, 99, 50, 0, 0, 0, 0, 0, 100, 101, 102, 0, 103, 104, 105, 105, 106, 106, 107, 0, 0, 0, 0, 0, 0, 0, 0, 0, 108, 0, 0, 0, 109, 110, 0, 0, 0, 0, 0, 0, 111, 0, 0, 0, 0, 0, 112, 112, 112, 112, 112, 112, 112, 112, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 113, 113, 113, 113, 113, 113, 113, 113, 113, 113, 4, 0, 114, 115, 116, 0, 0, 4, 0, 4, 0, 4, 0, 117, 118, 119, 120, 0, 4, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 121, 121, 0, 0, 0, 4, 0, 4, 0, 0, 4, 0, 4, 0, 4, 0, 0, 0, 0, 4, 0, 4, 0, 122, 4, 0, 0, 0, 0, 4, 0, 123, 0, 0, 4, 0, 124, 125, 126, 127, 0, 0, 128, 129, 130, 
131, 4, 0, 4, 0, 132, 132, 132, 132, 132, 132, 132, 132, 133, 134, 135, 136, 137, 138, 139, 0, 0, 0, 0, 140, 141, 142, 143, 144, 145, 145, 145, 145, 145, 145, 145, 145, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 0, 0, 0, 0, 0, }; /* Full_Case_Folding: 1824 bytes. */ static RE_FullCaseFolding re_full_case_folding_table[] = { { 0, { 0, 0}}, { 32, { 0, 0}}, { 775, { 0, 0}}, { -108, { 115, 0}}, { 1, { 0, 0}}, { -199, { 775, 0}}, { 371, { 110, 0}}, { -121, { 0, 0}}, { -268, { 0, 0}}, { 210, { 0, 0}}, { 206, { 0, 0}}, { 205, { 0, 0}}, { 79, { 0, 0}}, { 202, { 0, 0}}, { 203, { 0, 0}}, { 207, { 0, 0}}, { 211, { 0, 0}}, { 209, { 0, 0}}, { 213, { 0, 0}}, { 214, { 0, 0}}, { 218, { 0, 0}}, { 217, { 0, 0}}, { 219, { 0, 0}}, { 2, { 0, 0}}, { -390, { 780, 0}}, { -97, { 0, 0}}, { -56, { 0, 0}}, { -130, { 0, 0}}, { 10795, { 0, 0}}, { -163, { 0, 0}}, { 10792, { 0, 0}}, { -195, { 0, 0}}, { 69, { 0, 0}}, { 71, { 0, 0}}, { 116, { 0, 0}}, { 38, { 0, 0}}, { 37, { 0, 0}}, { 64, { 0, 0}}, { 63, { 0, 0}}, { 41, { 776, 769}}, { 21, { 776, 769}}, { 8, { 0, 0}}, { -30, { 0, 0}}, { -25, { 0, 0}}, { -15, { 0, 0}}, { -22, { 0, 0}}, { -54, { 0, 0}}, { -48, { 0, 0}}, { -60, { 0, 0}}, { -64, { 0, 0}}, { -7, { 0, 0}}, { 80, { 0, 0}}, { 15, { 0, 0}}, { 48, { 0, 0}}, { -34, {1410, 0}}, { 7264, { 0, 0}}, { -8, { 0, 0}}, { -7726, { 817, 0}}, { -7715, { 776, 0}}, { -7713, { 778, 0}}, { -7712, { 778, 0}}, { -7737, { 702, 0}}, { -58, { 0, 0}}, { -7723, { 115, 0}}, { -7051, { 787, 0}}, { -7053, { 787, 768}}, { -7055, { 787, 769}}, { -7057, { 787, 834}}, { -128, { 953, 0}}, { -136, { 953, 0}}, { -112, { 953, 0}}, { -120, { 953, 0}}, { -64, { 953, 0}}, { -72, { 953, 0}}, { -66, { 953, 0}}, { -7170, { 953, 0}}, { -7176, { 953, 0}}, { -7173, { 834, 0}}, { -7174, { 834, 953}}, { -74, { 0, 0}}, { -7179, { 953, 0}}, { -7173, { 0, 0}}, { -78, { 953, 0}}, { -7180, { 953, 0}}, { -7190, { 953, 0}}, { -7183, { 834, 0}}, { -7184, { 834, 953}}, { -86, { 0, 0}}, { -7189, { 953, 0}}, { -7193, { 776, 768}}, { -7194, { 776, 
769}}, { -7197, { 834, 0}}, { -7198, { 776, 834}}, { -100, { 0, 0}}, { -7197, { 776, 768}}, { -7198, { 776, 769}}, { -7203, { 787, 0}}, { -7201, { 834, 0}}, { -7202, { 776, 834}}, { -112, { 0, 0}}, { -118, { 953, 0}}, { -7210, { 953, 0}}, { -7206, { 953, 0}}, { -7213, { 834, 0}}, { -7214, { 834, 953}}, { -128, { 0, 0}}, { -126, { 0, 0}}, { -7219, { 953, 0}}, { -7517, { 0, 0}}, { -8383, { 0, 0}}, { -8262, { 0, 0}}, { 28, { 0, 0}}, { 16, { 0, 0}}, { 26, { 0, 0}}, {-10743, { 0, 0}}, { -3814, { 0, 0}}, {-10727, { 0, 0}}, {-10780, { 0, 0}}, {-10749, { 0, 0}}, {-10783, { 0, 0}}, {-10782, { 0, 0}}, {-10815, { 0, 0}}, {-35332, { 0, 0}}, {-42280, { 0, 0}}, {-42308, { 0, 0}}, {-42319, { 0, 0}}, {-42315, { 0, 0}}, {-42305, { 0, 0}}, {-42258, { 0, 0}}, {-42282, { 0, 0}}, {-42261, { 0, 0}}, { 928, { 0, 0}}, {-38864, { 0, 0}}, {-64154, { 102, 0}}, {-64155, { 105, 0}}, {-64156, { 108, 0}}, {-64157, { 102, 105}}, {-64158, { 102, 108}}, {-64146, { 116, 0}}, {-64147, { 116, 0}}, {-62879, {1398, 0}}, {-62880, {1381, 0}}, {-62881, {1387, 0}}, {-62872, {1398, 0}}, {-62883, {1389, 0}}, { 40, { 0, 0}}, }; /* Full_Case_Folding: 1168 bytes. */ int re_get_full_case_folding(RE_UINT32 ch, RE_UINT32* codepoints) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; RE_FullCaseFolding* case_folding; int count; f = ch >> 13; code = ch ^ (f << 13); pos = (RE_UINT32)re_full_case_folding_stage_1[f] << 5; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_full_case_folding_stage_2[pos + f] << 5; f = code >> 3; code ^= f << 3; pos = (RE_UINT32)re_full_case_folding_stage_3[pos + f] << 3; value = re_full_case_folding_stage_4[pos + code]; case_folding = &re_full_case_folding_table[value]; codepoints[0] = (RE_UINT32)((RE_INT32)ch + case_folding->diff); count = 1; while (count < RE_MAX_FOLDED && case_folding->codepoints[count - 1] != 0) { codepoints[count] = case_folding->codepoints[count - 1]; ++count; } return count; } /* Property function table. 
 *
 * The position of each function in this array is its property index: entry 0
 * is re_get_general_category (RE_PROP_GC, 0x0) and entry 10 is re_get_cased
 * (RE_PROP_CASED, 0xA). The array has 81 entries, matching the extern
 * declaration in _regex_unicode.h.
*/ RE_GetPropertyFunc re_get_property[] = { re_get_general_category, re_get_block, re_get_script, re_get_word_break, re_get_grapheme_cluster_break, re_get_sentence_break, re_get_math, re_get_alphabetic, re_get_lowercase, re_get_uppercase, re_get_cased, re_get_case_ignorable, re_get_changes_when_lowercased, re_get_changes_when_uppercased, re_get_changes_when_titlecased, re_get_changes_when_casefolded, re_get_changes_when_casemapped, re_get_id_start, re_get_id_continue, re_get_xid_start, re_get_xid_continue, re_get_default_ignorable_code_point, re_get_grapheme_extend, re_get_grapheme_base, re_get_grapheme_link, re_get_white_space, re_get_bidi_control, re_get_join_control, re_get_dash, re_get_hyphen, re_get_quotation_mark, re_get_terminal_punctuation, re_get_other_math, re_get_hex_digit, re_get_ascii_hex_digit, re_get_other_alphabetic, re_get_ideographic, re_get_diacritic, re_get_extender, re_get_other_lowercase, re_get_other_uppercase, re_get_noncharacter_code_point, re_get_other_grapheme_extend, re_get_ids_binary_operator, re_get_ids_trinary_operator, re_get_radical, re_get_unified_ideograph, re_get_other_default_ignorable_code_point, re_get_deprecated, re_get_soft_dotted, re_get_logical_order_exception, re_get_other_id_start, re_get_other_id_continue, re_get_sterm, re_get_variation_selector, re_get_pattern_white_space, re_get_pattern_syntax, re_get_hangul_syllable_type, re_get_bidi_class, re_get_canonical_combining_class, re_get_decomposition_type, re_get_east_asian_width, re_get_joining_group, re_get_joining_type, re_get_line_break, re_get_numeric_type, re_get_numeric_value, re_get_bidi_mirrored, re_get_indic_positional_category, re_get_indic_syllabic_category, re_get_alphanumeric, re_get_any, re_get_blank, re_get_graph, re_get_print, re_get_word, re_get_xdigit, re_get_posix_digit, re_get_posix_alnum, re_get_posix_punct, re_get_posix_xdigit, }; regex-2016.01.10/Python2/_regex_unicode.h0000666000000000000000000001640712540663552016061 0ustar 00000000000000typedef 
unsigned char RE_UINT8; typedef signed char RE_INT8; typedef unsigned short RE_UINT16; typedef signed short RE_INT16; typedef unsigned int RE_UINT32; typedef signed int RE_INT32; typedef unsigned char BOOL; enum {FALSE, TRUE}; #define RE_ASCII_MAX 0x7F #define RE_LOCALE_MAX 0xFF #define RE_UNICODE_MAX 0x10FFFF #define RE_MAX_CASES 4 #define RE_MAX_FOLDED 3 typedef struct RE_Property { RE_UINT16 name; RE_UINT8 id; RE_UINT8 value_set; } RE_Property; typedef struct RE_PropertyValue { RE_UINT16 name; RE_UINT8 value_set; RE_UINT16 id; } RE_PropertyValue; typedef RE_UINT32 (*RE_GetPropertyFunc)(RE_UINT32 ch); #define RE_PROP_GC 0x0 #define RE_PROP_CASED 0xA #define RE_PROP_UPPERCASE 0x9 #define RE_PROP_LOWERCASE 0x8 #define RE_PROP_C 30 #define RE_PROP_L 31 #define RE_PROP_M 32 #define RE_PROP_N 33 #define RE_PROP_P 34 #define RE_PROP_S 35 #define RE_PROP_Z 36 #define RE_PROP_ASSIGNED 38 #define RE_PROP_CASEDLETTER 37 #define RE_PROP_CN 0 #define RE_PROP_LU 1 #define RE_PROP_LL 2 #define RE_PROP_LT 3 #define RE_PROP_LM 4 #define RE_PROP_LO 5 #define RE_PROP_MN 6 #define RE_PROP_ME 7 #define RE_PROP_MC 8 #define RE_PROP_ND 9 #define RE_PROP_NL 10 #define RE_PROP_NO 11 #define RE_PROP_ZS 12 #define RE_PROP_ZL 13 #define RE_PROP_ZP 14 #define RE_PROP_CC 15 #define RE_PROP_CF 16 #define RE_PROP_CO 17 #define RE_PROP_CS 18 #define RE_PROP_PD 19 #define RE_PROP_PS 20 #define RE_PROP_PE 21 #define RE_PROP_PC 22 #define RE_PROP_PO 23 #define RE_PROP_SM 24 #define RE_PROP_SC 25 #define RE_PROP_SK 26 #define RE_PROP_SO 27 #define RE_PROP_PI 28 #define RE_PROP_PF 29 #define RE_PROP_C_MASK 0x00078001 #define RE_PROP_L_MASK 0x0000003E #define RE_PROP_M_MASK 0x000001C0 #define RE_PROP_N_MASK 0x00000E00 #define RE_PROP_P_MASK 0x30F80000 #define RE_PROP_S_MASK 0x0F000000 #define RE_PROP_Z_MASK 0x00007000 #define RE_PROP_ALNUM 0x460001 #define RE_PROP_ALPHA 0x070001 #define RE_PROP_ANY 0x470001 #define RE_PROP_ASCII 0x010001 #define RE_PROP_BLANK 0x480001 #define RE_PROP_CNTRL 0x00000F 
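/* Illustrative note, not part of the generated data. Judging from the values
 * in this header, the six-hex-digit RE_PROP_* constants around this point pack
 * a property index and a property value as (index << 16) | value, where the
 * index selects an entry of re_get_property[]. For example RE_PROP_CNTRL
 * (0x00000F) is property 0 (RE_PROP_GC) with value 15 (RE_PROP_CC), and
 * RE_PROP_UPPER (0x090001) is property 9 (RE_PROP_UPPERCASE) with value 1.
 * The two macros below are hypothetical names introduced only to show that
 * split; nothing else in the module is assumed to use them. */
#define RE_EXAMPLE_PROP_INDEX(p) ((unsigned int)(p) >> 16)    /* upper bits: index into re_get_property[] */
#define RE_EXAMPLE_PROP_VALUE(p) ((unsigned int)(p) & 0xFFFF) /* low 16 bits: value to compare against */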
#define RE_PROP_DIGIT 0x000009 #define RE_PROP_GRAPH 0x490001 #define RE_PROP_LOWER 0x080001 #define RE_PROP_PRINT 0x4A0001 #define RE_PROP_SPACE 0x190001 #define RE_PROP_UPPER 0x090001 #define RE_PROP_WORD 0x4B0001 #define RE_PROP_XDIGIT 0x4C0001 #define RE_PROP_POSIX_ALNUM 0x4E0001 #define RE_PROP_POSIX_DIGIT 0x4D0001 #define RE_PROP_POSIX_PUNCT 0x4F0001 #define RE_PROP_POSIX_XDIGIT 0x500001 #define RE_BREAK_OTHER 0 #define RE_BREAK_DOUBLEQUOTE 1 #define RE_BREAK_SINGLEQUOTE 2 #define RE_BREAK_HEBREWLETTER 3 #define RE_BREAK_CR 4 #define RE_BREAK_LF 5 #define RE_BREAK_NEWLINE 6 #define RE_BREAK_EXTEND 7 #define RE_BREAK_REGIONALINDICATOR 8 #define RE_BREAK_FORMAT 9 #define RE_BREAK_KATAKANA 10 #define RE_BREAK_ALETTER 11 #define RE_BREAK_MIDLETTER 12 #define RE_BREAK_MIDNUM 13 #define RE_BREAK_MIDNUMLET 14 #define RE_BREAK_NUMERIC 15 #define RE_BREAK_EXTENDNUMLET 16 #define RE_GBREAK_OTHER 0 #define RE_GBREAK_CR 1 #define RE_GBREAK_LF 2 #define RE_GBREAK_CONTROL 3 #define RE_GBREAK_EXTEND 4 #define RE_GBREAK_REGIONALINDICATOR 5 #define RE_GBREAK_SPACINGMARK 6 #define RE_GBREAK_L 7 #define RE_GBREAK_V 8 #define RE_GBREAK_T 9 #define RE_GBREAK_LV 10 #define RE_GBREAK_LVT 11 #define RE_GBREAK_PREPEND 12 extern char* re_strings[1296]; extern RE_Property re_properties[147]; extern RE_PropertyValue re_property_values[1412]; extern RE_UINT16 re_expand_on_folding[104]; extern RE_GetPropertyFunc re_get_property[81]; RE_UINT32 re_get_general_category(RE_UINT32 ch); RE_UINT32 re_get_block(RE_UINT32 ch); RE_UINT32 re_get_script(RE_UINT32 ch); RE_UINT32 re_get_word_break(RE_UINT32 ch); RE_UINT32 re_get_grapheme_cluster_break(RE_UINT32 ch); RE_UINT32 re_get_sentence_break(RE_UINT32 ch); RE_UINT32 re_get_math(RE_UINT32 ch); RE_UINT32 re_get_alphabetic(RE_UINT32 ch); RE_UINT32 re_get_lowercase(RE_UINT32 ch); RE_UINT32 re_get_uppercase(RE_UINT32 ch); RE_UINT32 re_get_cased(RE_UINT32 ch); RE_UINT32 re_get_case_ignorable(RE_UINT32 ch); RE_UINT32 
re_get_changes_when_lowercased(RE_UINT32 ch); RE_UINT32 re_get_changes_when_uppercased(RE_UINT32 ch); RE_UINT32 re_get_changes_when_titlecased(RE_UINT32 ch); RE_UINT32 re_get_changes_when_casefolded(RE_UINT32 ch); RE_UINT32 re_get_changes_when_casemapped(RE_UINT32 ch); RE_UINT32 re_get_id_start(RE_UINT32 ch); RE_UINT32 re_get_id_continue(RE_UINT32 ch); RE_UINT32 re_get_xid_start(RE_UINT32 ch); RE_UINT32 re_get_xid_continue(RE_UINT32 ch); RE_UINT32 re_get_default_ignorable_code_point(RE_UINT32 ch); RE_UINT32 re_get_grapheme_extend(RE_UINT32 ch); RE_UINT32 re_get_grapheme_base(RE_UINT32 ch); RE_UINT32 re_get_grapheme_link(RE_UINT32 ch); RE_UINT32 re_get_white_space(RE_UINT32 ch); RE_UINT32 re_get_bidi_control(RE_UINT32 ch); RE_UINT32 re_get_join_control(RE_UINT32 ch); RE_UINT32 re_get_dash(RE_UINT32 ch); RE_UINT32 re_get_hyphen(RE_UINT32 ch); RE_UINT32 re_get_quotation_mark(RE_UINT32 ch); RE_UINT32 re_get_terminal_punctuation(RE_UINT32 ch); RE_UINT32 re_get_other_math(RE_UINT32 ch); RE_UINT32 re_get_hex_digit(RE_UINT32 ch); RE_UINT32 re_get_ascii_hex_digit(RE_UINT32 ch); RE_UINT32 re_get_other_alphabetic(RE_UINT32 ch); RE_UINT32 re_get_ideographic(RE_UINT32 ch); RE_UINT32 re_get_diacritic(RE_UINT32 ch); RE_UINT32 re_get_extender(RE_UINT32 ch); RE_UINT32 re_get_other_lowercase(RE_UINT32 ch); RE_UINT32 re_get_other_uppercase(RE_UINT32 ch); RE_UINT32 re_get_noncharacter_code_point(RE_UINT32 ch); RE_UINT32 re_get_other_grapheme_extend(RE_UINT32 ch); RE_UINT32 re_get_ids_binary_operator(RE_UINT32 ch); RE_UINT32 re_get_ids_trinary_operator(RE_UINT32 ch); RE_UINT32 re_get_radical(RE_UINT32 ch); RE_UINT32 re_get_unified_ideograph(RE_UINT32 ch); RE_UINT32 re_get_other_default_ignorable_code_point(RE_UINT32 ch); RE_UINT32 re_get_deprecated(RE_UINT32 ch); RE_UINT32 re_get_soft_dotted(RE_UINT32 ch); RE_UINT32 re_get_logical_order_exception(RE_UINT32 ch); RE_UINT32 re_get_other_id_start(RE_UINT32 ch); RE_UINT32 re_get_other_id_continue(RE_UINT32 ch); RE_UINT32 
re_get_sterm(RE_UINT32 ch);
RE_UINT32 re_get_variation_selector(RE_UINT32 ch);
RE_UINT32 re_get_pattern_white_space(RE_UINT32 ch);
RE_UINT32 re_get_pattern_syntax(RE_UINT32 ch);
RE_UINT32 re_get_hangul_syllable_type(RE_UINT32 ch);
RE_UINT32 re_get_bidi_class(RE_UINT32 ch);
RE_UINT32 re_get_canonical_combining_class(RE_UINT32 ch);
RE_UINT32 re_get_decomposition_type(RE_UINT32 ch);
RE_UINT32 re_get_east_asian_width(RE_UINT32 ch);
RE_UINT32 re_get_joining_group(RE_UINT32 ch);
RE_UINT32 re_get_joining_type(RE_UINT32 ch);
RE_UINT32 re_get_line_break(RE_UINT32 ch);
RE_UINT32 re_get_numeric_type(RE_UINT32 ch);
RE_UINT32 re_get_numeric_value(RE_UINT32 ch);
RE_UINT32 re_get_bidi_mirrored(RE_UINT32 ch);
RE_UINT32 re_get_indic_positional_category(RE_UINT32 ch);
RE_UINT32 re_get_indic_syllabic_category(RE_UINT32 ch);
RE_UINT32 re_get_alphanumeric(RE_UINT32 ch);
RE_UINT32 re_get_any(RE_UINT32 ch);
RE_UINT32 re_get_blank(RE_UINT32 ch);
RE_UINT32 re_get_graph(RE_UINT32 ch);
RE_UINT32 re_get_print(RE_UINT32 ch);
RE_UINT32 re_get_word(RE_UINT32 ch);
RE_UINT32 re_get_xdigit(RE_UINT32 ch);
RE_UINT32 re_get_posix_digit(RE_UINT32 ch);
RE_UINT32 re_get_posix_alnum(RE_UINT32 ch);
RE_UINT32 re_get_posix_punct(RE_UINT32 ch);
RE_UINT32 re_get_posix_xdigit(RE_UINT32 ch);
int re_get_all_cases(RE_UINT32 ch, RE_UINT32* codepoints);
RE_UINT32 re_get_simple_case_folding(RE_UINT32 ch);
int re_get_full_case_folding(RE_UINT32 ch, RE_UINT32* codepoints);

regex-2016.01.10/Python3/regex.py

#
# Secret Labs' Regular Expression Engine
#
# Copyright (c) 1998-2001 by Secret Labs AB.  All rights reserved.
#
# This version of the SRE library can be redistributed under CNRI's
# Python 1.6 license.  For any other use, please contact Secret Labs
# AB (info@pythonware.com).
#
# Portions of this engine have been developed in cooperation with
# CNRI.
Hewlett-Packard provided funding for 1.6 integration and # other compatibility work. # # 2010-01-16 mrab Python front-end re-written and extended r"""Support for regular expressions (RE). This module provides regular expression matching operations similar to those found in Perl. It supports both 8-bit and Unicode strings; both the pattern and the strings being processed can contain null bytes and characters outside the US ASCII range. Regular expressions can contain both special and ordinary characters. Most ordinary characters, like "A", "a", or "0", are the simplest regular expressions; they simply match themselves. You can concatenate ordinary characters, so last matches the string 'last'. There are a few differences between the old (legacy) behaviour and the new (enhanced) behaviour, which are indicated by VERSION0 or VERSION1. The special characters are: "." Matches any character except a newline. "^" Matches the start of the string. "$" Matches the end of the string or just before the newline at the end of the string. "*" Matches 0 or more (greedy) repetitions of the preceding RE. Greedy means that it will match as many repetitions as possible. "+" Matches 1 or more (greedy) repetitions of the preceding RE. "?" Matches 0 or 1 (greedy) of the preceding RE. *?,+?,?? Non-greedy versions of the previous three special characters. *+,++,?+ Possessive versions of the previous three special characters. {m,n} Matches from m to n repetitions of the preceding RE. {m,n}? Non-greedy version of the above. {m,n}+ Possessive version of the above. {...} Fuzzy matching constraints. "\\" Either escapes special characters or signals a special sequence. [...] Indicates a set of characters. A "^" as the first character indicates a complementing set. "|" A|B, creates an RE that will match either A or B. (...) Matches the RE inside the parentheses. The contents are captured and can be retrieved or matched later in the string. 
    (?flags-flags)      VERSION1: Sets/clears the flags for the remainder of
                        the group or pattern; VERSION0: Sets the flags for the
                        entire pattern.
    (?:...)             Non-capturing version of regular parentheses.
    (?>...)             Atomic non-capturing version of regular parentheses.
    (?flags-flags:...)  Non-capturing version of regular parentheses with local
                        flags.
    (?P<name>...)       The substring matched by the group is accessible by
                        name.
    (?<name>...)        The substring matched by the group is accessible by
                        name.
    (?P=name)           Matches the text matched earlier by the group named
                        name.
    (?#...)             A comment; ignored.
    (?=...)             Matches if ... matches next, but doesn't consume the
                        string.
    (?!...)             Matches if ... doesn't match next.
    (?<=...)            Matches if preceded by ....
    (?<!...)            Matches if not preceded by ....

    \g<name>            Matches the text matched by the group named name.
    \G                  Matches the empty string, but only at the position
                        where the search started.
    \K                  Keeps only what follows for the entire match.
    \L<name>            Named list. The list is provided as a keyword argument.
    \m                  Matches the empty string, but only at the start of a
                        word.
    \M                  Matches the empty string, but only at the end of a
                        word.
    \n                  Matches the newline character.
    \N{name}            Matches the named character.
    \p{name=value}      Matches the character if its property has the
                        specified value.
    \P{name=value}      Matches the character if its property doesn't have the
                        specified value.
    \r                  Matches the carriage-return character.
    \s                  Matches any whitespace character; equivalent to
                        [ \t\n\r\f\v].
    \S                  Matches any non-whitespace character; equivalent to
                        [^\s].
    \t                  Matches the tab character.
    \uXXXX              Matches the Unicode codepoint with 4-digit hex code
                        XXXX.
    \UXXXXXXXX          Matches the Unicode codepoint with 8-digit hex code
                        XXXXXXXX.
    \v                  Matches the vertical tab character.
    \w                  Matches any alphanumeric character; equivalent to
                        [a-zA-Z0-9_] when matching a bytestring or a Unicode
                        string with the ASCII flag, or the whole range of
                        Unicode alphanumeric characters (letters plus digits
                        plus underscore) when matching a Unicode string.
With LOCALE, it will match the set [0-9_] plus characters defined as letters for the current locale. \W Matches the complement of \w; equivalent to [^\w]. \xXX Matches the character with 2-digit hex code XX. \X Matches a grapheme. \Z Matches only at the end of the string. \\ Matches a literal backslash. This module exports the following functions: match Match a regular expression pattern at the beginning of a string. fullmatch Match a regular expression pattern against all of a string. search Search a string for the presence of a pattern. sub Substitute occurrences of a pattern found in a string using a template string. subf Substitute occurrences of a pattern found in a string using a format string. subn Same as sub, but also return the number of substitutions made. subfn Same as subf, but also return the number of substitutions made. split Split a string by the occurrences of a pattern. VERSION1: will split at zero-width match; VERSION0: won't split at zero-width match. splititer Return an iterator yielding the parts of a split string. findall Find all occurrences of a pattern in a string. finditer Return an iterator yielding a match object for each match. compile Compile a pattern into a Pattern object. purge Clear the regular expression cache. escape Backslash all non-alphanumerics or special characters in a string. Most of the functions support a concurrent parameter: if True, the GIL will be released during matching, allowing other Python threads to run concurrently. If the string changes during matching, the behaviour is undefined. This parameter is not needed when working on the builtin (immutable) string classes. Some of the functions in this module take flags as optional parameters. Most of these flags can also be set within an RE: A a ASCII Make \w, \W, \b, \B, \d, and \D match the corresponding ASCII character categories. Default when matching a bytestring. B b BESTMATCH Find the best fuzzy match (default is first). D DEBUG Print the parsed pattern. 
E e ENHANCEMATCH Attempt to improve the fit after finding the first fuzzy match. F f FULLCASE Use full case-folding when performing case-insensitive matching in Unicode. I i IGNORECASE Perform case-insensitive matching. L L LOCALE Make \w, \W, \b, \B, \d, and \D dependent on the current locale. (One byte per character only.) M m MULTILINE "^" matches the beginning of lines (after a newline) as well as the string. "$" matches the end of lines (before a newline) as well as the end of the string. P p POSIX Perform POSIX-standard matching (leftmost longest). R r REVERSE Searches backwards. S s DOTALL "." matches any character at all, including the newline. U u UNICODE Make \w, \W, \b, \B, \d, and \D dependent on the Unicode locale. Default when matching a Unicode string. V0 V0 VERSION0 Turn on the old legacy behaviour. V1 V1 VERSION1 Turn on the new enhanced behaviour. This flag includes the FULLCASE flag. W w WORD Make \b and \B work with default Unicode word breaks and make ".", "^" and "$" work with Unicode line breaks. X x VERBOSE Ignore whitespace and comments for nicer looking REs. This module also defines an exception 'error'. """ # Public symbols. __all__ = ["compile", "escape", "findall", "finditer", "fullmatch", "match", "purge", "search", "split", "splititer", "sub", "subf", "subfn", "subn", "template", "Scanner", "A", "ASCII", "B", "BESTMATCH", "D", "DEBUG", "E", "ENHANCEMATCH", "S", "DOTALL", "F", "FULLCASE", "I", "IGNORECASE", "L", "LOCALE", "M", "MULTILINE", "P", "POSIX", "R", "REVERSE", "T", "TEMPLATE", "U", "UNICODE", "V0", "VERSION0", "V1", "VERSION1", "X", "VERBOSE", "W", "WORD", "error", "Regex"] __version__ = "2.4.85" # -------------------------------------------------------------------- # Public interface. 
def match(pattern, string, flags=0, pos=None, endpos=None, partial=False, concurrent=None, **kwargs): """Try to apply the pattern at the start of the string, returning a match object, or None if no match was found.""" return _compile(pattern, flags, kwargs).match(string, pos, endpos, concurrent, partial) def fullmatch(pattern, string, flags=0, pos=None, endpos=None, partial=False, concurrent=None, **kwargs): """Try to apply the pattern against all of the string, returning a match object, or None if no match was found.""" return _compile(pattern, flags, kwargs).fullmatch(string, pos, endpos, concurrent, partial) def search(pattern, string, flags=0, pos=None, endpos=None, partial=False, concurrent=None, **kwargs): """Search through string looking for a match to the pattern, returning a match object, or None if no match was found.""" return _compile(pattern, flags, kwargs).search(string, pos, endpos, concurrent, partial) def sub(pattern, repl, string, count=0, flags=0, pos=None, endpos=None, concurrent=None, **kwargs): """Return the string obtained by replacing the leftmost (or rightmost with a reverse pattern) non-overlapping occurrences of the pattern in string by the replacement repl. repl can be either a string or a callable; if a string, backslash escapes in it are processed; if a callable, it's passed the match object and must return a replacement string to be used.""" return _compile(pattern, flags, kwargs).sub(repl, string, count, pos, endpos, concurrent) def subf(pattern, format, string, count=0, flags=0, pos=None, endpos=None, concurrent=None, **kwargs): """Return the string obtained by replacing the leftmost (or rightmost with a reverse pattern) non-overlapping occurrences of the pattern in string by the replacement format. 
format can be either a string or a callable; if a string, it's treated as a format string; if a callable, it's passed the match object and must return a replacement string to be used.""" return _compile(pattern, flags, kwargs).subf(format, string, count, pos, endpos, concurrent) def subn(pattern, repl, string, count=0, flags=0, pos=None, endpos=None, concurrent=None, **kwargs): """Return a 2-tuple containing (new_string, number). new_string is the string obtained by replacing the leftmost (or rightmost with a reverse pattern) non-overlapping occurrences of the pattern in the source string by the replacement repl. number is the number of substitutions that were made. repl can be either a string or a callable; if a string, backslash escapes in it are processed; if a callable, it's passed the match object and must return a replacement string to be used.""" return _compile(pattern, flags, kwargs).subn(repl, string, count, pos, endpos, concurrent) def subfn(pattern, format, string, count=0, flags=0, pos=None, endpos=None, concurrent=None, **kwargs): """Return a 2-tuple containing (new_string, number). new_string is the string obtained by replacing the leftmost (or rightmost with a reverse pattern) non-overlapping occurrences of the pattern in the source string by the replacement format. number is the number of substitutions that were made. format can be either a string or a callable; if a string, it's treated as a format string; if a callable, it's passed the match object and must return a replacement string to be used.""" return _compile(pattern, flags, kwargs).subfn(format, string, count, pos, endpos, concurrent) def split(pattern, string, maxsplit=0, flags=0, concurrent=None, **kwargs): """Split the source string by the occurrences of the pattern, returning a list containing the resulting substrings. If capturing parentheses are used in pattern, then the text of all groups in the pattern are also returned as part of the resulting list. 
If maxsplit is nonzero, at most maxsplit splits occur, and the remainder of the string is returned as the final element of the list.""" return _compile(pattern, flags, kwargs).split(string, maxsplit, concurrent) def splititer(pattern, string, maxsplit=0, flags=0, concurrent=None, **kwargs): "Return an iterator yielding the parts of a split string." return _compile(pattern, flags, kwargs).splititer(string, maxsplit, concurrent) def findall(pattern, string, flags=0, pos=None, endpos=None, overlapped=False, concurrent=None, **kwargs): """Return a list of all matches in the string. The matches may be overlapped if overlapped is True. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result.""" return _compile(pattern, flags, kwargs).findall(string, pos, endpos, overlapped, concurrent) def finditer(pattern, string, flags=0, pos=None, endpos=None, overlapped=False, partial=False, concurrent=None, **kwargs): """Return an iterator over all matches in the string. The matches may be overlapped if overlapped is True. For each match, the iterator returns a match object. Empty matches are included in the result.""" return _compile(pattern, flags, kwargs).finditer(string, pos, endpos, overlapped, concurrent, partial) def compile(pattern, flags=0, **kwargs): "Compile a regular expression pattern, returning a pattern object." return _compile(pattern, flags, kwargs) def purge(): "Clear the regular expression cache" _cache.clear() _locale_sensitive.clear() def template(pattern, flags=0): "Compile a template pattern, returning a pattern object." return _compile(pattern, flags | TEMPLATE) def escape(pattern, special_only=False): "Escape all non-alphanumeric characters or special characters in pattern." # Convert it to Unicode. 
if isinstance(pattern, bytes): p = pattern.decode("latin-1") else: p = pattern s = [] if special_only: for c in p: if c in _METACHARS: s.append("\\") s.append(c) elif c == "\x00": s.append("\\000") else: s.append(c) else: for c in p: if c in _ALNUM: s.append(c) elif c == "\x00": s.append("\\000") else: s.append("\\") s.append(c) r = "".join(s) # Convert it back to bytes if necessary. if isinstance(pattern, bytes): r = r.encode("latin-1") return r # -------------------------------------------------------------------- # Internals. import _regex_core import _regex from threading import RLock as _RLock from locale import getlocale as _getlocale from _regex_core import * from _regex_core import (_ALL_VERSIONS, _ALL_ENCODINGS, _FirstSetError, _UnscopedFlagSet, _check_group_features, _compile_firstset, _compile_replacement, _flatten_code, _fold_case, _get_required_string, _parse_pattern, _shrink_cache) from _regex_core import (ALNUM as _ALNUM, Info as _Info, OP as _OP, Source as _Source, Fuzzy as _Fuzzy) # Version 0 is the old behaviour, compatible with the original 're' module. # Version 1 is the new behaviour, which differs slightly. DEFAULT_VERSION = VERSION0 _METACHARS = frozenset("()[]{}?*+|^$\\.") _regex_core.DEFAULT_VERSION = DEFAULT_VERSION # Caches for the patterns and replacements. _cache = {} _cache_lock = _RLock() _named_args = {} _replacement_cache = {} _locale_sensitive = {} # Maximum size of the cache. _MAXCACHE = 500 _MAXREPCACHE = 500 def _compile(pattern, flags=0, kwargs={}): "Compiles a regular expression to a PatternObject." # We won't bother to cache the pattern if we're debugging. debugging = (flags & DEBUG) != 0 # What locale is this pattern using? locale_key = (type(pattern), pattern) if _locale_sensitive.get(locale_key, True) or (flags & LOCALE) != 0: # This pattern is, or might be, locale-sensitive. pattern_locale = _getlocale()[1] else: # This pattern is definitely not locale-sensitive. 
pattern_locale = None if not debugging: try: # Do we know what keyword arguments are needed? args_key = pattern, type(pattern), flags args_needed = _named_args[args_key] # Are we being provided with its required keyword arguments? args_supplied = set() if args_needed: for k, v in args_needed: try: args_supplied.add((k, frozenset(kwargs[k]))) except KeyError: raise error("missing named list: {!r}".format(k)) args_supplied = frozenset(args_supplied) # Have we already seen this regular expression and named list? pattern_key = (pattern, type(pattern), flags, args_supplied, DEFAULT_VERSION, pattern_locale) return _cache[pattern_key] except KeyError: # It's a new pattern, or new named list for a known pattern. pass # Guess the encoding from the class of the pattern string. if isinstance(pattern, str): guess_encoding = UNICODE elif isinstance(pattern, bytes): guess_encoding = ASCII elif isinstance(pattern, _pattern_type): if flags: raise ValueError("cannot process flags argument with a compiled pattern") return pattern else: raise TypeError("first argument must be a string or compiled pattern") # Set the default version in the core code in case it has been changed. _regex_core.DEFAULT_VERSION = DEFAULT_VERSION global_flags = flags while True: caught_exception = None try: source = _Source(pattern) info = _Info(global_flags, source.char_type, kwargs) info.guess_encoding = guess_encoding source.ignore_space = bool(info.flags & VERBOSE) parsed = _parse_pattern(source, info) break except _UnscopedFlagSet: # Remember the global flags for the next attempt. global_flags = info.global_flags except error as e: caught_exception = e if caught_exception: raise error(caught_exception.msg, caught_exception.pattern, caught_exception.pos) if not source.at_end(): raise error("unbalanced parenthesis", pattern, source.pos) # Check the global flags for conflicts. 
    version = (info.flags & _ALL_VERSIONS) or DEFAULT_VERSION
    if version not in (0, VERSION0, VERSION1):
        raise ValueError("VERSION0 and VERSION1 flags are mutually incompatible")

    if (info.flags & _ALL_ENCODINGS) not in (0, ASCII, LOCALE, UNICODE):
        raise ValueError("ASCII, LOCALE and UNICODE flags are mutually incompatible")

    if isinstance(pattern, bytes) and (info.flags & UNICODE):
        raise ValueError("cannot use UNICODE flag with a bytes pattern")

    if not (info.flags & _ALL_ENCODINGS):
        if isinstance(pattern, str):
            info.flags |= UNICODE
        else:
            info.flags |= ASCII

    reverse = bool(info.flags & REVERSE)
    fuzzy = isinstance(parsed, _Fuzzy)

    # Remember whether this pattern has an inline locale flag.
    _locale_sensitive[locale_key] = info.inline_locale

    # Fix the group references.
    caught_exception = None
    try:
        parsed.fix_groups(pattern, reverse, False)
    except error as e:
        caught_exception = e

    if caught_exception:
        raise error(caught_exception.msg, caught_exception.pattern,
          caught_exception.pos)

    # Should we print the parsed pattern?
    if flags & DEBUG:
        parsed.dump(indent=0, reverse=reverse)

    # Optimise the parsed pattern.
    parsed = parsed.optimise(info)
    parsed = parsed.pack_characters(info)

    # Get the required string.
    req_offset, req_chars, req_flags = _get_required_string(parsed, info.flags)

    # Build the named lists.
    named_lists = {}
    named_list_indexes = [None] * len(info.named_lists_used)
    args_needed = set()
    for key, index in info.named_lists_used.items():
        name, case_flags = key
        values = frozenset(kwargs[name])
        if case_flags:
            items = frozenset(_fold_case(info, v) for v in values)
        else:
            items = values
        named_lists[name] = values
        named_list_indexes[index] = items
        args_needed.add((name, values))

    # Check the features of the groups.
    _check_group_features(info, parsed)

    # Compile the parsed pattern. The result is a list of tuples.
    code = parsed.compile(reverse)

    # Is there a group call to the pattern as a whole?
key = (0, reverse, fuzzy) ref = info.call_refs.get(key) if ref is not None: code = [(_OP.CALL_REF, ref)] + code + [(_OP.END, )] # Add the final 'success' opcode. code += [(_OP.SUCCESS, )] # Compile the additional copies of the groups that we need. for group, rev, fuz in info.additional_groups: code += group.compile(rev, fuz) # Flatten the code into a list of ints. code = _flatten_code(code) if not parsed.has_simple_start(): # Get the first set, if possible. try: fs_code = _compile_firstset(info, parsed.get_firstset(reverse)) fs_code = _flatten_code(fs_code) code = fs_code + code except _FirstSetError: pass # The named capture groups. index_group = dict((v, n) for n, v in info.group_index.items()) # Create the PatternObject. # # Local flags like IGNORECASE affect the code generation, but aren't needed # by the PatternObject itself. Conversely, global flags like LOCALE _don't_ # affect the code generation but _are_ needed by the PatternObject. compiled_pattern = _regex.compile(pattern, info.flags | version, code, info.group_index, index_group, named_lists, named_list_indexes, req_offset, req_chars, req_flags, info.group_count) # Do we need to reduce the size of the cache? if len(_cache) >= _MAXCACHE: with _cache_lock: _shrink_cache(_cache, _named_args, _locale_sensitive, _MAXCACHE) if not debugging: if (info.flags & LOCALE) == 0: pattern_locale = None args_needed = frozenset(args_needed) # Store this regular expression and named list. pattern_key = (pattern, type(pattern), flags, args_needed, DEFAULT_VERSION, pattern_locale) _cache[pattern_key] = compiled_pattern # Store what keyword arguments are needed. _named_args[args_key] = args_needed return compiled_pattern def _compile_replacement_helper(pattern, template): "Compiles a replacement template." # This function is called by the _regex module. # Have we seen this before? 
    key = pattern.pattern, pattern.flags, template
    compiled = _replacement_cache.get(key)
    if compiled is not None:
        return compiled

    if len(_replacement_cache) >= _MAXREPCACHE:
        _replacement_cache.clear()

    is_unicode = isinstance(template, str)
    source = _Source(template)
    if is_unicode:
        def make_string(char_codes):
            return "".join(chr(c) for c in char_codes)
    else:
        def make_string(char_codes):
            return bytes(char_codes)

    compiled = []
    literal = []
    while True:
        ch = source.get()
        if not ch:
            break
        if ch == "\\":
            # '_compile_replacement' will return either an int group reference
            # or a string literal. It returns items (plural) in order to handle
            # a 2-character literal (an invalid escape sequence).
            is_group, items = _compile_replacement(source, pattern, is_unicode)
            if is_group:
                # It's a group, so first flush the literal.
                if literal:
                    compiled.append(make_string(literal))
                    literal = []
                compiled.extend(items)
            else:
                literal.extend(items)
        else:
            literal.append(ord(ch))

    # Flush the literal.
    if literal:
        compiled.append(make_string(literal))

    _replacement_cache[key] = compiled

    return compiled

# We define _pattern_type here after all the support objects have been defined.
_pattern_type = type(_compile("", 0, {}))

# We'll define an alias for the 'compile' function so that the repr of a
# pattern object is eval-able.
Regex = compile

# Register myself for pickling.
import copyreg as _copy_reg

def _pickle(p):
    return _compile, (p.pattern, p.flags)

_copy_reg.pickle(_pattern_type, _pickle, _compile)

regex-2016.01.10/Python3/test_regex.py

import regex
import string
from weakref import proxy
import unittest
import copy
from test.support import run_unittest
import sys

# String subclasses for issue 18468.
class StrSubclass(str):
    def __getitem__(self, index):
        return StrSubclass(super().__getitem__(index))

class BytesSubclass(bytes):
    def __getitem__(self, index):
        return BytesSubclass(super().__getitem__(index))

class RegexTests(unittest.TestCase):
    PATTERN_CLASS = "<class '_regex.Pattern'>"
    FLAGS_WITH_COMPILED_PAT = "cannot process flags argument with a compiled pattern"
    INVALID_GROUP_REF = "invalid group reference"
    MISSING_GT = "missing >"
    BAD_GROUP_NAME = "bad character in group name"
    MISSING_GROUP_NAME = "missing group name"
    MISSING_LT = "missing <"
    UNKNOWN_GROUP_I = "unknown group"
    UNKNOWN_GROUP = "unknown group"
    BAD_ESCAPE = r"bad escape \(end of pattern\)"
    BAD_OCTAL_ESCAPE = r"bad escape \\"
    BAD_SET = "unterminated character set"
    STR_PAT_ON_BYTES = "cannot use a string pattern on a bytes-like object"
    BYTES_PAT_ON_STR = "cannot use a bytes pattern on a string-like object"
    STR_PAT_BYTES_TEMPL = "expected str instance, bytes found"
    BYTES_PAT_STR_TEMPL = "expected a bytes-like object, str found"
    BYTES_PAT_UNI_FLAG = "cannot use UNICODE flag with a bytes pattern"
    MIXED_FLAGS = "ASCII, LOCALE and UNICODE flags are mutually incompatible"
    MISSING_RPAREN = "missing \\)"
    TRAILING_CHARS = "unbalanced parenthesis"
    BAD_CHAR_RANGE = "bad character range"
    NOTHING_TO_REPEAT = "nothing to repeat"
    MULTIPLE_REPEAT = "multiple repeat"
    OPEN_GROUP = "cannot refer to an open group"
    DUPLICATE_GROUP = "duplicate group"
    CANT_TURN_OFF = "bad inline flags: cannot turn flags off"
    UNDEF_CHAR_NAME = "undefined character name"

    def assertTypedEqual(self, actual, expect, msg=None):
        self.assertEqual(actual, expect, msg)

        def recurse(actual, expect):
            if isinstance(expect, (tuple, list)):
                for x, y in zip(actual, expect):
                    recurse(x, y)
            else:
                self.assertIs(type(actual), type(expect), msg)

        recurse(actual, expect)

    def test_weakref(self):
        s = 'QabbbcR'
        x = regex.compile('ab+c')
        y = proxy(x)
        if x.findall('QabbbcR') != y.findall('QabbbcR'):
            self.fail()

    def test_search_star_plus(self):
        self.assertEqual(regex.search('a*',
          'xxx').span(0), (0, 0))
        self.assertEqual(regex.search('x*', 'axx').span(), (0, 0))
        self.assertEqual(regex.search('x+', 'axx').span(0), (1, 3))
        self.assertEqual(regex.search('x+', 'axx').span(), (1, 3))
        self.assertEqual(regex.search('x', 'aaa'), None)
        self.assertEqual(regex.match('a*', 'xxx').span(0), (0, 0))
        self.assertEqual(regex.match('a*', 'xxx').span(), (0, 0))
        self.assertEqual(regex.match('x*', 'xxxa').span(0), (0, 3))
        self.assertEqual(regex.match('x*', 'xxxa').span(), (0, 3))
        self.assertEqual(regex.match('a+', 'xxx'), None)

    def bump_num(self, matchobj):
        int_value = int(matchobj[0])
        return str(int_value + 1)

    def test_basic_regex_sub(self):
        self.assertEqual(regex.sub("(?i)b+", "x", "bbbb BBBB"), 'x x')
        self.assertEqual(regex.sub(r'\d+', self.bump_num, '08.2 -2 23x99y'),
          '9.3 -3 24x100y')
        self.assertEqual(regex.sub(r'\d+', self.bump_num, '08.2 -2 23x99y', 3),
          '9.3 -3 23x99y')

        self.assertEqual(regex.sub('.', lambda m: r"\n", 'x'), "\\n")
        self.assertEqual(regex.sub('.', r"\n", 'x'), "\n")

        self.assertEqual(regex.sub('(?P<a>x)', r'\g<a>\g<a>', 'xx'), 'xxxx')
        self.assertEqual(regex.sub('(?P<a>x)', r'\g<a>\g<1>', 'xx'), 'xxxx')
        self.assertEqual(regex.sub('(?P<unk>x)', r'\g<unk>\g<unk>', 'xx'),
          'xxxx')
        self.assertEqual(regex.sub('(?P<unk>x)', r'\g<1>\g<1>', 'xx'), 'xxxx')

        self.assertEqual(regex.sub('a', r'\t\n\v\r\f\a\b\B\Z\a\A\w\W\s\S\d\D',
          'a'), "\t\n\v\r\f\a\b\\B\\Z\a\\A\\w\\W\\s\\S\\d\\D")
        self.assertEqual(regex.sub('a', '\t\n\v\r\f\a', 'a'), "\t\n\v\r\f\a")
        self.assertEqual(regex.sub('a', '\t\n\v\r\f\a', 'a'), chr(9) + chr(10)
          + chr(11) + chr(13) + chr(12) + chr(7))

        self.assertEqual(regex.sub(r'^\s*', 'X', 'test'), 'Xtest')

        self.assertEqual(regex.sub(r"x", r"\x0A", "x"), "\n")
        self.assertEqual(regex.sub(r"x", r"\u000A", "x"), "\n")
        self.assertEqual(regex.sub(r"x", r"\U0000000A", "x"), "\n")
        self.assertEqual(regex.sub(r"x", r"\N{LATIN CAPITAL LETTER A}",
          "x"), "A")

        self.assertEqual(regex.sub(br"x", br"\x0A", b"x"), b"\n")
        self.assertEqual(regex.sub(br"x", br"\u000A", b"x"), b"\\u000A")
        self.assertEqual(regex.sub(br"x", br"\U0000000A", b"x"),
          b"\\U0000000A")
        self.assertEqual(regex.sub(br"x", br"\N{LATIN CAPITAL LETTER A}",
          b"x"), b"\\N{LATIN CAPITAL LETTER A}")

    def test_bug_449964(self):
        # Fails for group followed by other escape.
        self.assertEqual(regex.sub(r'(?P<unk>x)', r'\g<1>\g<1>\b', 'xx'),
          "xx\bxx\b")

    def test_bug_449000(self):
        # Test for sub() on escaped characters.
        self.assertEqual(regex.sub(r'\r\n', r'\n', 'abc\r\ndef\r\n'),
          "abc\ndef\n")
        self.assertEqual(regex.sub('\r\n', r'\n', 'abc\r\ndef\r\n'),
          "abc\ndef\n")
        self.assertEqual(regex.sub(r'\r\n', '\n', 'abc\r\ndef\r\n'),
          "abc\ndef\n")
        self.assertEqual(regex.sub('\r\n', '\n', 'abc\r\ndef\r\n'),
          "abc\ndef\n")

    def test_bug_1661(self):
        # Verify that flags do not get silently ignored with compiled patterns
        pattern = regex.compile('.')
        self.assertRaisesRegex(ValueError, self.FLAGS_WITH_COMPILED_PAT,
          lambda: regex.match(pattern, 'A', regex.I))
        self.assertRaisesRegex(ValueError, self.FLAGS_WITH_COMPILED_PAT,
          lambda: regex.search(pattern, 'A', regex.I))
        self.assertRaisesRegex(ValueError, self.FLAGS_WITH_COMPILED_PAT,
          lambda: regex.findall(pattern, 'A', regex.I))
        self.assertRaisesRegex(ValueError, self.FLAGS_WITH_COMPILED_PAT,
          lambda: regex.compile(pattern, regex.I))

    def test_bug_3629(self):
        # A regex that triggered a bug in the sre-code validator
        self.assertEqual(repr(type(regex.compile("(?P<quote>)(?(quote))"))),
          self.PATTERN_CLASS)

    def test_sub_template_numeric_escape(self):
        # Bug 776311 and friends.
self.assertEqual(regex.sub('x', r'\0', 'x'), "\0") self.assertEqual(regex.sub('x', r'\000', 'x'), "\000") self.assertEqual(regex.sub('x', r'\001', 'x'), "\001") self.assertEqual(regex.sub('x', r'\008', 'x'), "\0" + "8") self.assertEqual(regex.sub('x', r'\009', 'x'), "\0" + "9") self.assertEqual(regex.sub('x', r'\111', 'x'), "\111") self.assertEqual(regex.sub('x', r'\117', 'x'), "\117") self.assertEqual(regex.sub('x', r'\1111', 'x'), "\1111") self.assertEqual(regex.sub('x', r'\1111', 'x'), "\111" + "1") self.assertEqual(regex.sub('x', r'\00', 'x'), '\x00') self.assertEqual(regex.sub('x', r'\07', 'x'), '\x07') self.assertEqual(regex.sub('x', r'\08', 'x'), "\0" + "8") self.assertEqual(regex.sub('x', r'\09', 'x'), "\0" + "9") self.assertEqual(regex.sub('x', r'\0a', 'x'), "\0" + "a") self.assertEqual(regex.sub('x', r'\400', 'x'), "\u0100") self.assertEqual(regex.sub('x', r'\777', 'x'), "\u01FF") self.assertEqual(regex.sub(b'x', br'\400', b'x'), b"\x00") self.assertEqual(regex.sub(b'x', br'\777', b'x'), b"\xFF") self.assertRaisesRegex(regex.error, self.INVALID_GROUP_REF, lambda: regex.sub('x', r'\1', 'x')) self.assertRaisesRegex(regex.error, self.INVALID_GROUP_REF, lambda: regex.sub('x', r'\8', 'x')) self.assertRaisesRegex(regex.error, self.INVALID_GROUP_REF, lambda: regex.sub('x', r'\9', 'x')) self.assertRaisesRegex(regex.error, self.INVALID_GROUP_REF, lambda: regex.sub('x', r'\11', 'x')) self.assertRaisesRegex(regex.error, self.INVALID_GROUP_REF, lambda: regex.sub('x', r'\18', 'x')) self.assertRaisesRegex(regex.error, self.INVALID_GROUP_REF, lambda: regex.sub('x', r'\1a', 'x')) self.assertRaisesRegex(regex.error, self.INVALID_GROUP_REF, lambda: regex.sub('x', r'\90', 'x')) self.assertRaisesRegex(regex.error, self.INVALID_GROUP_REF, lambda: regex.sub('x', r'\99', 'x')) self.assertRaisesRegex(regex.error, self.INVALID_GROUP_REF, lambda: regex.sub('x', r'\118', 'x')) # r'\11' + '8' self.assertRaisesRegex(regex.error, self.INVALID_GROUP_REF, lambda: regex.sub('x', r'\11a', 
'x')) self.assertRaisesRegex(regex.error, self.INVALID_GROUP_REF, lambda: regex.sub('x', r'\181', 'x')) # r'\18' + '1' self.assertRaisesRegex(regex.error, self.INVALID_GROUP_REF, lambda: regex.sub('x', r'\800', 'x')) # r'\80' + '0' # In Python 2.3 (etc), these loop endlessly in sre_parser.py. self.assertEqual(regex.sub('(((((((((((x)))))))))))', r'\11', 'x'), 'x') self.assertEqual(regex.sub('((((((((((y))))))))))(.)', r'\118', 'xyz'), 'xz8') self.assertEqual(regex.sub('((((((((((y))))))))))(.)', r'\11a', 'xyz'), 'xza') def test_qualified_re_sub(self): self.assertEqual(regex.sub('a', 'b', 'aaaaa'), 'bbbbb') self.assertEqual(regex.sub('a', 'b', 'aaaaa', 1), 'baaaa') def test_bug_114660(self): self.assertEqual(regex.sub(r'(\S)\s+(\S)', r'\1 \2', 'hello there'), 'hello there') def test_bug_462270(self): # Test for empty sub() behaviour, see SF bug #462270 self.assertEqual(regex.sub('(?V0)x*', '-', 'abxd'), '-a-b-d-') self.assertEqual(regex.sub('(?V1)x*', '-', 'abxd'), '-a-b--d-') self.assertEqual(regex.sub('x+', '-', 'abxd'), 'ab-d') def test_bug_14462(self): # chr(255) is a valid identifier in Python 3. group_name = '\xFF' self.assertEqual(regex.search(r'(?P<' + group_name + '>a)', 'abc').group(group_name), 'a') def test_symbolic_refs(self): self.assertRaisesRegex(regex.error, self.MISSING_GT, lambda: regex.sub('(?Px)', r'\gx)', r'\g<', 'xx')) self.assertRaisesRegex(regex.error, self.MISSING_LT, lambda: regex.sub('(?Px)', r'\g', 'xx')) self.assertRaisesRegex(regex.error, self.BAD_GROUP_NAME, lambda: regex.sub('(?Px)', r'\g', 'xx')) self.assertRaisesRegex(regex.error, self.BAD_GROUP_NAME, lambda: regex.sub('(?Px)', r'\g<1a1>', 'xx')) self.assertRaisesRegex(IndexError, self.UNKNOWN_GROUP_I, lambda: regex.sub('(?Px)', r'\g', 'xx')) # The new behaviour of unmatched but valid groups is to treat them like # empty matches in the replacement template, like in Perl. 
self.assertEqual(regex.sub('(?Px)|(?Py)', r'\g', 'xx'), '') self.assertEqual(regex.sub('(?Px)|(?Py)', r'\2', 'xx'), '') # The old behaviour was to raise it as an IndexError. self.assertRaisesRegex(regex.error, self.BAD_GROUP_NAME, lambda: regex.sub('(?Px)', r'\g<-1>', 'xx')) def test_re_subn(self): self.assertEqual(regex.subn("(?i)b+", "x", "bbbb BBBB"), ('x x', 2)) self.assertEqual(regex.subn("b+", "x", "bbbb BBBB"), ('x BBBB', 1)) self.assertEqual(regex.subn("b+", "x", "xyz"), ('xyz', 0)) self.assertEqual(regex.subn("b*", "x", "xyz"), ('xxxyxzx', 4)) self.assertEqual(regex.subn("b*", "x", "xyz", 2), ('xxxyz', 2)) def test_re_split(self): self.assertEqual(regex.split(":", ":a:b::c"), ['', 'a', 'b', '', 'c']) self.assertEqual(regex.split(":*", ":a:b::c"), ['', 'a', 'b', 'c']) self.assertEqual(regex.split("(:*)", ":a:b::c"), ['', ':', 'a', ':', 'b', '::', 'c']) self.assertEqual(regex.split("(?::*)", ":a:b::c"), ['', 'a', 'b', 'c']) self.assertEqual(regex.split("(:)*", ":a:b::c"), ['', ':', 'a', ':', 'b', ':', 'c']) self.assertEqual(regex.split("([b:]+)", ":a:b::c"), ['', ':', 'a', ':b::', 'c']) self.assertEqual(regex.split("(b)|(:+)", ":a:b::c"), ['', None, ':', 'a', None, ':', '', 'b', None, '', None, '::', 'c']) self.assertEqual(regex.split("(?:b)|(?::+)", ":a:b::c"), ['', 'a', '', '', 'c']) self.assertEqual(regex.split("x", "xaxbxc"), ['', 'a', 'b', 'c']) self.assertEqual([m for m in regex.splititer("x", "xaxbxc")], ['', 'a', 'b', 'c']) self.assertEqual(regex.split("(?r)x", "xaxbxc"), ['c', 'b', 'a', '']) self.assertEqual([m for m in regex.splititer("(?r)x", "xaxbxc")], ['c', 'b', 'a', '']) self.assertEqual(regex.split("(x)|(y)", "xaxbxc"), ['', 'x', None, 'a', 'x', None, 'b', 'x', None, 'c']) self.assertEqual([m for m in regex.splititer("(x)|(y)", "xaxbxc")], ['', 'x', None, 'a', 'x', None, 'b', 'x', None, 'c']) self.assertEqual(regex.split("(?r)(x)|(y)", "xaxbxc"), ['c', 'x', None, 'b', 'x', None, 'a', 'x', None, '']) self.assertEqual([m for m in 
regex.splititer("(?r)(x)|(y)", "xaxbxc")], ['c', 'x', None, 'b', 'x', None, 'a', 'x', None, '']) self.assertEqual(regex.split(r"(?V1)\b", "a b c"), ['', 'a', ' ', 'b', ' ', 'c', '']) self.assertEqual(regex.split(r"(?V1)\m", "a b c"), ['', 'a ', 'b ', 'c']) self.assertEqual(regex.split(r"(?V1)\M", "a b c"), ['a', ' b', ' c', '']) def test_qualified_re_split(self): self.assertEqual(regex.split(":", ":a:b::c", 2), ['', 'a', 'b::c']) self.assertEqual(regex.split(':', 'a:b:c:d', 2), ['a', 'b', 'c:d']) self.assertEqual(regex.split("(:)", ":a:b::c", 2), ['', ':', 'a', ':', 'b::c']) self.assertEqual(regex.split("(:*)", ":a:b::c", 2), ['', ':', 'a', ':', 'b::c']) def test_re_findall(self): self.assertEqual(regex.findall(":+", "abc"), []) self.assertEqual(regex.findall(":+", "a:b::c:::d"), [':', '::', ':::']) self.assertEqual(regex.findall("(:+)", "a:b::c:::d"), [':', '::', ':::']) self.assertEqual(regex.findall("(:)(:*)", "a:b::c:::d"), [(':', ''), (':', ':'), (':', '::')]) self.assertEqual(regex.findall(r"\((?P.{0,5}?TEST)\)", "(MY TEST)"), ["MY TEST"]) self.assertEqual(regex.findall(r"\((?P.{0,3}?TEST)\)", "(MY TEST)"), ["MY TEST"]) self.assertEqual(regex.findall(r"\((?P.{0,3}?T)\)", "(MY T)"), ["MY T"]) self.assertEqual(regex.findall(r"[^a]{2}[A-Z]", "\n S"), [' S']) self.assertEqual(regex.findall(r"[^a]{2,3}[A-Z]", "\n S"), ['\n S']) self.assertEqual(regex.findall(r"[^a]{2,3}[A-Z]", "\n S"), [' S']) self.assertEqual(regex.findall(r"X(Y[^Y]+?){1,2}( |Q)+DEF", "XYABCYPPQ\nQ DEF"), [('YPPQ\n', ' ')]) self.assertEqual(regex.findall(r"(\nTest(\n+.+?){0,2}?)?\n+End", "\nTest\nxyz\nxyz\nEnd"), [('\nTest\nxyz\nxyz', '\nxyz')]) def test_bug_117612(self): self.assertEqual(regex.findall(r"(a|(b))", "aba"), [('a', ''), ('b', 'b'), ('a', '')]) def test_re_match(self): self.assertEqual(regex.match('a', 'a')[:], ('a',)) self.assertEqual(regex.match('(a)', 'a')[:], ('a', 'a')) self.assertEqual(regex.match(r'(a)', 'a')[0], 'a') self.assertEqual(regex.match(r'(a)', 'a')[1], 'a') 
        self.assertEqual(regex.match(r'(a)', 'a').group(1, 1), ('a', 'a'))

        pat = regex.compile('((a)|(b))(c)?')
        self.assertEqual(pat.match('a')[:], ('a', 'a', 'a', None, None))
        self.assertEqual(pat.match('b')[:], ('b', 'b', None, 'b', None))
        self.assertEqual(pat.match('ac')[:], ('ac', 'a', 'a', None, 'c'))
        self.assertEqual(pat.match('bc')[:], ('bc', 'b', None, 'b', 'c'))
        self.assertEqual(pat.match('bc')[:], ('bc', 'b', None, 'b', 'c'))

        # A single group.
        m = regex.match('(a)', 'a')
        self.assertEqual(m.group(), 'a')
        self.assertEqual(m.group(0), 'a')
        self.assertEqual(m.group(1), 'a')
        self.assertEqual(m.group(1, 1), ('a', 'a'))

        # Group names a1, b2 and c3 are the ones the assertions below look up.
        pat = regex.compile('(?:(?P<a1>a)|(?P<b2>b))(?P<c3>c)?')
        self.assertEqual(pat.match('a').group(1, 2, 3), ('a', None, None))
        self.assertEqual(pat.match('b').group('a1', 'b2', 'c3'), (None, 'b',
          None))
        self.assertEqual(pat.match('ac').group(1, 'b2', 3), ('a', None, 'c'))

    def test_re_groupref_exists(self):
        self.assertEqual(regex.match(r'^(\()?([^()]+)(?(1)\))$', '(a)')[:],
          ('(a)', '(', 'a'))
        self.assertEqual(regex.match(r'^(\()?([^()]+)(?(1)\))$', 'a')[:],
          ('a', None, 'a'))
        self.assertEqual(regex.match(r'^(\()?([^()]+)(?(1)\))$', 'a)'), None)
        self.assertEqual(regex.match(r'^(\()?([^()]+)(?(1)\))$', '(a'), None)
        self.assertEqual(regex.match('^(?:(a)|c)((?(1)b|d))$', 'ab')[:],
          ('ab', 'a', 'b'))
        self.assertEqual(regex.match('^(?:(a)|c)((?(1)b|d))$', 'cd')[:],
          ('cd', None, 'd'))
        self.assertEqual(regex.match('^(?:(a)|c)((?(1)|d))$', 'cd')[:],
          ('cd', None, 'd'))
        self.assertEqual(regex.match('^(?:(a)|c)((?(1)|d))$', 'a')[:],
          ('a', 'a', ''))

        # Tests for bug #1177831: exercise groups other than the first group.
p = regex.compile('(?Pa)(?Pb)?((?(g2)c|d))') self.assertEqual(p.match('abc')[:], ('abc', 'a', 'b', 'c')) self.assertEqual(p.match('ad')[:], ('ad', 'a', None, 'd')) self.assertEqual(p.match('abd'), None) self.assertEqual(p.match('ac'), None) def test_re_groupref(self): self.assertEqual(regex.match(r'^(\|)?([^()]+)\1$', '|a|')[:], ('|a|', '|', 'a')) self.assertEqual(regex.match(r'^(\|)?([^()]+)\1?$', 'a')[:], ('a', None, 'a')) self.assertEqual(regex.match(r'^(\|)?([^()]+)\1$', 'a|'), None) self.assertEqual(regex.match(r'^(\|)?([^()]+)\1$', '|a'), None) self.assertEqual(regex.match(r'^(?:(a)|c)(\1)$', 'aa')[:], ('aa', 'a', 'a')) self.assertEqual(regex.match(r'^(?:(a)|c)(\1)?$', 'c')[:], ('c', None, None)) self.assertEqual(regex.findall("(?i)(.{1,40}?),(.{1,40}?)(?:;)+(.{1,80}).{1,40}?\\3(\ |;)+(.{1,80}?)\\1", "TEST, BEST; LEST ; Lest 123 Test, Best"), [('TEST', ' BEST', ' LEST', ' ', '123 ')]) def test_groupdict(self): self.assertEqual(regex.match('(?Pfirst) (?Psecond)', 'first second').groupdict(), {'first': 'first', 'second': 'second'}) def test_expand(self): self.assertEqual(regex.match("(?Pfirst) (?Psecond)", "first second").expand(r"\2 \1 \g \g"), 'second first second first') def test_repeat_minmax(self): self.assertEqual(regex.match(r"^(\w){1}$", "abc"), None) self.assertEqual(regex.match(r"^(\w){1}?$", "abc"), None) self.assertEqual(regex.match(r"^(\w){1,2}$", "abc"), None) self.assertEqual(regex.match(r"^(\w){1,2}?$", "abc"), None) self.assertEqual(regex.match(r"^(\w){3}$", "abc")[1], 'c') self.assertEqual(regex.match(r"^(\w){1,3}$", "abc")[1], 'c') self.assertEqual(regex.match(r"^(\w){1,4}$", "abc")[1], 'c') self.assertEqual(regex.match(r"^(\w){3,4}?$", "abc")[1], 'c') self.assertEqual(regex.match(r"^(\w){3}?$", "abc")[1], 'c') self.assertEqual(regex.match(r"^(\w){1,3}?$", "abc")[1], 'c') self.assertEqual(regex.match(r"^(\w){1,4}?$", "abc")[1], 'c') self.assertEqual(regex.match(r"^(\w){3,4}?$", "abc")[1], 'c') self.assertEqual(regex.match("^x{1}$", "xxx"), 
None) self.assertEqual(regex.match("^x{1}?$", "xxx"), None) self.assertEqual(regex.match("^x{1,2}$", "xxx"), None) self.assertEqual(regex.match("^x{1,2}?$", "xxx"), None) self.assertEqual(regex.match("^x{1}", "xxx")[0], 'x') self.assertEqual(regex.match("^x{1}?", "xxx")[0], 'x') self.assertEqual(regex.match("^x{0,1}", "xxx")[0], 'x') self.assertEqual(regex.match("^x{0,1}?", "xxx")[0], '') self.assertEqual(bool(regex.match("^x{3}$", "xxx")), True) self.assertEqual(bool(regex.match("^x{1,3}$", "xxx")), True) self.assertEqual(bool(regex.match("^x{1,4}$", "xxx")), True) self.assertEqual(bool(regex.match("^x{3,4}?$", "xxx")), True) self.assertEqual(bool(regex.match("^x{3}?$", "xxx")), True) self.assertEqual(bool(regex.match("^x{1,3}?$", "xxx")), True) self.assertEqual(bool(regex.match("^x{1,4}?$", "xxx")), True) self.assertEqual(bool(regex.match("^x{3,4}?$", "xxx")), True) self.assertEqual(regex.match("^x{}$", "xxx"), None) self.assertEqual(bool(regex.match("^x{}$", "x{}")), True) def test_getattr(self): self.assertEqual(regex.compile("(?i)(a)(b)").pattern, '(?i)(a)(b)') self.assertEqual(regex.compile("(?i)(a)(b)").flags, regex.I | regex.U | regex.DEFAULT_VERSION) self.assertEqual(regex.compile(b"(?i)(a)(b)").flags, regex.A | regex.I | regex.DEFAULT_VERSION) self.assertEqual(regex.compile("(?i)(a)(b)").groups, 2) self.assertEqual(regex.compile("(?i)(a)(b)").groupindex, {}) self.assertEqual(regex.compile("(?i)(?Pa)(?Pb)").groupindex, {'first': 1, 'other': 2}) self.assertEqual(regex.match("(a)", "a").pos, 0) self.assertEqual(regex.match("(a)", "a").endpos, 1) self.assertEqual(regex.search("b(c)", "abcdef").pos, 0) self.assertEqual(regex.search("b(c)", "abcdef").endpos, 6) self.assertEqual(regex.search("b(c)", "abcdef").span(), (1, 3)) self.assertEqual(regex.search("b(c)", "abcdef").span(1), (2, 3)) self.assertEqual(regex.match("(a)", "a").string, 'a') self.assertEqual(regex.match("(a)", "a").regs, ((0, 1), (0, 1))) self.assertEqual(repr(type(regex.match("(a)", "a").re)), 
self.PATTERN_CLASS) # Issue 14260. p = regex.compile(r'abc(?Pdef)') p.groupindex["n"] = 0 self.assertEqual(p.groupindex["n"], 1) def test_special_escapes(self): self.assertEqual(regex.search(r"\b(b.)\b", "abcd abc bcd bx")[1], 'bx') self.assertEqual(regex.search(r"\B(b.)\B", "abc bcd bc abxd")[1], 'bx') self.assertEqual(regex.search(br"\b(b.)\b", b"abcd abc bcd bx", regex.LOCALE)[1], b'bx') self.assertEqual(regex.search(br"\B(b.)\B", b"abc bcd bc abxd", regex.LOCALE)[1], b'bx') self.assertEqual(regex.search(r"\b(b.)\b", "abcd abc bcd bx", regex.UNICODE)[1], 'bx') self.assertEqual(regex.search(r"\B(b.)\B", "abc bcd bc abxd", regex.UNICODE)[1], 'bx') self.assertEqual(regex.search(r"^abc$", "\nabc\n", regex.M)[0], 'abc') self.assertEqual(regex.search(r"^\Aabc\Z$", "abc", regex.M)[0], 'abc') self.assertEqual(regex.search(r"^\Aabc\Z$", "\nabc\n", regex.M), None) self.assertEqual(regex.search(br"\b(b.)\b", b"abcd abc bcd bx")[1], b'bx') self.assertEqual(regex.search(br"\B(b.)\B", b"abc bcd bc abxd")[1], b'bx') self.assertEqual(regex.search(br"^abc$", b"\nabc\n", regex.M)[0], b'abc') self.assertEqual(regex.search(br"^\Aabc\Z$", b"abc", regex.M)[0], b'abc') self.assertEqual(regex.search(br"^\Aabc\Z$", b"\nabc\n", regex.M), None) self.assertEqual(regex.search(r"\d\D\w\W\s\S", "1aa! a")[0], '1aa! a') self.assertEqual(regex.search(br"\d\D\w\W\s\S", b"1aa! a", regex.LOCALE)[0], b'1aa! a') self.assertEqual(regex.search(r"\d\D\w\W\s\S", "1aa! a", regex.UNICODE)[0], '1aa! 
a') def test_bigcharset(self): self.assertEqual(regex.match(r"([\u2222\u2223])", "\u2222")[1], '\u2222') self.assertEqual(regex.match(r"([\u2222\u2223])", "\u2222", regex.UNICODE)[1], '\u2222') self.assertEqual("".join(regex.findall(".", "e\xe8\xe9\xea\xeb\u0113\u011b\u0117", flags=regex.UNICODE)), 'e\xe8\xe9\xea\xeb\u0113\u011b\u0117') self.assertEqual("".join(regex.findall(r"[e\xe8\xe9\xea\xeb\u0113\u011b\u0117]", "e\xe8\xe9\xea\xeb\u0113\u011b\u0117", flags=regex.UNICODE)), 'e\xe8\xe9\xea\xeb\u0113\u011b\u0117') self.assertEqual("".join(regex.findall(r"e|\xe8|\xe9|\xea|\xeb|\u0113|\u011b|\u0117", "e\xe8\xe9\xea\xeb\u0113\u011b\u0117", flags=regex.UNICODE)), 'e\xe8\xe9\xea\xeb\u0113\u011b\u0117') def test_anyall(self): self.assertEqual(regex.match("a.b", "a\nb", regex.DOTALL)[0], "a\nb") self.assertEqual(regex.match("a.*b", "a\n\nb", regex.DOTALL)[0], "a\n\nb") def test_non_consuming(self): self.assertEqual(regex.match(r"(a(?=\s[^a]))", "a b")[1], 'a') self.assertEqual(regex.match(r"(a(?=\s[^a]*))", "a b")[1], 'a') self.assertEqual(regex.match(r"(a(?=\s[abc]))", "a b")[1], 'a') self.assertEqual(regex.match(r"(a(?=\s[abc]*))", "a bc")[1], 'a') self.assertEqual(regex.match(r"(a)(?=\s\1)", "a a")[1], 'a') self.assertEqual(regex.match(r"(a)(?=\s\1*)", "a aa")[1], 'a') self.assertEqual(regex.match(r"(a)(?=\s(abc|a))", "a a")[1], 'a') self.assertEqual(regex.match(r"(a(?!\s[^a]))", "a a")[1], 'a') self.assertEqual(regex.match(r"(a(?!\s[abc]))", "a d")[1], 'a') self.assertEqual(regex.match(r"(a)(?!\s\1)", "a b")[1], 'a') self.assertEqual(regex.match(r"(a)(?!\s(abc|a))", "a b")[1], 'a') def test_ignore_case(self): self.assertEqual(regex.match("abc", "ABC", regex.I)[0], 'ABC') self.assertEqual(regex.match(b"abc", b"ABC", regex.I)[0], b'ABC') self.assertEqual(regex.match(r"(a\s[^a]*)", "a bb", regex.I)[1], 'a bb') self.assertEqual(regex.match(r"(a\s[abc])", "a b", regex.I)[1], 'a b') self.assertEqual(regex.match(r"(a\s[abc]*)", "a bb", regex.I)[1], 'a bb') 
self.assertEqual(regex.match(r"((a)\s\2)", "a a", regex.I)[1], 'a a') self.assertEqual(regex.match(r"((a)\s\2*)", "a aa", regex.I)[1], 'a aa') self.assertEqual(regex.match(r"((a)\s(abc|a))", "a a", regex.I)[1], 'a a') self.assertEqual(regex.match(r"((a)\s(abc|a)*)", "a aa", regex.I)[1], 'a aa') # Issue 3511. self.assertEqual(regex.match(r"[Z-a]", "_").span(), (0, 1)) self.assertEqual(regex.match(r"(?i)[Z-a]", "_").span(), (0, 1)) self.assertEqual(bool(regex.match(r"(?i)nao", "nAo")), True) self.assertEqual(bool(regex.match(r"(?i)n\xE3o", "n\xC3o")), True) self.assertEqual(bool(regex.match(r"(?i)n\xE3o", "N\xC3O")), True) self.assertEqual(bool(regex.match(r"(?i)s", "\u017F")), True) def test_case_folding(self): self.assertEqual(regex.search(r"(?fi)ss", "SS").span(), (0, 2)) self.assertEqual(regex.search(r"(?fi)SS", "ss").span(), (0, 2)) self.assertEqual(regex.search(r"(?fi)SS", "\N{LATIN SMALL LETTER SHARP S}").span(), (0, 1)) self.assertEqual(regex.search(r"(?fi)\N{LATIN SMALL LETTER SHARP S}", "SS").span(), (0, 2)) self.assertEqual(regex.search(r"(?fi)\N{LATIN SMALL LIGATURE ST}", "ST").span(), (0, 2)) self.assertEqual(regex.search(r"(?fi)ST", "\N{LATIN SMALL LIGATURE ST}").span(), (0, 1)) self.assertEqual(regex.search(r"(?fi)ST", "\N{LATIN SMALL LIGATURE LONG S T}").span(), (0, 1)) self.assertEqual(regex.search(r"(?fi)SST", "\N{LATIN SMALL LETTER SHARP S}t").span(), (0, 2)) self.assertEqual(regex.search(r"(?fi)SST", "s\N{LATIN SMALL LIGATURE LONG S T}").span(), (0, 2)) self.assertEqual(regex.search(r"(?fi)SST", "s\N{LATIN SMALL LIGATURE ST}").span(), (0, 2)) self.assertEqual(regex.search(r"(?fi)\N{LATIN SMALL LIGATURE ST}", "SST").span(), (1, 3)) self.assertEqual(regex.search(r"(?fi)SST", "s\N{LATIN SMALL LIGATURE ST}").span(), (0, 2)) self.assertEqual(regex.search(r"(?fi)FFI", "\N{LATIN SMALL LIGATURE FFI}").span(), (0, 1)) self.assertEqual(regex.search(r"(?fi)FFI", "\N{LATIN SMALL LIGATURE FF}i").span(), (0, 2)) self.assertEqual(regex.search(r"(?fi)FFI", 
"f\N{LATIN SMALL LIGATURE FI}").span(), (0, 2)) self.assertEqual(regex.search(r"(?fi)\N{LATIN SMALL LIGATURE FFI}", "FFI").span(), (0, 3)) self.assertEqual(regex.search(r"(?fi)\N{LATIN SMALL LIGATURE FF}i", "FFI").span(), (0, 3)) self.assertEqual(regex.search(r"(?fi)f\N{LATIN SMALL LIGATURE FI}", "FFI").span(), (0, 3)) sigma = "\u03A3\u03C3\u03C2" for ch1 in sigma: for ch2 in sigma: if not regex.match(r"(?fi)" + ch1, ch2): self.fail() self.assertEqual(bool(regex.search(r"(?iV1)ff", "\uFB00\uFB01")), True) self.assertEqual(bool(regex.search(r"(?iV1)ff", "\uFB01\uFB00")), True) self.assertEqual(bool(regex.search(r"(?iV1)fi", "\uFB00\uFB01")), True) self.assertEqual(bool(regex.search(r"(?iV1)fi", "\uFB01\uFB00")), True) self.assertEqual(bool(regex.search(r"(?iV1)fffi", "\uFB00\uFB01")), True) self.assertEqual(bool(regex.search(r"(?iV1)f\uFB03", "\uFB00\uFB01")), True) self.assertEqual(bool(regex.search(r"(?iV1)ff", "\uFB00\uFB01")), True) self.assertEqual(bool(regex.search(r"(?iV1)fi", "\uFB00\uFB01")), True) self.assertEqual(bool(regex.search(r"(?iV1)fffi", "\uFB00\uFB01")), True) self.assertEqual(bool(regex.search(r"(?iV1)f\uFB03", "\uFB00\uFB01")), True) self.assertEqual(bool(regex.search(r"(?iV1)f\uFB01", "\uFB00i")), True) self.assertEqual(bool(regex.search(r"(?iV1)f\uFB01", "\uFB00i")), True) self.assertEqual(regex.findall(r"(?iV0)\m(?:word){e<=3}\M(?ne", "affine", options=["\N{LATIN SMALL LIGATURE FFI}"]).span(), (0, 6)) self.assertEqual(regex.search(r"(?fi)a\Lne", "a\N{LATIN SMALL LIGATURE FFI}ne", options=["ffi"]).span(), (0, 4)) def test_category(self): self.assertEqual(regex.match(r"(\s)", " ")[1], ' ') def test_not_literal(self): self.assertEqual(regex.search(r"\s([^a])", " b")[1], 'b') self.assertEqual(regex.search(r"\s([^a]*)", " bb")[1], 'bb') def test_search_coverage(self): self.assertEqual(regex.search(r"\s(b)", " b")[1], 'b') self.assertEqual(regex.search(r"a\s", "a ")[0], 'a ') def test_re_escape(self): p = "" self.assertEqual(regex.escape(p), p) 
for i in range(0, 256): p += chr(i) self.assertEqual(bool(regex.match(regex.escape(chr(i)), chr(i))), True) self.assertEqual(regex.match(regex.escape(chr(i)), chr(i)).span(), (0, 1)) pat = regex.compile(regex.escape(p)) self.assertEqual(pat.match(p).span(), (0, 256)) def test_re_escape_byte(self): p = b"" self.assertEqual(regex.escape(p), p) for i in range(0, 256): b = bytes([i]) p += b self.assertEqual(bool(regex.match(regex.escape(b), b)), True) self.assertEqual(regex.match(regex.escape(b), b).span(), (0, 1)) pat = regex.compile(regex.escape(p)) self.assertEqual(pat.match(p).span(), (0, 256)) def test_constants(self): if regex.I != regex.IGNORECASE: self.fail() if regex.L != regex.LOCALE: self.fail() if regex.M != regex.MULTILINE: self.fail() if regex.S != regex.DOTALL: self.fail() if regex.X != regex.VERBOSE: self.fail() def test_flags(self): for flag in [regex.I, regex.M, regex.X, regex.S, regex.L]: self.assertEqual(repr(type(regex.compile('^pattern$', flag))), self.PATTERN_CLASS) def test_sre_character_literals(self): for i in [0, 8, 16, 32, 64, 127, 128, 255]: self.assertEqual(bool(regex.match(r"\%03o" % i, chr(i))), True) self.assertEqual(bool(regex.match(r"\%03o0" % i, chr(i) + "0")), True) self.assertEqual(bool(regex.match(r"\%03o8" % i, chr(i) + "8")), True) self.assertEqual(bool(regex.match(r"\x%02x" % i, chr(i))), True) self.assertEqual(bool(regex.match(r"\x%02x0" % i, chr(i) + "0")), True) self.assertEqual(bool(regex.match(r"\x%02xz" % i, chr(i) + "z")), True) self.assertRaisesRegex(regex.error, self.INVALID_GROUP_REF, lambda: regex.match(r"\911", "")) def test_sre_character_class_literals(self): for i in [0, 8, 16, 32, 64, 127, 128, 255]: self.assertEqual(bool(regex.match(r"[\%03o]" % i, chr(i))), True) self.assertEqual(bool(regex.match(r"[\%03o0]" % i, chr(i))), True) self.assertEqual(bool(regex.match(r"[\%03o8]" % i, chr(i))), True) self.assertEqual(bool(regex.match(r"[\x%02x]" % i, chr(i))), True) self.assertEqual(bool(regex.match(r"[\x%02x0]" % i, 
              chr(i))), True)
            self.assertEqual(bool(regex.match(r"[\x%02xz]" % i, chr(i))),
              True)

        self.assertRaisesRegex(regex.error, self.BAD_OCTAL_ESCAPE, lambda:
          regex.match(r"[\911]", ""))

    def test_bug_113254(self):
        self.assertEqual(regex.match(r'(a)|(b)', 'b').start(1), -1)
        self.assertEqual(regex.match(r'(a)|(b)', 'b').end(1), -1)
        self.assertEqual(regex.match(r'(a)|(b)', 'b').span(1), (-1, -1))

    def test_bug_527371(self):
        # Bug described in patches 527371/672491.
        self.assertEqual(regex.match(r'(a)?a','a').lastindex, None)
        self.assertEqual(regex.match(r'(a)(b)?b','ab').lastindex, 1)
        self.assertEqual(regex.match(r'(?Pa)(?Pb)?b','ab').lastgroup, 'a')
        self.assertEqual(regex.match("(?Pa(b))", "ab").lastgroup, 'a')
        self.assertEqual(regex.match("((a))", "a").lastindex, 1)

    def test_bug_545855(self):
        # Bug 545855 -- This pattern failed to cause a compile error as it
        # should, instead provoking a TypeError.
        self.assertRaisesRegex(regex.error, self.BAD_SET, lambda:
          regex.compile('foo[a-'))

    def test_bug_418626(self):
        # Bugs 418626 et al. -- Testing Greg Chapman's addition of op code
        # SRE_OP_MIN_REPEAT_ONE for eliminating recursion on simple uses of
        # pattern '*?' on a long string.
        self.assertEqual(regex.match('.*?c', 10000 * 'ab' + 'cd').end(0),
          20001)
        self.assertEqual(regex.match('.*?cd', 5000 * 'ab' + 'c' + 5000 * 'ab' +
          'cde').end(0), 20003)
        self.assertEqual(regex.match('.*?cd', 20000 * 'abc' + 'de').end(0),
          60001)
        # Non-simple '*?' still used to hit the recursion limit, before the
        # non-recursive scheme was implemented.
        self.assertEqual(regex.search('(a|b)*?c', 10000 * 'ab' + 'cd').end(0),
          20001)

    def test_bug_612074(self):
        pat = "[" + regex.escape("\u2039") + "]"
        self.assertEqual(regex.compile(pat) and 1, 1)

    def test_stack_overflow(self):
        # Nasty cases that used to overflow the straightforward recursive
        # implementation of repeated groups.
        self.assertEqual(regex.match('(x)*', 50000 * 'x')[1], 'x')
        self.assertEqual(regex.match('(x)*y', 50000 * 'x' + 'y')[1], 'x')
        self.assertEqual(regex.match('(x)*?y', 50000 * 'x' + 'y')[1], 'x')

    def test_scanner(self):
        def s_ident(scanner, token): return token
        def s_operator(scanner, token): return "op%s" % token
        def s_float(scanner, token): return float(token)
        def s_int(scanner, token): return int(token)

        scanner = regex.Scanner([
            (r"[a-zA-Z_]\w*", s_ident),
            (r"\d+\.\d*", s_float),
            (r"\d+", s_int),
            (r"=|\+|-|\*|/", s_operator),
            (r"\s+", None),
        ])

        self.assertEqual(repr(type(scanner.scanner.scanner("").pattern)),
          self.PATTERN_CLASS)

        self.assertEqual(scanner.scan("sum = 3*foo + 312.50 + bar"), (['sum',
          'op=', 3, 'op*', 'foo', 'op+', 312.5, 'op+', 'bar'], ''))

    def test_bug_448951(self):
        # Bug 448951 (similar to 429357, but with single char match).
        # (Also test greedy matches.)
        for op in '', '?', '*':
            self.assertEqual(regex.match(r'((.%s):)?z' % op, 'z')[:], ('z',
              None, None))
            self.assertEqual(regex.match(r'((.%s):)?z' % op, 'a:z')[:], ('a:z',
              'a:', 'a'))

    def test_bug_725106(self):
        # Capturing groups in alternatives in repeats.
        self.assertEqual(regex.match('^((a)|b)*', 'abc')[:], ('ab', 'b', 'a'))
        self.assertEqual(regex.match('^(([ab])|c)*', 'abc')[:], ('abc', 'c',
          'b'))
        self.assertEqual(regex.match('^((d)|[ab])*', 'abc')[:], ('ab', 'b',
          None))
        self.assertEqual(regex.match('^((a)c|[ab])*', 'abc')[:], ('ab', 'b',
          None))
        self.assertEqual(regex.match('^((a)|b)*?c', 'abc')[:], ('abc', 'b',
          'a'))
        self.assertEqual(regex.match('^(([ab])|c)*?d', 'abcd')[:], ('abcd',
          'c', 'b'))
        self.assertEqual(regex.match('^((d)|[ab])*?c', 'abc')[:], ('abc', 'b',
          None))
        self.assertEqual(regex.match('^((a)c|[ab])*?c', 'abc')[:], ('abc', 'b',
          None))

    def test_bug_725149(self):
        # Mark_stack_base restoring before restoring marks.
        self.assertEqual(regex.match('(a)(?:(?=(b)*)c)*', 'abb')[:], ('a', 'a',
          None))
        self.assertEqual(regex.match('(a)((?!(b)*))*', 'abb')[:], ('a', 'a',
          None, None))

    def test_bug_764548(self):
        # Bug 764548, regex.compile() barfs on str/unicode subclasses.
        class my_unicode(str):
            pass

        pat = regex.compile(my_unicode("abc"))
        self.assertEqual(pat.match("xyz"), None)

    def test_finditer(self):
        it = regex.finditer(r":+", "a:b::c:::d")
        self.assertEqual([item[0] for item in it], [':', '::', ':::'])

    def test_bug_926075(self):
        if regex.compile('bug_926075') is regex.compile(b'bug_926075'):
            self.fail()

    def test_bug_931848(self):
        pattern = "[\u002E\u3002\uFF0E\uFF61]"
        self.assertEqual(regex.compile(pattern).split("a.b.c"), ['a', 'b',
          'c'])

    def test_bug_581080(self):
        it = regex.finditer(r"\s", "a b")
        self.assertEqual(next(it).span(), (1, 2))
        self.assertRaises(StopIteration, lambda: next(it))

        scanner = regex.compile(r"\s").scanner("a b")
        self.assertEqual(scanner.search().span(), (1, 2))
        self.assertEqual(scanner.search(), None)

    def test_bug_817234(self):
        it = regex.finditer(r".*", "asdf")
        self.assertEqual(next(it).span(), (0, 4))
        self.assertEqual(next(it).span(), (4, 4))
        self.assertRaises(StopIteration, lambda: next(it))

    def test_empty_array(self):
        # SF bug 1647541.
        import array
        for typecode in 'bBuhHiIlLfd':
            a = array.array(typecode)
            self.assertEqual(regex.compile(b"bla").match(a), None)
            self.assertEqual(regex.compile(b"").match(a)[1 : ], ())

    def test_inline_flags(self):
        # Bug #1700.
        upper_char = chr(0x1ea0) # Latin Capital Letter A with Dot Below
        lower_char = chr(0x1ea1) # Latin Small Letter A with Dot Below

        p = regex.compile(upper_char, regex.I | regex.U)
        self.assertEqual(bool(p.match(lower_char)), True)

        p = regex.compile(lower_char, regex.I | regex.U)
        self.assertEqual(bool(p.match(upper_char)), True)

        p = regex.compile('(?i)' + upper_char, regex.U)
        self.assertEqual(bool(p.match(lower_char)), True)

        p = regex.compile('(?i)' + lower_char, regex.U)
        self.assertEqual(bool(p.match(upper_char)), True)

        p = regex.compile('(?iu)' + upper_char)
        self.assertEqual(bool(p.match(lower_char)), True)

        p = regex.compile('(?iu)' + lower_char)
        self.assertEqual(bool(p.match(upper_char)), True)

        self.assertEqual(bool(regex.match(r"(?i)a", "A")), True)
        self.assertEqual(bool(regex.match(r"a(?i)", "A")), True)
        self.assertEqual(bool(regex.match(r"(?iV1)a", "A")), True)
        self.assertEqual(regex.match(r"a(?iV1)", "A"), None)

    def test_dollar_matches_twice(self):
        # $ matches the end of string, and just before the terminating \n.
        pattern = regex.compile('$')
        self.assertEqual(pattern.sub('#', 'a\nb\n'), 'a\nb#\n#')
        self.assertEqual(pattern.sub('#', 'a\nb\nc'), 'a\nb\nc#')
        self.assertEqual(pattern.sub('#', '\n'), '#\n#')

        pattern = regex.compile('$', regex.MULTILINE)
        self.assertEqual(pattern.sub('#', 'a\nb\n' ), 'a#\nb#\n#')
        self.assertEqual(pattern.sub('#', 'a\nb\nc'), 'a#\nb#\nc#')
        self.assertEqual(pattern.sub('#', '\n'), '#\n#')

    def test_bytes_str_mixing(self):
        # Mixing str and bytes is disallowed.
        pat = regex.compile('.')
        bpat = regex.compile(b'.')
        self.assertRaisesRegex(TypeError, self.STR_PAT_ON_BYTES, lambda:
          pat.match(b'b'))
        self.assertRaisesRegex(TypeError, self.BYTES_PAT_ON_STR, lambda:
          bpat.match('b'))
        self.assertRaisesRegex(TypeError, self.STR_PAT_BYTES_TEMPL, lambda:
          pat.sub(b'b', 'c'))
        self.assertRaisesRegex(TypeError, self.STR_PAT_ON_BYTES, lambda:
          pat.sub('b', b'c'))
        self.assertRaisesRegex(TypeError, self.STR_PAT_ON_BYTES, lambda:
          pat.sub(b'b', b'c'))
        self.assertRaisesRegex(TypeError, self.BYTES_PAT_ON_STR, lambda:
          bpat.sub(b'b', 'c'))
        self.assertRaisesRegex(TypeError, self.BYTES_PAT_STR_TEMPL, lambda:
          bpat.sub('b', b'c'))
        self.assertRaisesRegex(TypeError, self.BYTES_PAT_ON_STR, lambda:
          bpat.sub('b', 'c'))

        self.assertRaisesRegex(ValueError, self.BYTES_PAT_UNI_FLAG, lambda:
          regex.compile(b'\w', regex.UNICODE))
        self.assertRaisesRegex(ValueError, self.BYTES_PAT_UNI_FLAG, lambda:
          regex.compile(b'(?u)\w'))
        self.assertRaisesRegex(ValueError, self.MIXED_FLAGS, lambda:
          regex.compile('\w', regex.UNICODE | regex.ASCII))
        self.assertRaisesRegex(ValueError, self.MIXED_FLAGS, lambda:
          regex.compile('(?u)\w', regex.ASCII))
        self.assertRaisesRegex(ValueError, self.MIXED_FLAGS, lambda:
          regex.compile('(?a)\w', regex.UNICODE))
        self.assertRaisesRegex(ValueError, self.MIXED_FLAGS, lambda:
          regex.compile('(?au)\w'))

    def test_ascii_and_unicode_flag(self):
        # String patterns.
        for flags in (0, regex.UNICODE):
            pat = regex.compile('\xc0', flags | regex.IGNORECASE)
            self.assertEqual(bool(pat.match('\xe0')), True)
            pat = regex.compile('\w', flags)
            self.assertEqual(bool(pat.match('\xe0')), True)

        pat = regex.compile('\xc0', regex.ASCII | regex.IGNORECASE)
        self.assertEqual(pat.match('\xe0'), None)
        pat = regex.compile('(?a)\xc0', regex.IGNORECASE)
        self.assertEqual(pat.match('\xe0'), None)
        pat = regex.compile('\w', regex.ASCII)
        self.assertEqual(pat.match('\xe0'), None)
        pat = regex.compile('(?a)\w')
        self.assertEqual(pat.match('\xe0'), None)

        # Bytes patterns.
for flags in (0, regex.ASCII): pat = regex.compile(b'\xc0', flags | regex.IGNORECASE) self.assertEqual(pat.match(b'\xe0'), None) pat = regex.compile(b'\w') self.assertEqual(pat.match(b'\xe0'), None) self.assertRaisesRegex(ValueError, self.MIXED_FLAGS, lambda: regex.compile('(?au)\w')) def test_subscripting_match(self): m = regex.match(r'(?\w)', 'xy') if not m: self.fail("Failed: expected match but returned None") elif not m or m[0] != m.group(0) or m[1] != m.group(1): self.fail("Failed") if not m: self.fail("Failed: expected match but returned None") elif m[:] != ('x', 'x'): self.fail("Failed: expected \"('x', 'x')\" but got {} instead".format(ascii(m[:]))) def test_new_named_groups(self): m0 = regex.match(r'(?P\w)', 'x') m1 = regex.match(r'(?\w)', 'x') if not (m0 and m1 and m0[:] == m1[:]): self.fail("Failed") def test_properties(self): self.assertEqual(regex.match(b'(?ai)\xC0', b'\xE0'), None) self.assertEqual(regex.match(br'(?ai)\xC0', b'\xE0'), None) self.assertEqual(regex.match(br'(?a)\w', b'\xE0'), None) self.assertEqual(bool(regex.match(r'\w', '\xE0')), True) # Dropped the following test. It's not possible to determine what the # correct result should be in the general case. 
# self.assertEqual(bool(regex.match(br'(?L)\w', b'\xE0')), # b'\xE0'.isalnum()) self.assertEqual(bool(regex.match(br'(?L)\d', b'0')), True) self.assertEqual(bool(regex.match(br'(?L)\s', b' ')), True) self.assertEqual(bool(regex.match(br'(?L)\w', b'a')), True) self.assertEqual(regex.match(br'(?L)\d', b'?'), None) self.assertEqual(regex.match(br'(?L)\s', b'?'), None) self.assertEqual(regex.match(br'(?L)\w', b'?'), None) self.assertEqual(regex.match(br'(?L)\D', b'0'), None) self.assertEqual(regex.match(br'(?L)\S', b' '), None) self.assertEqual(regex.match(br'(?L)\W', b'a'), None) self.assertEqual(bool(regex.match(br'(?L)\D', b'?')), True) self.assertEqual(bool(regex.match(br'(?L)\S', b'?')), True) self.assertEqual(bool(regex.match(br'(?L)\W', b'?')), True) self.assertEqual(bool(regex.match(r'\p{Cyrillic}', '\N{CYRILLIC CAPITAL LETTER A}')), True) self.assertEqual(bool(regex.match(r'(?i)\p{Cyrillic}', '\N{CYRILLIC CAPITAL LETTER A}')), True) self.assertEqual(bool(regex.match(r'\p{IsCyrillic}', '\N{CYRILLIC CAPITAL LETTER A}')), True) self.assertEqual(bool(regex.match(r'\p{Script=Cyrillic}', '\N{CYRILLIC CAPITAL LETTER A}')), True) self.assertEqual(bool(regex.match(r'\p{InCyrillic}', '\N{CYRILLIC CAPITAL LETTER A}')), True) self.assertEqual(bool(regex.match(r'\p{Block=Cyrillic}', '\N{CYRILLIC CAPITAL LETTER A}')), True) self.assertEqual(bool(regex.match(r'[[:Cyrillic:]]', '\N{CYRILLIC CAPITAL LETTER A}')), True) self.assertEqual(bool(regex.match(r'[[:IsCyrillic:]]', '\N{CYRILLIC CAPITAL LETTER A}')), True) self.assertEqual(bool(regex.match(r'[[:Script=Cyrillic:]]', '\N{CYRILLIC CAPITAL LETTER A}')), True) self.assertEqual(bool(regex.match(r'[[:InCyrillic:]]', '\N{CYRILLIC CAPITAL LETTER A}')), True) self.assertEqual(bool(regex.match(r'[[:Block=Cyrillic:]]', '\N{CYRILLIC CAPITAL LETTER A}')), True) self.assertEqual(bool(regex.match(r'\P{Cyrillic}', '\N{LATIN CAPITAL LETTER A}')), True) self.assertEqual(bool(regex.match(r'\P{IsCyrillic}', '\N{LATIN CAPITAL LETTER A}')), 
True) self.assertEqual(bool(regex.match(r'\P{Script=Cyrillic}', '\N{LATIN CAPITAL LETTER A}')), True) self.assertEqual(bool(regex.match(r'\P{InCyrillic}', '\N{LATIN CAPITAL LETTER A}')), True) self.assertEqual(bool(regex.match(r'\P{Block=Cyrillic}', '\N{LATIN CAPITAL LETTER A}')), True) self.assertEqual(bool(regex.match(r'\p{^Cyrillic}', '\N{LATIN CAPITAL LETTER A}')), True) self.assertEqual(bool(regex.match(r'\p{^IsCyrillic}', '\N{LATIN CAPITAL LETTER A}')), True) self.assertEqual(bool(regex.match(r'\p{^Script=Cyrillic}', '\N{LATIN CAPITAL LETTER A}')), True) self.assertEqual(bool(regex.match(r'\p{^InCyrillic}', '\N{LATIN CAPITAL LETTER A}')), True) self.assertEqual(bool(regex.match(r'\p{^Block=Cyrillic}', '\N{LATIN CAPITAL LETTER A}')), True) self.assertEqual(bool(regex.match(r'[[:^Cyrillic:]]', '\N{LATIN CAPITAL LETTER A}')), True) self.assertEqual(bool(regex.match(r'[[:^IsCyrillic:]]', '\N{LATIN CAPITAL LETTER A}')), True) self.assertEqual(bool(regex.match(r'[[:^Script=Cyrillic:]]', '\N{LATIN CAPITAL LETTER A}')), True) self.assertEqual(bool(regex.match(r'[[:^InCyrillic:]]', '\N{LATIN CAPITAL LETTER A}')), True) self.assertEqual(bool(regex.match(r'[[:^Block=Cyrillic:]]', '\N{LATIN CAPITAL LETTER A}')), True) self.assertEqual(bool(regex.match(r'\d', '0')), True) self.assertEqual(bool(regex.match(r'\s', ' ')), True) self.assertEqual(bool(regex.match(r'\w', 'A')), True) self.assertEqual(regex.match(r"\d", "?"), None) self.assertEqual(regex.match(r"\s", "?"), None) self.assertEqual(regex.match(r"\w", "?"), None) self.assertEqual(regex.match(r"\D", "0"), None) self.assertEqual(regex.match(r"\S", " "), None) self.assertEqual(regex.match(r"\W", "A"), None) self.assertEqual(bool(regex.match(r'\D', '?')), True) self.assertEqual(bool(regex.match(r'\S', '?')), True) self.assertEqual(bool(regex.match(r'\W', '?')), True) self.assertEqual(bool(regex.match(r'\p{L}', 'A')), True) self.assertEqual(bool(regex.match(r'\p{L}', 'a')), True) 
self.assertEqual(bool(regex.match(r'\p{Lu}', 'A')), True) self.assertEqual(bool(regex.match(r'\p{Ll}', 'a')), True) self.assertEqual(bool(regex.match(r'(?i)a', 'a')), True) self.assertEqual(bool(regex.match(r'(?i)a', 'A')), True) self.assertEqual(bool(regex.match(r'\w', '0')), True) self.assertEqual(bool(regex.match(r'\w', 'a')), True) self.assertEqual(bool(regex.match(r'\w', '_')), True) self.assertEqual(regex.match(r"\X", "\xE0").span(), (0, 1)) self.assertEqual(regex.match(r"\X", "a\u0300").span(), (0, 2)) self.assertEqual(regex.findall(r"\X", "a\xE0a\u0300e\xE9e\u0301"), ['a', '\xe0', 'a\u0300', 'e', '\xe9', 'e\u0301']) self.assertEqual(regex.findall(r"\X{3}", "a\xE0a\u0300e\xE9e\u0301"), ['a\xe0a\u0300', 'e\xe9e\u0301']) self.assertEqual(regex.findall(r"\X", "\r\r\n\u0301A\u0301"), ['\r', '\r\n', '\u0301', 'A\u0301']) self.assertEqual(bool(regex.match(r'\p{Ll}', 'a')), True) chars_u = "-09AZaz_\u0393\u03b3" chars_b = b"-09AZaz_" word_set = set("Ll Lm Lo Lt Lu Mc Me Mn Nd Nl No Pc".split()) tests = [ (r"\w", chars_u, "09AZaz_\u0393\u03b3"), (r"[[:word:]]", chars_u, "09AZaz_\u0393\u03b3"), (r"\W", chars_u, "-"), (r"[[:^word:]]", chars_u, "-"), (r"\d", chars_u, "09"), (r"[[:digit:]]", chars_u, "09"), (r"\D", chars_u, "-AZaz_\u0393\u03b3"), (r"[[:^digit:]]", chars_u, "-AZaz_\u0393\u03b3"), (r"[[:alpha:]]", chars_u, "AZaz\u0393\u03b3"), (r"[[:^alpha:]]", chars_u, "-09_"), (r"[[:alnum:]]", chars_u, "09AZaz\u0393\u03b3"), (r"[[:^alnum:]]", chars_u, "-_"), (r"[[:xdigit:]]", chars_u, "09Aa"), (r"[[:^xdigit:]]", chars_u, "-Zz_\u0393\u03b3"), (r"\p{InBasicLatin}", "a\xE1", "a"), (r"\P{InBasicLatin}", "a\xE1", "\xE1"), (r"(?i)\p{InBasicLatin}", "a\xE1", "a"), (r"(?i)\P{InBasicLatin}", "a\xE1", "\xE1"), (br"(?L)\w", chars_b, b"09AZaz_"), (br"(?L)[[:word:]]", chars_b, b"09AZaz_"), (br"(?L)\W", chars_b, b"-"), (br"(?L)[[:^word:]]", chars_b, b"-"), (br"(?L)\d", chars_b, b"09"), (br"(?L)[[:digit:]]", chars_b, b"09"), (br"(?L)\D", chars_b, b"-AZaz_"), (br"(?L)[[:^digit:]]", 
chars_b, b"-AZaz_"), (br"(?L)[[:alpha:]]", chars_b, b"AZaz"), (br"(?L)[[:^alpha:]]", chars_b, b"-09_"), (br"(?L)[[:alnum:]]", chars_b, b"09AZaz"), (br"(?L)[[:^alnum:]]", chars_b, b"-_"), (br"(?L)[[:xdigit:]]", chars_b, b"09Aa"), (br"(?L)[[:^xdigit:]]", chars_b, b"-Zz_"), (br"(?a)\w", chars_b, b"09AZaz_"), (br"(?a)[[:word:]]", chars_b, b"09AZaz_"), (br"(?a)\W", chars_b, b"-"), (br"(?a)[[:^word:]]", chars_b, b"-"), (br"(?a)\d", chars_b, b"09"), (br"(?a)[[:digit:]]", chars_b, b"09"), (br"(?a)\D", chars_b, b"-AZaz_"), (br"(?a)[[:^digit:]]", chars_b, b"-AZaz_"), (br"(?a)[[:alpha:]]", chars_b, b"AZaz"), (br"(?a)[[:^alpha:]]", chars_b, b"-09_"), (br"(?a)[[:alnum:]]", chars_b, b"09AZaz"), (br"(?a)[[:^alnum:]]", chars_b, b"-_"), (br"(?a)[[:xdigit:]]", chars_b, b"09Aa"), (br"(?a)[[:^xdigit:]]", chars_b, b"-Zz_"), ] for pattern, chars, expected in tests: try: if chars[ : 0].join(regex.findall(pattern, chars)) != expected: self.fail("Failed: {}".format(pattern)) except Exception as e: self.fail("Failed: {} raised {}".format(pattern, ascii(e))) self.assertEqual(bool(regex.match(r"\p{NumericValue=0}", "0")), True) self.assertEqual(bool(regex.match(r"\p{NumericValue=1/2}", "\N{VULGAR FRACTION ONE HALF}")), True) self.assertEqual(bool(regex.match(r"\p{NumericValue=0.5}", "\N{VULGAR FRACTION ONE HALF}")), True) def test_word_class(self): self.assertEqual(regex.findall(r"\w+", " \u0939\u093f\u0928\u094d\u0926\u0940,"), ['\u0939\u093f\u0928\u094d\u0926\u0940']) self.assertEqual(regex.findall(r"\W+", " \u0939\u093f\u0928\u094d\u0926\u0940,"), [' ', ',']) self.assertEqual(regex.split(r"(?V1)\b", " \u0939\u093f\u0928\u094d\u0926\u0940,"), [' ', '\u0939\u093f\u0928\u094d\u0926\u0940', ',']) self.assertEqual(regex.split(r"(?V1)\B", " \u0939\u093f\u0928\u094d\u0926\u0940,"), ['', ' \u0939', '\u093f', '\u0928', '\u094d', '\u0926', '\u0940,', '']) def test_search_anchor(self): self.assertEqual(regex.findall(r"\G\w{2}", "abcd ef"), ['ab', 'cd']) def test_search_reverse(self): 
        self.assertEqual(regex.findall(r"(?r).", "abc"), ['c', 'b', 'a'])
        self.assertEqual(regex.findall(r"(?r).", "abc", overlapped=True),
          ['c', 'b', 'a'])
        self.assertEqual(regex.findall(r"(?r)..", "abcde"), ['de', 'bc'])
        self.assertEqual(regex.findall(r"(?r)..", "abcde", overlapped=True),
          ['de', 'cd', 'bc', 'ab'])
        self.assertEqual(regex.findall(r"(?r)(.)(-)(.)", "a-b-c",
          overlapped=True), [("b", "-", "c"), ("a", "-", "b")])

        self.assertEqual([m[0] for m in regex.finditer(r"(?r).", "abc")],
          ['c', 'b', 'a'])
        self.assertEqual([m[0] for m in regex.finditer(r"(?r)..", "abcde",
          overlapped=True)], ['de', 'cd', 'bc', 'ab'])
        self.assertEqual([m[0] for m in regex.finditer(r"(?r).", "abc")],
          ['c', 'b', 'a'])
        self.assertEqual([m[0] for m in regex.finditer(r"(?r)..", "abcde",
          overlapped=True)], ['de', 'cd', 'bc', 'ab'])

        self.assertEqual(regex.findall(r"^|\w+", "foo bar"),
          ['', 'foo', 'bar'])
        self.assertEqual(regex.findall(r"(?V1)^|\w+", "foo bar"),
          ['', 'foo', 'bar'])
        self.assertEqual(regex.findall(r"(?r)^|\w+", "foo bar"),
          ['bar', 'foo', ''])
        self.assertEqual(regex.findall(r"(?rV1)^|\w+", "foo bar"),
          ['bar', 'foo', ''])

        self.assertEqual([m[0] for m in regex.finditer(r"^|\w+", "foo bar")],
          ['', 'foo', 'bar'])
        self.assertEqual([m[0] for m in regex.finditer(r"(?V1)^|\w+",
          "foo bar")], ['', 'foo', 'bar'])
        self.assertEqual([m[0] for m in regex.finditer(r"(?r)^|\w+",
          "foo bar")], ['bar', 'foo', ''])
        self.assertEqual([m[0] for m in regex.finditer(r"(?rV1)^|\w+",
          "foo bar")], ['bar', 'foo', ''])

        self.assertEqual(regex.findall(r"\G\w{2}", "abcd ef"), ['ab', 'cd'])
        self.assertEqual(regex.findall(r".{2}(?<=\G.*)", "abcd"),
          ['ab', 'cd'])
        self.assertEqual(regex.findall(r"(?r)\G\w{2}", "abcd ef"), [])
        self.assertEqual(regex.findall(r"(?r)\w{2}\G", "abcd ef"), ['ef'])

        self.assertEqual(regex.findall(r"q*", "qqwe"), ['qq', '', '', ''])
        self.assertEqual(regex.findall(r"(?V1)q*", "qqwe"),
          ['qq', '', '', ''])
        self.assertEqual(regex.findall(r"(?r)q*", "qqwe"),
          ['', '', 'qq', ''])
self.assertEqual(regex.findall(r"(?rV1)q*", "qqwe"), ['', '', 'qq', '']) self.assertEqual(regex.findall(".", "abcd", pos=1, endpos=3), ['b', 'c']) self.assertEqual(regex.findall(".", "abcd", pos=1, endpos=-1), ['b', 'c']) self.assertEqual([m[0] for m in regex.finditer(".", "abcd", pos=1, endpos=3)], ['b', 'c']) self.assertEqual([m[0] for m in regex.finditer(".", "abcd", pos=1, endpos=-1)], ['b', 'c']) self.assertEqual([m[0] for m in regex.finditer("(?r).", "abcd", pos=1, endpos=3)], ['c', 'b']) self.assertEqual([m[0] for m in regex.finditer("(?r).", "abcd", pos=1, endpos=-1)], ['c', 'b']) self.assertEqual(regex.findall("(?r).", "abcd", pos=1, endpos=3), ['c', 'b']) self.assertEqual(regex.findall("(?r).", "abcd", pos=1, endpos=-1), ['c', 'b']) self.assertEqual(regex.findall(r"[ab]", "aB", regex.I), ['a', 'B']) self.assertEqual(regex.findall(r"(?r)[ab]", "aB", regex.I), ['B', 'a']) self.assertEqual(regex.findall(r"(?r).{2}", "abc"), ['bc']) self.assertEqual(regex.findall(r"(?r).{2}", "abc", overlapped=True), ['bc', 'ab']) self.assertEqual(regex.findall(r"(\w+) (\w+)", "first second third fourth fifth"), [('first', 'second'), ('third', 'fourth')]) self.assertEqual(regex.findall(r"(?r)(\w+) (\w+)", "first second third fourth fifth"), [('fourth', 'fifth'), ('second', 'third')]) self.assertEqual([m[0] for m in regex.finditer(r"(?r).{2}", "abc")], ['bc']) self.assertEqual([m[0] for m in regex.finditer(r"(?r).{2}", "abc", overlapped=True)], ['bc', 'ab']) self.assertEqual([m[0] for m in regex.finditer(r"(\w+) (\w+)", "first second third fourth fifth")], ['first second', 'third fourth']) self.assertEqual([m[0] for m in regex.finditer(r"(?r)(\w+) (\w+)", "first second third fourth fifth")], ['fourth fifth', 'second third']) self.assertEqual(regex.search("abcdef", "abcdef").span(), (0, 6)) self.assertEqual(regex.search("(?r)abcdef", "abcdef").span(), (0, 6)) self.assertEqual(regex.search("(?i)abcdef", "ABCDEF").span(), (0, 6)) self.assertEqual(regex.search("(?ir)abcdef", 
"ABCDEF").span(), (0, 6)) self.assertEqual(regex.sub(r"(.)", r"\1", "abc"), 'abc') self.assertEqual(regex.sub(r"(?r)(.)", r"\1", "abc"), 'abc') def test_atomic(self): # Issue 433030. self.assertEqual(regex.search(r"(?>a*)a", "aa"), None) def test_possessive(self): # Single-character non-possessive. self.assertEqual(regex.search(r"a?a", "a").span(), (0, 1)) self.assertEqual(regex.search(r"a*a", "aaa").span(), (0, 3)) self.assertEqual(regex.search(r"a+a", "aaa").span(), (0, 3)) self.assertEqual(regex.search(r"a{1,3}a", "aaa").span(), (0, 3)) # Multiple-character non-possessive. self.assertEqual(regex.search(r"(?:ab)?ab", "ab").span(), (0, 2)) self.assertEqual(regex.search(r"(?:ab)*ab", "ababab").span(), (0, 6)) self.assertEqual(regex.search(r"(?:ab)+ab", "ababab").span(), (0, 6)) self.assertEqual(regex.search(r"(?:ab){1,3}ab", "ababab").span(), (0, 6)) # Single-character possessive. self.assertEqual(regex.search(r"a?+a", "a"), None) self.assertEqual(regex.search(r"a*+a", "aaa"), None) self.assertEqual(regex.search(r"a++a", "aaa"), None) self.assertEqual(regex.search(r"a{1,3}+a", "aaa"), None) # Multiple-character possessive. self.assertEqual(regex.search(r"(?:ab)?+ab", "ab"), None) self.assertEqual(regex.search(r"(?:ab)*+ab", "ababab"), None) self.assertEqual(regex.search(r"(?:ab)++ab", "ababab"), None) self.assertEqual(regex.search(r"(?:ab){1,3}+ab", "ababab"), None) def test_zerowidth(self): # Issue 3262. self.assertEqual(regex.split(r"\b", "a b"), ['a b']) self.assertEqual(regex.split(r"(?V1)\b", "a b"), ['', 'a', ' ', 'b', '']) # Issue 1647489. 
        self.assertEqual(regex.findall(r"^|\w+", "foo bar"),
          ['', 'foo', 'bar'])
        self.assertEqual([m[0] for m in regex.finditer(r"^|\w+", "foo bar")],
          ['', 'foo', 'bar'])
        self.assertEqual(regex.findall(r"(?r)^|\w+", "foo bar"),
          ['bar', 'foo', ''])
        self.assertEqual([m[0] for m in regex.finditer(r"(?r)^|\w+",
          "foo bar")], ['bar', 'foo', ''])
        self.assertEqual(regex.findall(r"(?V1)^|\w+", "foo bar"),
          ['', 'foo', 'bar'])
        self.assertEqual([m[0] for m in regex.finditer(r"(?V1)^|\w+",
          "foo bar")], ['', 'foo', 'bar'])
        self.assertEqual(regex.findall(r"(?rV1)^|\w+", "foo bar"),
          ['bar', 'foo', ''])
        self.assertEqual([m[0] for m in regex.finditer(r"(?rV1)^|\w+",
          "foo bar")], ['bar', 'foo', ''])

        self.assertEqual(regex.split("", "xaxbxc"), ['xaxbxc'])
        self.assertEqual([m for m in regex.splititer("", "xaxbxc")],
          ['xaxbxc'])

        self.assertEqual(regex.split("(?r)", "xaxbxc"), ['xaxbxc'])
        self.assertEqual([m for m in regex.splititer("(?r)", "xaxbxc")],
          ['xaxbxc'])

        self.assertEqual(regex.split("(?V1)", "xaxbxc"),
          ['', 'x', 'a', 'x', 'b', 'x', 'c', ''])
        self.assertEqual([m for m in regex.splititer("(?V1)", "xaxbxc")],
          ['', 'x', 'a', 'x', 'b', 'x', 'c', ''])

        self.assertEqual(regex.split("(?rV1)", "xaxbxc"),
          ['', 'c', 'x', 'b', 'x', 'a', 'x', ''])
        self.assertEqual([m for m in regex.splititer("(?rV1)", "xaxbxc")],
          ['', 'c', 'x', 'b', 'x', 'a', 'x', ''])

    def test_scoped_and_inline_flags(self):
        # Issues 433028, 433024, 433027.
self.assertEqual(regex.search(r"(?i)Ab", "ab").span(), (0, 2)) self.assertEqual(regex.search(r"(?i:A)b", "ab").span(), (0, 2)) self.assertEqual(regex.search(r"A(?i)b", "ab").span(), (0, 2)) self.assertEqual(regex.search(r"A(?iV1)b", "ab"), None) self.assertRaisesRegex(regex.error, self.CANT_TURN_OFF, lambda: regex.search(r"(?V0-i)Ab", "ab", flags=regex.I)) self.assertEqual(regex.search(r"(?V0)Ab", "ab"), None) self.assertEqual(regex.search(r"(?V1)Ab", "ab"), None) self.assertEqual(regex.search(r"(?V1-i)Ab", "ab", flags=regex.I), None) self.assertEqual(regex.search(r"(?-i:A)b", "ab", flags=regex.I), None) self.assertEqual(regex.search(r"A(?V1-i)b", "ab", flags=regex.I).span(), (0, 2)) def test_repeated_repeats(self): # Issue 2537. self.assertEqual(regex.search(r"(?:a+)+", "aaa").span(), (0, 3)) self.assertEqual(regex.search(r"(?:(?:ab)+c)+", "abcabc").span(), (0, 6)) def test_lookbehind(self): self.assertEqual(regex.search(r"123(?<=a\d+)", "a123").span(), (1, 4)) self.assertEqual(regex.search(r"123(?<=a\d+)", "b123"), None) self.assertEqual(regex.search(r"123(?[ \t]+\r*$)|(?P(?<=[^\n])\Z)') self.assertEqual(pat.subn(lambda m: '<' + m.lastgroup + '>', 'foobar '), ('foobar', 1)) self.assertEqual([m.group() for m in pat.finditer('foobar ')], [' ', '']) pat = regex.compile(r'(?mV1)(?P[ \t]+\r*$)|(?P(?<=[^\n])\Z)') self.assertEqual(pat.subn(lambda m: '<' + m.lastgroup + '>', 'foobar '), ('foobar', 2)) self.assertEqual([m.group() for m in pat.finditer('foobar ')], [' ', '']) def test_overlapped(self): self.assertEqual(regex.findall(r"..", "abcde"), ['ab', 'cd']) self.assertEqual(regex.findall(r"..", "abcde", overlapped=True), ['ab', 'bc', 'cd', 'de']) self.assertEqual(regex.findall(r"(?r)..", "abcde"), ['de', 'bc']) self.assertEqual(regex.findall(r"(?r)..", "abcde", overlapped=True), ['de', 'cd', 'bc', 'ab']) self.assertEqual(regex.findall(r"(.)(-)(.)", "a-b-c", overlapped=True), [("a", "-", "b"), ("b", "-", "c")]) self.assertEqual([m[0] for m in regex.finditer(r"..", 
"abcde")], ['ab', 'cd']) self.assertEqual([m[0] for m in regex.finditer(r"..", "abcde", overlapped=True)], ['ab', 'bc', 'cd', 'de']) self.assertEqual([m[0] for m in regex.finditer(r"(?r)..", "abcde")], ['de', 'bc']) self.assertEqual([m[0] for m in regex.finditer(r"(?r)..", "abcde", overlapped=True)], ['de', 'cd', 'bc', 'ab']) self.assertEqual([m.groups() for m in regex.finditer(r"(.)(-)(.)", "a-b-c", overlapped=True)], [("a", "-", "b"), ("b", "-", "c")]) self.assertEqual([m.groups() for m in regex.finditer(r"(?r)(.)(-)(.)", "a-b-c", overlapped=True)], [("b", "-", "c"), ("a", "-", "b")]) def test_splititer(self): self.assertEqual(regex.split(r",", "a,b,,c,"), ['a', 'b', '', 'c', '']) self.assertEqual([m for m in regex.splititer(r",", "a,b,,c,")], ['a', 'b', '', 'c', '']) def test_grapheme(self): self.assertEqual(regex.match(r"\X", "\xE0").span(), (0, 1)) self.assertEqual(regex.match(r"\X", "a\u0300").span(), (0, 2)) self.assertEqual(regex.findall(r"\X", "a\xE0a\u0300e\xE9e\u0301"), ['a', '\xe0', 'a\u0300', 'e', '\xe9', 'e\u0301']) self.assertEqual(regex.findall(r"\X{3}", "a\xE0a\u0300e\xE9e\u0301"), ['a\xe0a\u0300', 'e\xe9e\u0301']) self.assertEqual(regex.findall(r"\X", "\r\r\n\u0301A\u0301"), ['\r', '\r\n', '\u0301', 'A\u0301']) def test_word_boundary(self): text = 'The quick ("brown") fox can\'t jump 32.3 feet, right?' 
        self.assertEqual(regex.split(r'(?V1)\b', text), ['', 'The', ' ',
          'quick', ' ("', 'brown', '") ', 'fox', ' ', 'can', "'", 't',
          ' ', 'jump', ' ', '32', '.', '3', ' ', 'feet', ', ', 'right', '?'])
        self.assertEqual(regex.split(r'(?V1w)\b', text), ['', 'The', ' ',
          'quick', ' ', '(', '"', 'brown', '"', ')', ' ', 'fox', ' ',
          "can't", ' ', 'jump', ' ', '32.3', ' ', 'feet', ',', ' ', 'right',
          '?', ''])

        text = "The fox"
        self.assertEqual(regex.split(r'(?V1)\b', text),
          ['', 'The', ' ', 'fox', ''])
        self.assertEqual(regex.split(r'(?V1w)\b', text),
          ['', 'The', ' ', ' ', 'fox', ''])

        text = "can't aujourd'hui l'objectif"
        self.assertEqual(regex.split(r'(?V1)\b', text), ['', 'can', "'", 't',
          ' ', 'aujourd', "'", 'hui', ' ', 'l', "'", 'objectif', ''])
        self.assertEqual(regex.split(r'(?V1w)\b', text), ['', "can't", ' ',
          "aujourd'hui", ' ', "l'", 'objectif', ''])

    def test_line_boundary(self):
        self.assertEqual(regex.findall(r".+", "Line 1\nLine 2\n"),
          ["Line 1", "Line 2"])
        self.assertEqual(regex.findall(r".+", "Line 1\rLine 2\r"),
          ["Line 1\rLine 2\r"])
        self.assertEqual(regex.findall(r".+", "Line 1\r\nLine 2\r\n"),
          ["Line 1\r", "Line 2\r"])
        self.assertEqual(regex.findall(r"(?w).+", "Line 1\nLine 2\n"),
          ["Line 1", "Line 2"])
        self.assertEqual(regex.findall(r"(?w).+", "Line 1\rLine 2\r"),
          ["Line 1", "Line 2"])
        self.assertEqual(regex.findall(r"(?w).+", "Line 1\r\nLine 2\r\n"),
          ["Line 1", "Line 2"])

        self.assertEqual(regex.search(r"^abc", "abc").start(), 0)
        self.assertEqual(regex.search(r"^abc", "\nabc"), None)
        self.assertEqual(regex.search(r"^abc", "\rabc"), None)
        self.assertEqual(regex.search(r"(?w)^abc", "abc").start(), 0)
        self.assertEqual(regex.search(r"(?w)^abc", "\nabc"), None)
        self.assertEqual(regex.search(r"(?w)^abc", "\rabc"), None)

        self.assertEqual(regex.search(r"abc$", "abc").start(), 0)
        self.assertEqual(regex.search(r"abc$", "abc\n").start(), 0)
        self.assertEqual(regex.search(r"abc$", "abc\r"), None)
        self.assertEqual(regex.search(r"(?w)abc$", "abc").start(), 0)
self.assertEqual(regex.search(r"(?w)abc$", "abc\n").start(), 0) self.assertEqual(regex.search(r"(?w)abc$", "abc\r").start(), 0) self.assertEqual(regex.search(r"(?m)^abc", "abc").start(), 0) self.assertEqual(regex.search(r"(?m)^abc", "\nabc").start(), 1) self.assertEqual(regex.search(r"(?m)^abc", "\rabc"), None) self.assertEqual(regex.search(r"(?mw)^abc", "abc").start(), 0) self.assertEqual(regex.search(r"(?mw)^abc", "\nabc").start(), 1) self.assertEqual(regex.search(r"(?mw)^abc", "\rabc").start(), 1) self.assertEqual(regex.search(r"(?m)abc$", "abc").start(), 0) self.assertEqual(regex.search(r"(?m)abc$", "abc\n").start(), 0) self.assertEqual(regex.search(r"(?m)abc$", "abc\r"), None) self.assertEqual(regex.search(r"(?mw)abc$", "abc").start(), 0) self.assertEqual(regex.search(r"(?mw)abc$", "abc\n").start(), 0) self.assertEqual(regex.search(r"(?mw)abc$", "abc\r").start(), 0) def test_branch_reset(self): self.assertEqual(regex.match(r"(?:(a)|(b))(c)", "ac").groups(), ('a', None, 'c')) self.assertEqual(regex.match(r"(?:(a)|(b))(c)", "bc").groups(), (None, 'b', 'c')) self.assertEqual(regex.match(r"(?:(?a)|(?b))(?c)", "ac").groups(), ('a', None, 'c')) self.assertEqual(regex.match(r"(?:(?a)|(?b))(?c)", "bc").groups(), (None, 'b', 'c')) self.assertEqual(regex.match(r"(?a)(?:(?b)|(?c))(?d)", "abd").groups(), ('a', 'b', None, 'd')) self.assertEqual(regex.match(r"(?a)(?:(?b)|(?c))(?d)", "acd").groups(), ('a', None, 'c', 'd')) self.assertEqual(regex.match(r"(a)(?:(b)|(c))(d)", "abd").groups(), ('a', 'b', None, 'd')) self.assertEqual(regex.match(r"(a)(?:(b)|(c))(d)", "acd").groups(), ('a', None, 'c', 'd')) self.assertEqual(regex.match(r"(a)(?|(b)|(b))(d)", "abd").groups(), ('a', 'b', 'd')) self.assertEqual(regex.match(r"(?|(?a)|(?b))(c)", "ac").groups(), ('a', None, 'c')) self.assertEqual(regex.match(r"(?|(?a)|(?b))(c)", "bc").groups(), (None, 'b', 'c')) self.assertEqual(regex.match(r"(?|(?a)|(?b))(c)", "ac").groups(), ('a', 'c')) self.assertEqual(regex.match(r"(?|(?a)|(?b))(c)", 
"bc").groups(), ('b', 'c')) self.assertEqual(regex.match(r"(?|(?a)(?b)|(?c)(?d))(e)", "abe").groups(), ('a', 'b', 'e')) self.assertEqual(regex.match(r"(?|(?a)(?b)|(?c)(?d))(e)", "cde").groups(), ('d', 'c', 'e')) self.assertEqual(regex.match(r"(?|(?a)(?b)|(?c)(d))(e)", "abe").groups(), ('a', 'b', 'e')) self.assertEqual(regex.match(r"(?|(?a)(?b)|(?c)(d))(e)", "cde").groups(), ('d', 'c', 'e')) self.assertEqual(regex.match(r"(?|(?a)(?b)|(c)(d))(e)", "abe").groups(), ('a', 'b', 'e')) self.assertEqual(regex.match(r"(?|(?a)(?b)|(c)(d))(e)", "cde").groups(), ('c', 'd', 'e')) # Hg issue 87. self.assertEqual(regex.match(r"(?|(?a)(?b)|(c)(?d))(e)", "abe").groups(), ("a", "b", "e")) self.assertEqual(regex.match(r"(?|(?a)(?b)|(c)(?d))(e)", "abe").capturesdict(), {"a": ["a"], "b": ["b"]}) self.assertEqual(regex.match(r"(?|(?a)(?b)|(c)(?d))(e)", "cde").groups(), ("d", None, "e")) self.assertEqual(regex.match(r"(?|(?a)(?b)|(c)(?d))(e)", "cde").capturesdict(), {"a": ["c", "d"], "b": []}) def test_set(self): self.assertEqual(regex.match(r"[a]", "a").span(), (0, 1)) self.assertEqual(regex.match(r"(?i)[a]", "A").span(), (0, 1)) self.assertEqual(regex.match(r"[a-b]", r"a").span(), (0, 1)) self.assertEqual(regex.match(r"(?i)[a-b]", r"A").span(), (0, 1)) self.assertEqual(regex.sub(r"(?V0)([][])", r"-", "a[b]c"), "a-b-c") self.assertEqual(regex.findall(r"[\p{Alpha}]", "a0"), ["a"]) self.assertEqual(regex.findall(r"(?i)[\p{Alpha}]", "A0"), ["A"]) self.assertEqual(regex.findall(r"[a\p{Alpha}]", "ab0"), ["a", "b"]) self.assertEqual(regex.findall(r"[a\P{Alpha}]", "ab0"), ["a", "0"]) self.assertEqual(regex.findall(r"(?i)[a\p{Alpha}]", "ab0"), ["a", "b"]) self.assertEqual(regex.findall(r"(?i)[a\P{Alpha}]", "ab0"), ["a", "0"]) self.assertEqual(regex.findall(r"[a-b\p{Alpha}]", "abC0"), ["a", "b", "C"]) self.assertEqual(regex.findall(r"(?i)[a-b\p{Alpha}]", "AbC0"), ["A", "b", "C"]) self.assertEqual(regex.findall(r"[\p{Alpha}]", "a0"), ["a"]) self.assertEqual(regex.findall(r"[\P{Alpha}]", "a0"), 
["0"]) self.assertEqual(regex.findall(r"[^\p{Alpha}]", "a0"), ["0"]) self.assertEqual(regex.findall(r"[^\P{Alpha}]", "a0"), ["a"]) self.assertEqual("".join(regex.findall(r"[^\d-h]", "a^b12c-h")), 'a^bc') self.assertEqual("".join(regex.findall(r"[^\dh]", "a^b12c-h")), 'a^bc-') self.assertEqual("".join(regex.findall(r"[^h\s\db]", "a^b 12c-h")), 'a^c-') self.assertEqual("".join(regex.findall(r"[^b\w]", "a b")), ' ') self.assertEqual("".join(regex.findall(r"[^b\S]", "a b")), ' ') self.assertEqual("".join(regex.findall(r"[^8\d]", "a 1b2")), 'a b') all_chars = "".join(chr(c) for c in range(0x100)) self.assertEqual(len(regex.findall(r"\p{ASCII}", all_chars)), 128) self.assertEqual(len(regex.findall(r"\p{Letter}", all_chars)), 117) self.assertEqual(len(regex.findall(r"\p{Digit}", all_chars)), 10) # Set operators self.assertEqual(len(regex.findall(r"(?V1)[\p{ASCII}&&\p{Letter}]", all_chars)), 52) self.assertEqual(len(regex.findall(r"(?V1)[\p{ASCII}&&\p{Alnum}&&\p{Letter}]", all_chars)), 52) self.assertEqual(len(regex.findall(r"(?V1)[\p{ASCII}&&\p{Alnum}&&\p{Digit}]", all_chars)), 10) self.assertEqual(len(regex.findall(r"(?V1)[\p{ASCII}&&\p{Cc}]", all_chars)), 33) self.assertEqual(len(regex.findall(r"(?V1)[\p{ASCII}&&\p{Graph}]", all_chars)), 94) self.assertEqual(len(regex.findall(r"(?V1)[\p{ASCII}--\p{Cc}]", all_chars)), 95) self.assertEqual(len(regex.findall(r"[\p{Letter}\p{Digit}]", all_chars)), 127) self.assertEqual(len(regex.findall(r"(?V1)[\p{Letter}||\p{Digit}]", all_chars)), 127) self.assertEqual(len(regex.findall(r"\p{HexDigit}", all_chars)), 22) self.assertEqual(len(regex.findall(r"(?V1)[\p{HexDigit}~~\p{Digit}]", all_chars)), 12) self.assertEqual(len(regex.findall(r"(?V1)[\p{Digit}~~\p{HexDigit}]", all_chars)), 12) self.assertEqual(repr(type(regex.compile(r"(?V0)([][-])"))), self.PATTERN_CLASS) self.assertEqual(regex.findall(r"(?V1)[[a-z]--[aei]]", "abc"), ["b", "c"]) self.assertEqual(regex.findall(r"(?iV1)[[a-z]--[aei]]", "abc"), ["b", "c"]) 
self.assertEqual(regex.findall("(?V1)[\w--a]","abc"), ["b", "c"]) self.assertEqual(regex.findall("(?iV1)[\w--a]","abc"), ["b", "c"]) def test_various(self): tests = [ # Test ?P< and ?P= extensions. ('(?Pa)', '', '', regex.error, self.BAD_GROUP_NAME), # Begins with a digit. ('(?Pa)', '', '', regex.error, self.BAD_GROUP_NAME), # Begins with an illegal char. ('(?Pa)', '', '', regex.error, self.BAD_GROUP_NAME), # Begins with an illegal char. # Same tests, for the ?P= form. ('(?Pa)(?P=foo_123', 'aa', '', regex.error, self.MISSING_RPAREN), ('(?Pa)(?P=1)', 'aa', '1', ascii('a')), ('(?Pa)(?P=0)', 'aa', '', regex.error, self.BAD_GROUP_NAME), ('(?Pa)(?P=-1)', 'aa', '', regex.error, self.BAD_GROUP_NAME), ('(?Pa)(?P=!)', 'aa', '', regex.error, self.BAD_GROUP_NAME), ('(?Pa)(?P=foo_124)', 'aa', '', regex.error, self.UNKNOWN_GROUP), # Backref to undefined group. ('(?Pa)', 'a', '1', ascii('a')), ('(?Pa)(?P=foo_123)', 'aa', '1', ascii('a')), # Mal-formed \g in pattern treated as literal for compatibility. (r'(?a)\ga)\g<1>', 'aa', '1', ascii('a')), (r'(?a)\g', 'aa', '', ascii(None)), (r'(?a)\g', 'aa', '', regex.error, self.UNKNOWN_GROUP), # Backref to undefined group. ('(?a)', 'a', '1', ascii('a')), (r'(?a)\g', 'aa', '1', ascii('a')), # Test octal escapes. ('\\1', 'a', '', regex.error, self.INVALID_GROUP_REF), # Backreference. ('[\\1]', '\1', '0', "'\\x01'"), # Character. ('\\09', chr(0) + '9', '0', ascii(chr(0) + '9')), ('\\141', 'a', '0', ascii('a')), ('(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)(k)(l)\\119', 'abcdefghijklk9', '0,11', ascii(('abcdefghijklk9', 'k'))), # Test \0 is handled everywhere. (r'\0', '\0', '0', ascii('\0')), (r'[\0a]', '\0', '0', ascii('\0')), (r'[a\0]', '\0', '0', ascii('\0')), (r'[^a\0]', '\0', '', ascii(None)), # Test various letter escapes. 
(r'\a[\b]\f\n\r\t\v', '\a\b\f\n\r\t\v', '0', ascii('\a\b\f\n\r\t\v')), (r'[\a][\b][\f][\n][\r][\t][\v]', '\a\b\f\n\r\t\v', '0', ascii('\a\b\f\n\r\t\v')), (r'\c\e\g\h\i\j\k\o\p\q\y\z', 'ceghijkopqyz', '0', ascii('ceghijkopqyz')), (r'\xff', '\377', '0', ascii(chr(255))), # New \x semantics. (r'\x00ffffffffffffff', '\377', '', ascii(None)), (r'\x00f', '\017', '', ascii(None)), (r'\x00fe', '\376', '', ascii(None)), (r'\x00ff', '\377', '', ascii(None)), (r'\t\n\v\r\f\a\g', '\t\n\v\r\f\ag', '0', ascii('\t\n\v\r\f\ag')), ('\t\n\v\r\f\a\g', '\t\n\v\r\f\ag', '0', ascii('\t\n\v\r\f\ag')), (r'\t\n\v\r\f\a', '\t\n\v\r\f\a', '0', ascii(chr(9) + chr(10) + chr(11) + chr(13) + chr(12) + chr(7))), (r'[\t][\n][\v][\r][\f][\b]', '\t\n\v\r\f\b', '0', ascii('\t\n\v\r\f\b')), (r"^\w+=(\\[\000-\277]|[^\n\\])*", "SRC=eval.c g.c blah blah blah \\\\\n\tapes.c", '0', ascii("SRC=eval.c g.c blah blah blah \\\\")), # Test that . only matches \n in DOTALL mode. ('a.b', 'acb', '0', ascii('acb')), ('a.b', 'a\nb', '', ascii(None)), ('a.*b', 'acc\nccb', '', ascii(None)), ('a.{4,5}b', 'acc\nccb', '', ascii(None)), ('a.b', 'a\rb', '0', ascii('a\rb')), # The new behaviour is that the inline flag affects only what follows. ('a.b(?s)', 'a\nb', '0', ascii('a\nb')), ('a.b(?sV1)', 'a\nb', '', ascii(None)), ('(?s)a.b', 'a\nb', '0', ascii('a\nb')), ('a.*(?s)b', 'acc\nccb', '0', ascii('acc\nccb')), ('a.*(?sV1)b', 'acc\nccb', '', ascii(None)), ('(?s)a.*b', 'acc\nccb', '0', ascii('acc\nccb')), ('(?s)a.{4,5}b', 'acc\nccb', '0', ascii('acc\nccb')), (')', '', '', regex.error, self.TRAILING_CHARS), # Unmatched right bracket. ('', '', '0', "''"), # Empty pattern. 
('abc', 'abc', '0', ascii('abc')), ('abc', 'xbc', '', ascii(None)), ('abc', 'axc', '', ascii(None)), ('abc', 'abx', '', ascii(None)), ('abc', 'xabcy', '0', ascii('abc')), ('abc', 'ababc', '0', ascii('abc')), ('ab*c', 'abc', '0', ascii('abc')), ('ab*bc', 'abc', '0', ascii('abc')), ('ab*bc', 'abbc', '0', ascii('abbc')), ('ab*bc', 'abbbbc', '0', ascii('abbbbc')), ('ab+bc', 'abbc', '0', ascii('abbc')), ('ab+bc', 'abc', '', ascii(None)), ('ab+bc', 'abq', '', ascii(None)), ('ab+bc', 'abbbbc', '0', ascii('abbbbc')), ('ab?bc', 'abbc', '0', ascii('abbc')), ('ab?bc', 'abc', '0', ascii('abc')), ('ab?bc', 'abbbbc', '', ascii(None)), ('ab?c', 'abc', '0', ascii('abc')), ('^abc$', 'abc', '0', ascii('abc')), ('^abc$', 'abcc', '', ascii(None)), ('^abc', 'abcc', '0', ascii('abc')), ('^abc$', 'aabc', '', ascii(None)), ('abc$', 'aabc', '0', ascii('abc')), ('^', 'abc', '0', ascii('')), ('$', 'abc', '0', ascii('')), ('a.c', 'abc', '0', ascii('abc')), ('a.c', 'axc', '0', ascii('axc')), ('a.*c', 'axyzc', '0', ascii('axyzc')), ('a.*c', 'axyzd', '', ascii(None)), ('a[bc]d', 'abc', '', ascii(None)), ('a[bc]d', 'abd', '0', ascii('abd')), ('a[b-d]e', 'abd', '', ascii(None)), ('a[b-d]e', 'ace', '0', ascii('ace')), ('a[b-d]', 'aac', '0', ascii('ac')), ('a[-b]', 'a-', '0', ascii('a-')), ('a[\\-b]', 'a-', '0', ascii('a-')), ('a[b-]', 'a-', '0', ascii('a-')), ('a[]b', '-', '', regex.error, self.BAD_SET), ('a[', '-', '', regex.error, self.BAD_SET), ('a\\', '-', '', regex.error, self.BAD_ESCAPE), ('abc)', '-', '', regex.error, self.TRAILING_CHARS), ('(abc', '-', '', regex.error, self.MISSING_RPAREN), ('a]', 'a]', '0', ascii('a]')), ('a[]]b', 'a]b', '0', ascii('a]b')), ('a[]]b', 'a]b', '0', ascii('a]b')), ('a[^bc]d', 'aed', '0', ascii('aed')), ('a[^bc]d', 'abd', '', ascii(None)), ('a[^-b]c', 'adc', '0', ascii('adc')), ('a[^-b]c', 'a-c', '', ascii(None)), ('a[^]b]c', 'a]c', '', ascii(None)), ('a[^]b]c', 'adc', '0', ascii('adc')), ('\\ba\\b', 'a-', '0', ascii('a')), ('\\ba\\b', '-a', '0', ascii('a')), 
('\\ba\\b', '-a-', '0', ascii('a')), ('\\by\\b', 'xy', '', ascii(None)), ('\\by\\b', 'yz', '', ascii(None)), ('\\by\\b', 'xyz', '', ascii(None)), ('x\\b', 'xyz', '', ascii(None)), ('x\\B', 'xyz', '0', ascii('x')), ('\\Bz', 'xyz', '0', ascii('z')), ('z\\B', 'xyz', '', ascii(None)), ('\\Bx', 'xyz', '', ascii(None)), ('\\Ba\\B', 'a-', '', ascii(None)), ('\\Ba\\B', '-a', '', ascii(None)), ('\\Ba\\B', '-a-', '', ascii(None)), ('\\By\\B', 'xy', '', ascii(None)), ('\\By\\B', 'yz', '', ascii(None)), ('\\By\\b', 'xy', '0', ascii('y')), ('\\by\\B', 'yz', '0', ascii('y')), ('\\By\\B', 'xyz', '0', ascii('y')), ('ab|cd', 'abc', '0', ascii('ab')), ('ab|cd', 'abcd', '0', ascii('ab')), ('()ef', 'def', '0,1', ascii(('ef', ''))), ('$b', 'b', '', ascii(None)), ('a\\(b', 'a(b', '', ascii(('a(b',))), ('a\\(*b', 'ab', '0', ascii('ab')), ('a\\(*b', 'a((b', '0', ascii('a((b')), ('a\\\\b', 'a\\b', '0', ascii('a\\b')), ('((a))', 'abc', '0,1,2', ascii(('a', 'a', 'a'))), ('(a)b(c)', 'abc', '0,1,2', ascii(('abc', 'a', 'c'))), ('a+b+c', 'aabbabc', '0', ascii('abc')), ('(a+|b)*', 'ab', '0,1', ascii(('ab', 'b'))), ('(a+|b)+', 'ab', '0,1', ascii(('ab', 'b'))), ('(a+|b)?', 'ab', '0,1', ascii(('a', 'a'))), (')(', '-', '', regex.error, self.TRAILING_CHARS), ('[^ab]*', 'cde', '0', ascii('cde')), ('abc', '', '', ascii(None)), ('a*', '', '0', ascii('')), ('a|b|c|d|e', 'e', '0', ascii('e')), ('(a|b|c|d|e)f', 'ef', '0,1', ascii(('ef', 'e'))), ('abcd*efg', 'abcdefg', '0', ascii('abcdefg')), ('ab*', 'xabyabbbz', '0', ascii('ab')), ('ab*', 'xayabbbz', '0', ascii('a')), ('(ab|cd)e', 'abcde', '0,1', ascii(('cde', 'cd'))), ('[abhgefdc]ij', 'hij', '0', ascii('hij')), ('^(ab|cd)e', 'abcde', '', ascii(None)), ('(abc|)ef', 'abcdef', '0,1', ascii(('ef', ''))), ('(a|b)c*d', 'abcd', '0,1', ascii(('bcd', 'b'))), ('(ab|ab*)bc', 'abc', '0,1', ascii(('abc', 'a'))), ('a([bc]*)c*', 'abc', '0,1', ascii(('abc', 'bc'))), ('a([bc]*)(c*d)', 'abcd', '0,1,2', ascii(('abcd', 'bc', 'd'))), ('a([bc]+)(c*d)', 'abcd', '0,1,2', 
ascii(('abcd', 'bc', 'd'))), ('a([bc]*)(c+d)', 'abcd', '0,1,2', ascii(('abcd', 'b', 'cd'))), ('a[bcd]*dcdcde', 'adcdcde', '0', ascii('adcdcde')), ('a[bcd]+dcdcde', 'adcdcde', '', ascii(None)), ('(ab|a)b*c', 'abc', '0,1', ascii(('abc', 'ab'))), ('((a)(b)c)(d)', 'abcd', '1,2,3,4', ascii(('abc', 'a', 'b', 'd'))), ('[a-zA-Z_][a-zA-Z0-9_]*', 'alpha', '0', ascii('alpha')), ('^a(bc+|b[eh])g|.h$', 'abh', '0,1', ascii(('bh', None))), ('(bc+d$|ef*g.|h?i(j|k))', 'effgz', '0,1,2', ascii(('effgz', 'effgz', None))), ('(bc+d$|ef*g.|h?i(j|k))', 'ij', '0,1,2', ascii(('ij', 'ij', 'j'))), ('(bc+d$|ef*g.|h?i(j|k))', 'effg', '', ascii(None)), ('(bc+d$|ef*g.|h?i(j|k))', 'bcdd', '', ascii(None)), ('(bc+d$|ef*g.|h?i(j|k))', 'reffgz', '0,1,2', ascii(('effgz', 'effgz', None))), ('(((((((((a)))))))))', 'a', '0', ascii('a')), ('multiple words of text', 'uh-uh', '', ascii(None)), ('multiple words', 'multiple words, yeah', '0', ascii('multiple words')), ('(.*)c(.*)', 'abcde', '0,1,2', ascii(('abcde', 'ab', 'de'))), ('\\((.*), (.*)\\)', '(a, b)', '2,1', ascii(('b', 'a'))), ('[k]', 'ab', '', ascii(None)), ('a[-]?c', 'ac', '0', ascii('ac')), ('(abc)\\1', 'abcabc', '1', ascii('abc')), ('([a-c]*)\\1', 'abcabc', '1', ascii('abc')), ('^(.+)?B', 'AB', '1', ascii('A')), ('(a+).\\1$', 'aaaaa', '0,1', ascii(('aaaaa', 'aa'))), ('^(a+).\\1$', 'aaaa', '', ascii(None)), ('(abc)\\1', 'abcabc', '0,1', ascii(('abcabc', 'abc'))), ('([a-c]+)\\1', 'abcabc', '0,1', ascii(('abcabc', 'abc'))), ('(a)\\1', 'aa', '0,1', ascii(('aa', 'a'))), ('(a+)\\1', 'aa', '0,1', ascii(('aa', 'a'))), ('(a+)+\\1', 'aa', '0,1', ascii(('aa', 'a'))), ('(a).+\\1', 'aba', '0,1', ascii(('aba', 'a'))), ('(a)ba*\\1', 'aba', '0,1', ascii(('aba', 'a'))), ('(aa|a)a\\1$', 'aaa', '0,1', ascii(('aaa', 'a'))), ('(a|aa)a\\1$', 'aaa', '0,1', ascii(('aaa', 'a'))), ('(a+)a\\1$', 'aaa', '0,1', ascii(('aaa', 'a'))), ('([abc]*)\\1', 'abcabc', '0,1', ascii(('abcabc', 'abc'))), ('(a)(b)c|ab', 'ab', '0,1,2', ascii(('ab', None, None))), ('(a)+x', 'aaax', '0,1', 
ascii(('aaax', 'a'))),
            ('([ac])+x', 'aacx', '0,1', ascii(('aacx', 'c'))),
            ('([^/]*/)*sub1/', 'd:msgs/tdir/sub1/trial/away.cpp', '0,1', ascii(('d:msgs/tdir/sub1/', 'tdir/'))),
            ('([^.]*)\\.([^:]*):[T ]+(.*)', 'track1.title:TBlah blah blah', '0,1,2,3', ascii(('track1.title:TBlah blah blah', 'track1', 'title', 'Blah blah blah'))),
            ('([^N]*N)+', 'abNNxyzN', '0,1', ascii(('abNNxyzN', 'xyzN'))),
            ('([^N]*N)+', 'abNNxyz', '0,1', ascii(('abNN', 'N'))),
            ('([abc]*)x', 'abcx', '0,1', ascii(('abcx', 'abc'))),
            ('([abc]*)x', 'abc', '', ascii(None)),
            ('([xyz]*)x', 'abcx', '0,1', ascii(('x', ''))),
            ('(a)+b|aac', 'aac', '0,1', ascii(('aac', None))),

            # Test symbolic groups.
            ('(?P<i d>aaa)a', 'aaaa', '', regex.error, self.BAD_GROUP_NAME),
            ('(?P<id>aaa)a', 'aaaa', '0,id', ascii(('aaaa', 'aaa'))),
            ('(?P<id>aa)(?P=id)', 'aaaa', '0,id', ascii(('aaaa', 'aa'))),
            ('(?P<id>aa)(?P=xd)', 'aaaa', '', regex.error, self.UNKNOWN_GROUP),

            # Character properties.
            (r"\g", "g", '0', ascii('g')),
            (r"\g<1>", "g", '', regex.error, self.INVALID_GROUP_REF),
            (r"(.)\g<1>", "gg", '0', ascii('gg')),
            (r"(.)\g<1>", "gg", '', ascii(('gg', 'g'))),
            (r"\N", "N", '0', ascii('N')),
            (r"\N{LATIN SMALL LETTER A}", "a", '0', ascii('a')),
            (r"\p", "p", '0', ascii('p')),
            (r"\p{Ll}", "a", '0', ascii('a')),
            (r"\P", "P", '0', ascii('P')),
            (r"\P{Lu}", "p", '0', ascii('p')),

            # All tests from Perl.
('abc', 'abc', '0', ascii('abc')), ('abc', 'xbc', '', ascii(None)), ('abc', 'axc', '', ascii(None)), ('abc', 'abx', '', ascii(None)), ('abc', 'xabcy', '0', ascii('abc')), ('abc', 'ababc', '0', ascii('abc')), ('ab*c', 'abc', '0', ascii('abc')), ('ab*bc', 'abc', '0', ascii('abc')), ('ab*bc', 'abbc', '0', ascii('abbc')), ('ab*bc', 'abbbbc', '0', ascii('abbbbc')), ('ab{0,}bc', 'abbbbc', '0', ascii('abbbbc')), ('ab+bc', 'abbc', '0', ascii('abbc')), ('ab+bc', 'abc', '', ascii(None)), ('ab+bc', 'abq', '', ascii(None)), ('ab{1,}bc', 'abq', '', ascii(None)), ('ab+bc', 'abbbbc', '0', ascii('abbbbc')), ('ab{1,}bc', 'abbbbc', '0', ascii('abbbbc')), ('ab{1,3}bc', 'abbbbc', '0', ascii('abbbbc')), ('ab{3,4}bc', 'abbbbc', '0', ascii('abbbbc')), ('ab{4,5}bc', 'abbbbc', '', ascii(None)), ('ab?bc', 'abbc', '0', ascii('abbc')), ('ab?bc', 'abc', '0', ascii('abc')), ('ab{0,1}bc', 'abc', '0', ascii('abc')), ('ab?bc', 'abbbbc', '', ascii(None)), ('ab?c', 'abc', '0', ascii('abc')), ('ab{0,1}c', 'abc', '0', ascii('abc')), ('^abc$', 'abc', '0', ascii('abc')), ('^abc$', 'abcc', '', ascii(None)), ('^abc', 'abcc', '0', ascii('abc')), ('^abc$', 'aabc', '', ascii(None)), ('abc$', 'aabc', '0', ascii('abc')), ('^', 'abc', '0', ascii('')), ('$', 'abc', '0', ascii('')), ('a.c', 'abc', '0', ascii('abc')), ('a.c', 'axc', '0', ascii('axc')), ('a.*c', 'axyzc', '0', ascii('axyzc')), ('a.*c', 'axyzd', '', ascii(None)), ('a[bc]d', 'abc', '', ascii(None)), ('a[bc]d', 'abd', '0', ascii('abd')), ('a[b-d]e', 'abd', '', ascii(None)), ('a[b-d]e', 'ace', '0', ascii('ace')), ('a[b-d]', 'aac', '0', ascii('ac')), ('a[-b]', 'a-', '0', ascii('a-')), ('a[b-]', 'a-', '0', ascii('a-')), ('a[b-a]', '-', '', regex.error, self.BAD_CHAR_RANGE), ('a[]b', '-', '', regex.error, self.BAD_SET), ('a[', '-', '', regex.error, self.BAD_SET), ('a]', 'a]', '0', ascii('a]')), ('a[]]b', 'a]b', '0', ascii('a]b')), ('a[^bc]d', 'aed', '0', ascii('aed')), ('a[^bc]d', 'abd', '', ascii(None)), ('a[^-b]c', 'adc', '0', ascii('adc')), ('a[^-b]c', 
'a-c', '', ascii(None)), ('a[^]b]c', 'a]c', '', ascii(None)), ('a[^]b]c', 'adc', '0', ascii('adc')), ('ab|cd', 'abc', '0', ascii('ab')), ('ab|cd', 'abcd', '0', ascii('ab')), ('()ef', 'def', '0,1', ascii(('ef', ''))), ('*a', '-', '', regex.error, self.NOTHING_TO_REPEAT), ('(*)b', '-', '', regex.error, self.NOTHING_TO_REPEAT), ('$b', 'b', '', ascii(None)), ('a\\', '-', '', regex.error, self.BAD_ESCAPE), ('a\\(b', 'a(b', '', ascii(('a(b',))), ('a\\(*b', 'ab', '0', ascii('ab')), ('a\\(*b', 'a((b', '0', ascii('a((b')), ('a\\\\b', 'a\\b', '0', ascii('a\\b')), ('abc)', '-', '', regex.error, self.TRAILING_CHARS), ('(abc', '-', '', regex.error, self.MISSING_RPAREN), ('((a))', 'abc', '0,1,2', ascii(('a', 'a', 'a'))), ('(a)b(c)', 'abc', '0,1,2', ascii(('abc', 'a', 'c'))), ('a+b+c', 'aabbabc', '0', ascii('abc')), ('a{1,}b{1,}c', 'aabbabc', '0', ascii('abc')), ('a**', '-', '', regex.error, self.MULTIPLE_REPEAT), ('a.+?c', 'abcabc', '0', ascii('abc')), ('(a+|b)*', 'ab', '0,1', ascii(('ab', 'b'))), ('(a+|b){0,}', 'ab', '0,1', ascii(('ab', 'b'))), ('(a+|b)+', 'ab', '0,1', ascii(('ab', 'b'))), ('(a+|b){1,}', 'ab', '0,1', ascii(('ab', 'b'))), ('(a+|b)?', 'ab', '0,1', ascii(('a', 'a'))), ('(a+|b){0,1}', 'ab', '0,1', ascii(('a', 'a'))), (')(', '-', '', regex.error, self.TRAILING_CHARS), ('[^ab]*', 'cde', '0', ascii('cde')), ('abc', '', '', ascii(None)), ('a*', '', '0', ascii('')), ('([abc])*d', 'abbbcd', '0,1', ascii(('abbbcd', 'c'))), ('([abc])*bcd', 'abcd', '0,1', ascii(('abcd', 'a'))), ('a|b|c|d|e', 'e', '0', ascii('e')), ('(a|b|c|d|e)f', 'ef', '0,1', ascii(('ef', 'e'))), ('abcd*efg', 'abcdefg', '0', ascii('abcdefg')), ('ab*', 'xabyabbbz', '0', ascii('ab')), ('ab*', 'xayabbbz', '0', ascii('a')), ('(ab|cd)e', 'abcde', '0,1', ascii(('cde', 'cd'))), ('[abhgefdc]ij', 'hij', '0', ascii('hij')), ('^(ab|cd)e', 'abcde', '', ascii(None)), ('(abc|)ef', 'abcdef', '0,1', ascii(('ef', ''))), ('(a|b)c*d', 'abcd', '0,1', ascii(('bcd', 'b'))), ('(ab|ab*)bc', 'abc', '0,1', ascii(('abc', 'a'))), 
('a([bc]*)c*', 'abc', '0,1', ascii(('abc', 'bc'))), ('a([bc]*)(c*d)', 'abcd', '0,1,2', ascii(('abcd', 'bc', 'd'))), ('a([bc]+)(c*d)', 'abcd', '0,1,2', ascii(('abcd', 'bc', 'd'))), ('a([bc]*)(c+d)', 'abcd', '0,1,2', ascii(('abcd', 'b', 'cd'))), ('a[bcd]*dcdcde', 'adcdcde', '0', ascii('adcdcde')), ('a[bcd]+dcdcde', 'adcdcde', '', ascii(None)), ('(ab|a)b*c', 'abc', '0,1', ascii(('abc', 'ab'))), ('((a)(b)c)(d)', 'abcd', '1,2,3,4', ascii(('abc', 'a', 'b', 'd'))), ('[a-zA-Z_][a-zA-Z0-9_]*', 'alpha', '0', ascii('alpha')), ('^a(bc+|b[eh])g|.h$', 'abh', '0,1', ascii(('bh', None))), ('(bc+d$|ef*g.|h?i(j|k))', 'effgz', '0,1,2', ascii(('effgz', 'effgz', None))), ('(bc+d$|ef*g.|h?i(j|k))', 'ij', '0,1,2', ascii(('ij', 'ij', 'j'))), ('(bc+d$|ef*g.|h?i(j|k))', 'effg', '', ascii(None)), ('(bc+d$|ef*g.|h?i(j|k))', 'bcdd', '', ascii(None)), ('(bc+d$|ef*g.|h?i(j|k))', 'reffgz', '0,1,2', ascii(('effgz', 'effgz', None))), ('((((((((((a))))))))))', 'a', '10', ascii('a')), ('((((((((((a))))))))))\\10', 'aa', '0', ascii('aa')), # Python does not have the same rules for \\41 so this is a syntax error # ('((((((((((a))))))))))\\41', 'aa', '', ascii(None)), # ('((((((((((a))))))))))\\41', 'a!', '0', ascii('a!')), ('((((((((((a))))))))))\\41', '', '', regex.error, self.INVALID_GROUP_REF), ('(?i)((((((((((a))))))))))\\41', '', '', regex.error, self.INVALID_GROUP_REF), ('(((((((((a)))))))))', 'a', '0', ascii('a')), ('multiple words of text', 'uh-uh', '', ascii(None)), ('multiple words', 'multiple words, yeah', '0', ascii('multiple words')), ('(.*)c(.*)', 'abcde', '0,1,2', ascii(('abcde', 'ab', 'de'))), ('\\((.*), (.*)\\)', '(a, b)', '2,1', ascii(('b', 'a'))), ('[k]', 'ab', '', ascii(None)), ('a[-]?c', 'ac', '0', ascii('ac')), ('(abc)\\1', 'abcabc', '1', ascii('abc')), ('([a-c]*)\\1', 'abcabc', '1', ascii('abc')), ('(?i)abc', 'ABC', '0', ascii('ABC')), ('(?i)abc', 'XBC', '', ascii(None)), ('(?i)abc', 'AXC', '', ascii(None)), ('(?i)abc', 'ABX', '', ascii(None)), ('(?i)abc', 'XABCY', '0', 
ascii('ABC')), ('(?i)abc', 'ABABC', '0', ascii('ABC')), ('(?i)ab*c', 'ABC', '0', ascii('ABC')), ('(?i)ab*bc', 'ABC', '0', ascii('ABC')), ('(?i)ab*bc', 'ABBC', '0', ascii('ABBC')), ('(?i)ab*?bc', 'ABBBBC', '0', ascii('ABBBBC')), ('(?i)ab{0,}?bc', 'ABBBBC', '0', ascii('ABBBBC')), ('(?i)ab+?bc', 'ABBC', '0', ascii('ABBC')), ('(?i)ab+bc', 'ABC', '', ascii(None)), ('(?i)ab+bc', 'ABQ', '', ascii(None)), ('(?i)ab{1,}bc', 'ABQ', '', ascii(None)), ('(?i)ab+bc', 'ABBBBC', '0', ascii('ABBBBC')), ('(?i)ab{1,}?bc', 'ABBBBC', '0', ascii('ABBBBC')), ('(?i)ab{1,3}?bc', 'ABBBBC', '0', ascii('ABBBBC')), ('(?i)ab{3,4}?bc', 'ABBBBC', '0', ascii('ABBBBC')), ('(?i)ab{4,5}?bc', 'ABBBBC', '', ascii(None)), ('(?i)ab??bc', 'ABBC', '0', ascii('ABBC')), ('(?i)ab??bc', 'ABC', '0', ascii('ABC')), ('(?i)ab{0,1}?bc', 'ABC', '0', ascii('ABC')), ('(?i)ab??bc', 'ABBBBC', '', ascii(None)), ('(?i)ab??c', 'ABC', '0', ascii('ABC')), ('(?i)ab{0,1}?c', 'ABC', '0', ascii('ABC')), ('(?i)^abc$', 'ABC', '0', ascii('ABC')), ('(?i)^abc$', 'ABCC', '', ascii(None)), ('(?i)^abc', 'ABCC', '0', ascii('ABC')), ('(?i)^abc$', 'AABC', '', ascii(None)), ('(?i)abc$', 'AABC', '0', ascii('ABC')), ('(?i)^', 'ABC', '0', ascii('')), ('(?i)$', 'ABC', '0', ascii('')), ('(?i)a.c', 'ABC', '0', ascii('ABC')), ('(?i)a.c', 'AXC', '0', ascii('AXC')), ('(?i)a.*?c', 'AXYZC', '0', ascii('AXYZC')), ('(?i)a.*c', 'AXYZD', '', ascii(None)), ('(?i)a[bc]d', 'ABC', '', ascii(None)), ('(?i)a[bc]d', 'ABD', '0', ascii('ABD')), ('(?i)a[b-d]e', 'ABD', '', ascii(None)), ('(?i)a[b-d]e', 'ACE', '0', ascii('ACE')), ('(?i)a[b-d]', 'AAC', '0', ascii('AC')), ('(?i)a[-b]', 'A-', '0', ascii('A-')), ('(?i)a[b-]', 'A-', '0', ascii('A-')), ('(?i)a[b-a]', '-', '', regex.error, self.BAD_CHAR_RANGE), ('(?i)a[]b', '-', '', regex.error, self.BAD_SET), ('(?i)a[', '-', '', regex.error, self.BAD_SET), ('(?i)a]', 'A]', '0', ascii('A]')), ('(?i)a[]]b', 'A]B', '0', ascii('A]B')), ('(?i)a[^bc]d', 'AED', '0', ascii('AED')), ('(?i)a[^bc]d', 'ABD', '', ascii(None)), 
('(?i)a[^-b]c', 'ADC', '0', ascii('ADC')), ('(?i)a[^-b]c', 'A-C', '', ascii(None)), ('(?i)a[^]b]c', 'A]C', '', ascii(None)), ('(?i)a[^]b]c', 'ADC', '0', ascii('ADC')), ('(?i)ab|cd', 'ABC', '0', ascii('AB')), ('(?i)ab|cd', 'ABCD', '0', ascii('AB')), ('(?i)()ef', 'DEF', '0,1', ascii(('EF', ''))), ('(?i)*a', '-', '', regex.error, self.NOTHING_TO_REPEAT), ('(?i)(*)b', '-', '', regex.error, self.NOTHING_TO_REPEAT), ('(?i)$b', 'B', '', ascii(None)), ('(?i)a\\', '-', '', regex.error, self.BAD_ESCAPE), ('(?i)a\\(b', 'A(B', '', ascii(('A(B',))), ('(?i)a\\(*b', 'AB', '0', ascii('AB')), ('(?i)a\\(*b', 'A((B', '0', ascii('A((B')), ('(?i)a\\\\b', 'A\\B', '0', ascii('A\\B')), ('(?i)abc)', '-', '', regex.error, self.TRAILING_CHARS), ('(?i)(abc', '-', '', regex.error, self.MISSING_RPAREN), ('(?i)((a))', 'ABC', '0,1,2', ascii(('A', 'A', 'A'))), ('(?i)(a)b(c)', 'ABC', '0,1,2', ascii(('ABC', 'A', 'C'))), ('(?i)a+b+c', 'AABBABC', '0', ascii('ABC')), ('(?i)a{1,}b{1,}c', 'AABBABC', '0', ascii('ABC')), ('(?i)a**', '-', '', regex.error, self.MULTIPLE_REPEAT), ('(?i)a.+?c', 'ABCABC', '0', ascii('ABC')), ('(?i)a.*?c', 'ABCABC', '0', ascii('ABC')), ('(?i)a.{0,5}?c', 'ABCABC', '0', ascii('ABC')), ('(?i)(a+|b)*', 'AB', '0,1', ascii(('AB', 'B'))), ('(?i)(a+|b){0,}', 'AB', '0,1', ascii(('AB', 'B'))), ('(?i)(a+|b)+', 'AB', '0,1', ascii(('AB', 'B'))), ('(?i)(a+|b){1,}', 'AB', '0,1', ascii(('AB', 'B'))), ('(?i)(a+|b)?', 'AB', '0,1', ascii(('A', 'A'))), ('(?i)(a+|b){0,1}', 'AB', '0,1', ascii(('A', 'A'))), ('(?i)(a+|b){0,1}?', 'AB', '0,1', ascii(('', None))), ('(?i))(', '-', '', regex.error, self.TRAILING_CHARS), ('(?i)[^ab]*', 'CDE', '0', ascii('CDE')), ('(?i)abc', '', '', ascii(None)), ('(?i)a*', '', '0', ascii('')), ('(?i)([abc])*d', 'ABBBCD', '0,1', ascii(('ABBBCD', 'C'))), ('(?i)([abc])*bcd', 'ABCD', '0,1', ascii(('ABCD', 'A'))), ('(?i)a|b|c|d|e', 'E', '0', ascii('E')), ('(?i)(a|b|c|d|e)f', 'EF', '0,1', ascii(('EF', 'E'))), ('(?i)abcd*efg', 'ABCDEFG', '0', ascii('ABCDEFG')), ('(?i)ab*', 
'XABYABBBZ', '0', ascii('AB')), ('(?i)ab*', 'XAYABBBZ', '0', ascii('A')), ('(?i)(ab|cd)e', 'ABCDE', '0,1', ascii(('CDE', 'CD'))), ('(?i)[abhgefdc]ij', 'HIJ', '0', ascii('HIJ')), ('(?i)^(ab|cd)e', 'ABCDE', '', ascii(None)), ('(?i)(abc|)ef', 'ABCDEF', '0,1', ascii(('EF', ''))), ('(?i)(a|b)c*d', 'ABCD', '0,1', ascii(('BCD', 'B'))), ('(?i)(ab|ab*)bc', 'ABC', '0,1', ascii(('ABC', 'A'))), ('(?i)a([bc]*)c*', 'ABC', '0,1', ascii(('ABC', 'BC'))), ('(?i)a([bc]*)(c*d)', 'ABCD', '0,1,2', ascii(('ABCD', 'BC', 'D'))), ('(?i)a([bc]+)(c*d)', 'ABCD', '0,1,2', ascii(('ABCD', 'BC', 'D'))), ('(?i)a([bc]*)(c+d)', 'ABCD', '0,1,2', ascii(('ABCD', 'B', 'CD'))), ('(?i)a[bcd]*dcdcde', 'ADCDCDE', '0', ascii('ADCDCDE')), ('(?i)a[bcd]+dcdcde', 'ADCDCDE', '', ascii(None)), ('(?i)(ab|a)b*c', 'ABC', '0,1', ascii(('ABC', 'AB'))), ('(?i)((a)(b)c)(d)', 'ABCD', '1,2,3,4', ascii(('ABC', 'A', 'B', 'D'))), ('(?i)[a-zA-Z_][a-zA-Z0-9_]*', 'ALPHA', '0', ascii('ALPHA')), ('(?i)^a(bc+|b[eh])g|.h$', 'ABH', '0,1', ascii(('BH', None))), ('(?i)(bc+d$|ef*g.|h?i(j|k))', 'EFFGZ', '0,1,2', ascii(('EFFGZ', 'EFFGZ', None))), ('(?i)(bc+d$|ef*g.|h?i(j|k))', 'IJ', '0,1,2', ascii(('IJ', 'IJ', 'J'))), ('(?i)(bc+d$|ef*g.|h?i(j|k))', 'EFFG', '', ascii(None)), ('(?i)(bc+d$|ef*g.|h?i(j|k))', 'BCDD', '', ascii(None)), ('(?i)(bc+d$|ef*g.|h?i(j|k))', 'REFFGZ', '0,1,2', ascii(('EFFGZ', 'EFFGZ', None))), ('(?i)((((((((((a))))))))))', 'A', '10', ascii('A')), ('(?i)((((((((((a))))))))))\\10', 'AA', '0', ascii('AA')), #('(?i)((((((((((a))))))))))\\41', 'AA', '', ascii(None)), #('(?i)((((((((((a))))))))))\\41', 'A!', '0', ascii('A!')), ('(?i)(((((((((a)))))))))', 'A', '0', ascii('A')), ('(?i)(?:(?:(?:(?:(?:(?:(?:(?:(?:(a))))))))))', 'A', '1', ascii('A')), ('(?i)(?:(?:(?:(?:(?:(?:(?:(?:(?:(a|b|c))))))))))', 'C', '1', ascii('C')), ('(?i)multiple words of text', 'UH-UH', '', ascii(None)), ('(?i)multiple words', 'MULTIPLE WORDS, YEAH', '0', ascii('MULTIPLE WORDS')), ('(?i)(.*)c(.*)', 'ABCDE', '0,1,2', ascii(('ABCDE', 'AB', 'DE'))), 
('(?i)\\((.*), (.*)\\)', '(A, B)', '2,1', ascii(('B', 'A'))), ('(?i)[k]', 'AB', '', ascii(None)), # ('(?i)abcd', 'ABCD', SUCCEED, 'found+"-"+\\found+"-"+\\\\found', ascii(ABCD-$&-\\ABCD)), # ('(?i)a(bc)d', 'ABCD', SUCCEED, 'g1+"-"+\\g1+"-"+\\\\g1', ascii(BC-$1-\\BC)), ('(?i)a[-]?c', 'AC', '0', ascii('AC')), ('(?i)(abc)\\1', 'ABCABC', '1', ascii('ABC')), ('(?i)([a-c]*)\\1', 'ABCABC', '1', ascii('ABC')), ('a(?!b).', 'abad', '0', ascii('ad')), ('a(?=d).', 'abad', '0', ascii('ad')), ('a(?=c|d).', 'abad', '0', ascii('ad')), ('a(?:b|c|d)(.)', 'ace', '1', ascii('e')), ('a(?:b|c|d)*(.)', 'ace', '1', ascii('e')), ('a(?:b|c|d)+?(.)', 'ace', '1', ascii('e')), ('a(?:b|(c|e){1,2}?|d)+?(.)', 'ace', '1,2', ascii(('c', 'e'))), # Lookbehind: split by : but not if it is escaped by -. ('(?]*?b', 'a>b', '', ascii(None)), # Bug 490573: minimizing repeat problem. (r'^a*?$', 'foo', '', ascii(None)), # Bug 470582: nested groups problem. (r'^((a)c)?(ab)$', 'ab', '1,2,3', ascii((None, None, 'ab'))), # Another minimizing repeat problem (capturing groups in assertions). 
('^([ab]*?)(?=(b)?)c', 'abc', '1,2', ascii(('ab', None))), ('^([ab]*?)(?!(b))c', 'abc', '1,2', ascii(('ab', None))), ('^([ab]*?)(?= (3, 4): with self.subTest(pattern=pattern, string=string): self.assertRaisesRegex(expected, excval, regex.search, pattern, string) else: m = regex.search(pattern, string) if m: if group_list: actual = ascii(m.group(*group_list)) else: actual = ascii(m[:]) else: actual = ascii(m) self.assertEqual(actual, expected) def test_replacement(self): self.assertEqual(regex.sub("test\?", "result\?\.\a\q\m\n", "test?"), "result\?\.\a\q\m\n") self.assertEqual(regex.sub(r"test\?", "result\?\.\a\q\m\n", "test?"), "result\?\.\a\q\m\n") self.assertEqual(regex.sub('(.)', r"\1\1", 'x'), 'xx') self.assertEqual(regex.sub('(.)', regex.escape(r"\1\1"), 'x'), r"\1\1") self.assertEqual(regex.sub('(.)', r"\\1\\1", 'x'), r"\1\1") self.assertEqual(regex.sub('(.)', lambda m: r"\1\1", 'x'), r"\1\1") def test_common_prefix(self): # Very long common prefix all = string.ascii_lowercase + string.digits + string.ascii_uppercase side = all * 4 regexp = '(' + side + '|' + side + ')' self.assertEqual(repr(type(regex.compile(regexp))), self.PATTERN_CLASS) def test_captures(self): self.assertEqual(regex.search(r"(\w)+", "abc").captures(1), ['a', 'b', 'c']) self.assertEqual(regex.search(r"(\w{3})+", "abcdef").captures(0, 1), (['abcdef'], ['abc', 'def'])) self.assertEqual(regex.search(r"^(\d{1,3})(?:\.(\d{1,3})){3}$", "192.168.0.1").captures(1, 2), (['192', ], ['168', '0', '1'])) self.assertEqual(regex.match(r"^([0-9A-F]{2}){4} ([a-z]\d){5}$", "3FB52A0C a2c4g3k9d3").captures(1, 2), (['3F', 'B5', '2A', '0C'], ['a2', 'c4', 'g3', 'k9', 'd3'])) self.assertEqual(regex.match("([a-z]W)([a-z]X)+([a-z]Y)", "aWbXcXdXeXfY").captures(1, 2, 3), (['aW'], ['bX', 'cX', 'dX', 'eX'], ['fY'])) self.assertEqual(regex.search(r".*?(?=(.)+)b", "ab").captures(1), ['b']) self.assertEqual(regex.search(r".*?(?>(.){0,2})d", "abcd").captures(1), ['b', 'c']) self.assertEqual(regex.search(r"(.)+", 
"a").captures(1), ['a'])

    def test_guards(self):
        m = regex.search(r"(X.*?Y\s*){3}(X\s*)+AB:", "XY\nX Y\nX Y\nXY\nXX AB:")
        self.assertEqual(m.span(0, 1, 2), ((3, 21), (12, 15), (16, 18)))

        m = regex.search(r"(X.*?Y\s*){3,}(X\s*)+AB:", "XY\nX Y\nX Y\nXY\nXX AB:")
        self.assertEqual(m.span(0, 1, 2), ((0, 21), (12, 15), (16, 18)))

        m = regex.search(r'\d{4}(\s*\w)?\W*((?!\d)\w){2}', "9999XX")
        self.assertEqual(m.span(0, 1, 2), ((0, 6), (-1, -1), (5, 6)))

        m = regex.search(r'A\s*?.*?(\n+.*?\s*?){0,2}\(X', 'A\n1\nS\n1 (X')
        self.assertEqual(m.span(0, 1), ((0, 10), (5, 8)))

        m = regex.search('Derde\s*:', 'aaaaaa:\nDerde:')
        self.assertEqual(m.span(), (8, 14))
        m = regex.search('Derde\s*:', 'aaaaa:\nDerde:')
        self.assertEqual(m.span(), (7, 13))

    def test_turkic(self):
        # Turkish has dotted and dotless I/i.
        pairs = "I=i;I=\u0131;i=\u0130"

        all_chars = set()
        matching = set()
        for pair in pairs.split(";"):
            ch1, ch2 = pair.split("=")
            all_chars.update((ch1, ch2))
            matching.add((ch1, ch1))
            matching.add((ch1, ch2))
            matching.add((ch2, ch1))
            matching.add((ch2, ch2))

        for ch1 in all_chars:
            for ch2 in all_chars:
                m = regex.match(r"(?i)\A" + ch1 + r"\Z", ch2)
                if m:
                    if (ch1, ch2) not in matching:
                        self.fail("{} matching {}".format(ascii(ch1), ascii(ch2)))
                else:
                    if (ch1, ch2) in matching:
                        self.fail("{} not matching {}".format(ascii(ch1), ascii(ch2)))

    def test_named_lists(self):
        options = ["one", "two", "three"]
        self.assertEqual(regex.match(r"333\L<bar>444", "333one444", bar=options).group(), "333one444")
        self.assertEqual(regex.match(r"(?i)333\L<bar>444", "333TWO444", bar=options).group(), "333TWO444")
        self.assertEqual(regex.match(r"333\L<bar>444", "333four444", bar=options), None)

        options = [b"one", b"two", b"three"]
        self.assertEqual(regex.match(br"333\L<bar>444", b"333one444", bar=options).group(), b"333one444")
        self.assertEqual(regex.match(br"(?i)333\L<bar>444", b"333TWO444", bar=options).group(), b"333TWO444")
        self.assertEqual(regex.match(br"333\L<bar>444", b"333four444", bar=options), None)
        self.assertEqual(repr(type(regex.compile(r"3\L<bar>4\L<bar>+5", bar=["one", "two", "three"]))), self.PATTERN_CLASS)

        self.assertEqual(regex.findall(r"^\L<options>", "solid QWERT", options=set(['good', 'brilliant', '+s\\ol[i}d'])), [])
        self.assertEqual(regex.findall(r"^\L<options>", "+solid QWERT", options=set(['good', 'brilliant', '+solid'])), ['+solid'])

        options = ["STRASSE"]
        self.assertEqual(regex.match(r"(?fi)\L<words>", "stra\N{LATIN SMALL LETTER SHARP S}e", words=options).span(), (0, 6))

        options = ["STRASSE", "stress"]
        self.assertEqual(regex.match(r"(?fi)\L<words>", "stra\N{LATIN SMALL LETTER SHARP S}e", words=options).span(), (0, 6))

        options = ["stra\N{LATIN SMALL LETTER SHARP S}e"]
        self.assertEqual(regex.match(r"(?fi)\L<words>", "STRASSE", words=options).span(), (0, 7))

        options = ["kit"]
        self.assertEqual(regex.search(r"(?i)\L<words>", "SKITS", words=options).span(), (1, 4))
        self.assertEqual(regex.search(r"(?i)\L<words>", "SK\N{LATIN CAPITAL LETTER I WITH DOT ABOVE}TS", words=options).span(), (1, 4))

        self.assertEqual(regex.search(r"(?fi)\b(\w+) +\1\b", " stra\N{LATIN SMALL LETTER SHARP S}e STRASSE ").span(), (1, 15))
        self.assertEqual(regex.search(r"(?fi)\b(\w+) +\1\b", " STRASSE stra\N{LATIN SMALL LETTER SHARP S}e ").span(), (1, 15))

        self.assertEqual(regex.search(r"^\L<options>$", "", options=[]).span(), (0, 0))

    def test_fuzzy(self):
        # Some tests borrowed from TRE library tests.
self.assertEqual(repr(type(regex.compile('(fou){s,e<=1}'))), self.PATTERN_CLASS) self.assertEqual(repr(type(regex.compile('(fuu){s}'))), self.PATTERN_CLASS) self.assertEqual(repr(type(regex.compile('(fuu){s,e}'))), self.PATTERN_CLASS) self.assertEqual(repr(type(regex.compile('(anaconda){1i+1d<1,s<=1}'))), self.PATTERN_CLASS) self.assertEqual(repr(type(regex.compile('(anaconda){1i+1d<1,s<=1,e<=10}'))), self.PATTERN_CLASS) self.assertEqual(repr(type(regex.compile('(anaconda){s<=1,e<=1,1i+1d<1}'))), self.PATTERN_CLASS) text = 'molasses anaconda foo bar baz smith anderson ' self.assertEqual(regex.search('(znacnda){s<=1,e<=3,1i+1d<1}', text), None) self.assertEqual(regex.search('(znacnda){s<=1,e<=3,1i+1d<2}', text).span(0, 1), ((9, 17), (9, 17))) self.assertEqual(regex.search('(ananda){1i+1d<2}', text), None) self.assertEqual(regex.search(r"(?:\bznacnda){e<=2}", text)[0], "anaconda") self.assertEqual(regex.search(r"(?:\bnacnda){e<=2}", text)[0], "anaconda") text = 'anaconda foo bar baz smith anderson' self.assertEqual(regex.search('(fuu){i<=3,d<=3,e<=5}', text).span(0, 1), ((0, 0), (0, 0))) self.assertEqual(regex.search('(?b)(fuu){i<=3,d<=3,e<=5}', text).span(0, 1), ((9, 10), (9, 10))) self.assertEqual(regex.search('(fuu){i<=2,d<=2,e<=5}', text).span(0, 1), ((7, 10), (7, 10))) self.assertEqual(regex.search('(?e)(fuu){i<=2,d<=2,e<=5}', text).span(0, 1), ((9, 10), (9, 10))) self.assertEqual(regex.search('(fuu){i<=3,d<=3,e}', text).span(0, 1), ((0, 0), (0, 0))) self.assertEqual(regex.search('(?b)(fuu){i<=3,d<=3,e}', text).span(0, 1), ((9, 10), (9, 10))) self.assertEqual(repr(type(regex.compile('(approximate){s<=3,1i+1d<3}'))), self.PATTERN_CLASS) # No cost limit. 
self.assertEqual(regex.search('(foobar){e}', 'xirefoabralfobarxie').span(0, 1), ((0, 6), (0, 6))) self.assertEqual(regex.search('(?e)(foobar){e}', 'xirefoabralfobarxie').span(0, 1), ((0, 3), (0, 3))) self.assertEqual(regex.search('(?b)(foobar){e}', 'xirefoabralfobarxie').span(0, 1), ((11, 16), (11, 16))) # At most two errors. self.assertEqual(regex.search('(foobar){e<=2}', 'xirefoabrzlfd').span(0, 1), ((4, 9), (4, 9))) self.assertEqual(regex.search('(foobar){e<=2}', 'xirefoabzlfd'), None) # At most two inserts or substitutions and max two errors total. self.assertEqual(regex.search('(foobar){i<=2,s<=2,e<=2}', 'oobargoobaploowap').span(0, 1), ((5, 11), (5, 11))) # Find best whole word match for "foobar". self.assertEqual(regex.search('\\b(foobar){e}\\b', 'zfoobarz').span(0, 1), ((0, 8), (0, 8))) self.assertEqual(regex.search('\\b(foobar){e}\\b', 'boing zfoobarz goobar woop').span(0, 1), ((0, 6), (0, 6))) self.assertEqual(regex.search('(?b)\\b(foobar){e}\\b', 'boing zfoobarz goobar woop').span(0, 1), ((15, 21), (15, 21))) # Match whole string, allow only 1 error. 
self.assertEqual(regex.search('^(foobar){e<=1}$', 'foobar').span(0, 1), ((0, 6), (0, 6))) self.assertEqual(regex.search('^(foobar){e<=1}$', 'xfoobar').span(0, 1), ((0, 7), (0, 7))) self.assertEqual(regex.search('^(foobar){e<=1}$', 'foobarx').span(0, 1), ((0, 7), (0, 7))) self.assertEqual(regex.search('^(foobar){e<=1}$', 'fooxbar').span(0, 1), ((0, 7), (0, 7))) self.assertEqual(regex.search('^(foobar){e<=1}$', 'foxbar').span(0, 1), ((0, 6), (0, 6))) self.assertEqual(regex.search('^(foobar){e<=1}$', 'xoobar').span(0, 1), ((0, 6), (0, 6))) self.assertEqual(regex.search('^(foobar){e<=1}$', 'foobax').span(0, 1), ((0, 6), (0, 6))) self.assertEqual(regex.search('^(foobar){e<=1}$', 'oobar').span(0, 1), ((0, 5), (0, 5))) self.assertEqual(regex.search('^(foobar){e<=1}$', 'fobar').span(0, 1), ((0, 5), (0, 5))) self.assertEqual(regex.search('^(foobar){e<=1}$', 'fooba').span(0, 1), ((0, 5), (0, 5))) self.assertEqual(regex.search('^(foobar){e<=1}$', 'xfoobarx'), None) self.assertEqual(regex.search('^(foobar){e<=1}$', 'foobarxx'), None) self.assertEqual(regex.search('^(foobar){e<=1}$', 'xxfoobar'), None) self.assertEqual(regex.search('^(foobar){e<=1}$', 'xfoxbar'), None) self.assertEqual(regex.search('^(foobar){e<=1}$', 'foxbarx'), None) # At most one insert, two deletes, and three substitutions. # Additionally, deletes cost two and substitutes one, and total # cost must be less than 4. self.assertEqual(regex.search('(foobar){i<=1,d<=2,s<=3,2d+1s<4}', '3oifaowefbaoraofuiebofasebfaobfaorfeoaro').span(0, 1), ((6, 13), (6, 13))) self.assertEqual(regex.search('(?b)(foobar){i<=1,d<=2,s<=3,2d+1s<4}', '3oifaowefbaoraofuiebofasebfaobfaorfeoaro').span(0, 1), ((34, 39), (34, 39))) # Partially fuzzy matches. 
        self.assertEqual(regex.search('foo(bar){e<=1}zap', 'foobarzap').span(0, 1), ((0, 9), (3, 6)))
        self.assertEqual(regex.search('foo(bar){e<=1}zap', 'fobarzap'), None)
        self.assertEqual(regex.search('foo(bar){e<=1}zap', 'foobrzap').span(0, 1), ((0, 8), (3, 5)))

        text = ('www.cnn.com 64.236.16.20\nwww.slashdot.org 66.35.250.150\n'
          'For useful information, use www.slashdot.org\nthis is demo data!\n')
        self.assertEqual(regex.search(r'(?s)^.*(dot.org){e}.*$', text).span(0, 1), ((0, 120), (120, 120)))
        self.assertEqual(regex.search(r'(?es)^.*(dot.org){e}.*$', text).span(0, 1), ((0, 120), (93, 100)))
        self.assertEqual(regex.search(r'^.*(dot.org){e}.*$', text).span(0, 1), ((0, 119), (24, 101)))

        # Behaviour is unexpected, but arguably not wrong. It first finds the
        # best match, then the best in what follows, etc.
        self.assertEqual(regex.findall(r"\b\L<words>{e<=1}\b", " book cot dog desk ", words="cat dog".split()), ["cot", "dog"])
        self.assertEqual(regex.findall(r"\b\L<words>{e<=1}\b", " book dog cot desk ", words="cat dog".split()), [" dog", "cot"])
        self.assertEqual(regex.findall(r"(?e)\b\L<words>{e<=1}\b", " book dog cot desk ", words="cat dog".split()), ["dog", "cot"])
        self.assertEqual(regex.findall(r"(?r)\b\L<words>{e<=1}\b", " book cot dog desk ", words="cat dog".split()), ["dog ", "cot"])
        self.assertEqual(regex.findall(r"(?er)\b\L<words>{e<=1}\b", " book cot dog desk ", words="cat dog".split()), ["dog", "cot"])
        self.assertEqual(regex.findall(r"(?r)\b\L<words>{e<=1}\b", " book dog cot desk ", words="cat dog".split()), ["cot", "dog"])
        self.assertEqual(regex.findall(br"\b\L<words>{e<=1}\b", b" book cot dog desk ", words=b"cat dog".split()), [b"cot", b"dog"])
        self.assertEqual(regex.findall(br"\b\L<words>{e<=1}\b", b" book dog cot desk ", words=b"cat dog".split()), [b" dog", b"cot"])
        self.assertEqual(regex.findall(br"(?e)\b\L<words>{e<=1}\b", b" book dog cot desk ", words=b"cat dog".split()), [b"dog", b"cot"])
        self.assertEqual(regex.findall(br"(?r)\b\L<words>{e<=1}\b", b" book cot dog desk ", words=b"cat dog".split()), [b"dog ", b"cot"])
        self.assertEqual(regex.findall(br"(?er)\b\L<words>{e<=1}\b", b" book cot dog desk ", words=b"cat dog".split()), [b"dog", b"cot"])
        self.assertEqual(regex.findall(br"(?r)\b\L<words>{e<=1}\b", b" book dog cot desk ", words=b"cat dog".split()), [b"cot", b"dog"])

        self.assertEqual(regex.search(r"(\w+) (\1{e<=1})", "foo fou").groups(), ("foo", "fou"))
        self.assertEqual(regex.search(r"(?r)(\2{e<=1}) (\w+)", "foo fou").groups(), ("foo", "fou"))
        self.assertEqual(regex.search(br"(\w+) (\1{e<=1})", b"foo fou").groups(), (b"foo", b"fou"))

        self.assertEqual(regex.findall(r"(?:(?:QR)+){e}","abcde"), ["abcde", ""])
        self.assertEqual(regex.findall(r"(?:Q+){e}","abc"), ["abc", ""])

        # Hg issue 41.
        self.assertEqual(regex.match(r"(?:service detection){0[^()]+)|(?R))*\)", "(ab(cd)ef)")[ : ], ("(ab(cd)ef)", "ef"))
        self.assertEqual(regex.search(r"\(((?>[^()]+)|(?R))*\)", "(ab(cd)ef)").captures(1), ["ab", "cd", "(cd)", "ef"])

        self.assertEqual(regex.search(r"(?r)\(((?R)|(?>[^()]+))*\)", "(ab(cd)ef)")[ : ], ("(ab(cd)ef)", "ab"))
        self.assertEqual(regex.search(r"(?r)\(((?R)|(?>[^()]+))*\)", "(ab(cd)ef)").captures(1), ["ef", "cd", "(cd)", "ab"])

        self.assertEqual(regex.search(r"\(([^()]+|(?R))*\)", "some text (a(b(c)d)e) more text")[ : ], ("(a(b(c)d)e)", "e"))

        self.assertEqual(regex.search(r"(?r)\(((?R)|[^()]+)*\)", "some text (a(b(c)d)e) more text")[ : ], ("(a(b(c)d)e)", "a"))

        self.assertEqual(regex.search(r"(foo(\(((?:(?>[^()]+)|(?2))*)\)))", "foo(bar(baz)+baz(bop))")[ : ], ("foo(bar(baz)+baz(bop))", "foo(bar(baz)+baz(bop))", "(bar(baz)+baz(bop))", "bar(baz)+baz(bop)"))

        self.assertEqual(regex.search(r"(?r)(foo(\(((?:(?2)|(?>[^()]+))*)\)))", "foo(bar(baz)+baz(bop))")[ : ], ("foo(bar(baz)+baz(bop))", "foo(bar(baz)+baz(bop))", "(bar(baz)+baz(bop))", "bar(baz)+baz(bop)"))

        rgx = regex.compile(r"""^\s*(<\s*([a-zA-Z:]+)(?:\s*[a-zA-Z:]*\s*=\s*(?:'[^']*'|"[^"]*"))*\s*(/\s*)?>(?:[^<>]*|(?1))*(?(3)|<\s*/\s*\2\s*>))\s*$""")
        self.assertEqual(bool(rgx.search('')), True)
        self.assertEqual(bool(rgx.search('')), False)
self.assertEqual(bool(rgx.search('')), True) self.assertEqual(bool(rgx.search('')), False) self.assertEqual(bool(rgx.search('')), False) self.assertEqual(bool(rgx.search('')), False) self.assertEqual(bool(rgx.search('')), True) self.assertEqual(bool(rgx.search('< fooo / >')), True) # The next regex should and does match. Perl 5.14 agrees. #self.assertEqual(bool(rgx.search('foo')), False) self.assertEqual(bool(rgx.search('foo')), False) self.assertEqual(bool(rgx.search('foo')), True) self.assertEqual(bool(rgx.search('foo')), True) self.assertEqual(bool(rgx.search('')), True) def test_copy(self): # PatternObjects are immutable, therefore there's no need to clone them. r = regex.compile("a") self.assert_(copy.copy(r) is r) self.assert_(copy.deepcopy(r) is r) # MatchObjects are normally mutable because the target string can be # detached. However, after the target string has been detached, a # MatchObject becomes immutable, so there's no need to clone it. m = r.match("a") self.assert_(copy.copy(m) is not m) self.assert_(copy.deepcopy(m) is not m) self.assert_(m.string is not None) m2 = copy.copy(m) m2.detach_string() self.assert_(m.string is not None) self.assert_(m2.string is None) # The following behaviour matches that of the re module. it = regex.finditer(".", "ab") it2 = copy.copy(it) self.assertEqual(next(it).group(), "a") self.assertEqual(next(it2).group(), "b") # The following behaviour matches that of the re module. it = regex.finditer(".", "ab") it2 = copy.deepcopy(it) self.assertEqual(next(it).group(), "a") self.assertEqual(next(it2).group(), "b") # The following behaviour is designed to match that of copying 'finditer'. it = regex.splititer(" ", "a b") it2 = copy.copy(it) self.assertEqual(next(it), "a") self.assertEqual(next(it2), "b") # The following behaviour is designed to match that of copying 'finditer'. 
        it = regex.splititer(" ", "a b")
        it2 = copy.deepcopy(it)
        self.assertEqual(next(it), "a")
        self.assertEqual(next(it2), "b")

    def test_format(self):
        self.assertEqual(regex.subf(r"(\w+) (\w+)", "{0} => {2} {1}", "foo bar"), "foo bar => bar foo")
        self.assertEqual(regex.subf(r"(?<word1>\w+) (?<word2>\w+)", "{word2} {word1}", "foo bar"), "bar foo")

        self.assertEqual(regex.subfn(r"(\w+) (\w+)", "{0} => {2} {1}", "foo bar"), ("foo bar => bar foo", 1))
        self.assertEqual(regex.subfn(r"(?<word1>\w+) (?<word2>\w+)", "{word2} {word1}", "foo bar"), ("bar foo", 1))

        self.assertEqual(regex.match(r"(\w+) (\w+)", "foo bar").expandf("{0} => {2} {1}"), "foo bar => bar foo")

    def test_fullmatch(self):
        self.assertEqual(bool(regex.fullmatch(r"abc", "abc")), True)
        self.assertEqual(bool(regex.fullmatch(r"abc", "abcx")), False)
        self.assertEqual(bool(regex.fullmatch(r"abc", "abcx", endpos=3)), True)

        self.assertEqual(bool(regex.fullmatch(r"abc", "xabc", pos=1)), True)
        self.assertEqual(bool(regex.fullmatch(r"abc", "xabcy", pos=1)), False)
        self.assertEqual(bool(regex.fullmatch(r"abc", "xabcy", pos=1, endpos=4)), True)

        self.assertEqual(bool(regex.fullmatch(r"(?r)abc", "abc")), True)
        self.assertEqual(bool(regex.fullmatch(r"(?r)abc", "abcx")), False)
        self.assertEqual(bool(regex.fullmatch(r"(?r)abc", "abcx", endpos=3)), True)

        self.assertEqual(bool(regex.fullmatch(r"(?r)abc", "xabc", pos=1)), True)
        self.assertEqual(bool(regex.fullmatch(r"(?r)abc", "xabcy", pos=1)), False)
        self.assertEqual(bool(regex.fullmatch(r"(?r)abc", "xabcy", pos=1, endpos=4)), True)

    def test_issue_18468(self):
        # Applies only after Python 3.4 for compatibility with re.
if (sys.version_info.major, sys.version_info.minor) < (3, 4): return self.assertTypedEqual(regex.sub('y', 'a', 'xyz'), 'xaz') self.assertTypedEqual(regex.sub('y', StrSubclass('a'), StrSubclass('xyz')), 'xaz') self.assertTypedEqual(regex.sub(b'y', b'a', b'xyz'), b'xaz') self.assertTypedEqual(regex.sub(b'y', BytesSubclass(b'a'), BytesSubclass(b'xyz')), b'xaz') self.assertTypedEqual(regex.sub(b'y', bytearray(b'a'), bytearray(b'xyz')), b'xaz') self.assertTypedEqual(regex.sub(b'y', memoryview(b'a'), memoryview(b'xyz')), b'xaz') for string in ":a:b::c", StrSubclass(":a:b::c"): self.assertTypedEqual(regex.split(":", string), ['', 'a', 'b', '', 'c']) self.assertTypedEqual(regex.split(":*", string), ['', 'a', 'b', 'c']) self.assertTypedEqual(regex.split("(:*)", string), ['', ':', 'a', ':', 'b', '::', 'c']) for string in (b":a:b::c", BytesSubclass(b":a:b::c"), bytearray(b":a:b::c"), memoryview(b":a:b::c")): self.assertTypedEqual(regex.split(b":", string), [b'', b'a', b'b', b'', b'c']) self.assertTypedEqual(regex.split(b":*", string), [b'', b'a', b'b', b'c']) self.assertTypedEqual(regex.split(b"(:*)", string), [b'', b':', b'a', b':', b'b', b'::', b'c']) for string in "a:b::c:::d", StrSubclass("a:b::c:::d"): self.assertTypedEqual(regex.findall(":+", string), [":", "::", ":::"]) self.assertTypedEqual(regex.findall("(:+)", string), [":", "::", ":::"]) self.assertTypedEqual(regex.findall("(:)(:*)", string), [(":", ""), (":", ":"), (":", "::")]) for string in (b"a:b::c:::d", BytesSubclass(b"a:b::c:::d"), bytearray(b"a:b::c:::d"), memoryview(b"a:b::c:::d")): self.assertTypedEqual(regex.findall(b":+", string), [b":", b"::", b":::"]) self.assertTypedEqual(regex.findall(b"(:+)", string), [b":", b"::", b":::"]) self.assertTypedEqual(regex.findall(b"(:)(:*)", string), [(b":", b""), (b":", b":"), (b":", b"::")]) for string in 'a', StrSubclass('a'): self.assertEqual(regex.match('a', string).groups(), ()) self.assertEqual(regex.match('(a)', string).groups(), ('a',)) 
self.assertEqual(regex.match('(a)', string).group(0), 'a') self.assertEqual(regex.match('(a)', string).group(1), 'a') self.assertEqual(regex.match('(a)', string).group(1, 1), ('a', 'a')) for string in (b'a', BytesSubclass(b'a'), bytearray(b'a'), memoryview(b'a')): self.assertEqual(regex.match(b'a', string).groups(), ()) self.assertEqual(regex.match(b'(a)', string).groups(), (b'a',)) self.assertEqual(regex.match(b'(a)', string).group(0), b'a') self.assertEqual(regex.match(b'(a)', string).group(1), b'a') self.assertEqual(regex.match(b'(a)', string).group(1, 1), (b'a', b'a')) def test_partial(self): self.assertEqual(regex.match('ab', 'a', partial=True).partial, True) self.assertEqual(regex.match('ab', 'a', partial=True).span(), (0, 1)) self.assertEqual(regex.match(r'cats', 'cat', partial=True).partial, True) self.assertEqual(regex.match(r'cats', 'cat', partial=True).span(), (0, 3)) self.assertEqual(regex.match(r'cats', 'catch', partial=True), None) self.assertEqual(regex.match(r'abc\w{3}', 'abcdef', partial=True).partial, False) self.assertEqual(regex.match(r'abc\w{3}', 'abcdef', partial=True).span(), (0, 6)) self.assertEqual(regex.match(r'abc\w{3}', 'abcde', partial=True).partial, True) self.assertEqual(regex.match(r'abc\w{3}', 'abcde', partial=True).span(), (0, 5)) self.assertEqual(regex.match(r'\d{4}$', '1234', partial=True).partial, False) self.assertEqual(regex.match(r'\L', 'post', partial=True, words=['post']).partial, False) self.assertEqual(regex.match(r'\L', 'post', partial=True, words=['post']).span(), (0, 4)) self.assertEqual(regex.match(r'\L', 'pos', partial=True, words=['post']).partial, True) self.assertEqual(regex.match(r'\L', 'pos', partial=True, words=['post']).span(), (0, 3)) self.assertEqual(regex.match(r'(?fi)\L', 'POST', partial=True, words=['po\uFB06']).partial, False) self.assertEqual(regex.match(r'(?fi)\L', 'POST', partial=True, words=['po\uFB06']).span(), (0, 4)) self.assertEqual(regex.match(r'(?fi)\L', 'POS', partial=True, 
words=['po\uFB06']).partial, True) self.assertEqual(regex.match(r'(?fi)\L', 'POS', partial=True, words=['po\uFB06']).span(), (0, 3)) self.assertEqual(regex.match(r'(?fi)\L', 'po\uFB06', partial=True, words=['POS']), None) self.assertEqual(regex.match(r'[a-z]*4R$', 'a', partial=True).span(), (0, 1)) self.assertEqual(regex.match(r'[a-z]*4R$', 'ab', partial=True).span(), (0, 2)) self.assertEqual(regex.match(r'[a-z]*4R$', 'ab4', partial=True).span(), (0, 3)) self.assertEqual(regex.match(r'[a-z]*4R$', 'a4', partial=True).span(), (0, 2)) self.assertEqual(regex.match(r'[a-z]*4R$', 'a4R', partial=True).span(), (0, 3)) self.assertEqual(regex.match(r'[a-z]*4R$', '4a', partial=True), None) self.assertEqual(regex.match(r'[a-z]*4R$', 'a44', partial=True), None) def test_hg_bugs(self): # Hg issue 28. self.assertEqual(bool(regex.compile("(?>b)", flags=regex.V1)), True) # Hg issue 29. self.assertEqual(bool(regex.compile(r"^((?>\w+)|(?>\s+))*$", flags=regex.V1)), True) # Hg issue 31. self.assertEqual(regex.findall(r"\((?:(?>[^()]+)|(?R))*\)", "a(bcd(e)f)g(h)"), ['(bcd(e)f)', '(h)']) self.assertEqual(regex.findall(r"\((?:(?:[^()]+)|(?R))*\)", "a(bcd(e)f)g(h)"), ['(bcd(e)f)', '(h)']) self.assertEqual(regex.findall(r"\((?:(?>[^()]+)|(?R))*\)", "a(b(cd)e)f)g)h"), ['(b(cd)e)']) self.assertEqual(regex.findall(r"\((?:(?>[^()]+)|(?R))*\)", "a(bc(d(e)f)gh"), ['(d(e)f)']) self.assertEqual(regex.findall(r"(?r)\((?:(?>[^()]+)|(?R))*\)", "a(bc(d(e)f)gh"), ['(d(e)f)']) self.assertEqual([m.group() for m in regex.finditer(r"\((?:[^()]*+|(?0))*\)", "a(b(c(de)fg)h")], ['(c(de)fg)']) # Hg issue 32. self.assertEqual(regex.search("a(bc)d", "abcd", regex.I | regex.V1).group(0), "abcd") # Hg issue 33. self.assertEqual(regex.search("([\da-f:]+)$", "E", regex.I | regex.V1).group(0), "E") self.assertEqual(regex.search("([\da-f:]+)$", "e", regex.I | regex.V1).group(0), "e") # Hg issue 34. self.assertEqual(regex.search("^(?=ab(de))(abd)(e)", "abde").groups(), ('de', 'abd', 'e')) # Hg issue 35. 
self.assertEqual(bool(regex.match(r"\ ", " ", flags=regex.X)), True) # Hg issue 36. self.assertEqual(regex.search(r"^(a|)\1{2}b", "b").group(0, 1), ('b', '')) # Hg issue 37. self.assertEqual(regex.search("^(a){0,0}", "abc").group(0, 1), ('', None)) # Hg issue 38. self.assertEqual(regex.search("(?>.*/)b", "a/b").group(0), "a/b") # Hg issue 39. self.assertEqual(regex.search(r"(?V0)((?i)blah)\s+\1", "blah BLAH").group(0, 1), ("blah BLAH", "blah")) self.assertEqual(regex.search(r"(?V1)((?i)blah)\s+\1", "blah BLAH"), None) # Hg issue 40. self.assertEqual(regex.search(r"(\()?[^()]+(?(1)\)|)", "(abcd").group(0), "abcd") # Hg issue 42. self.assertEqual(regex.search("(a*)*", "a").span(1), (1, 1)) self.assertEqual(regex.search("(a*)*", "aa").span(1), (2, 2)) self.assertEqual(regex.search("(a*)*", "aaa").span(1), (3, 3)) # Hg issue 43. self.assertEqual(regex.search("a(?#xxx)*", "aaa").group(), "aaa") # Hg issue 44. self.assertEqual(regex.search("(?=abc){3}abc", "abcabcabc").span(), (0, 3)) # Hg issue 45. self.assertEqual(regex.search("^(?:a(?:(?:))+)+", "a").span(), (0, 1)) self.assertEqual(regex.search("^(?:a(?:(?:))+)+", "aa").span(), (0, 2)) # Hg issue 46. self.assertEqual(regex.search("a(?x: b c )d", "abcd").group(0), "abcd") # Hg issue 47. self.assertEqual(regex.search("a#comment\n*", "aaa", flags=regex.X).group(0), "aaa") # Hg issue 48. self.assertEqual(regex.search(r"(?V1)(a(?(1)\1)){1}", "aaaaaaaaaa").span(0, 1), ((0, 1), (0, 1))) self.assertEqual(regex.search(r"(?V1)(a(?(1)\1)){2}", "aaaaaaaaaa").span(0, 1), ((0, 3), (1, 3))) self.assertEqual(regex.search(r"(?V1)(a(?(1)\1)){3}", "aaaaaaaaaa").span(0, 1), ((0, 6), (3, 6))) self.assertEqual(regex.search(r"(?V1)(a(?(1)\1)){4}", "aaaaaaaaaa").span(0, 1), ((0, 10), (6, 10))) # Hg issue 49. self.assertEqual(regex.search("(?V1)(a)(?<=b(?1))", "baz").group(0), "a") # Hg issue 50. 
self.assertEqual(regex.findall(r'(?fi)\L', 'POST, Post, post, po\u017Ft, po\uFB06, and po\uFB05', keywords=['post','pos']), ['POST', 'Post', 'post', 'po\u017Ft', 'po\uFB06', 'po\uFB05']) self.assertEqual(regex.findall(r'(?fi)pos|post', 'POST, Post, post, po\u017Ft, po\uFB06, and po\uFB05'), ['POS', 'Pos', 'pos', 'po\u017F', 'po\uFB06', 'po\uFB05']) self.assertEqual(regex.findall(r'(?fi)post|pos', 'POST, Post, post, po\u017Ft, po\uFB06, and po\uFB05'), ['POST', 'Post', 'post', 'po\u017Ft', 'po\uFB06', 'po\uFB05']) self.assertEqual(regex.findall(r'(?fi)post|another', 'POST, Post, post, po\u017Ft, po\uFB06, and po\uFB05'), ['POST', 'Post', 'post', 'po\u017Ft', 'po\uFB06', 'po\uFB05']) # Hg issue 51. self.assertEqual(regex.search("(?V1)((a)(?1)|(?2))", "a").group(0, 1, 2), ('a', 'a', None)) # Hg issue 52. self.assertEqual(regex.search(r"(?V1)(\1xx|){6}", "xx").span(0, 1), ((0, 2), (2, 2))) # Hg issue 53. self.assertEqual(regex.search("(a|)+", "a").group(0, 1), ("a", "")) # Hg issue 54. self.assertEqual(regex.search(r"(a|)*\d", "a" * 80), None) # Hg issue 55. self.assertEqual(regex.search("^(?:a?b?)*$", "ac"), None) # Hg issue 58. self.assertRaisesRegex(regex.error, self.UNDEF_CHAR_NAME, lambda: regex.compile("\\N{1}")) # Hg issue 59. self.assertEqual(regex.search("\\Z", "a\na\n").span(0), (4, 4)) # Hg issue 60. self.assertEqual(regex.search("(q1|.)*(q2|.)*(x(a|bc)*y){2,}", "xayxay").group(0), "xayxay") # Hg issue 61. self.assertEqual(regex.search("(?i)[^a]", "A"), None) # Hg issue 63. self.assertEqual(regex.search("(?i)[[:ascii:]]", "\N{KELVIN SIGN}"), None) # Hg issue 66. self.assertEqual(regex.search("((a|b(?1)c){3,5})", "baaaaca").group(0, 1, 2), ('aaaa', 'aaaa', 'a')) # Hg issue 71. self.assertEqual(regex.findall(r"(?<=:\S+ )\w+", ":9 abc :10 def"), ['abc', 'def']) self.assertEqual(regex.findall(r"(?<=:\S* )\w+", ":9 abc :10 def"), ['abc', 'def']) self.assertEqual(regex.findall(r"(?<=:\S+? 
)\w+", ":9 abc :10 def"), ['abc', 'def']) self.assertEqual(regex.findall(r"(?<=:\S*? )\w+", ":9 abc :10 def"), ['abc', 'def']) # Hg issue 73. self.assertEqual(regex.search(r"(?:fe)?male", "female").group(), "female") self.assertEqual([m.group() for m in regex.finditer(r"(fe)?male: h(?(1)(er)|(is)) (\w+)", "female: her dog; male: his cat. asdsasda")], ['female: her dog', 'male: his cat']) # Hg issue 78. self.assertEqual(regex.search(r'(?\((?:[^()]++|(?&rec))*\))', 'aaa(((1+0)+1)+1)bbb').captures('rec'), ['(1+0)', '((1+0)+1)', '(((1+0)+1)+1)']) # Hg issue 80. self.assertRaisesRegex(regex.error, self.BAD_ESCAPE, lambda: regex.sub('x', '\\', 'x'), ) # Hg issue 82. fz = "(CAGCCTCCCATTTCAGAATATACATCC){1a(?b))', "ab").spans("x"), [(1, 2), (0, 2)]) # Hg issue 91. # Check that the replacement cache works. self.assertEqual(regex.sub(r'(-)', lambda m: m.expand(r'x'), 'a-b-c'), 'axbxc') # Hg issue 94. rx = regex.compile(r'\bt(est){i<2}', flags=regex.V1) self.assertEqual(rx.search("Some text"), None) self.assertEqual(rx.findall("Some text"), []) # Hg issue 95. self.assertRaisesRegex(regex.error, self.MULTIPLE_REPEAT, lambda: regex.compile(r'.???')) # Hg issue 97. self.assertEqual(regex.escape('foo!?'), 'foo\\!\\?') self.assertEqual(regex.escape('foo!?', special_only=True), 'foo!\\?') self.assertEqual(regex.escape(b'foo!?'), b'foo\\!\\?') self.assertEqual(regex.escape(b'foo!?', special_only=True), b'foo!\\?') # Hg issue 100. self.assertEqual(regex.search('^([^z]*(?:WWWi|W))?$', 'WWWi').groups(), ('WWWi', )) self.assertEqual(regex.search('^([^z]*(?:WWWi|w))?$', 'WWWi').groups(), ('WWWi', )) self.assertEqual(regex.search('^([^z]*?(?:WWWi|W))?$', 'WWWi').groups(), ('WWWi', )) # Hg issue 101. 
pat = regex.compile(r'xxx', flags=regex.FULLCASE | regex.UNICODE) self.assertEqual([x.group() for x in pat.finditer('yxxx')], ['xxx']) self.assertEqual(pat.findall('yxxx'), ['xxx']) raw = 'yxxx' self.assertEqual([x.group() for x in pat.finditer(raw)], ['xxx']) self.assertEqual(pat.findall(raw), ['xxx']) pat = regex.compile(r'xxx', flags=regex.FULLCASE | regex.IGNORECASE | regex.UNICODE) self.assertEqual([x.group() for x in pat.finditer('yxxx')], ['xxx']) self.assertEqual(pat.findall('yxxx'), ['xxx']) raw = 'yxxx' self.assertEqual([x.group() for x in pat.finditer(raw)], ['xxx']) self.assertEqual(pat.findall(raw), ['xxx']) # Hg issue 106. self.assertEqual(regex.sub('(?V0).*', 'x', 'test'), 'x') self.assertEqual(regex.sub('(?V1).*', 'x', 'test'), 'xx') self.assertEqual(regex.sub('(?V0).*?', '|', 'test'), '|t|e|s|t|') self.assertEqual(regex.sub('(?V1).*?', '|', 'test'), '|||||||||') # Hg issue 112. self.assertEqual(regex.sub(r'^(@)\n(?!.*?@)(.*)', r'\1\n==========\n\2', '@\n', flags=regex.DOTALL), '@\n==========\n') # Hg issue 109. 
self.assertEqual(regex.match(r'(?:cats|cat){e<=1}', 'caz').fuzzy_counts, (1, 0, 0)) self.assertEqual(regex.match(r'(?e)(?:cats|cat){e<=1}', 'caz').fuzzy_counts, (1, 0, 0)) self.assertEqual(regex.match(r'(?b)(?:cats|cat){e<=1}', 'caz').fuzzy_counts, (1, 0, 0)) self.assertEqual(regex.match(r'(?:cat){e<=1}', 'caz').fuzzy_counts, (1, 0, 0)) self.assertEqual(regex.match(r'(?e)(?:cat){e<=1}', 'caz').fuzzy_counts, (1, 0, 0)) self.assertEqual(regex.match(r'(?b)(?:cat){e<=1}', 'caz').fuzzy_counts, (1, 0, 0)) self.assertEqual(regex.match(r'(?:cats){e<=2}', 'c ats').fuzzy_counts, (1, 1, 0)) self.assertEqual(regex.match(r'(?e)(?:cats){e<=2}', 'c ats').fuzzy_counts, (0, 1, 0)) self.assertEqual(regex.match(r'(?b)(?:cats){e<=2}', 'c ats').fuzzy_counts, (0, 1, 0)) self.assertEqual(regex.match(r'(?:cats){e<=2}', 'c a ts').fuzzy_counts, (0, 2, 0)) self.assertEqual(regex.match(r'(?e)(?:cats){e<=2}', 'c a ts').fuzzy_counts, (0, 2, 0)) self.assertEqual(regex.match(r'(?b)(?:cats){e<=2}', 'c a ts').fuzzy_counts, (0, 2, 0)) self.assertEqual(regex.match(r'(?:cats){e<=1}', 'c ats').fuzzy_counts, (0, 1, 0)) self.assertEqual(regex.match(r'(?e)(?:cats){e<=1}', 'c ats').fuzzy_counts, (0, 1, 0)) self.assertEqual(regex.match(r'(?b)(?:cats){e<=1}', 'c ats').fuzzy_counts, (0, 1, 0)) # Hg issue 115. self.assertEqual(regex.findall(r'\bof ([a-z]+) of \1\b', 'To make use of one of these modules'), []) # Hg issue 125. self.assertEqual(regex.sub(r'x', r'\g<0>', 'x'), 'x') # Unreported issue: no such builtin as 'ascii' in Python 2. self.assertEqual(bool(regex.match(r'a', 'a', regex.DEBUG)), True) # Hg issue 131. self.assertEqual(regex.findall(r'(?V1)[[b-e]--cd]', 'abcdef'), ['b', 'e']) self.assertEqual(regex.findall(r'(?V1)[b-e--cd]', 'abcdef'), ['b', 'e']) self.assertEqual(regex.findall(r'(?V1)[[bcde]--cd]', 'abcdef'), ['b', 'e']) self.assertEqual(regex.findall(r'(?V1)[bcde--cd]', 'abcdef'), ['b', 'e']) # Hg issue 132. 
self.assertRaisesRegex(regex.error, '^unknown property at position 4$', lambda: regex.compile(r'\p{}')) # Issue 23692. self.assertEqual(regex.match('(?:()|(?(1)()|z)){2}(?(2)a|z)', 'a').group(0, 1, 2), ('a', '', '')) self.assertEqual(regex.match('(?:()|(?(1)()|z)){0,2}(?(2)a|z)', 'a').group(0, 1, 2), ('a', '', '')) # Hg issue 137: Posix character class :punct: does not seem to be # supported. # Posix compatibility as recommended here: # http://www.unicode.org/reports/tr18/#Compatibility_Properties # Posix in Unicode. chars = ''.join(chr(c) for c in range(0x10000)) self.assertEqual(ascii(''.join(regex.findall(r'''[[:alnum:]]+''', chars))), ascii(''.join(regex.findall(r'''[\p{Alpha}\p{PosixDigit}]+''', chars)))) self.assertEqual(ascii(''.join(regex.findall(r'''[[:alpha:]]+''', chars))), ascii(''.join(regex.findall(r'''\p{Alpha}+''', chars)))) self.assertEqual(ascii(''.join(regex.findall(r'''[[:ascii:]]+''', chars))), ascii(''.join(regex.findall(r'''[\p{InBasicLatin}]+''', chars)))) self.assertEqual(ascii(''.join(regex.findall(r'''[[:blank:]]+''', chars))), ascii(''.join(regex.findall(r'''[\p{gc=Space_Separator}\t]+''', chars)))) self.assertEqual(ascii(''.join(regex.findall(r'''[[:cntrl:]]+''', chars))), ascii(''.join(regex.findall(r'''\p{gc=Control}+''', chars)))) self.assertEqual(ascii(''.join(regex.findall(r'''[[:digit:]]+''', chars))), ascii(''.join(regex.findall(r'''[0-9]+''', chars)))) self.assertEqual(ascii(''.join(regex.findall(r'''[[:graph:]]+''', chars))), ascii(''.join(regex.findall(r'''[^\p{Space}\p{gc=Control}\p{gc=Surrogate}\p{gc=Unassigned}]+''', chars)))) self.assertEqual(ascii(''.join(regex.findall(r'''[[:lower:]]+''', chars))), ascii(''.join(regex.findall(r'''\p{Lower}+''', chars)))) self.assertEqual(ascii(''.join(regex.findall(r'''[[:print:]]+''', chars))), ascii(''.join(regex.findall(r'''(?V1)[\p{Graph}\p{Blank}--\p{Cntrl}]+''', chars)))) self.assertEqual(ascii(''.join(regex.findall(r'''[[:punct:]]+''', chars))), 
ascii(''.join(regex.findall(r'''(?V1)[\p{gc=Punctuation}\p{gc=Symbol}--\p{Alpha}]+''', chars)))) self.assertEqual(ascii(''.join(regex.findall(r'''[[:space:]]+''', chars))), ascii(''.join(regex.findall(r'''\p{Whitespace}+''', chars)))) self.assertEqual(ascii(''.join(regex.findall(r'''[[:upper:]]+''', chars))), ascii(''.join(regex.findall(r'''\p{Upper}+''', chars)))) self.assertEqual(ascii(''.join(regex.findall(r'''[[:word:]]+''', chars))), ascii(''.join(regex.findall(r'''[\p{Alpha}\p{gc=Mark}\p{Digit}\p{gc=Connector_Punctuation}\p{Join_Control}]+''', chars)))) self.assertEqual(ascii(''.join(regex.findall(r'''[[:xdigit:]]+''', chars))), ascii(''.join(regex.findall(r'''[0-9A-Fa-f]+''', chars)))) # Posix in ASCII. chars = bytes(range(0x100)) self.assertEqual(ascii(b''.join(regex.findall(br'''(?a)[[:alnum:]]+''', chars))), ascii(b''.join(regex.findall(br'''(?a)[\p{Alpha}\p{PosixDigit}]+''', chars)))) self.assertEqual(ascii(b''.join(regex.findall(br'''(?a)[[:alpha:]]+''', chars))), ascii(b''.join(regex.findall(br'''(?a)\p{Alpha}+''', chars)))) self.assertEqual(ascii(b''.join(regex.findall(br'''(?a)[[:ascii:]]+''', chars))), ascii(b''.join(regex.findall(br'''(?a)[\x00-\x7F]+''', chars)))) self.assertEqual(ascii(b''.join(regex.findall(br'''(?a)[[:blank:]]+''', chars))), ascii(b''.join(regex.findall(br'''(?a)[\p{gc=Space_Separator}\t]+''', chars)))) self.assertEqual(ascii(b''.join(regex.findall(br'''(?a)[[:cntrl:]]+''', chars))), ascii(b''.join(regex.findall(br'''(?a)\p{gc=Control}+''', chars)))) self.assertEqual(ascii(b''.join(regex.findall(br'''(?a)[[:digit:]]+''', chars))), ascii(b''.join(regex.findall(br'''(?a)[0-9]+''', chars)))) self.assertEqual(ascii(b''.join(regex.findall(br'''(?a)[[:graph:]]+''', chars))), ascii(b''.join(regex.findall(br'''(?a)[^\p{Space}\p{gc=Control}\p{gc=Surrogate}\p{gc=Unassigned}]+''', chars)))) self.assertEqual(ascii(b''.join(regex.findall(br'''(?a)[[:lower:]]+''', chars))), ascii(b''.join(regex.findall(br'''(?a)\p{Lower}+''', chars)))) 
self.assertEqual(ascii(b''.join(regex.findall(br'''(?a)[[:print:]]+''', chars))), ascii(b''.join(regex.findall(br'''(?aV1)[\p{Graph}\p{Blank}--\p{Cntrl}]+''', chars)))) self.assertEqual(ascii(b''.join(regex.findall(br'''(?a)[[:punct:]]+''', chars))), ascii(b''.join(regex.findall(br'''(?aV1)[\p{gc=Punctuation}\p{gc=Symbol}--\p{Alpha}]+''', chars)))) self.assertEqual(ascii(b''.join(regex.findall(br'''(?a)[[:space:]]+''', chars))), ascii(b''.join(regex.findall(br'''(?a)\p{Whitespace}+''', chars)))) self.assertEqual(ascii(b''.join(regex.findall(br'''(?a)[[:upper:]]+''', chars))), ascii(b''.join(regex.findall(br'''(?a)\p{Upper}+''', chars)))) self.assertEqual(ascii(b''.join(regex.findall(br'''(?a)[[:word:]]+''', chars))), ascii(b''.join(regex.findall(br'''(?a)[\p{Alpha}\p{gc=Mark}\p{Digit}\p{gc=Connector_Punctuation}\p{Join_Control}]+''', chars)))) self.assertEqual(ascii(b''.join(regex.findall(br'''(?a)[[:xdigit:]]+''', chars))), ascii(b''.join(regex.findall(br'''(?a)[0-9A-Fa-f]+''', chars)))) # Hg issue 138: grapheme anchored search not working properly. self.assertEqual(ascii(regex.search(r'\X$', 'ab\u2103').group()), ascii('\u2103')) # Hg issue 139: Regular expression with multiple wildcards where first # should match empty string does not always work. self.assertEqual(regex.search("([^L]*)([^R]*R)", "LtR").groups(), ('', 'LtR')) # Hg issue 140: Replace with REVERSE and groups has unexpected # behavior. self.assertEqual(regex.sub(r'(.)', r'x\1y', 'ab'), 'xayxby') self.assertEqual(regex.sub(r'(?r)(.)', r'x\1y', 'ab'), 'xayxby') self.assertEqual(regex.subf(r'(.)', 'x{1}y', 'ab'), 'xayxby') self.assertEqual(regex.subf(r'(?r)(.)', 'x{1}y', 'ab'), 'xayxby') # Hg issue 141: Crash on a certain partial match. self.assertEqual(regex.fullmatch('(a)*abc', 'ab', partial=True).span(), (0, 2)) self.assertEqual(regex.fullmatch('(a)*abc', 'ab', partial=True).partial, True) # Hg Issue #143: Partial matches have incorrect span if prefix is '.' # wildcard. 
self.assertEqual(regex.search('OXRG', 'OOGOX', partial=True).span(), (3, 5)) self.assertEqual(regex.search('.XRG', 'OOGOX', partial=True).span(), (3, 5)) self.assertEqual(regex.search('.{1,3}XRG', 'OOGOX', partial=True).span(), (1, 5)) # Hg issue 144: Latest version problem with matching 'R|R'. self.assertEqual(regex.match('R|R', 'R').span(), (0, 1)) # Hg issue 146: Forced-fail (?!) works improperly in conditional. self.assertEqual(regex.match(r'(.)(?(1)(?!))', 'xy'), None) # Groups cleared after failure. self.assertEqual(regex.findall(r'(y)?(\d)(?(1)\b\B)', 'ax1y2z3b'), [('', '1'), ('', '2'), ('', '3')]) self.assertEqual(regex.findall(r'(y)?+(\d)(?(1)\b\B)', 'ax1y2z3b'), [('', '1'), ('', '2'), ('', '3')]) # Hg issue 147: Fuzzy match can return match points beyond buffer end. self.assertEqual([m.span() for m in regex.finditer(r'(?i)(?:error){e}', 'regex failure')], [(0, 5), (5, 10), (10, 13), (13, 13)]) self.assertEqual([m.span() for m in regex.finditer(r'(?fi)(?:error){e}', 'regex failure')], [(0, 5), (5, 10), (10, 13), (13, 13)]) # Hg issue 151: Request: \K. self.assertEqual(regex.search(r'(ab\Kcd)', 'abcd').group(0, 1), ('cd', 'abcd')) self.assertEqual(regex.findall(r'\w\w\K\w\w', 'abcdefgh'), ['cd', 'gh']) self.assertEqual(regex.findall(r'(\w\w\K\w\w)', 'abcdefgh'), ['abcd', 'efgh']) self.assertEqual(regex.search(r'(?r)(ab\Kcd)', 'abcd').group(0, 1), ('ab', 'abcd')) self.assertEqual(regex.findall(r'(?r)\w\w\K\w\w', 'abcdefgh'), ['ef', 'ab']) self.assertEqual(regex.findall(r'(?r)(\w\w\K\w\w)', 'abcdefgh'), ['efgh', 'abcd']) # Hg issue 153: Request: (*SKIP). 
self.assertEqual(regex.search(r'12(*FAIL)|3', '123')[0], '3') self.assertEqual(regex.search(r'(?r)12(*FAIL)|3', '123')[0], '3') self.assertEqual(regex.search(r'\d+(*PRUNE)\d', '123'), None) self.assertEqual(regex.search(r'\d+(?=(*PRUNE))\d', '123')[0], '123') self.assertEqual(regex.search(r'\d+(*PRUNE)bcd|[3d]', '123bcd')[0], '123bcd') self.assertEqual(regex.search(r'\d+(*PRUNE)bcd|[3d]', '123zzd')[0], 'd') self.assertEqual(regex.search(r'\d+?(*PRUNE)bcd|[3d]', '123bcd')[0], '3bcd') self.assertEqual(regex.search(r'\d+?(*PRUNE)bcd|[3d]', '123zzd')[0], 'd') self.assertEqual(regex.search(r'\d++(?<=3(*PRUNE))zzd|[4d]$', '123zzd')[0], '123zzd') self.assertEqual(regex.search(r'\d++(?<=3(*PRUNE))zzd|[4d]$', '124zzd')[0], 'd') self.assertEqual(regex.search(r'\d++(?<=(*PRUNE)3)zzd|[4d]$', '124zzd')[0], 'd') self.assertEqual(regex.search(r'\d++(?<=2(*PRUNE)3)zzd|[3d]$', '124zzd')[0], 'd') self.assertEqual(regex.search(r'(?r)\d(*PRUNE)\d+', '123'), None) self.assertEqual(regex.search(r'(?r)\d(?<=(*PRUNE))\d+', '123')[0], '123') self.assertEqual(regex.search(r'(?r)\d+(*PRUNE)bcd|[3d]', '123bcd')[0], '123bcd') self.assertEqual(regex.search(r'(?r)\d+(*PRUNE)bcd|[3d]', '123zzd')[0], 'd') self.assertEqual(regex.search(r'(?r)\d++(?<=3(*PRUNE))zzd|[4d]$', '123zzd')[0], '123zzd') self.assertEqual(regex.search(r'(?r)\d++(?<=3(*PRUNE))zzd|[4d]$', '124zzd')[0], 'd') self.assertEqual(regex.search(r'(?r)\d++(?<=(*PRUNE)3)zzd|[4d]$', '124zzd')[0], 'd') self.assertEqual(regex.search(r'(?r)\d++(?<=2(*PRUNE)3)zzd|[3d]$', '124zzd')[0], 'd') self.assertEqual(regex.search(r'\d+(*SKIP)bcd|[3d]', '123bcd')[0], '123bcd') self.assertEqual(regex.search(r'\d+(*SKIP)bcd|[3d]', '123zzd')[0], 'd') self.assertEqual(regex.search(r'\d+?(*SKIP)bcd|[3d]', '123bcd')[0], '3bcd') self.assertEqual(regex.search(r'\d+?(*SKIP)bcd|[3d]', '123zzd')[0], 'd') self.assertEqual(regex.search(r'\d++(?<=3(*SKIP))zzd|[4d]$', '123zzd')[0], '123zzd') self.assertEqual(regex.search(r'\d++(?<=3(*SKIP))zzd|[4d]$', '124zzd')[0], 
'd') self.assertEqual(regex.search(r'\d++(?<=(*SKIP)3)zzd|[4d]$', '124zzd')[0], 'd') self.assertEqual(regex.search(r'\d++(?<=2(*SKIP)3)zzd|[3d]$', '124zzd')[0], 'd') self.assertEqual(regex.search(r'(?r)\d+(*SKIP)bcd|[3d]', '123bcd')[0], '123bcd') self.assertEqual(regex.search(r'(?r)\d+(*SKIP)bcd|[3d]', '123zzd')[0], 'd') self.assertEqual(regex.search(r'(?r)\d++(?<=3(*SKIP))zzd|[4d]$', '123zzd')[0], '123zzd') self.assertEqual(regex.search(r'(?r)\d++(?<=3(*SKIP))zzd|[4d]$', '124zzd')[0], 'd') self.assertEqual(regex.search(r'(?r)\d++(?<=(*SKIP)3)zzd|[4d]$', '124zzd')[0], 'd') self.assertEqual(regex.search(r'(?r)\d++(?<=2(*SKIP)3)zzd|[3d]$', '124zzd')[0], 'd') # Hg issue 152: Request: Request: (?(DEFINE)...). self.assertEqual(regex.search(r'(?(DEFINE)(?\d+)(?\w+))(?&quant) (?&item)', '5 elephants')[0], '5 elephants') # Hg issue 150: Have an option for POSIX-compatible longest match of # alternates. self.assertEqual(regex.search(r'(?p)\d+(\w(\d*)?|[eE]([+-]\d+))', '10b12')[0], '10b12') self.assertEqual(regex.search(r'(?p)\d+(\w(\d*)?|[eE]([+-]\d+))', '10E+12')[0], '10E+12') self.assertEqual(regex.search(r'(?p)(\w|ae|oe|ue|ss)', 'ae')[0], 'ae') self.assertEqual(regex.search(r'(?p)one(self)?(selfsufficient)?', 'oneselfsufficient')[0], 'oneselfsufficient') # Hg issue 156: regression on atomic grouping self.assertEqual(regex.match('1(?>2)', '12').span(), (0, 2)) # Hg issue 157: regression: segfault on complex lookaround self.assertEqual(regex.match(r'(?V1w)(?=(?=[^A-Z]*+[A-Z])(?=[^a-z]*+[a-z]))(?=\D*+\d)(?=\p{Alphanumeric}*+\P{Alphanumeric})\A(?s:.){8,255}+\Z', 'AAaa11!!')[0], 'AAaa11!!') # Hg issue 158: Group issue with (?(DEFINE)...) TEST_REGEX = regex.compile(r'''(?smx) (?(DEFINE) (? 
^,[^,]+, ) ) # Group 2 is defined on this line ^,([^,]+), (?:(?!(?&subcat)[\r\n]+(?&subcat)).)+ ''') TEST_DATA = ''' ,Cat 1, ,Brand 1, some thing ,Brand 2, other things ,Cat 2, ,Brand, Some thing ''' self.assertEqual([m.span(1, 2) for m in TEST_REGEX.finditer(TEST_DATA)], [((-1, -1), (2, 7)), ((-1, -1), (54, 59))]) # Hg issue 161: Unexpected fuzzy match results self.assertEqual(regex.search('(abcdefgh){e}', '******abcdefghijklmnopqrtuvwxyz', regex.BESTMATCH).span(), (6, 14)) self.assertEqual(regex.search('(abcdefghi){e}', '******abcdefghijklmnopqrtuvwxyz', regex.BESTMATCH).span(), (6, 15)) # Hg issue 163: allow lookarounds in conditionals. self.assertEqual(regex.match(r'(?:(?=\d)\d+\b|\w+)', '123abc').span(), (0, 6)) self.assertEqual(regex.match(r'(?(?=\d)\d+\b|\w+)', '123abc'), None) self.assertEqual(regex.search(r'(?(?<=love\s)you|(?<=hate\s)her)', "I love you").span(), (7, 10)) self.assertEqual(regex.findall(r'(?(?<=love\s)you|(?<=hate\s)her)', "I love you but I don't hate her either"), ['you', 'her']) # Hg issue #180: bug of POSIX matching. 
self.assertEqual(regex.search(r'(?p)a*(.*?)', 'aaabbb').group(0, 1), ('aaabbb', 'bbb')) self.assertEqual(regex.search(r'(?p)a*(.*)', 'aaabbb').group(0, 1), ('aaabbb', 'bbb')) self.assertEqual(regex.sub(r'(?p)a*(.*?)', r'\1', 'aaabbb'), 'bbb') self.assertEqual(regex.sub(r'(?p)a*(.*)', r'\1', 'aaabbb'), 'bbb') def test_subscripted_captures(self): self.assertEqual(regex.match(r'(?P.)+', 'abc').expandf('{0} {0[0]} {0[-1]}'), 'abc abc abc') self.assertEqual(regex.match(r'(?P.)+', 'abc').expandf('{1} {1[0]} {1[1]} {1[2]} {1[-1]} {1[-2]} {1[-3]}'), 'c a b c c b a') self.assertEqual(regex.match(r'(?P.)+', 'abc').expandf('{x} {x[0]} {x[1]} {x[2]} {x[-1]} {x[-2]} {x[-3]}'), 'c a b c c b a') self.assertEqual(regex.subf(r'(?P.)+', r'{0} {0[0]} {0[-1]}', 'abc'), 'abc abc abc') self.assertEqual(regex.subf(r'(?P.)+', '{1} {1[0]} {1[1]} {1[2]} {1[-1]} {1[-2]} {1[-3]}', 'abc'), 'c a b c c b a') self.assertEqual(regex.subf(r'(?P.)+', '{x} {x[0]} {x[1]} {x[2]} {x[-1]} {x[-2]} {x[-3]}', 'abc'), 'c a b c c b a') if sys.version_info < (3, 2, 0): # In Python 3.1 it's called assertRaisesRegexp. 
    RegexTests.assertRaisesRegex = RegexTests.assertRaisesRegexp

def test_main():
    run_unittest(RegexTests)

if __name__ == "__main__":
    test_main()

regex-2016.01.10/Python3/_regex.c

/* Secret Labs' Regular Expression Engine
 *
 * regular expression matching engine
 *
 * partial history:
 * 1999-10-24 fl created (based on existing template matcher code)
 * 2000-03-06 fl first alpha, sort of
 * 2000-08-01 fl fixes for 1.6b1
 * 2000-08-07 fl use PyOS_CheckStack() if available
 * 2000-09-20 fl added expand method
 * 2001-03-20 fl lots of fixes for 2.1b2
 * 2001-04-15 fl export copyright as Python attribute, not global
 * 2001-04-28 fl added __copy__ methods (work in progress)
 * 2001-05-14 fl fixes for 1.5.2 compatibility
 * 2001-07-01 fl added BIGCHARSET support (from Martin von Loewis)
 * 2001-10-18 fl fixed group reset issue (from Matthew Mueller)
 * 2001-10-20 fl added split primitive; reenable unicode for 1.6/2.0/2.1
 * 2001-10-21 fl added sub/subn primitive
 * 2001-10-24 fl added finditer primitive (for 2.2 only)
 * 2001-12-07 fl fixed memory leak in sub/subn (Guido van Rossum)
 * 2002-11-09 fl fixed empty sub/subn return type
 * 2003-04-18 mvl fully support 4-byte codes
 * 2003-10-17 gn implemented non recursive scheme
 * 2009-07-26 mrab completely re-designed matcher code
 * 2011-11-18 mrab added support for PEP 393 strings
 *
 * Copyright (c) 1997-2001 by Secret Labs AB. All rights reserved.
 *
 * This version of the SRE library can be redistributed under CNRI's
 * Python 1.6 license. For any other use, please contact Secret Labs
 * AB (info@pythonware.com).
 *
 * Portions of this engine have been developed in cooperation with
 * CNRI. Hewlett-Packard provided funding for 1.6 integration and
 * other compatibility work.
 */

/* #define VERBOSE */

#if defined(VERBOSE)
#define TRACE(X) printf X;
#else
#define TRACE(X)
#endif

#include "Python.h"
#include "structmember.h" /* offsetof */
#include
#include "_regex.h"
#include "pyport.h"
#include "pythread.h"

#if PY_VERSION_HEX < 0x03030000
typedef unsigned char Py_UCS1;
typedef unsigned short Py_UCS2;
#endif

typedef RE_UINT32 RE_CODE;

/* Properties in the General Category. */
#define RE_PROP_GC_CN ((RE_PROP_GC << 16) | RE_PROP_CN)
#define RE_PROP_GC_LU ((RE_PROP_GC << 16) | RE_PROP_LU)
#define RE_PROP_GC_LL ((RE_PROP_GC << 16) | RE_PROP_LL)
#define RE_PROP_GC_LT ((RE_PROP_GC << 16) | RE_PROP_LT)
#define RE_PROP_GC_P ((RE_PROP_GC << 16) | RE_PROP_P)

/* Unlimited repeat count. */
#define RE_UNLIMITED (~(RE_CODE)0)

/* The status of a . */
typedef RE_UINT32 RE_STATUS_T;

/* Whether to match concurrently, i.e. release the GIL while matching. */
#define RE_CONC_NO 0
#define RE_CONC_YES 1
#define RE_CONC_DEFAULT 2

/* The side that could truncate in a partial match.
 *
 * The values RE_PARTIAL_LEFT and RE_PARTIAL_RIGHT are also used as array
 * indexes, so they need to be 0 and 1.
 */
#define RE_PARTIAL_NONE -1
#define RE_PARTIAL_LEFT 0
#define RE_PARTIAL_RIGHT 1

/* Flags for the kind of 'sub' call: 'sub', 'subn', 'subf', 'subfn'. */
#define RE_SUB 0x0
#define RE_SUBN 0x1
#define RE_SUBF 0x2

/* The name of this module, minus the leading underscore. */
#define RE_MODULE "regex"

/* Error codes. */
#define RE_ERROR_SUCCESS 1 /* Successful match. */
#define RE_ERROR_FAILURE 0 /* Unsuccessful match. */
#define RE_ERROR_ILLEGAL -1 /* Illegal code. */
#define RE_ERROR_INTERNAL -2 /* Internal error. */
#define RE_ERROR_CONCURRENT -3 /* "concurrent" invalid. */
#define RE_ERROR_MEMORY -4 /* Out of memory. */
#define RE_ERROR_INTERRUPTED -5 /* Signal handler raised exception. */
#define RE_ERROR_REPLACEMENT -6 /* Invalid replacement string. */
#define RE_ERROR_INVALID_GROUP_REF -7 /* Invalid group reference. */
#define RE_ERROR_GROUP_INDEX_TYPE -8 /* Group index type error. */
#define RE_ERROR_NO_SUCH_GROUP -9 /* No such group. */
#define RE_ERROR_INDEX -10 /* String index. */
#define RE_ERROR_BACKTRACKING -11 /* Too much backtracking. */
#define RE_ERROR_NOT_STRING -12 /* Not a string. */
#define RE_ERROR_NOT_UNICODE -13 /* Not a Unicode string. */
#define RE_ERROR_NOT_BYTES -14 /* Not a bytestring. */
#define RE_ERROR_PARTIAL -15 /* Partial match. */

/* The number of backtrack entries per allocated block. */
#define RE_BACKTRACK_BLOCK_SIZE 64

/* The maximum number of backtrack entries to allocate. */
#define RE_MAX_BACKTRACK_ALLOC (1024 * 1024)

/* The number of atomic entries per allocated block. */
#define RE_ATOMIC_BLOCK_SIZE 64

/* The initial maximum capacity of the guard block. */
#define RE_INIT_GUARDS_BLOCK_SIZE 16

/* The initial maximum capacity of the node list. */
#define RE_INIT_NODE_LIST_SIZE 16

/* The size increment for various allocation lists. */
#define RE_LIST_SIZE_INC 16

/* The initial maximum capacity of the capture groups. */
#define RE_INIT_CAPTURE_SIZE 16

/* Node bitflags. */
#define RE_POSITIVE_OP 0x1
#define RE_ZEROWIDTH_OP 0x2
#define RE_FUZZY_OP 0x4
#define RE_REVERSE_OP 0x8
#define RE_REQUIRED_OP 0x10

/* Guards against further matching can occur at the start of the body and the
 * tail of a repeat containing a repeat.
 */
#define RE_STATUS_BODY 0x1
#define RE_STATUS_TAIL 0x2

/* Whether a guard is added depends on whether there's a repeat in the body of
 * the repeat or a group reference in the body or tail of the repeat.
 */
#define RE_STATUS_NEITHER 0x0
#define RE_STATUS_REPEAT 0x4
#define RE_STATUS_LIMITED 0x8
#define RE_STATUS_REF 0x10
#define RE_STATUS_VISITED_AG 0x20
#define RE_STATUS_VISITED_REP 0x40

/* Whether a string node has been initialised for fast searching. */
#define RE_STATUS_FAST_INIT 0x80

/* Whether a node is being used. (Additional nodes may be created while the
 * pattern is being built.)
 */
#define RE_STATUS_USED 0x100

/* Whether a node is a string node. */
#define RE_STATUS_STRING 0x200

/* Whether a repeat node is within another repeat. */
#define RE_STATUS_INNER 0x400

/* Various flags stored in a node status member. */
#define RE_STATUS_SHIFT 11

#define RE_STATUS_FUZZY (RE_FUZZY_OP << RE_STATUS_SHIFT)
#define RE_STATUS_REVERSE (RE_REVERSE_OP << RE_STATUS_SHIFT)
#define RE_STATUS_REQUIRED (RE_REQUIRED_OP << RE_STATUS_SHIFT)
#define RE_STATUS_HAS_GROUPS 0x10000
#define RE_STATUS_HAS_REPEATS 0x20000

/* The different error types for fuzzy matching. */
#define RE_FUZZY_SUB 0
#define RE_FUZZY_INS 1
#define RE_FUZZY_DEL 2
#define RE_FUZZY_ERR 3
#define RE_FUZZY_COUNT 3

/* The various values in a FUZZY node. */
#define RE_FUZZY_VAL_MAX_BASE 1
#define RE_FUZZY_VAL_MAX_SUB (RE_FUZZY_VAL_MAX_BASE + RE_FUZZY_SUB)
#define RE_FUZZY_VAL_MAX_INS (RE_FUZZY_VAL_MAX_BASE + RE_FUZZY_INS)
#define RE_FUZZY_VAL_MAX_DEL (RE_FUZZY_VAL_MAX_BASE + RE_FUZZY_DEL)
#define RE_FUZZY_VAL_MAX_ERR (RE_FUZZY_VAL_MAX_BASE + RE_FUZZY_ERR)

#define RE_FUZZY_VAL_COST_BASE 5
#define RE_FUZZY_VAL_SUB_COST (RE_FUZZY_VAL_COST_BASE + RE_FUZZY_SUB)
#define RE_FUZZY_VAL_INS_COST (RE_FUZZY_VAL_COST_BASE + RE_FUZZY_INS)
#define RE_FUZZY_VAL_DEL_COST (RE_FUZZY_VAL_COST_BASE + RE_FUZZY_DEL)
#define RE_FUZZY_VAL_MAX_COST (RE_FUZZY_VAL_COST_BASE + RE_FUZZY_ERR)

/* The various values in an END_FUZZY node. */
#define RE_FUZZY_VAL_MIN_BASE 1
#define RE_FUZZY_VAL_MIN_SUB (RE_FUZZY_VAL_MIN_BASE + RE_FUZZY_SUB)
#define RE_FUZZY_VAL_MIN_INS (RE_FUZZY_VAL_MIN_BASE + RE_FUZZY_INS)
#define RE_FUZZY_VAL_MIN_DEL (RE_FUZZY_VAL_MIN_BASE + RE_FUZZY_DEL)
#define RE_FUZZY_VAL_MIN_ERR (RE_FUZZY_VAL_MIN_BASE + RE_FUZZY_ERR)

/* The maximum number of errors when trying to improve a fuzzy match. */
#define RE_MAX_ERRORS 10

/* The flags which will be set for full Unicode case folding. */
#define RE_FULL_CASE_FOLDING (RE_FLAG_UNICODE | RE_FLAG_FULLCASE | RE_FLAG_IGNORECASE)

/* The shortest string prefix for which we'll use a fast string search. */
#define RE_MIN_FAST_LENGTH 5

static char copyright[] = " RE 2.3.0 Copyright (c) 1997-2002 by Secret Labs AB ";

/* The exception to raise on error. */
static PyObject* error_exception;

/* The dictionary of Unicode properties. */
static PyObject* property_dict;

typedef struct RE_State* RE_StatePtr;

/* Bit-flags for the common character properties supported by locale-sensitive
 * matching.
 */
#define RE_LOCALE_ALNUM 0x001
#define RE_LOCALE_ALPHA 0x002
#define RE_LOCALE_CNTRL 0x004
#define RE_LOCALE_DIGIT 0x008
#define RE_LOCALE_GRAPH 0x010
#define RE_LOCALE_LOWER 0x020
#define RE_LOCALE_PRINT 0x040
#define RE_LOCALE_PUNCT 0x080
#define RE_LOCALE_SPACE 0x100
#define RE_LOCALE_UPPER 0x200

/* Info about the current locale.
 *
 * Used by patterns that are locale-sensitive.
 */
typedef struct RE_LocaleInfo {
    unsigned short properties[0x100];
    unsigned char uppercase[0x100];
    unsigned char lowercase[0x100];
} RE_LocaleInfo;

/* Handlers for ASCII, locale and Unicode.
*/ typedef struct RE_EncodingTable { BOOL (*has_property)(RE_LocaleInfo* locale_info, RE_CODE property, Py_UCS4 ch); BOOL (*at_boundary)(RE_StatePtr state, Py_ssize_t text_pos); BOOL (*at_word_start)(RE_StatePtr state, Py_ssize_t text_pos); BOOL (*at_word_end)(RE_StatePtr state, Py_ssize_t text_pos); BOOL (*at_default_boundary)(RE_StatePtr state, Py_ssize_t text_pos); BOOL (*at_default_word_start)(RE_StatePtr state, Py_ssize_t text_pos); BOOL (*at_default_word_end)(RE_StatePtr state, Py_ssize_t text_pos); BOOL (*at_grapheme_boundary)(RE_StatePtr state, Py_ssize_t text_pos); BOOL (*is_line_sep)(Py_UCS4 ch); BOOL (*at_line_start)(RE_StatePtr state, Py_ssize_t text_pos); BOOL (*at_line_end)(RE_StatePtr state, Py_ssize_t text_pos); BOOL (*possible_turkic)(RE_LocaleInfo* locale_info, Py_UCS4 ch); int (*all_cases)(RE_LocaleInfo* locale_info, Py_UCS4 ch, Py_UCS4* codepoints); Py_UCS4 (*simple_case_fold)(RE_LocaleInfo* locale_info, Py_UCS4 ch); int (*full_case_fold)(RE_LocaleInfo* locale_info, Py_UCS4 ch, Py_UCS4* folded); int (*all_turkic_i)(RE_LocaleInfo* locale_info, Py_UCS4 ch, Py_UCS4* cases); } RE_EncodingTable; /* Position within the regex and text. */ typedef struct RE_Position { struct RE_Node* node; Py_ssize_t text_pos; } RE_Position; /* Info about fuzzy matching. */ typedef struct RE_FuzzyInfo { struct RE_Node* node; size_t counts[RE_FUZZY_COUNT + 1]; /* Add 1 for total errors. */ size_t total_cost; } RE_FuzzyInfo; /* Storage for backtrack data. 
*/ typedef struct RE_BacktrackData { union { struct { size_t capture_change; BOOL too_few_errors; } atomic; struct { RE_Position position; } branch; struct { RE_FuzzyInfo fuzzy_info; Py_ssize_t text_pos; RE_CODE index; } fuzzy; struct { RE_Position position; size_t count; struct RE_Node* fuzzy_node; BOOL too_few_errors; } fuzzy_insert; struct { RE_Position position; RE_INT8 fuzzy_type; RE_INT8 step; } fuzzy_item; struct { RE_Position position; Py_ssize_t string_pos; RE_INT8 fuzzy_type; RE_INT8 folded_pos; RE_INT8 folded_len; RE_INT8 gfolded_pos; RE_INT8 gfolded_len; RE_INT8 step; } fuzzy_string; struct { Py_ssize_t text_pos; Py_ssize_t current_capture; RE_CODE private_index; RE_CODE public_index; BOOL capture; } group; struct { struct RE_Node* node; size_t capture_change; } group_call; struct { Py_ssize_t match_pos; } keep; struct { struct RE_Node* node; size_t capture_change; BOOL too_few_errors; BOOL inside; } lookaround; struct { RE_Position position; Py_ssize_t text_pos; size_t count; Py_ssize_t start; size_t capture_change; RE_CODE index; } repeat; }; RE_UINT8 op; } RE_BacktrackData; /* Storage for backtrack data is allocated in blocks for speed. */ typedef struct RE_BacktrackBlock { RE_BacktrackData items[RE_BACKTRACK_BLOCK_SIZE]; struct RE_BacktrackBlock* previous; struct RE_BacktrackBlock* next; size_t capacity; size_t count; } RE_BacktrackBlock; /* Storage for atomic data. */ typedef struct RE_AtomicData { RE_BacktrackBlock* current_backtrack_block; size_t backtrack_count; struct RE_Node* node; RE_BacktrackData* backtrack; struct RE_SavedGroups* saved_groups; struct RE_SavedRepeats* saved_repeats; Py_ssize_t slice_start; Py_ssize_t slice_end; Py_ssize_t text_pos; BOOL is_lookaround; BOOL has_groups; BOOL has_repeats; } RE_AtomicData; /* Storage for atomic data is allocated in blocks for speed. 
*/ typedef struct RE_AtomicBlock { RE_AtomicData items[RE_ATOMIC_BLOCK_SIZE]; struct RE_AtomicBlock* previous; struct RE_AtomicBlock* next; size_t capacity; size_t count; } RE_AtomicBlock; /* Storage for saved groups. */ typedef struct RE_SavedGroups { struct RE_SavedGroups* previous; struct RE_SavedGroups* next; struct RE_GroupSpan* spans; size_t* counts; } RE_SavedGroups; /* Storage for info around a recursive call by 'basic_match'. */ typedef struct RE_Info { RE_BacktrackBlock* current_backtrack_block; size_t backtrack_count; RE_SavedGroups* current_saved_groups; struct RE_GroupCallFrame* current_group_call_frame; BOOL must_advance; } RE_Info; /* Storage for the next node. */ typedef struct RE_NextNode { struct RE_Node* node; struct RE_Node* test; struct RE_Node* match_next; Py_ssize_t match_step; } RE_NextNode; /* A pattern node. */ typedef struct RE_Node { RE_NextNode next_1; union { struct { RE_NextNode next_2; } nonstring; struct { /* Used only if (node->status & RE_STATUS_STRING) is true. */ Py_ssize_t* bad_character_offset; Py_ssize_t* good_suffix_offset; } string; }; Py_ssize_t step; size_t value_count; RE_CODE* values; RE_STATUS_T status; RE_UINT8 op; BOOL match; } RE_Node; /* Info about a group's span. */ typedef struct RE_GroupSpan { Py_ssize_t start; Py_ssize_t end; } RE_GroupSpan; /* Span of a guard (inclusive range). */ typedef struct RE_GuardSpan { Py_ssize_t low; Py_ssize_t high; BOOL protect; } RE_GuardSpan; /* Spans guarded against further matching. */ typedef struct RE_GuardList { size_t capacity; size_t count; RE_GuardSpan* spans; Py_ssize_t last_text_pos; size_t last_low; } RE_GuardList; /* Info about a group. */ typedef struct RE_GroupData { RE_GroupSpan span; size_t capture_count; size_t capture_capacity; Py_ssize_t current_capture; RE_GroupSpan* captures; } RE_GroupData; /* Info about a repeat.
*/ typedef struct RE_RepeatData { RE_GuardList body_guard_list; RE_GuardList tail_guard_list; size_t count; Py_ssize_t start; size_t capture_change; } RE_RepeatData; /* Storage for saved repeats. */ typedef struct RE_SavedRepeats { struct RE_SavedRepeats* previous; struct RE_SavedRepeats* next; RE_RepeatData* repeats; } RE_SavedRepeats; /* Guards for fuzzy sections. */ typedef struct RE_FuzzyGuards { RE_GuardList body_guard_list; RE_GuardList tail_guard_list; } RE_FuzzyGuards; /* Info about a capture group. */ typedef struct RE_GroupInfo { Py_ssize_t end_index; RE_Node* node; BOOL referenced; BOOL has_name; } RE_GroupInfo; /* Info about a call_ref. */ typedef struct RE_CallRefInfo { RE_Node* node; BOOL defined; BOOL used; } RE_CallRefInfo; /* Info about a repeat. */ typedef struct RE_RepeatInfo { RE_STATUS_T status; } RE_RepeatInfo; /* Stack frame for a group call. */ typedef struct RE_GroupCallFrame { struct RE_GroupCallFrame* previous; struct RE_GroupCallFrame* next; RE_Node* node; RE_GroupData* groups; RE_RepeatData* repeats; } RE_GroupCallFrame; /* Info about a string argument. */ typedef struct RE_StringInfo { Py_buffer view; /* View of the string if it's a buffer object. */ void* characters; /* Pointer to the characters of the string. */ Py_ssize_t length; /* Length of the string. */ Py_ssize_t charsize; /* Size of the characters in the string. */ BOOL is_unicode; /* Whether the string is Unicode. */ BOOL should_release; /* Whether the buffer should be released. */ } RE_StringInfo; /* Info about where the next match was found, starting from a certain search * position. This is used when a pattern starts with a BRANCH. */ #define MAX_SEARCH_POSITIONS 7 /* Info about a search position. */ typedef struct { Py_ssize_t start_pos; Py_ssize_t match_pos; } RE_SearchPosition; /* The state object used during matching. */ typedef struct RE_State { struct PatternObject* pattern; /* Parent PatternObject. */ /* Info about the string being matched. 
*/ PyObject* string; Py_buffer view; /* View of the string if it's a buffer object. */ Py_ssize_t charsize; void* text; Py_ssize_t text_length; /* The slice of the string being searched. */ Py_ssize_t slice_start; Py_ssize_t slice_end; /* Info about the capture groups. */ RE_GroupData* groups; Py_ssize_t lastindex; Py_ssize_t lastgroup; /* Info about the repeats. */ RE_RepeatData* repeats; Py_ssize_t search_anchor; /* Where the last match finished. */ Py_ssize_t match_pos; /* The start position of the match. */ Py_ssize_t text_pos; /* The current position of the match. */ Py_ssize_t final_newline; /* The index of newline at end of string, or -1. */ Py_ssize_t final_line_sep; /* The index of line separator at end of string, or -1. */ /* Storage for backtrack info. */ RE_BacktrackBlock backtrack_block; RE_BacktrackBlock* current_backtrack_block; Py_ssize_t backtrack_allocated; RE_BacktrackData* backtrack; RE_AtomicBlock* current_atomic_block; /* Storage for saved capture groups. */ RE_SavedGroups* first_saved_groups; RE_SavedGroups* current_saved_groups; RE_SavedRepeats* first_saved_repeats; RE_SavedRepeats* current_saved_repeats; /* Info about the best POSIX match (leftmost longest). */ Py_ssize_t best_match_pos; Py_ssize_t best_text_pos; RE_GroupData* best_match_groups; /* Miscellaneous. */ Py_ssize_t min_width; /* The minimum width of the string to match (assuming it's not a fuzzy pattern). */ RE_EncodingTable* encoding; /* The 'encoding' of the string being searched. */ RE_LocaleInfo* locale_info; /* Info about the locale, if needed. */ Py_UCS4 (*char_at)(void* text, Py_ssize_t pos); void (*set_char_at)(void* text, Py_ssize_t pos, Py_UCS4 ch); void* (*point_to)(void* text, Py_ssize_t pos); PyThread_type_lock lock; /* A lock for accessing the state across threads. */ RE_FuzzyInfo fuzzy_info; /* Info about fuzzy matching. */ size_t total_fuzzy_counts[RE_FUZZY_COUNT]; /* Totals for fuzzy matching. 
*/ size_t best_fuzzy_counts[RE_FUZZY_COUNT]; /* Best totals for fuzzy matching. */ RE_FuzzyGuards* fuzzy_guards; /* The guards for a fuzzy match. */ size_t total_errors; /* The total number of errors of a fuzzy match. */ size_t max_errors; /* The maximum permitted number of errors. */ size_t fewest_errors; /* The fewest errors so far of an enhanced fuzzy match. */ /* The group call stack. */ RE_GroupCallFrame* first_group_call_frame; RE_GroupCallFrame* current_group_call_frame; RE_GuardList* group_call_guard_list; RE_SearchPosition search_positions[MAX_SEARCH_POSITIONS]; /* Where the search matches next. */ size_t capture_change; /* Incremented every time a capture group changes. */ Py_ssize_t req_pos; /* The position where the required string matched. */ Py_ssize_t req_end; /* The end position where the required string matched. */ int partial_side; /* The side that could truncate in a partial match. */ RE_UINT16 iterations; /* The number of iterations the matching engine has performed since checking for KeyboardInterrupt. */ BOOL is_unicode; /* Whether the string to be matched is Unicode. */ BOOL should_release; /* Whether the buffer should be released. */ BOOL overlapped; /* Whether the matches can be overlapped. */ BOOL reverse; /* Whether it's a reverse pattern. */ BOOL visible_captures; /* Whether the 'captures' method will be visible. */ BOOL version_0; /* Whether to perform version_0 behaviour (same as re module). */ BOOL must_advance; /* Whether the end of the match must advance past its start. */ BOOL is_multithreaded; /* Whether to release the GIL while matching. */ BOOL too_few_errors; /* Whether there were too few fuzzy errors. */ BOOL match_all; /* Whether to match all of the string ('fullmatch'). */ BOOL found_match; /* Whether a POSIX match has been found. */ } RE_State; /* Storage for the regex state and thread state. * * Scanner objects can sometimes be shared across threads, which means that * their RE_State structs are also shared.
This isn't safe when the GIL is * released, so in such instances we have a lock (mutex) in the RE_State struct * to protect it during matching. We also need a thread-safe place to store the * thread state when releasing the GIL. */ typedef struct RE_SafeState { RE_State* re_state; PyThreadState* thread_state; } RE_SafeState; /* The PatternObject created from a regular expression. */ typedef struct PatternObject { PyObject_HEAD PyObject* pattern; /* Pattern source (or None). */ Py_ssize_t flags; /* Flags used when compiling pattern source. */ PyObject* weakreflist; /* List of weak references */ /* Nodes into which the regular expression is compiled. */ RE_Node* start_node; RE_Node* start_test; size_t true_group_count; /* The true number of capture groups. */ size_t public_group_count; /* The number of public capture groups. */ size_t repeat_count; /* The number of repeats. */ Py_ssize_t group_end_index; /* The number of group closures. */ PyObject* groupindex; PyObject* indexgroup; PyObject* named_lists; size_t named_lists_count; PyObject** partial_named_lists[2]; PyObject* named_list_indexes; /* Storage for the pattern nodes. */ size_t node_capacity; size_t node_count; RE_Node** node_list; /* Info about the capture groups. */ size_t group_info_capacity; RE_GroupInfo* group_info; /* Info about the call_refs. */ size_t call_ref_info_capacity; size_t call_ref_info_count; RE_CallRefInfo* call_ref_info; Py_ssize_t pattern_call_ref; /* Info about the repeats. */ size_t repeat_info_capacity; RE_RepeatInfo* repeat_info; Py_ssize_t min_width; /* The minimum width of the string to match (assuming it isn't a fuzzy pattern). */ RE_EncodingTable* encoding; /* Encoding handlers. */ RE_LocaleInfo* locale_info; /* Info about the locale, if needed. */ RE_GroupData* groups_storage; RE_RepeatData* repeats_storage; size_t fuzzy_count; /* The number of fuzzy sections. */ Py_ssize_t req_offset; /* The offset to the required string. */ RE_Node* req_string; /* The required string. 
*/ BOOL is_fuzzy; /* Whether it's a fuzzy pattern. */ BOOL do_search_start; /* Whether to do an initial search. */ BOOL recursive; /* Whether the entire pattern is recursive. */ } PatternObject; /* The MatchObject created when a match is found. */ typedef struct MatchObject { PyObject_HEAD PyObject* string; /* Link to the target string or NULL if detached. */ PyObject* substring; /* Link to (a substring of) the target string. */ Py_ssize_t substring_offset; /* Offset into the target string. */ PatternObject* pattern; /* Link to the regex (pattern) object. */ Py_ssize_t pos; /* Start of current slice. */ Py_ssize_t endpos; /* End of current slice. */ Py_ssize_t match_start; /* Start of matched slice. */ Py_ssize_t match_end; /* End of matched slice. */ Py_ssize_t lastindex; /* Last group seen by the engine (-1 if none). */ Py_ssize_t lastgroup; /* Last named group seen by the engine (-1 if none). */ size_t group_count; /* The number of groups. */ RE_GroupData* groups; /* The capture groups. */ PyObject* regs; size_t fuzzy_counts[RE_FUZZY_COUNT]; BOOL partial; /* Whether it's a partial match. */ } MatchObject; /* The ScannerObject. */ typedef struct ScannerObject { PyObject_HEAD PatternObject* pattern; RE_State state; int status; } ScannerObject; /* The SplitterObject. */ typedef struct SplitterObject { PyObject_HEAD PatternObject* pattern; RE_State state; Py_ssize_t maxsplit; Py_ssize_t last_pos; Py_ssize_t split_count; Py_ssize_t index; int status; } SplitterObject; /* The CaptureObject. */ typedef struct CaptureObject { PyObject_HEAD Py_ssize_t group_index; MatchObject** match_indirect; } CaptureObject; /* Info used when compiling a pattern to nodes. */ typedef struct RE_CompileArgs { RE_CODE* code; /* The start of the compiled pattern. */ RE_CODE* end_code; /* The end of the compiled pattern. */ PatternObject* pattern; /* The pattern object. */ Py_ssize_t min_width; /* The minimum width of the string to match (assuming it isn't a fuzzy pattern). 
*/ RE_Node* start; /* The start node. */ RE_Node* end; /* The end node. */ size_t repeat_depth; /* The nesting depth of the repeat. */ BOOL forward; /* Whether it's a forward (not reverse) pattern. */ BOOL visible_captures; /* Whether all of the captures will be visible. */ BOOL has_captures; /* Whether the pattern has capture groups. */ BOOL is_fuzzy; /* Whether the pattern (or some part of it) is fuzzy. */ BOOL within_fuzzy; /* Whether the subpattern is within a fuzzy section. */ BOOL has_groups; /* Whether the subpattern contains captures. */ BOOL has_repeats; /* Whether the subpattern contains repeats. */ } RE_CompileArgs; /* The string slices which will be concatenated to make the result string of * the 'sub' method. * * This allows us to avoid creating a list of slices if there are fewer than 2 * of them. Empty strings aren't recorded, so if 'list' and 'item' are both * NULL then the result is an empty string. */ typedef struct JoinInfo { PyObject* list; /* The list of slices if there are 2 or more of them. */ PyObject* item; /* The slice if there is only 1 of them. */ BOOL reversed; /* Whether the slices have been found in reverse order. */ BOOL is_unicode; /* Whether the string is Unicode. */ } JoinInfo; /* Info about fuzzy matching. */ typedef struct { RE_Node* new_node; Py_ssize_t new_text_pos; Py_ssize_t limit; Py_ssize_t new_string_pos; int step; int new_folded_pos; int folded_len; int new_gfolded_pos; int new_group_pos; int fuzzy_type; BOOL permit_insertion; } RE_FuzzyData; typedef struct RE_BestEntry { Py_ssize_t match_pos; Py_ssize_t text_pos; } RE_BestEntry; typedef struct RE_BestList { size_t capacity; size_t count; RE_BestEntry* entries; } RE_BestList; /* Function types for getting info from a MatchObject. */ typedef PyObject* (*RE_GetByIndexFunc)(MatchObject* self, Py_ssize_t index); /* Returns the magnitude of a 'Py_ssize_t' value. */ Py_LOCAL_INLINE(Py_ssize_t) abs_ssize_t(Py_ssize_t x) { return x >= 0 ?
x : -x; } /* Returns the minimum of 2 'Py_ssize_t' values. */ Py_LOCAL_INLINE(Py_ssize_t) min_ssize_t(Py_ssize_t x, Py_ssize_t y) { return x <= y ? x : y; } /* Returns the maximum of 2 'Py_ssize_t' values. */ Py_LOCAL_INLINE(Py_ssize_t) max_ssize_t(Py_ssize_t x, Py_ssize_t y) { return x >= y ? x : y; } /* Returns the minimum of 2 'size_t' values. */ Py_LOCAL_INLINE(size_t) min_size_t(size_t x, size_t y) { return x <= y ? x : y; } /* Returns the maximum of 2 'size_t' values. */ Py_LOCAL_INLINE(size_t) max_size_t(size_t x, size_t y) { return x >= y ? x : y; } /* Returns the 'maximum' of 2 RE_STATUS_T values. */ Py_LOCAL_INLINE(RE_STATUS_T) max_status_2(RE_STATUS_T x, RE_STATUS_T y) { return x >= y ? x : y; } /* Returns the 'maximum' of 3 RE_STATUS_T values. */ Py_LOCAL_INLINE(RE_STATUS_T) max_status_3(RE_STATUS_T x, RE_STATUS_T y, RE_STATUS_T z) { return max_status_2(x, max_status_2(y, z)); } /* Returns the 'maximum' of 4 RE_STATUS_T values. */ Py_LOCAL_INLINE(RE_STATUS_T) max_status_4(RE_STATUS_T w, RE_STATUS_T x, RE_STATUS_T y, RE_STATUS_T z) { return max_status_2(max_status_2(w, x), max_status_2(y, z)); } /* Gets a character at a position assuming 1 byte per character. */ static Py_UCS4 bytes1_char_at(void* text, Py_ssize_t pos) { return *((Py_UCS1*)text + pos); } /* Sets a character at a position assuming 1 byte per character. */ static void bytes1_set_char_at(void* text, Py_ssize_t pos, Py_UCS4 ch) { *((Py_UCS1*)text + pos) = (Py_UCS1)ch; } /* Gets a pointer to a position assuming 1 byte per character. */ static void* bytes1_point_to(void* text, Py_ssize_t pos) { return (Py_UCS1*)text + pos; } /* Gets a character at a position assuming 2 bytes per character. */ static Py_UCS4 bytes2_char_at(void* text, Py_ssize_t pos) { return *((Py_UCS2*)text + pos); } /* Sets a character at a position assuming 2 bytes per character. 
*/ static void bytes2_set_char_at(void* text, Py_ssize_t pos, Py_UCS4 ch) { *((Py_UCS2*)text + pos) = (Py_UCS2)ch; } /* Gets a pointer to a position assuming 2 bytes per character. */ static void* bytes2_point_to(void* text, Py_ssize_t pos) { return (Py_UCS2*)text + pos; } /* Gets a character at a position assuming 4 bytes per character. */ static Py_UCS4 bytes4_char_at(void* text, Py_ssize_t pos) { return *((Py_UCS4*)text + pos); } /* Sets a character at a position assuming 4 bytes per character. */ static void bytes4_set_char_at(void* text, Py_ssize_t pos, Py_UCS4 ch) { *((Py_UCS4*)text + pos) = (Py_UCS4)ch; } /* Gets a pointer to a position assuming 4 bytes per character. */ static void* bytes4_point_to(void* text, Py_ssize_t pos) { return (Py_UCS4*)text + pos; } /* Default for whether a position is on a word boundary. */ static BOOL at_boundary_always(RE_State* state, Py_ssize_t text_pos) { return TRUE; } /* Converts a BOOL to success/failure. */ Py_LOCAL_INLINE(int) bool_as_status(BOOL value) { return value ? RE_ERROR_SUCCESS : RE_ERROR_FAILURE; } /* ASCII-specific. */ Py_LOCAL_INLINE(BOOL) unicode_has_property(RE_CODE property, Py_UCS4 ch); /* Checks whether a character has a property. */ Py_LOCAL_INLINE(BOOL) ascii_has_property(RE_CODE property, Py_UCS4 ch) { if (ch > RE_ASCII_MAX) { /* Outside the ASCII range. */ RE_UINT32 value; value = property & 0xFFFF; return value == 0; } return unicode_has_property(property, ch); } /* Wrapper for calling 'ascii_has_property' via a pointer. */ static BOOL ascii_has_property_wrapper(RE_LocaleInfo* locale_info, RE_CODE property, Py_UCS4 ch) { return ascii_has_property(property, ch); } /* Checks whether there's a word character to the left. */ Py_LOCAL_INLINE(BOOL) ascii_word_left(RE_State* state, Py_ssize_t text_pos) { return text_pos > 0 && ascii_has_property(RE_PROP_WORD, state->char_at(state->text, text_pos - 1)); } /* Checks whether there's a word character to the right. 
*/ Py_LOCAL_INLINE(BOOL) ascii_word_right(RE_State* state, Py_ssize_t text_pos) { return text_pos < state->text_length && ascii_has_property(RE_PROP_WORD, state->char_at(state->text, text_pos)); } /* Checks whether a position is on a word boundary. */ static BOOL ascii_at_boundary(RE_State* state, Py_ssize_t text_pos) { BOOL left; BOOL right; left = ascii_word_left(state, text_pos); right = ascii_word_right(state, text_pos); return left != right; } /* Checks whether a position is at the start of a word. */ static BOOL ascii_at_word_start(RE_State* state, Py_ssize_t text_pos) { BOOL left; BOOL right; left = ascii_word_left(state, text_pos); right = ascii_word_right(state, text_pos); return !left && right; } /* Checks whether a position is at the end of a word. */ static BOOL ascii_at_word_end(RE_State* state, Py_ssize_t text_pos) { BOOL left; BOOL right; left = ascii_word_left(state, text_pos); right = ascii_word_right(state, text_pos); return left && !right; } /* Checks whether a character is a line separator. */ static BOOL ascii_is_line_sep(Py_UCS4 ch) { return 0x0A <= ch && ch <= 0x0D; } /* Checks whether a position is at the start of a line. */ static BOOL ascii_at_line_start(RE_State* state, Py_ssize_t text_pos) { Py_UCS4 ch; if (text_pos <= 0) return TRUE; ch = state->char_at(state->text, text_pos - 1); if (ch == 0x0D) { if (text_pos >= state->text_length) return TRUE; /* No line break inside CRLF. */ return state->char_at(state->text, text_pos) != 0x0A; } return 0x0A <= ch && ch <= 0x0D; } /* Checks whether a position is at the end of a line. */ static BOOL ascii_at_line_end(RE_State* state, Py_ssize_t text_pos) { Py_UCS4 ch; if (text_pos >= state->text_length) return TRUE; ch = state->char_at(state->text, text_pos); if (ch == 0x0A) { if (text_pos <= 0) return TRUE; /* No line break inside CRLF. */ return state->char_at(state->text, text_pos - 1) != 0x0D; } return 0x0A <= ch && ch <= 0x0D; } /* Checks whether a character could be Turkic (variants of I/i). 
For ASCII, it * won't be. */ static BOOL ascii_possible_turkic(RE_LocaleInfo* locale_info, Py_UCS4 ch) { return FALSE; } /* Gets all the cases of a character. */ static int ascii_all_cases(RE_LocaleInfo* locale_info, Py_UCS4 ch, Py_UCS4* codepoints) { int count; count = 0; codepoints[count++] = ch; if (('A' <= ch && ch <= 'Z') || ('a' <= ch && ch <= 'z')) /* It's a letter, so add the other case. */ codepoints[count++] = ch ^ 0x20; return count; } /* Returns a character with its case folded. */ static Py_UCS4 ascii_simple_case_fold(RE_LocaleInfo* locale_info, Py_UCS4 ch) { if ('A' <= ch && ch <= 'Z') /* Uppercase folds to lowercase. */ return ch ^ 0x20; return ch; } /* Returns a character with its case folded. */ static int ascii_full_case_fold(RE_LocaleInfo* locale_info, Py_UCS4 ch, Py_UCS4* folded) { if ('A' <= ch && ch <= 'Z') /* Uppercase folds to lowercase. */ folded[0] = ch ^ 0x20; else folded[0] = ch; return 1; } /* Gets all the case variants of Turkic 'I'. The given character will be listed * first. */ static int ascii_all_turkic_i(RE_LocaleInfo* locale_info, Py_UCS4 ch, Py_UCS4* cases) { int count; count = 0; cases[count++] = ch; if (ch != 'I') cases[count++] = 'I'; if (ch != 'i') cases[count++] = 'i'; return count; } /* The handlers for ASCII characters. */ static RE_EncodingTable ascii_encoding = { ascii_has_property_wrapper, ascii_at_boundary, ascii_at_word_start, ascii_at_word_end, ascii_at_boundary, /* No special "default word boundary" for ASCII. */ ascii_at_word_start, /* No special "default start of word" for ASCII. */ ascii_at_word_end, /* No special "default end of a word" for ASCII. */ at_boundary_always, /* No special "grapheme boundary" for ASCII. */ ascii_is_line_sep, ascii_at_line_start, ascii_at_line_end, ascii_possible_turkic, ascii_all_cases, ascii_simple_case_fold, ascii_full_case_fold, ascii_all_turkic_i, }; /* Locale-specific. */ /* Checks whether a character has the 'alnum' property in the given locale. 
*/ Py_LOCAL_INLINE(BOOL) locale_isalnum(RE_LocaleInfo* locale_info, Py_UCS4 ch) { return ch <= RE_LOCALE_MAX && (locale_info->properties[ch] & RE_LOCALE_ALNUM) != 0; } /* Checks whether a character has the 'alpha' property in the given locale. */ Py_LOCAL_INLINE(BOOL) locale_isalpha(RE_LocaleInfo* locale_info, Py_UCS4 ch) { return ch <= RE_LOCALE_MAX && (locale_info->properties[ch] & RE_LOCALE_ALPHA) != 0; } /* Checks whether a character has the 'cntrl' property in the given locale. */ Py_LOCAL_INLINE(BOOL) locale_iscntrl(RE_LocaleInfo* locale_info, Py_UCS4 ch) { return ch <= RE_LOCALE_MAX && (locale_info->properties[ch] & RE_LOCALE_CNTRL) != 0; } /* Checks whether a character has the 'digit' property in the given locale. */ Py_LOCAL_INLINE(BOOL) locale_isdigit(RE_LocaleInfo* locale_info, Py_UCS4 ch) { return ch <= RE_LOCALE_MAX && (locale_info->properties[ch] & RE_LOCALE_DIGIT) != 0; } /* Checks whether a character has the 'graph' property in the given locale. */ Py_LOCAL_INLINE(BOOL) locale_isgraph(RE_LocaleInfo* locale_info, Py_UCS4 ch) { return ch <= RE_LOCALE_MAX && (locale_info->properties[ch] & RE_LOCALE_GRAPH) != 0; } /* Checks whether a character has the 'lower' property in the given locale. */ Py_LOCAL_INLINE(BOOL) locale_islower(RE_LocaleInfo* locale_info, Py_UCS4 ch) { return ch <= RE_LOCALE_MAX && (locale_info->properties[ch] & RE_LOCALE_LOWER) != 0; } /* Checks whether a character has the 'print' property in the given locale. */ Py_LOCAL_INLINE(BOOL) locale_isprint(RE_LocaleInfo* locale_info, Py_UCS4 ch) { return ch <= RE_LOCALE_MAX && (locale_info->properties[ch] & RE_LOCALE_PRINT) != 0; } /* Checks whether a character has the 'punct' property in the given locale. */ Py_LOCAL_INLINE(BOOL) locale_ispunct(RE_LocaleInfo* locale_info, Py_UCS4 ch) { return ch <= RE_LOCALE_MAX && (locale_info->properties[ch] & RE_LOCALE_PUNCT) != 0; } /* Checks whether a character has the 'space' property in the given locale. 
*/ Py_LOCAL_INLINE(BOOL) locale_isspace(RE_LocaleInfo* locale_info, Py_UCS4 ch) { return ch <= RE_LOCALE_MAX && (locale_info->properties[ch] & RE_LOCALE_SPACE) != 0; } /* Checks whether a character has the 'upper' property in the given locale. */ Py_LOCAL_INLINE(BOOL) locale_isupper(RE_LocaleInfo* locale_info, Py_UCS4 ch) { return ch <= RE_LOCALE_MAX && (locale_info->properties[ch] & RE_LOCALE_UPPER) != 0; } /* Converts a character to lowercase in the given locale. */ Py_LOCAL_INLINE(Py_UCS4) locale_tolower(RE_LocaleInfo* locale_info, Py_UCS4 ch) { return ch <= RE_LOCALE_MAX ? locale_info->lowercase[ch] : ch; } /* Converts a character to uppercase in the given locale. */ Py_LOCAL_INLINE(Py_UCS4) locale_toupper(RE_LocaleInfo* locale_info, Py_UCS4 ch) { return ch <= RE_LOCALE_MAX ? locale_info->uppercase[ch] : ch; } /* Checks whether a character has a property. */ Py_LOCAL_INLINE(BOOL) locale_has_property(RE_LocaleInfo* locale_info, RE_CODE property, Py_UCS4 ch) { RE_UINT32 value; RE_UINT32 v; value = property & 0xFFFF; if (ch > RE_LOCALE_MAX) /* Outside the locale range. */ return value == 0; switch (property >> 16) { case RE_PROP_ALNUM >> 16: v = locale_isalnum(locale_info, ch) != 0; break; case RE_PROP_ALPHA >> 16: v = locale_isalpha(locale_info, ch) != 0; break; case RE_PROP_ANY >> 16: v = 1; break; case RE_PROP_ASCII >> 16: v = ch <= RE_ASCII_MAX; break; case RE_PROP_BLANK >> 16: v = ch == '\t' || ch == ' '; break; case RE_PROP_GC: switch (property) { case RE_PROP_ASSIGNED: v = ch <= RE_LOCALE_MAX; break; case RE_PROP_CASEDLETTER: v = locale_isalpha(locale_info, ch) ? value : 0xFFFF; break; case RE_PROP_CNTRL: v = locale_iscntrl(locale_info, ch) ? value : 0xFFFF; break; case RE_PROP_DIGIT: v = locale_isdigit(locale_info, ch) ? value : 0xFFFF; break; case RE_PROP_GC_CN: v = ch > RE_LOCALE_MAX; break; case RE_PROP_GC_LL: v = locale_islower(locale_info, ch) ? value : 0xFFFF; break; case RE_PROP_GC_LU: v = locale_isupper(locale_info, ch) ? 
value : 0xFFFF; break; case RE_PROP_GC_P: v = locale_ispunct(locale_info, ch) ? value : 0xFFFF; break; default: v = 0xFFFF; break; } break; case RE_PROP_GRAPH >> 16: v = locale_isgraph(locale_info, ch) != 0; break; case RE_PROP_LOWER >> 16: v = locale_islower(locale_info, ch) != 0; break; case RE_PROP_POSIX_ALNUM >> 16: v = re_get_posix_alnum(ch) != 0; break; case RE_PROP_POSIX_DIGIT >> 16: v = re_get_posix_digit(ch) != 0; break; case RE_PROP_POSIX_PUNCT >> 16: v = re_get_posix_punct(ch) != 0; break; case RE_PROP_POSIX_XDIGIT >> 16: v = re_get_posix_xdigit(ch) != 0; break; case RE_PROP_PRINT >> 16: v = locale_isprint(locale_info, ch) != 0; break; case RE_PROP_SPACE >> 16: v = locale_isspace(locale_info, ch) != 0; break; case RE_PROP_UPPER >> 16: v = locale_isupper(locale_info, ch) != 0; break; case RE_PROP_WORD >> 16: v = ch == '_' || locale_isalnum(locale_info, ch) != 0; break; case RE_PROP_XDIGIT >> 16: v = re_get_hex_digit(ch) != 0; break; default: v = 0; break; } return v == value; } /* Wrapper for calling 'locale_has_property' via a pointer. */ static BOOL locale_has_property_wrapper(RE_LocaleInfo* locale_info, RE_CODE property, Py_UCS4 ch) { return locale_has_property(locale_info, property, ch); } /* Checks whether there's a word character to the left. */ Py_LOCAL_INLINE(BOOL) locale_word_left(RE_State* state, Py_ssize_t text_pos) { return text_pos > 0 && locale_has_property(state->locale_info, RE_PROP_WORD, state->char_at(state->text, text_pos - 1)); } /* Checks whether there's a word character to the right. */ Py_LOCAL_INLINE(BOOL) locale_word_right(RE_State* state, Py_ssize_t text_pos) { return text_pos < state->text_length && locale_has_property(state->locale_info, RE_PROP_WORD, state->char_at(state->text, text_pos)); } /* Checks whether a position is on a word boundary. 
*/ static BOOL locale_at_boundary(RE_State* state, Py_ssize_t text_pos) { BOOL left; BOOL right; left = locale_word_left(state, text_pos); right = locale_word_right(state, text_pos); return left != right; } /* Checks whether a position is at the start of a word. */ static BOOL locale_at_word_start(RE_State* state, Py_ssize_t text_pos) { BOOL left; BOOL right; left = locale_word_left(state, text_pos); right = locale_word_right(state, text_pos); return !left && right; } /* Checks whether a position is at the end of a word. */ static BOOL locale_at_word_end(RE_State* state, Py_ssize_t text_pos) { BOOL left; BOOL right; left = locale_word_left(state, text_pos); right = locale_word_right(state, text_pos); return left && !right; } /* Checks whether a character could be Turkic (variants of I/i). */ static BOOL locale_possible_turkic(RE_LocaleInfo* locale_info, Py_UCS4 ch) { return locale_toupper(locale_info, ch) == 'I' || locale_tolower(locale_info, ch) == 'i'; } /* Gets all the cases of a character. */ static int locale_all_cases(RE_LocaleInfo* locale_info, Py_UCS4 ch, Py_UCS4* codepoints) { int count; Py_UCS4 other; count = 0; codepoints[count++] = ch; other = locale_toupper(locale_info, ch); if (other != ch) codepoints[count++] = other; other = locale_tolower(locale_info, ch); if (other != ch) codepoints[count++] = other; return count; } /* Returns a character with its case folded. */ static Py_UCS4 locale_simple_case_fold(RE_LocaleInfo* locale_info, Py_UCS4 ch) { return locale_tolower(locale_info, ch); } /* Stores the case-folded form of a character in 'folded' and returns the number of codepoints stored (always 1 for locale). */ static int locale_full_case_fold(RE_LocaleInfo* locale_info, Py_UCS4 ch, Py_UCS4* folded) { folded[0] = locale_tolower(locale_info, ch); return 1; } /* Gets all the case variants of Turkic 'I'. The given character will be listed * first. 
*/ static int locale_all_turkic_i(RE_LocaleInfo* locale_info, Py_UCS4 ch, Py_UCS4* cases) { int count; Py_UCS4 other; count = 0; cases[count++] = ch; if (ch != 'I') cases[count++] = 'I'; if (ch != 'i') cases[count++] = 'i'; /* Uppercase 'i' will be either dotted (Turkic) or dotless (non-Turkic). */ other = locale_toupper(locale_info, 'i'); if (other != ch && other != 'I') cases[count++] = other; /* Lowercase 'I' will be either dotless (Turkic) or dotted (non-Turkic). */ other = locale_tolower(locale_info, 'I'); if (other != ch && other != 'i') cases[count++] = other; return count; } /* The handlers for locale characters. */ static RE_EncodingTable locale_encoding = { locale_has_property_wrapper, locale_at_boundary, locale_at_word_start, locale_at_word_end, locale_at_boundary, /* No special "default word boundary" for locale. */ locale_at_word_start, /* No special "default start of a word" for locale. */ locale_at_word_end, /* No special "default end of a word" for locale. */ at_boundary_always, /* No special "grapheme boundary" for locale. */ ascii_is_line_sep, /* Assume locale line separators are same as ASCII. */ ascii_at_line_start, /* Assume locale line separators are same as ASCII. */ ascii_at_line_end, /* Assume locale line separators are same as ASCII. */ locale_possible_turkic, locale_all_cases, locale_simple_case_fold, locale_full_case_fold, locale_all_turkic_i, }; /* Unicode-specific. */ /* Checks whether a Unicode character has a property. 
*/ Py_LOCAL_INLINE(BOOL) unicode_has_property(RE_CODE property, Py_UCS4 ch) { RE_UINT32 prop; RE_UINT32 value; RE_UINT32 v; prop = property >> 16; if (prop >= sizeof(re_get_property) / sizeof(re_get_property[0])) return FALSE; value = property & 0xFFFF; v = re_get_property[prop](ch); if (v == value) return TRUE; if (prop == RE_PROP_GC) { switch (value) { case RE_PROP_ASSIGNED: return v != RE_PROP_CN; case RE_PROP_C: return (RE_PROP_C_MASK & (1 << v)) != 0; case RE_PROP_CASEDLETTER: return v == RE_PROP_LU || v == RE_PROP_LL || v == RE_PROP_LT; case RE_PROP_L: return (RE_PROP_L_MASK & (1 << v)) != 0; case RE_PROP_M: return (RE_PROP_M_MASK & (1 << v)) != 0; case RE_PROP_N: return (RE_PROP_N_MASK & (1 << v)) != 0; case RE_PROP_P: return (RE_PROP_P_MASK & (1 << v)) != 0; case RE_PROP_S: return (RE_PROP_S_MASK & (1 << v)) != 0; case RE_PROP_Z: return (RE_PROP_Z_MASK & (1 << v)) != 0; } } return FALSE; } /* Wrapper for calling 'unicode_has_property' via a pointer. */ static BOOL unicode_has_property_wrapper(RE_LocaleInfo* locale_info, RE_CODE property, Py_UCS4 ch) { return unicode_has_property(property, ch); } /* Checks whether there's a word character to the left. */ Py_LOCAL_INLINE(BOOL) unicode_word_left(RE_State* state, Py_ssize_t text_pos) { return text_pos > 0 && unicode_has_property(RE_PROP_WORD, state->char_at(state->text, text_pos - 1)); } /* Checks whether there's a word character to the right. */ Py_LOCAL_INLINE(BOOL) unicode_word_right(RE_State* state, Py_ssize_t text_pos) { return text_pos < state->text_length && unicode_has_property(RE_PROP_WORD, state->char_at(state->text, text_pos)); } /* Checks whether a position is on a word boundary. */ static BOOL unicode_at_boundary(RE_State* state, Py_ssize_t text_pos) { BOOL left; BOOL right; left = unicode_word_left(state, text_pos); right = unicode_word_right(state, text_pos); return left != right; } /* Checks whether a position is at the start of a word. 
*/ static BOOL unicode_at_word_start(RE_State* state, Py_ssize_t text_pos) { BOOL left; BOOL right; left = unicode_word_left(state, text_pos); right = unicode_word_right(state, text_pos); return !left && right; } /* Checks whether a position is at the end of a word. */ static BOOL unicode_at_word_end(RE_State* state, Py_ssize_t text_pos) { BOOL left; BOOL right; left = unicode_word_left(state, text_pos); right = unicode_word_right(state, text_pos); return left && !right; } /* Checks whether a character is a Unicode vowel. * * Only a limited number are treated as vowels. */ Py_LOCAL_INLINE(BOOL) is_unicode_vowel(Py_UCS4 ch) { #if PY_VERSION_HEX >= 0x03030000 switch (Py_UNICODE_TOLOWER(ch)) { #else switch (Py_UNICODE_TOLOWER((Py_UNICODE)ch)) { #endif case 'a': case 0xE0: case 0xE1: case 0xE2: case 'e': case 0xE8: case 0xE9: case 0xEA: case 'i': case 0xEC: case 0xED: case 0xEE: case 'o': case 0xF2: case 0xF3: case 0xF4: case 'u': case 0xF9: case 0xFA: case 0xFB: return TRUE; default: return FALSE; } } /* Checks whether a position is on a default word boundary. * * The rules are defined here: * http://www.unicode.org/reports/tr29/#Default_Word_Boundaries */ static BOOL unicode_at_default_boundary(RE_State* state, Py_ssize_t text_pos) { Py_UCS4 (*char_at)(void* text, Py_ssize_t pos); int prop; int prop_m1; Py_ssize_t pos_m1; Py_ssize_t pos_m2; int prop_m2; Py_ssize_t pos_p0; int prop_p0; Py_ssize_t pos_p1; int prop_p1; /* Break at the start and end of the text. */ /* WB1 */ if (text_pos <= 0) return TRUE; /* WB2 */ if (text_pos >= state->text_length) return TRUE; char_at = state->char_at; prop = (int)re_get_word_break(char_at(state->text, text_pos)); prop_m1 = (int)re_get_word_break(char_at(state->text, text_pos - 1)); /* Don't break within CRLF. */ /* WB3 */ if (prop_m1 == RE_BREAK_CR && prop == RE_BREAK_LF) return FALSE; /* Otherwise break before and after Newlines (including CR and LF). 
*/ /* WB3a and WB3b */ if (prop_m1 == RE_BREAK_NEWLINE || prop_m1 == RE_BREAK_CR || prop_m1 == RE_BREAK_LF || prop == RE_BREAK_NEWLINE || prop == RE_BREAK_CR || prop == RE_BREAK_LF) return TRUE; /* WB4 */ /* Get the property of the previous character, ignoring Format and Extend * characters. */ pos_m1 = text_pos - 1; prop_m1 = RE_BREAK_OTHER; while (pos_m1 >= 0) { prop_m1 = (int)re_get_word_break(char_at(state->text, pos_m1)); if (prop_m1 != RE_BREAK_EXTEND && prop_m1 != RE_BREAK_FORMAT) break; --pos_m1; } /* Get the property of the preceding character, ignoring Format and Extend * characters. */ pos_m2 = pos_m1 - 1; prop_m2 = RE_BREAK_OTHER; while (pos_m2 >= 0) { prop_m2 = (int)re_get_word_break(char_at(state->text, pos_m2)); if (prop_m2 != RE_BREAK_EXTEND && prop_m2 != RE_BREAK_FORMAT) break; --pos_m2; } /* Get the property of the next character, ignoring Format and Extend * characters. */ pos_p0 = text_pos; prop_p0 = prop; while (pos_p0 < state->text_length) { prop_p0 = (int)re_get_word_break(char_at(state->text, pos_p0)); if (prop_p0 != RE_BREAK_EXTEND && prop_p0 != RE_BREAK_FORMAT) break; ++pos_p0; } /* Get the property of the following character, ignoring Format and Extend * characters. */ pos_p1 = pos_p0 + 1; prop_p1 = RE_BREAK_OTHER; while (pos_p1 < state->text_length) { prop_p1 = (int)re_get_word_break(char_at(state->text, pos_p1)); if (prop_p1 != RE_BREAK_EXTEND && prop_p1 != RE_BREAK_FORMAT) break; ++pos_p1; } /* Don't break between most letters. */ /* WB5 */ if ((prop_m1 == RE_BREAK_ALETTER || prop_m1 == RE_BREAK_HEBREWLETTER) && (prop_p0 == RE_BREAK_ALETTER || prop_p0 == RE_BREAK_HEBREWLETTER)) return FALSE; /* Break between apostrophe and vowels (French, Italian). */ /* WB5a */ if (pos_m1 >= 0 && char_at(state->text, pos_m1) == '\'' && is_unicode_vowel(char_at(state->text, text_pos))) return TRUE; /* Don't break letters across certain punctuation. 
*/ /* WB6 */ if ((prop_m1 == RE_BREAK_ALETTER || prop_m1 == RE_BREAK_HEBREWLETTER) && (prop_p0 == RE_BREAK_MIDLETTER || prop_p0 == RE_BREAK_MIDNUMLET || prop_p0 == RE_BREAK_SINGLEQUOTE) && (prop_p1 == RE_BREAK_ALETTER || prop_p1 == RE_BREAK_HEBREWLETTER)) return FALSE; /* WB7 */ if ((prop_m2 == RE_BREAK_ALETTER || prop_m2 == RE_BREAK_HEBREWLETTER) && (prop_m1 == RE_BREAK_MIDLETTER || prop_m1 == RE_BREAK_MIDNUMLET || prop_m1 == RE_BREAK_SINGLEQUOTE) && (prop_p0 == RE_BREAK_ALETTER || prop_p0 == RE_BREAK_HEBREWLETTER)) return FALSE; /* WB7a */ if (prop_m1 == RE_BREAK_HEBREWLETTER && prop_p0 == RE_BREAK_SINGLEQUOTE) return FALSE; /* WB7b */ if (prop_m1 == RE_BREAK_HEBREWLETTER && prop_p0 == RE_BREAK_DOUBLEQUOTE && prop_p1 == RE_BREAK_HEBREWLETTER) return FALSE; /* WB7c */ if (prop_m2 == RE_BREAK_HEBREWLETTER && prop_m1 == RE_BREAK_DOUBLEQUOTE && prop_p0 == RE_BREAK_HEBREWLETTER) return FALSE; /* Don't break within sequences of digits, or digits adjacent to letters * ("3a", or "A3"). */ /* WB8 */ if (prop_m1 == RE_BREAK_NUMERIC && prop_p0 == RE_BREAK_NUMERIC) return FALSE; /* WB9 */ if ((prop_m1 == RE_BREAK_ALETTER || prop_m1 == RE_BREAK_HEBREWLETTER) && prop_p0 == RE_BREAK_NUMERIC) return FALSE; /* WB10 */ if (prop_m1 == RE_BREAK_NUMERIC && (prop_p0 == RE_BREAK_ALETTER || prop_p0 == RE_BREAK_HEBREWLETTER)) return FALSE; /* Don't break within sequences, such as "3.2" or "3,456.789". */ /* WB11 */ if (prop_m2 == RE_BREAK_NUMERIC && (prop_m1 == RE_BREAK_MIDNUM || prop_m1 == RE_BREAK_MIDNUMLET || prop_m1 == RE_BREAK_SINGLEQUOTE) && prop_p0 == RE_BREAK_NUMERIC) return FALSE; /* WB12 */ if (prop_m1 == RE_BREAK_NUMERIC && (prop_p0 == RE_BREAK_MIDNUM || prop_p0 == RE_BREAK_MIDNUMLET || prop_p0 == RE_BREAK_SINGLEQUOTE) && prop_p1 == RE_BREAK_NUMERIC) return FALSE; /* Don't break between Katakana. */ /* WB13 */ if (prop_m1 == RE_BREAK_KATAKANA && prop_p0 == RE_BREAK_KATAKANA) return FALSE; /* Don't break from extenders. 
*/ /* WB13a */ if ((prop_m1 == RE_BREAK_ALETTER || prop_m1 == RE_BREAK_HEBREWLETTER || prop_m1 == RE_BREAK_NUMERIC || prop_m1 == RE_BREAK_KATAKANA || prop_m1 == RE_BREAK_EXTENDNUMLET) && prop_p0 == RE_BREAK_EXTENDNUMLET) return FALSE; /* WB13b */ if (prop_m1 == RE_BREAK_EXTENDNUMLET && (prop_p0 == RE_BREAK_ALETTER || prop_p0 == RE_BREAK_HEBREWLETTER || prop_p0 == RE_BREAK_NUMERIC || prop_p0 == RE_BREAK_KATAKANA)) return FALSE; /* Don't break between regional indicator symbols. */ /* WB13c */ if (prop_m1 == RE_BREAK_REGIONALINDICATOR && prop_p0 == RE_BREAK_REGIONALINDICATOR) return FALSE; /* Otherwise, break everywhere (including around ideographs). */ /* WB14 */ return TRUE; } /* Checks whether a position is at the start/end of a word. */ Py_LOCAL_INLINE(BOOL) unicode_at_default_word_start_or_end(RE_State* state, Py_ssize_t text_pos, BOOL at_start) { Py_UCS4 (*char_at)(void* text, Py_ssize_t pos); BOOL before; BOOL after; Py_UCS4 char_0; Py_UCS4 char_m1; int prop; int prop_m1; Py_ssize_t pos_m1; Py_ssize_t pos_p1; int prop_p1; Py_UCS4 char_p1; Py_ssize_t pos_m2; int prop_m2; Py_UCS4 char_m2; char_at = state->char_at; /* At the start or end of the text. */ if (text_pos <= 0 || text_pos >= state->text_length) { before = unicode_word_left(state, text_pos); after = unicode_word_right(state, text_pos); return before != at_start && after == at_start; } char_0 = char_at(state->text, text_pos); char_m1 = char_at(state->text, text_pos - 1); prop = (int)re_get_word_break(char_0); prop_m1 = (int)re_get_word_break(char_m1); /* No break within CRLF. */ if (prop_m1 == RE_BREAK_CR && prop == RE_BREAK_LF) return FALSE; /* Break before and after Newlines (including CR and LF). 
*/ if (prop_m1 == RE_BREAK_NEWLINE || prop_m1 == RE_BREAK_CR || prop_m1 == RE_BREAK_LF || prop == RE_BREAK_NEWLINE || prop == RE_BREAK_CR || prop == RE_BREAK_LF) { before = unicode_has_property(RE_PROP_WORD, char_m1); after = unicode_has_property(RE_PROP_WORD, char_0); return before != at_start && after == at_start; } /* No break just before Format or Extend characters. */ if (prop == RE_BREAK_EXTEND || prop == RE_BREAK_FORMAT) return FALSE; /* Get the property of the previous character. */ pos_m1 = text_pos - 1; prop_m1 = RE_BREAK_OTHER; while (pos_m1 >= 0) { char_m1 = char_at(state->text, pos_m1); prop_m1 = (int)re_get_word_break(char_m1); if (prop_m1 != RE_BREAK_EXTEND && prop_m1 != RE_BREAK_FORMAT) break; --pos_m1; } /* No break between most letters. */ if (prop_m1 == RE_BREAK_ALETTER && prop == RE_BREAK_ALETTER) return FALSE; if (pos_m1 >= 0 && char_m1 == '\'' && is_unicode_vowel(char_0)) return TRUE; pos_p1 = text_pos + 1; prop_p1 = RE_BREAK_OTHER; while (pos_p1 < state->text_length) { char_p1 = char_at(state->text, pos_p1); prop_p1 = (int)re_get_word_break(char_p1); if (prop_p1 != RE_BREAK_EXTEND && prop_p1 != RE_BREAK_FORMAT) break; ++pos_p1; } /* No break in letters across certain punctuation. */ if (prop_m1 == RE_BREAK_ALETTER && (prop == RE_BREAK_MIDLETTER || prop == RE_BREAK_MIDNUMLET) && prop_p1 == RE_BREAK_ALETTER) return FALSE; pos_m2 = pos_m1 - 1; prop_m2 = RE_BREAK_OTHER; while (pos_m2 >= 0) { char_m2 = char_at(state->text, pos_m2); prop_m2 = (int)re_get_word_break(char_m2); if (prop_m2 != RE_BREAK_EXTEND && prop_m2 != RE_BREAK_FORMAT) break; --pos_m2; } if (prop_m2 == RE_BREAK_ALETTER && (prop_m1 == RE_BREAK_MIDLETTER || prop_m1 == RE_BREAK_MIDNUMLET) && prop == RE_BREAK_ALETTER) return FALSE; /* No break within sequences of digits, or digits adjacent to letters * ("3a", or "A3"). 
*/ if ((prop_m1 == RE_BREAK_NUMERIC || prop_m1 == RE_BREAK_ALETTER) && prop == RE_BREAK_NUMERIC) return FALSE; if (prop_m1 == RE_BREAK_NUMERIC && prop == RE_BREAK_ALETTER) return FALSE; /* No break within sequences, such as "3.2" or "3,456.789". */ if (prop_m2 == RE_BREAK_NUMERIC && (prop_m1 == RE_BREAK_MIDNUM || prop_m1 == RE_BREAK_MIDNUMLET) && prop == RE_BREAK_NUMERIC) return FALSE; if (prop_m1 == RE_BREAK_NUMERIC && (prop == RE_BREAK_MIDNUM || prop == RE_BREAK_MIDNUMLET) && prop_p1 == RE_BREAK_NUMERIC) return FALSE; /* No break between Katakana. */ if (prop_m1 == RE_BREAK_KATAKANA && prop == RE_BREAK_KATAKANA) return FALSE; /* No break from extenders. */ if ((prop_m1 == RE_BREAK_ALETTER || prop_m1 == RE_BREAK_NUMERIC || prop_m1 == RE_BREAK_KATAKANA || prop_m1 == RE_BREAK_EXTENDNUMLET) && prop == RE_BREAK_EXTENDNUMLET) return FALSE; if (prop_m1 == RE_BREAK_EXTENDNUMLET && (prop == RE_BREAK_ALETTER || prop == RE_BREAK_NUMERIC || prop == RE_BREAK_KATAKANA)) return FALSE; /* Otherwise, break everywhere (including around ideographs). */ before = unicode_has_property(RE_PROP_WORD, char_m1); after = unicode_has_property(RE_PROP_WORD, char_0); return before != at_start && after == at_start; } /* Checks whether a position is at the start of a word. */ static BOOL unicode_at_default_word_start(RE_State* state, Py_ssize_t text_pos) { return unicode_at_default_word_start_or_end(state, text_pos, TRUE); } /* Checks whether a position is at the end of a word. */ static BOOL unicode_at_default_word_end(RE_State* state, Py_ssize_t text_pos) { return unicode_at_default_word_start_or_end(state, text_pos, FALSE); } /* Checks whether a position is on a grapheme boundary. * * The rules are defined here: * http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries */ static BOOL unicode_at_grapheme_boundary(RE_State* state, Py_ssize_t text_pos) { Py_UCS4 (*char_at)(void* text, Py_ssize_t pos); int prop; int prop_m1; /* Break at the start and end of the text. 
*/ /* GB1 */ if (text_pos <= 0) return TRUE; /* GB2 */ if (text_pos >= state->text_length) return TRUE; char_at = state->char_at; prop = (int)re_get_grapheme_cluster_break(char_at(state->text, text_pos)); prop_m1 = (int)re_get_grapheme_cluster_break(char_at(state->text, text_pos - 1)); /* Don't break within CRLF. */ /* GB3 */ if (prop_m1 == RE_GBREAK_CR && prop == RE_GBREAK_LF) return FALSE; /* Otherwise break before and after controls (including CR and LF). */ /* GB4 and GB5 */ if (prop_m1 == RE_GBREAK_CONTROL || prop_m1 == RE_GBREAK_CR || prop_m1 == RE_GBREAK_LF || prop == RE_GBREAK_CONTROL || prop == RE_GBREAK_CR || prop == RE_GBREAK_LF) return TRUE; /* Don't break Hangul syllable sequences. */ /* GB6 */ if (prop_m1 == RE_GBREAK_L && (prop == RE_GBREAK_L || prop == RE_GBREAK_V || prop == RE_GBREAK_LV || prop == RE_GBREAK_LVT)) return FALSE; /* GB7 */ if ((prop_m1 == RE_GBREAK_LV || prop_m1 == RE_GBREAK_V) && (prop == RE_GBREAK_V || prop == RE_GBREAK_T)) return FALSE; /* GB8 */ if ((prop_m1 == RE_GBREAK_LVT || prop_m1 == RE_GBREAK_T) && (prop == RE_GBREAK_T)) return FALSE; /* Don't break between regional indicator symbols. */ /* GB8a */ if (prop_m1 == RE_GBREAK_REGIONALINDICATOR && prop == RE_GBREAK_REGIONALINDICATOR) return FALSE; /* Don't break just before Extend characters. */ /* GB9 */ if (prop == RE_GBREAK_EXTEND) return FALSE; /* Don't break before SpacingMarks, or after Prepend characters. */ /* GB9a */ if (prop == RE_GBREAK_SPACINGMARK) return FALSE; /* GB9b */ if (prop_m1 == RE_GBREAK_PREPEND) return FALSE; /* Otherwise, break everywhere. */ /* GB10 */ return TRUE; } /* Checks whether a character is a line separator. */ static BOOL unicode_is_line_sep(Py_UCS4 ch) { return (0x0A <= ch && ch <= 0x0D) || ch == 0x85 || ch == 0x2028 || ch == 0x2029; } /* Checks whether a position is at the start of a line. 
*/ static BOOL unicode_at_line_start(RE_State* state, Py_ssize_t text_pos) { Py_UCS4 ch; if (text_pos <= 0) return TRUE; ch = state->char_at(state->text, text_pos - 1); if (ch == 0x0D) { if (text_pos >= state->text_length) return TRUE; /* No line break inside CRLF. */ return state->char_at(state->text, text_pos) != 0x0A; } return (0x0A <= ch && ch <= 0x0D) || ch == 0x85 || ch == 0x2028 || ch == 0x2029; } /* Checks whether a position is at the end of a line. */ static BOOL unicode_at_line_end(RE_State* state, Py_ssize_t text_pos) { Py_UCS4 ch; if (text_pos >= state->text_length) return TRUE; ch = state->char_at(state->text, text_pos); if (ch == 0x0A) { if (text_pos <= 0) return TRUE; /* No line break inside CRLF. */ return state->char_at(state->text, text_pos - 1) != 0x0D; } return (0x0A <= ch && ch <= 0x0D) || ch == 0x85 || ch == 0x2028 || ch == 0x2029; } /* Checks whether a character could be Turkic (variants of I/i). */ static BOOL unicode_possible_turkic(RE_LocaleInfo* locale_info, Py_UCS4 ch) { return ch == 'I' || ch == 'i' || ch == 0x0130 || ch == 0x0131; } /* Gets all the cases of a character. */ static int unicode_all_cases(RE_LocaleInfo* locale_info, Py_UCS4 ch, Py_UCS4* codepoints) { return re_get_all_cases(ch, codepoints); } /* Returns a character with its case folded, unless it could be Turkic * (variants of I/i). */ static Py_UCS4 unicode_simple_case_fold(RE_LocaleInfo* locale_info, Py_UCS4 ch) { /* Is it a possible Turkic character? If so, pass it through unchanged. */ if (ch == 'I' || ch == 'i' || ch == 0x0130 || ch == 0x0131) return ch; return (Py_UCS4)re_get_simple_case_folding(ch); } /* Returns a character with its case folded, unless it could be Turkic * (variants of I/i). */ static int unicode_full_case_fold(RE_LocaleInfo* locale_info, Py_UCS4 ch, Py_UCS4* folded) { /* Is it a possible Turkic character? If so, pass it through unchanged. 
*/ if (ch == 'I' || ch == 'i' || ch == 0x0130 || ch == 0x0131) { folded[0] = ch; return 1; } return re_get_full_case_folding(ch, folded); } /* Gets all the case variants of Turkic 'I'. */ static int unicode_all_turkic_i(RE_LocaleInfo* locale_info, Py_UCS4 ch, Py_UCS4* cases) { int count; count = 0; cases[count++] = ch; if (ch != 'I') cases[count++] = 'I'; if (ch != 'i') cases[count++] = 'i'; if (ch != 0x130) cases[count++] = 0x130; if (ch != 0x131) cases[count++] = 0x131; return count; } /* The handlers for Unicode characters. */ static RE_EncodingTable unicode_encoding = { unicode_has_property_wrapper, unicode_at_boundary, unicode_at_word_start, unicode_at_word_end, unicode_at_default_boundary, unicode_at_default_word_start, unicode_at_default_word_end, unicode_at_grapheme_boundary, unicode_is_line_sep, unicode_at_line_start, unicode_at_line_end, unicode_possible_turkic, unicode_all_cases, unicode_simple_case_fold, unicode_full_case_fold, unicode_all_turkic_i, }; Py_LOCAL_INLINE(PyObject*) get_object(char* module_name, char* object_name); /* Sets the error message. */ Py_LOCAL_INLINE(void) set_error(int status, PyObject* object) { TRACE(("<<set_error>>\n")) if (!error_exception) error_exception = get_object("_" RE_MODULE "_core", "error"); switch (status) { case RE_ERROR_BACKTRACKING: PyErr_SetString(error_exception, "too much backtracking"); break; case RE_ERROR_CONCURRENT: PyErr_SetString(PyExc_ValueError, "concurrent not int or None"); break; case RE_ERROR_GROUP_INDEX_TYPE: if (object) PyErr_Format(PyExc_TypeError, "group indices must be integers or strings, not %.200s", object->ob_type->tp_name); else PyErr_Format(PyExc_TypeError, "group indices must be integers or strings"); break; case RE_ERROR_ILLEGAL: PyErr_SetString(PyExc_RuntimeError, "invalid RE code"); break; case RE_ERROR_INDEX: PyErr_SetString(PyExc_TypeError, "string indices must be integers"); break; case RE_ERROR_INTERRUPTED: /* An exception has already been raised, so let it fly. 
*/ break; case RE_ERROR_INVALID_GROUP_REF: PyErr_SetString(error_exception, "invalid group reference"); break; case RE_ERROR_MEMORY: PyErr_NoMemory(); break; case RE_ERROR_NOT_BYTES: PyErr_Format(PyExc_TypeError, "expected a bytes-like object, %.200s found", object->ob_type->tp_name); break; case RE_ERROR_NOT_STRING: PyErr_Format(PyExc_TypeError, "expected string instance, %.200s found", object->ob_type->tp_name); break; case RE_ERROR_NOT_UNICODE: PyErr_Format(PyExc_TypeError, "expected str instance, %.200s found", object->ob_type->tp_name); break; case RE_ERROR_NO_SUCH_GROUP: PyErr_SetString(PyExc_IndexError, "no such group"); break; case RE_ERROR_REPLACEMENT: PyErr_SetString(error_exception, "invalid replacement"); break; default: /* Other error codes indicate compiler/engine bugs. */ PyErr_SetString(PyExc_RuntimeError, "internal error in regular expression engine"); break; } } /* Allocates memory. * * Sets the Python error handler and returns NULL if the allocation fails. */ Py_LOCAL_INLINE(void*) re_alloc(size_t size) { void* new_ptr; new_ptr = PyMem_Malloc(size); if (!new_ptr) set_error(RE_ERROR_MEMORY, NULL); return new_ptr; } /* Reallocates memory. * * Sets the Python error handler and returns NULL if the reallocation fails. */ Py_LOCAL_INLINE(void*) re_realloc(void* ptr, size_t size) { void* new_ptr; new_ptr = PyMem_Realloc(ptr, size); if (!new_ptr) set_error(RE_ERROR_MEMORY, NULL); return new_ptr; } /* Deallocates memory. */ Py_LOCAL_INLINE(void) re_dealloc(void* ptr) { PyMem_Free(ptr); } /* Releases the GIL if multithreading is enabled. */ Py_LOCAL_INLINE(void) release_GIL(RE_SafeState* safe_state) { if (safe_state->re_state->is_multithreaded) safe_state->thread_state = PyEval_SaveThread(); } /* Acquires the GIL if multithreading is enabled. 
*/ Py_LOCAL_INLINE(void) acquire_GIL(RE_SafeState* safe_state) { if (safe_state->re_state->is_multithreaded) PyEval_RestoreThread(safe_state->thread_state); } /* Allocates memory, holding the GIL during the allocation. * * Sets the Python error handler and returns NULL if the allocation fails. */ Py_LOCAL_INLINE(void*) safe_alloc(RE_SafeState* safe_state, size_t size) { void* new_ptr; acquire_GIL(safe_state); new_ptr = re_alloc(size); release_GIL(safe_state); return new_ptr; } /* Reallocates memory, holding the GIL during the reallocation. * * Sets the Python error handler and returns NULL if the reallocation fails. */ Py_LOCAL_INLINE(void*) safe_realloc(RE_SafeState* safe_state, void* ptr, size_t size) { void* new_ptr; acquire_GIL(safe_state); new_ptr = re_realloc(ptr, size); release_GIL(safe_state); return new_ptr; } /* Deallocates memory, holding the GIL during the deallocation. */ Py_LOCAL_INLINE(void) safe_dealloc(RE_SafeState* safe_state, void* ptr) { acquire_GIL(safe_state); re_dealloc(ptr); release_GIL(safe_state); } /* Checks for KeyboardInterrupt, holding the GIL during the check. */ Py_LOCAL_INLINE(BOOL) safe_check_signals(RE_SafeState* safe_state) { BOOL result; acquire_GIL(safe_state); result = (BOOL)PyErr_CheckSignals(); release_GIL(safe_state); return result; } /* Checks whether a character is in a range. */ Py_LOCAL_INLINE(BOOL) in_range(Py_UCS4 lower, Py_UCS4 upper, Py_UCS4 ch) { return lower <= ch && ch <= upper; } /* Checks whether a character is in a range, ignoring case. */ Py_LOCAL_INLINE(BOOL) in_range_ign(RE_EncodingTable* encoding, RE_LocaleInfo* locale_info, Py_UCS4 lower, Py_UCS4 upper, Py_UCS4 ch) { int count; Py_UCS4 cases[RE_MAX_CASES]; int i; count = encoding->all_cases(locale_info, ch, cases); for (i = 0; i < count; i++) { if (in_range(lower, upper, cases[i])) return TRUE; } return FALSE; } /* Checks whether 2 characters are the same. 
*/ Py_LOCAL_INLINE(BOOL) same_char(Py_UCS4 ch1, Py_UCS4 ch2) { return ch1 == ch2; } /* Wrapper for calling 'same_char' via a pointer. */ static BOOL same_char_wrapper(RE_EncodingTable* encoding, RE_LocaleInfo* locale_info, Py_UCS4 ch1, Py_UCS4 ch2) { return same_char(ch1, ch2); } /* Checks whether 2 characters are the same, ignoring case. */ Py_LOCAL_INLINE(BOOL) same_char_ign(RE_EncodingTable* encoding, RE_LocaleInfo* locale_info, Py_UCS4 ch1, Py_UCS4 ch2) { int count; Py_UCS4 cases[RE_MAX_CASES]; int i; if (ch1 == ch2) return TRUE; count = encoding->all_cases(locale_info, ch1, cases); for (i = 1; i < count; i++) { if (cases[i] == ch2) return TRUE; } return FALSE; } /* Wrapper for calling 'same_char_ign' via a pointer. */ static BOOL same_char_ign_wrapper(RE_EncodingTable* encoding, RE_LocaleInfo* locale_info, Py_UCS4 ch1, Py_UCS4 ch2) { return same_char_ign(encoding, locale_info, ch1, ch2); } /* Checks whether a character is anything except a newline. */ Py_LOCAL_INLINE(BOOL) matches_ANY(RE_EncodingTable* encoding, RE_Node* node, Py_UCS4 ch) { return ch != '\n'; } /* Checks whether a character is anything except a line separator. */ Py_LOCAL_INLINE(BOOL) matches_ANY_U(RE_EncodingTable* encoding, RE_Node* node, Py_UCS4 ch) { return !encoding->is_line_sep(ch); } /* Checks whether 2 characters are the same. */ Py_LOCAL_INLINE(BOOL) matches_CHARACTER(RE_EncodingTable* encoding, RE_LocaleInfo* locale_info, RE_Node* node, Py_UCS4 ch) { return same_char(node->values[0], ch); } /* Checks whether 2 characters are the same, ignoring case. */ Py_LOCAL_INLINE(BOOL) matches_CHARACTER_IGN(RE_EncodingTable* encoding, RE_LocaleInfo* locale_info, RE_Node* node, Py_UCS4 ch) { return same_char_ign(encoding, locale_info, node->values[0], ch); } /* Checks whether a character has a property. 
*/ Py_LOCAL_INLINE(BOOL) matches_PROPERTY(RE_EncodingTable* encoding, RE_LocaleInfo* locale_info, RE_Node* node, Py_UCS4 ch) { return encoding->has_property(locale_info, node->values[0], ch); } /* Checks whether a character has a property, ignoring case. */ Py_LOCAL_INLINE(BOOL) matches_PROPERTY_IGN(RE_EncodingTable* encoding, RE_LocaleInfo* locale_info, RE_Node* node, Py_UCS4 ch) { RE_UINT32 property; RE_UINT32 prop; property = node->values[0]; prop = property >> 16; /* We need to do special handling of case-sensitive properties according to * the 'encoding'. */ if (encoding == &unicode_encoding) { /* We are working with Unicode. */ if (property == RE_PROP_GC_LU || property == RE_PROP_GC_LL || property == RE_PROP_GC_LT) { RE_UINT32 value; value = re_get_general_category(ch); return value == RE_PROP_LU || value == RE_PROP_LL || value == RE_PROP_LT; } else if (prop == RE_PROP_UPPERCASE || prop == RE_PROP_LOWERCASE) return (BOOL)re_get_cased(ch); /* The property is case-insensitive. */ return unicode_has_property(property, ch); } else if (encoding == &ascii_encoding) { /* We are working with ASCII. */ if (property == RE_PROP_GC_LU || property == RE_PROP_GC_LL || property == RE_PROP_GC_LT) { RE_UINT32 value; value = re_get_general_category(ch); return value == RE_PROP_LU || value == RE_PROP_LL || value == RE_PROP_LT; } else if (prop == RE_PROP_UPPERCASE || prop == RE_PROP_LOWERCASE) return (BOOL)re_get_cased(ch); /* The property is case-insensitive. */ return ascii_has_property(property, ch); } else { /* We are working with Locale. */ if (property == RE_PROP_GC_LU || property == RE_PROP_GC_LL || property == RE_PROP_GC_LT) return locale_isupper(locale_info, ch) || locale_islower(locale_info, ch); else if (prop == RE_PROP_UPPERCASE || prop == RE_PROP_LOWERCASE) return locale_isupper(locale_info, ch) || locale_islower(locale_info, ch); /* The property is case-insensitive. 
*/ return locale_has_property(locale_info, property, ch); } } /* Checks whether a character is in a range. */ Py_LOCAL_INLINE(BOOL) matches_RANGE(RE_EncodingTable* encoding, RE_LocaleInfo* locale_info, RE_Node* node, Py_UCS4 ch) { return in_range(node->values[0], node->values[1], ch); } /* Checks whether a character is in a range, ignoring case. */ Py_LOCAL_INLINE(BOOL) matches_RANGE_IGN(RE_EncodingTable* encoding, RE_LocaleInfo* locale_info, RE_Node* node, Py_UCS4 ch) { return in_range_ign(encoding, locale_info, node->values[0], node->values[1], ch); } Py_LOCAL_INLINE(BOOL) in_set_diff(RE_EncodingTable* encoding, RE_LocaleInfo* locale_info, RE_Node* node, Py_UCS4 ch); Py_LOCAL_INLINE(BOOL) in_set_inter(RE_EncodingTable* encoding, RE_LocaleInfo* locale_info, RE_Node* node, Py_UCS4 ch); Py_LOCAL_INLINE(BOOL) in_set_sym_diff(RE_EncodingTable* encoding, RE_LocaleInfo* locale_info, RE_Node* node, Py_UCS4 ch); Py_LOCAL_INLINE(BOOL) in_set_union(RE_EncodingTable* encoding, RE_LocaleInfo* locale_info, RE_Node* node, Py_UCS4 ch); /* Checks whether a character matches a set member. 
*/ Py_LOCAL_INLINE(BOOL) matches_member(RE_EncodingTable* encoding, RE_LocaleInfo* locale_info, RE_Node* member, Py_UCS4 ch) { switch (member->op) { case RE_OP_CHARACTER: /* values are: char_code */ TRACE(("%s %d %d\n", re_op_text[member->op], member->match, member->values[0])) return ch == member->values[0]; case RE_OP_PROPERTY: /* values are: property */ TRACE(("%s %d %d\n", re_op_text[member->op], member->match, member->values[0])) return encoding->has_property(locale_info, member->values[0], ch); case RE_OP_RANGE: /* values are: lower, upper */ TRACE(("%s %d %d %d\n", re_op_text[member->op], member->match, member->values[0], member->values[1])) return in_range(member->values[0], member->values[1], ch); case RE_OP_SET_DIFF: TRACE(("%s\n", re_op_text[member->op])) return in_set_diff(encoding, locale_info, member, ch); case RE_OP_SET_INTER: TRACE(("%s\n", re_op_text[member->op])) return in_set_inter(encoding, locale_info, member, ch); case RE_OP_SET_SYM_DIFF: TRACE(("%s\n", re_op_text[member->op])) return in_set_sym_diff(encoding, locale_info, member, ch); case RE_OP_SET_UNION: TRACE(("%s\n", re_op_text[member->op])) return in_set_union(encoding, locale_info, member, ch); case RE_OP_STRING: { /* values are: char_code, char_code, ... */ size_t i; TRACE(("%s %d %d\n", re_op_text[member->op], member->match, member->value_count)) for (i = 0; i < member->value_count; i++) { if (ch == member->values[i]) return TRUE; } return FALSE; } default: return FALSE; } } /* Checks whether a character matches a set member, ignoring case. 
*/ Py_LOCAL_INLINE(BOOL) matches_member_ign(RE_EncodingTable* encoding, RE_LocaleInfo* locale_info, RE_Node* member, int case_count, Py_UCS4* cases) { int i; for (i = 0; i < case_count; i++) { switch (member->op) { case RE_OP_CHARACTER: /* values are: char_code */ TRACE(("%s %d %d\n", re_op_text[member->op], member->match, member->values[0])) if (cases[i] == member->values[0]) return TRUE; break; case RE_OP_PROPERTY: /* values are: property */ TRACE(("%s %d %d\n", re_op_text[member->op], member->match, member->values[0])) if (encoding->has_property(locale_info, member->values[0], cases[i])) return TRUE; break; case RE_OP_RANGE: /* values are: lower, upper */ TRACE(("%s %d %d %d\n", re_op_text[member->op], member->match, member->values[0], member->values[1])) if (in_range(member->values[0], member->values[1], cases[i])) return TRUE; break; case RE_OP_SET_DIFF: TRACE(("%s\n", re_op_text[member->op])) if (in_set_diff(encoding, locale_info, member, cases[i])) return TRUE; break; case RE_OP_SET_INTER: TRACE(("%s\n", re_op_text[member->op])) if (in_set_inter(encoding, locale_info, member, cases[i])) return TRUE; break; case RE_OP_SET_SYM_DIFF: TRACE(("%s\n", re_op_text[member->op])) if (in_set_sym_diff(encoding, locale_info, member, cases[i])) return TRUE; break; case RE_OP_SET_UNION: TRACE(("%s\n", re_op_text[member->op])) if (in_set_union(encoding, locale_info, member, cases[i])) return TRUE; break; case RE_OP_STRING: { size_t j; TRACE(("%s %d %d\n", re_op_text[member->op], member->match, member->value_count)) for (j = 0; j < member->value_count; j++) { if (cases[i] == member->values[j]) return TRUE; } break; } default: return TRUE; } } return FALSE; } /* Checks whether a character is in a set difference. 
*/ Py_LOCAL_INLINE(BOOL) in_set_diff(RE_EncodingTable* encoding, RE_LocaleInfo* locale_info, RE_Node* node, Py_UCS4 ch) { RE_Node* member; member = node->nonstring.next_2.node; if (matches_member(encoding, locale_info, member, ch) != member->match) return FALSE; member = member->next_1.node; while (member) { if (matches_member(encoding, locale_info, member, ch) == member->match) return FALSE; member = member->next_1.node; } return TRUE; } /* Checks whether a character is in a set difference, ignoring case. */ Py_LOCAL_INLINE(BOOL) in_set_diff_ign(RE_EncodingTable* encoding, RE_LocaleInfo* locale_info, RE_Node* node, int case_count, Py_UCS4* cases) { RE_Node* member; member = node->nonstring.next_2.node; if (matches_member_ign(encoding, locale_info, member, case_count, cases) != member->match) return FALSE; member = member->next_1.node; while (member) { if (matches_member_ign(encoding, locale_info, member, case_count, cases) == member->match) return FALSE; member = member->next_1.node; } return TRUE; } /* Checks whether a character is in a set intersection. */ Py_LOCAL_INLINE(BOOL) in_set_inter(RE_EncodingTable* encoding, RE_LocaleInfo* locale_info, RE_Node* node, Py_UCS4 ch) { RE_Node* member; member = node->nonstring.next_2.node; while (member) { if (matches_member(encoding, locale_info, member, ch) != member->match) return FALSE; member = member->next_1.node; } return TRUE; } /* Checks whether a character is in a set intersection, ignoring case. */ Py_LOCAL_INLINE(BOOL) in_set_inter_ign(RE_EncodingTable* encoding, RE_LocaleInfo* locale_info, RE_Node* node, int case_count, Py_UCS4* cases) { RE_Node* member; member = node->nonstring.next_2.node; while (member) { if (matches_member_ign(encoding, locale_info, member, case_count, cases) != member->match) return FALSE; member = member->next_1.node; } return TRUE; } /* Checks whether a character is in a set symmetric difference. 
*/ Py_LOCAL_INLINE(BOOL) in_set_sym_diff(RE_EncodingTable* encoding, RE_LocaleInfo* locale_info, RE_Node* node, Py_UCS4 ch) { RE_Node* member; BOOL result; member = node->nonstring.next_2.node; result = FALSE; while (member) { if (matches_member(encoding, locale_info, member, ch) == member->match) result = !result; member = member->next_1.node; } return result; } /* Checks whether a character is in a set symmetric difference, ignoring case. */ Py_LOCAL_INLINE(BOOL) in_set_sym_diff_ign(RE_EncodingTable* encoding, RE_LocaleInfo* locale_info, RE_Node* node, int case_count, Py_UCS4* cases) { RE_Node* member; BOOL result; member = node->nonstring.next_2.node; result = FALSE; while (member) { if (matches_member_ign(encoding, locale_info, member, case_count, cases) == member->match) result = !result; member = member->next_1.node; } return result; } /* Checks whether a character is in a set union. */ Py_LOCAL_INLINE(BOOL) in_set_union(RE_EncodingTable* encoding, RE_LocaleInfo* locale_info, RE_Node* node, Py_UCS4 ch) { RE_Node* member; member = node->nonstring.next_2.node; while (member) { if (matches_member(encoding, locale_info, member, ch) == member->match) return TRUE; member = member->next_1.node; } return FALSE; } /* Checks whether a character is in a set union, ignoring case. */ Py_LOCAL_INLINE(BOOL) in_set_union_ign(RE_EncodingTable* encoding, RE_LocaleInfo* locale_info, RE_Node* node, int case_count, Py_UCS4* cases) { RE_Node* member; member = node->nonstring.next_2.node; while (member) { if (matches_member_ign(encoding, locale_info, member, case_count, cases) == member->match) return TRUE; member = member->next_1.node; } return FALSE; } /* Checks whether a character is in a set. 
*/ Py_LOCAL_INLINE(BOOL) matches_SET(RE_EncodingTable* encoding, RE_LocaleInfo* locale_info, RE_Node* node, Py_UCS4 ch) { switch (node->op) { case RE_OP_SET_DIFF: case RE_OP_SET_DIFF_REV: return in_set_diff(encoding, locale_info, node, ch); case RE_OP_SET_INTER: case RE_OP_SET_INTER_REV: return in_set_inter(encoding, locale_info, node, ch); case RE_OP_SET_SYM_DIFF: case RE_OP_SET_SYM_DIFF_REV: return in_set_sym_diff(encoding, locale_info, node, ch); case RE_OP_SET_UNION: case RE_OP_SET_UNION_REV: return in_set_union(encoding, locale_info, node, ch); } return FALSE; } /* Checks whether a character is in a set, ignoring case. */ Py_LOCAL_INLINE(BOOL) matches_SET_IGN(RE_EncodingTable* encoding, RE_LocaleInfo* locale_info, RE_Node* node, Py_UCS4 ch) { Py_UCS4 cases[RE_MAX_CASES]; int case_count; case_count = encoding->all_cases(locale_info, ch, cases); switch (node->op) { case RE_OP_SET_DIFF_IGN: case RE_OP_SET_DIFF_IGN_REV: return in_set_diff_ign(encoding, locale_info, node, case_count, cases); case RE_OP_SET_INTER_IGN: case RE_OP_SET_INTER_IGN_REV: return in_set_inter_ign(encoding, locale_info, node, case_count, cases); case RE_OP_SET_SYM_DIFF_IGN: case RE_OP_SET_SYM_DIFF_IGN_REV: return in_set_sym_diff_ign(encoding, locale_info, node, case_count, cases); case RE_OP_SET_UNION_IGN: case RE_OP_SET_UNION_IGN_REV: return in_set_union_ign(encoding, locale_info, node, case_count, cases); } return FALSE; } /* Resets a guard list. */ Py_LOCAL_INLINE(void) reset_guard_list(RE_GuardList* guard_list) { guard_list->count = 0; guard_list->last_text_pos = -1; } /* Clears the groups. */ Py_LOCAL_INLINE(void) clear_groups(RE_State* state) { size_t i; for (i = 0; i < state->pattern->true_group_count; i++) { RE_GroupData* group; group = &state->groups[i]; group->span.start = -1; group->span.end = -1; group->capture_count = 0; group->current_capture = -1; } } /* Initialises the state for a match. 
*/ Py_LOCAL_INLINE(void) init_match(RE_State* state) { RE_AtomicBlock* current; size_t i; /* Reset the backtrack. */ state->current_backtrack_block = &state->backtrack_block; state->current_backtrack_block->count = 0; state->current_saved_groups = state->first_saved_groups; state->backtrack = NULL; state->search_anchor = state->text_pos; state->match_pos = state->text_pos; /* Reset the atomic stack. */ current = state->current_atomic_block; if (current) { while (current->previous) current = current->previous; state->current_atomic_block = current; state->current_atomic_block->count = 0; } /* Reset the guards for the repeats. */ for (i = 0; i < state->pattern->repeat_count; i++) { reset_guard_list(&state->repeats[i].body_guard_list); reset_guard_list(&state->repeats[i].tail_guard_list); } /* Reset the guards for the fuzzy sections. */ for (i = 0; i < state->pattern->fuzzy_count; i++) { reset_guard_list(&state->fuzzy_guards[i].body_guard_list); reset_guard_list(&state->fuzzy_guards[i].tail_guard_list); } /* Clear the groups. */ clear_groups(state); /* Reset the guards for the group calls. */ for (i = 0; i < state->pattern->call_ref_info_count; i++) reset_guard_list(&state->group_call_guard_list[i]); /* Clear the counts and cost for matching. */ if (state->pattern->is_fuzzy) { memset(state->fuzzy_info.counts, 0, sizeof(state->fuzzy_info.counts)); memset(state->total_fuzzy_counts, 0, sizeof(state->total_fuzzy_counts)); } state->fuzzy_info.total_cost = 0; state->total_errors = 0; state->too_few_errors = FALSE; state->found_match = FALSE; state->capture_change = 0; state->iterations = 0; } /* Adds a new backtrack entry. */ Py_LOCAL_INLINE(BOOL) add_backtrack(RE_SafeState* safe_state, RE_UINT8 op) { RE_State* state; RE_BacktrackBlock* current; state = safe_state->re_state; current = state->current_backtrack_block; if (current->count >= current->capacity) { if (!current->next) { RE_BacktrackBlock* next; /* Is there too much backtracking? 
*/ if (state->backtrack_allocated >= RE_MAX_BACKTRACK_ALLOC) return FALSE; next = (RE_BacktrackBlock*)safe_alloc(safe_state, sizeof(RE_BacktrackBlock)); if (!next) return FALSE; next->previous = current; next->next = NULL; next->capacity = RE_BACKTRACK_BLOCK_SIZE; current->next = next; state->backtrack_allocated += RE_BACKTRACK_BLOCK_SIZE; } current = current->next; current->count = 0; state->current_backtrack_block = current; } state->backtrack = &current->items[current->count++]; state->backtrack->op = op; return TRUE; } /* Gets the last backtrack entry. * * It'll never be called when there are _no_ entries. */ Py_LOCAL_INLINE(RE_BacktrackData*) last_backtrack(RE_State* state) { RE_BacktrackBlock* current; current = state->current_backtrack_block; state->backtrack = &current->items[current->count - 1]; return state->backtrack; } /* Discards the last backtrack entry. * * It'll never be called to discard the _only_ entry. */ Py_LOCAL_INLINE(void) discard_backtrack(RE_State* state) { RE_BacktrackBlock* current; current = state->current_backtrack_block; --current->count; if (current->count == 0 && current->previous) state->current_backtrack_block = current->previous; } /* Pushes a new empty entry onto the atomic stack. */ Py_LOCAL_INLINE(RE_AtomicData*) push_atomic(RE_SafeState* safe_state) { RE_State* state; RE_AtomicBlock* current; state = safe_state->re_state; current = state->current_atomic_block; if (!current || current->count >= current->capacity) { /* The current block is full. */ if (current && current->next) /* Advance to the next block. */ current = current->next; else { /* Add a new block. */ RE_AtomicBlock* next; next = (RE_AtomicBlock*)safe_alloc(safe_state, sizeof(RE_AtomicBlock)); if (!next) return NULL; next->previous = current; next->next = NULL; next->capacity = RE_ATOMIC_BLOCK_SIZE; if (current) /* The current block is the last one. */ current->next = next; else /* The new block is the first one. 
*/ state->current_atomic_block = next; current = next; } current->count = 0; } return &current->items[current->count++]; } /* Pops the top entry from the atomic stack. */ Py_LOCAL_INLINE(RE_AtomicData*) pop_atomic(RE_SafeState* safe_state) { RE_State* state; RE_AtomicBlock* current; RE_AtomicData* atomic; state = safe_state->re_state; current = state->current_atomic_block; atomic = &current->items[--current->count]; if (current->count == 0 && current->previous) state->current_atomic_block = current->previous; return atomic; } /* Gets the top entry from the atomic stack. */ Py_LOCAL_INLINE(RE_AtomicData*) top_atomic(RE_SafeState* safe_state) { RE_State* state; RE_AtomicBlock* current; state = safe_state->re_state; current = state->current_atomic_block; return &current->items[current->count - 1]; } /* Copies a repeat guard list. */ Py_LOCAL_INLINE(BOOL) copy_guard_data(RE_SafeState* safe_state, RE_GuardList* dst, RE_GuardList* src) { if (dst->capacity < src->count) { RE_GuardSpan* new_spans; if (!safe_state) return FALSE; dst->capacity = src->count; new_spans = (RE_GuardSpan*)safe_realloc(safe_state, dst->spans, dst->capacity * sizeof(RE_GuardSpan)); if (!new_spans) return FALSE; dst->spans = new_spans; } dst->count = src->count; memmove(dst->spans, src->spans, dst->count * sizeof(RE_GuardSpan)); dst->last_text_pos = -1; return TRUE; } /* Copies a repeat. */ Py_LOCAL_INLINE(BOOL) copy_repeat_data(RE_SafeState* safe_state, RE_RepeatData* dst, RE_RepeatData* src) { if (!copy_guard_data(safe_state, &dst->body_guard_list, &src->body_guard_list) || !copy_guard_data(safe_state, &dst->tail_guard_list, &src->tail_guard_list)) { safe_dealloc(safe_state, dst->body_guard_list.spans); safe_dealloc(safe_state, dst->tail_guard_list.spans); return FALSE; } dst->count = src->count; dst->start = src->start; dst->capture_change = src->capture_change; return TRUE; } /* Pushes a return node onto the group call stack. 
*/ Py_LOCAL_INLINE(BOOL) push_group_return(RE_SafeState* safe_state, RE_Node* return_node) { RE_State* state; PatternObject* pattern; RE_GroupCallFrame* frame; state = safe_state->re_state; pattern = state->pattern; if (state->current_group_call_frame && state->current_group_call_frame->next) /* Advance to the next allocated frame. */ frame = state->current_group_call_frame->next; else if (!state->current_group_call_frame && state->first_group_call_frame) /* Advance to the first allocated frame. */ frame = state->first_group_call_frame; else { /* Create a new frame. */ frame = (RE_GroupCallFrame*)safe_alloc(safe_state, sizeof(RE_GroupCallFrame)); if (!frame) return FALSE; frame->groups = (RE_GroupData*)safe_alloc(safe_state, pattern->true_group_count * sizeof(RE_GroupData)); frame->repeats = (RE_RepeatData*)safe_alloc(safe_state, pattern->repeat_count * sizeof(RE_RepeatData)); if (!frame->groups || !frame->repeats) { safe_dealloc(safe_state, frame->groups); safe_dealloc(safe_state, frame->repeats); safe_dealloc(safe_state, frame); return FALSE; } memset(frame->groups, 0, pattern->true_group_count * sizeof(RE_GroupData)); memset(frame->repeats, 0, pattern->repeat_count * sizeof(RE_RepeatData)); frame->previous = state->current_group_call_frame; frame->next = NULL; if (frame->previous) frame->previous->next = frame; else state->first_group_call_frame = frame; } frame->node = return_node; /* Push the groups and guards. */ if (return_node) { size_t g; size_t r; for (g = 0; g < pattern->true_group_count; g++) { frame->groups[g].span = state->groups[g].span; frame->groups[g].current_capture = state->groups[g].current_capture; } for (r = 0; r < pattern->repeat_count; r++) { if (!copy_repeat_data(safe_state, &frame->repeats[r], &state->repeats[r])) return FALSE; } } state->current_group_call_frame = frame; return TRUE; } /* Pops a return node from the group call stack. 
*/ Py_LOCAL_INLINE(RE_Node*) pop_group_return(RE_State* state) { RE_GroupCallFrame* frame; frame = state->current_group_call_frame; /* Pop the groups and repeats. */ if (frame->node) { PatternObject* pattern; size_t g; size_t r; pattern = state->pattern; for (g = 0; g < pattern->true_group_count; g++) { state->groups[g].span = frame->groups[g].span; state->groups[g].current_capture = frame->groups[g].current_capture; } for (r = 0; r < pattern->repeat_count; r++) copy_repeat_data(NULL, &state->repeats[r], &frame->repeats[r]); } /* Withdraw to previous frame. */ state->current_group_call_frame = frame->previous; return frame->node; } /* Returns the return node from the top of the group call stack. */ Py_LOCAL_INLINE(RE_Node*) top_group_return(RE_State* state) { RE_GroupCallFrame* frame; frame = state->current_group_call_frame; return frame->node; } /* Checks whether a node matches only 1 character. */ Py_LOCAL_INLINE(BOOL) node_matches_one_character(RE_Node* node) { switch (node->op) { case RE_OP_ANY: case RE_OP_ANY_ALL: case RE_OP_ANY_ALL_REV: case RE_OP_ANY_REV: case RE_OP_ANY_U: case RE_OP_ANY_U_REV: case RE_OP_CHARACTER: case RE_OP_CHARACTER_IGN: case RE_OP_CHARACTER_IGN_REV: case RE_OP_CHARACTER_REV: case RE_OP_PROPERTY: case RE_OP_PROPERTY_IGN: case RE_OP_PROPERTY_IGN_REV: case RE_OP_PROPERTY_REV: case RE_OP_RANGE: case RE_OP_RANGE_IGN: case RE_OP_RANGE_IGN_REV: case RE_OP_RANGE_REV: case RE_OP_SET_DIFF: case RE_OP_SET_DIFF_IGN: case RE_OP_SET_DIFF_IGN_REV: case RE_OP_SET_DIFF_REV: case RE_OP_SET_INTER: case RE_OP_SET_INTER_IGN: case RE_OP_SET_INTER_IGN_REV: case RE_OP_SET_INTER_REV: case RE_OP_SET_SYM_DIFF: case RE_OP_SET_SYM_DIFF_IGN: case RE_OP_SET_SYM_DIFF_IGN_REV: case RE_OP_SET_SYM_DIFF_REV: case RE_OP_SET_UNION: case RE_OP_SET_UNION_IGN: case RE_OP_SET_UNION_IGN_REV: case RE_OP_SET_UNION_REV: return TRUE; default: return FALSE; } } /* Checks whether the node is a firstset. 
*/ Py_LOCAL_INLINE(BOOL) is_firstset(RE_Node* node) { if (node->step != 0) return FALSE; return node_matches_one_character(node); } /* Locates the start node for testing ahead. */ Py_LOCAL_INLINE(RE_Node*) locate_test_start(RE_Node* node) { for (;;) { switch (node->op) { case RE_OP_BOUNDARY: switch (node->next_1.node->op) { case RE_OP_STRING: case RE_OP_STRING_FLD: case RE_OP_STRING_FLD_REV: case RE_OP_STRING_IGN: case RE_OP_STRING_IGN_REV: case RE_OP_STRING_REV: return node->next_1.node; default: return node; } case RE_OP_CALL_REF: case RE_OP_END_GROUP: case RE_OP_START_GROUP: node = node->next_1.node; break; case RE_OP_GREEDY_REPEAT: case RE_OP_LAZY_REPEAT: if (node->values[1] == 0) return node; node = node->next_1.node; break; case RE_OP_GREEDY_REPEAT_ONE: case RE_OP_LAZY_REPEAT_ONE: if (node->values[1] == 0) return node; return node->nonstring.next_2.node; case RE_OP_LOOKAROUND: node = node->nonstring.next_2.node; break; default: if (is_firstset(node)) { switch (node->next_1.node->op) { case RE_OP_END_OF_STRING: case RE_OP_START_OF_STRING: return node->next_1.node; } } return node; } } } /* Checks whether a character matches any of a set of case characters. */ Py_LOCAL_INLINE(BOOL) any_case(Py_UCS4 ch, int case_count, Py_UCS4* cases) { int i; for (i = 0; i < case_count; i++) { if (ch == cases[i]) return TRUE; } return FALSE; } /* Matches many ANYs, up to a limit. 
*/ Py_LOCAL_INLINE(Py_ssize_t) match_many_ANY(RE_State* state, RE_Node* node, Py_ssize_t text_pos, Py_ssize_t limit, BOOL match) { void* text; RE_EncodingTable* encoding; text = state->text; encoding = state->encoding; switch (state->charsize) { case 1: { Py_UCS1* text_ptr; Py_UCS1* limit_ptr; text_ptr = (Py_UCS1*)text + text_pos; limit_ptr = (Py_UCS1*)text + limit; while (text_ptr < limit_ptr && matches_ANY(encoding, node, text_ptr[0]) == match) ++text_ptr; text_pos = text_ptr - (Py_UCS1*)text; break; } case 2: { Py_UCS2* text_ptr; Py_UCS2* limit_ptr; text_ptr = (Py_UCS2*)text + text_pos; limit_ptr = (Py_UCS2*)text + limit; while (text_ptr < limit_ptr && matches_ANY(encoding, node, text_ptr[0]) == match) ++text_ptr; text_pos = text_ptr - (Py_UCS2*)text; break; } case 4: { Py_UCS4* text_ptr; Py_UCS4* limit_ptr; text_ptr = (Py_UCS4*)text + text_pos; limit_ptr = (Py_UCS4*)text + limit; while (text_ptr < limit_ptr && matches_ANY(encoding, node, text_ptr[0]) == match) ++text_ptr; text_pos = text_ptr - (Py_UCS4*)text; break; } } return text_pos; } /* Matches many ANYs, up to a limit, backwards. 
*/ Py_LOCAL_INLINE(Py_ssize_t) match_many_ANY_REV(RE_State* state, RE_Node* node, Py_ssize_t text_pos, Py_ssize_t limit, BOOL match) { void* text; RE_EncodingTable* encoding; text = state->text; encoding = state->encoding; switch (state->charsize) { case 1: { Py_UCS1* text_ptr; Py_UCS1* limit_ptr; text_ptr = (Py_UCS1*)text + text_pos; limit_ptr = (Py_UCS1*)text + limit; while (text_ptr > limit_ptr && matches_ANY(encoding, node, text_ptr[-1]) == match) --text_ptr; text_pos = text_ptr - (Py_UCS1*)text; break; } case 2: { Py_UCS2* text_ptr; Py_UCS2* limit_ptr; text_ptr = (Py_UCS2*)text + text_pos; limit_ptr = (Py_UCS2*)text + limit; while (text_ptr > limit_ptr && matches_ANY(encoding, node, text_ptr[-1]) == match) --text_ptr; text_pos = text_ptr - (Py_UCS2*)text; break; } case 4: { Py_UCS4* text_ptr; Py_UCS4* limit_ptr; text_ptr = (Py_UCS4*)text + text_pos; limit_ptr = (Py_UCS4*)text + limit; while (text_ptr > limit_ptr && matches_ANY(encoding, node, text_ptr[-1]) == match) --text_ptr; text_pos = text_ptr - (Py_UCS4*)text; break; } } return text_pos; } /* Matches many ANY_Us, up to a limit. 
*/ Py_LOCAL_INLINE(Py_ssize_t) match_many_ANY_U(RE_State* state, RE_Node* node, Py_ssize_t text_pos, Py_ssize_t limit, BOOL match) { void* text; RE_EncodingTable* encoding; text = state->text; encoding = state->encoding; switch (state->charsize) { case 1: { Py_UCS1* text_ptr; Py_UCS1* limit_ptr; text_ptr = (Py_UCS1*)text + text_pos; limit_ptr = (Py_UCS1*)text + limit; while (text_ptr < limit_ptr && matches_ANY_U(encoding, node, text_ptr[0]) == match) ++text_ptr; text_pos = text_ptr - (Py_UCS1*)text; break; } case 2: { Py_UCS2* text_ptr; Py_UCS2* limit_ptr; text_ptr = (Py_UCS2*)text + text_pos; limit_ptr = (Py_UCS2*)text + limit; while (text_ptr < limit_ptr && matches_ANY_U(encoding, node, text_ptr[0]) == match) ++text_ptr; text_pos = text_ptr - (Py_UCS2*)text; break; } case 4: { Py_UCS4* text_ptr; Py_UCS4* limit_ptr; text_ptr = (Py_UCS4*)text + text_pos; limit_ptr = (Py_UCS4*)text + limit; while (text_ptr < limit_ptr && matches_ANY_U(encoding, node, text_ptr[0]) == match) ++text_ptr; text_pos = text_ptr - (Py_UCS4*)text; break; } } return text_pos; } /* Matches many ANY_Us, up to a limit, backwards. 
*/ Py_LOCAL_INLINE(Py_ssize_t) match_many_ANY_U_REV(RE_State* state, RE_Node* node, Py_ssize_t text_pos, Py_ssize_t limit, BOOL match) { void* text; RE_EncodingTable* encoding; text = state->text; encoding = state->encoding; switch (state->charsize) { case 1: { Py_UCS1* text_ptr; Py_UCS1* limit_ptr; text_ptr = (Py_UCS1*)text + text_pos; limit_ptr = (Py_UCS1*)text + limit; while (text_ptr > limit_ptr && matches_ANY_U(encoding, node, text_ptr[-1]) == match) --text_ptr; text_pos = text_ptr - (Py_UCS1*)text; break; } case 2: { Py_UCS2* text_ptr; Py_UCS2* limit_ptr; text_ptr = (Py_UCS2*)text + text_pos; limit_ptr = (Py_UCS2*)text + limit; while (text_ptr > limit_ptr && matches_ANY_U(encoding, node, text_ptr[-1]) == match) --text_ptr; text_pos = text_ptr - (Py_UCS2*)text; break; } case 4: { Py_UCS4* text_ptr; Py_UCS4* limit_ptr; text_ptr = (Py_UCS4*)text + text_pos; limit_ptr = (Py_UCS4*)text + limit; while (text_ptr > limit_ptr && matches_ANY_U(encoding, node, text_ptr[-1]) == match) --text_ptr; text_pos = text_ptr - (Py_UCS4*)text; break; } } return text_pos; } /* Matches many CHARACTERs, up to a limit. 
*/ Py_LOCAL_INLINE(Py_ssize_t) match_many_CHARACTER(RE_State* state, RE_Node* node, Py_ssize_t text_pos, Py_ssize_t limit, BOOL match) { void* text; Py_UCS4 ch; text = state->text; match = node->match == match; ch = node->values[0]; switch (state->charsize) { case 1: { Py_UCS1* text_ptr; Py_UCS1* limit_ptr; text_ptr = (Py_UCS1*)text + text_pos; limit_ptr = (Py_UCS1*)text + limit; while (text_ptr < limit_ptr && (text_ptr[0] == ch) == match) ++text_ptr; text_pos = text_ptr - (Py_UCS1*)text; break; } case 2: { Py_UCS2* text_ptr; Py_UCS2* limit_ptr; text_ptr = (Py_UCS2*)text + text_pos; limit_ptr = (Py_UCS2*)text + limit; while (text_ptr < limit_ptr && (text_ptr[0] == ch) == match) ++text_ptr; text_pos = text_ptr - (Py_UCS2*)text; break; } case 4: { Py_UCS4* text_ptr; Py_UCS4* limit_ptr; text_ptr = (Py_UCS4*)text + text_pos; limit_ptr = (Py_UCS4*)text + limit; while (text_ptr < limit_ptr && (text_ptr[0] == ch) == match) ++text_ptr; text_pos = text_ptr - (Py_UCS4*)text; break; } } return text_pos; } /* Matches many CHARACTERs, up to a limit, ignoring case. 
*/ Py_LOCAL_INLINE(Py_ssize_t) match_many_CHARACTER_IGN(RE_State* state, RE_Node* node, Py_ssize_t text_pos, Py_ssize_t limit, BOOL match) { void* text; Py_UCS4 cases[RE_MAX_CASES]; int case_count; text = state->text; match = node->match == match; case_count = state->encoding->all_cases(state->locale_info, node->values[0], cases); switch (state->charsize) { case 1: { Py_UCS1* text_ptr; Py_UCS1* limit_ptr; text_ptr = (Py_UCS1*)text + text_pos; limit_ptr = (Py_UCS1*)text + limit; while (text_ptr < limit_ptr && any_case(text_ptr[0], case_count, cases) == match) ++text_ptr; text_pos = text_ptr - (Py_UCS1*)text; break; } case 2: { Py_UCS2* text_ptr; Py_UCS2* limit_ptr; text_ptr = (Py_UCS2*)text + text_pos; limit_ptr = (Py_UCS2*)text + limit; while (text_ptr < limit_ptr && any_case(text_ptr[0], case_count, cases) == match) ++text_ptr; text_pos = text_ptr - (Py_UCS2*)text; break; } case 4: { Py_UCS4* text_ptr; Py_UCS4* limit_ptr; text_ptr = (Py_UCS4*)text + text_pos; limit_ptr = (Py_UCS4*)text + limit; while (text_ptr < limit_ptr && any_case(text_ptr[0], case_count, cases) == match) ++text_ptr; text_pos = text_ptr - (Py_UCS4*)text; break; } } return text_pos; } /* Matches many CHARACTERs, up to a limit, backwards, ignoring case. 
*/ Py_LOCAL_INLINE(Py_ssize_t) match_many_CHARACTER_IGN_REV(RE_State* state, RE_Node* node, Py_ssize_t text_pos, Py_ssize_t limit, BOOL match) { void* text; Py_UCS4 cases[RE_MAX_CASES]; int case_count; text = state->text; match = node->match == match; case_count = state->encoding->all_cases(state->locale_info, node->values[0], cases); switch (state->charsize) { case 1: { Py_UCS1* text_ptr; Py_UCS1* limit_ptr; text_ptr = (Py_UCS1*)text + text_pos; limit_ptr = (Py_UCS1*)text + limit; while (text_ptr > limit_ptr && any_case(text_ptr[-1], case_count, cases) == match) --text_ptr; text_pos = text_ptr - (Py_UCS1*)text; break; } case 2: { Py_UCS2* text_ptr; Py_UCS2* limit_ptr; text_ptr = (Py_UCS2*)text + text_pos; limit_ptr = (Py_UCS2*)text + limit; while (text_ptr > limit_ptr && any_case(text_ptr[-1], case_count, cases) == match) --text_ptr; text_pos = text_ptr - (Py_UCS2*)text; break; } case 4: { Py_UCS4* text_ptr; Py_UCS4* limit_ptr; text_ptr = (Py_UCS4*)text + text_pos; limit_ptr = (Py_UCS4*)text + limit; while (text_ptr > limit_ptr && any_case(text_ptr[-1], case_count, cases) == match) --text_ptr; text_pos = text_ptr - (Py_UCS4*)text; break; } } return text_pos; } /* Matches many CHARACTERs, up to a limit, backwards. 
*/ Py_LOCAL_INLINE(Py_ssize_t) match_many_CHARACTER_REV(RE_State* state, RE_Node* node, Py_ssize_t text_pos, Py_ssize_t limit, BOOL match) { void* text; Py_UCS4 ch; text = state->text; match = node->match == match; ch = node->values[0]; switch (state->charsize) { case 1: { Py_UCS1* text_ptr; Py_UCS1* limit_ptr; text_ptr = (Py_UCS1*)text + text_pos; limit_ptr = (Py_UCS1*)text + limit; while (text_ptr > limit_ptr && (text_ptr[-1] == ch) == match) --text_ptr; text_pos = text_ptr - (Py_UCS1*)text; break; } case 2: { Py_UCS2* text_ptr; Py_UCS2* limit_ptr; text_ptr = (Py_UCS2*)text + text_pos; limit_ptr = (Py_UCS2*)text + limit; while (text_ptr > limit_ptr && (text_ptr[-1] == ch) == match) --text_ptr; text_pos = text_ptr - (Py_UCS2*)text; break; } case 4: { Py_UCS4* text_ptr; Py_UCS4* limit_ptr; text_ptr = (Py_UCS4*)text + text_pos; limit_ptr = (Py_UCS4*)text + limit; while (text_ptr > limit_ptr && (text_ptr[-1] == ch) == match) --text_ptr; text_pos = text_ptr - (Py_UCS4*)text; break; } } return text_pos; } /* Matches many PROPERTYs, up to a limit. 
*/ Py_LOCAL_INLINE(Py_ssize_t) match_many_PROPERTY(RE_State* state, RE_Node* node, Py_ssize_t text_pos, Py_ssize_t limit, BOOL match) { void* text; RE_EncodingTable* encoding; RE_LocaleInfo* locale_info; text = state->text; match = node->match == match; encoding = state->encoding; locale_info = state->locale_info; switch (state->charsize) { case 1: { Py_UCS1* text_ptr; Py_UCS1* limit_ptr; text_ptr = (Py_UCS1*)text + text_pos; limit_ptr = (Py_UCS1*)text + limit; while (text_ptr < limit_ptr && matches_PROPERTY(encoding, locale_info, node, text_ptr[0]) == match) ++text_ptr; text_pos = text_ptr - (Py_UCS1*)text; break; } case 2: { Py_UCS2* text_ptr; Py_UCS2* limit_ptr; text_ptr = (Py_UCS2*)text + text_pos; limit_ptr = (Py_UCS2*)text + limit; while (text_ptr < limit_ptr && matches_PROPERTY(encoding, locale_info, node, text_ptr[0]) == match) ++text_ptr; text_pos = text_ptr - (Py_UCS2*)text; break; } case 4: { Py_UCS4* text_ptr; Py_UCS4* limit_ptr; text_ptr = (Py_UCS4*)text + text_pos; limit_ptr = (Py_UCS4*)text + limit; while (text_ptr < limit_ptr && matches_PROPERTY(encoding, locale_info, node, text_ptr[0]) == match) ++text_ptr; text_pos = text_ptr - (Py_UCS4*)text; break; } } return text_pos; } /* Matches many PROPERTYs, up to a limit, ignoring case. 
*/ Py_LOCAL_INLINE(Py_ssize_t) match_many_PROPERTY_IGN(RE_State* state, RE_Node* node, Py_ssize_t text_pos, Py_ssize_t limit, BOOL match) { void* text; RE_EncodingTable* encoding; RE_LocaleInfo* locale_info; text = state->text; match = node->match == match; encoding = state->encoding; locale_info = state->locale_info; switch (state->charsize) { case 1: { Py_UCS1* text_ptr; Py_UCS1* limit_ptr; text_ptr = (Py_UCS1*)text + text_pos; limit_ptr = (Py_UCS1*)text + limit; while (text_ptr < limit_ptr && matches_PROPERTY_IGN(encoding, locale_info, node, text_ptr[0]) == match) ++text_ptr; text_pos = text_ptr - (Py_UCS1*)text; break; } case 2: { Py_UCS2* text_ptr; Py_UCS2* limit_ptr; text_ptr = (Py_UCS2*)text + text_pos; limit_ptr = (Py_UCS2*)text + limit; while (text_ptr < limit_ptr && matches_PROPERTY_IGN(encoding, locale_info, node, text_ptr[0]) == match) ++text_ptr; text_pos = text_ptr - (Py_UCS2*)text; break; } case 4: { Py_UCS4* text_ptr; Py_UCS4* limit_ptr; text_ptr = (Py_UCS4*)text + text_pos; limit_ptr = (Py_UCS4*)text + limit; while (text_ptr < limit_ptr && matches_PROPERTY_IGN(encoding, locale_info, node, text_ptr[0]) == match) ++text_ptr; text_pos = text_ptr - (Py_UCS4*)text; break; } } return text_pos; } /* Matches many PROPERTYs, up to a limit, backwards, ignoring case. 
*/ Py_LOCAL_INLINE(Py_ssize_t) match_many_PROPERTY_IGN_REV(RE_State* state, RE_Node* node, Py_ssize_t text_pos, Py_ssize_t limit, BOOL match) { void* text; RE_EncodingTable* encoding; RE_LocaleInfo* locale_info; text = state->text; match = node->match == match; encoding = state->encoding; locale_info = state->locale_info; switch (state->charsize) { case 1: { Py_UCS1* text_ptr; Py_UCS1* limit_ptr; text_ptr = (Py_UCS1*)text + text_pos; limit_ptr = (Py_UCS1*)text + limit; while (text_ptr > limit_ptr && matches_PROPERTY_IGN(encoding, locale_info, node, text_ptr[-1]) == match) --text_ptr; text_pos = text_ptr - (Py_UCS1*)text; break; } case 2: { Py_UCS2* text_ptr; Py_UCS2* limit_ptr; text_ptr = (Py_UCS2*)text + text_pos; limit_ptr = (Py_UCS2*)text + limit; while (text_ptr > limit_ptr && matches_PROPERTY_IGN(encoding, locale_info, node, text_ptr[-1]) == match) --text_ptr; text_pos = text_ptr - (Py_UCS2*)text; break; } case 4: { Py_UCS4* text_ptr; Py_UCS4* limit_ptr; text_ptr = (Py_UCS4*)text + text_pos; limit_ptr = (Py_UCS4*)text + limit; while (text_ptr > limit_ptr && matches_PROPERTY_IGN(encoding, locale_info, node, text_ptr[-1]) == match) --text_ptr; text_pos = text_ptr - (Py_UCS4*)text; break; } } return text_pos; } /* Matches many PROPERTYs, up to a limit, backwards. 
*/ Py_LOCAL_INLINE(Py_ssize_t) match_many_PROPERTY_REV(RE_State* state, RE_Node* node, Py_ssize_t text_pos, Py_ssize_t limit, BOOL match) { void* text; RE_EncodingTable* encoding; RE_LocaleInfo* locale_info; text = state->text; match = node->match == match; encoding = state->encoding; locale_info = state->locale_info; switch (state->charsize) { case 1: { Py_UCS1* text_ptr; Py_UCS1* limit_ptr; text_ptr = (Py_UCS1*)text + text_pos; limit_ptr = (Py_UCS1*)text + limit; while (text_ptr > limit_ptr && matches_PROPERTY(encoding, locale_info, node, text_ptr[-1]) == match) --text_ptr; text_pos = text_ptr - (Py_UCS1*)text; break; } case 2: { Py_UCS2* text_ptr; Py_UCS2* limit_ptr; text_ptr = (Py_UCS2*)text + text_pos; limit_ptr = (Py_UCS2*)text + limit; while (text_ptr > limit_ptr && matches_PROPERTY(encoding, locale_info, node, text_ptr[-1]) == match) --text_ptr; text_pos = text_ptr - (Py_UCS2*)text; break; } case 4: { Py_UCS4* text_ptr; Py_UCS4* limit_ptr; text_ptr = (Py_UCS4*)text + text_pos; limit_ptr = (Py_UCS4*)text + limit; while (text_ptr > limit_ptr && matches_PROPERTY(encoding, locale_info, node, text_ptr[-1]) == match) --text_ptr; text_pos = text_ptr - (Py_UCS4*)text; break; } } return text_pos; } /* Matches many RANGEs, up to a limit. 
*/
Py_LOCAL_INLINE(Py_ssize_t) match_many_RANGE(RE_State* state, RE_Node* node,
  Py_ssize_t text_pos, Py_ssize_t limit, BOOL match) {
    void* text;
    RE_EncodingTable* encoding;
    RE_LocaleInfo* locale_info;

    text = state->text;
    match = node->match == match;
    encoding = state->encoding;
    locale_info = state->locale_info;

    switch (state->charsize) {
    case 1:
    {
        Py_UCS1* text_ptr;
        Py_UCS1* limit_ptr;

        text_ptr = (Py_UCS1*)text + text_pos;
        limit_ptr = (Py_UCS1*)text + limit;

        while (text_ptr < limit_ptr && matches_RANGE(encoding, locale_info,
          node, text_ptr[0]) == match)
            ++text_ptr;

        text_pos = text_ptr - (Py_UCS1*)text;
        break;
    }
    case 2:
    {
        Py_UCS2* text_ptr;
        Py_UCS2* limit_ptr;

        text_ptr = (Py_UCS2*)text + text_pos;
        limit_ptr = (Py_UCS2*)text + limit;

        while (text_ptr < limit_ptr && matches_RANGE(encoding, locale_info,
          node, text_ptr[0]) == match)
            ++text_ptr;

        text_pos = text_ptr - (Py_UCS2*)text;
        break;
    }
    case 4:
    {
        Py_UCS4* text_ptr;
        Py_UCS4* limit_ptr;

        text_ptr = (Py_UCS4*)text + text_pos;
        limit_ptr = (Py_UCS4*)text + limit;

        while (text_ptr < limit_ptr && matches_RANGE(encoding, locale_info,
          node, text_ptr[0]) == match)
            ++text_ptr;

        text_pos = text_ptr - (Py_UCS4*)text;
        break;
    }
    }

    return text_pos;
}

/* Matches many RANGEs, up to a limit, ignoring case.
*/
Py_LOCAL_INLINE(Py_ssize_t) match_many_RANGE_IGN(RE_State* state, RE_Node*
  node, Py_ssize_t text_pos, Py_ssize_t limit, BOOL match) {
    void* text;
    RE_EncodingTable* encoding;
    RE_LocaleInfo* locale_info;

    text = state->text;
    match = node->match == match;
    encoding = state->encoding;
    locale_info = state->locale_info;

    switch (state->charsize) {
    case 1:
    {
        Py_UCS1* text_ptr;
        Py_UCS1* limit_ptr;

        text_ptr = (Py_UCS1*)text + text_pos;
        limit_ptr = (Py_UCS1*)text + limit;

        while (text_ptr < limit_ptr && matches_RANGE_IGN(encoding, locale_info,
          node, text_ptr[0]) == match)
            ++text_ptr;

        text_pos = text_ptr - (Py_UCS1*)text;
        break;
    }
    case 2:
    {
        Py_UCS2* text_ptr;
        Py_UCS2* limit_ptr;

        text_ptr = (Py_UCS2*)text + text_pos;
        limit_ptr = (Py_UCS2*)text + limit;

        while (text_ptr < limit_ptr && matches_RANGE_IGN(encoding, locale_info,
          node, text_ptr[0]) == match)
            ++text_ptr;

        text_pos = text_ptr - (Py_UCS2*)text;
        break;
    }
    case 4:
    {
        Py_UCS4* text_ptr;
        Py_UCS4* limit_ptr;

        text_ptr = (Py_UCS4*)text + text_pos;
        limit_ptr = (Py_UCS4*)text + limit;

        while (text_ptr < limit_ptr && matches_RANGE_IGN(encoding, locale_info,
          node, text_ptr[0]) == match)
            ++text_ptr;

        text_pos = text_ptr - (Py_UCS4*)text;
        break;
    }
    }

    return text_pos;
}

/* Matches many RANGEs, up to a limit, backwards, ignoring case.
*/
Py_LOCAL_INLINE(Py_ssize_t) match_many_RANGE_IGN_REV(RE_State* state, RE_Node*
  node, Py_ssize_t text_pos, Py_ssize_t limit, BOOL match) {
    void* text;
    RE_EncodingTable* encoding;
    RE_LocaleInfo* locale_info;

    text = state->text;
    match = node->match == match;
    encoding = state->encoding;
    locale_info = state->locale_info;

    switch (state->charsize) {
    case 1:
    {
        Py_UCS1* text_ptr;
        Py_UCS1* limit_ptr;

        text_ptr = (Py_UCS1*)text + text_pos;
        limit_ptr = (Py_UCS1*)text + limit;

        while (text_ptr > limit_ptr && matches_RANGE_IGN(encoding, locale_info,
          node, text_ptr[-1]) == match)
            --text_ptr;

        text_pos = text_ptr - (Py_UCS1*)text;
        break;
    }
    case 2:
    {
        Py_UCS2* text_ptr;
        Py_UCS2* limit_ptr;

        text_ptr = (Py_UCS2*)text + text_pos;
        limit_ptr = (Py_UCS2*)text + limit;

        while (text_ptr > limit_ptr && matches_RANGE_IGN(encoding, locale_info,
          node, text_ptr[-1]) == match)
            --text_ptr;

        text_pos = text_ptr - (Py_UCS2*)text;
        break;
    }
    case 4:
    {
        Py_UCS4* text_ptr;
        Py_UCS4* limit_ptr;

        text_ptr = (Py_UCS4*)text + text_pos;
        limit_ptr = (Py_UCS4*)text + limit;

        while (text_ptr > limit_ptr && matches_RANGE_IGN(encoding, locale_info,
          node, text_ptr[-1]) == match)
            --text_ptr;

        text_pos = text_ptr - (Py_UCS4*)text;
        break;
    }
    }

    return text_pos;
}

/* Matches many RANGEs, up to a limit, backwards.
*/
Py_LOCAL_INLINE(Py_ssize_t) match_many_RANGE_REV(RE_State* state, RE_Node*
  node, Py_ssize_t text_pos, Py_ssize_t limit, BOOL match) {
    void* text;
    RE_EncodingTable* encoding;
    RE_LocaleInfo* locale_info;

    text = state->text;
    match = node->match == match;
    encoding = state->encoding;
    locale_info = state->locale_info;

    switch (state->charsize) {
    case 1:
    {
        Py_UCS1* text_ptr;
        Py_UCS1* limit_ptr;

        text_ptr = (Py_UCS1*)text + text_pos;
        limit_ptr = (Py_UCS1*)text + limit;

        while (text_ptr > limit_ptr && matches_RANGE(encoding, locale_info,
          node, text_ptr[-1]) == match)
            --text_ptr;

        text_pos = text_ptr - (Py_UCS1*)text;
        break;
    }
    case 2:
    {
        Py_UCS2* text_ptr;
        Py_UCS2* limit_ptr;

        text_ptr = (Py_UCS2*)text + text_pos;
        limit_ptr = (Py_UCS2*)text + limit;

        while (text_ptr > limit_ptr && matches_RANGE(encoding, locale_info,
          node, text_ptr[-1]) == match)
            --text_ptr;

        text_pos = text_ptr - (Py_UCS2*)text;
        break;
    }
    case 4:
    {
        Py_UCS4* text_ptr;
        Py_UCS4* limit_ptr;

        text_ptr = (Py_UCS4*)text + text_pos;
        limit_ptr = (Py_UCS4*)text + limit;

        while (text_ptr > limit_ptr && matches_RANGE(encoding, locale_info,
          node, text_ptr[-1]) == match)
            --text_ptr;

        text_pos = text_ptr - (Py_UCS4*)text;
        break;
    }
    }

    return text_pos;
}

/* Matches many SETs, up to a limit.
*/
Py_LOCAL_INLINE(Py_ssize_t) match_many_SET(RE_State* state, RE_Node* node,
  Py_ssize_t text_pos, Py_ssize_t limit, BOOL match) {
    void* text;
    RE_EncodingTable* encoding;
    RE_LocaleInfo* locale_info;

    text = state->text;
    match = node->match == match;
    encoding = state->encoding;
    locale_info = state->locale_info;

    switch (state->charsize) {
    case 1:
    {
        Py_UCS1* text_ptr;
        Py_UCS1* limit_ptr;

        text_ptr = (Py_UCS1*)text + text_pos;
        limit_ptr = (Py_UCS1*)text + limit;

        while (text_ptr < limit_ptr && matches_SET(encoding, locale_info, node,
          text_ptr[0]) == match)
            ++text_ptr;

        text_pos = text_ptr - (Py_UCS1*)text;
        break;
    }
    case 2:
    {
        Py_UCS2* text_ptr;
        Py_UCS2* limit_ptr;

        text_ptr = (Py_UCS2*)text + text_pos;
        limit_ptr = (Py_UCS2*)text + limit;

        while (text_ptr < limit_ptr && matches_SET(encoding, locale_info, node,
          text_ptr[0]) == match)
            ++text_ptr;

        text_pos = text_ptr - (Py_UCS2*)text;
        break;
    }
    case 4:
    {
        Py_UCS4* text_ptr;
        Py_UCS4* limit_ptr;

        text_ptr = (Py_UCS4*)text + text_pos;
        limit_ptr = (Py_UCS4*)text + limit;

        while (text_ptr < limit_ptr && matches_SET(encoding, locale_info, node,
          text_ptr[0]) == match)
            ++text_ptr;

        text_pos = text_ptr - (Py_UCS4*)text;
        break;
    }
    }

    return text_pos;
}

/* Matches many SETs, up to a limit, ignoring case.
*/
Py_LOCAL_INLINE(Py_ssize_t) match_many_SET_IGN(RE_State* state, RE_Node* node,
  Py_ssize_t text_pos, Py_ssize_t limit, BOOL match) {
    void* text;
    RE_EncodingTable* encoding;
    RE_LocaleInfo* locale_info;

    text = state->text;
    match = node->match == match;
    encoding = state->encoding;
    locale_info = state->locale_info;

    switch (state->charsize) {
    case 1:
    {
        Py_UCS1* text_ptr;
        Py_UCS1* limit_ptr;

        text_ptr = (Py_UCS1*)text + text_pos;
        limit_ptr = (Py_UCS1*)text + limit;

        while (text_ptr < limit_ptr && matches_SET_IGN(encoding, locale_info,
          node, text_ptr[0]) == match)
            ++text_ptr;

        text_pos = text_ptr - (Py_UCS1*)text;
        break;
    }
    case 2:
    {
        Py_UCS2* text_ptr;
        Py_UCS2* limit_ptr;

        text_ptr = (Py_UCS2*)text + text_pos;
        limit_ptr = (Py_UCS2*)text + limit;

        while (text_ptr < limit_ptr && matches_SET_IGN(encoding, locale_info,
          node, text_ptr[0]) == match)
            ++text_ptr;

        text_pos = text_ptr - (Py_UCS2*)text;
        break;
    }
    case 4:
    {
        Py_UCS4* text_ptr;
        Py_UCS4* limit_ptr;

        text_ptr = (Py_UCS4*)text + text_pos;
        limit_ptr = (Py_UCS4*)text + limit;

        while (text_ptr < limit_ptr && matches_SET_IGN(encoding, locale_info,
          node, text_ptr[0]) == match)
            ++text_ptr;

        text_pos = text_ptr - (Py_UCS4*)text;
        break;
    }
    }

    return text_pos;
}

/* Matches many SETs, up to a limit, backwards, ignoring case.
*/
Py_LOCAL_INLINE(Py_ssize_t) match_many_SET_IGN_REV(RE_State* state, RE_Node*
  node, Py_ssize_t text_pos, Py_ssize_t limit, BOOL match) {
    void* text;
    RE_EncodingTable* encoding;
    RE_LocaleInfo* locale_info;

    text = state->text;
    match = node->match == match;
    encoding = state->encoding;
    locale_info = state->locale_info;

    switch (state->charsize) {
    case 1:
    {
        Py_UCS1* text_ptr;
        Py_UCS1* limit_ptr;

        text_ptr = (Py_UCS1*)text + text_pos;
        limit_ptr = (Py_UCS1*)text + limit;

        while (text_ptr > limit_ptr && matches_SET_IGN(encoding, locale_info,
          node, text_ptr[-1]) == match)
            --text_ptr;

        text_pos = text_ptr - (Py_UCS1*)text;
        break;
    }
    case 2:
    {
        Py_UCS2* text_ptr;
        Py_UCS2* limit_ptr;

        text_ptr = (Py_UCS2*)text + text_pos;
        limit_ptr = (Py_UCS2*)text + limit;

        while (text_ptr > limit_ptr && matches_SET_IGN(encoding, locale_info,
          node, text_ptr[-1]) == match)
            --text_ptr;

        text_pos = text_ptr - (Py_UCS2*)text;
        break;
    }
    case 4:
    {
        Py_UCS4* text_ptr;
        Py_UCS4* limit_ptr;

        text_ptr = (Py_UCS4*)text + text_pos;
        limit_ptr = (Py_UCS4*)text + limit;

        while (text_ptr > limit_ptr && matches_SET_IGN(encoding, locale_info,
          node, text_ptr[-1]) == match)
            --text_ptr;

        text_pos = text_ptr - (Py_UCS4*)text;
        break;
    }
    }

    return text_pos;
}

/* Matches many SETs, up to a limit, backwards.
*/
Py_LOCAL_INLINE(Py_ssize_t) match_many_SET_REV(RE_State* state, RE_Node* node,
  Py_ssize_t text_pos, Py_ssize_t limit, BOOL match) {
    void* text;
    RE_EncodingTable* encoding;
    RE_LocaleInfo* locale_info;

    text = state->text;
    match = node->match == match;
    encoding = state->encoding;
    locale_info = state->locale_info;

    switch (state->charsize) {
    case 1:
    {
        Py_UCS1* text_ptr;
        Py_UCS1* limit_ptr;

        text_ptr = (Py_UCS1*)text + text_pos;
        limit_ptr = (Py_UCS1*)text + limit;

        while (text_ptr > limit_ptr && matches_SET(encoding, locale_info, node,
          text_ptr[-1]) == match)
            --text_ptr;

        text_pos = text_ptr - (Py_UCS1*)text;
        break;
    }
    case 2:
    {
        Py_UCS2* text_ptr;
        Py_UCS2* limit_ptr;

        text_ptr = (Py_UCS2*)text + text_pos;
        limit_ptr = (Py_UCS2*)text + limit;

        while (text_ptr > limit_ptr && matches_SET(encoding, locale_info, node,
          text_ptr[-1]) == match)
            --text_ptr;

        text_pos = text_ptr - (Py_UCS2*)text;
        break;
    }
    case 4:
    {
        Py_UCS4* text_ptr;
        Py_UCS4* limit_ptr;

        text_ptr = (Py_UCS4*)text + text_pos;
        limit_ptr = (Py_UCS4*)text + limit;

        while (text_ptr > limit_ptr && matches_SET(encoding, locale_info, node,
          text_ptr[-1]) == match)
            --text_ptr;

        text_pos = text_ptr - (Py_UCS4*)text;
        break;
    }
    }

    return text_pos;
}

/* Counts a repeated character pattern.
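
Illustrative sketch only, not part of this module. It shows, under simplifying
assumptions, the clamp-then-scan idea that count_one applies to a forward
single-character repeat: cap the count by the slice end and by max_count, scan,
then flag a partial match if the run reached the end of the text. SketchState
and sketch_count_char are hypothetical names, the text is assumed to use 1 byte
per character, and the partial_right flag stands in for
partial_side == RE_PARTIAL_RIGHT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// Hypothetical stand-in for the matcher state. The field names mirror
// RE_State, but this struct is an assumption made for the sketch.
typedef struct {
    const unsigned char* text;
    ptrdiff_t text_length;
    ptrdiff_t slice_end;
    bool partial_right;
} SketchState;

// Counts how often 'ch' repeats from text_pos, capped by max_count and by
// the end of the slice.
static size_t sketch_count_char(const SketchState* state, unsigned char ch,
  ptrdiff_t text_pos, size_t max_count, bool* is_partial) {
    size_t available;
    size_t count;

    assert(text_pos >= 0 && text_pos <= state->slice_end);

    *is_partial = false;

    if (max_count < 1)
        return 0;

    // Never look past the slice, and never count more than requested.
    available = (size_t)(state->slice_end - text_pos);
    if (available > max_count)
        available = max_count;

    count = 0;
    while (count < available &&
      state->text[text_pos + (ptrdiff_t)count] == ch)
        ++count;

    // The run is partial if it consumed all remaining text while more
    // repeats were still allowed.
    *is_partial = count == (size_t)(state->text_length - text_pos) &&
      count < max_count && state->partial_right;

    return count;
}
```

With the text aaab and a max_count of 10, the sketch counts the three leading
a characters and reports no partial match, because the run stops before the end
of the text.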
*/ Py_LOCAL_INLINE(size_t) count_one(RE_State* state, RE_Node* node, Py_ssize_t text_pos, size_t max_count, BOOL* is_partial) { size_t count; *is_partial = FALSE; if (max_count < 1) return 0; switch (node->op) { case RE_OP_ANY: count = min_size_t((size_t)(state->slice_end - text_pos), max_count); count = (size_t)(match_many_ANY(state, node, text_pos, text_pos + (Py_ssize_t)count, TRUE) - text_pos); *is_partial = count == (size_t)(state->text_length - text_pos) && count < max_count && state->partial_side == RE_PARTIAL_RIGHT; return count; case RE_OP_ANY_ALL: count = min_size_t((size_t)(state->slice_end - text_pos), max_count); *is_partial = count == (size_t)(state->text_length - text_pos) && count < max_count && state->partial_side == RE_PARTIAL_RIGHT; return count; case RE_OP_ANY_ALL_REV: count = min_size_t((size_t)(text_pos - state->slice_start), max_count); *is_partial = count == (size_t)(text_pos) && count < max_count && state->partial_side == RE_PARTIAL_LEFT; return count; case RE_OP_ANY_REV: count = min_size_t((size_t)(text_pos - state->slice_start), max_count); count = (size_t)(text_pos - match_many_ANY_REV(state, node, text_pos, text_pos - (Py_ssize_t)count, TRUE)); *is_partial = count == (size_t)(text_pos) && count < max_count && state->partial_side == RE_PARTIAL_LEFT; return count; case RE_OP_ANY_U: count = min_size_t((size_t)(state->slice_end - text_pos), max_count); count = (size_t)(match_many_ANY_U(state, node, text_pos, text_pos + (Py_ssize_t)count, TRUE) - text_pos); *is_partial = count == (size_t)(state->text_length - text_pos) && count < max_count && state->partial_side == RE_PARTIAL_RIGHT; return count; case RE_OP_ANY_U_REV: count = min_size_t((size_t)(text_pos - state->slice_start), max_count); count = (size_t)(text_pos - match_many_ANY_U_REV(state, node, text_pos, text_pos - (Py_ssize_t)count, TRUE)); *is_partial = count == (size_t)(text_pos) && count < max_count && state->partial_side == RE_PARTIAL_LEFT; return count; case RE_OP_CHARACTER: count 
= min_size_t((size_t)(state->slice_end - text_pos), max_count); count = (size_t)(match_many_CHARACTER(state, node, text_pos, text_pos + (Py_ssize_t)count, TRUE) - text_pos); *is_partial = count == (size_t)(state->text_length - text_pos) && count < max_count && state->partial_side == RE_PARTIAL_RIGHT; return count; case RE_OP_CHARACTER_IGN: count = min_size_t((size_t)(state->slice_end - text_pos), max_count); count = (size_t)(match_many_CHARACTER_IGN(state, node, text_pos, text_pos + (Py_ssize_t)count, TRUE) - text_pos); *is_partial = count == (size_t)(state->text_length - text_pos) && count < max_count && state->partial_side == RE_PARTIAL_RIGHT; return count; case RE_OP_CHARACTER_IGN_REV: count = min_size_t((size_t)(text_pos - state->slice_start), max_count); count = (size_t)(text_pos - match_many_CHARACTER_IGN_REV(state, node, text_pos, text_pos - (Py_ssize_t)count, TRUE)); *is_partial = count == (size_t)(text_pos) && count < max_count && state->partial_side == RE_PARTIAL_LEFT; return count; case RE_OP_CHARACTER_REV: count = min_size_t((size_t)(text_pos - state->slice_start), max_count); count = (size_t)(text_pos - match_many_CHARACTER_REV(state, node, text_pos, text_pos - (Py_ssize_t)count, TRUE)); *is_partial = count == (size_t)(text_pos) && count < max_count && state->partial_side == RE_PARTIAL_LEFT; return count; case RE_OP_PROPERTY: count = min_size_t((size_t)(state->slice_end - text_pos), max_count); count = (size_t)(match_many_PROPERTY(state, node, text_pos, text_pos + (Py_ssize_t)count, TRUE) - text_pos); *is_partial = count == (size_t)(state->text_length - text_pos) && count < max_count && state->partial_side == RE_PARTIAL_RIGHT; return count; case RE_OP_PROPERTY_IGN: count = min_size_t((size_t)(state->slice_end - text_pos), max_count); count = (size_t)(match_many_PROPERTY_IGN(state, node, text_pos, text_pos + (Py_ssize_t)count, TRUE) - text_pos); *is_partial = count == (size_t)(state->text_length - text_pos) && count < max_count && state->partial_side == 
RE_PARTIAL_RIGHT; return count; case RE_OP_PROPERTY_IGN_REV: count = min_size_t((size_t)(text_pos - state->slice_start), max_count); count = (size_t)(text_pos - match_many_PROPERTY_IGN_REV(state, node, text_pos, text_pos - (Py_ssize_t)count, TRUE)); *is_partial = count == (size_t)(text_pos) && count < max_count && state->partial_side == RE_PARTIAL_LEFT; return count; case RE_OP_PROPERTY_REV: count = min_size_t((size_t)(text_pos - state->slice_start), max_count); count = (size_t)(text_pos - match_many_PROPERTY_REV(state, node, text_pos, text_pos - (Py_ssize_t)count, TRUE)); *is_partial = count == (size_t)(text_pos) && count < max_count && state->partial_side == RE_PARTIAL_LEFT; return count; case RE_OP_RANGE: count = min_size_t((size_t)(state->slice_end - text_pos), max_count); count = (size_t)(match_many_RANGE(state, node, text_pos, text_pos + (Py_ssize_t)count, TRUE) - text_pos); *is_partial = count == (size_t)(state->text_length - text_pos) && count < max_count && state->partial_side == RE_PARTIAL_RIGHT; return count; case RE_OP_RANGE_IGN: count = min_size_t((size_t)(state->slice_end - text_pos), max_count); count = (size_t)(match_many_RANGE_IGN(state, node, text_pos, text_pos + (Py_ssize_t)count, TRUE) - text_pos); *is_partial = count == (size_t)(state->text_length - text_pos) && count < max_count && state->partial_side == RE_PARTIAL_RIGHT; return count; case RE_OP_RANGE_IGN_REV: count = min_size_t((size_t)(text_pos - state->slice_start), max_count); count = (size_t)(text_pos - match_many_RANGE_IGN_REV(state, node, text_pos, text_pos - (Py_ssize_t)count, TRUE)); *is_partial = count == (size_t)(text_pos) && count < max_count && state->partial_side == RE_PARTIAL_LEFT; return count; case RE_OP_RANGE_REV: count = min_size_t((size_t)(text_pos - state->slice_start), max_count); count = (size_t)(text_pos - match_many_RANGE_REV(state, node, text_pos, text_pos - (Py_ssize_t)count, TRUE)); *is_partial = count == (size_t)(text_pos) && count < max_count && 
state->partial_side == RE_PARTIAL_LEFT; return count; case RE_OP_SET_DIFF: case RE_OP_SET_INTER: case RE_OP_SET_SYM_DIFF: case RE_OP_SET_UNION: count = min_size_t((size_t)(state->slice_end - text_pos), max_count); count = (size_t)(match_many_SET(state, node, text_pos, text_pos + (Py_ssize_t)count, TRUE) - text_pos); *is_partial = count == (size_t)(state->text_length - text_pos) && count < max_count && state->partial_side == RE_PARTIAL_RIGHT; return count; case RE_OP_SET_DIFF_IGN: case RE_OP_SET_INTER_IGN: case RE_OP_SET_SYM_DIFF_IGN: case RE_OP_SET_UNION_IGN: count = min_size_t((size_t)(state->slice_end - text_pos), max_count); count = (size_t)(match_many_SET_IGN(state, node, text_pos, text_pos + (Py_ssize_t)count, TRUE) - text_pos); *is_partial = count == (size_t)(state->text_length - text_pos) && count < max_count && state->partial_side == RE_PARTIAL_RIGHT; return count; case RE_OP_SET_DIFF_IGN_REV: case RE_OP_SET_INTER_IGN_REV: case RE_OP_SET_SYM_DIFF_IGN_REV: case RE_OP_SET_UNION_IGN_REV: count = min_size_t((size_t)(text_pos - state->slice_start), max_count); count = (size_t)(text_pos - match_many_SET_IGN_REV(state, node, text_pos, text_pos - (Py_ssize_t)count, TRUE)); *is_partial = count == (size_t)(text_pos) && count < max_count && state->partial_side == RE_PARTIAL_LEFT; return count; case RE_OP_SET_DIFF_REV: case RE_OP_SET_INTER_REV: case RE_OP_SET_SYM_DIFF_REV: case RE_OP_SET_UNION_REV: count = min_size_t((size_t)(text_pos - state->slice_start), max_count); count = (size_t)(text_pos - match_many_SET_REV(state, node, text_pos, text_pos - (Py_ssize_t)count, TRUE)); *is_partial = count == (size_t)(text_pos) && count < max_count && state->partial_side == RE_PARTIAL_LEFT; return count; } return 0; } /* Performs a simple string search. 
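
Illustrative sketch only, not part of this module. It restates the forward
search below for plain char text so that the control flow is easier to follow:
test the first character, compare the rest, and report a partial match when the
text runs out while the needle is still matching. sketch_simple_search and its
allow_partial parameter are hypothetical; allow_partial stands in for
partial_side == RE_PARTIAL_RIGHT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// Naive forward search for needle[0..length) in text[text_pos..limit).
// Returns the position of the match, or -1 if there is none. When the text
// ends before the needle does and allow_partial is set, *is_partial is set
// and the position where the partial match starts is returned.
static ptrdiff_t sketch_simple_search(const char* text, ptrdiff_t text_pos,
  ptrdiff_t limit, const char* needle, ptrdiff_t length, bool allow_partial,
  bool* is_partial) {
    assert(length >= 1);

    *is_partial = false;

    while (text_pos < limit) {
        if (text[text_pos] == needle[0]) {
            ptrdiff_t s_pos = 1;

            for (;;) {
                if (s_pos >= length)
                    // End of search string: the whole needle matched.
                    return text_pos;

                if (text_pos + s_pos >= limit) {
                    // Off the end of the text while still matching.
                    if (allow_partial) {
                        *is_partial = true;
                        return text_pos;
                    }

                    return -1;
                }

                if (text[text_pos + s_pos] != needle[s_pos])
                    break;

                ++s_pos;
            }
        }

        ++text_pos;
    }

    // Off the end of the text without finding the first character.
    if (allow_partial) {
        *is_partial = true;
        return text_pos;
    }

    return -1;
}
```

Searching xxab for abc fails when partial matches are not allowed, but reports
a partial match at position 2 when they are, since ab could still be extended
by more text.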
*/ Py_LOCAL_INLINE(Py_ssize_t) simple_string_search(RE_State* state, RE_Node* node, Py_ssize_t text_pos, Py_ssize_t limit, BOOL* is_partial) { Py_ssize_t length; RE_CODE* values; Py_UCS4 check_char; length = (Py_ssize_t)node->value_count; values = node->values; check_char = values[0]; *is_partial = FALSE; switch (state->charsize) { case 1: { Py_UCS1* text = (Py_UCS1*)state->text; Py_UCS1* text_ptr = text + text_pos; Py_UCS1* limit_ptr = text + limit; while (text_ptr < limit_ptr) { if (text_ptr[0] == check_char) { Py_ssize_t s_pos; s_pos = 1; for (;;) { if (s_pos >= length) /* End of search string. */ return text_ptr - text; if (text_ptr + s_pos >= limit_ptr) { /* Off the end of the text. */ if (state->partial_side == RE_PARTIAL_RIGHT) { /* Partial match. */ *is_partial = TRUE; return text_ptr - text; } return -1; } if (!same_char(text_ptr[s_pos], values[s_pos])) break; ++s_pos; } } ++text_ptr; } text_pos = text_ptr - text; break; } case 2: { Py_UCS2* text = (Py_UCS2*)state->text; Py_UCS2* text_ptr = text + text_pos; Py_UCS2* limit_ptr = text + limit; while (text_ptr < limit_ptr) { if (text_ptr[0] == check_char) { Py_ssize_t s_pos; s_pos = 1; for (;;) { if (s_pos >= length) /* End of search string. */ return text_ptr - text; if (text_ptr + s_pos >= limit_ptr) { /* Off the end of the text. */ if (state->partial_side == RE_PARTIAL_RIGHT) { /* Partial match. */ *is_partial = TRUE; return text_ptr - text; } return -1; } if (!same_char(text_ptr[s_pos], values[s_pos])) break; ++s_pos; } } ++text_ptr; } text_pos = text_ptr - text; break; } case 4: { Py_UCS4* text = (Py_UCS4*)state->text; Py_UCS4* text_ptr = text + text_pos; Py_UCS4* limit_ptr = text + limit; while (text_ptr < limit_ptr) { if (text_ptr[0] == check_char) { Py_ssize_t s_pos; s_pos = 1; for (;;) { if (s_pos >= length) /* End of search string. */ return text_ptr - text; if (text_ptr + s_pos >= limit_ptr) { /* Off the end of the text. */ if (state->partial_side == RE_PARTIAL_RIGHT) { /* Partial match. 
*/ *is_partial = TRUE; return text_ptr - text; } return -1; } if (!same_char(text_ptr[s_pos], values[s_pos])) break; ++s_pos; } } ++text_ptr; } text_pos = text_ptr - text; break; } } /* Off the end of the text. */ if (state->partial_side == RE_PARTIAL_RIGHT) { /* Partial match. */ *is_partial = TRUE; return text_pos; } return -1; } /* Performs a simple string search, ignoring case. */ Py_LOCAL_INLINE(Py_ssize_t) simple_string_search_ign(RE_State* state, RE_Node* node, Py_ssize_t text_pos, Py_ssize_t limit, BOOL* is_partial) { Py_ssize_t length; RE_CODE* values; RE_EncodingTable* encoding; RE_LocaleInfo* locale_info; Py_UCS4 cases[RE_MAX_CASES]; int case_count; length = (Py_ssize_t)node->value_count; values = node->values; encoding = state->encoding; locale_info = state->locale_info; case_count = encoding->all_cases(locale_info, values[0], cases); *is_partial = FALSE; switch (state->charsize) { case 1: { Py_UCS1* text = (Py_UCS1*)state->text; Py_UCS1* text_ptr = text + text_pos; Py_UCS1* limit_ptr = text + limit; while (text_ptr < limit_ptr) { if (any_case(text_ptr[0], case_count, cases)) { Py_ssize_t s_pos; s_pos = 1; for (;;) { if (s_pos >= length) /* End of search string. */ return text_ptr - text; if (text_ptr + s_pos >= limit_ptr) { /* Off the end of the text. */ if (state->partial_side == RE_PARTIAL_RIGHT) { /* Partial match. */ *is_partial = TRUE; return text_ptr - text; } return -1; } if (!same_char_ign(encoding, locale_info, text_ptr[s_pos], values[s_pos])) break; ++s_pos; } } ++text_ptr; } text_pos = text_ptr - text; break; } case 2: { Py_UCS2* text = (Py_UCS2*)state->text; Py_UCS2* text_ptr = text + text_pos; Py_UCS2* limit_ptr = text + limit; while (text_ptr < limit_ptr) { if (any_case(text_ptr[0], case_count, cases)) { Py_ssize_t s_pos; s_pos = 1; for (;;) { if (s_pos >= length) /* End of search string. */ return text_ptr - text; if (text_ptr + s_pos >= limit_ptr) { /* Off the end of the text. 
*/ if (state->partial_side == RE_PARTIAL_RIGHT) { /* Partial match. */ *is_partial = TRUE; return text_ptr - text; } return -1; } if (!same_char_ign(encoding, locale_info, text_ptr[s_pos], values[s_pos])) break; ++s_pos; } } ++text_ptr; } text_pos = text_ptr - text; break; } case 4: { Py_UCS4* text = (Py_UCS4*)state->text; Py_UCS4* text_ptr = text + text_pos; Py_UCS4* limit_ptr = text + limit; while (text_ptr < limit_ptr) { if (any_case(text_ptr[0], case_count, cases)) { Py_ssize_t s_pos; s_pos = 1; for (;;) { if (s_pos >= length) /* End of search string. */ return text_ptr - text; if (text_ptr + s_pos >= limit_ptr) { /* Off the end of the text. */ if (state->partial_side == RE_PARTIAL_RIGHT) { /* Partial match. */ *is_partial = TRUE; return text_ptr - text; } return -1; } if (!same_char_ign(encoding, locale_info, text_ptr[s_pos], values[s_pos])) break; ++s_pos; } } ++text_ptr; } text_pos = text_ptr - text; break; } } /* Off the end of the text. */ if (state->partial_side == RE_PARTIAL_RIGHT) { /* Partial match. */ *is_partial = TRUE; return text_pos; } return -1; } /* Performs a simple string search, backwards, ignoring case. */ Py_LOCAL_INLINE(Py_ssize_t) simple_string_search_ign_rev(RE_State* state, RE_Node* node, Py_ssize_t text_pos, Py_ssize_t limit, BOOL* is_partial) { Py_ssize_t length; RE_CODE* values; RE_EncodingTable* encoding; RE_LocaleInfo* locale_info; Py_UCS4 cases[RE_MAX_CASES]; int case_count; length = (Py_ssize_t)node->value_count; values = node->values; encoding = state->encoding; locale_info = state->locale_info; case_count = encoding->all_cases(locale_info, values[length - 1], cases); *is_partial = FALSE; switch (state->charsize) { case 1: { Py_UCS1* text = (Py_UCS1*)state->text; Py_UCS1* text_ptr = text + text_pos; Py_UCS1* limit_ptr = text + limit; while (text_ptr > limit_ptr) { if (any_case(text_ptr[-1], case_count, cases)) { Py_ssize_t s_pos; s_pos = 1; for (;;) { if (s_pos >= length) /* End of search string. 
*/ return text_ptr - text; if (text_ptr - s_pos <= limit_ptr) { /* Off the end of the text. */ if (state->partial_side == RE_PARTIAL_LEFT) { /* Partial match. */ *is_partial = TRUE; return text_ptr - text; } return -1; } if (!same_char_ign(encoding, locale_info, text_ptr[- s_pos - 1], values[length - s_pos - 1])) break; ++s_pos; } } --text_ptr; } text_pos = text_ptr - text; break; } case 2: { Py_UCS2* text = (Py_UCS2*)state->text; Py_UCS2* text_ptr = text + text_pos; Py_UCS2* limit_ptr = text + limit; while (text_ptr > limit_ptr) { if (any_case(text_ptr[-1], case_count, cases)) { Py_ssize_t s_pos; s_pos = 1; for (;;) { if (s_pos >= length) /* End of search string. */ return text_ptr - text; if (text_ptr - s_pos <= limit_ptr) { /* Off the end of the text. */ if (state->partial_side == RE_PARTIAL_LEFT) { /* Partial match. */ *is_partial = TRUE; return text_ptr - text; } return -1; } if (!same_char_ign(encoding, locale_info, text_ptr[- s_pos - 1], values[length - s_pos - 1])) break; ++s_pos; } } --text_ptr; } text_pos = text_ptr - text; break; } case 4: { Py_UCS4* text = (Py_UCS4*)state->text; Py_UCS4* text_ptr = text + text_pos; Py_UCS4* limit_ptr = text + limit; while (text_ptr > limit_ptr) { if (any_case(text_ptr[-1], case_count, cases)) { Py_ssize_t s_pos; s_pos = 1; for (;;) { if (s_pos >= length) /* End of search string. */ return text_ptr - text; if (text_ptr - s_pos <= limit_ptr) { /* Off the end of the text. */ if (state->partial_side == RE_PARTIAL_LEFT) { /* Partial match. */ *is_partial = TRUE; return text_ptr - text; } return -1; } if (!same_char_ign(encoding, locale_info, text_ptr[- s_pos - 1], values[length - s_pos - 1])) break; ++s_pos; } } --text_ptr; } text_pos = text_ptr - text; break; } } /* Off the end of the text. */ if (state->partial_side == RE_PARTIAL_LEFT) { /* Partial match. */ *is_partial = TRUE; return text_pos; } return -1; } /* Performs a simple string search, backwards. 
*/ Py_LOCAL_INLINE(Py_ssize_t) simple_string_search_rev(RE_State* state, RE_Node* node, Py_ssize_t text_pos, Py_ssize_t limit, BOOL* is_partial) { Py_ssize_t length; RE_CODE* values; Py_UCS4 check_char; length = (Py_ssize_t)node->value_count; values = node->values; check_char = values[length - 1]; *is_partial = FALSE; switch (state->charsize) { case 1: { Py_UCS1* text = (Py_UCS1*)state->text; Py_UCS1* text_ptr = text + text_pos; Py_UCS1* limit_ptr = text + limit; while (text_ptr > limit_ptr) { if (text_ptr[-1] == check_char) { Py_ssize_t s_pos; s_pos = 1; for (;;) { if (s_pos >= length) /* End of search string. */ return text_ptr - text; if (text_ptr - s_pos <= limit_ptr) { /* Off the end of the text. */ if (state->partial_side == RE_PARTIAL_LEFT) { /* Partial match. */ *is_partial = TRUE; return text_ptr - text; } return -1; } if (!same_char(text_ptr[- s_pos - 1], values[length - s_pos - 1])) break; ++s_pos; } } --text_ptr; } text_pos = text_ptr - text; break; } case 2: { Py_UCS2* text = (Py_UCS2*)state->text; Py_UCS2* text_ptr = text + text_pos; Py_UCS2* limit_ptr = text + limit; while (text_ptr > limit_ptr) { if (text_ptr[-1] == check_char) { Py_ssize_t s_pos; s_pos = 1; for (;;) { if (s_pos >= length) /* End of search string. */ return text_ptr - text; if (text_ptr - s_pos <= limit_ptr) { /* Off the end of the text. */ if (state->partial_side == RE_PARTIAL_LEFT) { /* Partial match. */ *is_partial = TRUE; return text_ptr - text; } return -1; } if (!same_char(text_ptr[- s_pos - 1], values[length - s_pos - 1])) break; ++s_pos; } } --text_ptr; } text_pos = text_ptr - text; break; } case 4: { Py_UCS4* text = (Py_UCS4*)state->text; Py_UCS4* text_ptr = text + text_pos; Py_UCS4* limit_ptr = text + limit; while (text_ptr > limit_ptr) { if (text_ptr[-1] == check_char) { Py_ssize_t s_pos; s_pos = 1; for (;;) { if (s_pos >= length) /* End of search string. */ return text_ptr - text; if (text_ptr - s_pos <= limit_ptr) { /* Off the end of the text. 
*/ if (state->partial_side == RE_PARTIAL_LEFT) { /* Partial match. */ *is_partial = TRUE; return text_ptr - text; } return -1; } if (!same_char(text_ptr[- s_pos - 1], values[length - s_pos - 1])) break; ++s_pos; } } --text_ptr; } text_pos = text_ptr - text; break; } } /* Off the end of the text. */ if (state->partial_side == RE_PARTIAL_LEFT) { /* Partial match. */ *is_partial = TRUE; return text_pos; } return -1; } /* Performs a Boyer-Moore fast string search. */ Py_LOCAL_INLINE(Py_ssize_t) fast_string_search(RE_State* state, RE_Node* node, Py_ssize_t text_pos, Py_ssize_t limit) { void* text; Py_ssize_t length; RE_CODE* values; Py_ssize_t* bad_character_offset; Py_ssize_t* good_suffix_offset; Py_ssize_t last_pos; Py_UCS4 check_char; text = state->text; length = (Py_ssize_t)node->value_count; values = node->values; good_suffix_offset = node->string.good_suffix_offset; bad_character_offset = node->string.bad_character_offset; last_pos = length - 1; check_char = values[last_pos]; limit -= length; switch (state->charsize) { case 1: { Py_UCS1* text_ptr; Py_UCS1* limit_ptr; text_ptr = (Py_UCS1*)text + text_pos; limit_ptr = (Py_UCS1*)text + limit; while (text_ptr <= limit_ptr) { Py_UCS4 ch; ch = text_ptr[last_pos]; if (ch == check_char) { Py_ssize_t pos; pos = last_pos - 1; while (pos >= 0 && same_char(text_ptr[pos], values[pos])) --pos; if (pos < 0) return text_ptr - (Py_UCS1*)text; text_ptr += good_suffix_offset[pos]; } else text_ptr += bad_character_offset[ch & 0xFF]; } break; } case 2: { Py_UCS2* text_ptr; Py_UCS2* limit_ptr; text_ptr = (Py_UCS2*)text + text_pos; limit_ptr = (Py_UCS2*)text + limit; while (text_ptr <= limit_ptr) { Py_UCS4 ch; ch = text_ptr[last_pos]; if (ch == check_char) { Py_ssize_t pos; pos = last_pos - 1; while (pos >= 0 && same_char(text_ptr[pos], values[pos])) --pos; if (pos < 0) return text_ptr - (Py_UCS2*)text; text_ptr += good_suffix_offset[pos]; } else text_ptr += bad_character_offset[ch & 0xFF]; } break; } case 4: { Py_UCS4* text_ptr; 
Py_UCS4* limit_ptr; text_ptr = (Py_UCS4*)text + text_pos; limit_ptr = (Py_UCS4*)text + limit; while (text_ptr <= limit_ptr) { Py_UCS4 ch; ch = text_ptr[last_pos]; if (ch == check_char) { Py_ssize_t pos; pos = last_pos - 1; while (pos >= 0 && same_char(text_ptr[pos], values[pos])) --pos; if (pos < 0) return text_ptr - (Py_UCS4*)text; text_ptr += good_suffix_offset[pos]; } else text_ptr += bad_character_offset[ch & 0xFF]; } break; } } return -1; } /* Performs a Boyer-Moore fast string search, ignoring case. */ Py_LOCAL_INLINE(Py_ssize_t) fast_string_search_ign(RE_State* state, RE_Node* node, Py_ssize_t text_pos, Py_ssize_t limit) { RE_EncodingTable* encoding; RE_LocaleInfo* locale_info; void* text; Py_ssize_t length; RE_CODE* values; Py_ssize_t* bad_character_offset; Py_ssize_t* good_suffix_offset; Py_ssize_t last_pos; Py_UCS4 cases[RE_MAX_CASES]; int case_count; encoding = state->encoding; locale_info = state->locale_info; text = state->text; length = (Py_ssize_t)node->value_count; values = node->values; good_suffix_offset = node->string.good_suffix_offset; bad_character_offset = node->string.bad_character_offset; last_pos = length - 1; case_count = encoding->all_cases(locale_info, values[last_pos], cases); limit -= length; switch (state->charsize) { case 1: { Py_UCS1* text_ptr; Py_UCS1* limit_ptr; text_ptr = (Py_UCS1*)text + text_pos; limit_ptr = (Py_UCS1*)text + limit; while (text_ptr <= limit_ptr) { Py_UCS4 ch; ch = text_ptr[last_pos]; if (any_case(ch, case_count, cases)) { Py_ssize_t pos; pos = last_pos - 1; while (pos >= 0 && same_char_ign(encoding, locale_info, text_ptr[pos], values[pos])) --pos; if (pos < 0) return text_ptr - (Py_UCS1*)text; text_ptr += good_suffix_offset[pos]; } else text_ptr += bad_character_offset[ch & 0xFF]; } break; } case 2: { Py_UCS2* text_ptr; Py_UCS2* limit_ptr; text_ptr = (Py_UCS2*)text + text_pos; limit_ptr = (Py_UCS2*)text + limit; while (text_ptr <= limit_ptr) { Py_UCS4 ch; ch = text_ptr[last_pos]; if (any_case(ch, case_count, 
cases)) { Py_ssize_t pos; pos = last_pos - 1; while (pos >= 0 && same_char_ign(encoding, locale_info, text_ptr[pos], values[pos])) --pos; if (pos < 0) return text_ptr - (Py_UCS2*)text; text_ptr += good_suffix_offset[pos]; } else text_ptr += bad_character_offset[ch & 0xFF]; } break; } case 4: { Py_UCS4* text_ptr; Py_UCS4* limit_ptr; text_ptr = (Py_UCS4*)text + text_pos; limit_ptr = (Py_UCS4*)text + limit; while (text_ptr <= limit_ptr) { Py_UCS4 ch; ch = text_ptr[last_pos]; if (any_case(ch, case_count, cases)) { Py_ssize_t pos; pos = last_pos - 1; while (pos >= 0 && same_char_ign(encoding, locale_info, text_ptr[pos], values[pos])) --pos; if (pos < 0) return text_ptr - (Py_UCS4*)text; text_ptr += good_suffix_offset[pos]; } else text_ptr += bad_character_offset[ch & 0xFF]; } break; } } return -1; } /* Performs a Boyer-Moore fast string search, backwards, ignoring case. */ Py_LOCAL_INLINE(Py_ssize_t) fast_string_search_ign_rev(RE_State* state, RE_Node* node, Py_ssize_t text_pos, Py_ssize_t limit) { RE_EncodingTable* encoding; RE_LocaleInfo* locale_info; void* text; Py_ssize_t length; RE_CODE* values; Py_ssize_t* bad_character_offset; Py_ssize_t* good_suffix_offset; Py_UCS4 cases[RE_MAX_CASES]; int case_count; encoding = state->encoding; locale_info = state->locale_info; text = state->text; length = (Py_ssize_t)node->value_count; values = node->values; good_suffix_offset = node->string.good_suffix_offset; bad_character_offset = node->string.bad_character_offset; case_count = encoding->all_cases(locale_info, values[0], cases); text_pos -= length; switch (state->charsize) { case 1: { Py_UCS1* text_ptr; Py_UCS1* limit_ptr; text_ptr = (Py_UCS1*)text + text_pos; limit_ptr = (Py_UCS1*)text + limit; while (text_ptr >= limit_ptr) { Py_UCS4 ch; ch = text_ptr[0]; if (any_case(ch, case_count, cases)) { Py_ssize_t pos; pos = 1; while (pos < length && same_char_ign(encoding, locale_info, text_ptr[pos], values[pos])) ++pos; if (pos >= length) return text_ptr - (Py_UCS1*)text + length; 
text_ptr += good_suffix_offset[pos]; } else text_ptr += bad_character_offset[ch & 0xFF]; } break; } case 2: { Py_UCS2* text_ptr; Py_UCS2* limit_ptr; text_ptr = (Py_UCS2*)text + text_pos; limit_ptr = (Py_UCS2*)text + limit; while (text_ptr >= limit_ptr) { Py_UCS4 ch; ch = text_ptr[0]; if (any_case(ch, case_count, cases)) { Py_ssize_t pos; pos = 1; while (pos < length && same_char_ign(encoding, locale_info, text_ptr[pos], values[pos])) ++pos; if (pos >= length) return text_ptr - (Py_UCS2*)text + length; text_ptr += good_suffix_offset[pos]; } else text_ptr += bad_character_offset[ch & 0xFF]; } break; } case 4: { Py_UCS4* text_ptr; Py_UCS4* limit_ptr; text_ptr = (Py_UCS4*)text + text_pos; limit_ptr = (Py_UCS4*)text + limit; while (text_ptr >= limit_ptr) { Py_UCS4 ch; ch = text_ptr[0]; if (any_case(ch, case_count, cases)) { Py_ssize_t pos; pos = 1; while (pos < length && same_char_ign(encoding, locale_info, text_ptr[pos], values[pos])) ++pos; if (pos >= length) return text_ptr - (Py_UCS4*)text + length; text_ptr += good_suffix_offset[pos]; } else text_ptr += bad_character_offset[ch & 0xFF]; } break; } } return -1; } /* Performs a Boyer-Moore fast string search, backwards. 
*/ Py_LOCAL_INLINE(Py_ssize_t) fast_string_search_rev(RE_State* state, RE_Node* node, Py_ssize_t text_pos, Py_ssize_t limit) { void* text; Py_ssize_t length; RE_CODE* values; Py_ssize_t* bad_character_offset; Py_ssize_t* good_suffix_offset; Py_UCS4 check_char; text = state->text; length = (Py_ssize_t)node->value_count; values = node->values; good_suffix_offset = node->string.good_suffix_offset; bad_character_offset = node->string.bad_character_offset; check_char = values[0]; text_pos -= length; switch (state->charsize) { case 1: { Py_UCS1* text_ptr; Py_UCS1* limit_ptr; text_ptr = (Py_UCS1*)text + text_pos; limit_ptr = (Py_UCS1*)text + limit; while (text_ptr >= limit_ptr) { Py_UCS4 ch; ch = text_ptr[0]; if (ch == check_char) { Py_ssize_t pos; pos = 1; while (pos < length && same_char(text_ptr[pos], values[pos])) ++pos; if (pos >= length) return text_ptr - (Py_UCS1*)text + length; text_ptr += good_suffix_offset[pos]; } else text_ptr += bad_character_offset[ch & 0xFF]; } break; } case 2: { Py_UCS2* text_ptr; Py_UCS2* limit_ptr; text_ptr = (Py_UCS2*)text + text_pos; limit_ptr = (Py_UCS2*)text + limit; while (text_ptr >= limit_ptr) { Py_UCS4 ch; ch = text_ptr[0]; if (ch == check_char) { Py_ssize_t pos; pos = 1; while (pos < length && same_char(text_ptr[pos], values[pos])) ++pos; if (pos >= length) return text_ptr - (Py_UCS2*)text + length; text_ptr += good_suffix_offset[pos]; } else text_ptr += bad_character_offset[ch & 0xFF]; } break; } case 4: { Py_UCS4* text_ptr; Py_UCS4* limit_ptr; text_ptr = (Py_UCS4*)text + text_pos; limit_ptr = (Py_UCS4*)text + limit; while (text_ptr >= limit_ptr) { Py_UCS4 ch; ch = text_ptr[0]; if (ch == check_char) { Py_ssize_t pos; pos = 1; while (pos < length && same_char(text_ptr[pos], values[pos])) ++pos; if (pos >= length) return text_ptr - (Py_UCS4*)text + length; text_ptr += good_suffix_offset[pos]; } else text_ptr += bad_character_offset[ch & 0xFF]; } break; } } return -1; } /* Builds the tables for a Boyer-Moore fast string search. 
*/ Py_LOCAL_INLINE(BOOL) build_fast_tables(RE_State* state, RE_Node* node, BOOL ignore) { Py_ssize_t length; RE_CODE* values; Py_ssize_t* bad; Py_ssize_t* good; Py_UCS4 ch; Py_ssize_t last_pos; Py_ssize_t pos; BOOL (*is_same_char)(RE_EncodingTable* encoding, RE_LocaleInfo* locale_info, Py_UCS4 ch1, Py_UCS4 ch2); Py_ssize_t suffix_len; BOOL saved_start; Py_ssize_t s; Py_ssize_t i; Py_ssize_t s_start; Py_UCS4 codepoints[RE_MAX_CASES]; length = (Py_ssize_t)node->value_count; if (length < RE_MIN_FAST_LENGTH) return TRUE; values = node->values; bad = (Py_ssize_t*)re_alloc(256 * sizeof(bad[0])); good = (Py_ssize_t*)re_alloc((size_t)length * sizeof(good[0])); if (!bad || !good) { re_dealloc(bad); re_dealloc(good); return FALSE; } for (ch = 0; ch < 0x100; ch++) bad[ch] = length; last_pos = length - 1; for (pos = 0; pos < last_pos; pos++) { Py_ssize_t offset; offset = last_pos - pos; ch = values[pos]; if (ignore) { int count; int i; count = state->encoding->all_cases(state->locale_info, ch, codepoints); for (i = 0; i < count; i++) bad[codepoints[i] & 0xFF] = offset; } else bad[ch & 0xFF] = offset; } is_same_char = ignore ? same_char_ign_wrapper : same_char_wrapper; suffix_len = 2; pos = length - suffix_len; saved_start = FALSE; s = pos - 1; i = suffix_len - 1; s_start = s; while (pos >= 0) { /* Look for another occurrence of the suffix. */ while (i > 0) { /* Have we dropped off the end of the string? */ if (s + i < 0) break; if (is_same_char(state->encoding, state->locale_info, values[s + i], values[pos + i])) /* It still matches. */ --i; else { /* Start again further along. */ --s; i = suffix_len - 1; } } if (s >= 0 && is_same_char(state->encoding, state->locale_info, values[s], values[pos])) { /* We haven't dropped off the end of the string, and the suffix has * matched this far, so this is a good starting point for the next * iteration. */ --s; if (!saved_start) { s_start = s; saved_start = TRUE; } } else { /* Calculate the suffix offset. 
*/ good[pos] = pos - s; /* Extend the suffix and start searching for _this_ one. */ --pos; ++suffix_len; /* Where's a good place to start searching? */ if (saved_start) { s = s_start; saved_start = FALSE; } else --s; /* Can we short-circuit the searching? */ if (s < 0) break; } i = suffix_len - 1; } /* Fill-in any remaining entries. */ while (pos >= 0) { good[pos] = pos - s; --pos; --s; } node->string.bad_character_offset = bad; node->string.good_suffix_offset = good; return TRUE; } /* Builds the tables for a Boyer-Moore fast string search, backwards. */ Py_LOCAL_INLINE(BOOL) build_fast_tables_rev(RE_State* state, RE_Node* node, BOOL ignore) { Py_ssize_t length; RE_CODE* values; Py_ssize_t* bad; Py_ssize_t* good; Py_UCS4 ch; Py_ssize_t last_pos; Py_ssize_t pos; BOOL (*is_same_char)(RE_EncodingTable* encoding, RE_LocaleInfo* locale_info, Py_UCS4 ch1, Py_UCS4 ch2); Py_ssize_t suffix_len; BOOL saved_start; Py_ssize_t s; Py_ssize_t i; Py_ssize_t s_start; Py_UCS4 codepoints[RE_MAX_CASES]; length = (Py_ssize_t)node->value_count; if (length < RE_MIN_FAST_LENGTH) return TRUE; values = node->values; bad = (Py_ssize_t*)re_alloc(256 * sizeof(bad[0])); good = (Py_ssize_t*)re_alloc((size_t)length * sizeof(good[0])); if (!bad || !good) { re_dealloc(bad); re_dealloc(good); return FALSE; } for (ch = 0; ch < 0x100; ch++) bad[ch] = -length; last_pos = length - 1; for (pos = last_pos; pos > 0; pos--) { Py_ssize_t offset; offset = -pos; ch = values[pos]; if (ignore) { int count; int i; count = state->encoding->all_cases(state->locale_info, ch, codepoints); for (i = 0; i < count; i++) bad[codepoints[i] & 0xFF] = offset; } else bad[ch & 0xFF] = offset; } is_same_char = ignore ? same_char_ign_wrapper : same_char_wrapper; suffix_len = 2; pos = suffix_len - 1; saved_start = FALSE; s = pos + 1; i = suffix_len - 1; s_start = s; while (pos < length) { /* Look for another occurrence of the suffix. */ while (i > 0) { /* Have we dropped off the end of the string? 
*/ if (s - i >= length) break; if (is_same_char(state->encoding, state->locale_info, values[s - i], values[pos - i])) /* It still matches. */ --i; else { /* Start again further along. */ ++s; i = suffix_len - 1; } } if (s < length && is_same_char(state->encoding, state->locale_info, values[s], values[pos])) { /* We haven't dropped off the end of the string, and the suffix has * matched this far, so this is a good starting point for the next * iteration. */ ++s; if (!saved_start) { s_start = s; saved_start = TRUE; } } else { /* Calculate the suffix offset. */ good[pos] = pos - s; /* Extend the suffix and start searching for _this_ one. */ ++pos; ++suffix_len; /* Where's a good place to start searching? */ if (saved_start) { s = s_start; saved_start = FALSE; } else ++s; /* Can we short-circuit the searching? */ if (s >= length) break; } i = suffix_len - 1; } /* Fill in any remaining entries. */ while (pos < length) { good[pos] = pos - s; ++pos; ++s; } node->string.bad_character_offset = bad; node->string.good_suffix_offset = good; return TRUE; } /* Performs a string search. */ Py_LOCAL_INLINE(Py_ssize_t) string_search(RE_SafeState* safe_state, RE_Node* node, Py_ssize_t text_pos, Py_ssize_t limit, BOOL* is_partial) { RE_State* state; Py_ssize_t found_pos; state = safe_state->re_state; *is_partial = FALSE; /* Has the node been initialised for fast searching, if necessary? */ if (!(node->status & RE_STATUS_FAST_INIT)) { /* Ideally the pattern should be immutable and shareable across threads. * Internally, however, it isn't. For safety we need to hold the GIL. */ acquire_GIL(safe_state); /* Double-check because of multithreading. */ if (!(node->status & RE_STATUS_FAST_INIT)) { build_fast_tables(state, node, FALSE); node->status |= RE_STATUS_FAST_INIT; } release_GIL(safe_state); } if (node->string.bad_character_offset) { /* Start with a fast search. This will find the string if it's complete * (i.e. not truncated). 
*/ found_pos = fast_string_search(state, node, text_pos, limit); if (found_pos < 0 && state->partial_side == RE_PARTIAL_RIGHT) /* We didn't find the string, but it could've been truncated, so * try again, starting close to the end. */ found_pos = simple_string_search(state, node, limit - (Py_ssize_t)(node->value_count - 1), limit, is_partial); } else found_pos = simple_string_search(state, node, text_pos, limit, is_partial); return found_pos; } /* Performs a string search, ignoring case. */ Py_LOCAL_INLINE(Py_ssize_t) string_search_fld(RE_SafeState* safe_state, RE_Node* node, Py_ssize_t text_pos, Py_ssize_t limit, Py_ssize_t* new_pos, BOOL* is_partial) { RE_State* state; RE_EncodingTable* encoding; RE_LocaleInfo* locale_info; int (*full_case_fold)(RE_LocaleInfo* locale_info, Py_UCS4 ch, Py_UCS4* folded); Py_UCS4 (*char_at)(void* text, Py_ssize_t pos); void* text; RE_CODE* values; Py_ssize_t start_pos; int f_pos; int folded_len; Py_ssize_t length; Py_ssize_t s_pos; Py_UCS4 folded[RE_MAX_FOLDED]; state = safe_state->re_state; encoding = state->encoding; locale_info = state->locale_info; full_case_fold = encoding->full_case_fold; char_at = state->char_at; text = state->text; values = node->values; start_pos = text_pos; f_pos = 0; folded_len = 0; length = (Py_ssize_t)node->value_count; s_pos = 0; *is_partial = FALSE; while (s_pos < length || f_pos < folded_len) { if (f_pos >= folded_len) { /* Fetch and casefold another character. */ if (text_pos >= limit) { if (text_pos >= state->text_length && state->partial_side == RE_PARTIAL_RIGHT) { *is_partial = TRUE; return start_pos; } return -1; } folded_len = full_case_fold(locale_info, char_at(text, text_pos), folded); f_pos = 0; } if (s_pos < length && same_char_ign(encoding, locale_info, values[s_pos], folded[f_pos])) { ++s_pos; ++f_pos; if (f_pos >= folded_len) ++text_pos; } else { ++start_pos; text_pos = start_pos; f_pos = 0; folded_len = 0; s_pos = 0; } } /* We found the string. 
*/ if (new_pos) *new_pos = text_pos; return start_pos; } /* Performs a string search, backwards, ignoring case. */ Py_LOCAL_INLINE(Py_ssize_t) string_search_fld_rev(RE_SafeState* safe_state, RE_Node* node, Py_ssize_t text_pos, Py_ssize_t limit, Py_ssize_t* new_pos, BOOL* is_partial) { RE_State* state; RE_EncodingTable* encoding; RE_LocaleInfo* locale_info; int (*full_case_fold)(RE_LocaleInfo* locale_info, Py_UCS4 ch, Py_UCS4* folded); Py_UCS4 (*char_at)(void* text, Py_ssize_t pos); void* text; RE_CODE* values; Py_ssize_t start_pos; int f_pos; int folded_len; Py_ssize_t length; Py_ssize_t s_pos; Py_UCS4 folded[RE_MAX_FOLDED]; state = safe_state->re_state; encoding = state->encoding; locale_info = state->locale_info; full_case_fold = encoding->full_case_fold; char_at = state->char_at; text = state->text; values = node->values; start_pos = text_pos; f_pos = 0; folded_len = 0; length = (Py_ssize_t)node->value_count; s_pos = 0; *is_partial = FALSE; while (s_pos < length || f_pos < folded_len) { if (f_pos >= folded_len) { /* Fetch and casefold another character. */ if (text_pos <= limit) { if (text_pos <= 0 && state->partial_side == RE_PARTIAL_LEFT) { *is_partial = TRUE; return start_pos; } return -1; } folded_len = full_case_fold(locale_info, char_at(text, text_pos - 1), folded); f_pos = 0; } if (s_pos < length && same_char_ign(encoding, locale_info, values[length - s_pos - 1], folded[folded_len - f_pos - 1])) { ++s_pos; ++f_pos; if (f_pos >= folded_len) --text_pos; } else { --start_pos; text_pos = start_pos; f_pos = 0; folded_len = 0; s_pos = 0; } } /* We found the string. */ if (new_pos) *new_pos = text_pos; return start_pos; } /* Performs a string search, ignoring case. 
*/ Py_LOCAL_INLINE(Py_ssize_t) string_search_ign(RE_SafeState* safe_state, RE_Node* node, Py_ssize_t text_pos, Py_ssize_t limit, BOOL* is_partial) { RE_State* state; Py_ssize_t found_pos; state = safe_state->re_state; *is_partial = FALSE; /* Has the node been initialised for fast searching, if necessary? */ if (!(node->status & RE_STATUS_FAST_INIT)) { /* Ideally the pattern should be immutable and shareable across threads. * Internally, however, it isn't. For safety we need to hold the GIL. */ acquire_GIL(safe_state); /* Double-check because of multithreading. */ if (!(node->status & RE_STATUS_FAST_INIT)) { build_fast_tables(state, node, TRUE); node->status |= RE_STATUS_FAST_INIT; } release_GIL(safe_state); } if (node->string.bad_character_offset) { /* Start with a fast search. This will find the string if it's complete * (i.e. not truncated). */ found_pos = fast_string_search_ign(state, node, text_pos, limit); if (found_pos < 0 && state->partial_side == RE_PARTIAL_RIGHT) /* We didn't find the string, but it could've been truncated, so * try again, starting close to the end. */ found_pos = simple_string_search_ign(state, node, limit - (Py_ssize_t)(node->value_count - 1), limit, is_partial); } else found_pos = simple_string_search_ign(state, node, text_pos, limit, is_partial); return found_pos; } /* Performs a string search, backwards, ignoring case. */ Py_LOCAL_INLINE(Py_ssize_t) string_search_ign_rev(RE_SafeState* safe_state, RE_Node* node, Py_ssize_t text_pos, Py_ssize_t limit, BOOL* is_partial) { RE_State* state; Py_ssize_t found_pos; state = safe_state->re_state; *is_partial = FALSE; /* Has the node been initialised for fast searching, if necessary? */ if (!(node->status & RE_STATUS_FAST_INIT)) { /* Ideally the pattern should be immutable and shareable across threads. * Internally, however, it isn't. For safety we need to hold the GIL. */ acquire_GIL(safe_state); /* Double-check because of multithreading. 
*/ if (!(node->status & RE_STATUS_FAST_INIT)) { build_fast_tables_rev(state, node, TRUE); node->status |= RE_STATUS_FAST_INIT; } release_GIL(safe_state); } if (node->string.bad_character_offset) { /* Start with a fast search. This will find the string if it's complete * (i.e. not truncated). */ found_pos = fast_string_search_ign_rev(state, node, text_pos, limit); if (found_pos < 0 && state->partial_side == RE_PARTIAL_LEFT) /* We didn't find the string, but it could've been truncated, so * try again, starting close to the end. */ found_pos = simple_string_search_ign_rev(state, node, limit + (Py_ssize_t)(node->value_count - 1), limit, is_partial); } else found_pos = simple_string_search_ign_rev(state, node, text_pos, limit, is_partial); return found_pos; } /* Performs a string search, backwards. */ Py_LOCAL_INLINE(Py_ssize_t) string_search_rev(RE_SafeState* safe_state, RE_Node* node, Py_ssize_t text_pos, Py_ssize_t limit, BOOL* is_partial) { RE_State* state; Py_ssize_t found_pos; state = safe_state->re_state; *is_partial = FALSE; /* Has the node been initialised for fast searching, if necessary? */ if (!(node->status & RE_STATUS_FAST_INIT)) { /* Ideally the pattern should be immutable and shareable across threads. * Internally, however, it isn't. For safety we need to hold the GIL. */ acquire_GIL(safe_state); /* Double-check because of multithreading. */ if (!(node->status & RE_STATUS_FAST_INIT)) { build_fast_tables_rev(state, node, FALSE); node->status |= RE_STATUS_FAST_INIT; } release_GIL(safe_state); } if (node->string.bad_character_offset) { /* Start with a fast search. This will find the string if it's complete * (i.e. not truncated). */ found_pos = fast_string_search_rev(state, node, text_pos, limit); if (found_pos < 0 && state->partial_side == RE_PARTIAL_LEFT) /* We didn't find the string, but it could've been truncated, so * try again, starting close to the end. 
*/ found_pos = simple_string_search_rev(state, node, limit + (Py_ssize_t)(node->value_count - 1), limit, is_partial); } else found_pos = simple_string_search_rev(state, node, text_pos, limit, is_partial); return found_pos; } /* Returns how many characters there could be before full case-folding. */ Py_LOCAL_INLINE(Py_ssize_t) possible_unfolded_length(Py_ssize_t length) { if (length == 0) return 0; if (length < RE_MAX_FOLDED) return 1; return length / RE_MAX_FOLDED; } /* Checks whether there's any character except a newline at a position. */ Py_LOCAL_INLINE(int) try_match_ANY(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { if (text_pos >= state->text_length) { if (state->partial_side == RE_PARTIAL_RIGHT) return RE_ERROR_PARTIAL; return RE_ERROR_FAILURE; } return bool_as_status(text_pos < state->slice_end && matches_ANY(state->encoding, node, state->char_at(state->text, text_pos))); } /* Checks whether there's any character at all at a position. */ Py_LOCAL_INLINE(int) try_match_ANY_ALL(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { if (text_pos >= state->text_length) { if (state->partial_side == RE_PARTIAL_RIGHT) return RE_ERROR_PARTIAL; return RE_ERROR_FAILURE; } return bool_as_status(text_pos < state->slice_end); } /* Checks whether there's any character at all at a position, backwards. */ Py_LOCAL_INLINE(int) try_match_ANY_ALL_REV(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { if (text_pos <= 0) { if (state->partial_side == RE_PARTIAL_LEFT) return RE_ERROR_PARTIAL; return RE_ERROR_FAILURE; } return bool_as_status(text_pos > state->slice_start); } /* Checks whether there's any character except a newline at a position, * backwards. 
*/ Py_LOCAL_INLINE(int) try_match_ANY_REV(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { if (text_pos <= 0) { if (state->partial_side == RE_PARTIAL_LEFT) return RE_ERROR_PARTIAL; return RE_ERROR_FAILURE; } return bool_as_status(text_pos > state->slice_start && matches_ANY(state->encoding, node, state->char_at(state->text, text_pos - 1))); } /* Checks whether there's any character except a line separator at a position. */ Py_LOCAL_INLINE(int) try_match_ANY_U(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { if (text_pos >= state->text_length) { if (state->partial_side == RE_PARTIAL_RIGHT) return RE_ERROR_PARTIAL; return RE_ERROR_FAILURE; } return bool_as_status(text_pos < state->slice_end && matches_ANY_U(state->encoding, node, state->char_at(state->text, text_pos))); } /* Checks whether there's any character except a line separator at a position, * backwards. */ Py_LOCAL_INLINE(int) try_match_ANY_U_REV(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { if (text_pos <= 0) { if (state->partial_side == RE_PARTIAL_LEFT) return RE_ERROR_PARTIAL; return RE_ERROR_FAILURE; } return bool_as_status(text_pos > state->slice_start && matches_ANY_U(state->encoding, node, state->char_at(state->text, text_pos - 1))); } /* Checks whether a position is on a word boundary. */ Py_LOCAL_INLINE(int) try_match_BOUNDARY(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { return bool_as_status(state->encoding->at_boundary(state, text_pos) == node->match); } /* Checks whether there's a character at a position. 
*/ Py_LOCAL_INLINE(int) try_match_CHARACTER(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { if (text_pos >= state->text_length) { if (state->partial_side == RE_PARTIAL_RIGHT) return RE_ERROR_PARTIAL; return RE_ERROR_FAILURE; } return bool_as_status(text_pos < state->slice_end && matches_CHARACTER(state->encoding, state->locale_info, node, state->char_at(state->text, text_pos)) == node->match); } /* Checks whether there's a character at a position, ignoring case. */ Py_LOCAL_INLINE(int) try_match_CHARACTER_IGN(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { if (text_pos >= state->text_length) { if (state->partial_side == RE_PARTIAL_RIGHT) return RE_ERROR_PARTIAL; return RE_ERROR_FAILURE; } return bool_as_status(text_pos < state->slice_end && matches_CHARACTER_IGN(state->encoding, state->locale_info, node, state->char_at(state->text, text_pos)) == node->match); } /* Checks whether there's a character at a position, ignoring case, backwards. */ Py_LOCAL_INLINE(int) try_match_CHARACTER_IGN_REV(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { if (text_pos <= 0) { if (state->partial_side == RE_PARTIAL_LEFT) return RE_ERROR_PARTIAL; return RE_ERROR_FAILURE; } return bool_as_status(text_pos > state->slice_start && matches_CHARACTER_IGN(state->encoding, state->locale_info, node, state->char_at(state->text, text_pos - 1)) == node->match); } /* Checks whether there's a character at a position, backwards. */ Py_LOCAL_INLINE(int) try_match_CHARACTER_REV(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { if (text_pos <= 0) { if (state->partial_side == RE_PARTIAL_LEFT) return RE_ERROR_PARTIAL; return RE_ERROR_FAILURE; } return bool_as_status(text_pos > state->slice_start && matches_CHARACTER(state->encoding, state->locale_info, node, state->char_at(state->text, text_pos - 1)) == node->match); } /* Checks whether a position is on a default word boundary. 
*/ Py_LOCAL_INLINE(int) try_match_DEFAULT_BOUNDARY(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { return bool_as_status(state->encoding->at_default_boundary(state, text_pos) == node->match); } /* Checks whether a position is at the default end of a word. */ Py_LOCAL_INLINE(int) try_match_DEFAULT_END_OF_WORD(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { return bool_as_status(state->encoding->at_default_word_end(state, text_pos)); } /* Checks whether a position is at the default start of a word. */ Py_LOCAL_INLINE(int) try_match_DEFAULT_START_OF_WORD(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { return bool_as_status(state->encoding->at_default_word_start(state, text_pos)); } /* Checks whether a position is at the end of a line. */ Py_LOCAL_INLINE(int) try_match_END_OF_LINE(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { return bool_as_status(text_pos >= state->slice_end || state->char_at(state->text, text_pos) == '\n'); } /* Checks whether a position is at the end of a line, as determined by the * encoding's line separators. */ Py_LOCAL_INLINE(int) try_match_END_OF_LINE_U(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { return bool_as_status(state->encoding->at_line_end(state, text_pos)); } /* Checks whether a position is at the end of the string. */ Py_LOCAL_INLINE(int) try_match_END_OF_STRING(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { return bool_as_status(text_pos >= state->text_length); } /* Checks whether a position is at the end of a line or the string. */ Py_LOCAL_INLINE(int) try_match_END_OF_STRING_LINE(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { return bool_as_status(text_pos >= state->text_length || text_pos == state->final_newline); } /* Checks whether a position is at the end of the string or at the final line * separator. */ Py_LOCAL_INLINE(int) try_match_END_OF_STRING_LINE_U(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { return bool_as_status(text_pos >= state->text_length || text_pos == state->final_line_sep); } /* Checks whether a position is at the end of a word. 
*/ Py_LOCAL_INLINE(int) try_match_END_OF_WORD(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { return bool_as_status(state->encoding->at_word_end(state, text_pos)); } /* Checks whether a position is on a grapheme boundary. */ Py_LOCAL_INLINE(int) try_match_GRAPHEME_BOUNDARY(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { return bool_as_status(state->encoding->at_grapheme_boundary(state, text_pos)); } /* Checks whether there's a character with a certain property at a position. */ Py_LOCAL_INLINE(int) try_match_PROPERTY(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { if (text_pos >= state->text_length) { if (state->partial_side == RE_PARTIAL_RIGHT) return RE_ERROR_PARTIAL; return RE_ERROR_FAILURE; } return bool_as_status(text_pos < state->slice_end && matches_PROPERTY(state->encoding, state->locale_info, node, state->char_at(state->text, text_pos)) == node->match); } /* Checks whether there's a character with a certain property at a position, * ignoring case. */ Py_LOCAL_INLINE(int) try_match_PROPERTY_IGN(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { if (text_pos >= state->text_length) { if (state->partial_side == RE_PARTIAL_RIGHT) return RE_ERROR_PARTIAL; return RE_ERROR_FAILURE; } return bool_as_status(text_pos < state->slice_end && matches_PROPERTY_IGN(state->encoding, state->locale_info, node, state->char_at(state->text, text_pos)) == node->match); } /* Checks whether there's a character with a certain property at a position, * ignoring case, backwards. 
*/ Py_LOCAL_INLINE(int) try_match_PROPERTY_IGN_REV(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { if (text_pos <= 0) { if (state->partial_side == RE_PARTIAL_LEFT) return RE_ERROR_PARTIAL; return RE_ERROR_FAILURE; } return bool_as_status(text_pos > state->slice_start && matches_PROPERTY_IGN(state->encoding, state->locale_info, node, state->char_at(state->text, text_pos - 1)) == node->match); } /* Checks whether there's a character with a certain property at a position, * backwards. */ Py_LOCAL_INLINE(int) try_match_PROPERTY_REV(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { if (text_pos <= 0) { if (state->partial_side == RE_PARTIAL_LEFT) return RE_ERROR_PARTIAL; return RE_ERROR_FAILURE; } return bool_as_status(text_pos > state->slice_start && matches_PROPERTY(state->encoding, state->locale_info, node, state->char_at(state->text, text_pos - 1)) == node->match); } /* Checks whether there's a character in a certain range at a position. */ Py_LOCAL_INLINE(int) try_match_RANGE(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { if (text_pos >= state->text_length) { if (state->partial_side == RE_PARTIAL_RIGHT) return RE_ERROR_PARTIAL; return RE_ERROR_FAILURE; } return bool_as_status(text_pos < state->slice_end && matches_RANGE(state->encoding, state->locale_info, node, state->char_at(state->text, text_pos)) == node->match); } /* Checks whether there's a character in a certain range at a position, * ignoring case. */ Py_LOCAL_INLINE(int) try_match_RANGE_IGN(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { if (text_pos >= state->text_length) { if (state->partial_side == RE_PARTIAL_RIGHT) return RE_ERROR_PARTIAL; return RE_ERROR_FAILURE; } return bool_as_status(text_pos < state->slice_end && matches_RANGE_IGN(state->encoding, state->locale_info, node, state->char_at(state->text, text_pos)) == node->match); } /* Checks whether there's a character in a certain range at a position, * ignoring case, backwards. 
*/ Py_LOCAL_INLINE(int) try_match_RANGE_IGN_REV(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { if (text_pos <= 0) { if (state->partial_side == RE_PARTIAL_LEFT) return RE_ERROR_PARTIAL; return RE_ERROR_FAILURE; } return bool_as_status(text_pos > state->slice_start && matches_RANGE_IGN(state->encoding, state->locale_info, node, state->char_at(state->text, text_pos - 1)) == node->match); } /* Checks whether there's a character in a certain range at a position, * backwards. */ Py_LOCAL_INLINE(int) try_match_RANGE_REV(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { if (text_pos <= 0) { if (state->partial_side == RE_PARTIAL_LEFT) return RE_ERROR_PARTIAL; return RE_ERROR_FAILURE; } return bool_as_status(text_pos > state->slice_start && matches_RANGE(state->encoding, state->locale_info, node, state->char_at(state->text, text_pos - 1)) == node->match); } /* Checks whether a position is at the search anchor. */ Py_LOCAL_INLINE(int) try_match_SEARCH_ANCHOR(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { return bool_as_status(text_pos == state->search_anchor); } /* Checks whether there's a character in a certain set at a position. */ Py_LOCAL_INLINE(int) try_match_SET(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { if (text_pos >= state->text_length) { if (state->partial_side == RE_PARTIAL_RIGHT) return RE_ERROR_PARTIAL; return RE_ERROR_FAILURE; } return bool_as_status(text_pos < state->slice_end && matches_SET(state->encoding, state->locale_info, node, state->char_at(state->text, text_pos)) == node->match); } /* Checks whether there's a character in a certain set at a position, ignoring * case. 
*/ Py_LOCAL_INLINE(int) try_match_SET_IGN(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { if (text_pos >= state->text_length) { if (state->partial_side == RE_PARTIAL_RIGHT) return RE_ERROR_PARTIAL; return RE_ERROR_FAILURE; } return bool_as_status(text_pos < state->slice_end && matches_SET_IGN(state->encoding, state->locale_info, node, state->char_at(state->text, text_pos)) == node->match); } /* Checks whether there's a character in a certain set at a position, ignoring * case, backwards. */ Py_LOCAL_INLINE(int) try_match_SET_IGN_REV(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { if (text_pos <= 0) { if (state->partial_side == RE_PARTIAL_LEFT) return RE_ERROR_PARTIAL; return RE_ERROR_FAILURE; } return bool_as_status(text_pos > state->slice_start && matches_SET_IGN(state->encoding, state->locale_info, node, state->char_at(state->text, text_pos - 1)) == node->match); } /* Checks whether there's a character in a certain set at a position, * backwards. */ Py_LOCAL_INLINE(int) try_match_SET_REV(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { if (text_pos <= 0) { if (state->partial_side == RE_PARTIAL_LEFT) return RE_ERROR_PARTIAL; return RE_ERROR_FAILURE; } return bool_as_status(text_pos > state->slice_start && matches_SET(state->encoding, state->locale_info, node, state->char_at(state->text, text_pos - 1)) == node->match); } /* Checks whether a position is at the start of a line. */ Py_LOCAL_INLINE(int) try_match_START_OF_LINE(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { return bool_as_status(text_pos <= 0 || state->char_at(state->text, text_pos - 1) == '\n'); } /* Checks whether a position is at the start of a line. */ Py_LOCAL_INLINE(int) try_match_START_OF_LINE_U(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { return bool_as_status(state->encoding->at_line_start(state, text_pos)); } /* Checks whether a position is at the start of the string. 
*/ Py_LOCAL_INLINE(int) try_match_START_OF_STRING(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { return bool_as_status(text_pos <= 0); } /* Checks whether a position is at the start of a word. */ Py_LOCAL_INLINE(int) try_match_START_OF_WORD(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { return bool_as_status(state->encoding->at_word_start(state, text_pos)); } /* Checks whether there's a certain string at a position. */ Py_LOCAL_INLINE(int) try_match_STRING(RE_State* state, RE_NextNode* next, RE_Node* node, Py_ssize_t text_pos, RE_Position* next_position) { Py_ssize_t length; Py_UCS4 (*char_at)(void* text, Py_ssize_t pos); RE_CODE* values; Py_ssize_t s_pos; length = (Py_ssize_t)node->value_count; char_at = state->char_at; values = node->values; for (s_pos = 0; s_pos < length; s_pos++) { if (text_pos + s_pos >= state->slice_end) { if (state->partial_side == RE_PARTIAL_RIGHT) return RE_ERROR_PARTIAL; return RE_ERROR_FAILURE; } if (!same_char(char_at(state->text, text_pos + s_pos), values[s_pos])) return RE_ERROR_FAILURE; } next_position->node = next->match_next; next_position->text_pos = text_pos + next->match_step; return RE_ERROR_SUCCESS; } /* Checks whether there's a certain string at a position, ignoring case. 
*/ Py_LOCAL_INLINE(int) try_match_STRING_FLD(RE_State* state, RE_NextNode* next, RE_Node* node, Py_ssize_t text_pos, RE_Position* next_position) { Py_ssize_t length; Py_UCS4 (*char_at)(void* text, Py_ssize_t pos); RE_EncodingTable* encoding; RE_LocaleInfo* locale_info; int (*full_case_fold)(RE_LocaleInfo* locale_info, Py_UCS4 ch, Py_UCS4* folded); Py_ssize_t s_pos; RE_CODE* values; int folded_len; int f_pos; Py_ssize_t start_pos; Py_UCS4 folded[RE_MAX_FOLDED]; length = (Py_ssize_t)node->value_count; char_at = state->char_at; encoding = state->encoding; locale_info = state->locale_info; full_case_fold = encoding->full_case_fold; s_pos = 0; values = node->values; folded_len = 0; f_pos = 0; start_pos = text_pos; while (s_pos < length) { if (f_pos >= folded_len) { /* Fetch and casefold another character. */ if (text_pos >= state->slice_end) { if (state->partial_side == RE_PARTIAL_RIGHT) return RE_ERROR_PARTIAL; return RE_ERROR_FAILURE; } folded_len = full_case_fold(locale_info, char_at(state->text, text_pos), folded); f_pos = 0; } if (!same_char_ign(encoding, locale_info, folded[f_pos], values[s_pos])) return RE_ERROR_FAILURE; ++s_pos; ++f_pos; if (f_pos >= folded_len) ++text_pos; } if (f_pos < folded_len) return RE_ERROR_FAILURE; next_position->node = next->match_next; if (next->match_step == 0) next_position->text_pos = start_pos; else next_position->text_pos = text_pos; return RE_ERROR_SUCCESS; } /* Checks whether there's a certain string at a position, ignoring case, * backwards. 
*/ Py_LOCAL_INLINE(int) try_match_STRING_FLD_REV(RE_State* state, RE_NextNode* next, RE_Node* node, Py_ssize_t text_pos, RE_Position* next_position) { Py_ssize_t length; Py_UCS4 (*char_at)(void* text, Py_ssize_t pos); RE_EncodingTable* encoding; RE_LocaleInfo* locale_info; int (*full_case_fold)(RE_LocaleInfo* locale_info, Py_UCS4 ch, Py_UCS4* folded); Py_ssize_t s_pos; RE_CODE* values; int folded_len; int f_pos; Py_ssize_t start_pos; Py_UCS4 folded[RE_MAX_FOLDED]; length = (Py_ssize_t)node->value_count; char_at = state->char_at; encoding = state->encoding; locale_info = state->locale_info; full_case_fold = encoding->full_case_fold; s_pos = 0; values = node->values; folded_len = 0; f_pos = 0; start_pos = text_pos; while (s_pos < length) { if (f_pos >= folded_len) { /* Fetch and casefold another character. */ if (text_pos <= state->slice_start) { if (state->partial_side == RE_PARTIAL_LEFT) return RE_ERROR_PARTIAL; return RE_ERROR_FAILURE; } folded_len = full_case_fold(locale_info, char_at(state->text, text_pos - 1), folded); f_pos = 0; } if (!same_char_ign(encoding, locale_info, folded[folded_len - f_pos - 1], values[length - s_pos - 1])) return RE_ERROR_FAILURE; ++s_pos; ++f_pos; if (f_pos >= folded_len) --text_pos; } if (f_pos < folded_len) return RE_ERROR_FAILURE; next_position->node = next->match_next; if (next->match_step == 0) next_position->text_pos = start_pos; else next_position->text_pos = text_pos; return RE_ERROR_SUCCESS; } /* Checks whether there's a certain string at a position, ignoring case. 
*/ Py_LOCAL_INLINE(int) try_match_STRING_IGN(RE_State* state, RE_NextNode* next, RE_Node* node, Py_ssize_t text_pos, RE_Position* next_position) { Py_ssize_t length; Py_UCS4 (*char_at)(void* text, Py_ssize_t pos); RE_EncodingTable* encoding; RE_LocaleInfo* locale_info; RE_CODE* values; Py_ssize_t s_pos; length = (Py_ssize_t)node->value_count; char_at = state->char_at; encoding = state->encoding; locale_info = state->locale_info; values = node->values; for (s_pos = 0; s_pos < length; s_pos++) { if (text_pos + s_pos >= state->slice_end) { if (state->partial_side == RE_PARTIAL_RIGHT) return RE_ERROR_PARTIAL; return RE_ERROR_FAILURE; } if (!same_char_ign(encoding, locale_info, char_at(state->text, text_pos + s_pos), values[s_pos])) return RE_ERROR_FAILURE; } next_position->node = next->match_next; next_position->text_pos = text_pos + next->match_step; return RE_ERROR_SUCCESS; } /* Checks whether there's a certain string at a position, ignoring case, * backwards. */ Py_LOCAL_INLINE(int) try_match_STRING_IGN_REV(RE_State* state, RE_NextNode* next, RE_Node* node, Py_ssize_t text_pos, RE_Position* next_position) { Py_ssize_t length; Py_UCS4 (*char_at)(void* text, Py_ssize_t pos); RE_EncodingTable* encoding; RE_LocaleInfo* locale_info; RE_CODE* values; Py_ssize_t s_pos; length = (Py_ssize_t)node->value_count; char_at = state->char_at; encoding = state->encoding; locale_info = state->locale_info; values = node->values; for (s_pos = 0; s_pos < length; s_pos++) { if (text_pos - s_pos <= state->slice_start) { if (state->partial_side == RE_PARTIAL_LEFT) return RE_ERROR_PARTIAL; return RE_ERROR_FAILURE; } if (!same_char_ign(encoding, locale_info, char_at(state->text, text_pos - s_pos - 1), values[length - s_pos - 1])) return RE_ERROR_FAILURE; } next_position->node = next->match_next; next_position->text_pos = text_pos + next->match_step; return RE_ERROR_SUCCESS; } /* Checks whether there's a certain string at a position, backwards. 
*/ Py_LOCAL_INLINE(int) try_match_STRING_REV(RE_State* state, RE_NextNode* next, RE_Node* node, Py_ssize_t text_pos, RE_Position* next_position) { Py_ssize_t length; Py_UCS4 (*char_at)(void* text, Py_ssize_t pos); RE_CODE* values; Py_ssize_t s_pos; length = (Py_ssize_t)node->value_count; char_at = state->char_at; values = node->values; for (s_pos = 0; s_pos < length; s_pos++) { if (text_pos - s_pos <= state->slice_start) { if (state->partial_side == RE_PARTIAL_LEFT) return RE_ERROR_PARTIAL; return RE_ERROR_FAILURE; } if (!same_char(char_at(state->text, text_pos - s_pos - 1), values[length - s_pos - 1])) return RE_ERROR_FAILURE; } next_position->node = next->match_next; next_position->text_pos = text_pos + next->match_step; return RE_ERROR_SUCCESS; } /* Tries a match at the current text position. * * Returns the next node and text position if the match succeeds. */ Py_LOCAL_INLINE(int) try_match(RE_State* state, RE_NextNode* next, Py_ssize_t text_pos, RE_Position* next_position) { RE_Node* test; int status; test = next->test; if (test->status & RE_STATUS_FUZZY) { next_position->node = next->node; next_position->text_pos = text_pos; return RE_ERROR_SUCCESS; } switch (test->op) { case RE_OP_ANY: status = try_match_ANY(state, test, text_pos); break; case RE_OP_ANY_ALL: status = try_match_ANY_ALL(state, test, text_pos); break; case RE_OP_ANY_ALL_REV: status = try_match_ANY_ALL_REV(state, test, text_pos); break; case RE_OP_ANY_REV: status = try_match_ANY_REV(state, test, text_pos); break; case RE_OP_ANY_U: status = try_match_ANY_U(state, test, text_pos); break; case RE_OP_ANY_U_REV: status = try_match_ANY_U_REV(state, test, text_pos); break; case RE_OP_BOUNDARY: status = try_match_BOUNDARY(state, test, text_pos); break; case RE_OP_BRANCH: status = try_match(state, &test->next_1, text_pos, next_position); if (status == RE_ERROR_FAILURE) status = try_match(state, &test->nonstring.next_2, text_pos, next_position); break; case RE_OP_CHARACTER: status = 
try_match_CHARACTER(state, test, text_pos); break; case RE_OP_CHARACTER_IGN: status = try_match_CHARACTER_IGN(state, test, text_pos); break; case RE_OP_CHARACTER_IGN_REV: status = try_match_CHARACTER_IGN_REV(state, test, text_pos); break; case RE_OP_CHARACTER_REV: status = try_match_CHARACTER_REV(state, test, text_pos); break; case RE_OP_DEFAULT_BOUNDARY: status = try_match_DEFAULT_BOUNDARY(state, test, text_pos); break; case RE_OP_DEFAULT_END_OF_WORD: status = try_match_DEFAULT_END_OF_WORD(state, test, text_pos); break; case RE_OP_DEFAULT_START_OF_WORD: status = try_match_DEFAULT_START_OF_WORD(state, test, text_pos); break; case RE_OP_END_OF_LINE: status = try_match_END_OF_LINE(state, test, text_pos); break; case RE_OP_END_OF_LINE_U: status = try_match_END_OF_LINE_U(state, test, text_pos); break; case RE_OP_END_OF_STRING: status = try_match_END_OF_STRING(state, test, text_pos); break; case RE_OP_END_OF_STRING_LINE: status = try_match_END_OF_STRING_LINE(state, test, text_pos); break; case RE_OP_END_OF_STRING_LINE_U: status = try_match_END_OF_STRING_LINE_U(state, test, text_pos); break; case RE_OP_END_OF_WORD: status = try_match_END_OF_WORD(state, test, text_pos); break; case RE_OP_GRAPHEME_BOUNDARY: status = try_match_GRAPHEME_BOUNDARY(state, test, text_pos); break; case RE_OP_PROPERTY: status = try_match_PROPERTY(state, test, text_pos); break; case RE_OP_PROPERTY_IGN: status = try_match_PROPERTY_IGN(state, test, text_pos); break; case RE_OP_PROPERTY_IGN_REV: status = try_match_PROPERTY_IGN_REV(state, test, text_pos); break; case RE_OP_PROPERTY_REV: status = try_match_PROPERTY_REV(state, test, text_pos); break; case RE_OP_RANGE: status = try_match_RANGE(state, test, text_pos); break; case RE_OP_RANGE_IGN: status = try_match_RANGE_IGN(state, test, text_pos); break; case RE_OP_RANGE_IGN_REV: status = try_match_RANGE_IGN_REV(state, test, text_pos); break; case RE_OP_RANGE_REV: status = try_match_RANGE_REV(state, test, text_pos); break; case RE_OP_SEARCH_ANCHOR: status 
= try_match_SEARCH_ANCHOR(state, test, text_pos); break; case RE_OP_SET_DIFF: case RE_OP_SET_INTER: case RE_OP_SET_SYM_DIFF: case RE_OP_SET_UNION: status = try_match_SET(state, test, text_pos); break; case RE_OP_SET_DIFF_IGN: case RE_OP_SET_INTER_IGN: case RE_OP_SET_SYM_DIFF_IGN: case RE_OP_SET_UNION_IGN: status = try_match_SET_IGN(state, test, text_pos); break; case RE_OP_SET_DIFF_IGN_REV: case RE_OP_SET_INTER_IGN_REV: case RE_OP_SET_SYM_DIFF_IGN_REV: case RE_OP_SET_UNION_IGN_REV: status = try_match_SET_IGN_REV(state, test, text_pos); break; case RE_OP_SET_DIFF_REV: case RE_OP_SET_INTER_REV: case RE_OP_SET_SYM_DIFF_REV: case RE_OP_SET_UNION_REV: status = try_match_SET_REV(state, test, text_pos); break; case RE_OP_START_OF_LINE: status = try_match_START_OF_LINE(state, test, text_pos); break; case RE_OP_START_OF_LINE_U: status = try_match_START_OF_LINE_U(state, test, text_pos); break; case RE_OP_START_OF_STRING: status = try_match_START_OF_STRING(state, test, text_pos); break; case RE_OP_START_OF_WORD: status = try_match_START_OF_WORD(state, test, text_pos); break; case RE_OP_STRING: return try_match_STRING(state, next, test, text_pos, next_position); case RE_OP_STRING_FLD: return try_match_STRING_FLD(state, next, test, text_pos, next_position); case RE_OP_STRING_FLD_REV: return try_match_STRING_FLD_REV(state, next, test, text_pos, next_position); case RE_OP_STRING_IGN: return try_match_STRING_IGN(state, next, test, text_pos, next_position); case RE_OP_STRING_IGN_REV: return try_match_STRING_IGN_REV(state, next, test, text_pos, next_position); case RE_OP_STRING_REV: return try_match_STRING_REV(state, next, test, text_pos, next_position); default: next_position->node = next->node; next_position->text_pos = text_pos; return RE_ERROR_SUCCESS; } if (status != RE_ERROR_SUCCESS) return status; next_position->node = next->match_next; next_position->text_pos = text_pos + next->match_step; return RE_ERROR_SUCCESS; } /* Searches for a word boundary. 
*/ Py_LOCAL_INLINE(Py_ssize_t) search_start_BOUNDARY(RE_State* state, RE_Node* node, Py_ssize_t text_pos, BOOL* is_partial) { BOOL (*at_boundary)(RE_State* state, Py_ssize_t text_pos); at_boundary = state->encoding->at_boundary; *is_partial = FALSE; for (;;) { if (at_boundary(state, text_pos) == node->match) return text_pos; if (text_pos >= state->slice_end) return -1; ++text_pos; } } /* Searches for a word boundary, backwards. */ Py_LOCAL_INLINE(Py_ssize_t) search_start_BOUNDARY_rev(RE_State* state, RE_Node* node, Py_ssize_t text_pos, BOOL* is_partial) { BOOL (*at_boundary)(RE_State* state, Py_ssize_t text_pos); at_boundary = state->encoding->at_boundary; *is_partial = FALSE; for (;;) { if (at_boundary(state, text_pos) == node->match) return text_pos; if (text_pos <= state->slice_start) return -1; --text_pos; } } /* Searches for a default word boundary. */ Py_LOCAL_INLINE(Py_ssize_t) search_start_DEFAULT_BOUNDARY(RE_State* state, RE_Node* node, Py_ssize_t text_pos, BOOL* is_partial) { BOOL (*at_default_boundary)(RE_State* state, Py_ssize_t text_pos); at_default_boundary = state->encoding->at_default_boundary; *is_partial = FALSE; for (;;) { if (at_default_boundary(state, text_pos) == node->match) return text_pos; if (text_pos >= state->slice_end) return -1; ++text_pos; } } /* Searches for a default word boundary, backwards. */ Py_LOCAL_INLINE(Py_ssize_t) search_start_DEFAULT_BOUNDARY_rev(RE_State* state, RE_Node* node, Py_ssize_t text_pos, BOOL* is_partial) { BOOL (*at_default_boundary)(RE_State* state, Py_ssize_t text_pos); at_default_boundary = state->encoding->at_default_boundary; *is_partial = FALSE; for (;;) { if (at_default_boundary(state, text_pos) == node->match) return text_pos; if (text_pos <= state->slice_start) return -1; --text_pos; } } /* Searches for the default end of a word. 
*/ Py_LOCAL_INLINE(Py_ssize_t) search_start_DEFAULT_END_OF_WORD(RE_State* state, RE_Node* node, Py_ssize_t text_pos, BOOL* is_partial) { BOOL (*at_default_word_end)(RE_State* state, Py_ssize_t text_pos); at_default_word_end = state->encoding->at_default_word_end; *is_partial = FALSE; for (;;) { if (at_default_word_end(state, text_pos) == node->match) return text_pos; if (text_pos >= state->slice_end) return -1; ++text_pos; } } /* Searches for the default end of a word, backwards. */ Py_LOCAL_INLINE(Py_ssize_t) search_start_DEFAULT_END_OF_WORD_rev(RE_State* state, RE_Node* node, Py_ssize_t text_pos, BOOL* is_partial) { BOOL (*at_default_word_end)(RE_State* state, Py_ssize_t text_pos); at_default_word_end = state->encoding->at_default_word_end; *is_partial = FALSE; for (;;) { if (at_default_word_end(state, text_pos) == node->match) return text_pos; if (text_pos <= state->slice_start) return -1; --text_pos; } } /* Searches for the default start of a word. */ Py_LOCAL_INLINE(Py_ssize_t) search_start_DEFAULT_START_OF_WORD(RE_State* state, RE_Node* node, Py_ssize_t text_pos, BOOL* is_partial) { BOOL (*at_default_word_start)(RE_State* state, Py_ssize_t text_pos); at_default_word_start = state->encoding->at_default_word_start; *is_partial = FALSE; for (;;) { if (at_default_word_start(state, text_pos) == node->match) return text_pos; if (text_pos >= state->slice_end) return -1; ++text_pos; } } /* Searches for the default start of a word, backwards. */ Py_LOCAL_INLINE(Py_ssize_t) search_start_DEFAULT_START_OF_WORD_rev(RE_State* state, RE_Node* node, Py_ssize_t text_pos, BOOL* is_partial) { BOOL (*at_default_word_start)(RE_State* state, Py_ssize_t text_pos); at_default_word_start = state->encoding->at_default_word_start; *is_partial = FALSE; for (;;) { if (at_default_word_start(state, text_pos) == node->match) return text_pos; if (text_pos <= state->slice_start) return -1; --text_pos; } } /* Searches for the end of line. 
*/
Py_LOCAL_INLINE(Py_ssize_t) search_start_END_OF_LINE(RE_State* state, RE_Node*
  node, Py_ssize_t text_pos, BOOL* is_partial) {
    *is_partial = FALSE;

    for (;;) {
        if (text_pos >= state->text_length || state->char_at(state->text,
          text_pos) == '\n')
            return text_pos;

        if (text_pos >= state->slice_end)
            return -1;

        ++text_pos;
    }
}

/* Searches for the end of line, backwards. */
Py_LOCAL_INLINE(Py_ssize_t) search_start_END_OF_LINE_rev(RE_State* state,
  RE_Node* node, Py_ssize_t text_pos, BOOL* is_partial) {
    *is_partial = FALSE;

    for (;;) {
        if (text_pos >= state->text_length || state->char_at(state->text,
          text_pos) == '\n')
            return text_pos;

        if (text_pos <= state->slice_start)
            return -1;

        --text_pos;
    }
}

/* Searches for the end of the string. */
Py_LOCAL_INLINE(Py_ssize_t) search_start_END_OF_STRING(RE_State* state,
  RE_Node* node, Py_ssize_t text_pos, BOOL* is_partial) {
    *is_partial = FALSE;

    if (state->slice_end >= state->text_length)
        return state->text_length;

    return -1;
}

/* Searches for the end of the string, backwards. */
Py_LOCAL_INLINE(Py_ssize_t) search_start_END_OF_STRING_rev(RE_State* state,
  RE_Node* node, Py_ssize_t text_pos, BOOL* is_partial) {
    *is_partial = FALSE;

    if (text_pos >= state->text_length)
        return text_pos;

    return -1;
}

/* Searches for the end of the string or line. */
Py_LOCAL_INLINE(Py_ssize_t) search_start_END_OF_STRING_LINE(RE_State* state,
  RE_Node* node, Py_ssize_t text_pos, BOOL* is_partial) {
    *is_partial = FALSE;

    if (text_pos <= state->final_newline)
        text_pos = state->final_newline;
    else if (text_pos <= state->text_length)
        text_pos = state->text_length;

    if (text_pos > state->slice_end)
        return -1;

    return text_pos;
}

/* Searches for the end of the string or line, backwards.
*/
Py_LOCAL_INLINE(Py_ssize_t) search_start_END_OF_STRING_LINE_rev(RE_State* state,
  RE_Node* node, Py_ssize_t text_pos, BOOL* is_partial) {
    *is_partial = FALSE;

    if (text_pos >= state->text_length)
        text_pos = state->text_length;
    else if (text_pos >= state->final_newline)
        text_pos = state->final_newline;
    else
        return -1;

    if (text_pos < state->slice_start)
        return -1;

    return text_pos;
}

/* Searches for the end of a word. */
Py_LOCAL_INLINE(Py_ssize_t) search_start_END_OF_WORD(RE_State* state, RE_Node*
  node, Py_ssize_t text_pos, BOOL* is_partial) {
    BOOL (*at_word_end)(RE_State* state, Py_ssize_t text_pos);

    at_word_end = state->encoding->at_word_end;

    *is_partial = FALSE;

    for (;;) {
        if (at_word_end(state, text_pos) == node->match)
            return text_pos;

        if (text_pos >= state->slice_end)
            return -1;

        ++text_pos;
    }
}

/* Searches for the end of a word, backwards. */
Py_LOCAL_INLINE(Py_ssize_t) search_start_END_OF_WORD_rev(RE_State* state,
  RE_Node* node, Py_ssize_t text_pos, BOOL* is_partial) {
    BOOL (*at_word_end)(RE_State* state, Py_ssize_t text_pos);

    at_word_end = state->encoding->at_word_end;

    *is_partial = FALSE;

    for (;;) {
        if (at_word_end(state, text_pos) == node->match)
            return text_pos;

        if (text_pos <= state->slice_start)
            return -1;

        --text_pos;
    }
}

/* Searches for a grapheme boundary. */
Py_LOCAL_INLINE(Py_ssize_t) search_start_GRAPHEME_BOUNDARY(RE_State* state,
  RE_Node* node, Py_ssize_t text_pos, BOOL* is_partial) {
    BOOL (*at_grapheme_boundary)(RE_State* state, Py_ssize_t text_pos);

    at_grapheme_boundary = state->encoding->at_grapheme_boundary;

    *is_partial = FALSE;

    for (;;) {
        if (at_grapheme_boundary(state, text_pos) == node->match)
            return text_pos;

        if (text_pos >= state->slice_end)
            return -1;

        ++text_pos;
    }
}

/* Searches for a grapheme boundary, backwards.
*/ Py_LOCAL_INLINE(Py_ssize_t) search_start_GRAPHEME_BOUNDARY_rev(RE_State* state, RE_Node* node, Py_ssize_t text_pos, BOOL* is_partial) { BOOL (*at_grapheme_boundary)(RE_State* state, Py_ssize_t text_pos); at_grapheme_boundary = state->encoding->at_grapheme_boundary; *is_partial = FALSE; for (;;) { if (at_grapheme_boundary(state, text_pos) == node->match) return text_pos; if (text_pos <= state->slice_start) return -1; --text_pos; } } /* Searches for the start of line. */ Py_LOCAL_INLINE(Py_ssize_t) search_start_START_OF_LINE(RE_State* state, RE_Node* node, Py_ssize_t text_pos, BOOL* is_partial) { *is_partial = FALSE; for (;;) { if (text_pos <= 0 || state->char_at(state->text, text_pos - 1) == '\n') return text_pos; if (text_pos >= state->slice_end) return -1; ++text_pos; } } /* Searches for the start of line, backwards. */ Py_LOCAL_INLINE(Py_ssize_t) search_start_START_OF_LINE_rev(RE_State* state, RE_Node* node, Py_ssize_t text_pos, BOOL* is_partial) { *is_partial = FALSE; for (;;) { if (text_pos <= 0 || state->char_at(state->text, text_pos - 1) == '\n') return text_pos; if (text_pos <= state->slice_start) return -1; --text_pos; } } /* Searches for the start of the string. */ Py_LOCAL_INLINE(Py_ssize_t) search_start_START_OF_STRING(RE_State* state, RE_Node* node, Py_ssize_t text_pos, BOOL* is_partial) { *is_partial = FALSE; if (text_pos <= 0) return text_pos; return -1; } /* Searches for the start of the string, backwards. */ Py_LOCAL_INLINE(Py_ssize_t) search_start_START_OF_STRING_rev(RE_State* state, RE_Node* node, Py_ssize_t text_pos, BOOL* is_partial) { *is_partial = FALSE; if (state->slice_start <= 0) return 0; return -1; } /* Searches for the start of a word. 
*/ Py_LOCAL_INLINE(Py_ssize_t) search_start_START_OF_WORD(RE_State* state, RE_Node* node, Py_ssize_t text_pos, BOOL* is_partial) { BOOL (*at_word_start)(RE_State* state, Py_ssize_t text_pos); at_word_start = state->encoding->at_word_start; *is_partial = FALSE; for (;;) { if (at_word_start(state, text_pos) == node->match) return text_pos; if (text_pos >= state->slice_end) return -1; ++text_pos; } } /* Searches for the start of a word, backwards. */ Py_LOCAL_INLINE(Py_ssize_t) search_start_START_OF_WORD_rev(RE_State* state, RE_Node* node, Py_ssize_t text_pos, BOOL* is_partial) { BOOL (*at_word_start)(RE_State* state, Py_ssize_t text_pos); at_word_start = state->encoding->at_word_start; *is_partial = FALSE; for (;;) { if (at_word_start(state, text_pos) == node->match) return text_pos; if (text_pos <= state->slice_start) return -1; --text_pos; } } /* Searches for a string. */ Py_LOCAL_INLINE(Py_ssize_t) search_start_STRING(RE_SafeState* safe_state, RE_Node* node, Py_ssize_t text_pos, BOOL* is_partial) { RE_State* state; state = safe_state->re_state; *is_partial = FALSE; if ((node->status & RE_STATUS_REQUIRED) && text_pos == state->req_pos) return text_pos; return string_search(safe_state, node, text_pos, state->slice_end, is_partial); } /* Searches for a string, ignoring case. */ Py_LOCAL_INLINE(Py_ssize_t) search_start_STRING_FLD(RE_SafeState* safe_state, RE_Node* node, Py_ssize_t text_pos, Py_ssize_t* new_pos, BOOL* is_partial) { RE_State* state; state = safe_state->re_state; *is_partial = FALSE; if ((node->status & RE_STATUS_REQUIRED) && text_pos == state->req_pos) { *new_pos = state->req_end; return text_pos; } return string_search_fld(safe_state, node, text_pos, state->slice_end, new_pos, is_partial); } /* Searches for a string, ignoring case, backwards. 
*/ Py_LOCAL_INLINE(Py_ssize_t) search_start_STRING_FLD_REV(RE_SafeState* safe_state, RE_Node* node, Py_ssize_t text_pos, Py_ssize_t* new_pos, BOOL* is_partial) { RE_State* state; state = safe_state->re_state; *is_partial = FALSE; if ((node->status & RE_STATUS_REQUIRED) && text_pos == state->req_pos) { *new_pos = state->req_end; return text_pos; } return string_search_fld_rev(safe_state, node, text_pos, state->slice_start, new_pos, is_partial); } /* Searches for a string, ignoring case. */ Py_LOCAL_INLINE(Py_ssize_t) search_start_STRING_IGN(RE_SafeState* safe_state, RE_Node* node, Py_ssize_t text_pos, BOOL* is_partial) { RE_State* state; state = safe_state->re_state; *is_partial = FALSE; if ((node->status & RE_STATUS_REQUIRED) && text_pos == state->req_pos) return text_pos; return string_search_ign(safe_state, node, text_pos, state->slice_end, is_partial); } /* Searches for a string, ignoring case, backwards. */ Py_LOCAL_INLINE(Py_ssize_t) search_start_STRING_IGN_REV(RE_SafeState* safe_state, RE_Node* node, Py_ssize_t text_pos, BOOL* is_partial) { RE_State* state; state = safe_state->re_state; *is_partial = FALSE; if ((node->status & RE_STATUS_REQUIRED) && text_pos == state->req_pos) return text_pos; return string_search_ign_rev(safe_state, node, text_pos, state->slice_start, is_partial); } /* Searches for a string, backwards. */ Py_LOCAL_INLINE(Py_ssize_t) search_start_STRING_REV(RE_SafeState* safe_state, RE_Node* node, Py_ssize_t text_pos, BOOL* is_partial) { RE_State* state; state = safe_state->re_state; *is_partial = FALSE; if ((node->status & RE_STATUS_REQUIRED) && text_pos == state->req_pos) return text_pos; return string_search_rev(safe_state, node, text_pos, state->slice_start, is_partial); } /* Searches for the start of a match. 
*/ Py_LOCAL_INLINE(int) search_start(RE_SafeState* safe_state, RE_NextNode* next, RE_Position* new_position, int search_index) { RE_State* state; Py_ssize_t start_pos; RE_Node* test; RE_Node* node; RE_SearchPosition* info; Py_ssize_t text_pos; state = safe_state->re_state; start_pos = state->text_pos; TRACE(("<> at %d\n", start_pos)) test = next->test; node = next->node; if (state->reverse) { if (start_pos < state->slice_start) { if (state->partial_side == RE_PARTIAL_LEFT) { new_position->text_pos = state->slice_start; return RE_ERROR_PARTIAL; } return RE_ERROR_FAILURE; } } else { if (start_pos > state->slice_end) { if (state->partial_side == RE_PARTIAL_RIGHT) { new_position->text_pos = state->slice_end; return RE_ERROR_PARTIAL; } } } if (test->status & RE_STATUS_FUZZY) { /* Don't call 'search_start' again. */ state->pattern->do_search_start = FALSE; state->match_pos = start_pos; new_position->node = node; new_position->text_pos = start_pos; return RE_ERROR_SUCCESS; } again: if (!state->pattern->is_fuzzy && state->partial_side == RE_PARTIAL_NONE) { if (state->reverse) { if (start_pos - state->min_width < state->slice_start) return RE_ERROR_FAILURE; } else { if (start_pos + state->min_width > state->slice_end) return RE_ERROR_FAILURE; } } if (search_index < MAX_SEARCH_POSITIONS) { info = &state->search_positions[search_index]; if (state->reverse) { if (info->start_pos >= 0 && info->start_pos >= start_pos && start_pos >= info->match_pos) { state->match_pos = info->match_pos; new_position->text_pos = state->match_pos; new_position->node = node; return RE_ERROR_SUCCESS; } } else { if (info->start_pos >= 0 && info->start_pos <= start_pos && start_pos <= info->match_pos) { state->match_pos = info->match_pos; new_position->text_pos = state->match_pos; new_position->node = node; return RE_ERROR_SUCCESS; } } } else info = NULL; switch (test->op) { case RE_OP_ANY: start_pos = match_many_ANY(state, test, start_pos, state->slice_end, FALSE); if (start_pos >= 
state->text_length) { if (state->partial_side == RE_PARTIAL_RIGHT) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } } if (start_pos >= state->slice_end) return RE_ERROR_FAILURE; break; case RE_OP_ANY_ALL: case RE_OP_ANY_ALL_REV: break; case RE_OP_ANY_REV: start_pos = match_many_ANY_REV(state, test, start_pos, state->slice_start, FALSE); if (start_pos <= 0) { if (state->partial_side == RE_PARTIAL_LEFT) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } } if (start_pos <= state->slice_start) return RE_ERROR_FAILURE; break; case RE_OP_ANY_U: start_pos = match_many_ANY_U(state, test, start_pos, state->slice_end, FALSE); if (start_pos >= state->text_length) { if (state->partial_side == RE_PARTIAL_RIGHT) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } } if (start_pos >= state->slice_end) return RE_ERROR_FAILURE; break; case RE_OP_ANY_U_REV: start_pos = match_many_ANY_U_REV(state, test, start_pos, state->slice_start, FALSE); if (start_pos <= 0) { if (state->partial_side == RE_PARTIAL_LEFT) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } } if (start_pos <= state->slice_start) return RE_ERROR_FAILURE; break; case RE_OP_BOUNDARY: { BOOL is_partial; if (state->reverse) start_pos = search_start_BOUNDARY_rev(state, test, start_pos, &is_partial); else start_pos = search_start_BOUNDARY(state, test, start_pos, &is_partial); if (start_pos < 0) return RE_ERROR_FAILURE; if (is_partial) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } break; } case RE_OP_CHARACTER: start_pos = match_many_CHARACTER(state, test, start_pos, state->slice_end, FALSE); if (start_pos >= state->text_length) { if (state->partial_side == RE_PARTIAL_RIGHT) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } } if (start_pos >= state->slice_end) return RE_ERROR_FAILURE; break; case RE_OP_CHARACTER_IGN: start_pos = match_many_CHARACTER_IGN(state, test, start_pos, state->slice_end, FALSE); if (start_pos >= 
state->text_length) { if (state->partial_side == RE_PARTIAL_RIGHT) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } } if (start_pos >= state->slice_end) return RE_ERROR_FAILURE; break; case RE_OP_CHARACTER_IGN_REV: start_pos = match_many_CHARACTER_IGN_REV(state, test, start_pos, state->slice_start, FALSE); if (start_pos <= 0) { if (state->partial_side == RE_PARTIAL_LEFT) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } } if (start_pos <= state->slice_start) return RE_ERROR_FAILURE; break; case RE_OP_CHARACTER_REV: start_pos = match_many_CHARACTER_REV(state, test, start_pos, state->slice_start, FALSE); if (start_pos <= 0) { if (state->partial_side == RE_PARTIAL_LEFT) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } } if (start_pos <= state->slice_start) return RE_ERROR_FAILURE; break; case RE_OP_DEFAULT_BOUNDARY: { BOOL is_partial; if (state->reverse) start_pos = search_start_DEFAULT_BOUNDARY_rev(state, test, start_pos, &is_partial); else start_pos = search_start_DEFAULT_BOUNDARY(state, test, start_pos, &is_partial); if (start_pos < 0) return RE_ERROR_FAILURE; if (is_partial) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } break; } case RE_OP_DEFAULT_END_OF_WORD: { BOOL is_partial; if (state->reverse) start_pos = search_start_DEFAULT_END_OF_WORD_rev(state, test, start_pos, &is_partial); else start_pos = search_start_DEFAULT_END_OF_WORD(state, test, start_pos, &is_partial); if (start_pos < 0) return RE_ERROR_FAILURE; if (is_partial) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } break; } case RE_OP_DEFAULT_START_OF_WORD: { BOOL is_partial; if (state->reverse) start_pos = search_start_DEFAULT_START_OF_WORD_rev(state, test, start_pos, &is_partial); else start_pos = search_start_DEFAULT_START_OF_WORD(state, test, start_pos, &is_partial); if (start_pos < 0) return RE_ERROR_FAILURE; if (is_partial) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } break; } case 
RE_OP_END_OF_LINE: { BOOL is_partial; if (state->reverse) start_pos = search_start_END_OF_LINE_rev(state, test, start_pos, &is_partial); else start_pos = search_start_END_OF_LINE(state, test, start_pos, &is_partial); if (start_pos < 0) return RE_ERROR_FAILURE; if (is_partial) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } break; } case RE_OP_END_OF_STRING: { BOOL is_partial; if (state->reverse) start_pos = search_start_END_OF_STRING_rev(state, test, start_pos, &is_partial); else start_pos = search_start_END_OF_STRING(state, test, start_pos, &is_partial); if (start_pos < 0) return RE_ERROR_FAILURE; if (is_partial) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } break; } case RE_OP_END_OF_STRING_LINE: { BOOL is_partial; if (state->reverse) start_pos = search_start_END_OF_STRING_LINE_rev(state, test, start_pos, &is_partial); else start_pos = search_start_END_OF_STRING_LINE(state, test, start_pos, &is_partial); if (start_pos < 0) return RE_ERROR_FAILURE; if (is_partial) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } break; } case RE_OP_END_OF_WORD: { BOOL is_partial; if (state->reverse) start_pos = search_start_END_OF_WORD_rev(state, test, start_pos, &is_partial); else start_pos = search_start_END_OF_WORD(state, test, start_pos, &is_partial); if (start_pos < 0) return RE_ERROR_FAILURE; if (is_partial) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } break; } case RE_OP_GRAPHEME_BOUNDARY: { BOOL is_partial; if (state->reverse) start_pos = search_start_GRAPHEME_BOUNDARY_rev(state, test, start_pos, &is_partial); else start_pos = search_start_GRAPHEME_BOUNDARY(state, test, start_pos, &is_partial); if (start_pos < 0) return RE_ERROR_FAILURE; if (is_partial) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } break; } case RE_OP_PROPERTY: start_pos = match_many_PROPERTY(state, test, start_pos, state->slice_end, FALSE); if (start_pos >= state->text_length) { if (state->partial_side == 
RE_PARTIAL_RIGHT) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } } if (start_pos >= state->slice_end) return RE_ERROR_FAILURE; break; case RE_OP_PROPERTY_IGN: start_pos = match_many_PROPERTY_IGN(state, test, start_pos, state->slice_end, FALSE); if (start_pos >= state->text_length) { if (state->partial_side == RE_PARTIAL_RIGHT) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } } if (start_pos >= state->slice_end) return RE_ERROR_FAILURE; break; case RE_OP_PROPERTY_IGN_REV: start_pos = match_many_PROPERTY_IGN_REV(state, test, start_pos, state->slice_start, FALSE); if (start_pos <= 0) { if (state->partial_side == RE_PARTIAL_LEFT) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } } if (start_pos <= state->slice_start) return RE_ERROR_FAILURE; break; case RE_OP_PROPERTY_REV: start_pos = match_many_PROPERTY_REV(state, test, start_pos, state->slice_start, FALSE); if (start_pos <= 0) { if (state->partial_side == RE_PARTIAL_LEFT) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } } if (start_pos <= state->slice_start) return RE_ERROR_FAILURE; break; case RE_OP_RANGE: start_pos = match_many_RANGE(state, test, start_pos, state->slice_end, FALSE); if (start_pos >= state->text_length) { if (state->partial_side == RE_PARTIAL_RIGHT) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } } if (start_pos >= state->slice_end) return RE_ERROR_FAILURE; break; case RE_OP_RANGE_IGN: start_pos = match_many_RANGE_IGN(state, test, start_pos, state->slice_end, FALSE); if (start_pos >= state->text_length) { if (state->partial_side == RE_PARTIAL_RIGHT) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } } if (start_pos >= state->slice_end) return RE_ERROR_FAILURE; break; case RE_OP_RANGE_IGN_REV: start_pos = match_many_RANGE_IGN_REV(state, test, start_pos, state->slice_start, FALSE); if (start_pos <= 0) { if (state->partial_side == RE_PARTIAL_LEFT) { new_position->text_pos = start_pos; return 
                  RE_ERROR_PARTIAL;
            }
        }

        if (start_pos <= state->slice_start)
            return RE_ERROR_FAILURE;
        break;
    case RE_OP_RANGE_REV:
        start_pos = match_many_RANGE_REV(state, test, start_pos,
          state->slice_start, FALSE);
        if (start_pos <= 0) {
            if (state->partial_side == RE_PARTIAL_LEFT) {
                new_position->text_pos = start_pos;
                return RE_ERROR_PARTIAL;
            }
        }

        if (start_pos <= state->slice_start)
            return RE_ERROR_FAILURE;
        break;
    case RE_OP_SEARCH_ANCHOR:
        if (state->reverse) {
            if (start_pos < state->search_anchor)
                return RE_ERROR_FAILURE;
        } else {
            if (start_pos > state->search_anchor)
                return RE_ERROR_FAILURE;
        }

        start_pos = state->search_anchor;
        break;
    case RE_OP_SET_DIFF:
    case RE_OP_SET_INTER:
    case RE_OP_SET_SYM_DIFF:
    case RE_OP_SET_UNION:
        start_pos = match_many_SET(state, test, start_pos, state->slice_end,
          FALSE);
        if (start_pos >= state->text_length) {
            if (state->partial_side == RE_PARTIAL_RIGHT) {
                new_position->text_pos = start_pos;
                return RE_ERROR_PARTIAL;
            }
        }

        if (start_pos >= state->slice_end)
            return RE_ERROR_FAILURE;
        break;
    case RE_OP_SET_DIFF_IGN:
    case RE_OP_SET_INTER_IGN:
    case RE_OP_SET_SYM_DIFF_IGN:
    case RE_OP_SET_UNION_IGN:
        start_pos = match_many_SET_IGN(state, test, start_pos,
          state->slice_end, FALSE);
        if (start_pos >= state->text_length) {
            if (state->partial_side == RE_PARTIAL_RIGHT) {
                new_position->text_pos = start_pos;
                return RE_ERROR_PARTIAL;
            }
        }

        if (start_pos >= state->slice_end)
            return RE_ERROR_FAILURE;
        break;
    case RE_OP_SET_DIFF_IGN_REV:
    case RE_OP_SET_INTER_IGN_REV:
    case RE_OP_SET_SYM_DIFF_IGN_REV:
    case RE_OP_SET_UNION_IGN_REV:
        start_pos = match_many_SET_IGN_REV(state, test, start_pos,
          state->slice_start, FALSE);
        if (start_pos <= 0) {
            if (state->partial_side == RE_PARTIAL_LEFT) {
                new_position->text_pos = start_pos;
                return RE_ERROR_PARTIAL;
            }
        }

        if (start_pos <= state->slice_start)
            return RE_ERROR_FAILURE;
        break;
    case RE_OP_SET_DIFF_REV:
    case RE_OP_SET_INTER_REV:
    case RE_OP_SET_SYM_DIFF_REV:
    case RE_OP_SET_UNION_REV:
        start_pos = match_many_SET_REV(state, test, start_pos,
          state->slice_start, FALSE);
        if (start_pos <=
0) { if (state->partial_side == RE_PARTIAL_LEFT) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } } if (start_pos <= state->slice_start) return FALSE; break; case RE_OP_START_OF_LINE: { BOOL is_partial; if (state->reverse) start_pos = search_start_START_OF_LINE_rev(state, test, start_pos, &is_partial); else start_pos = search_start_START_OF_LINE(state, test, start_pos, &is_partial); if (start_pos < 0) return RE_ERROR_FAILURE; if (is_partial) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } break; } case RE_OP_START_OF_STRING: { BOOL is_partial; if (state->reverse) start_pos = search_start_START_OF_STRING_rev(state, test, start_pos, &is_partial); else start_pos = search_start_START_OF_STRING(state, test, start_pos, &is_partial); if (start_pos < 0) return RE_ERROR_FAILURE; if (is_partial) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } break; } case RE_OP_START_OF_WORD: { BOOL is_partial; if (state->reverse) start_pos = search_start_START_OF_WORD_rev(state, test, start_pos, &is_partial); else start_pos = search_start_START_OF_WORD(state, test, start_pos, &is_partial); if (start_pos < 0) return RE_ERROR_FAILURE; if (is_partial) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } break; } case RE_OP_STRING: { BOOL is_partial; start_pos = search_start_STRING(safe_state, test, start_pos, &is_partial); if (start_pos < 0) return RE_ERROR_FAILURE; if (is_partial) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } break; } case RE_OP_STRING_FLD: { Py_ssize_t new_pos; BOOL is_partial; start_pos = search_start_STRING_FLD(safe_state, test, start_pos, &new_pos, &is_partial); if (start_pos < 0) return RE_ERROR_FAILURE; if (is_partial) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } /* Can we look further ahead? 
*/ if (test == node) { if (test->next_1.node) { int status; status = try_match(state, &test->next_1, new_pos, new_position); if (status < 0) return status; if (status == RE_ERROR_FAILURE) { ++start_pos; if (start_pos >= state->slice_end) { if (state->partial_side == RE_PARTIAL_RIGHT) { new_position->text_pos = state->slice_start; return RE_ERROR_PARTIAL; } return RE_ERROR_FAILURE; } goto again; } } /* It's a possible match. */ state->match_pos = start_pos; if (info) { info->start_pos = state->text_pos; info->match_pos = state->match_pos; } return RE_ERROR_SUCCESS; } break; } case RE_OP_STRING_FLD_REV: { Py_ssize_t new_pos; BOOL is_partial; start_pos = search_start_STRING_FLD_REV(safe_state, test, start_pos, &new_pos, &is_partial); if (start_pos < 0) return RE_ERROR_FAILURE; if (is_partial) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } /* Can we look further ahead? */ if (test == node) { if (test->next_1.node) { int status; status = try_match(state, &test->next_1, new_pos, new_position); if (status < 0) return status; if (status == RE_ERROR_FAILURE) { --start_pos; if (start_pos <= state->slice_start) { if (state->partial_side == RE_PARTIAL_LEFT) { new_position->text_pos = state->slice_start; return RE_ERROR_PARTIAL; } return RE_ERROR_FAILURE; } goto again; } } /* It's a possible match. 
*/ state->match_pos = start_pos; if (info) { info->start_pos = state->text_pos; info->match_pos = state->match_pos; } return RE_ERROR_SUCCESS; } break; } case RE_OP_STRING_IGN: { BOOL is_partial; start_pos = search_start_STRING_IGN(safe_state, test, start_pos, &is_partial); if (start_pos < 0) return RE_ERROR_FAILURE; if (is_partial) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } break; } case RE_OP_STRING_IGN_REV: { BOOL is_partial; start_pos = search_start_STRING_IGN_REV(safe_state, test, start_pos, &is_partial); if (start_pos < 0) return RE_ERROR_FAILURE; if (is_partial) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } break; } case RE_OP_STRING_REV: { BOOL is_partial; start_pos = search_start_STRING_REV(safe_state, test, start_pos, &is_partial); if (start_pos < 0) return RE_ERROR_FAILURE; if (is_partial) { new_position->text_pos = start_pos; return RE_ERROR_PARTIAL; } break; } default: /* Don't call 'search_start' again. */ state->pattern->do_search_start = FALSE; state->match_pos = start_pos; new_position->node = node; new_position->text_pos = start_pos; return RE_ERROR_SUCCESS; } /* Can we look further ahead? */ if (test == node) { text_pos = start_pos + test->step; if (test->next_1.node) { int status; status = try_match(state, &test->next_1, text_pos, new_position); if (status < 0) return status; if (status == RE_ERROR_FAILURE) { if (state->reverse) { --start_pos; if (start_pos < state->slice_start) { if (state->partial_side == RE_PARTIAL_LEFT) { new_position->text_pos = state->slice_start; return RE_ERROR_PARTIAL; } return RE_ERROR_FAILURE; } } else { ++start_pos; if (start_pos > state->slice_end) { if (state->partial_side == RE_PARTIAL_RIGHT) { new_position->text_pos = state->slice_end; return RE_ERROR_PARTIAL; } return RE_ERROR_FAILURE; } } goto again; } } } else { new_position->node = node; new_position->text_pos = start_pos; } /* It's a possible match. 
*/ state->match_pos = start_pos; if (info) { info->start_pos = state->text_pos; info->match_pos = state->match_pos; } return RE_ERROR_SUCCESS; } /* Saves a capture group. */ Py_LOCAL_INLINE(BOOL) save_capture(RE_SafeState* safe_state, size_t private_index, size_t public_index) { RE_State* state; RE_GroupData* private_group; RE_GroupData* public_group; state = safe_state->re_state; /* Capture group indexes are 1-based (excluding group 0, which is the * entire matched string). */ private_group = &state->groups[private_index - 1]; public_group = &state->groups[public_index - 1]; /* Will the repeated captures ever be visible? */ if (!state->visible_captures) { public_group->captures[0] = private_group->span; public_group->capture_count = 1; return TRUE; } if (public_group->capture_count >= public_group->capture_capacity) { size_t new_capacity; RE_GroupSpan* new_captures; new_capacity = public_group->capture_capacity * 2; new_capacity = max_size_t(new_capacity, RE_INIT_CAPTURE_SIZE); new_captures = (RE_GroupSpan*)safe_realloc(safe_state, public_group->captures, new_capacity * sizeof(RE_GroupSpan)); if (!new_captures) return FALSE; public_group->captures = new_captures; public_group->capture_capacity = new_capacity; } public_group->captures[public_group->capture_count++] = private_group->span; return TRUE; } /* Unsaves a capture group. */ Py_LOCAL_INLINE(void) unsave_capture(RE_State* state, size_t private_index, size_t public_index) { /* Capture group indexes are 1-based (excluding group 0, which is the * entire matched string). */ if (state->groups[public_index - 1].capture_count > 0) --state->groups[public_index - 1].capture_count; } /* Pushes the groups for backtracking. 
*/ Py_LOCAL_INLINE(BOOL) push_groups(RE_SafeState* safe_state) { RE_State* state; size_t group_count; RE_SavedGroups* current; size_t g; state = safe_state->re_state; group_count = state->pattern->true_group_count; if (group_count == 0) return TRUE; current = state->current_saved_groups; if (current && current->next) current = current->next; else if (!current && state->first_saved_groups) current = state->first_saved_groups; else { RE_SavedGroups* new_block; new_block = (RE_SavedGroups*)safe_alloc(safe_state, sizeof(RE_SavedGroups)); if (!new_block) return FALSE; new_block->spans = (RE_GroupSpan*)safe_alloc(safe_state, group_count * sizeof(RE_GroupSpan)); new_block->counts = (size_t*)safe_alloc(safe_state, group_count * sizeof(Py_ssize_t)); if (!new_block->spans || !new_block->counts) { safe_dealloc(safe_state, new_block->spans); safe_dealloc(safe_state, new_block->counts); safe_dealloc(safe_state, new_block); return FALSE; } new_block->previous = current; new_block->next = NULL; if (new_block->previous) new_block->previous->next = new_block; else state->first_saved_groups = new_block; current = new_block; } for (g = 0; g < group_count; g++) { current->spans[g] = state->groups[g].span; current->counts[g] = state->groups[g].capture_count; } state->current_saved_groups = current; return TRUE; } /* Pops the groups for backtracking. */ Py_LOCAL_INLINE(void) pop_groups(RE_State* state) { size_t group_count; RE_SavedGroups* current; size_t g; group_count = state->pattern->true_group_count; if (group_count == 0) return; current = state->current_saved_groups; for (g = 0; g < group_count; g++) { state->groups[g].span = current->spans[g]; state->groups[g].capture_count = current->counts[g]; } state->current_saved_groups = current->previous; } /* Drops the groups for backtracking. 
*/ Py_LOCAL_INLINE(void) drop_groups(RE_State* state) { if (state->pattern->true_group_count != 0) state->current_saved_groups = state->current_saved_groups->previous; } /* Pushes the repeats for backtracking. */ Py_LOCAL_INLINE(BOOL) push_repeats(RE_SafeState* safe_state) { RE_State* state; PatternObject* pattern; size_t repeat_count; RE_SavedRepeats* current; size_t r; state = safe_state->re_state; pattern = state->pattern; repeat_count = pattern->repeat_count; if (repeat_count == 0) return TRUE; current = state->current_saved_repeats; if (current && current->next) current = current->next; else if (!current && state->first_saved_repeats) current = state->first_saved_repeats; else { RE_SavedRepeats* new_block; new_block = (RE_SavedRepeats*)safe_alloc(safe_state, sizeof(RE_SavedRepeats)); if (!new_block) return FALSE; new_block->repeats = (RE_RepeatData*)safe_alloc(safe_state, repeat_count * sizeof(RE_RepeatData)); if (!new_block->repeats) { safe_dealloc(safe_state, new_block); return FALSE; } memset(new_block->repeats, 0, repeat_count * sizeof(RE_RepeatData)); new_block->previous = current; new_block->next = NULL; if (new_block->previous) new_block->previous->next = new_block; else state->first_saved_repeats = new_block; current = new_block; } for (r = 0; r < repeat_count; r++) { if (!copy_repeat_data(safe_state, &current->repeats[r], &state->repeats[r])) return FALSE; } state->current_saved_repeats = current; return TRUE; } /* Pops the repeats for backtracking. */ Py_LOCAL_INLINE(void) pop_repeats(RE_State* state) { PatternObject* pattern; size_t repeat_count; RE_SavedRepeats* current; size_t r; pattern = state->pattern; repeat_count = pattern->repeat_count; if (repeat_count == 0) return; current = state->current_saved_repeats; for (r = 0; r < repeat_count; r++) copy_repeat_data(NULL, &state->repeats[r], &current->repeats[r]); state->current_saved_repeats = current->previous; } /* Drops the repeats for backtracking. 
*/ Py_LOCAL_INLINE(void) drop_repeats(RE_State* state) { PatternObject* pattern; size_t repeat_count; RE_SavedRepeats* current; pattern = state->pattern; repeat_count = pattern->repeat_count; if (repeat_count == 0) return; current = state->current_saved_repeats; state->current_saved_repeats = current->previous; } /* Inserts a new span in a guard list. */ Py_LOCAL_INLINE(BOOL) insert_guard_span(RE_SafeState* safe_state, RE_GuardList* guard_list, size_t index) { size_t n; if (guard_list->count >= guard_list->capacity) { size_t new_capacity; RE_GuardSpan* new_spans; new_capacity = guard_list->capacity * 2; if (new_capacity == 0) new_capacity = RE_INIT_GUARDS_BLOCK_SIZE; new_spans = (RE_GuardSpan*)safe_realloc(safe_state, guard_list->spans, new_capacity * sizeof(RE_GuardSpan)); if (!new_spans) return FALSE; guard_list->capacity = new_capacity; guard_list->spans = new_spans; } n = guard_list->count - index; if (n > 0) memmove(guard_list->spans + index + 1, guard_list->spans + index, n * sizeof(RE_GuardSpan)); ++guard_list->count; return TRUE; } /* Deletes a span in a guard list. */ Py_LOCAL_INLINE(void) delete_guard_span(RE_GuardList* guard_list, size_t index) { size_t n; n = guard_list->count - index - 1; if (n > 0) memmove(guard_list->spans + index, guard_list->spans + index + 1, n * sizeof(RE_GuardSpan)); --guard_list->count; } /* Checks whether a position is guarded against further matching. */ Py_LOCAL_INLINE(BOOL) is_guarded(RE_GuardList* guard_list, Py_ssize_t text_pos) { size_t low; size_t high; /* Is this position in the guard list? 
*/ if (guard_list->count == 0 || text_pos < guard_list->spans[0].low) guard_list->last_low = 0; else if (text_pos > guard_list->spans[guard_list->count - 1].high) guard_list->last_low = guard_list->count; else { low = 0; high = guard_list->count; while (low < high) { size_t mid; RE_GuardSpan* span; mid = (low + high) / 2; span = &guard_list->spans[mid]; if (text_pos < span->low) high = mid; else if (text_pos > span->high) low = mid + 1; else return span->protect; } guard_list->last_low = low; } guard_list->last_text_pos = text_pos; return FALSE; } /* Guards a position against further matching. */ Py_LOCAL_INLINE(BOOL) guard(RE_SafeState* safe_state, RE_GuardList* guard_list, Py_ssize_t text_pos, BOOL protect) { size_t low; size_t high; /* Where should the new position be added? */ if (text_pos == guard_list->last_text_pos) low = guard_list->last_low; else { low = 0; high = guard_list->count; while (low < high) { size_t mid; RE_GuardSpan* span; mid = (low + high) / 2; span = &guard_list->spans[mid]; if (text_pos < span->low) high = mid; else if (text_pos > span->high) low = mid + 1; else return TRUE; } } /* Add the position to the guard list. */ if (low > 0 && guard_list->spans[low - 1].high + 1 == text_pos && guard_list->spans[low - 1].protect == protect) { /* The new position is just above this span. */ if (low < guard_list->count && guard_list->spans[low].low - 1 == text_pos && guard_list->spans[low].protect == protect) { /* The new position joins 2 spans. */ guard_list->spans[low - 1].high = guard_list->spans[low].high; delete_guard_span(guard_list, low); } else /* Extend the span. */ guard_list->spans[low - 1].high = text_pos; } else if (low < guard_list->count && guard_list->spans[low].low - 1 == text_pos && guard_list->spans[low].protect == protect) /* The new position is just below this span. */ /* Extend the span. */ guard_list->spans[low].low = text_pos; else { /* Insert a new span. 
*/ if (!insert_guard_span(safe_state, guard_list, low)) return FALSE; guard_list->spans[low].low = text_pos; guard_list->spans[low].high = text_pos; guard_list->spans[low].protect = protect; } guard_list->last_text_pos = -1; return TRUE; } /* Guards a position against further matching for a repeat. */ Py_LOCAL_INLINE(BOOL) guard_repeat(RE_SafeState* safe_state, size_t index, Py_ssize_t text_pos, RE_STATUS_T guard_type, BOOL protect) { RE_State* state; RE_GuardList* guard_list; state = safe_state->re_state; /* Is a guard active here? */ if (!(state->pattern->repeat_info[index].status & guard_type)) return TRUE; /* Which guard list? */ if (guard_type & RE_STATUS_BODY) guard_list = &state->repeats[index].body_guard_list; else guard_list = &state->repeats[index].tail_guard_list; return guard(safe_state, guard_list, text_pos, protect); } /* Guards a range of positions against further matching for a repeat. */ Py_LOCAL_INLINE(BOOL) guard_repeat_range(RE_SafeState* safe_state, size_t index, Py_ssize_t lo_pos, Py_ssize_t hi_pos, RE_STATUS_T guard_type, BOOL protect) { RE_State* state; RE_GuardList* guard_list; Py_ssize_t pos; state = safe_state->re_state; /* Is a guard active here? */ if (!(state->pattern->repeat_info[index].status & guard_type)) return TRUE; /* Which guard list? */ if (guard_type & RE_STATUS_BODY) guard_list = &state->repeats[index].body_guard_list; else guard_list = &state->repeats[index].tail_guard_list; for (pos = lo_pos; pos <= hi_pos; pos++) { if (!guard(safe_state, guard_list, pos, protect)) return FALSE; } return TRUE; } /* Checks whether a position is guarded against further matching for a repeat. */ Py_LOCAL_INLINE(BOOL) is_repeat_guarded(RE_SafeState* safe_state, size_t index, Py_ssize_t text_pos, RE_STATUS_T guard_type) { RE_State* state; RE_GuardList* guard_list; state = safe_state->re_state; /* Is a guard active here? */ if (!(state->pattern->repeat_info[index].status & guard_type)) return FALSE; /* Which guard list? 
*/ if (guard_type == RE_STATUS_BODY) guard_list = &state->repeats[index].body_guard_list; else guard_list = &state->repeats[index].tail_guard_list; return is_guarded(guard_list, text_pos); } /* Builds a Unicode string. */ Py_LOCAL_INLINE(PyObject*) build_unicode_value(void* buffer, Py_ssize_t len, Py_ssize_t buffer_charsize) { #if PY_VERSION_HEX >= 0x03030000 int kind; switch (buffer_charsize) { case 1: kind = PyUnicode_1BYTE_KIND; break; case 2: kind = PyUnicode_2BYTE_KIND; break; case 4: kind = PyUnicode_4BYTE_KIND; break; default: kind = PyUnicode_1BYTE_KIND; break; } return PyUnicode_FromKindAndData(kind, buffer, len); #else return PyUnicode_FromUnicode(buffer, len); #endif } /* Builds a bytestring. Returns NULL if any member is too wide. */ Py_LOCAL_INLINE(PyObject*) build_bytes_value(void* buffer, Py_ssize_t len, Py_ssize_t buffer_charsize) { Py_UCS1* byte_buffer; Py_ssize_t i; PyObject* result; if (buffer_charsize == 1) return Py_BuildValue("y#", buffer, len); byte_buffer = re_alloc((size_t)len); if (!byte_buffer) return NULL; for (i = 0; i < len; i++) { Py_UCS2 c = ((Py_UCS2*)buffer)[i]; if (c > 0xFF) goto too_wide; byte_buffer[i] = (Py_UCS1)c; } result = Py_BuildValue("y#", byte_buffer, len); re_dealloc(byte_buffer); return result; too_wide: re_dealloc(byte_buffer); return NULL; } /* Looks for a string in a string set. */ Py_LOCAL_INLINE(int) string_set_contains(RE_State* state, PyObject* string_set, Py_ssize_t first, Py_ssize_t last) { PyObject* string; int status; if (state->is_unicode) string = build_unicode_value(state->point_to(state->text, first), last - first, state->charsize); else string = build_bytes_value(state->point_to(state->text, first), last - first, state->charsize); if (!string) return RE_ERROR_INTERNAL; status = PySet_Contains(string_set, string); Py_DECREF(string); return status; } /* Looks for a string in a string set, ignoring case. 
*/ Py_LOCAL_INLINE(int) string_set_contains_ign(RE_State* state, PyObject* string_set, void* buffer, Py_ssize_t index, Py_ssize_t len, Py_ssize_t buffer_charsize) { Py_UCS4 (*char_at)(void* text, Py_ssize_t pos); void (*set_char_at)(void* text, Py_ssize_t pos, Py_UCS4 ch); RE_EncodingTable* encoding; RE_LocaleInfo* locale_info; BOOL (*possible_turkic)(RE_LocaleInfo* locale_info, Py_UCS4 ch); Py_UCS4 codepoints[4]; switch (buffer_charsize) { case 1: char_at = bytes1_char_at; set_char_at = bytes1_set_char_at; break; case 2: char_at = bytes2_char_at; set_char_at = bytes2_set_char_at; break; case 4: char_at = bytes4_char_at; set_char_at = bytes4_set_char_at; break; default: char_at = bytes1_char_at; set_char_at = bytes1_set_char_at; break; } encoding = state->encoding; locale_info = state->locale_info; possible_turkic = encoding->possible_turkic; /* Look for a possible Turkic 'I'. */ while (index < len && !possible_turkic(locale_info, char_at(buffer, index))) ++index; if (index < len) { /* Possible Turkic 'I'. */ int count; int i; /* Try all the alternatives to the 'I'. */ count = encoding->all_turkic_i(locale_info, char_at(buffer, index), codepoints); for (i = 0; i < count; i++) { int status; set_char_at(buffer, index, codepoints[i]); /* Recurse for the remainder of the string. */ status = string_set_contains_ign(state, string_set, buffer, index + 1, len, buffer_charsize); if (status != 0) return status; } return 0; } else { /* No Turkic 'I'. */ PyObject* string; int status; if (state->is_unicode) string = build_unicode_value(buffer, len, buffer_charsize); else string = build_bytes_value(buffer, len, buffer_charsize); if (!string) return RE_ERROR_MEMORY; status = PySet_Contains(string_set, string); Py_DECREF(string); return status; } } /* Creates a partial string set for truncation at the left or right side. 
*/ Py_LOCAL_INLINE(int) make_partial_string_set(RE_State* state, RE_Node* node) { PatternObject* pattern; int partial_side; PyObject* string_set; PyObject* partial_set; PyObject* iter = NULL; PyObject* item = NULL; PyObject* slice = NULL; pattern = state->pattern; partial_side = state->partial_side; if (partial_side != RE_PARTIAL_LEFT && partial_side != RE_PARTIAL_RIGHT) return RE_ERROR_INTERNAL; /* Fetch the full string set. PyList_GET_ITEM borrows a reference. */ string_set = PyList_GET_ITEM(pattern->named_list_indexes, node->values[0]); if (!string_set) return RE_ERROR_INTERNAL; /* Gets the list of partial string sets. */ if (!pattern->partial_named_lists[partial_side]) { size_t size; size = pattern->named_lists_count * sizeof(PyObject*); pattern->partial_named_lists[partial_side] = re_alloc(size); if (!pattern->partial_named_lists[partial_side]) return RE_ERROR_INTERNAL; memset(pattern->partial_named_lists[partial_side], 0, size); } /* Get the partial string set. */ partial_set = pattern->partial_named_lists[partial_side][node->values[0]]; if (partial_set) return 1; /* Build the partial string set. */ partial_set = PySet_New(NULL); if (!partial_set) return RE_ERROR_INTERNAL; iter = PyObject_GetIter(string_set); if (!iter) goto error; item = PyIter_Next(iter); while (item) { Py_ssize_t len; Py_ssize_t first; Py_ssize_t last; len = PySequence_Length(item); if (len == -1) goto error; first = 0; last = len; while (last - first > 1) { int status; /* Shorten the entry. 
*/ if (partial_side == RE_PARTIAL_LEFT) ++first; else --last; slice = PySequence_GetSlice(item, first, last); if (!slice) goto error; status = PySet_Add(partial_set, slice); Py_DECREF(slice); if (status < 0) goto error; } Py_DECREF(item); item = PyIter_Next(iter); } if (PyErr_Occurred()) goto error; Py_DECREF(iter); pattern->partial_named_lists[partial_side][node->values[0]] = partial_set; return 1; error: Py_XDECREF(item); Py_XDECREF(iter); Py_DECREF(partial_set); return RE_ERROR_INTERNAL; } /* Tries to match a string at the current position with a member of a string * set, forwards or backwards. */ Py_LOCAL_INLINE(int) string_set_match_fwdrev(RE_SafeState* safe_state, RE_Node* node, BOOL reverse) { RE_State* state; Py_ssize_t min_len; Py_ssize_t max_len; Py_ssize_t text_available; Py_ssize_t slice_available; int partial_side; Py_ssize_t len; Py_ssize_t first; Py_ssize_t last; int status; PyObject* string_set; state = safe_state->re_state; min_len = (Py_ssize_t)node->values[1]; max_len = (Py_ssize_t)node->values[2]; acquire_GIL(safe_state); if (reverse) { text_available = state->text_pos; slice_available = state->text_pos - state->slice_start; partial_side = RE_PARTIAL_LEFT; } else { text_available = state->text_length - state->text_pos; slice_available = state->slice_end - state->text_pos; partial_side = RE_PARTIAL_RIGHT; } /* Get as many characters as we need for the longest possible match. */ len = min_ssize_t(max_len, slice_available); if (reverse) { first = state->text_pos - len; last = state->text_pos; } else { first = state->text_pos; last = state->text_pos + len; } /* If we didn't get all of the characters we need, is a partial match * allowed? */ if (len < max_len && len == text_available && state->partial_side == partial_side) { if (len == 0) { /* An empty string is always a possible partial match. */ status = RE_ERROR_PARTIAL; goto finished; } /* Make a set of the possible partial matches. 
*/ status = make_partial_string_set(state, node); if (status < 0) goto finished; /* Fetch the partial string set. */ string_set = state->pattern->partial_named_lists[partial_side][node->values[0]]; /* Is the text we have a partial match? */ status = string_set_contains(state, string_set, first, last); if (status < 0) goto finished; if (status == 1) { /* Advance past the match. */ if (reverse) state->text_pos -= len; else state->text_pos += len; status = RE_ERROR_PARTIAL; goto finished; } } /* Fetch the string set. PyList_GET_ITEM borrows a reference. */ string_set = PyList_GET_ITEM(state->pattern->named_list_indexes, node->values[0]); if (!string_set) { status = RE_ERROR_INTERNAL; goto finished; } /* We've already looked for a partial match (if allowed), but what about a * complete match? */ while (len >= min_len) { status = string_set_contains(state, string_set, first, last); if (status == 1) { /* Advance past the match. */ if (reverse) state->text_pos -= len; else state->text_pos += len; status = 1; goto finished; } /* Look for a shorter match. */ --len; if (reverse) ++first; else --last; } /* No match. */ status = 0; finished: release_GIL(safe_state); return status; } /* Tries to match a string at the current position with a member of a string * set, ignoring case, forwards or backwards. 
*/ Py_LOCAL_INLINE(int) string_set_match_fld_fwdrev(RE_SafeState* safe_state, RE_Node* node, BOOL reverse) { RE_State* state; int (*full_case_fold)(RE_LocaleInfo* locale_info, Py_UCS4 ch, Py_UCS4* folded); Py_UCS4 (*char_at)(void* text, Py_ssize_t pos); Py_ssize_t folded_charsize; void (*set_char_at)(void* text, Py_ssize_t pos, Py_UCS4 ch); Py_ssize_t min_len; Py_ssize_t max_len; Py_ssize_t buf_len; void* folded; int status; BOOL* end_of_fold = NULL; Py_ssize_t text_available; Py_ssize_t slice_available; Py_ssize_t t_pos; Py_ssize_t f_pos; int step; int partial_side; Py_ssize_t len; Py_ssize_t consumed; Py_UCS4 codepoints[RE_MAX_FOLDED]; Py_ssize_t first; Py_ssize_t last; PyObject* string_set; state = safe_state->re_state; full_case_fold = state->encoding->full_case_fold; char_at = state->char_at; #if PY_VERSION_HEX >= 0x03030000 /* The folded string needs to be at least 2 bytes per character. */ folded_charsize = max_ssize_t(state->charsize, sizeof(Py_UCS2)); #else /* The folded string will have the same width as the original string. */ folded_charsize = state->charsize; #endif switch (folded_charsize) { case 1: set_char_at = bytes1_set_char_at; break; case 2: set_char_at = bytes2_set_char_at; break; case 4: set_char_at = bytes4_set_char_at; break; default: return RE_ERROR_INTERNAL; } min_len = (Py_ssize_t)node->values[1]; max_len = (Py_ssize_t)node->values[2]; acquire_GIL(safe_state); /* Allocate a buffer for the folded string. 
*/ buf_len = max_len + RE_MAX_FOLDED; folded = re_alloc((size_t)(buf_len * folded_charsize)); if (!folded) { status = RE_ERROR_MEMORY; goto finished; } end_of_fold = re_alloc((size_t)buf_len * sizeof(BOOL)); if (!end_of_fold) { status = RE_ERROR_MEMORY; goto finished; } memset(end_of_fold, 0, (size_t)buf_len * sizeof(BOOL)); if (reverse) { text_available = state->text_pos; slice_available = state->text_pos - state->slice_start; t_pos = state->text_pos - 1; f_pos = buf_len; step = -1; partial_side = RE_PARTIAL_LEFT; } else { text_available = state->text_length - state->text_pos; slice_available = state->slice_end - state->text_pos; t_pos = state->text_pos; f_pos = 0; step = 1; partial_side = RE_PARTIAL_RIGHT; } /* We can stop getting characters as soon as the case-folded string is long * enough (each codepoint from the text can expand to more than one folded * codepoint). */ len = 0; end_of_fold[len] = TRUE; consumed = 0; while (len < max_len && consumed < slice_available) { int count; int j; count = full_case_fold(state->locale_info, char_at(state->text, t_pos), codepoints); if (reverse) f_pos -= count; for (j = 0; j < count; j++) set_char_at(folded, f_pos + j, codepoints[j]); if (!reverse) f_pos += count; len += count; end_of_fold[len] = TRUE; ++consumed; t_pos += step; } if (reverse) { first = f_pos; last = buf_len; } else { first = 0; last = f_pos; } /* If we didn't get all of the characters we need, is a partial match * allowed? */ if (len < max_len && len == text_available && state->partial_side == partial_side) { if (len == 0) { /* An empty string is always a possible partial match. */ status = RE_ERROR_PARTIAL; goto finished; } /* Make a set of the possible partial matches. */ status = make_partial_string_set(state, node); if (status < 0) goto finished; /* Fetch the partial string set. */ string_set = state->pattern->partial_named_lists[partial_side][node->values[0]]; /* Is the text we have a partial match? 
*/ status = string_set_contains_ign(state, string_set, folded, first, last, folded_charsize); if (status < 0) goto finished; if (status == 1) { /* Advance past the match. */ if (reverse) state->text_pos -= consumed; else state->text_pos += consumed; status = RE_ERROR_PARTIAL; goto finished; } } /* Fetch the string set. PyList_GET_ITEM borrows a reference. */ string_set = PyList_GET_ITEM(state->pattern->named_list_indexes, node->values[0]); if (!string_set) { status = RE_ERROR_INTERNAL; goto finished; } /* We've already looked for a partial match (if allowed), but what about a * complete match? */ while (len >= min_len) { if (end_of_fold[len]) { status = string_set_contains_ign(state, string_set, folded, first, last, folded_charsize); if (status == 1) { /* Advance past the match. */ if (reverse) state->text_pos -= consumed; else state->text_pos += consumed; status = 1; goto finished; } --consumed; } /* Look for a shorter match. */ --len; if (reverse) ++first; else --last; } /* No match. */ status = 0; finished: re_dealloc(end_of_fold); re_dealloc(folded); release_GIL(safe_state); return status; } /* Tries to match a string at the current position with a member of a string * set, ignoring case, forwards or backwards. 
*/ Py_LOCAL_INLINE(int) string_set_match_ign_fwdrev(RE_SafeState* safe_state, RE_Node* node, BOOL reverse) { RE_State* state; Py_UCS4 (*simple_case_fold)(RE_LocaleInfo* locale_info, Py_UCS4 ch); Py_UCS4 (*char_at)(void* text, Py_ssize_t pos); Py_ssize_t folded_charsize; void (*set_char_at)(void* text, Py_ssize_t pos, Py_UCS4 ch); Py_ssize_t min_len; Py_ssize_t max_len; void* folded; int status; Py_ssize_t text_available; Py_ssize_t slice_available; Py_ssize_t t_pos; Py_ssize_t f_pos; int step; int partial_side; Py_ssize_t len; Py_ssize_t i; Py_ssize_t first; Py_ssize_t last; PyObject* string_set; state = safe_state->re_state; simple_case_fold = state->encoding->simple_case_fold; char_at = state->char_at; #if PY_VERSION_HEX >= 0x03030000 /* The folded string needs to be at least 2 bytes per character. */ folded_charsize = max_ssize_t(state->charsize, sizeof(Py_UCS2)); #else /* The folded string will have the same width as the original string. */ folded_charsize = state->charsize; #endif switch (folded_charsize) { case 1: set_char_at = bytes1_set_char_at; break; case 2: set_char_at = bytes2_set_char_at; break; case 4: set_char_at = bytes4_set_char_at; break; default: return RE_ERROR_INTERNAL; } min_len = (Py_ssize_t)node->values[1]; max_len = (Py_ssize_t)node->values[2]; acquire_GIL(safe_state); /* Allocate a buffer for the folded string. */ folded = re_alloc((size_t)(max_len * folded_charsize)); if (!folded) { status = RE_ERROR_MEMORY; goto finished; } if (reverse) { text_available = state->text_pos; slice_available = state->text_pos - state->slice_start; t_pos = state->text_pos - 1; f_pos = max_len - 1; step = -1; partial_side = RE_PARTIAL_LEFT; } else { text_available = state->text_length - state->text_pos; slice_available = state->slice_end - state->text_pos; t_pos = state->text_pos; f_pos = 0; step = 1; partial_side = RE_PARTIAL_RIGHT; } /* Get as many characters as we need for the longest possible match. 
*/ len = min_ssize_t(max_len, slice_available); for (i = 0; i < len; i ++) { Py_UCS4 ch; ch = simple_case_fold(state->locale_info, char_at(state->text, t_pos)); set_char_at(folded, f_pos, ch); t_pos += step; f_pos += step; } if (reverse) { first = f_pos; last = max_len; } else { first = 0; last = f_pos; } /* If we didn't get all of the characters we need, is a partial match * allowed? */ if (len < max_len && len == text_available && state->partial_side == partial_side) { if (len == 0) { /* An empty string is always a possible partial match. */ status = RE_ERROR_PARTIAL; goto finished; } /* Make a set of the possible partial matches. */ status = make_partial_string_set(state, node); if (status < 0) goto finished; /* Fetch the partial string set. */ string_set = state->pattern->partial_named_lists[partial_side][node->values[0]]; /* Is the text we have a partial match? */ status = string_set_contains_ign(state, string_set, folded, first, last, folded_charsize); if (status < 0) goto finished; if (status == 1) { /* Advance past the match. */ if (reverse) state->text_pos -= len; else state->text_pos += len; status = RE_ERROR_PARTIAL; goto finished; } } /* Fetch the string set. PyList_GET_ITEM borrows a reference. */ string_set = PyList_GET_ITEM(state->pattern->named_list_indexes, node->values[0]); if (!string_set) { status = RE_ERROR_INTERNAL; goto finished; } /* We've already looked for a partial match (if allowed), but what about a * complete match? */ while (len >= min_len) { status = string_set_contains_ign(state, string_set, folded, first, last, folded_charsize); if (status == 1) { /* Advance past the match. */ if (reverse) state->text_pos -= len; else state->text_pos += len; status = 1; goto finished; } /* Look for a shorter match. */ --len; if (reverse) ++first; else --last; } /* No match. */ status = 0; finished: re_dealloc(folded); release_GIL(safe_state); return status; } /* Checks whether any additional fuzzy error is permitted. 
*/ Py_LOCAL_INLINE(BOOL) any_error_permitted(RE_State* state) { RE_FuzzyInfo* fuzzy_info; RE_CODE* values; fuzzy_info = &state->fuzzy_info; values = fuzzy_info->node->values; return fuzzy_info->total_cost <= values[RE_FUZZY_VAL_MAX_COST] && fuzzy_info->counts[RE_FUZZY_ERR] < values[RE_FUZZY_VAL_MAX_ERR] && state->total_errors <= state->max_errors; } /* Checks whether this additional fuzzy error is permitted. */ Py_LOCAL_INLINE(BOOL) this_error_permitted(RE_State* state, int fuzzy_type) { RE_FuzzyInfo* fuzzy_info; RE_CODE* values; fuzzy_info = &state->fuzzy_info; values = fuzzy_info->node->values; return fuzzy_info->total_cost + values[RE_FUZZY_VAL_COST_BASE + fuzzy_type] <= values[RE_FUZZY_VAL_MAX_COST] && fuzzy_info->counts[fuzzy_type] < values[RE_FUZZY_VAL_MAX_BASE + fuzzy_type] && state->total_errors + 1 <= state->max_errors; } /* Checks whether we've reached the end of the text during a fuzzy partial * match. */ Py_LOCAL_INLINE(int) check_fuzzy_partial(RE_State* state, Py_ssize_t text_pos) { switch (state->partial_side) { case RE_PARTIAL_LEFT: if (text_pos < 0) return RE_ERROR_PARTIAL; break; case RE_PARTIAL_RIGHT: if (text_pos > state->text_length) return RE_ERROR_PARTIAL; break; } return RE_ERROR_FAILURE; } /* Checks a fuzzy match of an item. */ Py_LOCAL_INLINE(int) next_fuzzy_match_item(RE_State* state, RE_FuzzyData* data, BOOL is_string, int step) { Py_ssize_t new_pos; if (this_error_permitted(state, data->fuzzy_type)) { switch (data->fuzzy_type) { case RE_FUZZY_DEL: /* Could a character at text_pos have been deleted? */ if (is_string) data->new_string_pos += step; else data->new_node = data->new_node->next_1.node; return RE_ERROR_SUCCESS; case RE_FUZZY_INS: /* Could the character at text_pos have been inserted? 
*/ if (!data->permit_insertion) return RE_ERROR_FAILURE; new_pos = data->new_text_pos + step; if (state->slice_start <= new_pos && new_pos <= state->slice_end) { data->new_text_pos = new_pos; return RE_ERROR_SUCCESS; } return check_fuzzy_partial(state, new_pos); case RE_FUZZY_SUB: /* Could the character at text_pos have been substituted? */ new_pos = data->new_text_pos + step; if (state->slice_start <= new_pos && new_pos <= state->slice_end) { data->new_text_pos = new_pos; if (is_string) data->new_string_pos += step; else data->new_node = data->new_node->next_1.node; return RE_ERROR_SUCCESS; } return check_fuzzy_partial(state, new_pos); } } return RE_ERROR_FAILURE; } /* Tries a fuzzy match of an item of width 0 or 1. */ Py_LOCAL_INLINE(int) fuzzy_match_item(RE_SafeState* safe_state, BOOL search, Py_ssize_t* text_pos, RE_Node** node, int step) { RE_State* state; RE_FuzzyData data; RE_FuzzyInfo* fuzzy_info; RE_CODE* values; RE_BacktrackData* bt_data; state = safe_state->re_state; if (!any_error_permitted(state)) { *node = NULL; return RE_ERROR_SUCCESS; } data.new_text_pos = *text_pos; data.new_node = *node; fuzzy_info = &state->fuzzy_info; values = fuzzy_info->node->values; if (step == 0) { if (data.new_node->status & RE_STATUS_REVERSE) { data.step = -1; data.limit = state->slice_start; } else { data.step = 1; data.limit = state->slice_end; } } else data.step = step; /* Permit insertion except initially when searching (it's better just to * start searching one character later). 
*/ data.permit_insertion = !search || data.new_text_pos != state->search_anchor; for (data.fuzzy_type = 0; data.fuzzy_type < RE_FUZZY_COUNT; data.fuzzy_type++) { int status; status = next_fuzzy_match_item(state, &data, FALSE, step); if (status < 0) return status; if (status == RE_ERROR_SUCCESS) goto found; } *node = NULL; return RE_ERROR_SUCCESS; found: if (!add_backtrack(safe_state, (*node)->op)) return RE_ERROR_FAILURE; bt_data = state->backtrack; bt_data->fuzzy_item.position.text_pos = *text_pos; bt_data->fuzzy_item.position.node = *node; bt_data->fuzzy_item.fuzzy_type = (RE_INT8)data.fuzzy_type; bt_data->fuzzy_item.step = (RE_INT8)step; ++fuzzy_info->counts[data.fuzzy_type]; ++fuzzy_info->counts[RE_FUZZY_ERR]; fuzzy_info->total_cost += values[RE_FUZZY_VAL_COST_BASE + data.fuzzy_type]; ++state->total_errors; *text_pos = data.new_text_pos; *node = data.new_node; return RE_ERROR_SUCCESS; } /* Retries a fuzzy match of an item of width 0 or 1. */ Py_LOCAL_INLINE(int) retry_fuzzy_match_item(RE_SafeState* safe_state, BOOL search, Py_ssize_t* text_pos, RE_Node** node, BOOL advance) { RE_State* state; RE_FuzzyInfo* fuzzy_info; RE_CODE* values; RE_BacktrackData* bt_data; RE_FuzzyData data; int step; state = safe_state->re_state; fuzzy_info = &state->fuzzy_info; values = fuzzy_info->node->values; bt_data = state->backtrack; data.new_text_pos = bt_data->fuzzy_item.position.text_pos; data.new_node = bt_data->fuzzy_item.position.node; data.fuzzy_type = bt_data->fuzzy_item.fuzzy_type; data.step = bt_data->fuzzy_item.step; if (data.fuzzy_type >= 0) { --fuzzy_info->counts[data.fuzzy_type]; --fuzzy_info->counts[RE_FUZZY_ERR]; fuzzy_info->total_cost -= values[RE_FUZZY_VAL_COST_BASE + data.fuzzy_type]; --state->total_errors; } /* Permit insertion except initially when searching (it's better just to * start searching one character later). */ data.permit_insertion = !search || data.new_text_pos != state->search_anchor; step = advance ? 
data.step : 0; for (++data.fuzzy_type; data.fuzzy_type < RE_FUZZY_COUNT; data.fuzzy_type++) { int status; status = next_fuzzy_match_item(state, &data, FALSE, step); if (status < 0) return status; if (status == RE_ERROR_SUCCESS) goto found; } discard_backtrack(state); *node = NULL; return RE_ERROR_SUCCESS; found: bt_data->fuzzy_item.fuzzy_type = (RE_INT8)data.fuzzy_type; ++fuzzy_info->counts[data.fuzzy_type]; ++fuzzy_info->counts[RE_FUZZY_ERR]; fuzzy_info->total_cost += values[RE_FUZZY_VAL_COST_BASE + data.fuzzy_type]; ++state->total_errors; *text_pos = data.new_text_pos; *node = data.new_node; return RE_ERROR_SUCCESS; } /* Tries a fuzzy insertion. */ Py_LOCAL_INLINE(int) fuzzy_insert(RE_SafeState* safe_state, Py_ssize_t text_pos, RE_Node* node) { RE_State* state; RE_BacktrackData* bt_data; RE_FuzzyInfo* fuzzy_info; RE_CODE* values; state = safe_state->re_state; /* No insertion or deletion. */ if (!add_backtrack(safe_state, node->op)) return RE_ERROR_FAILURE; bt_data = state->backtrack; bt_data->fuzzy_insert.position.text_pos = text_pos; bt_data->fuzzy_insert.position.node = node; bt_data->fuzzy_insert.count = 0; bt_data->fuzzy_insert.too_few_errors = state->too_few_errors; bt_data->fuzzy_insert.fuzzy_node = node; /* END_FUZZY node. */ /* Check whether there are too few errors. */ fuzzy_info = &state->fuzzy_info; /* The node in this case is the END_FUZZY node. */ values = node->values; if (fuzzy_info->counts[RE_FUZZY_DEL] < values[RE_FUZZY_VAL_MIN_DEL] || fuzzy_info->counts[RE_FUZZY_INS] < values[RE_FUZZY_VAL_MIN_INS] || fuzzy_info->counts[RE_FUZZY_SUB] < values[RE_FUZZY_VAL_MIN_SUB] || fuzzy_info->counts[RE_FUZZY_ERR] < values[RE_FUZZY_VAL_MIN_ERR]) state->too_few_errors = RE_ERROR_SUCCESS; return RE_ERROR_SUCCESS; } /* Retries a fuzzy insertion. 
*/ Py_LOCAL_INLINE(int) retry_fuzzy_insert(RE_SafeState* safe_state, Py_ssize_t* text_pos, RE_Node** node) { RE_State* state; RE_FuzzyInfo* fuzzy_info; RE_CODE* values; RE_BacktrackData* bt_data; Py_ssize_t new_text_pos; RE_Node* new_node; int step; Py_ssize_t limit; RE_Node* fuzzy_node; state = safe_state->re_state; fuzzy_info = &state->fuzzy_info; values = fuzzy_info->node->values; bt_data = state->backtrack; new_text_pos = bt_data->fuzzy_insert.position.text_pos; new_node = bt_data->fuzzy_insert.position.node; if (new_node->status & RE_STATUS_REVERSE) { step = -1; limit = state->slice_start; } else { step = 1; limit = state->slice_end; } /* Could the character at text_pos have been inserted? */ if (!this_error_permitted(state, RE_FUZZY_INS) || new_text_pos == limit) { size_t count; count = bt_data->fuzzy_insert.count; fuzzy_info->counts[RE_FUZZY_INS] -= count; fuzzy_info->counts[RE_FUZZY_ERR] -= count; fuzzy_info->total_cost -= values[RE_FUZZY_VAL_INS_COST] * count; state->total_errors -= count; state->too_few_errors = bt_data->fuzzy_insert.too_few_errors; discard_backtrack(state); *node = NULL; return RE_ERROR_SUCCESS; } ++bt_data->fuzzy_insert.count; ++fuzzy_info->counts[RE_FUZZY_INS]; ++fuzzy_info->counts[RE_FUZZY_ERR]; fuzzy_info->total_cost += values[RE_FUZZY_VAL_INS_COST]; ++state->total_errors; /* Check whether there are too few errors. */ state->too_few_errors = bt_data->fuzzy_insert.too_few_errors; fuzzy_node = bt_data->fuzzy_insert.fuzzy_node; /* END_FUZZY node. */ values = fuzzy_node->values; if (fuzzy_info->counts[RE_FUZZY_DEL] < values[RE_FUZZY_VAL_MIN_DEL] || fuzzy_info->counts[RE_FUZZY_INS] < values[RE_FUZZY_VAL_MIN_INS] || fuzzy_info->counts[RE_FUZZY_SUB] < values[RE_FUZZY_VAL_MIN_SUB] || fuzzy_info->counts[RE_FUZZY_ERR] < values[RE_FUZZY_VAL_MIN_ERR]) state->too_few_errors = RE_ERROR_SUCCESS; *text_pos = new_text_pos + step * (Py_ssize_t)bt_data->fuzzy_insert.count; *node = new_node; return RE_ERROR_SUCCESS; } /* Tries a fuzzy match of a string. 
*/ Py_LOCAL_INLINE(int) fuzzy_match_string(RE_SafeState* safe_state, BOOL search, Py_ssize_t* text_pos, RE_Node* node, Py_ssize_t* string_pos, BOOL* matched, int step) { RE_State* state; RE_FuzzyData data; RE_FuzzyInfo* fuzzy_info; RE_CODE* values; RE_BacktrackData* bt_data; state = safe_state->re_state; if (!any_error_permitted(state)) { *matched = FALSE; return RE_ERROR_SUCCESS; } data.new_text_pos = *text_pos; data.new_string_pos = *string_pos; data.step = step; fuzzy_info = &state->fuzzy_info; values = fuzzy_info->node->values; /* Permit insertion except initially when searching (it's better just to * start searching one character later). */ data.permit_insertion = !search || data.new_text_pos != state->search_anchor; for (data.fuzzy_type = 0; data.fuzzy_type < RE_FUZZY_COUNT; data.fuzzy_type++) { int status; status = next_fuzzy_match_item(state, &data, TRUE, data.step); if (status < 0) return status; if (status == RE_ERROR_SUCCESS) goto found; } *matched = FALSE; return RE_ERROR_SUCCESS; found: if (!add_backtrack(safe_state, node->op)) return RE_ERROR_FAILURE; bt_data = state->backtrack; bt_data->fuzzy_string.position.text_pos = *text_pos; bt_data->fuzzy_string.position.node = node; bt_data->fuzzy_string.string_pos = *string_pos; bt_data->fuzzy_string.fuzzy_type = (RE_INT8)data.fuzzy_type; bt_data->fuzzy_string.step = (RE_INT8)step; ++fuzzy_info->counts[data.fuzzy_type]; ++fuzzy_info->counts[RE_FUZZY_ERR]; fuzzy_info->total_cost += values[RE_FUZZY_VAL_COST_BASE + data.fuzzy_type]; ++state->total_errors; *text_pos = data.new_text_pos; *string_pos = data.new_string_pos; *matched = TRUE; return RE_ERROR_SUCCESS; } /* Retries a fuzzy match of a string. 
*/ Py_LOCAL_INLINE(int) retry_fuzzy_match_string(RE_SafeState* safe_state, BOOL search, Py_ssize_t* text_pos, RE_Node** node, Py_ssize_t* string_pos, BOOL* matched) { RE_State* state; RE_FuzzyInfo* fuzzy_info; RE_CODE* values; RE_BacktrackData* bt_data; RE_FuzzyData data; RE_Node* new_node; state = safe_state->re_state; fuzzy_info = &state->fuzzy_info; values = fuzzy_info->node->values; bt_data = state->backtrack; data.new_text_pos = bt_data->fuzzy_string.position.text_pos; new_node = bt_data->fuzzy_string.position.node; data.new_string_pos = bt_data->fuzzy_string.string_pos; data.fuzzy_type = bt_data->fuzzy_string.fuzzy_type; data.step = bt_data->fuzzy_string.step; --fuzzy_info->counts[data.fuzzy_type]; --fuzzy_info->counts[RE_FUZZY_ERR]; fuzzy_info->total_cost -= values[RE_FUZZY_VAL_COST_BASE + data.fuzzy_type]; --state->total_errors; /* Permit insertion except initially when searching (it's better just to * start searching one character later). */ data.permit_insertion = !search || data.new_text_pos != state->search_anchor; for (++data.fuzzy_type; data.fuzzy_type < RE_FUZZY_COUNT; data.fuzzy_type++) { int status; status = next_fuzzy_match_item(state, &data, TRUE, data.step); if (status < 0) return status; if (status == RE_ERROR_SUCCESS) goto found; } discard_backtrack(state); *matched = FALSE; return RE_ERROR_SUCCESS; found: bt_data->fuzzy_string.fuzzy_type = (RE_INT8)data.fuzzy_type; ++fuzzy_info->counts[data.fuzzy_type]; ++fuzzy_info->counts[RE_FUZZY_ERR]; fuzzy_info->total_cost += values[RE_FUZZY_VAL_COST_BASE + data.fuzzy_type]; ++state->total_errors; *text_pos = data.new_text_pos; *node = new_node; *string_pos = data.new_string_pos; *matched = TRUE; return RE_ERROR_SUCCESS; } /* Checks a fuzzy match of a string. 
*/ Py_LOCAL_INLINE(int) next_fuzzy_match_string_fld(RE_State* state, RE_FuzzyData* data) { int new_pos; if (this_error_permitted(state, data->fuzzy_type)) { switch (data->fuzzy_type) { case RE_FUZZY_DEL: /* Could a character at text_pos have been deleted? */ data->new_string_pos += data->step; return RE_ERROR_SUCCESS; case RE_FUZZY_INS: /* Could the character at text_pos have been inserted? */ if (!data->permit_insertion) return RE_ERROR_FAILURE; new_pos = data->new_folded_pos + data->step; if (0 <= new_pos && new_pos <= data->folded_len) { data->new_folded_pos = new_pos; return RE_ERROR_SUCCESS; } return check_fuzzy_partial(state, new_pos); case RE_FUZZY_SUB: /* Could the character at text_pos have been substituted? */ new_pos = data->new_folded_pos + data->step; if (0 <= new_pos && new_pos <= data->folded_len) { data->new_folded_pos = new_pos; data->new_string_pos += data->step; return RE_ERROR_SUCCESS; } return check_fuzzy_partial(state, new_pos); } } return RE_ERROR_FAILURE; } /* Tries a fuzzy match of a string, ignoring case. */ Py_LOCAL_INLINE(int) fuzzy_match_string_fld(RE_SafeState* safe_state, BOOL search, Py_ssize_t* text_pos, RE_Node* node, Py_ssize_t* string_pos, int* folded_pos, int folded_len, BOOL* matched, int step) { RE_State* state; Py_ssize_t new_text_pos; RE_FuzzyData data; RE_FuzzyInfo* fuzzy_info; RE_CODE* values; RE_BacktrackData* bt_data; state = safe_state->re_state; if (!any_error_permitted(state)) { *matched = FALSE; return RE_ERROR_SUCCESS; } new_text_pos = *text_pos; data.new_string_pos = *string_pos; data.new_folded_pos = *folded_pos; data.folded_len = folded_len; data.step = step; fuzzy_info = &state->fuzzy_info; values = fuzzy_info->node->values; /* Permit insertion except initially when searching (it's better just to * start searching one character later). 
*/ data.permit_insertion = !search || new_text_pos != state->search_anchor; if (step > 0) { if (data.new_folded_pos != 0) data.permit_insertion = RE_ERROR_SUCCESS; } else { if (data.new_folded_pos != folded_len) data.permit_insertion = RE_ERROR_SUCCESS; } for (data.fuzzy_type = 0; data.fuzzy_type < RE_FUZZY_COUNT; data.fuzzy_type++) { int status; status = next_fuzzy_match_string_fld(state, &data); if (status < 0) return status; if (status == RE_ERROR_SUCCESS) goto found; } *matched = FALSE; return RE_ERROR_SUCCESS; found: if (!add_backtrack(safe_state, node->op)) return RE_ERROR_FAILURE; bt_data = state->backtrack; bt_data->fuzzy_string.position.text_pos = *text_pos; bt_data->fuzzy_string.position.node = node; bt_data->fuzzy_string.string_pos = *string_pos; bt_data->fuzzy_string.folded_pos = (RE_INT8)(*folded_pos); bt_data->fuzzy_string.folded_len = (RE_INT8)folded_len; bt_data->fuzzy_string.fuzzy_type = (RE_INT8)data.fuzzy_type; bt_data->fuzzy_string.step = (RE_INT8)step; ++fuzzy_info->counts[data.fuzzy_type]; ++fuzzy_info->counts[RE_FUZZY_ERR]; fuzzy_info->total_cost += values[RE_FUZZY_VAL_COST_BASE + data.fuzzy_type]; ++state->total_errors; *text_pos = new_text_pos; *string_pos = data.new_string_pos; *folded_pos = data.new_folded_pos; *matched = TRUE; return RE_ERROR_SUCCESS; } /* Retries a fuzzy match of a string, ignoring case. 
*/ Py_LOCAL_INLINE(int) retry_fuzzy_match_string_fld(RE_SafeState* safe_state, BOOL search, Py_ssize_t* text_pos, RE_Node** node, Py_ssize_t* string_pos, int* folded_pos, BOOL* matched) { RE_State* state; RE_FuzzyInfo* fuzzy_info; RE_CODE* values; RE_BacktrackData* bt_data; Py_ssize_t new_text_pos; RE_Node* new_node; RE_FuzzyData data; state = safe_state->re_state; fuzzy_info = &state->fuzzy_info; values = fuzzy_info->node->values; bt_data = state->backtrack; new_text_pos = bt_data->fuzzy_string.position.text_pos; new_node = bt_data->fuzzy_string.position.node; data.new_string_pos = bt_data->fuzzy_string.string_pos; data.new_folded_pos = bt_data->fuzzy_string.folded_pos; data.folded_len = bt_data->fuzzy_string.folded_len; data.fuzzy_type = bt_data->fuzzy_string.fuzzy_type; data.step = bt_data->fuzzy_string.step; --fuzzy_info->counts[data.fuzzy_type]; --fuzzy_info->counts[RE_FUZZY_ERR]; fuzzy_info->total_cost -= values[RE_FUZZY_VAL_COST_BASE + data.fuzzy_type]; --state->total_errors; /* Permit insertion except initially when searching (it's better just to * start searching one character later). 
*/ data.permit_insertion = !search || new_text_pos != state->search_anchor; if (data.step > 0) { if (data.new_folded_pos != 0) data.permit_insertion = RE_ERROR_SUCCESS; } else { if (data.new_folded_pos != bt_data->fuzzy_string.folded_len) data.permit_insertion = RE_ERROR_SUCCESS; } for (++data.fuzzy_type; data.fuzzy_type < RE_FUZZY_COUNT; data.fuzzy_type++) { int status; status = next_fuzzy_match_string_fld(state, &data); if (status < 0) return status; if (status == RE_ERROR_SUCCESS) goto found; } discard_backtrack(state); *matched = FALSE; return RE_ERROR_SUCCESS; found: bt_data->fuzzy_string.fuzzy_type = (RE_INT8)data.fuzzy_type; ++fuzzy_info->counts[data.fuzzy_type]; ++fuzzy_info->counts[RE_FUZZY_ERR]; fuzzy_info->total_cost += values[RE_FUZZY_VAL_COST_BASE + data.fuzzy_type]; ++state->total_errors; *text_pos = new_text_pos; *node = new_node; *string_pos = data.new_string_pos; *folded_pos = data.new_folded_pos; *matched = TRUE; return RE_ERROR_SUCCESS; } /* Checks a fuzzy match of a string. */ Py_LOCAL_INLINE(int) next_fuzzy_match_group_fld(RE_State* state, RE_FuzzyData* data) { int new_pos; if (this_error_permitted(state, data->fuzzy_type)) { switch (data->fuzzy_type) { case RE_FUZZY_DEL: /* Could a character at text_pos have been deleted? */ data->new_gfolded_pos += data->step; return RE_ERROR_SUCCESS; case RE_FUZZY_INS: /* Could the character at text_pos have been inserted? */ if (!data->permit_insertion) return RE_ERROR_FAILURE; new_pos = data->new_folded_pos + data->step; if (0 <= new_pos && new_pos <= data->folded_len) { data->new_folded_pos = new_pos; return RE_ERROR_SUCCESS; } return check_fuzzy_partial(state, new_pos); case RE_FUZZY_SUB: /* Could the character at text_pos have been substituted? 
*/ new_pos = data->new_folded_pos + data->step; if (0 <= new_pos && new_pos <= data->folded_len) { data->new_folded_pos = new_pos; data->new_gfolded_pos += data->step; return RE_ERROR_SUCCESS; } return check_fuzzy_partial(state, new_pos); } } return RE_ERROR_FAILURE; } /* Tries a fuzzy match of a group reference, ignoring case. */ Py_LOCAL_INLINE(int) fuzzy_match_group_fld(RE_SafeState* safe_state, BOOL search, Py_ssize_t* text_pos, RE_Node* node, int* folded_pos, int folded_len, Py_ssize_t* group_pos, int* gfolded_pos, int gfolded_len, BOOL* matched, int step) { RE_State* state; Py_ssize_t new_text_pos; RE_FuzzyData data; Py_ssize_t new_group_pos; RE_FuzzyInfo* fuzzy_info; RE_CODE* values; RE_BacktrackData* bt_data; state = safe_state->re_state; if (!any_error_permitted(state)) { *matched = FALSE; return RE_ERROR_SUCCESS; } new_text_pos = *text_pos; data.new_folded_pos = *folded_pos; data.folded_len = folded_len; new_group_pos = *group_pos; data.new_gfolded_pos = *gfolded_pos; data.step = step; fuzzy_info = &state->fuzzy_info; values = fuzzy_info->node->values; /* Permit insertion except initially when searching (it's better just to * start searching one character later). 
*/ data.permit_insertion = !search || new_text_pos != state->search_anchor; if (data.step > 0) { if (data.new_folded_pos != 0) data.permit_insertion = RE_ERROR_SUCCESS; } else { if (data.new_folded_pos != folded_len) data.permit_insertion = RE_ERROR_SUCCESS; } for (data.fuzzy_type = 0; data.fuzzy_type < RE_FUZZY_COUNT; data.fuzzy_type++) { int status; status = next_fuzzy_match_group_fld(state, &data); if (status < 0) return status; if (status == RE_ERROR_SUCCESS) goto found; } *matched = FALSE; return RE_ERROR_SUCCESS; found: if (!add_backtrack(safe_state, node->op)) return RE_ERROR_FAILURE; bt_data = state->backtrack; bt_data->fuzzy_string.position.text_pos = *text_pos; bt_data->fuzzy_string.position.node = node; bt_data->fuzzy_string.string_pos = *group_pos; bt_data->fuzzy_string.folded_pos = (RE_INT8)(*folded_pos); bt_data->fuzzy_string.folded_len = (RE_INT8)folded_len; bt_data->fuzzy_string.gfolded_pos = (RE_INT8)(*gfolded_pos); bt_data->fuzzy_string.gfolded_len = (RE_INT8)gfolded_len; bt_data->fuzzy_string.fuzzy_type = (RE_INT8)data.fuzzy_type; bt_data->fuzzy_string.step = (RE_INT8)step; ++fuzzy_info->counts[data.fuzzy_type]; ++fuzzy_info->counts[RE_FUZZY_ERR]; fuzzy_info->total_cost += values[RE_FUZZY_VAL_COST_BASE + data.fuzzy_type]; ++state->total_errors; *text_pos = new_text_pos; *group_pos = new_group_pos; *folded_pos = data.new_folded_pos; *gfolded_pos = data.new_gfolded_pos; *matched = TRUE; return RE_ERROR_SUCCESS; } /* Retries a fuzzy match of a group reference, ignoring case. 
*/ Py_LOCAL_INLINE(int) retry_fuzzy_match_group_fld(RE_SafeState* safe_state, BOOL search, Py_ssize_t* text_pos, RE_Node** node, int* folded_pos, Py_ssize_t* group_pos, int* gfolded_pos, BOOL* matched) { RE_State* state; RE_FuzzyInfo* fuzzy_info; RE_CODE* values; RE_BacktrackData* bt_data; Py_ssize_t new_text_pos; RE_Node* new_node; Py_ssize_t new_group_pos; RE_FuzzyData data; state = safe_state->re_state; fuzzy_info = &state->fuzzy_info; values = fuzzy_info->node->values; bt_data = state->backtrack; new_text_pos = bt_data->fuzzy_string.position.text_pos; new_node = bt_data->fuzzy_string.position.node; new_group_pos = bt_data->fuzzy_string.string_pos; data.new_folded_pos = bt_data->fuzzy_string.folded_pos; data.folded_len = bt_data->fuzzy_string.folded_len; data.new_gfolded_pos = bt_data->fuzzy_string.gfolded_pos; data.fuzzy_type = bt_data->fuzzy_string.fuzzy_type; data.step = bt_data->fuzzy_string.step; --fuzzy_info->counts[data.fuzzy_type]; --fuzzy_info->counts[RE_FUZZY_ERR]; fuzzy_info->total_cost -= values[RE_FUZZY_VAL_COST_BASE + data.fuzzy_type]; --state->total_errors; /* Permit insertion except initially when searching (it's better just to * start searching one character later). 
*/ data.permit_insertion = !search || new_text_pos != state->search_anchor || data.new_folded_pos != bt_data->fuzzy_string.folded_len; for (++data.fuzzy_type; data.fuzzy_type < RE_FUZZY_COUNT; data.fuzzy_type++) { int status; status = next_fuzzy_match_group_fld(state, &data); if (status < 0) return status; if (status == RE_ERROR_SUCCESS) goto found; } discard_backtrack(state); *matched = FALSE; return RE_ERROR_SUCCESS; found: bt_data->fuzzy_string.fuzzy_type = (RE_INT8)data.fuzzy_type; ++fuzzy_info->counts[data.fuzzy_type]; ++fuzzy_info->counts[RE_FUZZY_ERR]; fuzzy_info->total_cost += values[RE_FUZZY_VAL_COST_BASE + data.fuzzy_type]; ++state->total_errors; *text_pos = new_text_pos; *node = new_node; *group_pos = new_group_pos; *folded_pos = data.new_folded_pos; *gfolded_pos = data.new_gfolded_pos; *matched = TRUE; return RE_ERROR_SUCCESS; } /* Locates the required string, if there's one. */ Py_LOCAL_INLINE(Py_ssize_t) locate_required_string(RE_SafeState* safe_state, BOOL search) { RE_State* state; PatternObject* pattern; Py_ssize_t found_pos; Py_ssize_t end_pos; state = safe_state->re_state; pattern = state->pattern; if (!pattern->req_string) /* There isn't a required string, so start matching from the current * position. */ return state->text_pos; /* Search for the required string and calculate where to start matching. */ switch (pattern->req_string->op) { case RE_OP_STRING: { BOOL is_partial; Py_ssize_t limit; if (search || pattern->req_offset < 0) limit = state->slice_end; else { limit = state->slice_start + pattern->req_offset + (Py_ssize_t)pattern->req_string->value_count; if (limit > state->slice_end || limit < 0) limit = state->slice_end; } if (state->req_pos < 0 || state->text_pos > state->req_pos) /* First time or already passed it. */ found_pos = string_search(safe_state, pattern->req_string, state->text_pos, limit, &is_partial); else { found_pos = state->req_pos; is_partial = FALSE; } if (found_pos < 0) /* The required string wasn't found. 
*/ return -1; if (!is_partial) { /* Record where the required string matched. */ state->req_pos = found_pos; state->req_end = found_pos + (Py_ssize_t)pattern->req_string->value_count; } if (pattern->req_offset >= 0) { /* Step back from the required string to where we should start * matching. */ found_pos -= pattern->req_offset; if (found_pos >= state->text_pos) return found_pos; } break; } case RE_OP_STRING_FLD: { BOOL is_partial; Py_ssize_t limit; if (search || pattern->req_offset < 0) limit = state->slice_end; else { limit = state->slice_start + pattern->req_offset + (Py_ssize_t)pattern->req_string->value_count; if (limit > state->slice_end || limit < 0) limit = state->slice_end; } if (state->req_pos < 0 || state->text_pos > state->req_pos) /* First time or already passed it. */ found_pos = string_search_fld(safe_state, pattern->req_string, state->text_pos, limit, &end_pos, &is_partial); else { found_pos = state->req_pos; is_partial = FALSE; } if (found_pos < 0) /* The required string wasn't found. */ return -1; if (!is_partial) { /* Record where the required string matched. */ state->req_pos = found_pos; state->req_end = end_pos; } if (pattern->req_offset >= 0) { /* Step back from the required string to where we should start * matching. */ found_pos -= pattern->req_offset; if (found_pos >= state->text_pos) return found_pos; } break; } case RE_OP_STRING_FLD_REV: { BOOL is_partial; Py_ssize_t limit; if (search || pattern->req_offset < 0) limit = state->slice_start; else { limit = state->slice_end - pattern->req_offset - (Py_ssize_t)pattern->req_string->value_count; if (limit < state->slice_start) limit = state->slice_start; } if (state->req_pos < 0 || state->text_pos < state->req_pos) /* First time or already passed it. */ found_pos = string_search_fld_rev(safe_state, pattern->req_string, state->text_pos, limit, &end_pos, &is_partial); else { found_pos = state->req_pos; is_partial = FALSE; } if (found_pos < 0) /* The required string wasn't found. 
*/ return -1; if (!is_partial) { /* Record where the required string matched. */ state->req_pos = found_pos; state->req_end = end_pos; } if (pattern->req_offset >= 0) { /* Step back from the required string to where we should start * matching. */ found_pos += pattern->req_offset; if (found_pos <= state->text_pos) return found_pos; } break; } case RE_OP_STRING_IGN: { BOOL is_partial; Py_ssize_t limit; if (search || pattern->req_offset < 0) limit = state->slice_end; else { limit = state->slice_start + pattern->req_offset + (Py_ssize_t)pattern->req_string->value_count; if (limit > state->slice_end || limit < 0) limit = state->slice_end; } if (state->req_pos < 0 || state->text_pos > state->req_pos) /* First time or already passed it. */ found_pos = string_search_ign(safe_state, pattern->req_string, state->text_pos, limit, &is_partial); else { found_pos = state->req_pos; is_partial = FALSE; } if (found_pos < 0) /* The required string wasn't found. */ return -1; if (!is_partial) { /* Record where the required string matched. */ state->req_pos = found_pos; state->req_end = found_pos + (Py_ssize_t)pattern->req_string->value_count; } if (pattern->req_offset >= 0) { /* Step back from the required string to where we should start * matching. */ found_pos -= pattern->req_offset; if (found_pos >= state->text_pos) return found_pos; } break; } case RE_OP_STRING_IGN_REV: { BOOL is_partial; Py_ssize_t limit; if (search || pattern->req_offset < 0) limit = state->slice_start; else { limit = state->slice_end - pattern->req_offset - (Py_ssize_t)pattern->req_string->value_count; if (limit < state->slice_start) limit = state->slice_start; } if (state->req_pos < 0 || state->text_pos < state->req_pos) /* First time or already passed it. */ found_pos = string_search_ign_rev(safe_state, pattern->req_string, state->text_pos, limit, &is_partial); else { found_pos = state->req_pos; is_partial = FALSE; } if (found_pos < 0) /* The required string wasn't found. 
*/ return -1; if (!is_partial) { /* Record where the required string matched. */ state->req_pos = found_pos; state->req_end = found_pos - (Py_ssize_t)pattern->req_string->value_count; } if (pattern->req_offset >= 0) { /* Step back from the required string to where we should start * matching. */ found_pos += pattern->req_offset; if (found_pos <= state->text_pos) return found_pos; } break; } case RE_OP_STRING_REV: { BOOL is_partial; Py_ssize_t limit; if (search || pattern->req_offset < 0) limit = state->slice_start; else { limit = state->slice_end - pattern->req_offset - (Py_ssize_t)pattern->req_string->value_count; if (limit < state->slice_start) limit = state->slice_start; } if (state->req_pos < 0 || state->text_pos < state->req_pos) /* First time or already passed it. */ found_pos = string_search_rev(safe_state, pattern->req_string, state->text_pos, limit, &is_partial); else { found_pos = state->req_pos; is_partial = FALSE; } if (found_pos < 0) /* The required string wasn't found. */ return -1; if (!is_partial) { /* Record where the required string matched. */ state->req_pos = found_pos; state->req_end = found_pos - (Py_ssize_t)pattern->req_string->value_count; } if (pattern->req_offset >= 0) { /* Step back from the required string to where we should start * matching. */ found_pos += pattern->req_offset; if (found_pos <= state->text_pos) return found_pos; } break; } } /* Start matching from the current position. */ return state->text_pos; } /* Tries to match a character pattern. 
*/ Py_LOCAL_INLINE(int) match_one(RE_State* state, RE_Node* node, Py_ssize_t text_pos) { switch (node->op) { case RE_OP_ANY: return try_match_ANY(state, node, text_pos); case RE_OP_ANY_ALL: return try_match_ANY_ALL(state, node, text_pos); case RE_OP_ANY_ALL_REV: return try_match_ANY_ALL_REV(state, node, text_pos); case RE_OP_ANY_REV: return try_match_ANY_REV(state, node, text_pos); case RE_OP_ANY_U: return try_match_ANY_U(state, node, text_pos); case RE_OP_ANY_U_REV: return try_match_ANY_U_REV(state, node, text_pos); case RE_OP_CHARACTER: return try_match_CHARACTER(state, node, text_pos); case RE_OP_CHARACTER_IGN: return try_match_CHARACTER_IGN(state, node, text_pos); case RE_OP_CHARACTER_IGN_REV: return try_match_CHARACTER_IGN_REV(state, node, text_pos); case RE_OP_CHARACTER_REV: return try_match_CHARACTER_REV(state, node, text_pos); case RE_OP_PROPERTY: return try_match_PROPERTY(state, node, text_pos); case RE_OP_PROPERTY_IGN: return try_match_PROPERTY_IGN(state, node, text_pos); case RE_OP_PROPERTY_IGN_REV: return try_match_PROPERTY_IGN_REV(state, node, text_pos); case RE_OP_PROPERTY_REV: return try_match_PROPERTY_REV(state, node, text_pos); case RE_OP_RANGE: return try_match_RANGE(state, node, text_pos); case RE_OP_RANGE_IGN: return try_match_RANGE_IGN(state, node, text_pos); case RE_OP_RANGE_IGN_REV: return try_match_RANGE_IGN_REV(state, node, text_pos); case RE_OP_RANGE_REV: return try_match_RANGE_REV(state, node, text_pos); case RE_OP_SET_DIFF: case RE_OP_SET_INTER: case RE_OP_SET_SYM_DIFF: case RE_OP_SET_UNION: return try_match_SET(state, node, text_pos); case RE_OP_SET_DIFF_IGN: case RE_OP_SET_INTER_IGN: case RE_OP_SET_SYM_DIFF_IGN: case RE_OP_SET_UNION_IGN: return try_match_SET_IGN(state, node, text_pos); case RE_OP_SET_DIFF_IGN_REV: case RE_OP_SET_INTER_IGN_REV: case RE_OP_SET_SYM_DIFF_IGN_REV: case RE_OP_SET_UNION_IGN_REV: return try_match_SET_IGN_REV(state, node, text_pos); case RE_OP_SET_DIFF_REV: case RE_OP_SET_INTER_REV: case RE_OP_SET_SYM_DIFF_REV: 
case RE_OP_SET_UNION_REV: return try_match_SET_REV(state, node, text_pos); } return FALSE; } /* Tests whether 2 nodes contain the same values. */ Py_LOCAL_INLINE(BOOL) same_values(RE_Node* node_1, RE_Node* node_2) { size_t i; if (node_1->value_count != node_2->value_count) return FALSE; for (i = 0; i < node_1->value_count; i++) { if (node_1->values[i] != node_2->values[i]) return FALSE; } return TRUE; } /* Tests whether 2 nodes are equivalent (both string-like in the same way). */ Py_LOCAL_INLINE(BOOL) equivalent_nodes(RE_Node* node_1, RE_Node* node_2) { switch (node_1->op) { case RE_OP_CHARACTER: case RE_OP_STRING: switch (node_2->op) { case RE_OP_CHARACTER: case RE_OP_STRING: return same_values(node_1, node_2); } break; case RE_OP_CHARACTER_IGN: case RE_OP_STRING_IGN: switch (node_2->op) { case RE_OP_CHARACTER_IGN: case RE_OP_STRING_IGN: return same_values(node_1, node_2); } break; case RE_OP_CHARACTER_IGN_REV: case RE_OP_STRING_IGN_REV: switch (node_2->op) { case RE_OP_CHARACTER_IGN_REV: case RE_OP_STRING_IGN_REV: return same_values(node_1, node_2); } break; case RE_OP_CHARACTER_REV: case RE_OP_STRING_REV: switch (node_2->op) { case RE_OP_CHARACTER_REV: case RE_OP_STRING_REV: return same_values(node_1, node_2); } break; } return FALSE; } /* Prunes the backtracking. */ Py_LOCAL_INLINE(void) prune_backtracking(RE_State* state) { RE_AtomicBlock* current; current = state->current_atomic_block; if (current && current->count > 0) { /* In an atomic group or a lookaround. */ RE_AtomicData* atomic; /* Discard any backtracking info from inside the atomic group or * lookaround. */ atomic = &current->items[current->count - 1]; state->current_backtrack_block = atomic->current_backtrack_block; state->current_backtrack_block->count = atomic->backtrack_count; } else { /* In the outermost pattern. */ while (state->current_backtrack_block->previous) state->current_backtrack_block = state->current_backtrack_block->previous; /* Keep the bottom FAILURE on the backtracking stack. 
*/ state->current_backtrack_block->count = 1; } } /* Saves the match as the best POSIX match (leftmost longest) found so far. */ Py_LOCAL_INLINE(BOOL) save_best_match(RE_SafeState* safe_state) { RE_State* state; size_t group_count; size_t g; state = safe_state->re_state; state->best_match_pos = state->match_pos; state->best_text_pos = state->text_pos; state->found_match = TRUE; memmove(state->best_fuzzy_counts, state->total_fuzzy_counts, sizeof(state->total_fuzzy_counts)); group_count = state->pattern->true_group_count; if (group_count == 0) return TRUE; acquire_GIL(safe_state); if (!state->best_match_groups) { /* Allocate storage for the groups of the best match. */ state->best_match_groups = (RE_GroupData*)re_alloc(group_count * sizeof(RE_GroupData)); if (!state->best_match_groups) goto error; memset(state->best_match_groups, 0, group_count * sizeof(RE_GroupData)); for (g = 0; g < group_count; g++) { RE_GroupData* best; RE_GroupData* group; best = &state->best_match_groups[g]; group = &state->groups[g]; best->capture_capacity = group->capture_capacity; best->captures = (RE_GroupSpan*)re_alloc(best->capture_capacity * sizeof(RE_GroupSpan)); if (!best->captures) goto error; } } /* Copy the group spans and captures. */ for (g = 0; g < group_count; g++) { RE_GroupData* best; RE_GroupData* group; best = &state->best_match_groups[g]; group = &state->groups[g]; best->span = group->span; best->capture_count = group->capture_count; if (best->capture_count > best->capture_capacity) { /* We need more space for the captures. The group's storage may * have grown since the best match's storage was allocated, so * allocate the same capacity as the group now has. */ re_dealloc(best->captures); best->capture_capacity = 0; best->captures = (RE_GroupSpan*)re_alloc(group->capture_capacity * sizeof(RE_GroupSpan)); if (!best->captures) goto error; best->capture_capacity = group->capture_capacity; } /* Copy the captures for this group. */ memmove(best->captures, group->captures, group->capture_count * sizeof(RE_GroupSpan)); } release_GIL(safe_state); return TRUE; error: release_GIL(safe_state); return FALSE; } /* Restores the best match for a POSIX match (leftmost longest).
*/ Py_LOCAL_INLINE(void) restore_best_match(RE_SafeState* safe_state) { RE_State* state; size_t group_count; size_t g; state = safe_state->re_state; if (!state->found_match) return; state->match_pos = state->best_match_pos; state->text_pos = state->best_text_pos; memmove(state->total_fuzzy_counts, state->best_fuzzy_counts, sizeof(state->total_fuzzy_counts)); group_count = state->pattern->true_group_count; if (group_count == 0) return; /* Copy the group spans and captures. */ for (g = 0; g < group_count; g++) { RE_GroupData* group; RE_GroupData* best; group = &state->groups[g]; best = &state->best_match_groups[g]; group->span = best->span; group->capture_count = best->capture_count; /* Copy the captures for this group. */ memmove(group->captures, best->captures, best->capture_count * sizeof(RE_GroupSpan)); } } /* Checks whether the new match is better than the current match for a POSIX * match (leftmost longest) and saves it if it is. */ Py_LOCAL_INLINE(BOOL) check_posix_match(RE_SafeState* safe_state) { RE_State* state; Py_ssize_t best_length; Py_ssize_t new_length; state = safe_state->re_state; if (!state->found_match) return save_best_match(safe_state); /* Check the overall match. */ if (state->reverse) { /* We're searching backwards. */ best_length = state->match_pos - state->best_text_pos; new_length = state->match_pos - state->text_pos; } else { /* We're searching forwards. */ best_length = state->best_text_pos - state->match_pos; new_length = state->text_pos - state->match_pos; } if (new_length > best_length) /* It's a longer match. */ return save_best_match(safe_state); return TRUE; } /* Performs a depth-first match or search from the context. 
*/ Py_LOCAL_INLINE(int) basic_match(RE_SafeState* safe_state, BOOL search) { RE_State* state; RE_EncodingTable* encoding; RE_LocaleInfo* locale_info; PatternObject* pattern; RE_Node* start_node; RE_NextNode start_pair; Py_UCS4 (*char_at)(void* text, Py_ssize_t pos); Py_ssize_t pattern_step; /* The overall step of the pattern (forwards or backwards). */ Py_ssize_t string_pos; BOOL do_search_start; Py_ssize_t found_pos; int status; RE_Node* node; int folded_pos; int gfolded_pos; TRACE(("<>\n")) state = safe_state->re_state; encoding = state->encoding; locale_info = state->locale_info; pattern = state->pattern; start_node = pattern->start_node; /* Look beyond any initial group node. */ start_pair.node = start_node; start_pair.test = pattern->start_test; /* Is the pattern anchored to the start or end of the string? */ switch (start_pair.test->op) { case RE_OP_END_OF_STRING: if (state->reverse) { /* Searching backwards. */ if (state->text_pos != state->text_length) return RE_ERROR_FAILURE; /* Don't bother to search further because it's anchored. */ search = FALSE; } break; case RE_OP_START_OF_STRING: if (!state->reverse) { /* Searching forwards. */ if (state->text_pos != 0) return RE_ERROR_FAILURE; /* Don't bother to search further because it's anchored. */ search = FALSE; } break; } char_at = state->char_at; pattern_step = state->reverse ? -1 : 1; string_pos = -1; do_search_start = pattern->do_search_start; state->fewest_errors = state->max_errors; if (do_search_start && pattern->req_string && equivalent_nodes(start_pair.test, pattern->req_string)) do_search_start = FALSE; /* Add a backtrack entry for failure. */ if (!add_backtrack(safe_state, RE_OP_FAILURE)) return RE_ERROR_BACKTRACKING; start_match: /* If we're searching, advance along the string until there could be a * match. 
*/ if (pattern->pattern_call_ref >= 0) { RE_GuardList* guard_list; guard_list = &state->group_call_guard_list[pattern->pattern_call_ref]; guard_list->count = 0; guard_list->last_text_pos = -1; } /* Locate the required string, if there's one, unless this is a recursive * call of 'basic_match'. */ if (!pattern->req_string) found_pos = state->text_pos; else { found_pos = locate_required_string(safe_state, search); if (found_pos < 0) return RE_ERROR_FAILURE; } if (search) { state->text_pos = found_pos; if (do_search_start) { RE_Position new_position; next_match_1: /* 'search_start' will clear 'do_search_start' if it can't perform * a fast search for the next possible match. This enables us to * avoid the overhead of the call subsequently. */ status = search_start(safe_state, &start_pair, &new_position, 0); if (status == RE_ERROR_PARTIAL) { state->match_pos = state->text_pos; return status; } else if (status != RE_ERROR_SUCCESS) return status; node = new_position.node; state->text_pos = new_position.text_pos; if (node->op == RE_OP_SUCCESS) { /* Must the match advance past its start? */ if (state->text_pos != state->search_anchor || !state->must_advance) return RE_ERROR_SUCCESS; state->text_pos = state->match_pos + pattern_step; goto next_match_1; } /* 'do_search_start' may have been cleared. */ do_search_start = pattern->do_search_start; } else { /* Avoiding 'search_start', which we've found can't perform a fast * search for the next possible match. */ node = start_node; next_match_2: if (state->reverse) { if (state->text_pos < state->slice_start) { if (state->partial_side == RE_PARTIAL_LEFT) return RE_ERROR_PARTIAL; return RE_ERROR_FAILURE; } } else { if (state->text_pos > state->slice_end) { if (state-> partial_side == RE_PARTIAL_RIGHT) return RE_ERROR_PARTIAL; return RE_ERROR_FAILURE; } } state->match_pos = state->text_pos; if (node->op == RE_OP_SUCCESS) { /* Must the match advance past its start? 
*/ if (state->text_pos != state->search_anchor || !state->must_advance) { BOOL success; if (state->match_all) { /* We want to match all of the slice. */ if (state->reverse) success = state->text_pos == state->slice_start; else success = state->text_pos == state->slice_end; } else success = TRUE; if (success) return RE_ERROR_SUCCESS; } state->text_pos = state->match_pos + pattern_step; goto next_match_2; } } } else { /* The start position is anchored to the current position. */ if (found_pos != state->text_pos) return RE_ERROR_FAILURE; node = start_node; } advance: /* The main matching loop. */ for (;;) { TRACE(("%d|", state->text_pos)) /* Should we abort the matching? */ ++state->iterations; if (state->iterations == 0 && safe_check_signals(safe_state)) return RE_ERROR_INTERRUPTED; switch (node->op) { case RE_OP_ANY: /* Any character except a newline. */ TRACE(("%s\n", re_op_text[node->op])) status = try_match_ANY(state, node, state->text_pos); if (status < 0) return status; if (status == RE_ERROR_SUCCESS) { ++state->text_pos; node = node->next_1.node; } else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, 1); if (status < 0) return status; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_ANY_ALL: /* Any character at all. */ TRACE(("%s\n", re_op_text[node->op])) status = try_match_ANY_ALL(state, node, state->text_pos); if (status < 0) return status; if (status == RE_ERROR_SUCCESS) { ++state->text_pos; node = node->next_1.node; } else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, 1); if (status < 0) return status; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_ANY_ALL_REV: /* Any character at all, backwards. 
*/ TRACE(("%s\n", re_op_text[node->op])) status = try_match_ANY_ALL_REV(state, node, state->text_pos); if (status < 0) return status; if (status == RE_ERROR_SUCCESS) { --state->text_pos; node = node->next_1.node; } else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, -1); if (status < 0) return status; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_ANY_REV: /* Any character except a newline, backwards. */ TRACE(("%s\n", re_op_text[node->op])) status = try_match_ANY_REV(state, node, state->text_pos); if (status < 0) return status; if (status == RE_ERROR_SUCCESS) { --state->text_pos; node = node->next_1.node; } else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, -1); if (status < 0) return status; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_ANY_U: /* Any character except a line separator. */ TRACE(("%s\n", re_op_text[node->op])) status = try_match_ANY_U(state, node, state->text_pos); if (status < 0) return status; if (status == RE_ERROR_SUCCESS) { ++state->text_pos; node = node->next_1.node; } else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, 1); if (status < 0) return status; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_ANY_U_REV: /* Any character except a line separator, backwards. */ TRACE(("%s\n", re_op_text[node->op])) status = try_match_ANY_U_REV(state, node, state->text_pos); if (status < 0) return status; if (status == RE_ERROR_SUCCESS) { --state->text_pos; node = node->next_1.node; } else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, -1); if (status < 0) return status; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_ATOMIC: /* Start of an atomic group. 
*/ { RE_AtomicData* atomic; TRACE(("%s\n", re_op_text[node->op])) if (!add_backtrack(safe_state, RE_OP_ATOMIC)) return RE_ERROR_BACKTRACKING; state->backtrack->atomic.too_few_errors = state->too_few_errors; state->backtrack->atomic.capture_change = state->capture_change; atomic = push_atomic(safe_state); if (!atomic) return RE_ERROR_MEMORY; atomic->backtrack_count = state->current_backtrack_block->count; atomic->current_backtrack_block = state->current_backtrack_block; atomic->is_lookaround = FALSE; atomic->has_groups = (node->status & RE_STATUS_HAS_GROUPS) != 0; atomic->has_repeats = (node->status & RE_STATUS_HAS_REPEATS) != 0; /* Save the groups and repeats. */ if (atomic->has_groups && !push_groups(safe_state)) return RE_ERROR_MEMORY; if (atomic->has_repeats && !push_repeats(safe_state)) return RE_ERROR_MEMORY; node = node->next_1.node; break; } case RE_OP_BOUNDARY: /* On a word boundary. */ TRACE(("%s %d\n", re_op_text[node->op], node->match)) status = try_match_BOUNDARY(state, node, state->text_pos); if (status < 0) return status; if (status == RE_ERROR_SUCCESS) node = node->next_1.node; else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, 0); if (status < 0) return status; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_BRANCH: /* 2-way branch. */ { RE_Position next_position; TRACE(("%s\n", re_op_text[node->op])) status = try_match(state, &node->next_1, state->text_pos, &next_position); if (status < 0) return status; if (status == RE_ERROR_SUCCESS) { if (!add_backtrack(safe_state, RE_OP_BRANCH)) return RE_ERROR_BACKTRACKING; state->backtrack->branch.position.node = node->nonstring.next_2.node; state->backtrack->branch.position.text_pos = state->text_pos; node = next_position.node; state->text_pos = next_position.text_pos; } else node = node->nonstring.next_2.node; break; } case RE_OP_CALL_REF: /* A group call reference. 
*/ { TRACE(("%s %d\n", re_op_text[node->op], node->values[0])) if (!push_group_return(safe_state, NULL)) return RE_ERROR_MEMORY; if (!add_backtrack(safe_state, RE_OP_CALL_REF)) return RE_ERROR_BACKTRACKING; node = node->next_1.node; break; } case RE_OP_CHARACTER: /* A character. */ TRACE(("%s %d %d\n", re_op_text[node->op], node->match, node->values[0])) if (state->text_pos >= state->text_length && state->partial_side == RE_PARTIAL_RIGHT) return RE_ERROR_PARTIAL; if (state->text_pos < state->slice_end && matches_CHARACTER(encoding, locale_info, node, char_at(state->text, state->text_pos)) == node->match) { state->text_pos += node->step; node = node->next_1.node; } else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, 1); if (status < 0) return RE_ERROR_PARTIAL; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_CHARACTER_IGN: /* A character, ignoring case. */ TRACE(("%s %d %d\n", re_op_text[node->op], node->match, node->values[0])) if (state->text_pos >= state->text_length && state->partial_side == RE_PARTIAL_RIGHT) return RE_ERROR_PARTIAL; if (state->text_pos < state->slice_end && matches_CHARACTER_IGN(encoding, locale_info, node, char_at(state->text, state->text_pos)) == node->match) { state->text_pos += node->step; node = node->next_1.node; } else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, 1); if (status < 0) return RE_ERROR_PARTIAL; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_CHARACTER_IGN_REV: /* A character, backwards, ignoring case. 
*/ TRACE(("%s %d %d\n", re_op_text[node->op], node->match, node->values[0])) if (state->text_pos <= 0 && state->partial_side == RE_PARTIAL_LEFT) return RE_ERROR_PARTIAL; if (state->text_pos > state->slice_start && matches_CHARACTER_IGN(encoding, locale_info, node, char_at(state->text, state->text_pos - 1)) == node->match) { state->text_pos += node->step; node = node->next_1.node; } else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, -1); if (status < 0) return RE_ERROR_PARTIAL; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_CHARACTER_REV: /* A character, backwards. */ TRACE(("%s %d %d\n", re_op_text[node->op], node->match, node->values[0])) if (state->text_pos <= 0 && state->partial_side == RE_PARTIAL_LEFT) return RE_ERROR_PARTIAL; if (state->text_pos > state->slice_start && matches_CHARACTER(encoding, locale_info, node, char_at(state->text, state->text_pos - 1)) == node->match) { state->text_pos += node->step; node = node->next_1.node; } else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, -1); if (status < 0) return RE_ERROR_PARTIAL; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_CONDITIONAL: /* Start of a conditional subpattern. 
*/ { RE_AtomicData* conditional; TRACE(("%s %d\n", re_op_text[node->op], node->match)) if (!add_backtrack(safe_state, RE_OP_CONDITIONAL)) return RE_ERROR_BACKTRACKING; state->backtrack->lookaround.too_few_errors = state->too_few_errors; state->backtrack->lookaround.capture_change = state->capture_change; state->backtrack->lookaround.inside = TRUE; state->backtrack->lookaround.node = node; conditional = push_atomic(safe_state); if (!conditional) return RE_ERROR_MEMORY; conditional->backtrack_count = state->current_backtrack_block->count; conditional->current_backtrack_block = state->current_backtrack_block; conditional->slice_start = state->slice_start; conditional->slice_end = state->slice_end; conditional->text_pos = state->text_pos; conditional->node = node; conditional->backtrack = state->backtrack; conditional->is_lookaround = TRUE; conditional->has_groups = (node->status & RE_STATUS_HAS_GROUPS) != 0; conditional->has_repeats = (node->status & RE_STATUS_HAS_REPEATS) != 0; /* Save the groups and repeats. */ if (conditional->has_groups && !push_groups(safe_state)) return RE_ERROR_MEMORY; if (conditional->has_repeats && !push_repeats(safe_state)) return RE_ERROR_MEMORY; conditional->saved_groups = state->current_saved_groups; conditional->saved_repeats = state->current_saved_repeats; state->slice_start = 0; state->slice_end = state->text_length; node = node->next_1.node; break; } case RE_OP_DEFAULT_BOUNDARY: /* On a default word boundary. */ TRACE(("%s %d\n", re_op_text[node->op], node->match)) status = try_match_DEFAULT_BOUNDARY(state, node, state->text_pos); if (status < 0) return status; if (status == RE_ERROR_SUCCESS) node = node->next_1.node; else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, 0); if (status < 0) return status; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_DEFAULT_END_OF_WORD: /* At the default end of a word. 
*/ TRACE(("%s\n", re_op_text[node->op])) status = try_match_DEFAULT_END_OF_WORD(state, node, state->text_pos); if (status < 0) return status; if (status == RE_ERROR_SUCCESS) node = node->next_1.node; else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, 0); if (status < 0) return status; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_DEFAULT_START_OF_WORD: /* At the default start of a word. */ TRACE(("%s\n", re_op_text[node->op])) status = try_match_DEFAULT_START_OF_WORD(state, node, state->text_pos); if (status < 0) return status; if (status == RE_ERROR_SUCCESS) node = node->next_1.node; else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, 0); if (status < 0) return status; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_END_ATOMIC: /* End of an atomic group. */ { RE_AtomicData* atomic; /* Discard any backtracking info from inside the atomic group. */ atomic = top_atomic(safe_state); state->current_backtrack_block = atomic->current_backtrack_block; state->current_backtrack_block->count = atomic->backtrack_count; node = node->next_1.node; break; } case RE_OP_END_CONDITIONAL: /* End of a conditional subpattern. */ { RE_AtomicData* conditional; conditional = pop_atomic(safe_state); while (!conditional->is_lookaround) { if (conditional->has_repeats) drop_repeats(state); if (conditional->has_groups) drop_groups(state); conditional = pop_atomic(safe_state); } state->text_pos = conditional->text_pos; state->slice_end = conditional->slice_end; state->slice_start = conditional->slice_start; /* Discard any backtracking info from inside the lookaround. 
*/ state->current_backtrack_block = conditional->current_backtrack_block; state->current_backtrack_block->count = conditional->backtrack_count; state->current_saved_groups = conditional->saved_groups; state->current_saved_repeats = conditional->saved_repeats; /* The lookaround's subpattern has matched. We're now going to * leave the lookaround. */ conditional->backtrack->lookaround.inside = FALSE; if (conditional->node->match) { /* It's a positive lookaround whose subpattern has matched, so * the condition is true. * * Go to the 'true' branch. */ node = node->next_1.node; } else { /* It's a negative lookaround whose subpattern has matched, so * the condition is false. * * Go to the 'false' branch. */ node = node->nonstring.next_2.node; } break; } case RE_OP_END_FUZZY: /* End of fuzzy matching. */ TRACE(("%s\n", re_op_text[node->op])) if (!fuzzy_insert(safe_state, state->text_pos, node)) return RE_ERROR_BACKTRACKING; /* If there were too few errors in the fuzzy section, try again. */ if (state->too_few_errors) { state->too_few_errors = FALSE; goto backtrack; } state->total_fuzzy_counts[RE_FUZZY_SUB] += state->fuzzy_info.counts[RE_FUZZY_SUB]; state->total_fuzzy_counts[RE_FUZZY_INS] += state->fuzzy_info.counts[RE_FUZZY_INS]; state->total_fuzzy_counts[RE_FUZZY_DEL] += state->fuzzy_info.counts[RE_FUZZY_DEL]; node = node->next_1.node; break; case RE_OP_END_GREEDY_REPEAT: /* End of a greedy repeat. */ { RE_CODE index; RE_RepeatData* rp_data; BOOL changed; BOOL try_body; int body_status; RE_Position next_body_position; BOOL try_tail; int tail_status; RE_Position next_tail_position; RE_BacktrackData* bt_data; TRACE(("%s %d\n", re_op_text[node->op], node->values[0])) /* Repeat indexes are 0-based. */ index = node->values[0]; rp_data = &state->repeats[index]; /* The body has matched successfully at this position. */ if (!guard_repeat(safe_state, index, rp_data->start, RE_STATUS_BODY, FALSE)) return RE_ERROR_MEMORY; ++rp_data->count; /* Have we advanced through the text or has a capture group changed?
*/ changed = rp_data->capture_change != state->capture_change || state->text_pos != rp_data->start; /* The counts are of type size_t, so the format needs to specify * that. */ TRACE(("min is %" PY_FORMAT_SIZE_T "u, max is %" PY_FORMAT_SIZE_T "u, count is %" PY_FORMAT_SIZE_T "u\n", node->values[1], node->values[2], rp_data->count)) /* Could the body or tail match? */ try_body = changed && (rp_data->count < node->values[2] || ~node->values[2] == 0) && !is_repeat_guarded(safe_state, index, state->text_pos, RE_STATUS_BODY); if (try_body) { body_status = try_match(state, &node->next_1, state->text_pos, &next_body_position); if (body_status < 0) return body_status; if (body_status == RE_ERROR_FAILURE) try_body = FALSE; } else body_status = RE_ERROR_FAILURE; try_tail = (!changed || rp_data->count >= node->values[1]) && !is_repeat_guarded(safe_state, index, state->text_pos, RE_STATUS_TAIL); if (try_tail) { tail_status = try_match(state, &node->nonstring.next_2, state->text_pos, &next_tail_position); if (tail_status < 0) return tail_status; if (tail_status == RE_ERROR_FAILURE) try_tail = FALSE; } else tail_status = RE_ERROR_FAILURE; if (!try_body && !try_tail) { /* Neither the body nor the tail could match. */ --rp_data->count; goto backtrack; } if (body_status < 0 || (body_status == 0 && tail_status < 0)) return RE_ERROR_PARTIAL; /* Record info in case we backtrack into the body. */ if (!add_backtrack(safe_state, RE_OP_BODY_END)) return RE_ERROR_BACKTRACKING; bt_data = state->backtrack; bt_data->repeat.index = index; bt_data->repeat.count = rp_data->count - 1; bt_data->repeat.start = rp_data->start; bt_data->repeat.capture_change = rp_data->capture_change; if (try_body) { /* Both the body and the tail could match. */ if (try_tail) { /* The body takes precedence. If the body fails to match * then we want to try the tail before backtracking * further. */ /* Record backtracking info for matching the tail. 
*/ if (!add_backtrack(safe_state, RE_OP_MATCH_TAIL)) return RE_ERROR_BACKTRACKING; bt_data = state->backtrack; bt_data->repeat.position = next_tail_position; bt_data->repeat.index = index; bt_data->repeat.count = rp_data->count; bt_data->repeat.start = rp_data->start; bt_data->repeat.capture_change = rp_data->capture_change; bt_data->repeat.text_pos = state->text_pos; } /* Record backtracking info in case the body fails to match. */ if (!add_backtrack(safe_state, RE_OP_BODY_START)) return RE_ERROR_BACKTRACKING; bt_data = state->backtrack; bt_data->repeat.index = index; bt_data->repeat.text_pos = state->text_pos; rp_data->capture_change = state->capture_change; rp_data->start = state->text_pos; /* Advance into the body. */ node = next_body_position.node; state->text_pos = next_body_position.text_pos; } else { /* Only the tail could match. */ /* Advance into the tail. */ node = next_tail_position.node; state->text_pos = next_tail_position.text_pos; } break; } case RE_OP_END_GROUP: /* End of a capture group. */ { RE_CODE private_index; RE_CODE public_index; RE_GroupData* group; RE_BacktrackData* bt_data; TRACE(("%s %d\n", re_op_text[node->op], node->values[1])) /* Capture group indexes are 1-based (excluding group 0, which is * the entire matched string). */ private_index = node->values[0]; public_index = node->values[1]; group = &state->groups[private_index - 1]; if (!add_backtrack(safe_state, RE_OP_END_GROUP)) return RE_ERROR_BACKTRACKING; bt_data = state->backtrack; bt_data->group.private_index = private_index; bt_data->group.public_index = public_index; bt_data->group.text_pos = group->span.end; bt_data->group.capture = (BOOL)node->values[2]; bt_data->group.current_capture = group->current_capture; if (pattern->group_info[private_index - 1].referenced && group->span.end != state->text_pos) ++state->capture_change; group->span.end = state->text_pos; /* Save the capture? 
*/ if (node->values[2]) { group->current_capture = (Py_ssize_t)group->capture_count; if (!save_capture(safe_state, private_index, public_index)) return RE_ERROR_MEMORY; } node = node->next_1.node; break; } case RE_OP_END_LAZY_REPEAT: /* End of a lazy repeat. */ { RE_CODE index; RE_RepeatData* rp_data; BOOL changed; BOOL try_body; int body_status; RE_Position next_body_position; BOOL try_tail; int tail_status; RE_Position next_tail_position; RE_BacktrackData* bt_data; TRACE(("%s %d\n", re_op_text[node->op], node->values[0])) /* Repeat indexes are 0-based. */ index = node->values[0]; rp_data = &state->repeats[index]; /* The body has matched successfully at this position. */ if (!guard_repeat(safe_state, index, rp_data->start, RE_STATUS_BODY, FALSE)) return RE_ERROR_MEMORY; ++rp_data->count; /* Have we advanced through the text or has a capture group changed? */ changed = rp_data->capture_change != state->capture_change || state->text_pos != rp_data->start; /* The counts are of type size_t, so the format needs to specify * that. */ TRACE(("min is %" PY_FORMAT_SIZE_T "u, max is %" PY_FORMAT_SIZE_T "u, count is %" PY_FORMAT_SIZE_T "u\n", node->values[1], node->values[2], rp_data->count)) /* Could the body or tail match?
*/ try_body = changed && (rp_data->count < node->values[2] || ~node->values[2] == 0) && !is_repeat_guarded(safe_state, index, state->text_pos, RE_STATUS_BODY); if (try_body) { body_status = try_match(state, &node->next_1, state->text_pos, &next_body_position); if (body_status < 0) return body_status; if (body_status == RE_ERROR_FAILURE) try_body = FALSE; } else body_status = RE_ERROR_FAILURE; try_tail = (!changed || rp_data->count >= node->values[1]); if (try_tail) { tail_status = try_match(state, &node->nonstring.next_2, state->text_pos, &next_tail_position); if (tail_status < 0) return tail_status; if (tail_status == RE_ERROR_FAILURE) try_tail = FALSE; } else tail_status = RE_ERROR_FAILURE; if (!try_body && !try_tail) { /* Neither the body nor the tail could match. */ --rp_data->count; goto backtrack; } if (body_status < 0 || (body_status == 0 && tail_status < 0)) return RE_ERROR_PARTIAL; /* Record info in case we backtrack into the body. */ if (!add_backtrack(safe_state, RE_OP_BODY_END)) return RE_ERROR_BACKTRACKING; bt_data = state->backtrack; bt_data->repeat.index = index; bt_data->repeat.count = rp_data->count - 1; bt_data->repeat.start = rp_data->start; bt_data->repeat.capture_change = rp_data->capture_change; if (try_body) { /* Both the body and the tail could match. */ if (try_tail) { /* The tail takes precedence. If the tail fails to match * then we want to try the body before backtracking * further. */ /* Record backtracking info for matching the body. */ if (!add_backtrack(safe_state, RE_OP_MATCH_BODY)) return RE_ERROR_BACKTRACKING; bt_data = state->backtrack; bt_data->repeat.position = next_body_position; bt_data->repeat.index = index; bt_data->repeat.count = rp_data->count; bt_data->repeat.start = rp_data->start; bt_data->repeat.capture_change = rp_data->capture_change; bt_data->repeat.text_pos = state->text_pos; /* Advance into the tail. 
*/ node = next_tail_position.node; state->text_pos = next_tail_position.text_pos; } else { /* Only the body could match. */ /* Record backtracking info in case the body fails to * match. */ if (!add_backtrack(safe_state, RE_OP_BODY_START)) return RE_ERROR_BACKTRACKING; bt_data = state->backtrack; bt_data->repeat.index = index; bt_data->repeat.text_pos = state->text_pos; rp_data->capture_change = state->capture_change; rp_data->start = state->text_pos; /* Advance into the body. */ node = next_body_position.node; state->text_pos = next_body_position.text_pos; } } else { /* Only the tail could match. */ /* Advance into the tail. */ node = next_tail_position.node; state->text_pos = next_tail_position.text_pos; } break; } case RE_OP_END_LOOKAROUND: /* End of a lookaround subpattern. */ { RE_AtomicData* lookaround; lookaround = pop_atomic(safe_state); while (!lookaround->is_lookaround) { if (lookaround->has_repeats) drop_repeats(state); if (lookaround->has_groups) drop_groups(state); lookaround = pop_atomic(safe_state); } state->text_pos = lookaround->text_pos; state->slice_end = lookaround->slice_end; state->slice_start = lookaround->slice_start; /* Discard any backtracking info from inside the lookaround. */ state->current_backtrack_block = lookaround->current_backtrack_block; state->current_backtrack_block->count = lookaround->backtrack_count; state->current_saved_groups = lookaround->saved_groups; state->current_saved_repeats = lookaround->saved_repeats; if (lookaround->node->match) { /* It's a positive lookaround that's succeeded. We're now going * to leave the lookaround. */ lookaround->backtrack->lookaround.inside = FALSE; node = node->next_1.node; } else { /* It's a negative lookaround that's succeeded. The groups and * certain flags may have changed. We need to restore them and * then backtrack. 
*/ if (lookaround->has_repeats) pop_repeats(state); if (lookaround->has_groups) pop_groups(state); state->too_few_errors = lookaround->backtrack->lookaround.too_few_errors; state->capture_change = lookaround->backtrack->lookaround.capture_change; discard_backtrack(state); goto backtrack; } break; } case RE_OP_END_OF_LINE: /* At the end of a line. */ TRACE(("%s\n", re_op_text[node->op])) status = try_match_END_OF_LINE(state, node, state->text_pos); if (status < 0) return status; if (status == RE_ERROR_SUCCESS) node = node->next_1.node; else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, 0); if (status < 0) return status; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_END_OF_LINE_U: /* At the end of a line. */ TRACE(("%s\n", re_op_text[node->op])) status = try_match_END_OF_LINE_U(state, node, state->text_pos); if (status < 0) return status; if (status == RE_ERROR_SUCCESS) node = node->next_1.node; else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, 0); if (status < 0) return status; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_END_OF_STRING: /* At the end of the string. */ TRACE(("%s\n", re_op_text[node->op])) status = try_match_END_OF_STRING(state, node, state->text_pos); if (status < 0) return status; if (status == RE_ERROR_SUCCESS) node = node->next_1.node; else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, 0); if (status < 0) return status; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_END_OF_STRING_LINE: /* At end of string or final newline. 
*/ TRACE(("%s\n", re_op_text[node->op])) status = try_match_END_OF_STRING_LINE(state, node, state->text_pos); if (status < 0) return status; if (status == RE_ERROR_SUCCESS) node = node->next_1.node; else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, 0); if (status < 0) return status; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_END_OF_STRING_LINE_U: /* At end of string or final newline. */ TRACE(("%s\n", re_op_text[node->op])) status = try_match_END_OF_STRING_LINE_U(state, node, state->text_pos); if (status < 0) return status; if (status == RE_ERROR_SUCCESS) node = node->next_1.node; else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, 0); if (status < 0) return status; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_END_OF_WORD: /* At the end of a word. */ TRACE(("%s\n", re_op_text[node->op])) status = try_match_END_OF_WORD(state, node, state->text_pos); if (status < 0) return status; if (status == RE_ERROR_SUCCESS) node = node->next_1.node; else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, 0); if (status < 0) return status; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_FAILURE: /* Failure. */ goto backtrack; case RE_OP_FUZZY: /* Fuzzy matching. */ { RE_FuzzyInfo* fuzzy_info; RE_BacktrackData* bt_data; TRACE(("%s\n", re_op_text[node->op])) fuzzy_info = &state->fuzzy_info; /* Save the current fuzzy info. */ if (!add_backtrack(safe_state, RE_OP_FUZZY)) return RE_ERROR_BACKTRACKING; bt_data = state->backtrack; memmove(&bt_data->fuzzy.fuzzy_info, fuzzy_info, sizeof(RE_FuzzyInfo)); bt_data->fuzzy.index = node->values[0]; bt_data->fuzzy.text_pos = state->text_pos; /* Initialise the new fuzzy info. 
*/ memset(fuzzy_info->counts, 0, 4 * sizeof(fuzzy_info->counts[0])); fuzzy_info->total_cost = 0; fuzzy_info->node = node; node = node->next_1.node; break; } case RE_OP_GRAPHEME_BOUNDARY: /* On a grapheme boundary. */ TRACE(("%s\n", re_op_text[node->op])) status = try_match_GRAPHEME_BOUNDARY(state, node, state->text_pos); if (status < 0) return status; if (status == RE_ERROR_SUCCESS) node = node->next_1.node; else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, 0); if (status < 0) return status; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_GREEDY_REPEAT: /* Greedy repeat. */ { RE_CODE index; RE_RepeatData* rp_data; RE_BacktrackData* bt_data; BOOL try_body; int body_status; RE_Position next_body_position; BOOL try_tail; int tail_status; RE_Position next_tail_position; TRACE(("%s %d\n", re_op_text[node->op], node->values[0])) /* Repeat indexes are 0-based. */ index = node->values[0]; rp_data = &state->repeats[index]; /* We might need to backtrack into the head, so save the current * repeat. */ if (!add_backtrack(safe_state, RE_OP_GREEDY_REPEAT)) return RE_ERROR_BACKTRACKING; bt_data = state->backtrack; bt_data->repeat.index = index; bt_data->repeat.count = rp_data->count; bt_data->repeat.start = rp_data->start; bt_data->repeat.capture_change = rp_data->capture_change; bt_data->repeat.text_pos = state->text_pos; /* Initialise the new repeat. */ rp_data->count = 0; rp_data->start = state->text_pos; rp_data->capture_change = state->capture_change; /* Could the body or tail match? 
*/ try_body = node->values[2] > 0 && !is_repeat_guarded(safe_state, index, state->text_pos, RE_STATUS_BODY); if (try_body) { body_status = try_match(state, &node->next_1, state->text_pos, &next_body_position); if (body_status < 0) return body_status; if (body_status == RE_ERROR_FAILURE) try_body = FALSE; } else body_status = RE_ERROR_FAILURE; try_tail = node->values[1] == 0; if (try_tail) { tail_status = try_match(state, &node->nonstring.next_2, state->text_pos, &next_tail_position); if (tail_status < 0) return tail_status; if (tail_status == RE_ERROR_FAILURE) try_tail = FALSE; } else tail_status = RE_ERROR_FAILURE; if (!try_body && !try_tail) /* Neither the body nor the tail could match. */ goto backtrack; if (body_status < 0 || (body_status == 0 && tail_status < 0)) return RE_ERROR_PARTIAL; if (try_body) { if (try_tail) { /* Both the body and the tail could match, but the body * takes precedence. If the body fails to match then we * want to try the tail before backtracking further. */ /* Record backtracking info for matching the tail. */ if (!add_backtrack(safe_state, RE_OP_MATCH_TAIL)) return RE_ERROR_BACKTRACKING; bt_data = state->backtrack; bt_data->repeat.position = next_tail_position; bt_data->repeat.index = index; bt_data->repeat.count = rp_data->count; bt_data->repeat.start = rp_data->start; bt_data->repeat.capture_change = rp_data->capture_change; bt_data->repeat.text_pos = state->text_pos; } /* Advance into the body. */ node = next_body_position.node; state->text_pos = next_body_position.text_pos; } else { /* Only the tail could match. */ /* Advance into the tail. */ node = next_tail_position.node; state->text_pos = next_tail_position.text_pos; } break; } case RE_OP_GREEDY_REPEAT_ONE: /* Greedy repeat for one character. */ { RE_CODE index; RE_RepeatData* rp_data; size_t count; BOOL is_partial; BOOL match; RE_BacktrackData* bt_data; TRACE(("%s %d\n", re_op_text[node->op], node->values[0])) /* Repeat indexes are 0-based. 
*/ index = node->values[0]; rp_data = &state->repeats[index]; if (is_repeat_guarded(safe_state, index, state->text_pos, RE_STATUS_BODY)) goto backtrack; /* Count how many times the character repeats, up to the maximum. */ count = count_one(state, node->nonstring.next_2.node, state->text_pos, node->values[2], &is_partial); if (is_partial) { state->text_pos += (Py_ssize_t)count * node->step; return RE_ERROR_PARTIAL; } /* Unmatch until it's not guarded. */ match = FALSE; for (;;) { if (count < node->values[1]) /* The number of repeats is below the minimum. */ break; if (!is_repeat_guarded(safe_state, index, state->text_pos + (Py_ssize_t)count * node->step, RE_STATUS_TAIL)) { /* It's not guarded at this position. */ match = TRUE; break; } if (count == 0) break; --count; } if (!match) { /* The repeat has failed to match at this position. */ if (!guard_repeat(safe_state, index, state->text_pos, RE_STATUS_BODY, TRUE)) return RE_ERROR_MEMORY; goto backtrack; } if (count > node->values[1]) { /* Record the backtracking info. */ if (!add_backtrack(safe_state, RE_OP_GREEDY_REPEAT_ONE)) return RE_ERROR_BACKTRACKING; bt_data = state->backtrack; bt_data->repeat.position.node = node; bt_data->repeat.index = index; bt_data->repeat.text_pos = rp_data->start; bt_data->repeat.count = rp_data->count; rp_data->start = state->text_pos; rp_data->count = count; } /* Advance into the tail. */ state->text_pos += (Py_ssize_t)count * node->step; node = node->next_1.node; break; } case RE_OP_GROUP_CALL: /* Group call. */ { size_t index; size_t g; size_t r; TRACE(("%s %d\n", re_op_text[node->op], node->values[0])) index = node->values[0]; /* Save the capture groups and repeat guards. */ if (!push_group_return(safe_state, node->next_1.node)) return RE_ERROR_MEMORY; /* Clear the capture groups for the group call. They'll be restored * on return. 
*/ for (g = 0; g < state->pattern->true_group_count; g++) { RE_GroupData* group; group = &state->groups[g]; group->span.start = -1; group->span.end = -1; group->current_capture = -1; } /* Clear the repeat guards for the group call. They'll be restored * on return. */ for (r = 0; r < state->pattern->repeat_count; r++) { RE_RepeatData* repeat; repeat = &state->repeats[r]; repeat->body_guard_list.count = 0; repeat->body_guard_list.last_text_pos = -1; repeat->tail_guard_list.count = 0; repeat->tail_guard_list.last_text_pos = -1; } /* Call a group, skipping its CALL_REF node. */ node = pattern->call_ref_info[index].node->next_1.node; if (!add_backtrack(safe_state, RE_OP_GROUP_CALL)) return RE_ERROR_BACKTRACKING; break; } case RE_OP_GROUP_EXISTS: /* Capture group exists. */ { TRACE(("%s %d\n", re_op_text[node->op], node->values[0])) /* Capture group indexes are 1-based (excluding group 0, which is * the entire matched string). * * Check whether the captured text, if any, exists at this position * in the string. * * A group index of 0, however, means that it's a DEFINE, which we * should skip. */ if (node->values[0] == 0) /* Skip past the body. */ node = node->nonstring.next_2.node; else { RE_GroupData* group; group = &state->groups[node->values[0] - 1]; if (group->current_capture >= 0) /* The 'true' branch. */ node = node->next_1.node; else /* The 'false' branch. */ node = node->nonstring.next_2.node; } break; } case RE_OP_GROUP_RETURN: /* Group return. */ { RE_Node* return_node; RE_BacktrackData* bt_data; TRACE(("%s\n", re_op_text[node->op])) return_node = top_group_return(state); if (!add_backtrack(safe_state, RE_OP_GROUP_RETURN)) return RE_ERROR_BACKTRACKING; bt_data = state->backtrack; bt_data->group_call.node = return_node; bt_data->group_call.capture_change = state->capture_change; if (return_node) { /* The group was called. */ node = return_node; /* Save the groups. */ if (!push_groups(safe_state)) return RE_ERROR_MEMORY; /* Save the repeats. 
*/ if (!push_repeats(safe_state)) return RE_ERROR_MEMORY; } else /* The group was not called. */ node = node->next_1.node; pop_group_return(state); break; } case RE_OP_KEEP: /* Keep. */ { RE_BacktrackData* bt_data; TRACE(("%s\n", re_op_text[node->op])) if (!add_backtrack(safe_state, RE_OP_KEEP)) return RE_ERROR_BACKTRACKING; bt_data = state->backtrack; bt_data->keep.match_pos = state->match_pos; state->match_pos = state->text_pos; node = node->next_1.node; break; } case RE_OP_LAZY_REPEAT: /* Lazy repeat. */ { RE_CODE index; RE_RepeatData* rp_data; RE_BacktrackData* bt_data; BOOL try_body; int body_status; RE_Position next_body_position; BOOL try_tail; int tail_status; RE_Position next_tail_position; TRACE(("%s %d\n", re_op_text[node->op], node->values[0])) /* Repeat indexes are 0-based. */ index = node->values[0]; rp_data = &state->repeats[index]; /* We might need to backtrack into the head, so save the current * repeat. */ if (!add_backtrack(safe_state, RE_OP_LAZY_REPEAT)) return RE_ERROR_BACKTRACKING; bt_data = state->backtrack; bt_data->repeat.index = index; bt_data->repeat.count = rp_data->count; bt_data->repeat.start = rp_data->start; bt_data->repeat.capture_change = rp_data->capture_change; bt_data->repeat.text_pos = state->text_pos; /* Initialise the new repeat. */ rp_data->count = 0; rp_data->start = state->text_pos; rp_data->capture_change = state->capture_change; /* Could the body or tail match? 
*/ try_body = node->values[2] > 0 && !is_repeat_guarded(safe_state, index, state->text_pos, RE_STATUS_BODY); if (try_body) { body_status = try_match(state, &node->next_1, state->text_pos, &next_body_position); if (body_status < 0) return body_status; if (body_status == RE_ERROR_FAILURE) try_body = FALSE; } else body_status = RE_ERROR_FAILURE; try_tail = node->values[1] == 0; if (try_tail) { tail_status = try_match(state, &node->nonstring.next_2, state->text_pos, &next_tail_position); if (tail_status < 0) return tail_status; if (tail_status == RE_ERROR_FAILURE) try_tail = FALSE; } else tail_status = RE_ERROR_FAILURE; if (!try_body && !try_tail) /* Neither the body nor the tail could match. */ goto backtrack; if (body_status < 0 || (body_status == 0 && tail_status < 0)) return RE_ERROR_PARTIAL; if (try_body) { if (try_tail) { /* Both the body and the tail could match, but the tail * takes precedence. If the tail fails to match then we * want to try the body before backtracking further. */ /* Record backtracking info for matching the body. */ if (!add_backtrack(safe_state, RE_OP_MATCH_BODY)) return RE_ERROR_BACKTRACKING; bt_data = state->backtrack; bt_data->repeat.position = next_body_position; bt_data->repeat.index = index; bt_data->repeat.count = rp_data->count; bt_data->repeat.start = rp_data->start; bt_data->repeat.capture_change = rp_data->capture_change; bt_data->repeat.text_pos = state->text_pos; /* Advance into the tail. */ node = next_tail_position.node; state->text_pos = next_tail_position.text_pos; } else { /* Only the body could match. */ /* Advance into the body. */ node = next_body_position.node; state->text_pos = next_body_position.text_pos; } } else { /* Only the tail could match. */ /* Advance into the tail. */ node = next_tail_position.node; state->text_pos = next_tail_position.text_pos; } break; } case RE_OP_LAZY_REPEAT_ONE: /* Lazy repeat for one character.
*/ { RE_CODE index; RE_RepeatData* rp_data; size_t count; BOOL is_partial; TRACE(("%s %d\n", re_op_text[node->op], node->values[0])) /* Repeat indexes are 0-based. */ index = node->values[0]; rp_data = &state->repeats[index]; if (is_repeat_guarded(safe_state, index, state->text_pos, RE_STATUS_BODY)) goto backtrack; /* Count how many times the character repeats, up to the minimum. */ count = count_one(state, node->nonstring.next_2.node, state->text_pos, node->values[1], &is_partial); if (is_partial) { state->text_pos += (Py_ssize_t)count * node->step; return RE_ERROR_PARTIAL; } /* Have we matched at least the minimum? */ if (count < node->values[1]) { /* The repeat has failed to match at this position. */ if (!guard_repeat(safe_state, index, state->text_pos, RE_STATUS_BODY, TRUE)) return RE_ERROR_MEMORY; goto backtrack; } if (count < node->values[2]) { /* The match is shorter than the maximum, so we might need to * backtrack the repeat to consume more. */ RE_BacktrackData* bt_data; /* Get the offset to the repeat values in the context. */ rp_data = &state->repeats[index]; if (!add_backtrack(safe_state, RE_OP_LAZY_REPEAT_ONE)) return RE_ERROR_BACKTRACKING; bt_data = state->backtrack; bt_data->repeat.position.node = node; bt_data->repeat.index = index; bt_data->repeat.text_pos = rp_data->start; bt_data->repeat.count = rp_data->count; rp_data->start = state->text_pos; rp_data->count = count; } /* Advance into the tail. */ state->text_pos += (Py_ssize_t)count * node->step; node = node->next_1.node; break; } case RE_OP_LOOKAROUND: /* Start of a lookaround subpattern. 
*/ { RE_AtomicData* lookaround; TRACE(("%s %d\n", re_op_text[node->op], node->match)) if (!add_backtrack(safe_state, RE_OP_LOOKAROUND)) return RE_ERROR_BACKTRACKING; state->backtrack->lookaround.too_few_errors = state->too_few_errors; state->backtrack->lookaround.capture_change = state->capture_change; state->backtrack->lookaround.inside = TRUE; state->backtrack->lookaround.node = node; lookaround = push_atomic(safe_state); if (!lookaround) return RE_ERROR_MEMORY; lookaround->backtrack_count = state->current_backtrack_block->count; lookaround->current_backtrack_block = state->current_backtrack_block; lookaround->slice_start = state->slice_start; lookaround->slice_end = state->slice_end; lookaround->text_pos = state->text_pos; lookaround->node = node; lookaround->backtrack = state->backtrack; lookaround->is_lookaround = TRUE; lookaround->has_groups = (node->status & RE_STATUS_HAS_GROUPS) != 0; lookaround->has_repeats = (node->status & RE_STATUS_HAS_REPEATS) != 0; /* Save the groups and repeats. */ if (lookaround->has_groups && !push_groups(safe_state)) return RE_ERROR_MEMORY; if (lookaround->has_repeats && !push_repeats(safe_state)) return RE_ERROR_MEMORY; lookaround->saved_groups = state->current_saved_groups; lookaround->saved_repeats = state->current_saved_repeats; state->slice_start = 0; state->slice_end = state->text_length; node = node->next_1.node; break; } case RE_OP_PROPERTY: /* A property. 
*/ TRACE(("%s %d %d\n", re_op_text[node->op], node->match, node->values[0])) if (state->text_pos >= state->text_length && state->partial_side == RE_PARTIAL_RIGHT) return RE_ERROR_PARTIAL; if (state->text_pos < state->slice_end && matches_PROPERTY(encoding, locale_info, node, char_at(state->text, state->text_pos)) == node->match) { state->text_pos += node->step; node = node->next_1.node; } else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, 1); if (status < 0) return RE_ERROR_PARTIAL; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_PROPERTY_IGN: /* A property, ignoring case. */ TRACE(("%s %d %d\n", re_op_text[node->op], node->match, node->values[0])) if (state->text_pos >= state->text_length && state->partial_side == RE_PARTIAL_RIGHT) return RE_ERROR_PARTIAL; if (state->text_pos < state->slice_end && matches_PROPERTY_IGN(encoding, locale_info, node, char_at(state->text, state->text_pos)) == node->match) { state->text_pos += node->step; node = node->next_1.node; } else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, 1); if (status < 0) return RE_ERROR_PARTIAL; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_PROPERTY_IGN_REV: /* A property, backwards, ignoring case. */ TRACE(("%s %d %d\n", re_op_text[node->op], node->match, node->values[0])) if (state->text_pos <= 0 && state->partial_side == RE_PARTIAL_LEFT) return RE_ERROR_PARTIAL; if (state->text_pos > state->slice_start && matches_PROPERTY_IGN(encoding, locale_info, node, char_at(state->text, state->text_pos - 1)) == node->match) { state->text_pos += node->step; node = node->next_1.node; } else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, -1); if (status < 0) return RE_ERROR_PARTIAL; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_PROPERTY_REV: /* A property, backwards. 
*/ TRACE(("%s %d %d\n", re_op_text[node->op], node->match, node->values[0])) if (state->text_pos <= 0 && state->partial_side == RE_PARTIAL_LEFT) return RE_ERROR_PARTIAL; if (state->text_pos > state->slice_start && matches_PROPERTY(encoding, locale_info, node, char_at(state->text, state->text_pos - 1)) == node->match) { state->text_pos += node->step; node = node->next_1.node; } else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, -1); if (status < 0) return RE_ERROR_PARTIAL; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_PRUNE: /* Prune the backtracking. */ TRACE(("%s\n", re_op_text[node->op])) prune_backtracking(state); node = node->next_1.node; break; case RE_OP_RANGE: /* A range. */ TRACE(("%s %d %d %d\n", re_op_text[node->op], node->match, node->values[0], node->values[1])) if (state->text_pos >= state->text_length && state->partial_side == RE_PARTIAL_RIGHT) return RE_ERROR_PARTIAL; if (state->text_pos < state->slice_end && matches_RANGE(encoding, locale_info, node, char_at(state->text, state->text_pos)) == node->match) { state->text_pos += node->step; node = node->next_1.node; } else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, 1); if (status < 0) return RE_ERROR_PARTIAL; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_RANGE_IGN: /* A range, ignoring case. 
*/ TRACE(("%s %d %d %d\n", re_op_text[node->op], node->match, node->values[0], node->values[1])) if (state->text_pos >= state->text_length && state->partial_side == RE_PARTIAL_RIGHT) return RE_ERROR_PARTIAL; if (state->text_pos < state->slice_end && matches_RANGE_IGN(encoding, locale_info, node, char_at(state->text, state->text_pos)) == node->match) { state->text_pos += node->step; node = node->next_1.node; } else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, 1); if (status < 0) return RE_ERROR_PARTIAL; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_RANGE_IGN_REV: /* A range, backwards, ignoring case. */ TRACE(("%s %d %d %d\n", re_op_text[node->op], node->match, node->values[0], node->values[1])) if (state->text_pos <= 0 && state->partial_side == RE_PARTIAL_LEFT) return RE_ERROR_PARTIAL; if (state->text_pos > state->slice_start && matches_RANGE_IGN(encoding, locale_info, node, char_at(state->text, state->text_pos - 1)) == node->match) { state->text_pos += node->step; node = node->next_1.node; } else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, -1); if (status < 0) return RE_ERROR_PARTIAL; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_RANGE_REV: /* A range, backwards. 
*/ TRACE(("%s %d %d %d\n", re_op_text[node->op], node->match, node->values[0], node->values[1])) if (state->text_pos <= 0 && state->partial_side == RE_PARTIAL_LEFT) return RE_ERROR_PARTIAL; if (state->text_pos > state->slice_start && matches_RANGE(encoding, locale_info, node, char_at(state->text, state->text_pos - 1)) == node->match) { state->text_pos += node->step; node = node->next_1.node; } else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, -1); if (status < 0) return RE_ERROR_PARTIAL; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_REF_GROUP: /* Reference to a capture group. */ { RE_GroupData* group; RE_GroupSpan* span; TRACE(("%s %d\n", re_op_text[node->op], node->values[0])) /* Capture group indexes are 1-based (excluding group 0, which is * the entire matched string). * * Check whether the captured text, if any, exists at this position * in the string. */ /* Did the group capture anything? */ group = &state->groups[node->values[0] - 1]; if (group->current_capture < 0) goto backtrack; span = &group->captures[group->current_capture]; if (string_pos < 0) string_pos = span->start; /* Try comparing. */ while (string_pos < span->end) { if (state->text_pos >= state->text_length && state->partial_side == RE_PARTIAL_RIGHT) return RE_ERROR_PARTIAL; if (state->text_pos < state->slice_end && same_char(char_at(state->text, state->text_pos), char_at(state->text, string_pos))) { ++string_pos; ++state->text_pos; } else if (node->status & RE_STATUS_FUZZY) { BOOL matched; status = fuzzy_match_string(safe_state, search, &state->text_pos, node, &string_pos, &matched, 1); if (status < 0) return RE_ERROR_PARTIAL; if (!matched) { string_pos = -1; goto backtrack; } } else { string_pos = -1; goto backtrack; } } string_pos = -1; /* Successful match. */ node = node->next_1.node; break; } case RE_OP_REF_GROUP_FLD: /* Reference to a capture group, ignoring case. 
*/ { RE_GroupData* group; RE_GroupSpan* span; int (*full_case_fold)(RE_LocaleInfo* locale_info, Py_UCS4 ch, Py_UCS4* folded); int folded_len; int gfolded_len; Py_UCS4 folded[RE_MAX_FOLDED]; Py_UCS4 gfolded[RE_MAX_FOLDED]; TRACE(("%s %d\n", re_op_text[node->op], node->values[0])) /* Capture group indexes are 1-based (excluding group 0, which is * the entire matched string). * * Check whether the captured text, if any, exists at this position * in the string. */ /* Did the group capture anything? */ group = &state->groups[node->values[0] - 1]; if (group->current_capture < 0) goto backtrack; span = &group->captures[group->current_capture]; full_case_fold = encoding->full_case_fold; if (string_pos < 0) { string_pos = span->start; folded_pos = 0; folded_len = 0; gfolded_pos = 0; gfolded_len = 0; } else { folded_len = full_case_fold(locale_info, char_at(state->text, state->text_pos), folded); gfolded_len = full_case_fold(locale_info, char_at(state->text, string_pos), gfolded); } /* Try comparing. */ while (string_pos < span->end) { /* Case-fold at current position in text. */ if (folded_pos >= folded_len) { if (state->text_pos >= state->text_length && state->partial_side == RE_PARTIAL_RIGHT) return RE_ERROR_PARTIAL; if (state->text_pos < state->slice_end) folded_len = full_case_fold(locale_info, char_at(state->text, state->text_pos), folded); else folded_len = 0; folded_pos = 0; } /* Case-fold at current position in group. 
*/ if (gfolded_pos >= gfolded_len) { gfolded_len = full_case_fold(locale_info, char_at(state->text, string_pos), gfolded); gfolded_pos = 0; } if (folded_pos < folded_len && folded[folded_pos] == gfolded[gfolded_pos]) { ++folded_pos; ++gfolded_pos; } else if (node->status & RE_STATUS_FUZZY) { BOOL matched; status = fuzzy_match_group_fld(safe_state, search, &state->text_pos, node, &folded_pos, folded_len, &string_pos, &gfolded_pos, gfolded_len, &matched, 1); if (status < 0) return RE_ERROR_PARTIAL; if (!matched) { string_pos = -1; goto backtrack; } } else { string_pos = -1; goto backtrack; } if (folded_pos >= folded_len && folded_len > 0) ++state->text_pos; if (gfolded_pos >= gfolded_len) ++string_pos; } string_pos = -1; if (folded_pos < folded_len || gfolded_pos < gfolded_len) goto backtrack; /* Successful match. */ node = node->next_1.node; break; } case RE_OP_REF_GROUP_FLD_REV: /* Reference to a capture group, backwards, using full case-folding. */ { RE_GroupData* group; RE_GroupSpan* span; int (*full_case_fold)(RE_LocaleInfo* locale_info, Py_UCS4 ch, Py_UCS4* folded); int folded_len; int gfolded_len; Py_UCS4 folded[RE_MAX_FOLDED]; Py_UCS4 gfolded[RE_MAX_FOLDED]; TRACE(("%s %d\n", re_op_text[node->op], node->values[0])) /* Capture group indexes are 1-based (excluding group 0, which is * the entire matched string). * * Check whether the captured text, if any, exists at this position * in the string. */ /* Did the group capture anything? */ group = &state->groups[node->values[0] - 1]; if (group->current_capture < 0) goto backtrack; span = &group->captures[group->current_capture]; full_case_fold = encoding->full_case_fold; if (string_pos < 0) { string_pos = span->end; folded_pos = 0; folded_len = 0; gfolded_pos = 0; gfolded_len = 0; } else { folded_len = full_case_fold(locale_info, char_at(state->text, state->text_pos - 1), folded); gfolded_len = full_case_fold(locale_info, char_at(state->text, string_pos - 1), gfolded); } /* Try comparing.
*/ while (string_pos > span->start) { /* Case-fold at current position in text. */ if (folded_pos <= 0) { if (state->text_pos <= 0 && state->partial_side == RE_PARTIAL_LEFT) return RE_ERROR_PARTIAL; if (state->text_pos > state->slice_start) folded_len = full_case_fold(locale_info, char_at(state->text, state->text_pos - 1), folded); else folded_len = 0; folded_pos = folded_len; } /* Case-fold at current position in group. */ if (gfolded_pos <= 0) { gfolded_len = full_case_fold(locale_info, char_at(state->text, string_pos - 1), gfolded); gfolded_pos = gfolded_len; } if (folded_pos > 0 && folded[folded_pos - 1] == gfolded[gfolded_pos - 1]) { --folded_pos; --gfolded_pos; } else if (node->status & RE_STATUS_FUZZY) { BOOL matched; status = fuzzy_match_group_fld(safe_state, search, &state->text_pos, node, &folded_pos, folded_len, &string_pos, &gfolded_pos, gfolded_len, &matched, -1); if (status < 0) return RE_ERROR_PARTIAL; if (!matched) { string_pos = -1; goto backtrack; } } else { string_pos = -1; goto backtrack; } if (folded_pos <= 0 && folded_len > 0) --state->text_pos; if (gfolded_pos <= 0) --string_pos; } string_pos = -1; if (folded_pos > 0 || gfolded_pos > 0) goto backtrack; /* Successful match. */ node = node->next_1.node; break; } case RE_OP_REF_GROUP_IGN: /* Reference to a capture group, ignoring case. */ { RE_GroupData* group; RE_GroupSpan* span; TRACE(("%s %d\n", re_op_text[node->op], node->values[0])) /* Capture group indexes are 1-based (excluding group 0, which is * the entire matched string). * * Check whether the captured text, if any, exists at this position * in the string. */ /* Did the group capture anything? */ group = &state->groups[node->values[0] - 1]; if (group->current_capture < 0) goto backtrack; span = &group->captures[group->current_capture]; if (string_pos < 0) string_pos = span->start; /* Try comparing. 
*/ while (string_pos < span->end) { if (state->text_pos >= state->text_length && state->partial_side == RE_PARTIAL_RIGHT) return RE_ERROR_PARTIAL; if (state->text_pos < state->slice_end && same_char_ign(encoding, locale_info, char_at(state->text, state->text_pos), char_at(state->text, string_pos))) { ++string_pos; ++state->text_pos; } else if (node->status & RE_STATUS_FUZZY) { BOOL matched; status = fuzzy_match_string(safe_state, search, &state->text_pos, node, &string_pos, &matched, 1); if (status < 0) return RE_ERROR_PARTIAL; if (!matched) { string_pos = -1; goto backtrack; } } else { string_pos = -1; goto backtrack; } } string_pos = -1; /* Successful match. */ node = node->next_1.node; break; } case RE_OP_REF_GROUP_IGN_REV: /* Reference to a capture group, backwards, ignoring case. */ { RE_GroupData* group; RE_GroupSpan* span; TRACE(("%s %d\n", re_op_text[node->op], node->values[0])) /* Capture group indexes are 1-based (excluding group 0, which is * the entire matched string). * * Check whether the captured text, if any, exists at this position * in the string. */ /* Did the group capture anything? */ group = &state->groups[node->values[0] - 1]; if (group->current_capture < 0) goto backtrack; span = &group->captures[group->current_capture]; if (string_pos < 0) string_pos = span->end; /* Try comparing. */ while (string_pos > span->start) { if (state->text_pos <= 0 && state->partial_side == RE_PARTIAL_LEFT) return RE_ERROR_PARTIAL; if (state->text_pos > state->slice_start && same_char_ign(encoding, locale_info, char_at(state->text, state->text_pos - 1), char_at(state->text, string_pos - 1))) { --string_pos; --state->text_pos; } else if (node->status & RE_STATUS_FUZZY) { BOOL matched; status = fuzzy_match_string(safe_state, search, &state->text_pos, node, &string_pos, &matched, -1); if (status < 0) return RE_ERROR_PARTIAL; if (!matched) { string_pos = -1; goto backtrack; } } else { string_pos = -1; goto backtrack; } } string_pos = -1; /* Successful match.
*/ node = node->next_1.node; break; } case RE_OP_REF_GROUP_REV: /* Reference to a capture group, backwards. */ { RE_GroupData* group; RE_GroupSpan* span; TRACE(("%s %d\n", re_op_text[node->op], node->values[0])) /* Capture group indexes are 1-based (excluding group 0, which is * the entire matched string). * * Check whether the captured text, if any, exists at this position * in the string. */ /* Did the group capture anything? */ group = &state->groups[node->values[0] - 1]; if (group->current_capture < 0) goto backtrack; span = &group->captures[group->current_capture]; if (string_pos < 0) string_pos = span->end; /* Try comparing. */ while (string_pos > span->start) { if (state->text_pos <= 0 && state->partial_side == RE_PARTIAL_LEFT) return RE_ERROR_PARTIAL; if (state->text_pos > state->slice_start && same_char(char_at(state->text, state->text_pos - 1), char_at(state->text, string_pos - 1))) { --string_pos; --state->text_pos; } else if (node->status & RE_STATUS_FUZZY) { BOOL matched; status = fuzzy_match_string(safe_state, search, &state->text_pos, node, &string_pos, &matched, -1); if (status < 0) return RE_ERROR_PARTIAL; if (!matched) { string_pos = -1; goto backtrack; } } else { string_pos = -1; goto backtrack; } } string_pos = -1; /* Successful match. */ node = node->next_1.node; break; } case RE_OP_SEARCH_ANCHOR: /* At the start of the search. */ TRACE(("%s %d\n", re_op_text[node->op], node->values[0])) if (state->text_pos == state->search_anchor) node = node->next_1.node; else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, 0); if (status < 0) return status; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_SET_DIFF: /* Character set.
*/ case RE_OP_SET_INTER: case RE_OP_SET_SYM_DIFF: case RE_OP_SET_UNION: TRACE(("%s %d\n", re_op_text[node->op], node->match)) if (state->text_pos >= state->text_length && state->partial_side == RE_PARTIAL_RIGHT) return RE_ERROR_PARTIAL; if (state->text_pos < state->slice_end && matches_SET(encoding, locale_info, node, char_at(state->text, state->text_pos)) == node->match) { state->text_pos += node->step; node = node->next_1.node; } else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, 1); if (status < 0) return RE_ERROR_PARTIAL; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_SET_DIFF_IGN: /* Character set, ignoring case. */ case RE_OP_SET_INTER_IGN: case RE_OP_SET_SYM_DIFF_IGN: case RE_OP_SET_UNION_IGN: TRACE(("%s %d\n", re_op_text[node->op], node->match)) if (state->text_pos >= state->text_length && state->partial_side == RE_PARTIAL_RIGHT) return RE_ERROR_PARTIAL; if (state->text_pos < state->slice_end && matches_SET_IGN(encoding, locale_info, node, char_at(state->text, state->text_pos)) == node->match) { state->text_pos += node->step; node = node->next_1.node; } else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, 1); if (status < 0) return RE_ERROR_PARTIAL; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_SET_DIFF_IGN_REV: /* Character set, ignoring case. 
*/ case RE_OP_SET_INTER_IGN_REV: case RE_OP_SET_SYM_DIFF_IGN_REV: case RE_OP_SET_UNION_IGN_REV: TRACE(("%s %d\n", re_op_text[node->op], node->match)) if (state->text_pos <= 0 && state->partial_side == RE_PARTIAL_LEFT) return RE_ERROR_PARTIAL; if (state->text_pos > state->slice_start && matches_SET_IGN(encoding, locale_info, node, char_at(state->text, state->text_pos - 1)) == node->match) { state->text_pos += node->step; node = node->next_1.node; } else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, -1); if (status < 0) return RE_ERROR_PARTIAL; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_SET_DIFF_REV: /* Character set. */ case RE_OP_SET_INTER_REV: case RE_OP_SET_SYM_DIFF_REV: case RE_OP_SET_UNION_REV: TRACE(("%s %d\n", re_op_text[node->op], node->match)) if (state->text_pos <= 0 && state->partial_side == RE_PARTIAL_LEFT) return RE_ERROR_PARTIAL; if (state->text_pos > state->slice_start && matches_SET(encoding, locale_info, node, char_at(state->text, state->text_pos - 1)) == node->match) { state->text_pos += node->step; node = node->next_1.node; } else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, -1); if (status < 0) return RE_ERROR_PARTIAL; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_SKIP: /* Skip the part of the text already matched. */ TRACE(("%s\n", re_op_text[node->op])) if (node->status & RE_STATUS_REVERSE) state->slice_end = state->text_pos; else state->slice_start = state->text_pos; prune_backtracking(state); node = node->next_1.node; break; case RE_OP_START_GROUP: /* Start of a capture group. */ { RE_CODE private_index; RE_CODE public_index; RE_GroupData* group; RE_BacktrackData* bt_data; TRACE(("%s %d\n", re_op_text[node->op], node->values[1])) /* Capture group indexes are 1-based (excluding group 0, which is * the entire matched string). 
*/ private_index = node->values[0]; public_index = node->values[1]; group = &state->groups[private_index - 1]; if (!add_backtrack(safe_state, RE_OP_START_GROUP)) return RE_ERROR_BACKTRACKING; bt_data = state->backtrack; bt_data->group.private_index = private_index; bt_data->group.public_index = public_index; bt_data->group.text_pos = group->span.start; bt_data->group.capture = (BOOL)node->values[2]; bt_data->group.current_capture = group->current_capture; if (pattern->group_info[private_index - 1].referenced && group->span.start != state->text_pos) ++state->capture_change; group->span.start = state->text_pos; /* Save the capture? */ if (node->values[2]) { group->current_capture = (Py_ssize_t)group->capture_count; if (!save_capture(safe_state, private_index, public_index)) return RE_ERROR_MEMORY; } node = node->next_1.node; break; } case RE_OP_START_OF_LINE: /* At the start of a line. */ TRACE(("%s\n", re_op_text[node->op])) status = try_match_START_OF_LINE(state, node, state->text_pos); if (status < 0) return status; if (status == RE_ERROR_SUCCESS) node = node->next_1.node; else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, 0); if (status < 0) return status; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_START_OF_LINE_U: /* At the start of a line. */ TRACE(("%s\n", re_op_text[node->op])) status = try_match_START_OF_LINE_U(state, node, state->text_pos); if (status < 0) return status; if (status == RE_ERROR_SUCCESS) node = node->next_1.node; else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, 0); if (status < 0) return status; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_START_OF_STRING: /* At the start of the string. 
*/ TRACE(("%s\n", re_op_text[node->op])) status = try_match_START_OF_STRING(state, node, state->text_pos); if (status < 0) return status; if (status == RE_ERROR_SUCCESS) node = node->next_1.node; else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, 0); if (status < 0) return status; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_START_OF_WORD: /* At the start of a word. */ TRACE(("%s\n", re_op_text[node->op])) status = try_match_START_OF_WORD(state, node, state->text_pos); if (status < 0) return status; if (status == RE_ERROR_SUCCESS) node = node->next_1.node; else if (node->status & RE_STATUS_FUZZY) { status = fuzzy_match_item(safe_state, search, &state->text_pos, &node, 0); if (status < 0) return status; if (!node) goto backtrack; } else goto backtrack; break; case RE_OP_STRING: /* A string. */ { Py_ssize_t length; RE_CODE* values; TRACE(("%s %d\n", re_op_text[node->op], node->value_count)) if ((node->status & RE_STATUS_REQUIRED) && state->text_pos == state->req_pos && string_pos < 0) state->text_pos = state->req_end; else { length = (Py_ssize_t)node->value_count; if (string_pos < 0) string_pos = 0; values = node->values; /* Try comparing. */ while (string_pos < length) { if (state->text_pos >= state->text_length && state->partial_side == RE_PARTIAL_RIGHT) return RE_ERROR_PARTIAL; if (state->text_pos < state->slice_end && same_char(char_at(state->text, state->text_pos), values[string_pos])) { ++string_pos; ++state->text_pos; } else if (node->status & RE_STATUS_FUZZY) { BOOL matched; status = fuzzy_match_string(safe_state, search, &state->text_pos, node, &string_pos, &matched, 1); if (status < 0) return RE_ERROR_PARTIAL; if (!matched) { string_pos = -1; goto backtrack; } } else { string_pos = -1; goto backtrack; } } } string_pos = -1; /* Successful match. */ node = node->next_1.node; break; } case RE_OP_STRING_FLD: /* A string, ignoring case. 
*/ { Py_ssize_t length; int (*full_case_fold)(RE_LocaleInfo* locale_info, Py_UCS4 ch, Py_UCS4* folded); RE_CODE* values; int folded_len; Py_UCS4 folded[RE_MAX_FOLDED]; TRACE(("%s %d\n", re_op_text[node->op], node->value_count)) if ((node->status & RE_STATUS_REQUIRED) && state->text_pos == state->req_pos && string_pos < 0) state->text_pos = state->req_end; else { length = (Py_ssize_t)node->value_count; full_case_fold = encoding->full_case_fold; if (string_pos < 0) { string_pos = 0; folded_pos = 0; folded_len = 0; } else { folded_len = full_case_fold(locale_info, char_at(state->text, state->text_pos), folded); if (folded_pos >= folded_len) { if (state->text_pos >= state->slice_end) goto backtrack; ++state->text_pos; folded_pos = 0; folded_len = 0; } } values = node->values; /* Try comparing. */ while (string_pos < length) { if (folded_pos >= folded_len) { if (state->text_pos >= state->text_length && state->partial_side == RE_PARTIAL_RIGHT) return RE_ERROR_PARTIAL; if (state->text_pos < state->slice_end) folded_len = full_case_fold(locale_info, char_at(state->text, state->text_pos), folded); else folded_len = 0; folded_pos = 0; } if (folded_pos < folded_len && same_char_ign(encoding, locale_info, folded[folded_pos], values[string_pos])) { ++string_pos; ++folded_pos; if (folded_pos >= folded_len) ++state->text_pos; } else if (node->status & RE_STATUS_FUZZY) { BOOL matched; status = fuzzy_match_string_fld(safe_state, search, &state->text_pos, node, &string_pos, &folded_pos, folded_len, &matched, 1); if (status < 0) return RE_ERROR_PARTIAL; if (!matched) { string_pos = -1; goto backtrack; } if (folded_pos >= folded_len && folded_len > 0) ++state->text_pos; } else { string_pos = -1; goto backtrack; } } if (node->status & RE_STATUS_FUZZY) { while (folded_pos < folded_len) { BOOL matched; if (!fuzzy_match_string_fld(safe_state, search, &state->text_pos, node, &string_pos, &folded_pos, folded_len, &matched, 1)) return RE_ERROR_BACKTRACKING; if (!matched) { string_pos = -1; 
goto backtrack; } if (folded_pos >= folded_len && folded_len > 0) ++state->text_pos; } } string_pos = -1; if (folded_pos < folded_len) goto backtrack; } /* Successful match. */ node = node->next_1.node; break; } case RE_OP_STRING_FLD_REV: /* A string, ignoring case. */ { Py_ssize_t length; int (*full_case_fold)(RE_LocaleInfo* locale_info, Py_UCS4 ch, Py_UCS4* folded); RE_CODE* values; int folded_len; Py_UCS4 folded[RE_MAX_FOLDED]; TRACE(("%s %d\n", re_op_text[node->op], node->value_count)) if ((node->status & RE_STATUS_REQUIRED) && state->text_pos == state->req_pos && string_pos < 0) state->text_pos = state->req_end; else { length = (Py_ssize_t)node->value_count; full_case_fold = encoding->full_case_fold; if (string_pos < 0) { string_pos = length; folded_pos = 0; folded_len = 0; } else { folded_len = full_case_fold(locale_info, char_at(state->text, state->text_pos - 1), folded); if (folded_pos <= 0) { if (state->text_pos <= state->slice_start) goto backtrack; --state->text_pos; folded_pos = 0; folded_len = 0; } } values = node->values; /* Try comparing. 
*/ while (string_pos > 0) { if (folded_pos <= 0) { if (state->text_pos <= 0 && state->partial_side == RE_PARTIAL_LEFT) return RE_ERROR_PARTIAL; if (state->text_pos > state->slice_start) folded_len = full_case_fold(locale_info, char_at(state->text, state->text_pos - 1), folded); else folded_len = 0; folded_pos = folded_len; } if (folded_pos > 0 && same_char_ign(encoding, locale_info, folded[folded_pos - 1], values[string_pos - 1])) { --string_pos; --folded_pos; if (folded_pos <= 0) --state->text_pos; } else if (node->status & RE_STATUS_FUZZY) { BOOL matched; status = fuzzy_match_string_fld(safe_state, search, &state->text_pos, node, &string_pos, &folded_pos, folded_len, &matched, -1); if (status < 0) return RE_ERROR_PARTIAL; if (!matched) { string_pos = -1; goto backtrack; } if (folded_pos <= 0 && folded_len > 0) --state->text_pos; } else { string_pos = -1; goto backtrack; } } if (node->status & RE_STATUS_FUZZY) { while (folded_pos > 0) { BOOL matched; if (!fuzzy_match_string_fld(safe_state, search, &state->text_pos, node, &string_pos, &folded_pos, folded_len, &matched, -1)) return RE_ERROR_BACKTRACKING; if (!matched) { string_pos = -1; goto backtrack; } if (folded_pos <= 0 && folded_len > 0) --state->text_pos; } } string_pos = -1; if (folded_pos > 0) goto backtrack; } /* Successful match. */ node = node->next_1.node; break; } case RE_OP_STRING_IGN: /* A string, ignoring case. */ { Py_ssize_t length; RE_CODE* values; TRACE(("%s %d\n", re_op_text[node->op], node->value_count)) if ((node->status & RE_STATUS_REQUIRED) && state->text_pos == state->req_pos && string_pos < 0) state->text_pos = state->req_end; else { length = (Py_ssize_t)node->value_count; if (string_pos < 0) string_pos = 0; values = node->values; /* Try comparing. 
*/ while (string_pos < length) { if (state->text_pos >= state->text_length && state->partial_side == RE_PARTIAL_RIGHT) return RE_ERROR_PARTIAL; if (state->text_pos < state->slice_end && same_char_ign(encoding, locale_info, char_at(state->text, state->text_pos), values[string_pos])) { ++string_pos; ++state->text_pos; } else if (node->status & RE_STATUS_FUZZY) { BOOL matched; status = fuzzy_match_string(safe_state, search, &state->text_pos, node, &string_pos, &matched, 1); if (status < 0) return RE_ERROR_PARTIAL; if (!matched) { string_pos = -1; goto backtrack; } } else { string_pos = -1; goto backtrack; } } } string_pos = -1; /* Successful match. */ node = node->next_1.node; break; } case RE_OP_STRING_IGN_REV: /* A string, ignoring case. */ { Py_ssize_t length; RE_CODE* values; TRACE(("%s %d\n", re_op_text[node->op], node->value_count)) if ((node->status & RE_STATUS_REQUIRED) && state->text_pos == state->req_pos && string_pos < 0) state->text_pos = state->req_end; else { length = (Py_ssize_t)node->value_count; if (string_pos < 0) string_pos = length; values = node->values; /* Try comparing. */ while (string_pos > 0) { if (state->text_pos <= 0 && state->partial_side == RE_PARTIAL_LEFT) return RE_ERROR_PARTIAL; if (state->text_pos > state->slice_start && same_char_ign(encoding, locale_info, char_at(state->text, state->text_pos - 1), values[string_pos - 1])) { --string_pos; --state->text_pos; } else if (node->status & RE_STATUS_FUZZY) { BOOL matched; status = fuzzy_match_string(safe_state, search, &state->text_pos, node, &string_pos, &matched, -1); if (status < 0) return RE_ERROR_PARTIAL; if (!matched) { string_pos = -1; goto backtrack; } } else { string_pos = -1; goto backtrack; } } } string_pos = -1; /* Successful match. */ node = node->next_1.node; break; } case RE_OP_STRING_REV: /* A string. 
*/ { Py_ssize_t length; RE_CODE* values; TRACE(("%s %d\n", re_op_text[node->op], node->value_count)) if ((node->status & RE_STATUS_REQUIRED) && state->text_pos == state->req_pos && string_pos < 0) state->text_pos = state->req_end; else { length = (Py_ssize_t)node->value_count; if (string_pos < 0) string_pos = length; values = node->values; /* Try comparing. */ while (string_pos > 0) { if (state->text_pos <= 0 && state->partial_side == RE_PARTIAL_LEFT) return RE_ERROR_PARTIAL; if (state->text_pos > state->slice_start && same_char(char_at(state->text, state->text_pos - 1), values[string_pos - 1])) { --string_pos; --state->text_pos; } else if (node->status & RE_STATUS_FUZZY) { BOOL matched; status = fuzzy_match_string(safe_state, search, &state->text_pos, node, &string_pos, &matched, -1); if (status < 0) return RE_ERROR_PARTIAL; if (!matched) { string_pos = -1; goto backtrack; } } else { string_pos = -1; goto backtrack; } } } string_pos = -1; /* Successful match. */ node = node->next_1.node; break; } case RE_OP_STRING_SET: /* Member of a string set. */ { int status; TRACE(("%s\n", re_op_text[node->op])) status = string_set_match_fwdrev(safe_state, node, FALSE); if (status < 0) return status; if (status == 0) goto backtrack; node = node->next_1.node; break; } case RE_OP_STRING_SET_FLD: /* Member of a string set, ignoring case. */ { int status; TRACE(("%s\n", re_op_text[node->op])) status = string_set_match_fld_fwdrev(safe_state, node, FALSE); if (status < 0) return status; if (status == 0) goto backtrack; node = node->next_1.node; break; } case RE_OP_STRING_SET_FLD_REV: /* Member of a string set, ignoring case. */ { int status; TRACE(("%s\n", re_op_text[node->op])) status = string_set_match_fld_fwdrev(safe_state, node, TRUE); if (status < 0) return status; if (status == 0) goto backtrack; node = node->next_1.node; break; } case RE_OP_STRING_SET_IGN: /* Member of a string set, ignoring case. 
*/ { int status; TRACE(("%s\n", re_op_text[node->op])) status = string_set_match_ign_fwdrev(safe_state, node, FALSE); if (status < 0) return status; if (status == 0) goto backtrack; node = node->next_1.node; break; } case RE_OP_STRING_SET_IGN_REV: /* Member of a string set, ignoring case. */ { int status; TRACE(("%s\n", re_op_text[node->op])) status = string_set_match_ign_fwdrev(safe_state, node, TRUE); if (status < 0) return status; if (status == 0) goto backtrack; node = node->next_1.node; break; } case RE_OP_STRING_SET_REV: /* Member of a string set. */ { int status; TRACE(("%s\n", re_op_text[node->op])) status = string_set_match_fwdrev(safe_state, node, TRUE); if (status < 0) return status; if (status == 0) goto backtrack; node = node->next_1.node; break; } case RE_OP_SUCCESS: /* Success. */ /* Must the match advance past its start? */ TRACE(("%s\n", re_op_text[node->op])) if (state->text_pos == state->search_anchor && state->must_advance) goto backtrack; if (state->match_all) { /* We want to match all of the slice. */ if (state->reverse) { if (state->text_pos != state->slice_start) goto backtrack; } else { if (state->text_pos != state->slice_end) goto backtrack; } } if (state->pattern->flags & RE_FLAG_POSIX) { /* If we're looking for a POSIX match, check whether this one * is better and then keep looking. */ if (!check_posix_match(safe_state)) return RE_ERROR_MEMORY; goto backtrack; } return RE_ERROR_SUCCESS; default: /* Illegal opcode! */ TRACE(("UNKNOWN OP %d\n", node->op)) return RE_ERROR_ILLEGAL; } } backtrack: for (;;) { RE_BacktrackData* bt_data; TRACE(("BACKTRACK ")) /* Should we abort the matching? */ ++state->iterations; if (state->iterations == 0 && safe_check_signals(safe_state)) return RE_ERROR_INTERRUPTED; bt_data = last_backtrack(state); switch (bt_data->op) { case RE_OP_ANY: /* Any character except a newline. */ case RE_OP_ANY_ALL: /* Any character at all. */ case RE_OP_ANY_ALL_REV: /* Any character at all, backwards. 
*/ case RE_OP_ANY_REV: /* Any character except a newline, backwards. */ case RE_OP_ANY_U: /* Any character except a line separator. */ case RE_OP_ANY_U_REV: /* Any character except a line separator, backwards. */ case RE_OP_CHARACTER: /* A character. */ case RE_OP_CHARACTER_IGN: /* A character, ignoring case. */ case RE_OP_CHARACTER_IGN_REV: /* A character, ignoring case, backwards. */ case RE_OP_CHARACTER_REV: /* A character, backwards. */ case RE_OP_PROPERTY: /* A property. */ case RE_OP_PROPERTY_IGN: /* A property, ignoring case. */ case RE_OP_PROPERTY_IGN_REV: /* A property, ignoring case, backwards. */ case RE_OP_PROPERTY_REV: /* A property, backwards. */ case RE_OP_RANGE: /* A range. */ case RE_OP_RANGE_IGN: /* A range, ignoring case. */ case RE_OP_RANGE_IGN_REV: /* A range, ignoring case, backwards. */ case RE_OP_RANGE_REV: /* A range, backwards. */ case RE_OP_SET_DIFF: /* Set difference. */ case RE_OP_SET_DIFF_IGN: /* Set difference, ignoring case. */ case RE_OP_SET_DIFF_IGN_REV: /* Set difference, ignoring case, backwards. */ case RE_OP_SET_DIFF_REV: /* Set difference, backwards. */ case RE_OP_SET_INTER: /* Set intersection. */ case RE_OP_SET_INTER_IGN: /* Set intersection, ignoring case. */ case RE_OP_SET_INTER_IGN_REV: /* Set intersection, ignoring case, backwards. */ case RE_OP_SET_INTER_REV: /* Set intersection, backwards. */ case RE_OP_SET_SYM_DIFF: /* Set symmetric difference. */ case RE_OP_SET_SYM_DIFF_IGN: /* Set symmetric difference, ignoring case. */ case RE_OP_SET_SYM_DIFF_IGN_REV: /* Set symmetric difference, ignoring case, backwards. */ case RE_OP_SET_SYM_DIFF_REV: /* Set symmetric difference, backwards. */ case RE_OP_SET_UNION: /* Set union. */ case RE_OP_SET_UNION_IGN: /* Set union, ignoring case. */ case RE_OP_SET_UNION_IGN_REV: /* Set union, ignoring case, backwards. */ case RE_OP_SET_UNION_REV: /* Set union, backwards. 
*/ TRACE(("%s\n", re_op_text[bt_data->op])) status = retry_fuzzy_match_item(safe_state, search, &state->text_pos, &node, TRUE); if (status < 0) return RE_ERROR_PARTIAL; if (node) goto advance; break; case RE_OP_ATOMIC: /* Start of an atomic group. */ { RE_AtomicData* atomic; /* backtrack to the start of an atomic group. */ atomic = pop_atomic(safe_state); if (atomic->has_repeats) pop_repeats(state); if (atomic->has_groups) pop_groups(state); state->too_few_errors = bt_data->atomic.too_few_errors; state->capture_change = bt_data->atomic.capture_change; discard_backtrack(state); break; } case RE_OP_BODY_END: { RE_RepeatData* rp_data; TRACE(("%s %d\n", re_op_text[bt_data->op], bt_data->repeat.index)) /* We're backtracking into the body. */ rp_data = &state->repeats[bt_data->repeat.index]; /* Restore the repeat info. */ rp_data->count = bt_data->repeat.count; rp_data->start = bt_data->repeat.start; rp_data->capture_change = bt_data->repeat.capture_change; discard_backtrack(state); break; } case RE_OP_BODY_START: { TRACE(("%s %d\n", re_op_text[bt_data->op], bt_data->repeat.index)) /* The body may have failed to match at this position. */ if (!guard_repeat(safe_state, bt_data->repeat.index, bt_data->repeat.text_pos, RE_STATUS_BODY, TRUE)) return RE_ERROR_MEMORY; discard_backtrack(state); break; } case RE_OP_BOUNDARY: /* On a word boundary. */ case RE_OP_DEFAULT_BOUNDARY: /* On a default word boundary. */ case RE_OP_DEFAULT_END_OF_WORD: /* At a default end of a word. */ case RE_OP_DEFAULT_START_OF_WORD: /* At a default start of a word. */ case RE_OP_END_OF_LINE: /* At the end of a line. */ case RE_OP_END_OF_LINE_U: /* At the end of a line. */ case RE_OP_END_OF_STRING: /* At the end of the string. */ case RE_OP_END_OF_STRING_LINE: /* At end of string or final newline. */ case RE_OP_END_OF_STRING_LINE_U: /* At end of string or final newline. */ case RE_OP_END_OF_WORD: /* At end of a word. */ case RE_OP_GRAPHEME_BOUNDARY: /* On a grapheme boundary. 
*/ case RE_OP_SEARCH_ANCHOR: /* At the start of the search. */ case RE_OP_START_OF_LINE: /* At the start of a line. */ case RE_OP_START_OF_LINE_U: /* At the start of a line. */ case RE_OP_START_OF_STRING: /* At the start of the string. */ case RE_OP_START_OF_WORD: /* At start of a word. */ TRACE(("%s\n", re_op_text[bt_data->op])) status = retry_fuzzy_match_item(safe_state, search, &state->text_pos, &node, FALSE); if (status < 0) return RE_ERROR_PARTIAL; if (node) goto advance; break; case RE_OP_BRANCH: /* 2-way branch. */ TRACE(("%s\n", re_op_text[bt_data->op])) node = bt_data->branch.position.node; state->text_pos = bt_data->branch.position.text_pos; discard_backtrack(state); goto advance; case RE_OP_CALL_REF: /* A group call ref. */ case RE_OP_GROUP_CALL: /* Group call. */ TRACE(("%s\n", re_op_text[bt_data->op])) pop_group_return(state); discard_backtrack(state); break; case RE_OP_CONDITIONAL: /* Conditional subpattern. */ { TRACE(("%s\n", re_op_text[bt_data->op])) if (bt_data->lookaround.inside) { /* Backtracked to the start of a lookaround. */ RE_AtomicData* conditional; conditional = pop_atomic(safe_state); state->text_pos = conditional->text_pos; state->slice_end = conditional->slice_end; state->slice_start = conditional->slice_start; state->current_backtrack_block = conditional->current_backtrack_block; state->current_backtrack_block->count = conditional->backtrack_count; /* Restore the groups and repeats and certain flags. */ if (conditional->has_repeats) pop_repeats(state); if (conditional->has_groups) pop_groups(state); state->too_few_errors = bt_data->lookaround.too_few_errors; state->capture_change = bt_data->lookaround.capture_change; if (bt_data->lookaround.node->match) { /* It's a positive lookaround that's failed. * * Go to the 'false' branch. */ node = bt_data->lookaround.node->nonstring.next_2.node; } else { /* It's a negative lookaround that's failed. * * Go to the 'true' branch. 
*/ node = bt_data->lookaround.node->nonstring.next_2.node; } discard_backtrack(state); goto advance; } else { /* Backtracked to a lookaround. If it's a positive lookaround * that succeeded, we need to restore the groups; if it's a * negative lookaround that failed, it would have completely * backtracked inside and already restored the groups. We also * need to restore certain flags. */ if (bt_data->lookaround.node->match) pop_groups(state); state->too_few_errors = bt_data->lookaround.too_few_errors; state->capture_change = bt_data->lookaround.capture_change; discard_backtrack(state); } break; } case RE_OP_END_FUZZY: /* End of fuzzy matching. */ TRACE(("%s\n", re_op_text[bt_data->op])) state->total_fuzzy_counts[RE_FUZZY_SUB] -= state->fuzzy_info.counts[RE_FUZZY_SUB]; state->total_fuzzy_counts[RE_FUZZY_INS] -= state->fuzzy_info.counts[RE_FUZZY_INS]; state->total_fuzzy_counts[RE_FUZZY_DEL] -= state->fuzzy_info.counts[RE_FUZZY_DEL]; /* We need to retry the fuzzy match. */ status = retry_fuzzy_insert(safe_state, &state->text_pos, &node); if (status < 0) return RE_ERROR_PARTIAL; /* If there were too few errors, in the fuzzy section, try again. */ if (state->too_few_errors) { state->too_few_errors = FALSE; goto backtrack; } if (node) { state->total_fuzzy_counts[RE_FUZZY_SUB] += state->fuzzy_info.counts[RE_FUZZY_SUB]; state->total_fuzzy_counts[RE_FUZZY_INS] += state->fuzzy_info.counts[RE_FUZZY_INS]; state->total_fuzzy_counts[RE_FUZZY_DEL] += state->fuzzy_info.counts[RE_FUZZY_DEL]; node = node->next_1.node; goto advance; } break; case RE_OP_END_GROUP: /* End of a capture group. */ { RE_CODE private_index; RE_GroupData* group; TRACE(("%s %d\n", re_op_text[bt_data->op], bt_data->group.public_index)) private_index = bt_data->group.private_index; group = &state->groups[private_index - 1]; /* Unsave the capture? 
*/ if (bt_data->group.capture) unsave_capture(state, bt_data->group.private_index, bt_data->group.public_index); if (pattern->group_info[private_index - 1].referenced && group->span.end != bt_data->group.text_pos) --state->capture_change; group->span.end = bt_data->group.text_pos; group->current_capture = bt_data->group.current_capture; discard_backtrack(state); break; } case RE_OP_FAILURE: { TRACE(("%s\n", re_op_text[bt_data->op])) /* Have we been looking for a POSIX match? */ if (state->found_match) { restore_best_match(safe_state); return RE_OP_SUCCESS; } /* Do we have to advance? */ if (!search) return RE_ERROR_FAILURE; /* Can we advance? */ state->text_pos = state->match_pos; if (state->reverse) { if (state->text_pos <= state->slice_start) return RE_ERROR_FAILURE; } else { if (state->text_pos >= state->slice_end) return RE_ERROR_FAILURE; } /* Skip over any repeated leading characters. */ switch (start_node->op) { case RE_OP_GREEDY_REPEAT_ONE: case RE_OP_LAZY_REPEAT_ONE: { size_t count; BOOL is_partial; /* How many characters did the repeat actually match? */ count = count_one(state, start_node->nonstring.next_2.node, state->text_pos, start_node->values[2], &is_partial); /* If it's fewer than the maximum then skip over those * characters. */ if (count < start_node->values[2]) state->text_pos += (Py_ssize_t)count * pattern_step; break; } } /* Advance and try to match again. We also need to check whether we * need to skip. */ if (state->reverse) { if (state->text_pos > state->slice_end) state->text_pos = state->slice_end; else --state->text_pos; } else { if (state->text_pos < state->slice_start) state->text_pos = state->slice_start; else ++state->text_pos; } /* Clear the groups. */ clear_groups(state); goto start_match; } case RE_OP_FUZZY: /* Fuzzy matching. */ { RE_FuzzyInfo* fuzzy_info; TRACE(("%s\n", re_op_text[bt_data->op])) /* Restore the previous fuzzy info.
*/ fuzzy_info = &state->fuzzy_info; memmove(fuzzy_info, &bt_data->fuzzy.fuzzy_info, sizeof(RE_FuzzyInfo)); discard_backtrack(state); break; } case RE_OP_GREEDY_REPEAT: /* Greedy repeat. */ case RE_OP_LAZY_REPEAT: /* Lazy repeat. */ { RE_RepeatData* rp_data; TRACE(("%s\n", re_op_text[bt_data->op])) /* The repeat failed to match. */ rp_data = &state->repeats[bt_data->repeat.index]; /* The body may have failed to match at this position. */ if (!guard_repeat(safe_state, bt_data->repeat.index, bt_data->repeat.text_pos, RE_STATUS_BODY, TRUE)) return RE_ERROR_MEMORY; /* Restore the previous repeat. */ rp_data->count = bt_data->repeat.count; rp_data->start = bt_data->repeat.start; rp_data->capture_change = bt_data->repeat.capture_change; discard_backtrack(state); break; } case RE_OP_GREEDY_REPEAT_ONE: /* Greedy repeat for one character. */ { RE_RepeatData* rp_data; size_t count; Py_ssize_t step; Py_ssize_t pos; Py_ssize_t limit; RE_Node* test; BOOL match; BOOL m; size_t index; TRACE(("%s\n", re_op_text[bt_data->op])) node = bt_data->repeat.position.node; rp_data = &state->repeats[bt_data->repeat.index]; /* Unmatch one character at a time until the tail could match or we * have reached the minimum. */ state->text_pos = rp_data->start; count = rp_data->count; step = node->step; pos = state->text_pos + (Py_ssize_t)count * step; limit = state->text_pos + (Py_ssize_t)node->values[1] * step; /* The tail failed to match at this position. */ if (!guard_repeat(safe_state, bt_data->repeat.index, pos, RE_STATUS_TAIL, TRUE)) return RE_ERROR_MEMORY; /* A (*SKIP) might have changed the size of the slice. */ if (step > 0) { if (limit < state->slice_start) limit = state->slice_start; } else { if (limit > state->slice_end) limit = state->slice_end; } if (pos == limit) { /* We've backtracked the repeat as far as we can.
*/ rp_data->start = bt_data->repeat.text_pos; rp_data->count = bt_data->repeat.count; discard_backtrack(state); break; } test = node->next_1.test; m = test->match; index = node->values[0]; match = FALSE; if (test->status & RE_STATUS_FUZZY) { for (;;) { int status; RE_Position next_position; pos -= step; status = try_match(state, &node->next_1, pos, &next_position); if (status < 0) return status; if (status != RE_ERROR_FAILURE && !is_repeat_guarded(safe_state, index, pos, RE_STATUS_TAIL)) { match = TRUE; break; } if (pos == limit) break; } } else { /* A repeated single-character match is often followed by a * literal, so checking specially for it can be a good * optimisation when working with long strings. */ switch (test->op) { case RE_OP_CHARACTER: { Py_UCS4 ch; ch = test->values[0]; for (;;) { --pos; if (same_char(char_at(state->text, pos), ch) == m && !is_repeat_guarded(safe_state, index, pos, RE_STATUS_TAIL)) { match = TRUE; break; } if (pos == limit) break; } break; } case RE_OP_CHARACTER_IGN: { Py_UCS4 ch; ch = test->values[0]; for (;;) { --pos; if (same_char_ign(encoding, locale_info, char_at(state->text, pos), ch) == m && !is_repeat_guarded(safe_state, index, pos, RE_STATUS_TAIL)) { match = TRUE; break; } if (pos == limit) break; } break; } case RE_OP_CHARACTER_IGN_REV: { Py_UCS4 ch; ch = test->values[0]; for (;;) { ++pos; if (same_char_ign(encoding, locale_info, char_at(state->text, pos - 1), ch) == m && !is_repeat_guarded(safe_state, index, pos, RE_STATUS_TAIL)) { match = TRUE; break; } if (pos == limit) break; } break; } case RE_OP_CHARACTER_REV: { Py_UCS4 ch; ch = test->values[0]; for (;;) { ++pos; if (same_char(char_at(state->text, pos - 1), ch) == m && !is_repeat_guarded(safe_state, index, pos, RE_STATUS_TAIL)) { match = TRUE; break; } if (pos == limit) break; } break; } case RE_OP_STRING: { Py_ssize_t length; length = (Py_ssize_t)test->value_count; /* The tail is a string. We don't want to go off the end of * the slice. 
*/ pos = min_ssize_t(pos - 1, state->slice_end - length); for (;;) { Py_ssize_t found; BOOL is_partial; if (pos < limit) break; found = string_search_rev(safe_state, test, pos + length, limit, &is_partial); if (is_partial) return RE_ERROR_PARTIAL; if (found < 0) break; pos = found - length; if (!is_repeat_guarded(safe_state, index, pos, RE_STATUS_TAIL)) { match = TRUE; break; } --pos; } break; } case RE_OP_STRING_FLD: { int (*full_case_fold)(RE_LocaleInfo* locale_info, Py_UCS4 ch, Py_UCS4* folded); Py_ssize_t folded_length; size_t i; Py_UCS4 folded[RE_MAX_FOLDED]; full_case_fold = encoding->full_case_fold; folded_length = 0; for (i = 0; i < test->value_count; i++) folded_length += full_case_fold(locale_info, test->values[i], folded); /* The tail is a string. We don't want to go off the end of * the slice. */ pos = min_ssize_t(pos - 1, state->slice_end - folded_length); for (;;) { Py_ssize_t found; Py_ssize_t new_pos; BOOL is_partial; if (pos < limit) break; found = string_search_fld_rev(safe_state, test, pos + folded_length, limit, &new_pos, &is_partial); if (is_partial) return RE_ERROR_PARTIAL; if (found < 0) break; pos = found - folded_length; if (!is_repeat_guarded(safe_state, index, pos, RE_STATUS_TAIL)) { match = TRUE; break; } --pos; } break; } case RE_OP_STRING_FLD_REV: { int (*full_case_fold)(RE_LocaleInfo* locale_info, Py_UCS4 ch, Py_UCS4* folded); Py_ssize_t folded_length; size_t i; Py_UCS4 folded[RE_MAX_FOLDED]; full_case_fold = encoding->full_case_fold; folded_length = 0; for (i = 0; i < test->value_count; i++) folded_length += full_case_fold(locale_info, test->values[i], folded); /* The tail is a string. We don't want to go off the end of * the slice. 
*/ pos = max_ssize_t(pos + 1, state->slice_start + folded_length); for (;;) { Py_ssize_t found; Py_ssize_t new_pos; BOOL is_partial; if (pos > limit) break; found = string_search_fld(safe_state, test, pos - folded_length, limit, &new_pos, &is_partial); if (is_partial) return RE_ERROR_PARTIAL; if (found < 0) break; pos = found + folded_length; if (!is_repeat_guarded(safe_state, index, pos, RE_STATUS_TAIL)) { match = TRUE; break; } ++pos; } break; } case RE_OP_STRING_IGN: { Py_ssize_t length; length = (Py_ssize_t)test->value_count; /* The tail is a string. We don't want to go off the end of * the slice. */ pos = min_ssize_t(pos - 1, state->slice_end - length); for (;;) { Py_ssize_t found; BOOL is_partial; if (pos < limit) break; found = string_search_ign_rev(safe_state, test, pos + length, limit, &is_partial); if (is_partial) return RE_ERROR_PARTIAL; if (found < 0) break; pos = found - length; if (!is_repeat_guarded(safe_state, index, pos, RE_STATUS_TAIL)) { match = TRUE; break; } --pos; } break; } case RE_OP_STRING_IGN_REV: { Py_ssize_t length; length = (Py_ssize_t)test->value_count; /* The tail is a string. We don't want to go off the end of * the slice. */ pos = max_ssize_t(pos + 1, state->slice_start + length); for (;;) { Py_ssize_t found; BOOL is_partial; if (pos > limit) break; found = string_search_ign(safe_state, test, pos - length, limit, &is_partial); if (is_partial) return RE_ERROR_PARTIAL; if (found < 0) break; pos = found + length; if (!is_repeat_guarded(safe_state, index, pos, RE_STATUS_TAIL)) { match = TRUE; break; } ++pos; } break; } case RE_OP_STRING_REV: { Py_ssize_t length; length = (Py_ssize_t)test->value_count; /* The tail is a string. We don't want to go off the end of * the slice. 
*/ pos = max_ssize_t(pos + 1, state->slice_start + length); for (;;) { Py_ssize_t found; BOOL is_partial; if (pos > limit) break; found = string_search(safe_state, test, pos - length, limit, &is_partial); if (is_partial) return RE_ERROR_PARTIAL; if (found < 0) break; pos = found + length; if (!is_repeat_guarded(safe_state, index, pos, RE_STATUS_TAIL)) { match = TRUE; break; } ++pos; } break; } default: for (;;) { RE_Position next_position; pos -= step; status = try_match(state, &node->next_1, pos, &next_position); if (status < 0) return status; if (status == RE_ERROR_SUCCESS && !is_repeat_guarded(safe_state, index, pos, RE_STATUS_TAIL)) { match = TRUE; break; } if (pos == limit) break; } break; } } if (match) { count = (size_t)abs_ssize_t(pos - state->text_pos); /* The tail could match. */ if (count > node->values[1]) /* The match is longer than the minimum, so we might need * to backtrack the repeat again to consume less. */ rp_data->count = count; else { /* We've reached or passed the minimum, so we won't need to * backtrack the repeat again. */ rp_data->start = bt_data->repeat.text_pos; rp_data->count = bt_data->repeat.count; discard_backtrack(state); /* Have we passed the minimum? */ if (count < node->values[1]) goto backtrack; } node = node->next_1.node; state->text_pos = pos; goto advance; } else { /* Don't try this repeated match again. */ if (step > 0) { if (!guard_repeat_range(safe_state, bt_data->repeat.index, limit, pos, RE_STATUS_BODY, TRUE)) return RE_ERROR_MEMORY; } else if (step < 0) { if (!guard_repeat_range(safe_state, bt_data->repeat.index, pos, limit, RE_STATUS_BODY, TRUE)) return RE_ERROR_MEMORY; } /* We've backtracked the repeat as far as we can. */ rp_data->start = bt_data->repeat.text_pos; rp_data->count = bt_data->repeat.count; discard_backtrack(state); } break; } case RE_OP_GROUP_RETURN: /* Group return. 
*/ { RE_Node* return_node; TRACE(("%s\n", re_op_text[bt_data->op])) return_node = bt_data->group_call.node; push_group_return(safe_state, return_node); if (return_node) { /* Restore the groups. */ pop_groups(state); state->capture_change = bt_data->group_call.capture_change; /* Restore the repeats. */ pop_repeats(state); } discard_backtrack(state); break; } case RE_OP_KEEP: /* Keep. */ { state->match_pos = bt_data->keep.match_pos; discard_backtrack(state); break; } case RE_OP_LAZY_REPEAT_ONE: /* Lazy repeat for one character. */ { RE_RepeatData* rp_data; size_t count; Py_ssize_t step; Py_ssize_t pos; Py_ssize_t available; size_t max_count; Py_ssize_t limit; RE_Node* repeated; RE_Node* test; BOOL match; BOOL m; size_t index; TRACE(("%s\n", re_op_text[bt_data->op])) node = bt_data->repeat.position.node; rp_data = &state->repeats[bt_data->repeat.index]; /* Match one character at a time until the tail could match or we * have reached the maximum. */ state->text_pos = rp_data->start; count = rp_data->count; step = node->step; pos = state->text_pos + (Py_ssize_t)count * step; available = step > 0 ? 
state->slice_end - state->text_pos : state->text_pos - state->slice_start; max_count = min_size_t((size_t)available, node->values[2]); limit = state->text_pos + (Py_ssize_t)max_count * step; repeated = node->nonstring.next_2.node; test = node->next_1.test; m = test->match; index = node->values[0]; match = FALSE; if (test->status & RE_STATUS_FUZZY) { for (;;) { RE_Position next_position; status = match_one(state, repeated, pos); if (status < 0) return status; if (status == RE_ERROR_FAILURE) break; pos += step; status = try_match(state, &node->next_1, pos, &next_position); if (status < 0) return status; if (status == RE_ERROR_SUCCESS && !is_repeat_guarded(safe_state, index, pos, RE_STATUS_TAIL)) { match = TRUE; break; } if (pos == limit) break; } } else { /* A repeated single-character match is often followed by a * literal, so checking specially for it can be a good * optimisation when working with long strings. */ switch (test->op) { case RE_OP_CHARACTER: { Py_UCS4 ch; ch = test->values[0]; /* The tail is a character. We don't want to go off the end * of the slice. */ limit = min_ssize_t(limit, state->slice_end - 1); for (;;) { if (pos >= state->text_length && state->partial_side == RE_PARTIAL_RIGHT) return RE_ERROR_PARTIAL; if (pos >= limit) break; status = match_one(state, repeated, pos); if (status < 0) return status; if (status == RE_ERROR_FAILURE) break; ++pos; if (same_char(char_at(state->text, pos), ch) == m && !is_repeat_guarded(safe_state, index, pos, RE_STATUS_TAIL)) { match = TRUE; break; } } break; } case RE_OP_CHARACTER_IGN: { Py_UCS4 ch; ch = test->values[0]; /* The tail is a character. We don't want to go off the end * of the slice. 
*/ limit = min_ssize_t(limit, state->slice_end - 1); for (;;) { if (pos >= state->text_length && state->partial_side == RE_PARTIAL_RIGHT) return RE_ERROR_PARTIAL; if (pos >= limit) break; status = match_one(state, repeated, pos); if (status < 0) return status; if (status == RE_ERROR_FAILURE) break; ++pos; if (same_char_ign(encoding, locale_info, char_at(state->text, pos), ch) == m && !is_repeat_guarded(safe_state, index, pos, RE_STATUS_TAIL)) { match = TRUE; break; } } break; } case RE_OP_CHARACTER_IGN_REV: { Py_UCS4 ch; ch = test->values[0]; /* The tail is a character. We don't want to go off the end * of the slice. */ limit = max_ssize_t(limit, state->slice_start + 1); for (;;) { if (pos <= 0 && state->partial_side == RE_PARTIAL_LEFT) return RE_ERROR_PARTIAL; if (pos <= limit) break; status = match_one(state, repeated, pos); if (status < 0) return status; if (status == RE_ERROR_FAILURE) break; --pos; if (same_char_ign(encoding, locale_info, char_at(state->text, pos - 1), ch) == m && !is_repeat_guarded(safe_state, index, pos, RE_STATUS_TAIL)) { match = TRUE; break; } } break; } case RE_OP_CHARACTER_REV: { Py_UCS4 ch; ch = test->values[0]; /* The tail is a character. We don't want to go off the end * of the slice. */ limit = max_ssize_t(limit, state->slice_start + 1); for (;;) { if (pos <= 0 && state->partial_side == RE_PARTIAL_LEFT) return RE_ERROR_PARTIAL; if (pos <= limit) break; status = match_one(state, repeated, pos); if (status < 0) return status; if (status == RE_ERROR_FAILURE) break; --pos; if (same_char(char_at(state->text, pos - 1), ch) == m && !is_repeat_guarded(safe_state, index, pos, RE_STATUS_TAIL)) { match = TRUE; break; } } break; } case RE_OP_STRING: { Py_ssize_t length; length = (Py_ssize_t)test->value_count; /* The tail is a string. We don't want to go off the end of * the slice. 
*/ limit = min_ssize_t(limit, state->slice_end - length); for (;;) { Py_ssize_t found; BOOL is_partial; if (pos >= state->text_length && state->partial_side == RE_PARTIAL_RIGHT) return RE_ERROR_PARTIAL; if (pos >= limit) break; /* Look for the tail string. */ found = string_search(safe_state, test, pos + 1, limit + length, &is_partial); if (is_partial) return RE_ERROR_PARTIAL; if (found < 0) break; if (repeated->op == RE_OP_ANY_ALL) /* Anything can precede the tail. */ pos = found; else { /* Check that what precedes the tail will match. */ while (pos != found) { status = match_one(state, repeated, pos); if (status < 0) return status; if (status == RE_ERROR_FAILURE) break; ++pos; } if (pos != found) /* Something preceding the tail didn't match. */ break; } if (!is_repeat_guarded(safe_state, index, pos, RE_STATUS_TAIL)) { match = TRUE; break; } } break; } case RE_OP_STRING_FLD: { /* The tail is a string. We don't want to go off the end of * the slice. */ limit = min_ssize_t(limit, state->slice_end); for (;;) { Py_ssize_t found; Py_ssize_t new_pos; BOOL is_partial; if (pos >= state->text_length && state->partial_side == RE_PARTIAL_RIGHT) return RE_ERROR_PARTIAL; if (pos >= limit) break; /* Look for the tail string. */ found = string_search_fld(safe_state, test, pos + 1, limit, &new_pos, &is_partial); if (is_partial) return RE_ERROR_PARTIAL; if (found < 0) break; if (repeated->op == RE_OP_ANY_ALL) /* Anything can precede the tail. */ pos = found; else { /* Check that what precedes the tail will match. */ while (pos != found) { status = match_one(state, repeated, pos); if (status < 0) return status; if (status == RE_ERROR_FAILURE) break; ++pos; } if (pos != found) /* Something preceding the tail didn't match. */ break; } if (!is_repeat_guarded(safe_state, index, pos, RE_STATUS_TAIL)) { match = TRUE; break; } } break; } case RE_OP_STRING_FLD_REV: { /* The tail is a string. We don't want to go off the end of * the slice. 
*/ limit = max_ssize_t(limit, state->slice_start); for (;;) { Py_ssize_t found; Py_ssize_t new_pos; BOOL is_partial; if (pos <= 0 && state->partial_side == RE_PARTIAL_LEFT) return RE_ERROR_PARTIAL; if (pos <= limit) break; /* Look for the tail string. */ found = string_search_fld_rev(safe_state, test, pos - 1, limit, &new_pos, &is_partial); if (is_partial) return RE_ERROR_PARTIAL; if (found < 0) break; if (repeated->op == RE_OP_ANY_ALL) /* Anything can precede the tail. */ pos = found; else { /* Check that what precedes the tail will match. */ while (pos != found) { status = match_one(state, repeated, pos); if (status < 0) return status; if (status == RE_ERROR_FAILURE) break; --pos; } if (pos != found) /* Something preceding the tail didn't match. */ break; } if (!is_repeat_guarded(safe_state, index, pos, RE_STATUS_TAIL)) { match = TRUE; break; } } break; } case RE_OP_STRING_IGN: { Py_ssize_t length; length = (Py_ssize_t)test->value_count; /* The tail is a string. We don't want to go off the end of * the slice. */ limit = min_ssize_t(limit, state->slice_end - length); for (;;) { Py_ssize_t found; BOOL is_partial; if (pos >= state->text_length && state->partial_side == RE_PARTIAL_RIGHT) return RE_ERROR_PARTIAL; if (pos >= limit) break; /* Look for the tail string. */ found = string_search_ign(safe_state, test, pos + 1, limit + length, &is_partial); if (is_partial) return RE_ERROR_PARTIAL; if (found < 0) break; if (repeated->op == RE_OP_ANY_ALL) /* Anything can precede the tail. */ pos = found; else { /* Check that what precedes the tail will match. */ while (pos != found) { status = match_one(state, repeated, pos); if (status < 0) return status; if (status == RE_ERROR_FAILURE) break; ++pos; } if (pos != found) /* Something preceding the tail didn't match. 
*/ break; } if (!is_repeat_guarded(safe_state, index, pos, RE_STATUS_TAIL)) { match = TRUE; break; } } break; } case RE_OP_STRING_IGN_REV: { Py_ssize_t length; length = (Py_ssize_t)test->value_count; /* The tail is a string. We don't want to go off the end of * the slice. */ limit = max_ssize_t(limit, state->slice_start + length); for (;;) { Py_ssize_t found; BOOL is_partial; if (pos <= 0 && state->partial_side == RE_PARTIAL_LEFT) return RE_ERROR_PARTIAL; if (pos <= limit) break; /* Look for the tail string. */ found = string_search_ign_rev(safe_state, test, pos - 1, limit - length, &is_partial); if (is_partial) return RE_ERROR_PARTIAL; if (found < 0) break; if (repeated->op == RE_OP_ANY_ALL) /* Anything can precede the tail. */ pos = found; else { /* Check that what precedes the tail will match. */ while (pos != found) { status = match_one(state, repeated, pos); if (status < 0) return status; if (status == RE_ERROR_FAILURE) break; --pos; } if (pos != found) /* Something preceding the tail didn't match. */ break; } if (!is_repeat_guarded(safe_state, index, pos, RE_STATUS_TAIL)) { match = TRUE; break; } } break; } case RE_OP_STRING_REV: { Py_ssize_t length; length = (Py_ssize_t)test->value_count; /* The tail is a string. We don't want to go off the end of * the slice. */ limit = max_ssize_t(limit, state->slice_start + length); for (;;) { Py_ssize_t found; BOOL is_partial; if (pos <= 0 && state->partial_side == RE_PARTIAL_LEFT) return RE_ERROR_PARTIAL; if (pos <= limit) break; /* Look for the tail string. */ found = string_search_rev(safe_state, test, pos - 1, limit - length, &is_partial); if (is_partial) return RE_ERROR_PARTIAL; if (found < 0) break; if (repeated->op == RE_OP_ANY_ALL) /* Anything can precede the tail. */ pos = found; else { /* Check that what precedes the tail will match. 
*/ while (pos != found) { status = match_one(state, repeated, pos); if (status < 0) return status; if (status == RE_ERROR_FAILURE) break; --pos; } if (pos != found) /* Something preceding the tail didn't match. */ break; } if (!is_repeat_guarded(safe_state, index, pos, RE_STATUS_TAIL)) { match = TRUE; break; } } break; } default: for (;;) { RE_Position next_position; status = match_one(state, repeated, pos); if (status < 0) return status; if (status == RE_ERROR_FAILURE) break; pos += step; status = try_match(state, &node->next_1, pos, &next_position); if (status < 0) return RE_ERROR_PARTIAL; if (status == RE_ERROR_SUCCESS && !is_repeat_guarded(safe_state, index, pos, RE_STATUS_TAIL)) { match = TRUE; break; } if (pos == limit) break; } break; } } if (match) { /* The tail could match. */ count = (size_t)abs_ssize_t(pos - state->text_pos); state->text_pos = pos; if (count < max_count) { /* The match is shorter than the maximum, so we might need * to backtrack the repeat again to consume more. */ rp_data->count = count; } else { /* We've reached or passed the maximum, so we won't need to * backtrack the repeat again. */ rp_data->start = bt_data->repeat.text_pos; rp_data->count = bt_data->repeat.count; discard_backtrack(state); /* Have we passed the maximum? */ if (count > max_count) goto backtrack; } node = node->next_1.node; goto advance; } else { /* The tail couldn't match. */ rp_data->start = bt_data->repeat.text_pos; rp_data->count = bt_data->repeat.count; discard_backtrack(state); } break; } case RE_OP_LOOKAROUND: /* Lookaround subpattern. */ { TRACE(("%s\n", re_op_text[bt_data->op])) if (bt_data->lookaround.inside) { /* Backtracked to the start of a lookaround. 
*/ RE_AtomicData* lookaround; lookaround = pop_atomic(safe_state); state->text_pos = lookaround->text_pos; state->slice_end = lookaround->slice_end; state->slice_start = lookaround->slice_start; state->current_backtrack_block = lookaround->current_backtrack_block; state->current_backtrack_block->count = lookaround->backtrack_count; /* Restore the groups and repeats and certain flags. */ if (lookaround->has_repeats) pop_repeats(state); if (lookaround->has_groups) pop_groups(state); state->too_few_errors = bt_data->lookaround.too_few_errors; state->capture_change = bt_data->lookaround.capture_change; if (bt_data->lookaround.node->match) { /* It's a positive lookaround that's failed. */ discard_backtrack(state); } else { /* It's a negative lookaround that's failed. Record that * we've now left the lookaround and continue to the * following node. */ bt_data->lookaround.inside = FALSE; node = bt_data->lookaround.node->nonstring.next_2.node; goto advance; } } else { /* Backtracked to a lookaround. If it's a positive lookaround * that succeeded, we need to restore the groups; if it's a * negative lookaround that failed, it would have completely * backtracked inside and already restored the groups. We also * need to restore certain flags. */ if (bt_data->lookaround.node->match && (bt_data->lookaround.node->status & RE_STATUS_HAS_GROUPS)) pop_groups(state); state->too_few_errors = bt_data->lookaround.too_few_errors; state->capture_change = bt_data->lookaround.capture_change; discard_backtrack(state); } break; } case RE_OP_MATCH_BODY: { RE_RepeatData* rp_data; TRACE(("%s %d\n", re_op_text[bt_data->op], bt_data->repeat.index)) /* We want to match the body. */ rp_data = &state->repeats[bt_data->repeat.index]; /* Restore the repeat info. */ rp_data->count = bt_data->repeat.count; rp_data->start = bt_data->repeat.start; rp_data->capture_change = bt_data->repeat.capture_change; /* Record backtracking info in case the body fails to match. 
*/ bt_data->op = RE_OP_BODY_START; /* Advance into the body. */ node = bt_data->repeat.position.node; state->text_pos = bt_data->repeat.position.text_pos; goto advance; } case RE_OP_MATCH_TAIL: { RE_RepeatData* rp_data; TRACE(("%s %d\n", re_op_text[bt_data->op], bt_data->repeat.index)) /* We want to match the tail. */ rp_data = &state->repeats[bt_data->repeat.index]; /* Restore the repeat info. */ rp_data->count = bt_data->repeat.count; rp_data->start = bt_data->repeat.start; rp_data->capture_change = bt_data->repeat.capture_change; /* Advance into the tail. */ node = bt_data->repeat.position.node; state->text_pos = bt_data->repeat.position.text_pos; discard_backtrack(state); goto advance; } case RE_OP_REF_GROUP: /* Reference to a capture group. */ case RE_OP_REF_GROUP_IGN: /* Reference to a capture group, ignoring case. */ case RE_OP_REF_GROUP_IGN_REV: /* Reference to a capture group, backwards, ignoring case. */ case RE_OP_REF_GROUP_REV: /* Reference to a capture group, backwards. */ case RE_OP_STRING: /* A string. */ case RE_OP_STRING_IGN: /* A string, ignoring case. */ case RE_OP_STRING_IGN_REV: /* A string, backwards, ignoring case. */ case RE_OP_STRING_REV: /* A string, backwards. */ { BOOL matched; TRACE(("%s\n", re_op_text[bt_data->op])) status = retry_fuzzy_match_string(safe_state, search, &state->text_pos, &node, &string_pos, &matched); if (status < 0) return RE_ERROR_PARTIAL; if (matched) goto advance; string_pos = -1; break; } case RE_OP_REF_GROUP_FLD: /* Reference to a capture group, ignoring case. */ case RE_OP_REF_GROUP_FLD_REV: /* Reference to a capture group, backwards, ignoring case. */ { BOOL matched; TRACE(("%s\n", re_op_text[bt_data->op])) status = retry_fuzzy_match_group_fld(safe_state, search, &state->text_pos, &node, &folded_pos, &string_pos, &gfolded_pos, &matched); if (status < 0) return RE_ERROR_PARTIAL; if (matched) goto advance; string_pos = -1; break; } case RE_OP_START_GROUP: /* Start of a capture group. 
*/ { RE_CODE private_index; RE_GroupData* group; TRACE(("%s %d\n", re_op_text[bt_data->op], bt_data->group.public_index)) private_index = bt_data->group.private_index; group = &state->groups[private_index - 1]; /* Unsave the capture? */ if (bt_data->group.capture) unsave_capture(state, bt_data->group.private_index, bt_data->group.public_index); if (pattern->group_info[private_index - 1].referenced && group->span.start != bt_data->group.text_pos) --state->capture_change; group->span.start = bt_data->group.text_pos; group->current_capture = bt_data->group.current_capture; discard_backtrack(state); break; } case RE_OP_STRING_FLD: /* A string, ignoring case. */ case RE_OP_STRING_FLD_REV: /* A string, backwards, ignoring case. */ { BOOL matched; TRACE(("%s\n", re_op_text[bt_data->op])) status = retry_fuzzy_match_string_fld(safe_state, search, &state->text_pos, &node, &string_pos, &folded_pos, &matched); if (status < 0) return RE_ERROR_PARTIAL; if (matched) goto advance; string_pos = -1; break; } default: TRACE(("UNKNOWN OP %d\n", bt_data->op)) return RE_ERROR_ILLEGAL; } } } /* Saves group data for fuzzy matching. */ Py_LOCAL_INLINE(RE_GroupData*) save_groups(RE_SafeState* safe_state, RE_GroupData* saved_groups) { RE_State* state; PatternObject* pattern; size_t g; /* Re-acquire the GIL. 
*/ acquire_GIL(safe_state); state = safe_state->re_state; pattern = state->pattern; if (!saved_groups) { saved_groups = (RE_GroupData*)re_alloc(pattern->true_group_count * sizeof(RE_GroupData)); if (!saved_groups) goto error; memset(saved_groups, 0, pattern->true_group_count * sizeof(RE_GroupData)); } for (g = 0; g < pattern->true_group_count; g++) { RE_GroupData* orig; RE_GroupData* copy; orig = &state->groups[g]; copy = &saved_groups[g]; copy->span = orig->span; if (orig->capture_count > copy->capture_capacity) { RE_GroupSpan* cap_copy; cap_copy = (RE_GroupSpan*)re_realloc(copy->captures, orig->capture_count * sizeof(RE_GroupSpan)); if (!cap_copy) goto error; copy->capture_capacity = orig->capture_count; copy->captures = cap_copy; } copy->capture_count = orig->capture_count; Py_MEMCPY(copy->captures, orig->captures, orig->capture_count * sizeof(RE_GroupSpan)); } /* Release the GIL. */ release_GIL(safe_state); return saved_groups; error: if (saved_groups) { for (g = 0; g < pattern->true_group_count; g++) re_dealloc(saved_groups[g].captures); re_dealloc(saved_groups); } /* Release the GIL. */ release_GIL(safe_state); return NULL; } /* Restores group data for fuzzy matching. */ Py_LOCAL_INLINE(void) restore_groups(RE_SafeState* safe_state, RE_GroupData* saved_groups) { RE_State* state; PatternObject* pattern; size_t g; /* Re-acquire the GIL. */ acquire_GIL(safe_state); state = safe_state->re_state; pattern = state->pattern; for (g = 0; g < pattern->true_group_count; g++) re_dealloc(state->groups[g].captures); Py_MEMCPY(state->groups, saved_groups, pattern->true_group_count * sizeof(RE_GroupData)); re_dealloc(saved_groups); /* Release the GIL. */ release_GIL(safe_state); } /* Discards group data for fuzzy matching. */ Py_LOCAL_INLINE(void) discard_groups(RE_SafeState* safe_state, RE_GroupData* saved_groups) { RE_State* state; PatternObject* pattern; size_t g; /* Re-acquire the GIL. 
*/ acquire_GIL(safe_state); state = safe_state->re_state; pattern = state->pattern; for (g = 0; g < pattern->true_group_count; g++) re_dealloc(saved_groups[g].captures); re_dealloc(saved_groups); /* Release the GIL. */ release_GIL(safe_state); } /* Saves the fuzzy info. */ Py_LOCAL_INLINE(void) save_fuzzy_counts(RE_State* state, size_t* fuzzy_counts) { Py_MEMCPY(fuzzy_counts, state->total_fuzzy_counts, sizeof(state->total_fuzzy_counts)); } /* Restores the fuzzy info. */ Py_LOCAL_INLINE(void) restore_fuzzy_counts(RE_State* state, size_t* fuzzy_counts) { Py_MEMCPY(state->total_fuzzy_counts, fuzzy_counts, sizeof(state->total_fuzzy_counts)); } /* Makes the list of best matches found so far. */ Py_LOCAL_INLINE(void) make_best_list(RE_BestList* best_list) { best_list->capacity = 0; best_list->count = 0; best_list->entries = NULL; } /* Clears the list of best matches found so far. */ Py_LOCAL_INLINE(void) clear_best_list(RE_BestList* best_list) { best_list->count = 0; } /* Adds a new entry to the list of best matches found so far. */ Py_LOCAL_INLINE(BOOL) add_to_best_list(RE_SafeState* safe_state, RE_BestList* best_list, Py_ssize_t match_pos, Py_ssize_t text_pos) { RE_BestEntry* entry; if (best_list->count >= best_list->capacity) { RE_BestEntry* new_entries; best_list->capacity = best_list->capacity == 0 ? 16 : best_list->capacity * 2; new_entries = safe_realloc(safe_state, best_list->entries, best_list->capacity * sizeof(RE_BestEntry)); if (!new_entries) return FALSE; best_list->entries = new_entries; } entry = &best_list->entries[best_list->count++]; entry->match_pos = match_pos; entry->text_pos = text_pos; return TRUE; } /* Destroy the list of best matches found so far. */ Py_LOCAL_INLINE(void) destroy_best_list(RE_SafeState* safe_state, RE_BestList* best_list) { if (best_list->entries) safe_dealloc(safe_state, best_list->entries); } /* Performs a match or search from the current text position for a best fuzzy * match. 
*/ Py_LOCAL_INLINE(int) do_best_fuzzy_match(RE_SafeState* safe_state, BOOL search) { RE_State* state; Py_ssize_t available; int step; size_t fewest_errors; BOOL must_advance; BOOL found_match; RE_BestList best_list; Py_ssize_t start_pos; int status; TRACE(("<>\n")) state = safe_state->re_state; if (state->reverse) { available = state->text_pos - state->slice_start; step = -1; } else { available = state->slice_end - state->text_pos; step = 1; } /* The maximum permitted cost. */ state->max_errors = PY_SSIZE_T_MAX; fewest_errors = PY_SSIZE_T_MAX; state->best_text_pos = state->reverse ? state->slice_start : state->slice_end; must_advance = state->must_advance; found_match = FALSE; make_best_list(&best_list); /* Search the text for the best match. */ start_pos = state->text_pos; while (state->slice_start <= start_pos && start_pos <= state->slice_end) { state->text_pos = start_pos; state->must_advance = must_advance; /* Initialise the state. */ init_match(state); status = RE_ERROR_SUCCESS; if (state->max_errors == 0 && state->partial_side == RE_PARTIAL_NONE) { /* An exact match, and partial matches not permitted. */ if (available < state->min_width || (available == 0 && state->must_advance)) status = RE_ERROR_FAILURE; } if (status == RE_ERROR_SUCCESS) status = basic_match(safe_state, search); /* Has an error occurred, or is it a partial match? */ if (status < 0) break; if (status == RE_ERROR_SUCCESS) { /* It was a successful match. */ found_match = TRUE; if (state->total_errors < fewest_errors) { /* This match was better than any of the previous ones. */ fewest_errors = state->total_errors; if (state->total_errors == 0) /* It was a perfect match. */ break; /* Forget all the previous worse matches and remember this one. */ clear_best_list(&best_list); if (!add_to_best_list(safe_state, &best_list, state->match_pos, state->text_pos)) { /* Out of memory: free the list before returning. */ destroy_best_list(safe_state, &best_list); return RE_ERROR_MEMORY; } } else if (state->total_errors == fewest_errors) /* This match was as good as the previous matches.
Remember * this one. */ { if (!add_to_best_list(safe_state, &best_list, state->match_pos, state->text_pos)) { /* Out of memory: free the list before returning. */ destroy_best_list(safe_state, &best_list); return RE_ERROR_MEMORY; } } } /* Should we keep searching? */ if (!search) break; start_pos = state->match_pos + step; } if (found_match) { /* We found a match. */ if (fewest_errors > 0) { /* It doesn't look like a perfect match. */ int i; Py_ssize_t slice_start; Py_ssize_t slice_end; size_t error_limit; size_t best_fuzzy_counts[RE_FUZZY_COUNT]; RE_GroupData* best_groups; Py_ssize_t best_match_pos; Py_ssize_t best_text_pos; slice_start = state->slice_start; slice_end = state->slice_end; error_limit = fewest_errors; if (error_limit > RE_MAX_ERRORS) error_limit = RE_MAX_ERRORS; best_groups = NULL; /* Look again at the best of the matches that we've seen. */ for (i = 0; i < best_list.count; i++) { RE_BestEntry* entry; Py_ssize_t max_offset; Py_ssize_t offset; /* Look for the best fit at this position. */ entry = &best_list.entries[i]; if (search) { max_offset = state->reverse ? entry->match_pos - state->slice_start : state->slice_end - entry->match_pos; if (max_offset > (Py_ssize_t)fewest_errors) max_offset = (Py_ssize_t)fewest_errors; if (max_offset > (Py_ssize_t)error_limit) max_offset = (Py_ssize_t)error_limit; } else max_offset = 0; start_pos = entry->match_pos; offset = 0; while (offset <= max_offset) { state->max_errors = 1; while (state->max_errors <= error_limit) { state->text_pos = start_pos; init_match(state); status = basic_match(safe_state, FALSE); if (status == RE_ERROR_SUCCESS) { BOOL better; /* Assume no improvement until shown otherwise. */ better = FALSE; if (state->total_errors < error_limit || (i == 0 && offset == 0)) better = TRUE; else if (state->total_errors == error_limit) /* The cost is as low as the current best, but * is it earlier? */ better = state->reverse ?
state->match_pos > best_match_pos : state->match_pos < best_match_pos; if (better) { save_fuzzy_counts(state, best_fuzzy_counts); best_groups = save_groups(safe_state, best_groups); if (!best_groups) { destroy_best_list(safe_state, &best_list); return RE_ERROR_MEMORY; } best_match_pos = state->match_pos; best_text_pos = state->text_pos; error_limit = state->total_errors; } break; } ++state->max_errors; } start_pos += step; ++offset; } if (status == RE_ERROR_SUCCESS && state->total_errors == 0) break; } if (best_groups) { status = RE_ERROR_SUCCESS; state->match_pos = best_match_pos; state->text_pos = best_text_pos; restore_groups(safe_state, best_groups); restore_fuzzy_counts(state, best_fuzzy_counts); } else { /* None of the "best" matches could be improved on, so pick the * first. */ RE_BestEntry* entry; /* Look at only the part of the string around the match. */ entry = &best_list.entries[0]; if (state->reverse) { state->slice_start = entry->text_pos; state->slice_end = entry->match_pos; } else { state->slice_start = entry->match_pos; state->slice_end = entry->text_pos; } /* We'll expand the part that we're looking at to compensate * for any matching errors that have occurred. */ if (state->slice_start - slice_start >= (Py_ssize_t)fewest_errors) state->slice_start -= (Py_ssize_t)fewest_errors; else state->slice_start = slice_start; if (slice_end - state->slice_end >= (Py_ssize_t)fewest_errors) state->slice_end += (Py_ssize_t)fewest_errors; else state->slice_end = slice_end; state->max_errors = fewest_errors; state->text_pos = entry->match_pos; init_match(state); status = basic_match(safe_state, search); } state->slice_start = slice_start; state->slice_end = slice_end; } } destroy_best_list(safe_state, &best_list); return status; } /* Performs a match or search from the current text position for an enhanced * fuzzy match.
*/ Py_LOCAL_INLINE(int) do_enhanced_fuzzy_match(RE_SafeState* safe_state, BOOL search) { RE_State* state; PatternObject* pattern; Py_ssize_t available; size_t fewest_errors; RE_GroupData* best_groups; Py_ssize_t best_match_pos; BOOL must_advance; Py_ssize_t slice_start; Py_ssize_t slice_end; int status; size_t best_fuzzy_counts[RE_FUZZY_COUNT]; Py_ssize_t best_text_pos = 0; /* Initialise to stop compiler warning. */ TRACE(("<>\n")) state = safe_state->re_state; pattern = state->pattern; if (state->reverse) available = state->text_pos - state->slice_start; else available = state->slice_end - state->text_pos; /* The maximum permitted cost. */ state->max_errors = PY_SSIZE_T_MAX; fewest_errors = PY_SSIZE_T_MAX; best_groups = NULL; state->best_match_pos = state->text_pos; state->best_text_pos = state->reverse ? state->slice_start : state->slice_end; best_match_pos = state->text_pos; must_advance = state->must_advance; slice_start = state->slice_start; slice_end = state->slice_end; for (;;) { /* If there's a better match, it won't start earlier in the string than * the current best match, so there's no need to start earlier than * that match. */ state->must_advance = must_advance; /* Initialise the state. */ init_match(state); status = RE_ERROR_SUCCESS; if (state->max_errors == 0 && state->partial_side == RE_PARTIAL_NONE) { /* An exact match, and partial matches not permitted. */ if (available < state->min_width || (available == 0 && state->must_advance)) status = RE_ERROR_FAILURE; } if (status == RE_ERROR_SUCCESS) status = basic_match(safe_state, search); /* Has an error occurred, or is it a partial match? 
*/ if (status < 0) break; if (status == RE_ERROR_SUCCESS) { BOOL better; better = state->total_errors < fewest_errors; if (better) { BOOL same_match; fewest_errors = state->total_errors; state->max_errors = fewest_errors; save_fuzzy_counts(state, best_fuzzy_counts); same_match = state->match_pos == best_match_pos && state->text_pos == best_text_pos; same_match = FALSE; if (best_groups) { size_t g; /* Did we get the same match as the best so far? */ for (g = 0; same_match && g < pattern->public_group_count; g++) { same_match = state->groups[g].span.start == best_groups[g].span.start && state->groups[g].span.end == best_groups[g].span.end; } } /* Save the best result so far. */ best_groups = save_groups(safe_state, best_groups); if (!best_groups) { status = RE_ERROR_MEMORY; break; } best_match_pos = state->match_pos; best_text_pos = state->text_pos; if (same_match || state->total_errors == 0) break; state->max_errors = state->total_errors; if (state->max_errors < RE_MAX_ERRORS) --state->max_errors; } else break; if (state->reverse) { state->slice_start = state->text_pos; state->slice_end = state->match_pos; } else { state->slice_start = state->match_pos; state->slice_end = state->text_pos; } state->text_pos = state->match_pos; if (state->max_errors == PY_SSIZE_T_MAX) state->max_errors = 0; } else break; } state->slice_start = slice_start; state->slice_end = slice_end; if (best_groups) { if (status == RE_ERROR_SUCCESS && state->total_errors == 0) /* We have a perfect match, so discard the previous best match. */ discard_groups(safe_state, best_groups); else { /* Restore the previous best match. */ status = RE_ERROR_SUCCESS; state->match_pos = best_match_pos; state->text_pos = best_text_pos; restore_groups(safe_state, best_groups); restore_fuzzy_counts(state, best_fuzzy_counts); } } return status; } /* Performs a match or search from the current text position for a simple fuzzy * match.
*/ Py_LOCAL_INLINE(int) do_simple_fuzzy_match(RE_SafeState* safe_state, BOOL search) { RE_State* state; Py_ssize_t available; int status; TRACE(("<>\n")) state = safe_state->re_state; if (state->reverse) available = state->text_pos - state->slice_start; else available = state->slice_end - state->text_pos; /* The maximum permitted cost. */ state->max_errors = PY_SSIZE_T_MAX; state->best_match_pos = state->text_pos; state->best_text_pos = state->reverse ? state->slice_start : state->slice_end; /* Initialise the state. */ init_match(state); status = RE_ERROR_SUCCESS; if (state->max_errors == 0 && state->partial_side == RE_PARTIAL_NONE) { /* An exact match, and partial matches not permitted. */ if (available < state->min_width || (available == 0 && state->must_advance)) status = RE_ERROR_FAILURE; } if (status == RE_ERROR_SUCCESS) status = basic_match(safe_state, search); return status; } /* Performs a match or search from the current text position for an exact * match. */ Py_LOCAL_INLINE(int) do_exact_match(RE_SafeState* safe_state, BOOL search) { RE_State* state; Py_ssize_t available; int status; TRACE(("<>\n")) state = safe_state->re_state; if (state->reverse) available = state->text_pos - state->slice_start; else available = state->slice_end - state->text_pos; /* The maximum permitted cost. */ state->max_errors = 0; state->best_match_pos = state->text_pos; state->best_text_pos = state->reverse ? state->slice_start : state->slice_end; /* Initialise the state. */ init_match(state); status = RE_ERROR_SUCCESS; if (state->max_errors == 0 && state->partial_side == RE_PARTIAL_NONE) { /* An exact match, and partial matches not permitted. */ if (available < state->min_width || (available == 0 && state->must_advance)) status = RE_ERROR_FAILURE; } if (status == RE_ERROR_SUCCESS) status = basic_match(safe_state, search); return status; } /* Performs a match or search from the current text position. * * The state can sometimes be shared across threads. 
In such instances there's * a lock (mutex) on it. The lock is held for the duration of matching. */ Py_LOCAL_INLINE(int) do_match(RE_SafeState* safe_state, BOOL search) { RE_State* state; PatternObject* pattern; int status; TRACE(("<>\n")) state = safe_state->re_state; pattern = state->pattern; /* Is there enough to search? */ if (state->reverse) { if (state->text_pos < state->slice_start) return FALSE; } else { if (state->text_pos > state->slice_end) return FALSE; } /* Release the GIL. */ release_GIL(safe_state); if (pattern->is_fuzzy) { if (pattern->flags & RE_FLAG_BESTMATCH) status = do_best_fuzzy_match(safe_state, search); else if (pattern->flags & RE_FLAG_ENHANCEMATCH) status = do_enhanced_fuzzy_match(safe_state, search); else status = do_simple_fuzzy_match(safe_state, search); } else status = do_exact_match(safe_state, search); if (status == RE_ERROR_SUCCESS || status == RE_ERROR_PARTIAL) { Py_ssize_t max_end_index; RE_GroupInfo* group_info; size_t g; /* Store the results. */ state->lastindex = -1; state->lastgroup = -1; max_end_index = -1; if (status == RE_ERROR_PARTIAL) { /* We've matched up to the limit of the slice. */ if (state->reverse) state->text_pos = state->slice_start; else state->text_pos = state->slice_end; } /* Store the capture groups. */ group_info = pattern->group_info; for (g = 0; g < pattern->public_group_count; g++) { RE_GroupSpan* span; span = &state->groups[g].span; /* The string positions are of type Py_ssize_t, so the format needs * to specify that. */ TRACE(("group %d from %" PY_FORMAT_SIZE_T "d to %" PY_FORMAT_SIZE_T "d\n", g + 1, span->start, span->end)) if (span->start >= 0 && span->end >= 0 && group_info[g].end_index > max_end_index) { max_end_index = group_info[g].end_index; state->lastindex = (Py_ssize_t)g + 1; if (group_info[g].has_name) state->lastgroup = (Py_ssize_t)g + 1; } } } /* Re-acquire the GIL. 
*/ acquire_GIL(safe_state); if (status < 0 && status != RE_ERROR_PARTIAL && !PyErr_Occurred()) set_error(status, NULL); return status; } /* Gets a string from a Python object. * * If the function returns true and str_info->should_release is true then it's * the responsibility of the caller to release the buffer when it's no longer * needed. */ Py_LOCAL_INLINE(BOOL) get_string(PyObject* string, RE_StringInfo* str_info) { /* Given a Python object, return a data pointer, a length (in characters), * and a character size. Return FALSE if the object is not a string (or not * compatible). */ PyBufferProcs* buffer; Py_ssize_t bytes; Py_ssize_t size; /* Unicode objects do not support the buffer API. So, get the data directly * instead. */ if (PyUnicode_Check(string)) { /* Unicode strings don't always support the buffer interface. */ #if PY_VERSION_HEX >= 0x03030000 if (PyUnicode_READY(string) == -1) return FALSE; str_info->characters = (void*)PyUnicode_DATA(string); str_info->length = PyUnicode_GET_LENGTH(string); str_info->charsize = PyUnicode_KIND(string); #else str_info->characters = (void*)PyUnicode_AS_DATA(string); str_info->length = PyUnicode_GET_SIZE(string); str_info->charsize = sizeof(Py_UNICODE); #endif str_info->is_unicode = TRUE; str_info->should_release = FALSE; return TRUE; } /* Get pointer to string buffer. */ buffer = Py_TYPE(string)->tp_as_buffer; str_info->view.len = -1; if (!buffer) { PyErr_SetString(PyExc_TypeError, "expected string or buffer"); return FALSE; } if (!buffer->bf_getbuffer || (*buffer->bf_getbuffer)(string, &str_info->view, PyBUF_SIMPLE) < 0) { PyErr_SetString(PyExc_TypeError, "expected string or buffer"); return FALSE; } str_info->should_release = TRUE; /* Determine buffer size.
*/ bytes = str_info->view.len; str_info->characters = str_info->view.buf; if (str_info->characters == NULL) { PyBuffer_Release(&str_info->view); PyErr_SetString(PyExc_ValueError, "buffer is NULL"); return FALSE; } if (bytes < 0) { PyBuffer_Release(&str_info->view); PyErr_SetString(PyExc_TypeError, "buffer has negative size"); return FALSE; } /* Determine character size. */ size = PyObject_Size(string); if (PyBytes_Check(string) || bytes == size) str_info->charsize = 1; else { PyBuffer_Release(&str_info->view); PyErr_SetString(PyExc_TypeError, "buffer size mismatch"); return FALSE; } str_info->length = size; str_info->is_unicode = FALSE; return TRUE; } /* Deallocates the groups storage. */ Py_LOCAL_INLINE(void) dealloc_groups(RE_GroupData* groups, size_t group_count) { size_t g; if (!groups) return; for (g = 0; g < group_count; g++) re_dealloc(groups[g].captures); re_dealloc(groups); } /* Initialises a state object. */ Py_LOCAL_INLINE(BOOL) state_init_2(RE_State* state, PatternObject* pattern, PyObject* string, RE_StringInfo* str_info, Py_ssize_t start, Py_ssize_t end, BOOL overlapped, int concurrent, BOOL partial, BOOL use_lock, BOOL visible_captures, BOOL match_all) { int i; Py_ssize_t final_pos; state->groups = NULL; state->best_match_groups = NULL; state->repeats = NULL; state->visible_captures = visible_captures; state->match_all = match_all; state->backtrack_block.previous = NULL; state->backtrack_block.next = NULL; state->backtrack_block.capacity = RE_BACKTRACK_BLOCK_SIZE; state->backtrack_allocated = RE_BACKTRACK_BLOCK_SIZE; state->current_atomic_block = NULL; state->first_saved_groups = NULL; state->current_saved_groups = NULL; state->first_saved_repeats = NULL; state->current_saved_repeats = NULL; state->lock = NULL; state->fuzzy_guards = NULL; state->first_group_call_frame = NULL; state->current_group_call_frame = NULL; state->group_call_guard_list = NULL; state->req_pos = -1; /* The call guards used by recursive patterns. 
*/ if (pattern->call_ref_info_count > 0) { state->group_call_guard_list = (RE_GuardList*)re_alloc(pattern->call_ref_info_count * sizeof(RE_GuardList)); if (!state->group_call_guard_list) goto error; memset(state->group_call_guard_list, 0, pattern->call_ref_info_count * sizeof(RE_GuardList)); } /* The capture groups. */ if (pattern->true_group_count) { size_t g; if (pattern->groups_storage) { state->groups = pattern->groups_storage; pattern->groups_storage = NULL; } else { state->groups = (RE_GroupData*)re_alloc(pattern->true_group_count * sizeof(RE_GroupData)); if (!state->groups) goto error; memset(state->groups, 0, pattern->true_group_count * sizeof(RE_GroupData)); for (g = 0; g < pattern->true_group_count; g++) { RE_GroupSpan* captures; captures = (RE_GroupSpan*)re_alloc(sizeof(RE_GroupSpan)); if (!captures) { size_t i; for (i = 0; i < g; i++) re_dealloc(state->groups[i].captures); goto error; } state->groups[g].captures = captures; state->groups[g].capture_capacity = 1; } } } /* Adjust boundaries. */ if (start < 0) start += str_info->length; if (start < 0) start = 0; else if (start > str_info->length) start = str_info->length; if (end < 0) end += str_info->length; if (end < 0) end = 0; else if (end > str_info->length) end = str_info->length; state->overlapped = overlapped; state->min_width = pattern->min_width; /* Initialise the getters and setters for the character size. */ state->charsize = str_info->charsize; state->is_unicode = str_info->is_unicode; /* Are we using a buffer object? If so, we need to copy the info. 
*/ state->should_release = str_info->should_release; if (state->should_release) state->view = str_info->view; switch (state->charsize) { case 1: state->char_at = bytes1_char_at; state->set_char_at = bytes1_set_char_at; state->point_to = bytes1_point_to; break; case 2: state->char_at = bytes2_char_at; state->set_char_at = bytes2_set_char_at; state->point_to = bytes2_point_to; break; case 4: state->char_at = bytes4_char_at; state->set_char_at = bytes4_set_char_at; state->point_to = bytes4_point_to; break; default: goto error; } state->encoding = pattern->encoding; state->locale_info = pattern->locale_info; /* The state object contains a reference to the string and also a pointer * to its contents. * * The documentation says that the end of the slice behaves like the end of * the string. */ state->text = str_info->characters; state->text_length = end; state->reverse = (pattern->flags & RE_FLAG_REVERSE) != 0; if (partial) state->partial_side = state->reverse ? RE_PARTIAL_LEFT : RE_PARTIAL_RIGHT; else state->partial_side = RE_PARTIAL_NONE; state->slice_start = start; state->slice_end = state->text_length; state->text_pos = state->reverse ? state->slice_end : state->slice_start; /* Point to the final newline and line separator if it's at the end of the * string, otherwise just -1. */ state->final_newline = -1; state->final_line_sep = -1; final_pos = state->text_length - 1; if (final_pos >= 0) { Py_UCS4 ch; ch = state->char_at(state->text, final_pos); if (ch == 0x0A) { /* The string ends with LF. */ state->final_newline = final_pos; state->final_line_sep = final_pos; /* Does the string end with CR/LF? */ --final_pos; if (final_pos >= 0 && state->char_at(state->text, final_pos) == 0x0D) state->final_line_sep = final_pos; } else { /* The string doesn't end with LF, but it could be another kind of * line separator. */ if (state->encoding->is_line_sep(ch)) state->final_line_sep = final_pos; } } /* If the 'new' behaviour is enabled then split correctly on zero-width * matches. 
*/ state->version_0 = (pattern->flags & RE_FLAG_VERSION1) == 0; state->must_advance = FALSE; state->pattern = pattern; state->string = string; if (pattern->repeat_count) { if (pattern->repeats_storage) { state->repeats = pattern->repeats_storage; pattern->repeats_storage = NULL; } else { state->repeats = (RE_RepeatData*)re_alloc(pattern->repeat_count * sizeof(RE_RepeatData)); if (!state->repeats) goto error; memset(state->repeats, 0, pattern->repeat_count * sizeof(RE_RepeatData)); } } if (pattern->fuzzy_count) { state->fuzzy_guards = (RE_FuzzyGuards*)re_alloc(pattern->fuzzy_count * sizeof(RE_FuzzyGuards)); if (!state->fuzzy_guards) goto error; memset(state->fuzzy_guards, 0, pattern->fuzzy_count * sizeof(RE_FuzzyGuards)); } Py_INCREF(state->pattern); Py_INCREF(state->string); /* Multithreading is allowed during matching when explicitly enabled or on * immutable strings. */ switch (concurrent) { case RE_CONC_NO: state->is_multithreaded = FALSE; break; case RE_CONC_YES: state->is_multithreaded = TRUE; break; default: state->is_multithreaded = PyUnicode_Check(string) || PyBytes_Check(string); break; } /* A state struct can sometimes be shared across threads. In such * instances, if multithreading is enabled we need to protect the state * with a lock (mutex) during matching. */ if (state->is_multithreaded && use_lock) state->lock = PyThread_allocate_lock(); for (i = 0; i < MAX_SEARCH_POSITIONS; i++) state->search_positions[i].start_pos = -1; return TRUE; error: re_dealloc(state->group_call_guard_list); re_dealloc(state->repeats); dealloc_groups(state->groups, pattern->true_group_count); re_dealloc(state->fuzzy_guards); state->repeats = NULL; state->groups = NULL; state->fuzzy_guards = NULL; return FALSE; } /* Checks that the string has the same charsize as the pattern. 
*/ Py_LOCAL_INLINE(BOOL) check_compatible(PatternObject* pattern, BOOL unicode) { if (PyBytes_Check(pattern->pattern)) { if (unicode) { PyErr_SetString(PyExc_TypeError, "cannot use a bytes pattern on a string-like object"); return FALSE; } } else { if (!unicode) { PyErr_SetString(PyExc_TypeError, "cannot use a string pattern on a bytes-like object"); return FALSE; } } return TRUE; } /* Releases the string's buffer, if necessary. */ Py_LOCAL_INLINE(void) release_buffer(RE_StringInfo* str_info) { if (str_info->should_release) PyBuffer_Release(&str_info->view); } /* Initialises a state object. */ Py_LOCAL_INLINE(BOOL) state_init(RE_State* state, PatternObject* pattern, PyObject* string, Py_ssize_t start, Py_ssize_t end, BOOL overlapped, int concurrent, BOOL partial, BOOL use_lock, BOOL visible_captures, BOOL match_all) { RE_StringInfo str_info; /* Get the string to search or match. */ if (!get_string(string, &str_info)) return FALSE; /* If we fail to initialise the state then we need to release the buffer if * the string is a buffer object. */ if (!check_compatible(pattern, str_info.is_unicode)) { release_buffer(&str_info); return FALSE; } if (!state_init_2(state, pattern, string, &str_info, start, end, overlapped, concurrent, partial, use_lock, visible_captures, match_all)) { release_buffer(&str_info); return FALSE; } /* The state has been initialised successfully, so now the state has the * responsibility of releasing the buffer if the string is a buffer object. */ return TRUE; } /* Deallocates repeat data. */ Py_LOCAL_INLINE(void) dealloc_repeats(RE_RepeatData* repeats, size_t repeat_count) { size_t i; if (!repeats) return; for (i = 0; i < repeat_count; i++) { re_dealloc(repeats[i].body_guard_list.spans); re_dealloc(repeats[i].tail_guard_list.spans); } re_dealloc(repeats); } /* Deallocates fuzzy guards. 
*/ Py_LOCAL_INLINE(void) dealloc_fuzzy_guards(RE_FuzzyGuards* guards, size_t fuzzy_count) { size_t i; if (!guards) return; for (i = 0; i < fuzzy_count; i++) { re_dealloc(guards[i].body_guard_list.spans); re_dealloc(guards[i].tail_guard_list.spans); } re_dealloc(guards); } /* Finalises a state object, discarding its contents. */ Py_LOCAL_INLINE(void) state_fini(RE_State* state) { RE_BacktrackBlock* current_backtrack; RE_AtomicBlock* current_atomic; PatternObject* pattern; RE_SavedGroups* saved_groups; RE_SavedRepeats* saved_repeats; RE_GroupCallFrame* frame; size_t i; /* Discard the lock (mutex) if there's one. */ if (state->lock) PyThread_free_lock(state->lock); /* Deallocate the backtrack blocks. */ current_backtrack = state->backtrack_block.next; while (current_backtrack) { RE_BacktrackBlock* next; next = current_backtrack->next; re_dealloc(current_backtrack); state->backtrack_allocated -= RE_BACKTRACK_BLOCK_SIZE; current_backtrack = next; } /* Deallocate the atomic blocks. */ current_atomic = state->current_atomic_block; while (current_atomic) { RE_AtomicBlock* next; next = current_atomic->next; re_dealloc(current_atomic); current_atomic = next; } state->current_atomic_block = NULL; pattern = state->pattern; saved_groups = state->first_saved_groups; while (saved_groups) { RE_SavedGroups* next; next = saved_groups->next; re_dealloc(saved_groups->spans); re_dealloc(saved_groups->counts); re_dealloc(saved_groups); saved_groups = next; } saved_repeats = state->first_saved_repeats; while (saved_repeats) { RE_SavedRepeats* next; next = saved_repeats->next; dealloc_repeats(saved_repeats->repeats, pattern->repeat_count); re_dealloc(saved_repeats); saved_repeats = next; } if (state->best_match_groups) dealloc_groups(state->best_match_groups, pattern->true_group_count); if (pattern->groups_storage) dealloc_groups(state->groups, pattern->true_group_count); else pattern->groups_storage = state->groups; if (pattern->repeats_storage) dealloc_repeats(state->repeats, 
pattern->repeat_count); else pattern->repeats_storage = state->repeats; frame = state->first_group_call_frame; while (frame) { RE_GroupCallFrame* next; next = frame->next; dealloc_groups(frame->groups, pattern->true_group_count); dealloc_repeats(frame->repeats, pattern->repeat_count); re_dealloc(frame); frame = next; } for (i = 0; i < pattern->call_ref_info_count; i++) re_dealloc(state->group_call_guard_list[i].spans); if (state->group_call_guard_list) re_dealloc(state->group_call_guard_list); if (state->fuzzy_guards) dealloc_fuzzy_guards(state->fuzzy_guards, pattern->fuzzy_count); Py_DECREF(state->pattern); Py_DECREF(state->string); if (state->should_release) PyBuffer_Release(&state->view); } /* Converts a string index to an integer. * * If the index is None then the default will be returned. */ Py_LOCAL_INLINE(Py_ssize_t) as_string_index(PyObject* obj, Py_ssize_t def) { Py_ssize_t value; if (obj == Py_None) return def; value = PyLong_AsLong(obj); if (value != -1 || !PyErr_Occurred()) return value; set_error(RE_ERROR_INDEX, NULL); return 0; } /* Deallocates a MatchObject. */ static void match_dealloc(PyObject* self_) { MatchObject* self; self = (MatchObject*)self_; Py_XDECREF(self->string); Py_XDECREF(self->substring); Py_DECREF(self->pattern); if (self->groups) re_dealloc(self->groups); Py_XDECREF(self->regs); PyObject_DEL(self); } #if PY_VERSION_HEX >= 0x03040000 /* Ensures that the string is the immutable Unicode string or bytestring. * DECREFs the original string if a copy is returned. */ Py_LOCAL_INLINE(PyObject*) ensure_immutable(PyObject* string) { PyObject* new_string; if (PyUnicode_CheckExact(string) || PyBytes_CheckExact(string)) return string; if (PyUnicode_Check(string)) new_string = PyUnicode_FromObject(string); else new_string = PyBytes_FromObject(string); Py_DECREF(string); return new_string; } #endif /* Restricts a value to a range. 
*/ Py_LOCAL_INLINE(Py_ssize_t) limited_range(Py_ssize_t value, Py_ssize_t lower, Py_ssize_t upper) { if (value < lower) return lower; if (value > upper) return upper; return value; } /* Gets a slice from a Unicode string. */ Py_LOCAL_INLINE(PyObject*) unicode_slice(PyObject* string, Py_ssize_t start, Py_ssize_t end) { Py_ssize_t length; #if PY_VERSION_HEX < 0x03030000 Py_UNICODE* buffer; #endif length = PyUnicode_GET_SIZE(string); start = limited_range(start, 0, length); end = limited_range(end, 0, length); #if PY_VERSION_HEX >= 0x03030000 return PyUnicode_Substring(string, start, end); #else buffer = PyUnicode_AsUnicode(string); return PyUnicode_FromUnicode(buffer + start, end - start); #endif } /* Gets a slice from a bytestring. */ Py_LOCAL_INLINE(PyObject*) bytes_slice(PyObject* string, Py_ssize_t start, Py_ssize_t end) { Py_ssize_t length; char* buffer; length = PyBytes_GET_SIZE(string); start = limited_range(start, 0, length); end = limited_range(end, 0, length); buffer = PyBytes_AsString(string); return PyBytes_FromStringAndSize(buffer + start, end - start); } /* Gets a slice from a string, returning either a Unicode string or a * bytestring. */ Py_LOCAL_INLINE(PyObject*) get_slice(PyObject* string, Py_ssize_t start, Py_ssize_t end) { if (PyUnicode_Check(string)) return unicode_slice(string, start, end); if (PyBytes_Check(string)) return bytes_slice(string, start, end); #if PY_VERSION_HEX >= 0x03040000 return ensure_immutable(PySequence_GetSlice(string, start, end)); #else return PySequence_GetSlice(string, start, end); #endif } /* Gets a MatchObject's group by integer index. */ static PyObject* match_get_group_by_index(MatchObject* self, Py_ssize_t index, PyObject* def) { RE_GroupSpan* span; if (index < 0 || (size_t)index > self->group_count) { /* Raise error if we were given a bad group number. 
*/ set_error(RE_ERROR_NO_SUCH_GROUP, NULL); return NULL; } if (index == 0) return get_slice(self->substring, self->match_start - self->substring_offset, self->match_end - self->substring_offset); /* Capture group indexes are 1-based (excluding group 0, which is the * entire matched string). */ span = &self->groups[index - 1].span; if (span->start < 0 || span->end < 0) { /* Return default value if the string or group is undefined. */ Py_INCREF(def); return def; } return get_slice(self->substring, span->start - self->substring_offset, span->end - self->substring_offset); } /* Gets a MatchObject's start by integer index. */ static PyObject* match_get_start_by_index(MatchObject* self, Py_ssize_t index) { RE_GroupSpan* span; if (index < 0 || (size_t)index > self->group_count) { /* Raise error if we were given a bad group number. */ set_error(RE_ERROR_NO_SUCH_GROUP, NULL); return NULL; } if (index == 0) return Py_BuildValue("n", self->match_start); /* Capture group indexes are 1-based (excluding group 0, which is the * entire matched string). */ span = &self->groups[index - 1].span; return Py_BuildValue("n", span->start); } /* Gets a MatchObject's starts by integer index. */ static PyObject* match_get_starts_by_index(MatchObject* self, Py_ssize_t index) { RE_GroupData* group; PyObject* result; PyObject* item; size_t i; if (index < 0 || (size_t)index > self->group_count) { /* Raise error if we were given a bad group number. */ set_error(RE_ERROR_NO_SUCH_GROUP, NULL); return NULL; } if (index == 0) { result = PyList_New(1); if (!result) return NULL; item = Py_BuildValue("n", self->match_start); if (!item) goto error; /* PyList_SET_ITEM steals the reference. */ PyList_SET_ITEM(result, 0, item); return result; } /* Capture group indexes are 1-based (excluding group 0, which is the * entire matched string).
*/ group = &self->groups[index - 1]; result = PyList_New((Py_ssize_t)group->capture_count); if (!result) return NULL; for (i = 0; i < group->capture_count; i++) { item = Py_BuildValue("n", group->captures[i].start); if (!item) goto error; /* PyList_SET_ITEM steals the reference. */ PyList_SET_ITEM(result, i, item); } return result; error: Py_DECREF(result); return NULL; } /* Gets a MatchObject's end by integer index. */ static PyObject* match_get_end_by_index(MatchObject* self, Py_ssize_t index) { RE_GroupSpan* span; if (index < 0 || (size_t)index > self->group_count) { /* Raise error if we were given a bad group number. */ set_error(RE_ERROR_NO_SUCH_GROUP, NULL); return NULL; } if (index == 0) return Py_BuildValue("n", self->match_end); /* Capture group indexes are 1-based (excluding group 0, which is the * entire matched string). */ span = &self->groups[index - 1].span; return Py_BuildValue("n", span->end); } /* Gets a MatchObject's ends by integer index. */ static PyObject* match_get_ends_by_index(MatchObject* self, Py_ssize_t index) { RE_GroupData* group; PyObject* result; PyObject* item; size_t i; if (index < 0 || (size_t)index > self->group_count) { /* Raise error if we were given a bad group number. */ set_error(RE_ERROR_NO_SUCH_GROUP, NULL); return NULL; } if (index == 0) { result = PyList_New(1); if (!result) return NULL; item = Py_BuildValue("n", self->match_end); if (!item) goto error; /* PyList_SET_ITEM steals the reference. */ PyList_SET_ITEM(result, 0, item); return result; } /* Capture group indexes are 1-based (excluding group 0, which is the * entire matched string). */ group = &self->groups[index - 1]; result = PyList_New((Py_ssize_t)group->capture_count); if (!result) return NULL; for (i = 0; i < group->capture_count; i++) { item = Py_BuildValue("n", group->captures[i].end); if (!item) goto error; /* PyList_SET_ITEM steals the reference.
*/ PyList_SET_ITEM(result, i, item); } return result; error: Py_DECREF(result); return NULL; } /* Gets a MatchObject's span by integer index. */ static PyObject* match_get_span_by_index(MatchObject* self, Py_ssize_t index) { RE_GroupSpan* span; if (index < 0 || (size_t)index > self->group_count) { /* Raise error if we were given a bad group number. */ set_error(RE_ERROR_NO_SUCH_GROUP, NULL); return NULL; } if (index == 0) return Py_BuildValue("nn", self->match_start, self->match_end); /* Capture group indexes are 1-based (excluding group 0, which is the * entire matched string). */ span = &self->groups[index - 1].span; return Py_BuildValue("nn", span->start, span->end); } /* Gets a MatchObject's spans by integer index. */ static PyObject* match_get_spans_by_index(MatchObject* self, Py_ssize_t index) { PyObject* result; PyObject* item; RE_GroupData* group; size_t i; if (index < 0 || (size_t)index > self->group_count) { /* Raise error if we were given a bad group number. */ set_error(RE_ERROR_NO_SUCH_GROUP, NULL); return NULL; } if (index == 0) { result = PyList_New(1); if (!result) return NULL; item = Py_BuildValue("nn", self->match_start, self->match_end); if (!item) goto error; /* PyList_SET_ITEM steals the reference. */ PyList_SET_ITEM(result, 0, item); return result; } /* Capture group indexes are 1-based (excluding group 0, which is the * entire matched string). */ group = &self->groups[index - 1]; result = PyList_New((Py_ssize_t)group->capture_count); if (!result) return NULL; for (i = 0; i < group->capture_count; i++) { item = Py_BuildValue("nn", group->captures[i].start, group->captures[i].end); if (!item) goto error; /* PyList_SET_ITEM steals the reference. */ PyList_SET_ITEM(result, i, item); } return result; error: Py_DECREF(result); return NULL; } /* Gets a MatchObject's captures by integer index.
*/ static PyObject* match_get_captures_by_index(MatchObject* self, Py_ssize_t index) { PyObject* result; PyObject* slice; RE_GroupData* group; size_t i; if (index < 0 || (size_t)index > self->group_count) { /* Raise error if we were given a bad group number. */ set_error(RE_ERROR_NO_SUCH_GROUP, NULL); return NULL; } if (index == 0) { result = PyList_New(1); if (!result) return NULL; slice = get_slice(self->substring, self->match_start - self->substring_offset, self->match_end - self->substring_offset); if (!slice) goto error; /* PyList_SET_ITEM steals the reference. */ PyList_SET_ITEM(result, 0, slice); return result; } /* Capture group indexes are 1-based (excluding group 0, which is the * entire matched string). */ group = &self->groups[index - 1]; result = PyList_New((Py_ssize_t)group->capture_count); if (!result) return NULL; for (i = 0; i < group->capture_count; i++) { slice = get_slice(self->substring, group->captures[i].start - self->substring_offset, group->captures[i].end - self->substring_offset); if (!slice) goto error; /* PyList_SET_ITEM steals the reference. */ PyList_SET_ITEM(result, i, slice); } return result; error: Py_DECREF(result); return NULL; } /* Converts a group index to an integer. */ Py_LOCAL_INLINE(Py_ssize_t) as_group_index(PyObject* obj) { Py_ssize_t value; value = PyLong_AsLong(obj); if (value != -1 || !PyErr_Occurred()) return value; set_error(RE_ERROR_INDEX, NULL); return -1; } /* Gets a MatchObject's group index. * * The supplied index can be an integer or a string (group name) object. */ Py_LOCAL_INLINE(Py_ssize_t) match_get_group_index(MatchObject* self, PyObject* index, BOOL allow_neg) { Py_ssize_t group; /* Is the index an integer? */ group = as_group_index(index); if (group != -1 || !PyErr_Occurred()) { Py_ssize_t min_group = 0; /* Adjust negative indices where valid and allowed.
*/ if (group < 0 && allow_neg) { group += (Py_ssize_t)self->group_count + 1; min_group = 1; } if (min_group <= group && (size_t)group <= self->group_count) return group; return -1; } /* The index might be a group name. */ if (self->pattern->groupindex) { /* Look up the name. */ PyErr_Clear(); index = PyObject_GetItem(self->pattern->groupindex, index); if (index) { /* Check that we have an integer. */ group = as_group_index(index); Py_DECREF(index); if (group != -1 || !PyErr_Occurred()) return group; } } PyErr_Clear(); return -1; } /* Gets a MatchObject's group by object index. */ Py_LOCAL_INLINE(PyObject*) match_get_group(MatchObject* self, PyObject* index, PyObject* def, BOOL allow_neg) { /* Check that the index is an integer or a string. */ if (PyLong_Check(index) || PyUnicode_Check(index) || PyBytes_Check(index)) return match_get_group_by_index(self, match_get_group_index(self, index, allow_neg), def); set_error(RE_ERROR_GROUP_INDEX_TYPE, index); return NULL; } /* Gets info from a MatchObject by object index. */ Py_LOCAL_INLINE(PyObject*) get_by_arg(MatchObject* self, PyObject* index, RE_GetByIndexFunc get_by_index) { /* Check that the index is an integer or a string. */ if (PyLong_Check(index) || PyUnicode_Check(index) || PyBytes_Check(index)) return get_by_index(self, match_get_group_index(self, index, FALSE)); set_error(RE_ERROR_GROUP_INDEX_TYPE, index); return NULL; } /* MatchObject's 'group' method. */ static PyObject* match_group(MatchObject* self, PyObject* args) { Py_ssize_t size; PyObject* result; Py_ssize_t i; size = PyTuple_GET_SIZE(args); switch (size) { case 0: /* group() */ result = match_get_group_by_index(self, 0, Py_None); break; case 1: /* group(x). PyTuple_GET_ITEM borrows the reference. */ result = match_get_group(self, PyTuple_GET_ITEM(args, 0), Py_None, FALSE); break; default: /* group(x, y, z, ...) */ /* Fetch multiple items. 
*/ result = PyTuple_New(size); if (!result) return NULL; for (i = 0; i < size; i++) { PyObject* item; /* PyTuple_GET_ITEM borrows the reference. */ item = match_get_group(self, PyTuple_GET_ITEM(args, i), Py_None, FALSE); if (!item) { Py_DECREF(result); return NULL; } /* PyTuple_SET_ITEM steals the reference. */ PyTuple_SET_ITEM(result, i, item); } break; } return result; } /* Generic method for getting info from a MatchObject. */ Py_LOCAL_INLINE(PyObject*) get_from_match(MatchObject* self, PyObject* args, RE_GetByIndexFunc get_by_index) { Py_ssize_t size; PyObject* result; Py_ssize_t i; size = PyTuple_GET_SIZE(args); switch (size) { case 0: /* get() */ result = get_by_index(self, 0); break; case 1: /* get(x). PyTuple_GET_ITEM borrows the reference. */ result = get_by_arg(self, PyTuple_GET_ITEM(args, 0), get_by_index); break; default: /* get(x, y, z, ...) */ /* Fetch multiple items. */ result = PyTuple_New(size); if (!result) return NULL; for (i = 0; i < size; i++) { PyObject* item; /* PyTuple_GET_ITEM borrows the reference. */ item = get_by_arg(self, PyTuple_GET_ITEM(args, i), get_by_index); if (!item) { Py_DECREF(result); return NULL; } /* PyTuple_SET_ITEM steals the reference. */ PyTuple_SET_ITEM(result, i, item); } break; } return result; } /* MatchObject's 'start' method. */ static PyObject* match_start(MatchObject* self, PyObject* args) { return get_from_match(self, args, match_get_start_by_index); } /* MatchObject's 'starts' method. */ static PyObject* match_starts(MatchObject* self, PyObject* args) { return get_from_match(self, args, match_get_starts_by_index); } /* MatchObject's 'end' method. */ static PyObject* match_end(MatchObject* self, PyObject* args) { return get_from_match(self, args, match_get_end_by_index); } /* MatchObject's 'ends' method. */ static PyObject* match_ends(MatchObject* self, PyObject* args) { return get_from_match(self, args, match_get_ends_by_index); } /* MatchObject's 'span' method.
*/ static PyObject* match_span(MatchObject* self, PyObject* args) { return get_from_match(self, args, match_get_span_by_index); } /* MatchObject's 'spans' method. */ static PyObject* match_spans(MatchObject* self, PyObject* args) { return get_from_match(self, args, match_get_spans_by_index); } /* MatchObject's 'captures' method. */ static PyObject* match_captures(MatchObject* self, PyObject* args) { return get_from_match(self, args, match_get_captures_by_index); } /* MatchObject's 'groups' method. */ static PyObject* match_groups(MatchObject* self, PyObject* args, PyObject* kwargs) { PyObject* result; size_t g; PyObject* def = Py_None; static char* kwlist[] = { "default", NULL }; if (!PyArg_ParseTupleAndKeywords(args, kwargs, "|O:groups", kwlist, &def)) return NULL; result = PyTuple_New((Py_ssize_t)self->group_count); if (!result) return NULL; /* Group 0 is the entire matched portion of the string. */ for (g = 0; g < self->group_count; g++) { PyObject* item; item = match_get_group_by_index(self, (Py_ssize_t)g + 1, def); if (!item) goto error; /* PyTuple_SET_ITEM steals the reference. */ PyTuple_SET_ITEM(result, g, item); } return result; error: Py_DECREF(result); return NULL; } /* MatchObject's 'groupdict' method. */ static PyObject* match_groupdict(MatchObject* self, PyObject* args, PyObject* kwargs) { PyObject* result; PyObject* keys; Py_ssize_t g; PyObject* def = Py_None; static char* kwlist[] = { "default", NULL }; if (!PyArg_ParseTupleAndKeywords(args, kwargs, "|O:groupdict", kwlist, &def)) return NULL; result = PyDict_New(); if (!result || !self->pattern->groupindex) return result; keys = PyMapping_Keys(self->pattern->groupindex); if (!keys) goto failed; for (g = 0; g < PyList_GET_SIZE(keys); g++) { PyObject* key; PyObject* value; int status; /* PyList_GET_ITEM borrows a reference.
*/ key = PyList_GET_ITEM(keys, g); if (!key) goto failed; value = match_get_group(self, key, def, FALSE); if (!value) goto failed; status = PyDict_SetItem(result, key, value); Py_DECREF(value); if (status < 0) goto failed; } Py_DECREF(keys); return result; failed: Py_XDECREF(keys); Py_DECREF(result); return NULL; } /* MatchObject's 'capturesdict' method. */ static PyObject* match_capturesdict(MatchObject* self) { PyObject* result; PyObject* keys; Py_ssize_t g; result = PyDict_New(); if (!result || !self->pattern->groupindex) return result; keys = PyMapping_Keys(self->pattern->groupindex); if (!keys) goto failed; for (g = 0; g < PyList_GET_SIZE(keys); g++) { PyObject* key; Py_ssize_t group; PyObject* captures; int status; /* PyList_GET_ITEM borrows a reference. */ key = PyList_GET_ITEM(keys, g); if (!key) goto failed; group = match_get_group_index(self, key, FALSE); if (group < 0) goto failed; captures = match_get_captures_by_index(self, group); if (!captures) goto failed; status = PyDict_SetItem(result, key, captures); Py_DECREF(captures); if (status < 0) goto failed; } Py_DECREF(keys); return result; failed: Py_XDECREF(keys); Py_DECREF(result); return NULL; } /* Gets a Python object by name from a named module. */ Py_LOCAL_INLINE(PyObject*) get_object(char* module_name, char* object_name) { PyObject* module; PyObject* object; module = PyImport_ImportModule(module_name); if (!module) return NULL; object = PyObject_GetAttrString(module, object_name); Py_DECREF(module); return object; } /* Calls a function in a module. */ Py_LOCAL_INLINE(PyObject*) call(char* module_name, char* function_name, PyObject* args) { PyObject* function; PyObject* result; if (!args) return NULL; function = get_object(module_name, function_name); if (!function) return NULL; result = PyObject_CallObject(function, args); Py_DECREF(function); Py_DECREF(args); return result; } /* Gets a replacement item from the replacement list. * * The replacement item could be a string literal or a group. 
*/ Py_LOCAL_INLINE(PyObject*) get_match_replacement(MatchObject* self, PyObject* item, size_t group_count) { Py_ssize_t index; if (PyUnicode_Check(item) || PyBytes_Check(item)) { /* It's a literal, which can be added directly to the list. */ #if PY_VERSION_HEX >= 0x03040000 /* ensure_immutable will DECREF the original item if it has to make an * immutable copy, but that original item might have a borrowed * reference, so we must INCREF it first in order to ensure it won't be * destroyed. */ Py_INCREF(item); item = ensure_immutable(item); #else Py_INCREF(item); #endif return item; } /* Is it a group reference? */ index = as_group_index(item); if (index == -1 && PyErr_Occurred()) { /* Not a group either! */ set_error(RE_ERROR_REPLACEMENT, NULL); return NULL; } if (index == 0) { /* The entire matched portion of the string. */ return get_slice(self->substring, self->match_start - self->substring_offset, self->match_end - self->substring_offset); } else if (index >= 1 && (size_t)index <= group_count) { /* A group. If it didn't match then return None instead. */ RE_GroupData* group; group = &self->groups[index - 1]; if (group->capture_count > 0) return get_slice(self->substring, group->span.start - self->substring_offset, group->span.end - self->substring_offset); else { Py_INCREF(Py_None); return Py_None; } } else { /* No such group. */ set_error(RE_ERROR_NO_SUCH_GROUP, NULL); return NULL; } } /* Initialises the join list. */ Py_LOCAL_INLINE(void) init_join_list(JoinInfo* join_info, BOOL reversed, BOOL is_unicode) { join_info->list = NULL; join_info->item = NULL; join_info->reversed = reversed; join_info->is_unicode = is_unicode; } /* Adds an item to the join list. 
*/ Py_LOCAL_INLINE(int) add_to_join_list(JoinInfo* join_info, PyObject* item) { PyObject* new_item; int status; if (join_info->is_unicode) { #if PY_VERSION_HEX >= 0x03040000 if (PyUnicode_CheckExact(item)) { #else if (PyUnicode_Check(item)) { #endif new_item = item; Py_INCREF(new_item); } else { new_item = PyUnicode_FromObject(item); if (!new_item) { set_error(RE_ERROR_NOT_UNICODE, item); return RE_ERROR_NOT_UNICODE; } } } else { #if PY_VERSION_HEX >= 0x03040000 if (PyBytes_CheckExact(item)) { #else if (PyBytes_Check(item)) { #endif new_item = item; Py_INCREF(new_item); } else { new_item = PyBytes_FromObject(item); if (!new_item) { set_error(RE_ERROR_NOT_BYTES, item); return RE_ERROR_NOT_BYTES; } } } /* If the list already exists then just add the item to it. */ if (join_info->list) { status = PyList_Append(join_info->list, new_item); if (status < 0) goto error; Py_DECREF(new_item); return status; } /* If we already have an item then we now have 2 items and we need to put * them into a list. */ if (join_info->item) { join_info->list = PyList_New(2); if (!join_info->list) { status = RE_ERROR_MEMORY; goto error; } /* PyList_SET_ITEM steals the reference to the first item. */ PyList_SET_ITEM(join_info->list, 0, join_info->item); join_info->item = NULL; /* PyList_SET_ITEM steals the reference to the new item. */ PyList_SET_ITEM(join_info->list, 1, new_item); return 0; } /* This is the first item. */ join_info->item = new_item; return 0; error: Py_DECREF(new_item); set_error(status, NULL); return status; } /* Clears the join list. */ Py_LOCAL_INLINE(void) clear_join_list(JoinInfo* join_info) { Py_XDECREF(join_info->list); Py_XDECREF(join_info->item); } /* Joins together a list of strings for pattern_subx. */ Py_LOCAL_INLINE(PyObject*) join_list_info(JoinInfo* join_info) { /* If the list already exists then just do the join. */ if (join_info->list) { PyObject* joiner; PyObject* result; if (join_info->reversed) /* The list needs to be reversed before being joined. 
*/ PyList_Reverse(join_info->list); if (join_info->is_unicode) { /* Concatenate the Unicode strings. */ joiner = PyUnicode_FromUnicode(NULL, 0); if (!joiner) { clear_join_list(join_info); return NULL; } result = PyUnicode_Join(joiner, join_info->list); } else { joiner = PyBytes_FromString(""); if (!joiner) { clear_join_list(join_info); return NULL; } /* Concatenate the bytestrings. */ result = _PyBytes_Join(joiner, join_info->list); } Py_DECREF(joiner); clear_join_list(join_info); return result; } /* If we have only 1 item, we'll just return it. */ if (join_info->item) return join_info->item; /* There are no items, so return an empty string. */ if (join_info->is_unicode) return PyUnicode_FromUnicode(NULL, 0); else return PyBytes_FromString(""); } /* Checks whether a string replacement is a literal. * * To keep it simple we'll say that a literal is a string which can be used * as-is, i.e. one that doesn't contain the special character. * * Returns its length if it is a literal, otherwise -1. */ Py_LOCAL_INLINE(Py_ssize_t) check_replacement_string(PyObject* str_replacement, unsigned char special_char) { RE_StringInfo str_info; Py_UCS4 (*char_at)(void* text, Py_ssize_t pos); Py_ssize_t pos; if (!get_string(str_replacement, &str_info)) return -1; switch (str_info.charsize) { case 1: char_at = bytes1_char_at; break; case 2: char_at = bytes2_char_at; break; case 4: char_at = bytes4_char_at; break; default: release_buffer(&str_info); return -1; } for (pos = 0; pos < str_info.length; pos++) { if (char_at(str_info.characters, pos) == special_char) { release_buffer(&str_info); return -1; } } release_buffer(&str_info); return str_info.length; } /* MatchObject's 'expand' method. */ static PyObject* match_expand(MatchObject* self, PyObject* str_template) { Py_ssize_t literal_length; PyObject* replacement; JoinInfo join_info; Py_ssize_t size; Py_ssize_t i; /* Is the template just a literal? */ literal_length = check_replacement_string(str_template, '\\'); if (literal_length >= 0) { /* It's a literal. 
*/ Py_INCREF(str_template); return str_template; } /* Hand the template to the template compiler. */ replacement = call(RE_MODULE, "_compile_replacement_helper", PyTuple_Pack(2, self->pattern, str_template)); if (!replacement) return NULL; init_join_list(&join_info, FALSE, PyUnicode_Check(self->string)); /* Add each part of the template to the list. */ size = PyList_GET_SIZE(replacement); for (i = 0; i < size; i++) { PyObject* item; PyObject* str_item; /* PyList_GET_ITEM borrows a reference. */ item = PyList_GET_ITEM(replacement, i); str_item = get_match_replacement(self, item, self->group_count); if (!str_item) goto error; /* Add to the list. */ if (str_item == Py_None) Py_DECREF(str_item); else { int status; status = add_to_join_list(&join_info, str_item); Py_DECREF(str_item); if (status < 0) goto error; } } Py_DECREF(replacement); /* Convert the list to a single string (also cleans up join_info). */ return join_list_info(&join_info); error: clear_join_list(&join_info); Py_DECREF(replacement); return NULL; } /* Gets a MatchObject's group dictionary. */ Py_LOCAL_INLINE(PyObject*) match_get_group_dict(MatchObject* self) { PyObject* result; PyObject* keys; Py_ssize_t g; result = PyDict_New(); if (!result || !self->pattern->groupindex) return result; keys = PyMapping_Keys(self->pattern->groupindex); if (!keys) goto failed; for (g = 0; g < PyList_GET_SIZE(keys); g++) { PyObject* key; PyObject* value; int status; /* PyList_GET_ITEM borrows a reference. */ key = PyList_GET_ITEM(keys, g); if (!key) goto failed; value = match_get_group(self, key, Py_None, FALSE); if (!value) goto failed; status = PyDict_SetItem(result, key, value); Py_DECREF(value); if (status < 0) goto failed; } Py_DECREF(keys); return result; failed: Py_XDECREF(keys); Py_DECREF(result); return NULL; } static PyTypeObject Capture_Type = { PyVarObject_HEAD_INIT(NULL,0) "_" RE_MODULE "." "Capture", sizeof(MatchObject) }; /* Creates a new CaptureObject. 
*/ Py_LOCAL_INLINE(PyObject*) make_capture_object(MatchObject** match_indirect, Py_ssize_t index) { CaptureObject* capture; capture = PyObject_NEW(CaptureObject, &Capture_Type); if (!capture) return NULL; capture->group_index = index; capture->match_indirect = match_indirect; return (PyObject*)capture; } /* Makes a MatchObject's capture dictionary. */ Py_LOCAL_INLINE(PyObject*) make_capture_dict(MatchObject* match, MatchObject** match_indirect) { PyObject* result; PyObject* keys; PyObject* values = NULL; Py_ssize_t g; result = PyDict_New(); if (!result) return result; keys = PyMapping_Keys(match->pattern->groupindex); if (!keys) goto failed; values = PyMapping_Values(match->pattern->groupindex); if (!values) goto failed; for (g = 0; g < PyList_GET_SIZE(keys); g++) { PyObject* key; PyObject* value; Py_ssize_t v; int status; /* PyList_GET_ITEM borrows a reference. */ key = PyList_GET_ITEM(keys, g); if (!key) goto failed; /* PyList_GET_ITEM borrows a reference. */ value = PyList_GET_ITEM(values, g); if (!value) goto failed; v = PyLong_AsLong(value); if (v == -1 && PyErr_Occurred()) goto failed; value = make_capture_object(match_indirect, v); if (!value) goto failed; status = PyDict_SetItem(result, key, value); Py_DECREF(value); if (status < 0) goto failed; } Py_DECREF(values); Py_DECREF(keys); return result; failed: Py_XDECREF(values); Py_XDECREF(keys); Py_DECREF(result); return NULL; } /* MatchObject's 'expandf' method. */ static PyObject* match_expandf(MatchObject* self, PyObject* str_template) { PyObject* format_func; PyObject* args = NULL; size_t g; PyObject* kwargs = NULL; PyObject* result; format_func = PyObject_GetAttrString(str_template, "format"); if (!format_func) return NULL; args = PyTuple_New((Py_ssize_t)self->group_count + 1); if (!args) goto error; for (g = 0; g < self->group_count + 1; g++) /* PyTuple_SetItem steals the reference. 
*/ PyTuple_SetItem(args, (Py_ssize_t)g, make_capture_object(&self, (Py_ssize_t)g)); kwargs = make_capture_dict(self, &self); if (!kwargs) goto error; result = PyObject_Call(format_func, args, kwargs); Py_DECREF(kwargs); Py_DECREF(args); Py_DECREF(format_func); return result; error: Py_XDECREF(args); Py_DECREF(format_func); return NULL; } Py_LOCAL_INLINE(PyObject*) make_match_copy(MatchObject* self); /* MatchObject's '__copy__' method. */ static PyObject* match_copy(MatchObject* self, PyObject *unused) { return make_match_copy(self); } /* MatchObject's '__deepcopy__' method. */ static PyObject* match_deepcopy(MatchObject* self, PyObject* memo) { return make_match_copy(self); } /* MatchObject's 'regs' attribute. */ static PyObject* match_regs(MatchObject* self) { PyObject* regs; PyObject* item; size_t g; regs = PyTuple_New((Py_ssize_t)self->group_count + 1); if (!regs) return NULL; item = Py_BuildValue("nn", self->match_start, self->match_end); if (!item) goto error; /* PyTuple_SET_ITEM steals the reference to item. */ PyTuple_SET_ITEM(regs, 0, item); for (g = 0; g < self->group_count; g++) { RE_GroupSpan* span; span = &self->groups[g].span; item = Py_BuildValue("nn", span->start, span->end); if (!item) goto error; /* PyTuple_SET_ITEM steals the reference to item. */ PyTuple_SET_ITEM(regs, g + 1, item); } Py_INCREF(regs); self->regs = regs; return regs; error: Py_DECREF(regs); return NULL; } /* MatchObject's slice method. 
*/ Py_LOCAL_INLINE(PyObject*) match_get_group_slice(MatchObject* self, PyObject* slice) { Py_ssize_t start; Py_ssize_t end; Py_ssize_t step; Py_ssize_t slice_length; #if PY_VERSION_HEX >= 0x03020000 if (PySlice_GetIndicesEx(slice, (Py_ssize_t)self->group_count + 1, &start, &end, &step, &slice_length) < 0) #else if (PySlice_GetIndicesEx((PySliceObject*)slice, (Py_ssize_t)self->group_count + 1, &start, &end, &step, &slice_length) < 0) #endif return NULL; if (slice_length <= 0) return PyTuple_New(0); else { PyObject* result; Py_ssize_t cur; Py_ssize_t i; result = PyTuple_New(slice_length); if (!result) return NULL; cur = start; for (i = 0; i < slice_length; i++) { /* PyTuple_SetItem steals the reference. */ PyTuple_SetItem(result, i, match_get_group_by_index(self, cur, Py_None)); cur += step; } return result; } } /* MatchObject's length method. */ Py_LOCAL_INLINE(Py_ssize_t) match_length(MatchObject* self) { return (Py_ssize_t)self->group_count + 1; } /* MatchObject's '__getitem__' method. */ static PyObject* match_getitem(MatchObject* self, PyObject* item) { if (PySlice_Check(item)) return match_get_group_slice(self, item); return match_get_group(self, item, Py_None, TRUE); } /* Determines the portion of the target string which is covered by the group * captures. 
*/ Py_LOCAL_INLINE(void) determine_target_substring(MatchObject* match, Py_ssize_t* slice_start, Py_ssize_t* slice_end) { Py_ssize_t start; Py_ssize_t end; size_t g; start = match->pos; end = match->endpos; for (g = 0; g < match->group_count; g++) { RE_GroupSpan* span; size_t c; span = &match->groups[g].span; if (span->start >= 0 && span->start < start) start = span->start; if (span->end >= 0 && span->end > end) end = span->end; for (c = 0; c < match->groups[g].capture_count; c++) { RE_GroupSpan* span; /* Check each capture of the group, not just the first one. */ span = &match->groups[g].captures[c]; if (span->start >= 0 && span->start < start) start = span->start; if (span->end >= 0 && span->end > end) end = span->end; } } *slice_start = start; *slice_end = end; } /* MatchObject's 'detach_string' method. */ static PyObject* match_detach_string(MatchObject* self, PyObject* unused) { if (self->string) { Py_ssize_t start; Py_ssize_t end; PyObject* substring; determine_target_substring(self, &start, &end); substring = get_slice(self->string, start, end); if (substring) { Py_XDECREF(self->substring); self->substring = substring; self->substring_offset = start; Py_DECREF(self->string); self->string = NULL; } } Py_INCREF(Py_None); return Py_None; } /* The documentation of a MatchObject. */ PyDoc_STRVAR(match_group_doc, "group([group1, ...]) --> string or tuple of strings.\n\ Return one or more subgroups of the match. If there is a single argument,\n\ the result is a single string, or None if the group did not contribute to\n\ the match; if there are multiple arguments, the result is a tuple with one\n\ item per argument; if there are no arguments, the whole match is returned.\n\ Group 0 is the whole match."); PyDoc_STRVAR(match_start_doc, "start([group1, ...]) --> int or tuple of ints.\n\ Return the index of the start of one or more subgroups of the match. 
If\n\ there is a single argument, the result is an index, or -1 if the group did\n\ not contribute to the match; if there are multiple arguments, the result is\n\ a tuple with one item per argument; if there are no arguments, the index of\n\ the start of the whole match is returned. Group 0 is the whole match."); PyDoc_STRVAR(match_end_doc, "end([group1, ...]) --> int or tuple of ints.\n\ Return the index of the end of one or more subgroups of the match. If there\n\ is a single argument, the result is an index, or -1 if the group did not\n\ contribute to the match; if there are multiple arguments, the result is a\n\ tuple with one item per argument; if there are no arguments, the index of\n\ the end of the whole match is returned. Group 0 is the whole match."); PyDoc_STRVAR(match_span_doc, "span([group1, ...]) --> 2-tuple of int or tuple of 2-tuple of ints.\n\ Return the span (a 2-tuple of the indices of the start and end) of one or\n\ more subgroups of the match. If there is a single argument, the result is a\n\ span, or (-1, -1) if the group did not contribute to the match; if there are\n\ multiple arguments, the result is a tuple with one item per argument; if\n\ there are no arguments, the span of the whole match is returned. Group 0 is\n\ the whole match."); PyDoc_STRVAR(match_groups_doc, "groups(default=None) --> tuple of strings.\n\ Return a tuple containing all the subgroups of the match. The argument is\n\ the default for groups that did not participate in the match."); PyDoc_STRVAR(match_groupdict_doc, "groupdict(default=None) --> dict.\n\ Return a dictionary containing all the named subgroups of the match, keyed\n\ by the subgroup name. 
The argument is the value to be given for groups that\n\ did not participate in the match."); PyDoc_STRVAR(match_capturesdict_doc, "capturesdict() --> dict.\n\ Return a dictionary containing the captures of all the named subgroups of the\n\ match, keyed by the subgroup name."); PyDoc_STRVAR(match_expand_doc, "expand(template) --> string.\n\ Return the string obtained by doing backslash substitution on the template,\n\ as done by the sub() method."); PyDoc_STRVAR(match_expandf_doc, "expandf(format) --> string.\n\ Return the string obtained by using the format, as done by the subf()\n\ method."); PyDoc_STRVAR(match_captures_doc, "captures([group1, ...]) --> list of strings or tuple of list of strings.\n\ Return the captures of one or more subgroups of the match. If there is a\n\ single argument, the result is a list of strings; if there are multiple\n\ arguments, the result is a tuple of lists with one item per argument; if\n\ there are no arguments, the captures of the whole match is returned. Group\n\ 0 is the whole match."); PyDoc_STRVAR(match_starts_doc, "starts([group1, ...]) --> list of ints or tuple of list of ints.\n\ Return the indices of the starts of the captures of one or more subgroups of\n\ the match. If there is a single argument, the result is a list of indices;\n\ if there are multiple arguments, the result is a tuple of lists with one\n\ item per argument; if there are no arguments, the indices of the starts of\n\ the captures of the whole match is returned. Group 0 is the whole match."); PyDoc_STRVAR(match_ends_doc, "ends([group1, ...]) --> list of ints or tuple of list of ints.\n\ Return the indices of the ends of the captures of one or more subgroups of\n\ the match. If there is a single argument, the result is a list of indices;\n\ if there are multiple arguments, the result is a tuple of lists with one\n\ item per argument; if there are no arguments, the indices of the ends of the\n\ captures of the whole match is returned. 
Group 0 is the whole match."); PyDoc_STRVAR(match_spans_doc, "spans([group1, ...]) --> list of 2-tuple of ints or tuple of list of 2-tuple of ints.\n\ Return the spans (a 2-tuple of the indices of the start and end) of the\n\ captures of one or more subgroups of the match. If there is a single\n\ argument, the result is a list of spans; if there are multiple arguments,\n\ the result is a tuple of lists with one item per argument; if there are no\n\ arguments, the spans of the captures of the whole match is returned. Group\n\ 0 is the whole match."); PyDoc_STRVAR(match_detach_string_doc, "detach_string()\n\ Detaches the target string from the match object. The 'string' attribute\n\ will become None."); /* MatchObject's methods. */ static PyMethodDef match_methods[] = { {"group", (PyCFunction)match_group, METH_VARARGS, match_group_doc}, {"start", (PyCFunction)match_start, METH_VARARGS, match_start_doc}, {"end", (PyCFunction)match_end, METH_VARARGS, match_end_doc}, {"span", (PyCFunction)match_span, METH_VARARGS, match_span_doc}, {"groups", (PyCFunction)match_groups, METH_VARARGS|METH_KEYWORDS, match_groups_doc}, {"groupdict", (PyCFunction)match_groupdict, METH_VARARGS|METH_KEYWORDS, match_groupdict_doc}, {"capturesdict", (PyCFunction)match_capturesdict, METH_NOARGS, match_capturesdict_doc}, {"expand", (PyCFunction)match_expand, METH_O, match_expand_doc}, {"expandf", (PyCFunction)match_expandf, METH_O, match_expandf_doc}, {"captures", (PyCFunction)match_captures, METH_VARARGS, match_captures_doc}, {"starts", (PyCFunction)match_starts, METH_VARARGS, match_starts_doc}, {"ends", (PyCFunction)match_ends, METH_VARARGS, match_ends_doc}, {"spans", (PyCFunction)match_spans, METH_VARARGS, match_spans_doc}, {"detach_string", (PyCFunction)match_detach_string, METH_NOARGS, match_detach_string_doc}, {"__copy__", (PyCFunction)match_copy, METH_NOARGS}, {"__deepcopy__", (PyCFunction)match_deepcopy, METH_O}, {"__getitem__", (PyCFunction)match_getitem, METH_O|METH_COEXIST}, {NULL, NULL} 
}; PyDoc_STRVAR(match_doc, "Match object"); /* MatchObject's 'lastindex' attribute. */ static PyObject* match_lastindex(PyObject* self_) { MatchObject* self; self = (MatchObject*)self_; if (self->lastindex >= 0) return Py_BuildValue("n", self->lastindex); Py_INCREF(Py_None); return Py_None; } /* MatchObject's 'lastgroup' attribute. */ static PyObject* match_lastgroup(PyObject* self_) { MatchObject* self; self = (MatchObject*)self_; if (self->pattern->indexgroup && self->lastgroup >= 0) { PyObject* index; PyObject* result; index = Py_BuildValue("n", self->lastgroup); /* PyDict_GetItem returns a borrowed reference. */ result = PyDict_GetItem(self->pattern->indexgroup, index); Py_DECREF(index); if (result) { Py_INCREF(result); return result; } PyErr_Clear(); } Py_INCREF(Py_None); return Py_None; } /* MatchObject's 'string' attribute. */ static PyObject* match_string(PyObject* self_) { MatchObject* self; self = (MatchObject*)self_; if (self->string) { Py_INCREF(self->string); return self->string; } else { Py_INCREF(Py_None); return Py_None; } } /* MatchObject's 'fuzzy_counts' attribute. 
*/ static PyObject* match_fuzzy_counts(PyObject* self_) { MatchObject* self; self = (MatchObject*)self_; return Py_BuildValue("nnn", self->fuzzy_counts[RE_FUZZY_SUB], self->fuzzy_counts[RE_FUZZY_INS], self->fuzzy_counts[RE_FUZZY_DEL]); } static PyGetSetDef match_getset[] = { {"lastindex", (getter)match_lastindex, (setter)NULL, "The group number of the last matched capturing group, or None."}, {"lastgroup", (getter)match_lastgroup, (setter)NULL, "The name of the last matched capturing group, or None."}, {"regs", (getter)match_regs, (setter)NULL, "A tuple of the spans of the capturing groups."}, {"string", (getter)match_string, (setter)NULL, "The string that was searched, or None if it has been detached."}, {"fuzzy_counts", (getter)match_fuzzy_counts, (setter)NULL, "A tuple of the number of substitutions, insertions and deletions."}, {NULL} /* Sentinel */ }; static PyMemberDef match_members[] = { {"re", T_OBJECT, offsetof(MatchObject, pattern), READONLY, "The regex object that produced this match object."}, {"pos", T_PYSSIZET, offsetof(MatchObject, pos), READONLY, "The position at which the regex engine started searching."}, {"endpos", T_PYSSIZET, offsetof(MatchObject, endpos), READONLY, "The final position beyond which the regex engine won't search."}, {"partial", T_BOOL, offsetof(MatchObject, partial), READONLY, "Whether it's a partial match."}, {NULL} /* Sentinel */ }; static PyMappingMethods match_as_mapping = { (lenfunc)match_length, /* mp_length */ (binaryfunc)match_getitem, /* mp_subscript */ 0, /* mp_ass_subscript */ }; static PyTypeObject Match_Type = { PyVarObject_HEAD_INIT(NULL,0) "_" RE_MODULE "." "Match", sizeof(MatchObject) }; /* Copies the groups. */ Py_LOCAL_INLINE(RE_GroupData*) copy_groups(RE_GroupData* groups, size_t group_count) { size_t span_count; size_t g; RE_GroupData* groups_copy; RE_GroupSpan* spans_copy; size_t offset; /* Calculate the total size of the group info. 
*/ span_count = 0; for (g = 0; g < group_count; g++) span_count += groups[g].capture_count; /* Allocate the storage for the group info in a single block. */ groups_copy = (RE_GroupData*)re_alloc(group_count * sizeof(RE_GroupData) + span_count * sizeof(RE_GroupSpan)); if (!groups_copy) return NULL; /* The storage for the spans comes after the other group info. */ spans_copy = (RE_GroupSpan*)&groups_copy[group_count]; /* There's no need to initialise the spans info. */ memset(groups_copy, 0, group_count * sizeof(RE_GroupData)); offset = 0; for (g = 0; g < group_count; g++) { RE_GroupData* orig; RE_GroupData* copy; orig = &groups[g]; copy = &groups_copy[g]; copy->span = orig->span; copy->captures = &spans_copy[offset]; offset += orig->capture_count; if (orig->capture_count > 0) { Py_MEMCPY(copy->captures, orig->captures, orig->capture_count * sizeof(RE_GroupSpan)); copy->capture_capacity = orig->capture_count; copy->capture_count = orig->capture_count; } } return groups_copy; } /* Makes a copy of a MatchObject. */ Py_LOCAL_INLINE(PyObject*) make_match_copy(MatchObject* self) { MatchObject* match; if (!self->string) { /* The target string has been detached, so the MatchObject is now * immutable. */ Py_INCREF(self); return (PyObject*)self; } /* Create a MatchObject. */ match = PyObject_NEW(MatchObject, &Match_Type); if (!match) return NULL; Py_MEMCPY(match, self, sizeof(MatchObject)); Py_INCREF(match->string); Py_INCREF(match->substring); Py_INCREF(match->pattern); /* Copy the groups to the MatchObject. */ if (self->group_count > 0) { match->groups = copy_groups(self->groups, self->group_count); if (!match->groups) { Py_DECREF(match); return NULL; } } return (PyObject*)match; } /* Creates a new MatchObject. */ Py_LOCAL_INLINE(PyObject*) pattern_new_match(PatternObject* pattern, RE_State* state, int status) { /* Create MatchObject (from state object). */ if (status > 0 || status == RE_ERROR_PARTIAL) { MatchObject* match; /* Create a MatchObject. 
*/ match = PyObject_NEW(MatchObject, &Match_Type); if (!match) return NULL; match->string = state->string; match->substring = state->string; match->substring_offset = 0; match->pattern = pattern; match->regs = NULL; if (pattern->is_fuzzy) { match->fuzzy_counts[RE_FUZZY_SUB] = state->total_fuzzy_counts[RE_FUZZY_SUB]; match->fuzzy_counts[RE_FUZZY_INS] = state->total_fuzzy_counts[RE_FUZZY_INS]; match->fuzzy_counts[RE_FUZZY_DEL] = state->total_fuzzy_counts[RE_FUZZY_DEL]; } else memset(match->fuzzy_counts, 0, sizeof(match->fuzzy_counts)); match->partial = status == RE_ERROR_PARTIAL; Py_INCREF(match->string); Py_INCREF(match->substring); Py_INCREF(match->pattern); /* Copy the groups to the MatchObject. */ if (pattern->public_group_count > 0) { match->groups = copy_groups(state->groups, pattern->public_group_count); if (!match->groups) { Py_DECREF(match); return NULL; } } else match->groups = NULL; match->group_count = pattern->public_group_count; match->pos = state->slice_start; match->endpos = state->slice_end; if (state->reverse) { match->match_start = state->text_pos; match->match_end = state->match_pos; } else { match->match_start = state->match_pos; match->match_end = state->text_pos; } match->lastindex = state->lastindex; match->lastgroup = state->lastgroup; return (PyObject*)match; } else if (status == 0) { /* No match. */ Py_INCREF(Py_None); return Py_None; } else { /* Internal error. */ set_error(status, NULL); return NULL; } } /* Gets the text of a capture group from a state. */ Py_LOCAL_INLINE(PyObject*) state_get_group(RE_State* state, Py_ssize_t index, PyObject* string, BOOL empty) { RE_GroupData* group; Py_ssize_t start; Py_ssize_t end; group = &state->groups[index - 1]; if (string != Py_None && index >= 1 && (size_t)index <= state->pattern->public_group_count && group->capture_count > 0) { start = group->span.start; end = group->span.end; } else { if (empty) /* Want an empty string. 
*/ start = end = 0; else { Py_INCREF(Py_None); return Py_None; } } return get_slice(string, start, end); } /* Acquires the lock (mutex) on the state if there's one. * * It also increments the owner's refcount just to ensure that it won't be * destroyed by another thread. */ Py_LOCAL_INLINE(void) acquire_state_lock(PyObject* owner, RE_SafeState* safe_state) { RE_State* state; state = safe_state->re_state; if (state->lock) { /* In order to avoid deadlock we need to release the GIL while trying * to acquire the lock. */ Py_INCREF(owner); if (!PyThread_acquire_lock(state->lock, 0)) { release_GIL(safe_state); PyThread_acquire_lock(state->lock, 1); acquire_GIL(safe_state); } } } /* Releases the lock (mutex) on the state if there's one. * * It also decrements the owner's refcount, which was incremented when the lock * was acquired. */ Py_LOCAL_INLINE(void) release_state_lock(PyObject* owner, RE_SafeState* safe_state) { RE_State* state; state = safe_state->re_state; if (state->lock) { PyThread_release_lock(state->lock); Py_DECREF(owner); } } /* Implements the functionality of ScannerObject's search and match methods. */ Py_LOCAL_INLINE(PyObject*) scanner_search_or_match(ScannerObject* self, BOOL search) { RE_State* state; RE_SafeState safe_state; PyObject* match; state = &self->state; /* Initialise the "safe state" structure. */ safe_state.re_state = state; safe_state.thread_state = NULL; /* Acquire the state lock in case we're sharing the scanner object across * threads. */ acquire_state_lock((PyObject*)self, &safe_state); if (self->status == RE_ERROR_FAILURE || self->status == RE_ERROR_PARTIAL) { /* No or partial match. */ release_state_lock((PyObject*)self, &safe_state); Py_INCREF(Py_None); return Py_None; } else if (self->status < 0) { /* Internal error. */ release_state_lock((PyObject*)self, &safe_state); set_error(self->status, NULL); return NULL; } /* Look for another match. 
*/ self->status = do_match(&safe_state, search); if (self->status >= 0 || self->status == RE_ERROR_PARTIAL) { /* Create the match object. */ match = pattern_new_match(self->pattern, state, self->status); if (search && state->overlapped) { /* Advance one character. */ Py_ssize_t step; step = state->reverse ? -1 : 1; state->text_pos = state->match_pos + step; state->must_advance = FALSE; } else /* Continue from where we left off, but don't allow 2 contiguous * zero-width matches. */ state->must_advance = state->text_pos == state->match_pos; } else /* Internal error. */ match = NULL; /* Release the state lock. */ release_state_lock((PyObject*)self, &safe_state); return match; } /* ScannerObject's 'match' method. */ static PyObject* scanner_match(ScannerObject* self, PyObject* unused) { return scanner_search_or_match(self, FALSE); } /* ScannerObject's 'search' method. */ static PyObject* scanner_search(ScannerObject* self, PyObject *unused) { return scanner_search_or_match(self, TRUE); } /* Returns an iterator for a ScannerObject. * * The iterator is actually the ScannerObject itself. */ static PyObject* scanner_iter(PyObject* self) { Py_INCREF(self); return self; } /* Gets the next result from a scanner iterator. */ static PyObject* scanner_iternext(PyObject* self) { PyObject* match; match = scanner_search((ScannerObject*)self, NULL); if (match == Py_None) { /* No match. */ Py_DECREF(match); return NULL; } return match; } /* Makes a copy of a ScannerObject. * * It actually doesn't make a copy, just returns the original object. */ Py_LOCAL_INLINE(PyObject*) make_scanner_copy(ScannerObject* self) { Py_INCREF(self); return (PyObject*)self; } /* ScannerObject's '__copy__' method. */ static PyObject* scanner_copy(ScannerObject* self, PyObject *unused) { return make_scanner_copy(self); } /* ScannerObject's '__deepcopy__' method. 
*/ static PyObject* scanner_deepcopy(ScannerObject* self, PyObject* memo) { return make_scanner_copy(self); } /* The documentation of a ScannerObject. */ PyDoc_STRVAR(scanner_match_doc, "match() --> MatchObject or None.\n\ Match at the current position in the string."); PyDoc_STRVAR(scanner_search_doc, "search() --> MatchObject or None.\n\ Search from the current position in the string."); /* ScannerObject's methods. */ static PyMethodDef scanner_methods[] = { {"match", (PyCFunction)scanner_match, METH_NOARGS, scanner_match_doc}, {"search", (PyCFunction)scanner_search, METH_NOARGS, scanner_search_doc}, {"__copy__", (PyCFunction)scanner_copy, METH_NOARGS}, {"__deepcopy__", (PyCFunction)scanner_deepcopy, METH_O}, {NULL, NULL} }; PyDoc_STRVAR(scanner_doc, "Scanner object"); /* Deallocates a ScannerObject. */ static void scanner_dealloc(PyObject* self_) { ScannerObject* self; self = (ScannerObject*)self_; state_fini(&self->state); Py_DECREF(self->pattern); PyObject_DEL(self); } static PyMemberDef scanner_members[] = { {"pattern", T_OBJECT, offsetof(ScannerObject, pattern), READONLY, "The regex object that produced this scanner object."}, {NULL} /* Sentinel */ }; static PyTypeObject Scanner_Type = { PyVarObject_HEAD_INIT(NULL, 0) "_" RE_MODULE "." "Scanner", sizeof(ScannerObject) }; /* Decodes a 'concurrent' argument. */ Py_LOCAL_INLINE(int) decode_concurrent(PyObject* concurrent) { Py_ssize_t value; if (concurrent == Py_None) return RE_CONC_DEFAULT; value = PyLong_AsLong(concurrent); if (value == -1 && PyErr_Occurred()) { set_error(RE_ERROR_CONCURRENT, NULL); return -1; } return value ? RE_CONC_YES : RE_CONC_NO; } /* Decodes a 'partial' argument. */ Py_LOCAL_INLINE(BOOL) decode_partial(PyObject* partial) { Py_ssize_t value; if (partial == Py_False) return FALSE; if (partial == Py_True) return TRUE; value = PyLong_AsLong(partial); if (value == -1 && PyErr_Occurred()) { PyErr_Clear(); return TRUE; } return value != 0; } /* Creates a new ScannerObject. 
*/ static PyObject* pattern_scanner(PatternObject* pattern, PyObject* args, PyObject* kwargs) { /* Create search state object. */ ScannerObject* self; Py_ssize_t start; Py_ssize_t end; int conc; BOOL part; PyObject* string; PyObject* pos = Py_None; PyObject* endpos = Py_None; Py_ssize_t overlapped = FALSE; PyObject* concurrent = Py_None; PyObject* partial = Py_False; static char* kwlist[] = { "string", "pos", "endpos", "overlapped", "concurrent", "partial", NULL }; if (!PyArg_ParseTupleAndKeywords(args, kwargs, "O|OOnOO:scanner", kwlist, &string, &pos, &endpos, &overlapped, &concurrent, &partial)) return NULL; start = as_string_index(pos, 0); if (start == -1 && PyErr_Occurred()) return NULL; end = as_string_index(endpos, PY_SSIZE_T_MAX); if (end == -1 && PyErr_Occurred()) return NULL; conc = decode_concurrent(concurrent); if (conc < 0) return NULL; part = decode_partial(partial); /* Create a scanner object. */ self = PyObject_NEW(ScannerObject, &Scanner_Type); if (!self) return NULL; self->pattern = pattern; Py_INCREF(self->pattern); /* The MatchObject, and therefore repeated captures, will be visible. */ if (!state_init(&self->state, pattern, string, start, end, overlapped != 0, conc, part, TRUE, TRUE, FALSE)) { PyObject_DEL(self); return NULL; } self->status = RE_ERROR_SUCCESS; return (PyObject*) self; } /* Performs the split for the SplitterObject. */ Py_LOCAL_INLINE(PyObject*) next_split_part(SplitterObject* self) { RE_State* state; RE_SafeState safe_state; PyObject* result = NULL; /* Initialise to stop compiler warning. */ state = &self->state; /* Initialise the "safe state" structure. */ safe_state.re_state = state; safe_state.thread_state = NULL; /* Acquire the state lock in case we're sharing the splitter object across * threads. */ acquire_state_lock((PyObject*)self, &safe_state); if (self->status == RE_ERROR_FAILURE || self->status == RE_ERROR_PARTIAL) { /* Finished. 
*/ release_state_lock((PyObject*)self, &safe_state); result = Py_False; Py_INCREF(result); return result; } else if (self->status < 0) { /* Internal error. */ release_state_lock((PyObject*)self, &safe_state); set_error(self->status, NULL); return NULL; } if (self->index == 0) { if (self->split_count < self->maxsplit) { Py_ssize_t step; Py_ssize_t end_pos; if (state->reverse) { step = -1; end_pos = state->slice_start; } else { step = 1; end_pos = state->slice_end; } retry: self->status = do_match(&safe_state, TRUE); if (self->status < 0) goto error; if (self->status == RE_ERROR_SUCCESS) { if (state->version_0) { /* Version 0 behaviour is to advance one character if the * split was zero-width. Unfortunately, this can give an * incorrect result. GvR wants this behaviour to be * retained so as not to break any existing software which * might rely on it. */ if (state->text_pos == state->match_pos) { if (self->last_pos == end_pos) goto no_match; /* Advance one character. */ state->text_pos += step; state->must_advance = FALSE; goto retry; } } ++self->split_count; /* Get segment before this match. */ if (state->reverse) result = get_slice(state->string, state->match_pos, self->last_pos); else result = get_slice(state->string, self->last_pos, state->match_pos); if (!result) goto error; self->last_pos = state->text_pos; /* Version 0 behaviour is to advance one character if the match * was zero-width. Unfortunately, this can give an incorrect * result. GvR wants this behaviour to be retained so as not to * break any existing software which might rely on it. */ if (state->version_0) { if (state->text_pos == state->match_pos) /* Advance one character. */ state->text_pos += step; state->must_advance = FALSE; } else /* Continue from where we left off, but don't allow a * contiguous zero-width match. 
*/ state->must_advance = TRUE; } } else goto no_match; if (self->status == RE_ERROR_FAILURE || self->status == RE_ERROR_PARTIAL) { no_match: /* Get segment following last match (even if empty). */ if (state->reverse) result = get_slice(state->string, 0, self->last_pos); else result = get_slice(state->string, self->last_pos, state->text_length); if (!result) goto error; } } else { /* Add group. */ result = state_get_group(state, self->index, state->string, FALSE); if (!result) goto error; } ++self->index; if ((size_t)self->index > state->pattern->public_group_count) self->index = 0; /* Release the state lock. */ release_state_lock((PyObject*)self, &safe_state); return result; error: /* Release the state lock. */ release_state_lock((PyObject*)self, &safe_state); return NULL; } /* SplitterObject's 'split' method. */ static PyObject* splitter_split(SplitterObject* self, PyObject *unused) { PyObject* result; result = next_split_part(self); if (result == Py_False) { /* The sentinel. */ Py_DECREF(Py_False); Py_INCREF(Py_None); return Py_None; } return result; } /* Returns an iterator for a SplitterObject. * * The iterator is actually the SplitterObject itself. */ static PyObject* splitter_iter(PyObject* self) { Py_INCREF(self); return self; } /* Gets the next result from a splitter iterator. */ static PyObject* splitter_iternext(PyObject* self) { PyObject* result; result = next_split_part((SplitterObject*)self); if (result == Py_False) { /* No match. */ Py_DECREF(result); return NULL; } return result; } /* Makes a copy of a SplitterObject. * * It actually doesn't make a copy, just returns the original object. */ Py_LOCAL_INLINE(PyObject*) make_splitter_copy(SplitterObject* self) { Py_INCREF(self); return (PyObject*)self; } /* SplitterObject's '__copy__' method. */ static PyObject* splitter_copy(SplitterObject* self, PyObject *unused) { return make_splitter_copy(self); } /* SplitterObject's '__deepcopy__' method. 
*/
static PyObject* splitter_deepcopy(SplitterObject* self, PyObject* memo) {
    return make_splitter_copy(self);
}

/* The documentation of a SplitterObject. */
PyDoc_STRVAR(splitter_split_doc,
    "split() --> string or None.\n\
    Return the next part of the split string.");

/* SplitterObject's methods. */
static PyMethodDef splitter_methods[] = {
    {"split", (PyCFunction)splitter_split, METH_NOARGS, splitter_split_doc},
    {"__copy__", (PyCFunction)splitter_copy, METH_NOARGS},
    {"__deepcopy__", (PyCFunction)splitter_deepcopy, METH_O},
    {NULL, NULL}
};

PyDoc_STRVAR(splitter_doc, "Splitter object");

/* Deallocates a SplitterObject. */
static void splitter_dealloc(PyObject* self_) {
    SplitterObject* self;

    self = (SplitterObject*)self_;

    state_fini(&self->state);
    Py_DECREF(self->pattern);
    PyObject_DEL(self);
}

/* Converts a captures index to an integer.
 *
 * A negative capture index in 'expandf' and 'subf' is passed as a string
 * because negative indexes are not supported by 'str.format'.
 */
Py_LOCAL_INLINE(Py_ssize_t) index_to_integer(PyObject* item) {
    Py_ssize_t value;

    value = PyLong_AsLong(item);
    if (value != -1 || !PyErr_Occurred())
        return value;

    PyErr_Clear();

    /* Is the index a string representation of an integer?
*/
    if (PyUnicode_Check(item)) {
        PyObject* int_obj;
#if PY_VERSION_HEX < 0x03030000
        Py_UNICODE* characters;
        Py_ssize_t length;
#endif

#if PY_VERSION_HEX >= 0x03030000
        int_obj = PyLong_FromUnicodeObject(item, 0);
#else
        characters = (Py_UNICODE*)PyUnicode_AS_DATA(item);
        length = PyUnicode_GET_SIZE(item);
        int_obj = PyLong_FromUnicode(characters, length, 0);
#endif
        if (!int_obj)
            goto error;

        value = PyLong_AsLong(int_obj);
        Py_DECREF(int_obj);
        if (!PyErr_Occurred())
            return value;
    } else if (PyBytes_Check(item)) {
        char* characters;
        PyObject* int_obj;

        characters = PyBytes_AsString(item);
        int_obj = PyLong_FromString(characters, NULL, 0);
        if (!int_obj)
            goto error;

        value = PyLong_AsLong(int_obj);
        Py_DECREF(int_obj);
        if (!PyErr_Occurred())
            return value;
    }

error:
    PyErr_Format(PyExc_TypeError, "list indices must be integers, not %.200s",
      item->ob_type->tp_name);

    return -1;
}

/* CaptureObject's length method. */
Py_LOCAL_INLINE(Py_ssize_t) capture_length(CaptureObject* self) {
    MatchObject* match;
    RE_GroupData* group;

    if (self->group_index == 0)
        return 1;

    match = *self->match_indirect;
    group = &match->groups[self->group_index - 1];

    return (Py_ssize_t)group->capture_count;
}

/* CaptureObject's '__getitem__' method.
*/ static PyObject* capture_getitem(CaptureObject* self, PyObject* item) { Py_ssize_t index; MatchObject* match; Py_ssize_t start; Py_ssize_t end; index = index_to_integer(item); if (index == -1 && PyErr_Occurred()) return NULL; match = *self->match_indirect; if (self->group_index == 0) { if (index < 0) index += 1; if (index != 0) { PyErr_SetString(PyExc_IndexError, "list index out of range"); return NULL; } start = match->match_start; end = match->match_end; } else { RE_GroupData* group; RE_GroupSpan* span; group = &match->groups[self->group_index - 1]; if (index < 0) index += group->capture_count; if (index < 0 || index >= (Py_ssize_t)group->capture_count) { PyErr_SetString(PyExc_IndexError, "list index out of range"); return NULL; } span = &group->captures[index]; start = span->start; end = span->end; } return get_slice(match->substring, start - match->substring_offset, end - match->substring_offset); } static PyMappingMethods capture_as_mapping = { (lenfunc)capture_length, /* mp_length */ (binaryfunc)capture_getitem, /* mp_subscript */ 0, /* mp_ass_subscript */ }; /* CaptureObject's methods. */ static PyMethodDef capture_methods[] = { {"__getitem__", (PyCFunction)capture_getitem, METH_O|METH_COEXIST}, {NULL, NULL} }; /* Deallocates a CaptureObject. */ static void capture_dealloc(PyObject* self_) { CaptureObject* self; self = (CaptureObject*)self_; PyObject_DEL(self); } /* CaptureObject's 'str' method. */ static PyObject* capture_str(PyObject* self_) { CaptureObject* self; MatchObject* match; self = (CaptureObject*)self_; match = *self->match_indirect; return match_get_group_by_index(match, self->group_index, Py_None); } static PyMemberDef splitter_members[] = { {"pattern", T_OBJECT, offsetof(SplitterObject, pattern), READONLY, "The regex object that produced this splitter object."}, {NULL} /* Sentinel */ }; static PyTypeObject Splitter_Type = { PyVarObject_HEAD_INIT(NULL, 0) "_" RE_MODULE "." 
"Splitter", sizeof(SplitterObject) }; /* Creates a new SplitterObject. */ Py_LOCAL_INLINE(PyObject*) pattern_splitter(PatternObject* pattern, PyObject* args, PyObject* kwargs) { /* Create split state object. */ int conc; SplitterObject* self; RE_State* state; PyObject* string; Py_ssize_t maxsplit = 0; PyObject* concurrent = Py_None; static char* kwlist[] = { "string", "maxsplit", "concurrent", NULL }; if (!PyArg_ParseTupleAndKeywords(args, kwargs, "O|nO:splitter", kwlist, &string, &maxsplit, &concurrent)) return NULL; conc = decode_concurrent(concurrent); if (conc < 0) return NULL; /* Create a splitter object. */ self = PyObject_NEW(SplitterObject, &Splitter_Type); if (!self) return NULL; self->pattern = pattern; Py_INCREF(self->pattern); if (maxsplit == 0) maxsplit = PY_SSIZE_T_MAX; state = &self->state; /* The MatchObject, and therefore repeated captures, will not be visible. */ if (!state_init(state, pattern, string, 0, PY_SSIZE_T_MAX, FALSE, conc, FALSE, TRUE, FALSE, FALSE)) { PyObject_DEL(self); return NULL; } self->maxsplit = maxsplit; self->last_pos = state->reverse ? state->text_length : 0; self->split_count = 0; self->index = 0; self->status = 1; return (PyObject*) self; } /* Implements the functionality of PatternObject's search and match methods. */ Py_LOCAL_INLINE(PyObject*) pattern_search_or_match(PatternObject* self, PyObject* args, PyObject* kwargs, char* args_desc, BOOL search, BOOL match_all) { Py_ssize_t start; Py_ssize_t end; int conc; BOOL part; RE_State state; RE_SafeState safe_state; int status; PyObject* match; PyObject* string; PyObject* pos = Py_None; PyObject* endpos = Py_None; PyObject* concurrent = Py_None; PyObject* partial = Py_False; static char* kwlist[] = { "string", "pos", "endpos", "concurrent", "partial", NULL }; /* When working with a short string, such as a line from a file, the * relative cost of PyArg_ParseTupleAndKeywords can be significant, and * it's worth not using it when there are only positional arguments. 
*/ Py_ssize_t arg_count; if (args && !kwargs && PyTuple_CheckExact(args)) arg_count = PyTuple_GET_SIZE(args); else arg_count = -1; if (1 <= arg_count && arg_count <= 5) { /* PyTuple_GET_ITEM borrows the reference. */ string = PyTuple_GET_ITEM(args, 0); if (arg_count >= 2) pos = PyTuple_GET_ITEM(args, 1); if (arg_count >= 3) endpos = PyTuple_GET_ITEM(args, 2); if (arg_count >= 4) concurrent = PyTuple_GET_ITEM(args, 3); if (arg_count >= 5) partial = PyTuple_GET_ITEM(args, 4); } else if (!PyArg_ParseTupleAndKeywords(args, kwargs, args_desc, kwlist, &string, &pos, &endpos, &concurrent, &partial)) return NULL; start = as_string_index(pos, 0); if (start == -1 && PyErr_Occurred()) return NULL; end = as_string_index(endpos, PY_SSIZE_T_MAX); if (end == -1 && PyErr_Occurred()) return NULL; conc = decode_concurrent(concurrent); if (conc < 0) return NULL; part = decode_partial(partial); /* The MatchObject, and therefore repeated captures, will be visible. */ if (!state_init(&state, self, string, start, end, FALSE, conc, part, FALSE, TRUE, match_all)) return NULL; /* Initialise the "safe state" structure. */ safe_state.re_state = &state; safe_state.thread_state = NULL; status = do_match(&safe_state, search); if (status >= 0 || status == RE_ERROR_PARTIAL) /* Create the match object. */ match = pattern_new_match(self, &state, status); else match = NULL; state_fini(&state); return match; } /* PatternObject's 'match' method. */ static PyObject* pattern_match(PatternObject* self, PyObject* args, PyObject* kwargs) { return pattern_search_or_match(self, args, kwargs, "O|OOOO:match", FALSE, FALSE); } /* PatternObject's 'fullmatch' method. */ static PyObject* pattern_fullmatch(PatternObject* self, PyObject* args, PyObject* kwargs) { return pattern_search_or_match(self, args, kwargs, "O|OOOO:fullmatch", FALSE, TRUE); } /* PatternObject's 'search' method. 
*/
static PyObject* pattern_search(PatternObject* self, PyObject* args, PyObject*
  kwargs) {
    return pattern_search_or_match(self, args, kwargs, "O|OOOO:search", TRUE,
      FALSE);
}

/* Gets the limits of the matching. */
Py_LOCAL_INLINE(BOOL) get_limits(PyObject* pos, PyObject* endpos, Py_ssize_t
  length, Py_ssize_t* start, Py_ssize_t* end) {
    Py_ssize_t s;
    Py_ssize_t e;

    s = as_string_index(pos, 0);
    if (s == -1 && PyErr_Occurred())
        return FALSE;

    e = as_string_index(endpos, PY_SSIZE_T_MAX);
    if (e == -1 && PyErr_Occurred())
        return FALSE;

    /* Adjust boundaries. */
    if (s < 0)
        s += length;
    if (s < 0)
        s = 0;
    else if (s > length)
        s = length;

    if (e < 0)
        e += length;
    if (e < 0)
        e = 0;
    else if (e > length)
        e = length;

    *start = s;
    *end = e;

    return TRUE;
}

/* Gets a replacement item from the replacement list.
 *
 * The replacement item could be a string literal or a group.
 *
 * It can return None to represent an empty string.
 */
Py_LOCAL_INLINE(PyObject*) get_sub_replacement(PyObject* item, PyObject*
  string, RE_State* state, size_t group_count) {
    Py_ssize_t index;

    if (PyUnicode_CheckExact(item) || PyBytes_CheckExact(item)) {
        /* It's a literal, which can be added directly to the list. */
#if PY_VERSION_HEX >= 0x03040000
        /* ensure_immutable will DECREF the original item if it has to make an
         * immutable copy, but that original item might have a borrowed
         * reference, so we must INCREF it first in order to ensure it won't be
         * destroyed.
         */
        Py_INCREF(item);
        item = ensure_immutable(item);
#else
        Py_INCREF(item);
#endif
        return item;
    }

    /* Is it a group reference? */
    index = as_group_index(item);
    if (index == -1 && PyErr_Occurred()) {
        /* Not a group either! */
        set_error(RE_ERROR_REPLACEMENT, NULL);
        return NULL;
    }

    if (index == 0) {
        /* The entire matched portion of the string. */
        if (state->match_pos == state->text_pos) {
            /* Return None for "".
*/ Py_INCREF(Py_None); return Py_None; } if (state->reverse) return get_slice(string, state->text_pos, state->match_pos); else return get_slice(string, state->match_pos, state->text_pos); } else if (1 <= index && (size_t)index <= group_count) { /* A group. */ RE_GroupData* group; group = &state->groups[index - 1]; if (group->capture_count == 0 && group->span.start != group->span.end) { /* The group didn't match or is "", so return None for "". */ Py_INCREF(Py_None); return Py_None; } return get_slice(string, group->span.start, group->span.end); } else { /* No such group. */ set_error(RE_ERROR_INVALID_GROUP_REF, NULL); return NULL; } } /* PatternObject's 'subx' method. */ Py_LOCAL_INLINE(PyObject*) pattern_subx(PatternObject* self, PyObject* str_template, PyObject* string, Py_ssize_t maxsub, int sub_type, PyObject* pos, PyObject* endpos, int concurrent) { RE_StringInfo str_info; Py_ssize_t start; Py_ssize_t end; BOOL is_callable = FALSE; PyObject* replacement = NULL; BOOL is_literal = FALSE; BOOL is_format = FALSE; BOOL is_template = FALSE; RE_State state; RE_SafeState safe_state; JoinInfo join_info; Py_ssize_t sub_count; Py_ssize_t last_pos; Py_ssize_t step; PyObject* item; MatchObject* match; BOOL built_capture = FALSE; PyObject* args; PyObject* kwargs; Py_ssize_t end_pos; /* Get the string. */ if (!get_string(string, &str_info)) return NULL; if (!check_compatible(self, str_info.is_unicode)) { release_buffer(&str_info); return NULL; } /* Get the limits of the search. */ if (!get_limits(pos, endpos, str_info.length, &start, &end)) { release_buffer(&str_info); return NULL; } /* If the pattern is too long for the string, then take a shortcut, unless * it's a fuzzy pattern. 
*/ if (!self->is_fuzzy && self->min_width > end - start) { PyObject* result; Py_INCREF(string); if (sub_type & RE_SUBN) result = Py_BuildValue("Nn", string, 0); else result = string; release_buffer(&str_info); return result; } if (maxsub == 0) maxsub = PY_SSIZE_T_MAX; /* sub/subn takes either a function or a string template. */ if (PyCallable_Check(str_template)) { /* It's callable. */ is_callable = TRUE; replacement = str_template; Py_INCREF(replacement); } else if (sub_type & RE_SUBF) { /* Is it a literal format? * * To keep it simple we'll say that a literal is a string which can be * used as-is, so no placeholders. */ Py_ssize_t literal_length; literal_length = check_replacement_string(str_template, '{'); if (literal_length > 0) { /* It's a literal. */ is_literal = TRUE; replacement = str_template; Py_INCREF(replacement); } else if (literal_length < 0) { /* It isn't a literal, so get the 'format' method. */ is_format = TRUE; replacement = PyObject_GetAttrString(str_template, "format"); if (!replacement) { release_buffer(&str_info); return NULL; } } } else { /* Is it a literal template? * * To keep it simple we'll say that a literal is a string which can be * used as-is, so no backslashes. */ Py_ssize_t literal_length; literal_length = check_replacement_string(str_template, '\\'); if (literal_length > 0) { /* It's a literal. */ is_literal = TRUE; replacement = str_template; Py_INCREF(replacement); } else if (literal_length < 0 ) { /* It isn't a literal, so hand it over to the template compiler. */ is_template = TRUE; replacement = call(RE_MODULE, "_compile_replacement_helper", PyTuple_Pack(2, self, str_template)); if (!replacement) { release_buffer(&str_info); return NULL; } } } /* The MatchObject, and therefore repeated captures, will be visible only * if the replacement is callable or subf is used. 
*/
    if (!state_init_2(&state, self, string, &str_info, start, end, FALSE,
      concurrent, FALSE, FALSE, is_callable || (sub_type & RE_SUBF) != 0,
      FALSE)) {
        release_buffer(&str_info);

        Py_XDECREF(replacement);
        return NULL;
    }

    /* Initialise the "safe state" structure. */
    safe_state.re_state = &state;
    safe_state.thread_state = NULL;

    init_join_list(&join_info, state.reverse, PyUnicode_Check(string));

    sub_count = 0;
    last_pos = state.reverse ? state.text_length : 0;
    step = state.reverse ? -1 : 1;
    while (sub_count < maxsub) {
        int status;

        status = do_match(&safe_state, TRUE);
        if (status < 0)
            goto error;

        if (status == 0)
            break;

        /* Append the segment before this match. */
        if (state.match_pos != last_pos) {
            if (state.reverse)
                item = get_slice(string, state.match_pos, last_pos);
            else
                item = get_slice(string, last_pos, state.match_pos);
            if (!item)
                goto error;

            /* Add to the list. */
            status = add_to_join_list(&join_info, item);
            Py_DECREF(item);
            if (status < 0)
                goto error;
        }

        /* Add this match. */
        if (is_literal) {
            /* The replacement is a literal string. */
            status = add_to_join_list(&join_info, replacement);
            if (status < 0)
                goto error;
        } else if (is_format) {
            /* The replacement is a format string. */
            size_t g;

            /* We need to create the arguments for the 'format' method. We'll
             * start by creating a MatchObject.
             */
            match = (MatchObject*)pattern_new_match(self, &state, 1);
            if (!match)
                goto error;

            /* We'll build the args and kwargs the first time. They'll be using
             * capture objects which refer to the match object indirectly; this
             * means that args and kwargs can be reused with different match
             * objects.
             */
            if (!built_capture) {
                /* The args are a tuple of the capture group matches. */
                args = PyTuple_New(match->group_count + 1);
                if (!args) {
                    Py_DECREF(match);
                    goto error;
                }

                for (g = 0; g < match->group_count + 1; g++)
                    /* PyTuple_SetItem steals the reference to the new capture
                     * object, so no DECREF is needed here.
*/ PyTuple_SetItem(args, (Py_ssize_t)g, make_capture_object(&match, (Py_ssize_t)g)); /* The kwargs are a dict of the named capture group matches. */ kwargs = make_capture_dict(match, &match); if (!kwargs) { Py_DECREF(args); Py_DECREF(match); goto error; } built_capture = TRUE; } /* Call the 'format' method. */ item = PyObject_Call(replacement, args, kwargs); Py_DECREF(match); if (!item) goto error; /* Add the result to the list. */ status = add_to_join_list(&join_info, item); Py_DECREF(item); if (status < 0) goto error; } else if (is_template) { /* The replacement is a list template. */ Py_ssize_t count; Py_ssize_t index; Py_ssize_t step; /* Add each part of the template to the list. */ count = PyList_GET_SIZE(replacement); if (join_info.reversed) { /* We're searching backwards, so we'll be reversing the list * when it's complete. Therefore, we need to add the items of * the template in reverse order for them to be in the correct * order after the reversal. */ index = count - 1; step = -1; } else { /* We're searching forwards. */ index = 0; step = 1; } while (count > 0) { PyObject* item; PyObject* str_item; /* PyList_GET_ITEM borrows a reference. */ item = PyList_GET_ITEM(replacement, index); str_item = get_sub_replacement(item, string, &state, self->public_group_count); if (!str_item) goto error; /* Add the result to the list. */ if (str_item == Py_None) /* None for "". */ Py_DECREF(str_item); else { status = add_to_join_list(&join_info, str_item); Py_DECREF(str_item); if (status < 0) goto error; } --count; index += step; } } else if (is_callable) { /* Pass a MatchObject to the replacement function. */ PyObject* match; PyObject* args; /* We need to create a MatchObject to pass to the replacement * function. */ match = pattern_new_match(self, &state, 1); if (!match) goto error; /* The args for the replacement function. */ args = PyTuple_Pack(1, match); if (!args) { Py_DECREF(match); goto error; } /* Call the replacement function. 
*/
            item = PyObject_CallObject(replacement, args);
            Py_DECREF(args);
            Py_DECREF(match);
            if (!item)
                goto error;

            /* Add the result to the list. */
            status = add_to_join_list(&join_info, item);
            Py_DECREF(item);
            if (status < 0)
                goto error;
        }

        ++sub_count;

        last_pos = state.text_pos;

        if (state.version_0) {
            /* Always advance after a zero-width match. */
            if (state.match_pos == state.text_pos) {
                state.text_pos += step;
                state.must_advance = FALSE;
            } else
                state.must_advance = TRUE;
        } else
            /* Continue from where we left off, but don't allow a contiguous
             * zero-width match.
             */
            state.must_advance = state.match_pos == state.text_pos;
    }

    /* Get the segment following the last match. We use 'length' instead of
     * 'text_length' because the latter is truncated to 'slice_end', a
     * documented idiosyncrasy of the 're' module.
     */
    end_pos = state.reverse ? 0 : str_info.length;
    if (last_pos != end_pos) {
        int status;

        /* The segment is part of the original string. */
        if (state.reverse)
            item = get_slice(string, 0, last_pos);
        else
            item = get_slice(string, last_pos, str_info.length);
        if (!item)
            goto error;

        status = add_to_join_list(&join_info, item);
        Py_DECREF(item);
        if (status < 0)
            goto error;
    }

    Py_XDECREF(replacement);

    /* Convert the list to a single string (also cleans up join_info). */
    item = join_list_info(&join_info);

    state_fini(&state);

    if (built_capture) {
        Py_DECREF(kwargs);
        Py_DECREF(args);
    }

    if (!item)
        return NULL;

    if (sub_type & RE_SUBN)
        return Py_BuildValue("Nn", item, sub_count);

    return item;

error:
    if (built_capture) {
        Py_DECREF(kwargs);
        Py_DECREF(args);
    }

    clear_join_list(&join_info);
    state_fini(&state);
    Py_XDECREF(replacement);
    return NULL;
}

/* PatternObject's 'sub' method.
*/
static PyObject* pattern_sub(PatternObject* self, PyObject* args, PyObject*
  kwargs) {
    int conc;

    PyObject* replacement;
    PyObject* string;
    Py_ssize_t count = 0;
    PyObject* pos = Py_None;
    PyObject* endpos = Py_None;
    PyObject* concurrent = Py_None;
    static char* kwlist[] = { "repl", "string", "count", "pos", "endpos",
      "concurrent", NULL };
    if (!PyArg_ParseTupleAndKeywords(args, kwargs, "OO|nOOO:sub", kwlist,
      &replacement, &string, &count, &pos, &endpos, &concurrent))
        return NULL;

    conc = decode_concurrent(concurrent);
    if (conc < 0)
        return NULL;

    return pattern_subx(self, replacement, string, count, RE_SUB, pos, endpos,
      conc);
}

/* PatternObject's 'subf' method. */
static PyObject* pattern_subf(PatternObject* self, PyObject* args, PyObject*
  kwargs) {
    int conc;

    PyObject* format;
    PyObject* string;
    Py_ssize_t count = 0;
    PyObject* pos = Py_None;
    PyObject* endpos = Py_None;
    PyObject* concurrent = Py_None;
    static char* kwlist[] = { "format", "string", "count", "pos", "endpos",
      "concurrent", NULL };
    /* The name after ':' is used in error messages, so it must be 'subf'. */
    if (!PyArg_ParseTupleAndKeywords(args, kwargs, "OO|nOOO:subf", kwlist,
      &format, &string, &count, &pos, &endpos, &concurrent))
        return NULL;

    conc = decode_concurrent(concurrent);
    if (conc < 0)
        return NULL;

    return pattern_subx(self, format, string, count, RE_SUBF, pos, endpos,
      conc);
}

/* PatternObject's 'subn' method. */
static PyObject* pattern_subn(PatternObject* self, PyObject* args, PyObject*
  kwargs) {
    int conc;

    PyObject* replacement;
    PyObject* string;
    Py_ssize_t count = 0;
    PyObject* pos = Py_None;
    PyObject* endpos = Py_None;
    PyObject* concurrent = Py_None;
    static char* kwlist[] = { "repl", "string", "count", "pos", "endpos",
      "concurrent", NULL };
    if (!PyArg_ParseTupleAndKeywords(args, kwargs, "OO|nOOO:subn", kwlist,
      &replacement, &string, &count, &pos, &endpos, &concurrent))
        return NULL;

    conc = decode_concurrent(concurrent);
    if (conc < 0)
        return NULL;

    return pattern_subx(self, replacement, string, count, RE_SUBN, pos, endpos,
      conc);
}

/* PatternObject's 'subfn' method.
*/
static PyObject* pattern_subfn(PatternObject* self, PyObject* args, PyObject*
  kwargs) {
    int conc;

    PyObject* format;
    PyObject* string;
    Py_ssize_t count = 0;
    PyObject* pos = Py_None;
    PyObject* endpos = Py_None;
    PyObject* concurrent = Py_None;
    static char* kwlist[] = { "format", "string", "count", "pos", "endpos",
      "concurrent", NULL };
    /* The name after ':' is used in error messages, so it must be 'subfn'. */
    if (!PyArg_ParseTupleAndKeywords(args, kwargs, "OO|nOOO:subfn", kwlist,
      &format, &string, &count, &pos, &endpos, &concurrent))
        return NULL;

    conc = decode_concurrent(concurrent);
    if (conc < 0)
        return NULL;

    return pattern_subx(self, format, string, count, RE_SUBF | RE_SUBN, pos,
      endpos, conc);
}

/* PatternObject's 'split' method. */
static PyObject* pattern_split(PatternObject* self, PyObject* args, PyObject*
  kwargs) {
    int conc;

    RE_State state;
    RE_SafeState safe_state;
    PyObject* list;
    PyObject* item;
    int status;
    Py_ssize_t split_count;
    size_t g;
    Py_ssize_t start_pos;
    Py_ssize_t end_pos;
    Py_ssize_t step;
    Py_ssize_t last_pos;

    PyObject* string;
    Py_ssize_t maxsplit = 0;
    PyObject* concurrent = Py_None;
    static char* kwlist[] = { "string", "maxsplit", "concurrent", NULL };
    if (!PyArg_ParseTupleAndKeywords(args, kwargs, "O|nO:split", kwlist,
      &string, &maxsplit, &concurrent))
        return NULL;

    if (maxsplit == 0)
        maxsplit = PY_SSIZE_T_MAX;

    conc = decode_concurrent(concurrent);
    if (conc < 0)
        return NULL;

    /* The MatchObject, and therefore repeated captures, will not be visible.
     */
    if (!state_init(&state, self, string, 0, PY_SSIZE_T_MAX, FALSE, conc,
      FALSE, FALSE, FALSE, FALSE))
        return NULL;

    /* Initialise the "safe state" structure.
*/ safe_state.re_state = &state; safe_state.thread_state = NULL; list = PyList_New(0); if (!list) { state_fini(&state); return NULL; } split_count = 0; if (state.reverse) { start_pos = state.text_length; end_pos = 0; step = -1; } else { start_pos = 0; end_pos = state.text_length; step = 1; } last_pos = start_pos; while (split_count < maxsplit) { status = do_match(&safe_state, TRUE); if (status < 0) goto error; if (status == 0) /* No more matches. */ break; if (state.version_0) { /* Version 0 behaviour is to advance one character if the split was * zero-width. Unfortunately, this can give an incorrect result. * GvR wants this behaviour to be retained so as not to break any * existing software which might rely on it. */ if (state.text_pos == state.match_pos) { if (last_pos == end_pos) break; /* Advance one character. */ state.text_pos += step; state.must_advance = FALSE; continue; } } /* Get segment before this match. */ if (state.reverse) item = get_slice(string, state.match_pos, last_pos); else item = get_slice(string, last_pos, state.match_pos); if (!item) goto error; status = PyList_Append(list, item); Py_DECREF(item); if (status < 0) goto error; /* Add groups (if any). */ for (g = 1; g <= self->public_group_count; g++) { item = state_get_group(&state, (Py_ssize_t)g, string, FALSE); if (!item) goto error; status = PyList_Append(list, item); Py_DECREF(item); if (status < 0) goto error; } ++split_count; last_pos = state.text_pos; /* Version 0 behaviour is to advance one character if the match was * zero-width. Unfortunately, this can give an incorrect result. GvR * wants this behaviour to be retained so as not to break any existing * software which might rely on it. */ if (state.version_0) { if (state.text_pos == state.match_pos) /* Advance one character. */ state.text_pos += step; state.must_advance = FALSE; } else /* Continue from where we left off, but don't allow a contiguous * zero-width match. 
*/ state.must_advance = TRUE; } /* Get segment following last match (even if empty). */ if (state.reverse) item = get_slice(string, 0, last_pos); else item = get_slice(string, last_pos, state.text_length); if (!item) goto error; status = PyList_Append(list, item); Py_DECREF(item); if (status < 0) goto error; state_fini(&state); return list; error: Py_DECREF(list); state_fini(&state); return NULL; } /* PatternObject's 'splititer' method. */ static PyObject* pattern_splititer(PatternObject* pattern, PyObject* args, PyObject* kwargs) { return pattern_splitter(pattern, args, kwargs); } /* PatternObject's 'findall' method. */ static PyObject* pattern_findall(PatternObject* self, PyObject* args, PyObject* kwargs) { Py_ssize_t start; Py_ssize_t end; int conc; RE_State state; RE_SafeState safe_state; PyObject* list; Py_ssize_t step; int status; Py_ssize_t b; Py_ssize_t e; size_t g; PyObject* string; PyObject* pos = Py_None; PyObject* endpos = Py_None; Py_ssize_t overlapped = FALSE; PyObject* concurrent = Py_None; static char* kwlist[] = { "string", "pos", "endpos", "overlapped", "concurrent", NULL }; if (!PyArg_ParseTupleAndKeywords(args, kwargs, "O|OOnO:findall", kwlist, &string, &pos, &endpos, &overlapped, &concurrent)) return NULL; start = as_string_index(pos, 0); if (start == -1 && PyErr_Occurred()) return NULL; end = as_string_index(endpos, PY_SSIZE_T_MAX); if (end == -1 && PyErr_Occurred()) return NULL; conc = decode_concurrent(concurrent); if (conc < 0) return NULL; /* The MatchObject, and therefore repeated captures, will not be visible. */ if (!state_init(&state, self, string, start, end, overlapped != 0, conc, FALSE, FALSE, FALSE, FALSE)) return NULL; /* Initialise the "safe state" structure. */ safe_state.re_state = &state; safe_state.thread_state = NULL; list = PyList_New(0); if (!list) { state_fini(&state); return NULL; } step = state.reverse ? 
-1 : 1; while (state.slice_start <= state.text_pos && state.text_pos <= state.slice_end) { PyObject* item; status = do_match(&safe_state, TRUE); if (status < 0) goto error; if (status == 0) break; /* Don't bother to build a MatchObject. */ switch (self->public_group_count) { case 0: if (state.reverse) { b = state.text_pos; e = state.match_pos; } else { b = state.match_pos; e = state.text_pos; } item = get_slice(string, b, e); if (!item) goto error; break; case 1: item = state_get_group(&state, 1, string, TRUE); if (!item) goto error; break; default: item = PyTuple_New((Py_ssize_t)self->public_group_count); if (!item) goto error; for (g = 0; g < self->public_group_count; g++) { PyObject* o; o = state_get_group(&state, (Py_ssize_t)g + 1, string, TRUE); if (!o) { Py_DECREF(item); goto error; } /* PyTuple_SET_ITEM steals the reference. */ PyTuple_SET_ITEM(item, g, o); } break; } status = PyList_Append(list, item); Py_DECREF(item); if (status < 0) goto error; if (state.overlapped) { /* Advance one character. */ state.text_pos = state.match_pos + step; state.must_advance = FALSE; } else /* Continue from where we left off, but don't allow 2 contiguous * zero-width matches. */ state.must_advance = state.text_pos == state.match_pos; } state_fini(&state); return list; error: Py_DECREF(list); state_fini(&state); return NULL; } /* PatternObject's 'finditer' method. */ static PyObject* pattern_finditer(PatternObject* pattern, PyObject* args, PyObject* kwargs) { return pattern_scanner(pattern, args, kwargs); } /* Makes a copy of a PatternObject. */ Py_LOCAL_INLINE(PyObject*) make_pattern_copy(PatternObject* self) { Py_INCREF(self); return (PyObject*)self; } /* PatternObject's '__copy__' method. */ static PyObject* pattern_copy(PatternObject* self, PyObject *unused) { return make_pattern_copy(self); } /* PatternObject's '__deepcopy__' method. 
*/ static PyObject* pattern_deepcopy(PatternObject* self, PyObject* memo) { return make_pattern_copy(self); } /* The documentation of a PatternObject. */ PyDoc_STRVAR(pattern_match_doc, "match(string, pos=None, endpos=None, concurrent=None) --> MatchObject or None.\n\ Match zero or more characters at the beginning of the string."); PyDoc_STRVAR(pattern_fullmatch_doc, "fullmatch(string, pos=None, endpos=None, concurrent=None) --> MatchObject or None.\n\ Match zero or more characters against all of the string."); PyDoc_STRVAR(pattern_search_doc, "search(string, pos=None, endpos=None, concurrent=None) --> MatchObject or None.\n\ Search through string looking for a match, and return a corresponding\n\ match object instance. Return None if no match is found."); PyDoc_STRVAR(pattern_sub_doc, "sub(repl, string, count=0, flags=0, pos=None, endpos=None, concurrent=None) --> newstring\n\ Return the string obtained by replacing the leftmost (or rightmost with a\n\ reverse pattern) non-overlapping occurrences of pattern in string by the\n\ replacement repl."); PyDoc_STRVAR(pattern_subf_doc, "subf(format, string, count=0, flags=0, pos=None, endpos=None, concurrent=None) --> newstring\n\ Return the string obtained by replacing the leftmost (or rightmost with a\n\ reverse pattern) non-overlapping occurrences of pattern in string by the\n\ replacement format."); PyDoc_STRVAR(pattern_subn_doc, "subn(repl, string, count=0, flags=0, pos=None, endpos=None, concurrent=None) --> (newstring, number of subs)\n\ Return the tuple (new_string, number_of_subs_made) found by replacing the\n\ leftmost (or rightmost with a reverse pattern) non-overlapping occurrences\n\ of pattern with the replacement repl."); PyDoc_STRVAR(pattern_subfn_doc, "subfn(format, string, count=0, flags=0, pos=None, endpos=None, concurrent=None) --> (newstring, number of subs)\n\ Return the tuple (new_string, number_of_subs_made) found by replacing the\n\ leftmost (or rightmost with a reverse pattern) non-overlapping 
occurrences\n\ of pattern with the replacement format."); PyDoc_STRVAR(pattern_split_doc, "split(string, maxsplit=0, concurrent=None) --> list.\n\ Split string by the occurrences of pattern."); PyDoc_STRVAR(pattern_splititer_doc, "splititer(string, maxsplit=0, concurrent=None) --> iterator.\n\ Return an iterator yielding the parts of a split string."); PyDoc_STRVAR(pattern_findall_doc, "findall(string, pos=None, endpos=None, overlapped=False, concurrent=None) --> list.\n\ Return a list of all matches of pattern in string. The matches may be\n\ overlapped if overlapped is True."); PyDoc_STRVAR(pattern_finditer_doc, "finditer(string, pos=None, endpos=None, overlapped=False, concurrent=None) --> iterator.\n\ Return an iterator over all matches for the RE pattern in string. The\n\ matches may be overlapped if overlapped is True. For each match, the\n\ iterator returns a MatchObject."); PyDoc_STRVAR(pattern_scanner_doc, "scanner(string, pos=None, endpos=None, overlapped=False, concurrent=None) --> scanner.\n\ Return a scanner for the RE pattern in string. The matches may be overlapped\n\ if overlapped is True."); /* The methods of a PatternObject. 
*/ static PyMethodDef pattern_methods[] = { {"match", (PyCFunction)pattern_match, METH_VARARGS|METH_KEYWORDS, pattern_match_doc}, {"fullmatch", (PyCFunction)pattern_fullmatch, METH_VARARGS|METH_KEYWORDS, pattern_fullmatch_doc}, {"search", (PyCFunction)pattern_search, METH_VARARGS|METH_KEYWORDS, pattern_search_doc}, {"sub", (PyCFunction)pattern_sub, METH_VARARGS|METH_KEYWORDS, pattern_sub_doc}, {"subf", (PyCFunction)pattern_subf, METH_VARARGS|METH_KEYWORDS, pattern_subf_doc}, {"subn", (PyCFunction)pattern_subn, METH_VARARGS|METH_KEYWORDS, pattern_subn_doc}, {"subfn", (PyCFunction)pattern_subfn, METH_VARARGS|METH_KEYWORDS, pattern_subfn_doc}, {"split", (PyCFunction)pattern_split, METH_VARARGS|METH_KEYWORDS, pattern_split_doc}, {"splititer", (PyCFunction)pattern_splititer, METH_VARARGS|METH_KEYWORDS, pattern_splititer_doc}, {"findall", (PyCFunction)pattern_findall, METH_VARARGS|METH_KEYWORDS, pattern_findall_doc}, {"finditer", (PyCFunction)pattern_finditer, METH_VARARGS|METH_KEYWORDS, pattern_finditer_doc}, {"scanner", (PyCFunction)pattern_scanner, METH_VARARGS|METH_KEYWORDS, pattern_scanner_doc}, {"__copy__", (PyCFunction)pattern_copy, METH_NOARGS}, {"__deepcopy__", (PyCFunction)pattern_deepcopy, METH_O}, {NULL, NULL} }; PyDoc_STRVAR(pattern_doc, "Compiled regex object"); /* Deallocates a PatternObject. */ static void pattern_dealloc(PyObject* self_) { PatternObject* self; size_t i; int partial_side; self = (PatternObject*)self_; /* Discard the nodes. */ for (i = 0; i < self->node_count; i++) { RE_Node* node; node = self->node_list[i]; re_dealloc(node->values); if (node->status & RE_STATUS_STRING) { re_dealloc(node->string.bad_character_offset); re_dealloc(node->string.good_suffix_offset); } re_dealloc(node); } re_dealloc(self->node_list); /* Discard the group info. */ re_dealloc(self->group_info); /* Discard the call_ref info. */ re_dealloc(self->call_ref_info); /* Discard the repeat info. 
*/ re_dealloc(self->repeat_info); dealloc_groups(self->groups_storage, self->true_group_count); dealloc_repeats(self->repeats_storage, self->repeat_count); if (self->weakreflist) PyObject_ClearWeakRefs((PyObject*)self); Py_XDECREF(self->pattern); Py_XDECREF(self->groupindex); Py_XDECREF(self->indexgroup); for (partial_side = 0; partial_side < 2; partial_side++) { if (self->partial_named_lists[partial_side]) { for (i = 0; i < self->named_lists_count; i++) Py_XDECREF(self->partial_named_lists[partial_side][i]); re_dealloc(self->partial_named_lists[partial_side]); } } Py_DECREF(self->named_lists); Py_DECREF(self->named_list_indexes); re_dealloc(self->locale_info); PyObject_DEL(self); } /* Info about the various flags that can be passed in. */ typedef struct RE_FlagName { char* name; int value; } RE_FlagName; /* We won't bother about the U flag in Python 3. */ static RE_FlagName flag_names[] = { {"A", RE_FLAG_ASCII}, {"B", RE_FLAG_BESTMATCH}, {"D", RE_FLAG_DEBUG}, {"S", RE_FLAG_DOTALL}, {"F", RE_FLAG_FULLCASE}, {"I", RE_FLAG_IGNORECASE}, {"L", RE_FLAG_LOCALE}, {"M", RE_FLAG_MULTILINE}, {"P", RE_FLAG_POSIX}, {"R", RE_FLAG_REVERSE}, {"T", RE_FLAG_TEMPLATE}, {"X", RE_FLAG_VERBOSE}, {"V0", RE_FLAG_VERSION0}, {"V1", RE_FLAG_VERSION1}, {"W", RE_FLAG_WORD}, }; /* Appends a string to a list. */ Py_LOCAL_INLINE(BOOL) append_string(PyObject* list, char* string) { PyObject* item; int status; item = Py_BuildValue("U", string); if (!item) return FALSE; status = PyList_Append(list, item); Py_DECREF(item); if (status < 0) return FALSE; return TRUE; } /* Appends a (decimal) integer to a list. 
*/ Py_LOCAL_INLINE(BOOL) append_integer(PyObject* list, Py_ssize_t value) { PyObject* int_obj; PyObject* repr_obj; int status; int_obj = Py_BuildValue("n", value); if (!int_obj) return FALSE; repr_obj = PyObject_Repr(int_obj); Py_DECREF(int_obj); if (!repr_obj) return FALSE; status = PyList_Append(list, repr_obj); Py_DECREF(repr_obj); if (status < 0) return FALSE; return TRUE; } /* MatchObject's '__repr__' method. */ static PyObject* match_repr(PyObject* self_) { MatchObject* self; PyObject* list; PyObject* matched_substring; PyObject* matched_repr; int status; PyObject* separator; PyObject* result; self = (MatchObject*)self_; list = PyList_New(0); if (!list) return NULL; if (!append_string(list, "<regex.Match object; span=(")) goto error; if (!append_integer(list, self->match_start)) goto error; if (! append_string(list, ", ")) goto error; if (!append_integer(list, self->match_end)) goto error; if (!append_string(list, "), match=")) goto error; matched_substring = get_slice(self->substring, self->match_start - self->substring_offset, self->match_end - self->substring_offset); if (!matched_substring) goto error; matched_repr = PyObject_Repr(matched_substring); Py_DECREF(matched_substring); if (!matched_repr) goto error; status = PyList_Append(list, matched_repr); Py_DECREF(matched_repr); if (status < 0) goto error; if (self->fuzzy_counts[RE_FUZZY_SUB] != 0 || self->fuzzy_counts[RE_FUZZY_INS] != 0 || self->fuzzy_counts[RE_FUZZY_DEL] != 0) { if (! append_string(list, ", fuzzy_counts=(")) goto error; if (!append_integer(list, (Py_ssize_t)self->fuzzy_counts[RE_FUZZY_SUB])) goto error; if (! append_string(list, ", ")) goto error; if (!append_integer(list, (Py_ssize_t)self->fuzzy_counts[RE_FUZZY_INS])) goto error; if (! append_string(list, ", ")) goto error; if (!append_integer(list, (Py_ssize_t)self->fuzzy_counts[RE_FUZZY_DEL])) goto error; if (! append_string(list, ")")) goto error; } if (self->partial) { if (!append_string(list, ", partial=True")) goto error; } if (! 
append_string(list, ">")) goto error; separator = Py_BuildValue("U", ""); if (!separator) goto error; result = PyUnicode_Join(separator, list); Py_DECREF(separator); Py_DECREF(list); return result; error: Py_DECREF(list); return NULL; } /* PatternObject's '__repr__' method. */ static PyObject* pattern_repr(PyObject* self_) { PatternObject* self; PyObject* list; PyObject* item; int status; int flag_count; unsigned int i; Py_ssize_t pos; PyObject *key; PyObject *value; PyObject* separator; PyObject* result; self = (PatternObject*)self_; list = PyList_New(0); if (!list) return NULL; if (!append_string(list, "regex.Regex(")) goto error; item = PyObject_Repr(self->pattern); if (!item) goto error; status = PyList_Append(list, item); Py_DECREF(item); if (status < 0) goto error; flag_count = 0; for (i = 0; i < sizeof(flag_names) / sizeof(flag_names[0]); i++) { if (self->flags & flag_names[i].value) { if (flag_count == 0) { if (!append_string(list, ", flags=")) goto error; } else { if (!append_string(list, " | ")) goto error; } if (!append_string(list, "regex.")) goto error; if (!append_string(list, flag_names[i].name)) goto error; ++flag_count; } } pos = 0; /* PyDict_Next borrows references. */ while (PyDict_Next(self->named_lists, &pos, &key, &value)) { if (!append_string(list, ", ")) goto error; status = PyList_Append(list, key); if (status < 0) goto error; if (!append_string(list, "=")) goto error; item = PyObject_Repr(value); if (!item) goto error; status = PyList_Append(list, item); Py_DECREF(item); if (status < 0) goto error; } if (!append_string(list, ")")) goto error; separator = Py_BuildValue("U", ""); if (!separator) goto error; result = PyUnicode_Join(separator, list); Py_DECREF(separator); Py_DECREF(list); return result; error: Py_DECREF(list); return NULL; } /* PatternObject's 'groupindex' method. 
*/ static PyObject* pattern_groupindex(PyObject* self_) { PatternObject* self; self = (PatternObject*)self_; return PyDict_Copy(self->groupindex); } static PyGetSetDef pattern_getset[] = { {"groupindex", (getter)pattern_groupindex, (setter)NULL, "A dictionary mapping group names to group numbers."}, {NULL} /* Sentinel */ }; static PyMemberDef pattern_members[] = { {"pattern", T_OBJECT, offsetof(PatternObject, pattern), READONLY, "The pattern string from which the regex object was compiled."}, {"flags", T_PYSSIZET, offsetof(PatternObject, flags), READONLY, "The regex matching flags."}, {"groups", T_PYSSIZET, offsetof(PatternObject, public_group_count), READONLY, "The number of capturing groups in the pattern."}, {"named_lists", T_OBJECT, offsetof(PatternObject, named_lists), READONLY, "The named lists used by the regex."}, {NULL} /* Sentinel */ }; static PyTypeObject Pattern_Type = { PyVarObject_HEAD_INIT(NULL, 0) "_" RE_MODULE "." "Pattern", sizeof(PatternObject) }; /* Building the nodes is made simpler by allowing branches to have a single * exit. These need to be removed. */ Py_LOCAL_INLINE(void) skip_one_way_branches(PatternObject* pattern) { BOOL modified; /* If a node refers to a 1-way branch then make the former refer to the * latter's destination. Repeat until they're all done. */ do { size_t i; modified = FALSE; for (i = 0; i < pattern->node_count; i++) { RE_Node* node; RE_Node* next; node = pattern->node_list[i]; /* Check the first destination. */ next = node->next_1.node; if (next && next->op == RE_OP_BRANCH && !next->nonstring.next_2.node) { node->next_1.node = next->next_1.node; modified = TRUE; } /* Check the second destination. */ next = node->nonstring.next_2.node; if (next && next->op == RE_OP_BRANCH && !next->nonstring.next_2.node) { node->nonstring.next_2.node = next->next_1.node; modified = TRUE; } } } while (modified); /* The start node might be a 1-way branch. Skip over it because it'll be * removed. It might even be the first in a chain. 
*/ while (pattern->start_node->op == RE_OP_BRANCH && !pattern->start_node->nonstring.next_2.node) pattern->start_node = pattern->start_node->next_1.node; } /* Adds guards to repeats which are followed by a reference to a group. * * Returns whether a guard was added for a node at or after the given node. */ Py_LOCAL_INLINE(RE_STATUS_T) add_repeat_guards(PatternObject* pattern, RE_Node* node) { RE_STATUS_T result; result = RE_STATUS_NEITHER; for (;;) { if (node->status & RE_STATUS_VISITED_AG) return node->status & (RE_STATUS_REPEAT | RE_STATUS_REF); switch (node->op) { case RE_OP_BRANCH: { RE_STATUS_T branch_1_result; RE_STATUS_T branch_2_result; RE_STATUS_T status; branch_1_result = add_repeat_guards(pattern, node->next_1.node); branch_2_result = add_repeat_guards(pattern, node->nonstring.next_2.node); status = max_status_3(result, branch_1_result, branch_2_result); node->status = RE_STATUS_VISITED_AG | status; return status; } case RE_OP_END_GREEDY_REPEAT: case RE_OP_END_LAZY_REPEAT: node->status |= RE_STATUS_VISITED_AG; return result; case RE_OP_GREEDY_REPEAT: case RE_OP_LAZY_REPEAT: { BOOL limited; RE_STATUS_T body_result; RE_STATUS_T tail_result; RE_RepeatInfo* repeat_info; RE_STATUS_T status; limited = ~node->values[2] != 0; if (limited) body_result = RE_STATUS_LIMITED; else body_result = add_repeat_guards(pattern, node->next_1.node); tail_result = add_repeat_guards(pattern, node->nonstring.next_2.node); repeat_info = &pattern->repeat_info[node->values[0]]; if (body_result != RE_STATUS_REF) repeat_info->status |= RE_STATUS_BODY; if (tail_result != RE_STATUS_REF) repeat_info->status |= RE_STATUS_TAIL; if (limited) result = max_status_2(result, RE_STATUS_LIMITED); else result = max_status_2(result, RE_STATUS_REPEAT); status = max_status_3(result, body_result, tail_result); node->status |= RE_STATUS_VISITED_AG | status; return status; } case RE_OP_GREEDY_REPEAT_ONE: case RE_OP_LAZY_REPEAT_ONE: { BOOL limited; RE_STATUS_T tail_result; RE_RepeatInfo* repeat_info; 
RE_STATUS_T status; limited = ~node->values[2] != 0; tail_result = add_repeat_guards(pattern, node->next_1.node); repeat_info = &pattern->repeat_info[node->values[0]]; repeat_info->status |= RE_STATUS_BODY; if (tail_result != RE_STATUS_REF) repeat_info->status |= RE_STATUS_TAIL; if (limited) result = max_status_2(result, RE_STATUS_LIMITED); else result = max_status_2(result, RE_STATUS_REPEAT); status = max_status_3(result, RE_STATUS_REPEAT, tail_result); node->status = RE_STATUS_VISITED_AG | status; return status; } case RE_OP_GROUP_CALL: case RE_OP_REF_GROUP: case RE_OP_REF_GROUP_FLD: case RE_OP_REF_GROUP_FLD_REV: case RE_OP_REF_GROUP_IGN: case RE_OP_REF_GROUP_IGN_REV: case RE_OP_REF_GROUP_REV: result = RE_STATUS_REF; node = node->next_1.node; break; case RE_OP_GROUP_EXISTS: { RE_STATUS_T branch_1_result; RE_STATUS_T branch_2_result; RE_STATUS_T status; branch_1_result = add_repeat_guards(pattern, node->next_1.node); branch_2_result = add_repeat_guards(pattern, node->nonstring.next_2.node); status = max_status_4(result, branch_1_result, branch_2_result, RE_STATUS_REF); node->status = RE_STATUS_VISITED_AG | status; return status; } case RE_OP_SUCCESS: node->status = RE_STATUS_VISITED_AG | result; return result; default: node = node->next_1.node; break; } } } /* Adds an index to a node's values unless it's already present. * * 'offset' is the offset of the index count within the values. */ Py_LOCAL_INLINE(BOOL) add_index(RE_Node* node, size_t offset, size_t index) { size_t index_count; size_t first_index; size_t i; RE_CODE* new_values; if (!node) return TRUE; index_count = node->values[offset]; first_index = offset + 1; /* Is the index already present? */ for (i = 0; i < index_count; i++) { if (node->values[first_index + i] == index) return TRUE; } /* Allocate more space for the new index. 
*/ new_values = re_realloc(node->values, (node->value_count + 1) * sizeof(RE_CODE)); if (!new_values) return FALSE; ++node->value_count; node->values = new_values; node->values[first_index + node->values[offset]++] = (RE_CODE)index; return TRUE; } /* Records the index of every repeat and fuzzy section within atomic * subpatterns and lookarounds. */ Py_LOCAL_INLINE(BOOL) record_subpattern_repeats_and_fuzzy_sections(RE_Node* parent_node, size_t offset, size_t repeat_count, RE_Node* node) { while (node) { if (node->status & RE_STATUS_VISITED_REP) return TRUE; node->status |= RE_STATUS_VISITED_REP; switch (node->op) { case RE_OP_BRANCH: case RE_OP_GROUP_EXISTS: if (!record_subpattern_repeats_and_fuzzy_sections(parent_node, offset, repeat_count, node->next_1.node)) return FALSE; node = node->nonstring.next_2.node; break; case RE_OP_END_FUZZY: node = node->next_1.node; break; case RE_OP_END_GREEDY_REPEAT: case RE_OP_END_LAZY_REPEAT: return TRUE; case RE_OP_FUZZY: /* Record the fuzzy index. */ if (!add_index(parent_node, offset, repeat_count + node->values[0])) return FALSE; node = node->next_1.node; break; case RE_OP_GREEDY_REPEAT: case RE_OP_LAZY_REPEAT: /* Record the repeat index. */ if (!add_index(parent_node, offset, node->values[0])) return FALSE; if (!record_subpattern_repeats_and_fuzzy_sections(parent_node, offset, repeat_count, node->next_1.node)) return FALSE; node = node->nonstring.next_2.node; break; case RE_OP_GREEDY_REPEAT_ONE: case RE_OP_LAZY_REPEAT_ONE: /* Record the repeat index. */ if (!add_index(parent_node, offset, node->values[0])) return FALSE; node = node->next_1.node; break; default: node = node->next_1.node; break; } } return TRUE; } /* Marks nodes which are being used as used. 
*/ Py_LOCAL_INLINE(void) use_nodes(RE_Node* node) { while (node && !(node->status & RE_STATUS_USED)) { node->status |= RE_STATUS_USED; if (!(node->status & RE_STATUS_STRING)) { if (node->nonstring.next_2.node) use_nodes(node->nonstring.next_2.node); } node = node->next_1.node; } } /* Discards any unused nodes. * * Optimising the nodes might result in some nodes no longer being used. */ Py_LOCAL_INLINE(void) discard_unused_nodes(PatternObject* pattern) { size_t i; size_t new_count; /* Mark the nodes which are being used. */ use_nodes(pattern->start_node); for (i = 0; i < pattern->call_ref_info_capacity; i++) use_nodes(pattern->call_ref_info[i].node); new_count = 0; for (i = 0; i < pattern->node_count; i++) { RE_Node* node; node = pattern->node_list[i]; if (node->status & RE_STATUS_USED) pattern->node_list[new_count++] = node; else { re_dealloc(node->values); if (node->status & RE_STATUS_STRING) { re_dealloc(node->string.bad_character_offset); re_dealloc(node->string.good_suffix_offset); } re_dealloc(node); } } pattern->node_count = new_count; } /* Marks all the groups which are named. Returns FALSE if there's an error. */ Py_LOCAL_INLINE(BOOL) mark_named_groups(PatternObject* pattern) { size_t i; for (i = 0; i < pattern->public_group_count; i++) { RE_GroupInfo* group_info; PyObject* index; int status; group_info = &pattern->group_info[i]; index = Py_BuildValue("n", i + 1); if (!index) return FALSE; status = PyDict_Contains(pattern->indexgroup, index); Py_DECREF(index); if (status < 0) return FALSE; group_info->has_name = status == 1; } return TRUE; } /* Gets the test node. * * The test node lets the matcher look ahead in the pattern, allowing it to * avoid the cost of housekeeping, only to find that what follows doesn't match * anyway. 
*/ Py_LOCAL_INLINE(void) set_test_node(RE_NextNode* next) { RE_Node* node = next->node; RE_Node* test; next->test = node; next->match_next = node; next->match_step = 0; if (!node) return; test = node; while (test->op == RE_OP_END_GROUP || test->op == RE_OP_START_GROUP) test = test->next_1.node; next->test = test; if (test != node) return; switch (test->op) { case RE_OP_ANY: case RE_OP_ANY_ALL: case RE_OP_ANY_ALL_REV: case RE_OP_ANY_REV: case RE_OP_ANY_U: case RE_OP_ANY_U_REV: case RE_OP_BOUNDARY: case RE_OP_CHARACTER: case RE_OP_CHARACTER_IGN: case RE_OP_CHARACTER_IGN_REV: case RE_OP_CHARACTER_REV: case RE_OP_DEFAULT_BOUNDARY: case RE_OP_DEFAULT_END_OF_WORD: case RE_OP_DEFAULT_START_OF_WORD: case RE_OP_END_OF_LINE: case RE_OP_END_OF_LINE_U: case RE_OP_END_OF_STRING: case RE_OP_END_OF_STRING_LINE: case RE_OP_END_OF_STRING_LINE_U: case RE_OP_END_OF_WORD: case RE_OP_GRAPHEME_BOUNDARY: case RE_OP_PROPERTY: case RE_OP_PROPERTY_IGN: case RE_OP_PROPERTY_IGN_REV: case RE_OP_PROPERTY_REV: case RE_OP_RANGE: case RE_OP_RANGE_IGN: case RE_OP_RANGE_IGN_REV: case RE_OP_RANGE_REV: case RE_OP_SEARCH_ANCHOR: case RE_OP_SET_DIFF: case RE_OP_SET_DIFF_IGN: case RE_OP_SET_DIFF_IGN_REV: case RE_OP_SET_DIFF_REV: case RE_OP_SET_INTER: case RE_OP_SET_INTER_IGN: case RE_OP_SET_INTER_IGN_REV: case RE_OP_SET_INTER_REV: case RE_OP_SET_SYM_DIFF: case RE_OP_SET_SYM_DIFF_IGN: case RE_OP_SET_SYM_DIFF_IGN_REV: case RE_OP_SET_SYM_DIFF_REV: case RE_OP_SET_UNION: case RE_OP_SET_UNION_IGN: case RE_OP_SET_UNION_IGN_REV: case RE_OP_SET_UNION_REV: case RE_OP_START_OF_LINE: case RE_OP_START_OF_LINE_U: case RE_OP_START_OF_STRING: case RE_OP_START_OF_WORD: case RE_OP_STRING: case RE_OP_STRING_FLD: case RE_OP_STRING_FLD_REV: case RE_OP_STRING_IGN: case RE_OP_STRING_IGN_REV: case RE_OP_STRING_REV: next->match_next = test->next_1.node; next->match_step = test->step; break; case RE_OP_GREEDY_REPEAT_ONE: case RE_OP_LAZY_REPEAT_ONE: if (test->values[1] > 0) next->test = test; break; } } /* Sets the test nodes. 
*/ Py_LOCAL_INLINE(void) set_test_nodes(PatternObject* pattern) { RE_Node** node_list; size_t i; node_list = pattern->node_list; for (i = 0; i < pattern->node_count; i++) { RE_Node* node; node = node_list[i]; set_test_node(&node->next_1); if (!(node->status & RE_STATUS_STRING)) set_test_node(&node->nonstring.next_2); } } /* Optimises the pattern. */ Py_LOCAL_INLINE(BOOL) optimise_pattern(PatternObject* pattern) { size_t i; /* Building the nodes is made simpler by allowing branches to have a single * exit. These need to be removed. */ skip_one_way_branches(pattern); /* Add position guards for repeat bodies containing a reference to a group * or repeat tails followed at some point by a reference to a group. */ add_repeat_guards(pattern, pattern->start_node); /* Record the index of repeats and fuzzy sections within the body of atomic * and lookaround nodes. */ if (!record_subpattern_repeats_and_fuzzy_sections(NULL, 0, pattern->repeat_count, pattern->start_node)) return FALSE; for (i = 0; i < pattern->call_ref_info_count; i++) { RE_Node* node; node = pattern->call_ref_info[i].node; if (!record_subpattern_repeats_and_fuzzy_sections(NULL, 0, pattern->repeat_count, node)) return FALSE; } /* Discard any unused nodes. */ discard_unused_nodes(pattern); /* Set the test nodes. */ set_test_nodes(pattern); /* Mark all the groups that are named. */ if (!mark_named_groups(pattern)) return FALSE; return TRUE; } /* Creates a new pattern node. 
*/ Py_LOCAL_INLINE(RE_Node*) create_node(PatternObject* pattern, RE_UINT8 op, RE_CODE flags, Py_ssize_t step, size_t value_count) { RE_Node* node; node = (RE_Node*)re_alloc(sizeof(*node)); if (!node) return NULL; memset(node, 0, sizeof(RE_Node)); node->value_count = value_count; if (node->value_count > 0) { node->values = (RE_CODE*)re_alloc(node->value_count * sizeof(RE_CODE)); if (!node->values) goto error; } else node->values = NULL; node->op = op; node->match = (flags & RE_POSITIVE_OP) != 0; node->status = (RE_STATUS_T)(flags << RE_STATUS_SHIFT); node->step = step; /* Ensure that there's enough storage to record the new node. */ if (pattern->node_count >= pattern->node_capacity) { RE_Node** new_node_list; pattern->node_capacity *= 2; if (pattern->node_capacity == 0) pattern->node_capacity = RE_INIT_NODE_LIST_SIZE; new_node_list = (RE_Node**)re_realloc(pattern->node_list, pattern->node_capacity * sizeof(RE_Node*)); if (!new_node_list) goto error; pattern->node_list = new_node_list; } /* Record the new node. */ pattern->node_list[pattern->node_count++] = node; return node; error: re_dealloc(node->values); re_dealloc(node); return NULL; } /* Adds a node as a next node for another node. */ Py_LOCAL_INLINE(void) add_node(RE_Node* node_1, RE_Node* node_2) { if (!node_1->next_1.node) node_1->next_1.node = node_2; else node_1->nonstring.next_2.node = node_2; } /* Ensures that the entry for a group's details actually exists. */ Py_LOCAL_INLINE(BOOL) ensure_group(PatternObject* pattern, size_t group) { size_t old_capacity; size_t new_capacity; RE_GroupInfo* new_group_info; if (group <= pattern->true_group_count) /* We already have an entry for the group. */ return TRUE; /* Increase the storage capacity to include the new entry if it's * insufficient. 
*/ old_capacity = pattern->group_info_capacity; new_capacity = pattern->group_info_capacity; while (group > new_capacity) new_capacity += RE_LIST_SIZE_INC; if (new_capacity > old_capacity) { new_group_info = (RE_GroupInfo*)re_realloc(pattern->group_info, new_capacity * sizeof(RE_GroupInfo)); if (!new_group_info) return FALSE; memset(new_group_info + old_capacity, 0, (new_capacity - old_capacity) * sizeof(RE_GroupInfo)); pattern->group_info = new_group_info; pattern->group_info_capacity = new_capacity; } pattern->true_group_count = group; return TRUE; } /* Records that there's a reference to a group. */ Py_LOCAL_INLINE(BOOL) record_ref_group(PatternObject* pattern, size_t group) { if (!ensure_group(pattern, group)) return FALSE; pattern->group_info[group - 1].referenced = TRUE; return TRUE; } /* Records that there's a new group. */ Py_LOCAL_INLINE(BOOL) record_group(PatternObject* pattern, size_t group, RE_Node* node) { if (!ensure_group(pattern, group)) return FALSE; if (group >= 1) { RE_GroupInfo* info; info = &pattern->group_info[group - 1]; info->end_index = (Py_ssize_t)pattern->true_group_count; info->node = node; } return TRUE; } /* Records that a group has closed. */ Py_LOCAL_INLINE(void) record_group_end(PatternObject* pattern, size_t group) { if (group >= 1) pattern->group_info[group - 1].end_index = ++pattern->group_end_index; } /* Ensures that the entry for a call_ref's details actually exists. */ Py_LOCAL_INLINE(BOOL) ensure_call_ref(PatternObject* pattern, size_t call_ref) { size_t old_capacity; size_t new_capacity; RE_CallRefInfo* new_call_ref_info; if (call_ref < pattern->call_ref_info_count) /* We already have an entry for the call_ref. */ return TRUE; /* Increase the storage capacity to include the new entry if it's * insufficient. 
*/ old_capacity = pattern->call_ref_info_capacity; new_capacity = pattern->call_ref_info_capacity; while (call_ref >= new_capacity) new_capacity += RE_LIST_SIZE_INC; if (new_capacity > old_capacity) { new_call_ref_info = (RE_CallRefInfo*)re_realloc(pattern->call_ref_info, new_capacity * sizeof(RE_CallRefInfo)); if (!new_call_ref_info) return FALSE; memset(new_call_ref_info + old_capacity, 0, (new_capacity - old_capacity) * sizeof(RE_CallRefInfo)); pattern->call_ref_info = new_call_ref_info; pattern->call_ref_info_capacity = new_capacity; } pattern->call_ref_info_count = 1 + call_ref; return TRUE; } /* Records that a call_ref is defined. */ Py_LOCAL_INLINE(BOOL) record_call_ref_defined(PatternObject* pattern, size_t call_ref, RE_Node* node) { if (!ensure_call_ref(pattern, call_ref)) return FALSE; pattern->call_ref_info[call_ref].defined = TRUE; pattern->call_ref_info[call_ref].node = node; return TRUE; } /* Records that a call_ref is used. */ Py_LOCAL_INLINE(BOOL) record_call_ref_used(PatternObject* pattern, size_t call_ref) { if (!ensure_call_ref(pattern, call_ref)) return FALSE; pattern->call_ref_info[call_ref].used = TRUE; return TRUE; } /* Checks whether a node matches one and only one character. */ Py_LOCAL_INLINE(BOOL) sequence_matches_one(RE_Node* node) { while (node->op == RE_OP_BRANCH && !node->nonstring.next_2.node) node = node->next_1.node; if (node->next_1.node || (node->status & RE_STATUS_FUZZY)) return FALSE; return node_matches_one_character(node); } /* Records a repeat. */ Py_LOCAL_INLINE(BOOL) record_repeat(PatternObject* pattern, size_t index, size_t repeat_depth) { size_t old_capacity; size_t new_capacity; /* Increase the storage capacity to include the new entry if it's * insufficient. 
*/ old_capacity = pattern->repeat_info_capacity; new_capacity = pattern->repeat_info_capacity; while (index >= new_capacity) new_capacity += RE_LIST_SIZE_INC; if (new_capacity > old_capacity) { RE_RepeatInfo* new_repeat_info; new_repeat_info = (RE_RepeatInfo*)re_realloc(pattern->repeat_info, new_capacity * sizeof(RE_RepeatInfo)); if (!new_repeat_info) return FALSE; memset(new_repeat_info + old_capacity, 0, (new_capacity - old_capacity) * sizeof(RE_RepeatInfo)); pattern->repeat_info = new_repeat_info; pattern->repeat_info_capacity = new_capacity; } if (index >= pattern->repeat_count) pattern->repeat_count = index + 1; if (repeat_depth > 0) pattern->repeat_info[index].status |= RE_STATUS_INNER; return TRUE; } Py_LOCAL_INLINE(Py_ssize_t) get_step(RE_CODE op) { switch (op) { case RE_OP_ANY: case RE_OP_ANY_ALL: case RE_OP_ANY_U: case RE_OP_CHARACTER: case RE_OP_CHARACTER_IGN: case RE_OP_PROPERTY: case RE_OP_PROPERTY_IGN: case RE_OP_RANGE: case RE_OP_RANGE_IGN: case RE_OP_SET_DIFF: case RE_OP_SET_DIFF_IGN: case RE_OP_SET_INTER: case RE_OP_SET_INTER_IGN: case RE_OP_SET_SYM_DIFF: case RE_OP_SET_SYM_DIFF_IGN: case RE_OP_SET_UNION: case RE_OP_SET_UNION_IGN: case RE_OP_STRING: case RE_OP_STRING_FLD: case RE_OP_STRING_IGN: return 1; case RE_OP_ANY_ALL_REV: case RE_OP_ANY_REV: case RE_OP_ANY_U_REV: case RE_OP_CHARACTER_IGN_REV: case RE_OP_CHARACTER_REV: case RE_OP_PROPERTY_IGN_REV: case RE_OP_PROPERTY_REV: case RE_OP_RANGE_IGN_REV: case RE_OP_RANGE_REV: case RE_OP_SET_DIFF_IGN_REV: case RE_OP_SET_DIFF_REV: case RE_OP_SET_INTER_IGN_REV: case RE_OP_SET_INTER_REV: case RE_OP_SET_SYM_DIFF_IGN_REV: case RE_OP_SET_SYM_DIFF_REV: case RE_OP_SET_UNION_IGN_REV: case RE_OP_SET_UNION_REV: case RE_OP_STRING_FLD_REV: case RE_OP_STRING_IGN_REV: case RE_OP_STRING_REV: return -1; } return 0; } Py_LOCAL_INLINE(int) build_sequence(RE_CompileArgs* args); /* Builds an ANY node. 
*/ Py_LOCAL_INLINE(int) build_ANY(RE_CompileArgs* args) { RE_UINT8 op; RE_CODE flags; Py_ssize_t step; RE_Node* node; /* codes: opcode, flags. */ if (args->code + 1 > args->end_code) return RE_ERROR_ILLEGAL; op = (RE_UINT8)args->code[0]; flags = args->code[1]; step = get_step(op); /* Create the node. */ node = create_node(args->pattern, op, flags, step, 0); if (!node) return RE_ERROR_MEMORY; args->code += 2; /* Append the node. */ add_node(args->end, node); args->end = node; ++args->min_width; return RE_ERROR_SUCCESS; } /* Builds a FUZZY node. */ Py_LOCAL_INLINE(int) build_FUZZY(RE_CompileArgs* args) { RE_CODE flags; RE_Node* start_node; RE_Node* end_node; RE_CODE index; RE_CompileArgs subargs; int status; /* codes: opcode, flags, constraints, sequence, end. */ if (args->code + 13 > args->end_code) return RE_ERROR_ILLEGAL; flags = args->code[1]; /* Create nodes for the start and end of the fuzzy sequence. */ start_node = create_node(args->pattern, RE_OP_FUZZY, flags, 0, 9); end_node = create_node(args->pattern, RE_OP_END_FUZZY, flags, 0, 5); if (!start_node || !end_node) return RE_ERROR_MEMORY; index = (RE_CODE)args->pattern->fuzzy_count++; start_node->values[0] = index; end_node->values[0] = index; /* The constraints consist of 4 pairs of limits and the cost equation. */ end_node->values[RE_FUZZY_VAL_MIN_DEL] = args->code[2]; /* Deletion minimum. */ end_node->values[RE_FUZZY_VAL_MIN_INS] = args->code[4]; /* Insertion minimum. */ end_node->values[RE_FUZZY_VAL_MIN_SUB] = args->code[6]; /* Substitution minimum. */ end_node->values[RE_FUZZY_VAL_MIN_ERR] = args->code[8]; /* Error minimum. */ start_node->values[RE_FUZZY_VAL_MAX_DEL] = args->code[3]; /* Deletion maximum. */ start_node->values[RE_FUZZY_VAL_MAX_INS] = args->code[5]; /* Insertion maximum. */ start_node->values[RE_FUZZY_VAL_MAX_SUB] = args->code[7]; /* Substitution maximum. */ start_node->values[RE_FUZZY_VAL_MAX_ERR] = args->code[9]; /* Error maximum. 
*/ start_node->values[RE_FUZZY_VAL_DEL_COST] = args->code[10]; /* Deletion cost. */ start_node->values[RE_FUZZY_VAL_INS_COST] = args->code[11]; /* Insertion cost. */ start_node->values[RE_FUZZY_VAL_SUB_COST] = args->code[12]; /* Substitution cost. */ start_node->values[RE_FUZZY_VAL_MAX_COST] = args->code[13]; /* Total cost. */ args->code += 14; subargs = *args; subargs.within_fuzzy = TRUE; /* Compile the sequence and check that we've reached the end of the * subpattern. */ status = build_sequence(&subargs); if (status != RE_ERROR_SUCCESS) return status; if (subargs.code[0] != RE_OP_END) return RE_ERROR_ILLEGAL; args->code = subargs.code; args->min_width += subargs.min_width; args->has_captures |= subargs.has_captures; args->is_fuzzy = TRUE; args->has_groups |= subargs.has_groups; args->has_repeats |= subargs.has_repeats; ++args->code; /* Append the fuzzy sequence. */ add_node(args->end, start_node); add_node(start_node, subargs.start); add_node(subargs.end, end_node); args->end = end_node; return RE_ERROR_SUCCESS; } /* Builds an ATOMIC node. */ Py_LOCAL_INLINE(int) build_ATOMIC(RE_CompileArgs* args) { RE_Node* atomic_node; RE_CompileArgs subargs; int status; RE_Node* end_node; /* codes: opcode, sequence, end. */ if (args->code + 1 > args->end_code) return RE_ERROR_ILLEGAL; atomic_node = create_node(args->pattern, RE_OP_ATOMIC, 0, 0, 0); if (!atomic_node) return RE_ERROR_MEMORY; ++args->code; /* Compile the sequence and check that we've reached the end of it. */ subargs = *args; status = build_sequence(&subargs); if (status != RE_ERROR_SUCCESS) return status; if (subargs.code[0] != RE_OP_END) return RE_ERROR_ILLEGAL; args->code = subargs.code; ++args->code; /* Check the subpattern. 
*/ args->min_width += subargs.min_width; args->has_captures |= subargs.has_captures; args->is_fuzzy |= subargs.is_fuzzy; args->has_groups |= subargs.has_groups; args->has_repeats |= subargs.has_repeats; if (subargs.has_groups) atomic_node->status |= RE_STATUS_HAS_GROUPS; if (subargs.has_repeats) atomic_node->status |= RE_STATUS_HAS_REPEATS; /* Create the node to terminate the subpattern. */ end_node = create_node(subargs.pattern, RE_OP_END_ATOMIC, 0, 0, 0); if (!end_node) return RE_ERROR_MEMORY; /* Append the new sequence. */ add_node(args->end, atomic_node); add_node(atomic_node, subargs.start); add_node(subargs.end, end_node); args->end = end_node; return RE_ERROR_SUCCESS; } /* Builds a BOUNDARY node. */ Py_LOCAL_INLINE(int) build_BOUNDARY(RE_CompileArgs* args) { RE_UINT8 op; RE_CODE flags; RE_Node* node; /* codes: opcode, flags. */ if (args->code + 1 > args->end_code) return RE_ERROR_ILLEGAL; op = (RE_UINT8)args->code[0]; flags = args->code[1]; args->code += 2; /* Create the node. */ node = create_node(args->pattern, op, flags, 0, 0); if (!node) return RE_ERROR_MEMORY; /* Append the node. */ add_node(args->end, node); args->end = node; return RE_ERROR_SUCCESS; } /* Builds a BRANCH node. */ Py_LOCAL_INLINE(int) build_BRANCH(RE_CompileArgs* args) { RE_Node* branch_node; RE_Node* join_node; Py_ssize_t min_width; RE_CompileArgs subargs; int status; /* codes: opcode, branch, next, branch, end. */ if (args->code + 2 > args->end_code) return RE_ERROR_ILLEGAL; /* Create nodes for the start and end of the branch sequence. */ branch_node = create_node(args->pattern, RE_OP_BRANCH, 0, 0, 0); join_node = create_node(args->pattern, RE_OP_BRANCH, 0, 0, 0); if (!branch_node || !join_node) return RE_ERROR_MEMORY; /* Append the node. */ add_node(args->end, branch_node); args->end = join_node; min_width = PY_SSIZE_T_MAX; subargs = *args; /* A branch in the regular expression is compiled into a series of 2-way * branches. 
*/ do { RE_Node* next_branch_node; /* Skip over the 'BRANCH' or 'NEXT' opcode. */ ++subargs.code; /* Compile the sequence until the next 'BRANCH' or 'NEXT' opcode. */ status = build_sequence(&subargs); if (status != RE_ERROR_SUCCESS) return status; min_width = min_ssize_t(min_width, subargs.min_width); args->has_captures |= subargs.has_captures; args->is_fuzzy |= subargs.is_fuzzy; args->has_groups |= subargs.has_groups; args->has_repeats |= subargs.has_repeats; /* Append the sequence. */ add_node(branch_node, subargs.start); add_node(subargs.end, join_node); /* Create a start node for the next sequence and append it. */ next_branch_node = create_node(subargs.pattern, RE_OP_BRANCH, 0, 0, 0); if (!next_branch_node) return RE_ERROR_MEMORY; add_node(branch_node, next_branch_node); branch_node = next_branch_node; } while (subargs.code < subargs.end_code && subargs.code[0] == RE_OP_NEXT); /* We should have reached the end of the branch. */ if (subargs.code[0] != RE_OP_END) return RE_ERROR_ILLEGAL; args->code = subargs.code; ++args->code; args->min_width += min_width; return RE_ERROR_SUCCESS; } /* Builds a CALL_REF node. */ Py_LOCAL_INLINE(int) build_CALL_REF(RE_CompileArgs* args) { RE_CODE call_ref; RE_Node* start_node; RE_Node* end_node; RE_CompileArgs subargs; int status; /* codes: opcode, call_ref. */ if (args->code + 1 > args->end_code) return RE_ERROR_ILLEGAL; call_ref = args->code[1]; args->code += 2; /* Create nodes for the start and end of the subpattern. */ start_node = create_node(args->pattern, RE_OP_CALL_REF, 0, 0, 1); end_node = create_node(args->pattern, RE_OP_GROUP_RETURN, 0, 0, 0); if (!start_node || !end_node) return RE_ERROR_MEMORY; start_node->values[0] = call_ref; /* Compile the sequence and check that we've reached the end of the * subpattern. 
*/ subargs = *args; status = build_sequence(&subargs); if (status != RE_ERROR_SUCCESS) return status; if (subargs.code[0] != RE_OP_END) return RE_ERROR_ILLEGAL; args->code = subargs.code; args->min_width += subargs.min_width; args->has_captures |= subargs.has_captures; args->is_fuzzy |= subargs.is_fuzzy; args->has_groups |= subargs.has_groups; args->has_repeats |= subargs.has_repeats; ++args->code; /* Record that we defined a call_ref. */ if (!record_call_ref_defined(args->pattern, call_ref, start_node)) return RE_ERROR_MEMORY; /* Append the node. */ add_node(args->end, start_node); add_node(start_node, subargs.start); add_node(subargs.end, end_node); args->end = end_node; return RE_ERROR_SUCCESS; } /* Builds a CHARACTER or PROPERTY node. */ Py_LOCAL_INLINE(int) build_CHARACTER_or_PROPERTY(RE_CompileArgs* args) { RE_UINT8 op; RE_CODE flags; Py_ssize_t step; RE_Node* node; /* codes: opcode, flags, value. */ if (args->code + 2 > args->end_code) return RE_ERROR_ILLEGAL; op = (RE_UINT8)args->code[0]; flags = args->code[1]; step = get_step(op); if (flags & RE_ZEROWIDTH_OP) step = 0; /* Create the node. */ node = create_node(args->pattern, op, flags, step, 1); if (!node) return RE_ERROR_MEMORY; node->values[0] = args->code[2]; args->code += 3; /* Append the node. */ add_node(args->end, node); args->end = node; if (step != 0) ++args->min_width; return RE_ERROR_SUCCESS; } /* Builds a CONDITIONAL node. */ Py_LOCAL_INLINE(int) build_CONDITIONAL(RE_CompileArgs* args) { RE_CODE flags; BOOL forward; RE_Node* test_node; RE_CompileArgs subargs; int status; RE_Node* end_test_node; RE_Node* end_node; Py_ssize_t min_width; /* codes: opcode, flags, forward, sequence, next, sequence, next, sequence, * end. */ if (args->code + 4 > args->end_code) return RE_ERROR_ILLEGAL; flags = args->code[1]; forward = (BOOL)args->code[2]; /* Create a node for the lookaround. 
*/ test_node = create_node(args->pattern, RE_OP_CONDITIONAL, flags, 0, 0); if (!test_node) return RE_ERROR_MEMORY; args->code += 3; add_node(args->end, test_node); /* Compile the lookaround test and check that we've reached the end of the * subpattern. */ subargs = *args; subargs.forward = forward; status = build_sequence(&subargs); if (status != RE_ERROR_SUCCESS) return status; if (subargs.code[0] != RE_OP_NEXT) return RE_ERROR_ILLEGAL; args->code = subargs.code; ++args->code; /* Check the lookaround subpattern. */ args->has_captures |= subargs.has_captures; args->is_fuzzy |= subargs.is_fuzzy; args->has_groups |= subargs.has_groups; args->has_repeats |= subargs.has_repeats; if (subargs.has_groups) test_node->status |= RE_STATUS_HAS_GROUPS; if (subargs.has_repeats) test_node->status |= RE_STATUS_HAS_REPEATS; /* Create the node to terminate the test. */ end_test_node = create_node(args->pattern, RE_OP_END_CONDITIONAL, 0, 0, 0); if (!end_test_node) return RE_ERROR_MEMORY; /* test node -> test -> end test node */ add_node(test_node, subargs.start); add_node(subargs.end, end_test_node); /* Compile the true branch. */ subargs = *args; status = build_sequence(&subargs); if (status != RE_ERROR_SUCCESS) return status; /* Check the true branch. */ args->code = subargs.code; args->has_captures |= subargs.has_captures; args->is_fuzzy |= subargs.is_fuzzy; args->has_groups |= subargs.has_groups; args->has_repeats |= subargs.has_repeats; min_width = subargs.min_width; /* Create the terminating node. */ end_node = create_node(args->pattern, RE_OP_BRANCH, 0, 0, 0); if (!end_node) return RE_ERROR_MEMORY; /* end test node -> true branch -> end node */ add_node(end_test_node, subargs.start); add_node(subargs.end, end_node); if (args->code[0] == RE_OP_NEXT) { /* There's a false branch. */ ++args->code; /* Compile the false branch. */ subargs.code = args->code; status = build_sequence(&subargs); if (status != RE_ERROR_SUCCESS) return status; /* Check the false branch. 
*/ args->code = subargs.code; args->has_captures |= subargs.has_captures; args->is_fuzzy |= subargs.is_fuzzy; args->has_groups |= subargs.has_groups; args->has_repeats |= subargs.has_repeats; min_width = min_ssize_t(min_width, subargs.min_width); /* test node -> false branch -> end node */ add_node(test_node, subargs.start); add_node(subargs.end, end_node); } else /* end test node -> end node */ add_node(end_test_node, end_node); if (args->code[0] != RE_OP_END) return RE_ERROR_ILLEGAL; args->min_width += min_width; ++args->code; args->end = end_node; return RE_ERROR_SUCCESS; } /* Builds a GROUP node. */ Py_LOCAL_INLINE(int) build_GROUP(RE_CompileArgs* args) { RE_CODE private_group; RE_CODE public_group; RE_Node* start_node; RE_Node* end_node; RE_CompileArgs subargs; int status; /* codes: opcode, private_group, public_group. */ if (args->code + 2 > args->end_code) return RE_ERROR_ILLEGAL; private_group = args->code[1]; public_group = args->code[2]; args->code += 3; /* Create nodes for the start and end of the capture group. */ start_node = create_node(args->pattern, args->forward ? RE_OP_START_GROUP : RE_OP_END_GROUP, 0, 0, 3); end_node = create_node(args->pattern, args->forward ? RE_OP_END_GROUP : RE_OP_START_GROUP, 0, 0, 3); if (!start_node || !end_node) return RE_ERROR_MEMORY; start_node->values[0] = private_group; end_node->values[0] = private_group; start_node->values[1] = public_group; end_node->values[1] = public_group; /* Signal that the capture should be saved when it's complete. */ start_node->values[2] = 0; end_node->values[2] = 1; /* Record that we have a new capture group. */ if (!record_group(args->pattern, private_group, start_node)) return RE_ERROR_MEMORY; /* Compile the sequence and check that we've reached the end of the capture * group. 
*/ subargs = *args; status = build_sequence(&subargs); if (status != RE_ERROR_SUCCESS) return status; if (subargs.code[0] != RE_OP_END) return RE_ERROR_ILLEGAL; args->code = subargs.code; args->min_width += subargs.min_width; args->has_captures |= subargs.has_captures | subargs.visible_captures; args->is_fuzzy |= subargs.is_fuzzy; args->has_groups |= TRUE; args->has_repeats |= subargs.has_repeats; ++args->code; /* Record that the capture group has closed. */ record_group_end(args->pattern, private_group); /* Append the capture group. */ add_node(args->end, start_node); add_node(start_node, subargs.start); add_node(subargs.end, end_node); args->end = end_node; return RE_ERROR_SUCCESS; } /* Builds a GROUP_CALL node. */ Py_LOCAL_INLINE(int) build_GROUP_CALL(RE_CompileArgs* args) { RE_CODE call_ref; RE_Node* node; /* codes: opcode, call_ref. */ if (args->code + 1 > args->end_code) return RE_ERROR_ILLEGAL; call_ref = args->code[1]; /* Create the node. */ node = create_node(args->pattern, RE_OP_GROUP_CALL, 0, 0, 1); if (!node) return RE_ERROR_MEMORY; node->values[0] = call_ref; node->status |= RE_STATUS_HAS_GROUPS; node->status |= RE_STATUS_HAS_REPEATS; args->code += 2; /* Record that we used a call_ref. */ if (!record_call_ref_used(args->pattern, call_ref)) return RE_ERROR_MEMORY; /* Append the node. */ add_node(args->end, node); args->end = node; return RE_ERROR_SUCCESS; } /* Builds a GROUP_EXISTS node. */ Py_LOCAL_INLINE(int) build_GROUP_EXISTS(RE_CompileArgs* args) { RE_CODE group; RE_Node* start_node; RE_Node* end_node; RE_CompileArgs subargs; int status; Py_ssize_t min_width; /* codes: opcode, group, sequence, next, sequence, end. */ if (args->code + 2 > args->end_code) return RE_ERROR_ILLEGAL; group = args->code[1]; args->code += 2; /* Record that we have a reference to a group. If group is 0, then we have * a DEFINE and not a true group.
*/ if (group > 0 && !record_ref_group(args->pattern, group)) return RE_ERROR_MEMORY; /* Create nodes for the start and end of the structure. */ start_node = create_node(args->pattern, RE_OP_GROUP_EXISTS, 0, 0, 1); end_node = create_node(args->pattern, RE_OP_BRANCH, 0, 0, 0); if (!start_node || !end_node) return RE_ERROR_MEMORY; start_node->values[0] = group; subargs = *args; status = build_sequence(&subargs); if (status != RE_ERROR_SUCCESS) return status; args->code = subargs.code; args->has_captures |= subargs.has_captures; args->is_fuzzy |= subargs.is_fuzzy; args->has_groups |= subargs.has_groups; args->has_repeats |= subargs.has_repeats; min_width = subargs.min_width; /* Append the start node. */ add_node(args->end, start_node); add_node(start_node, subargs.start); if (args->code[0] == RE_OP_NEXT) { RE_Node* true_branch_end; ++args->code; true_branch_end = subargs.end; subargs.code = args->code; status = build_sequence(&subargs); if (status != RE_ERROR_SUCCESS) return status; args->code = subargs.code; args->has_captures |= subargs.has_captures; args->is_fuzzy |= subargs.is_fuzzy; if (group == 0) { /* Join the 2 branches end-to-end and bypass it. The sequence * itself will never be matched as a whole, so it doesn't matter. */ min_width = 0; add_node(start_node, end_node); add_node(true_branch_end, subargs.start); } else { args->has_groups |= subargs.has_groups; args->has_repeats |= subargs.has_repeats; min_width = min_ssize_t(min_width, subargs.min_width); add_node(start_node, subargs.start); add_node(true_branch_end, end_node); } add_node(subargs.end, end_node); } else { add_node(start_node, end_node); add_node(subargs.end, end_node); min_width = 0; } args->min_width += min_width; if (args->code[0] != RE_OP_END) return RE_ERROR_ILLEGAL; ++args->code; args->end = end_node; return RE_ERROR_SUCCESS; } /* Builds a LOOKAROUND node. 
*/ Py_LOCAL_INLINE(int) build_LOOKAROUND(RE_CompileArgs* args) { RE_CODE flags; BOOL forward; RE_Node* lookaround_node; RE_CompileArgs subargs; int status; RE_Node* end_node; RE_Node* next_node; /* codes: opcode, flags, forward, sequence, end. */ if (args->code + 3 > args->end_code) return RE_ERROR_ILLEGAL; flags = args->code[1]; forward = (BOOL)args->code[2]; /* Create a node for the lookaround. */ lookaround_node = create_node(args->pattern, RE_OP_LOOKAROUND, flags, 0, 0); if (!lookaround_node) return RE_ERROR_MEMORY; args->code += 3; /* Compile the sequence and check that we've reached the end of the * subpattern. */ subargs = *args; subargs.forward = forward; status = build_sequence(&subargs); if (status != RE_ERROR_SUCCESS) return status; if (subargs.code[0] != RE_OP_END) return RE_ERROR_ILLEGAL; args->code = subargs.code; ++args->code; /* Check the subpattern. */ args->has_captures |= subargs.has_captures; args->is_fuzzy |= subargs.is_fuzzy; args->has_groups |= subargs.has_groups; args->has_repeats |= subargs.has_repeats; if (subargs.has_groups) lookaround_node->status |= RE_STATUS_HAS_GROUPS; if (subargs.has_repeats) lookaround_node->status |= RE_STATUS_HAS_REPEATS; /* Create the node to terminate the subpattern. */ end_node = create_node(args->pattern, RE_OP_END_LOOKAROUND, 0, 0, 0); if (!end_node) return RE_ERROR_MEMORY; /* Make a continuation node. */ next_node = create_node(args->pattern, RE_OP_BRANCH, 0, 0, 0); if (!next_node) return RE_ERROR_MEMORY; /* Append the new sequence. */ add_node(args->end, lookaround_node); add_node(lookaround_node, subargs.start); add_node(lookaround_node, next_node); add_node(subargs.end, end_node); add_node(end_node, next_node); args->end = next_node; return RE_ERROR_SUCCESS; } /* Builds a RANGE node. */ Py_LOCAL_INLINE(int) build_RANGE(RE_CompileArgs* args) { RE_UINT8 op; RE_CODE flags; Py_ssize_t step; RE_Node* node; /* codes: opcode, flags, lower, upper. 
*/ if (args->code + 3 > args->end_code) return RE_ERROR_ILLEGAL; op = (RE_UINT8)args->code[0]; flags = args->code[1]; step = get_step(op); if (flags & RE_ZEROWIDTH_OP) step = 0; /* Create the node. */ node = create_node(args->pattern, op, flags, step, 2); if (!node) return RE_ERROR_MEMORY; node->values[0] = args->code[2]; node->values[1] = args->code[3]; args->code += 4; /* Append the node. */ add_node(args->end, node); args->end = node; if (step != 0) ++args->min_width; return RE_ERROR_SUCCESS; } /* Builds a REF_GROUP node. */ Py_LOCAL_INLINE(int) build_REF_GROUP(RE_CompileArgs* args) { RE_CODE flags; RE_CODE group; RE_Node* node; /* codes: opcode, flags, group. */ if (args->code + 2 > args->end_code) return RE_ERROR_ILLEGAL; flags = args->code[1]; group = args->code[2]; node = create_node(args->pattern, (RE_UINT8)args->code[0], flags, 0, 1); if (!node) return RE_ERROR_MEMORY; node->values[0] = group; args->code += 3; /* Record that we have a reference to a group. */ if (!record_ref_group(args->pattern, group)) return RE_ERROR_MEMORY; /* Append the reference. */ add_node(args->end, node); args->end = node; return RE_ERROR_SUCCESS; } /* Builds a REPEAT node. */ Py_LOCAL_INLINE(int) build_REPEAT(RE_CompileArgs* args) { BOOL greedy; RE_CODE min_count; RE_CODE max_count; int status; /* codes: opcode, min_count, max_count, sequence, end. */ if (args->code + 3 > args->end_code) return RE_ERROR_ILLEGAL; greedy = args->code[0] == RE_OP_GREEDY_REPEAT; min_count = args->code[1]; max_count = args->code[2]; if (args->code[1] > args->code[2]) return RE_ERROR_ILLEGAL; args->code += 3; if (min_count == 1 && max_count == 1) { /* Singly-repeated sequence. 
*/ RE_CompileArgs subargs; subargs = *args; status = build_sequence(&subargs); if (status != RE_ERROR_SUCCESS) return status; if (subargs.code[0] != RE_OP_END) return RE_ERROR_ILLEGAL; args->code = subargs.code; args->min_width += subargs.min_width; args->has_captures |= subargs.has_captures; args->is_fuzzy |= subargs.is_fuzzy; args->has_groups |= subargs.has_groups; args->has_repeats |= subargs.has_repeats; ++args->code; /* Append the sequence. */ add_node(args->end, subargs.start); args->end = subargs.end; } else { size_t index; RE_Node* repeat_node; RE_CompileArgs subargs; index = args->pattern->repeat_count; /* Create the nodes for the repeat. */ repeat_node = create_node(args->pattern, greedy ? RE_OP_GREEDY_REPEAT : RE_OP_LAZY_REPEAT, 0, args->forward ? 1 : -1, 4); if (!repeat_node || !record_repeat(args->pattern, index, args->repeat_depth)) return RE_ERROR_MEMORY; repeat_node->values[0] = (RE_CODE)index; repeat_node->values[1] = min_count; repeat_node->values[2] = max_count; repeat_node->values[3] = args->forward; if (args->within_fuzzy) args->pattern->repeat_info[index].status |= RE_STATUS_BODY; /* Compile the 'body' and check that we've reached the end of it. */ subargs = *args; subargs.visible_captures = TRUE; ++subargs.repeat_depth; status = build_sequence(&subargs); if (status != RE_ERROR_SUCCESS) return status; if (subargs.code[0] != RE_OP_END) return RE_ERROR_ILLEGAL; args->code = subargs.code; args->min_width += (Py_ssize_t)min_count * subargs.min_width; args->has_captures |= subargs.has_captures; args->is_fuzzy |= subargs.is_fuzzy; args->has_groups |= subargs.has_groups; args->has_repeats = TRUE; ++args->code; /* Is it a repeat of something which will match a single character? * * If it's in a fuzzy section then it won't be optimised as a * single-character repeat. */ if (sequence_matches_one(subargs.start)) { repeat_node->op = greedy ? RE_OP_GREEDY_REPEAT_ONE : RE_OP_LAZY_REPEAT_ONE; /* Append the new sequence. 
*/ add_node(args->end, repeat_node); repeat_node->nonstring.next_2.node = subargs.start; args->end = repeat_node; } else { RE_Node* end_repeat_node; RE_Node* end_node; end_repeat_node = create_node(args->pattern, greedy ? RE_OP_END_GREEDY_REPEAT : RE_OP_END_LAZY_REPEAT, 0, args->forward ? 1 : -1, 4); if (!end_repeat_node) return RE_ERROR_MEMORY; end_repeat_node->values[0] = repeat_node->values[0]; end_repeat_node->values[1] = repeat_node->values[1]; end_repeat_node->values[2] = repeat_node->values[2]; end_repeat_node->values[3] = args->forward; end_node = create_node(args->pattern, RE_OP_BRANCH, 0, 0, 0); if (!end_node) return RE_ERROR_MEMORY; /* Append the new sequence. */ add_node(args->end, repeat_node); add_node(repeat_node, subargs.start); add_node(repeat_node, end_node); add_node(subargs.end, end_repeat_node); add_node(end_repeat_node, subargs.start); add_node(end_repeat_node, end_node); args->end = end_node; } } return RE_ERROR_SUCCESS; } /* Builds a STRING node. */ Py_LOCAL_INLINE(int) build_STRING(RE_CompileArgs* args, BOOL is_charset) { RE_CODE flags; RE_CODE length; RE_UINT8 op; Py_ssize_t step; RE_Node* node; size_t i; /* codes: opcode, flags, length, characters. */ flags = args->code[1]; length = args->code[2]; if (args->code + 3 + length > args->end_code) return RE_ERROR_ILLEGAL; op = (RE_UINT8)args->code[0]; step = get_step(op); /* Create the node. */ node = create_node(args->pattern, op, flags, step * (Py_ssize_t)length, length); if (!node) return RE_ERROR_MEMORY; if (!is_charset) node->status |= RE_STATUS_STRING; for (i = 0; i < length; i++) node->values[i] = args->code[3 + i]; args->code += 3 + length; /* Append the node. */ add_node(args->end, node); args->end = node; /* Because of full case-folding, one character in the text could match * multiple characters in the pattern. 
*/ if (op == RE_OP_STRING_FLD || op == RE_OP_STRING_FLD_REV) args->min_width += possible_unfolded_length((Py_ssize_t)length); else args->min_width += (Py_ssize_t)length; return RE_ERROR_SUCCESS; } /* Builds a SET node. */ Py_LOCAL_INLINE(int) build_SET(RE_CompileArgs* args) { RE_UINT8 op; RE_CODE flags; Py_ssize_t step; RE_Node* node; Py_ssize_t min_width; int status; /* codes: opcode, flags, members. */ op = (RE_UINT8)args->code[0]; flags = args->code[1]; step = get_step(op); if (flags & RE_ZEROWIDTH_OP) step = 0; node = create_node(args->pattern, op, flags, step, 0); if (!node) return RE_ERROR_MEMORY; args->code += 2; /* Append the node. */ add_node(args->end, node); args->end = node; min_width = args->min_width; /* Compile the character set. */ do { switch (args->code[0]) { case RE_OP_CHARACTER: case RE_OP_PROPERTY: status = build_CHARACTER_or_PROPERTY(args); if (status != RE_ERROR_SUCCESS) return status; break; case RE_OP_RANGE: status = build_RANGE(args); if (status != RE_ERROR_SUCCESS) return status; break; case RE_OP_SET_DIFF: case RE_OP_SET_INTER: case RE_OP_SET_SYM_DIFF: case RE_OP_SET_UNION: status = build_SET(args); if (status != RE_ERROR_SUCCESS) return status; break; case RE_OP_STRING: /* A set of characters. */ status = build_STRING(args, TRUE); if (status != RE_ERROR_SUCCESS) return status; break; default: /* Illegal opcode for a character set. */ return RE_ERROR_ILLEGAL; } } while (args->code < args->end_code && args->code[0] != RE_OP_END); /* Check that we've reached the end correctly. (The last opcode should be * 'END'.) */ if (args->code >= args->end_code || args->code[0] != RE_OP_END) return RE_ERROR_ILLEGAL; ++args->code; /* At this point the set's members are in the main sequence. They need to * be moved out-of-line. */ node->nonstring.next_2.node = node->next_1.node; node->next_1.node = NULL; args->end = node; args->min_width = min_width; if (step != 0) ++args->min_width; return RE_ERROR_SUCCESS; } /* Builds a STRING_SET node.
*/ Py_LOCAL_INLINE(int) build_STRING_SET(RE_CompileArgs* args) { RE_CODE index; RE_CODE min_len; RE_CODE max_len; RE_Node* node; /* codes: opcode, index, min_len, max_len. */ if (args->code + 3 > args->end_code) return RE_ERROR_ILLEGAL; index = args->code[1]; min_len = args->code[2]; max_len = args->code[3]; node = create_node(args->pattern, (RE_UINT8)args->code[0], 0, 0, 3); if (!node) return RE_ERROR_MEMORY; node->values[0] = index; node->values[1] = min_len; node->values[2] = max_len; args->code += 4; /* Append the reference. */ add_node(args->end, node); args->end = node; return RE_ERROR_SUCCESS; } /* Builds a SUCCESS node. */ Py_LOCAL_INLINE(int) build_SUCCESS(RE_CompileArgs* args) { RE_Node* node; /* code: opcode. */ /* Create the node. */ node = create_node(args->pattern, (RE_UINT8)args->code[0], 0, 0, 0); if (!node) return RE_ERROR_MEMORY; ++args->code; /* Append the node. */ add_node(args->end, node); args->end = node; return RE_ERROR_SUCCESS; } /* Builds a zero-width node. */ Py_LOCAL_INLINE(int) build_zerowidth(RE_CompileArgs* args) { RE_CODE flags; RE_Node* node; /* codes: opcode, flags. */ if (args->code + 1 > args->end_code) return RE_ERROR_ILLEGAL; flags = args->code[1]; /* Create the node. */ node = create_node(args->pattern, (RE_UINT8)args->code[0], flags, 0, 0); if (!node) return RE_ERROR_MEMORY; args->code += 2; /* Append the node. */ add_node(args->end, node); args->end = node; return RE_ERROR_SUCCESS; } /* Builds a sequence of nodes from regular expression code. */ Py_LOCAL_INLINE(int) build_sequence(RE_CompileArgs* args) { int status; /* Guarantee that there's something to attach to. */ args->start = create_node(args->pattern, RE_OP_BRANCH, 0, 0, 0); if (!args->start) return RE_ERROR_MEMORY; args->end = args->start; args->min_width = 0; args->has_captures = FALSE; args->is_fuzzy = FALSE; args->has_groups = FALSE; args->has_repeats = FALSE; /* The sequence should end with an opcode we don't understand. If it * doesn't then the code is illegal.
*/ while (args->code < args->end_code) { /* The following code groups opcodes by format, not function. */ switch (args->code[0]) { case RE_OP_ANY: case RE_OP_ANY_ALL: case RE_OP_ANY_ALL_REV: case RE_OP_ANY_REV: case RE_OP_ANY_U: case RE_OP_ANY_U_REV: /* A simple opcode with no trailing codewords and width of 1. */ status = build_ANY(args); if (status != RE_ERROR_SUCCESS) return status; break; case RE_OP_ATOMIC: /* An atomic sequence. */ status = build_ATOMIC(args); if (status != RE_ERROR_SUCCESS) return status; break; case RE_OP_BOUNDARY: case RE_OP_DEFAULT_BOUNDARY: case RE_OP_DEFAULT_END_OF_WORD: case RE_OP_DEFAULT_START_OF_WORD: case RE_OP_END_OF_WORD: case RE_OP_GRAPHEME_BOUNDARY: case RE_OP_KEEP: case RE_OP_SKIP: case RE_OP_START_OF_WORD: /* A word or grapheme boundary. */ status = build_BOUNDARY(args); if (status != RE_ERROR_SUCCESS) return status; break; case RE_OP_BRANCH: /* A 2-way branch. */ status = build_BRANCH(args); if (status != RE_ERROR_SUCCESS) return status; break; case RE_OP_CALL_REF: /* A group call ref. */ status = build_CALL_REF(args); if (status != RE_ERROR_SUCCESS) return status; break; case RE_OP_CHARACTER: case RE_OP_CHARACTER_IGN: case RE_OP_CHARACTER_IGN_REV: case RE_OP_CHARACTER_REV: case RE_OP_PROPERTY: case RE_OP_PROPERTY_IGN: case RE_OP_PROPERTY_IGN_REV: case RE_OP_PROPERTY_REV: /* A character literal or a property. */ status = build_CHARACTER_or_PROPERTY(args); if (status != RE_ERROR_SUCCESS) return status; break; case RE_OP_CONDITIONAL: /* A lookaround conditional. */ status = build_CONDITIONAL(args); if (status != RE_ERROR_SUCCESS) return status; break; case RE_OP_END_OF_LINE: case RE_OP_END_OF_LINE_U: case RE_OP_END_OF_STRING: case RE_OP_END_OF_STRING_LINE: case RE_OP_END_OF_STRING_LINE_U: case RE_OP_SEARCH_ANCHOR: case RE_OP_START_OF_LINE: case RE_OP_START_OF_LINE_U: case RE_OP_START_OF_STRING: /* A simple opcode with no trailing codewords and width of 0. 
*/ status = build_zerowidth(args); if (status != RE_ERROR_SUCCESS) return status; break; case RE_OP_FAILURE: case RE_OP_PRUNE: case RE_OP_SUCCESS: status = build_SUCCESS(args); if (status != RE_ERROR_SUCCESS) return status; break; case RE_OP_FUZZY: /* A fuzzy sequence. */ status = build_FUZZY(args); if (status != RE_ERROR_SUCCESS) return status; break; case RE_OP_GREEDY_REPEAT: case RE_OP_LAZY_REPEAT: /* A repeated sequence. */ status = build_REPEAT(args); if (status != RE_ERROR_SUCCESS) return status; break; case RE_OP_GROUP: /* A capture group. */ status = build_GROUP(args); if (status != RE_ERROR_SUCCESS) return status; break; case RE_OP_GROUP_CALL: /* A group call. */ status = build_GROUP_CALL(args); if (status != RE_ERROR_SUCCESS) return status; break; case RE_OP_GROUP_EXISTS: /* A conditional sequence. */ status = build_GROUP_EXISTS(args); if (status != RE_ERROR_SUCCESS) return status; break; case RE_OP_LOOKAROUND: /* A lookaround. */ status = build_LOOKAROUND(args); if (status != RE_ERROR_SUCCESS) return status; break; case RE_OP_RANGE: case RE_OP_RANGE_IGN: case RE_OP_RANGE_IGN_REV: case RE_OP_RANGE_REV: /* A range. */ status = build_RANGE(args); if (status != RE_ERROR_SUCCESS) return status; break; case RE_OP_REF_GROUP: case RE_OP_REF_GROUP_FLD: case RE_OP_REF_GROUP_FLD_REV: case RE_OP_REF_GROUP_IGN: case RE_OP_REF_GROUP_IGN_REV: case RE_OP_REF_GROUP_REV: /* A reference to a group. */ status = build_REF_GROUP(args); if (status != RE_ERROR_SUCCESS) return status; break; case RE_OP_SET_DIFF: case RE_OP_SET_DIFF_IGN: case RE_OP_SET_DIFF_IGN_REV: case RE_OP_SET_DIFF_REV: case RE_OP_SET_INTER: case RE_OP_SET_INTER_IGN: case RE_OP_SET_INTER_IGN_REV: case RE_OP_SET_INTER_REV: case RE_OP_SET_SYM_DIFF: case RE_OP_SET_SYM_DIFF_IGN: case RE_OP_SET_SYM_DIFF_IGN_REV: case RE_OP_SET_SYM_DIFF_REV: case RE_OP_SET_UNION: case RE_OP_SET_UNION_IGN: case RE_OP_SET_UNION_IGN_REV: case RE_OP_SET_UNION_REV: /* A set. 
*/ status = build_SET(args); if (status != RE_ERROR_SUCCESS) return status; break; case RE_OP_STRING: case RE_OP_STRING_FLD: case RE_OP_STRING_FLD_REV: case RE_OP_STRING_IGN: case RE_OP_STRING_IGN_REV: case RE_OP_STRING_REV: /* A string literal. */ status = build_STRING(args, FALSE); if (status != RE_ERROR_SUCCESS) return status; break; case RE_OP_STRING_SET: case RE_OP_STRING_SET_FLD: case RE_OP_STRING_SET_FLD_REV: case RE_OP_STRING_SET_IGN: case RE_OP_STRING_SET_IGN_REV: case RE_OP_STRING_SET_REV: /* A reference to a list. */ status = build_STRING_SET(args); if (status != RE_ERROR_SUCCESS) return status; break; default: /* We've found an opcode which we don't recognise. We'll leave it * for the caller. */ return RE_ERROR_SUCCESS; } } /* If we're here then we should be at the end of the code, otherwise we * have an error. */ return args->code == args->end_code; } /* Compiles the regular expression code to 'nodes'. * * Various details about the regular expression are discovered during * compilation and stored in the PatternObject. */ Py_LOCAL_INLINE(BOOL) compile_to_nodes(RE_CODE* code, RE_CODE* end_code, PatternObject* pattern) { RE_CompileArgs args; int status; /* Compile a regex sequence and then check that we've reached the end * correctly. (The last opcode should be 'SUCCESS'.) * * If successful, 'start' and 'end' will point to the start and end nodes * of the compiled sequence. */ args.code = code; args.end_code = end_code; args.pattern = pattern; args.forward = (pattern->flags & RE_FLAG_REVERSE) == 0; args.visible_captures = FALSE; args.has_captures = FALSE; args.repeat_depth = 0; args.is_fuzzy = FALSE; args.within_fuzzy = FALSE; status = build_sequence(&args); if (status == RE_ERROR_ILLEGAL) set_error(RE_ERROR_ILLEGAL, NULL); if (status != RE_ERROR_SUCCESS) return FALSE; pattern->min_width = args.min_width; pattern->is_fuzzy = args.is_fuzzy; pattern->do_search_start = TRUE; pattern->start_node = args.start; /* Optimise the pattern.
     */
    if (!optimise_pattern(pattern))
        return FALSE;

    pattern->start_test = locate_test_start(pattern->start_node);

    /* Get the call_ref for the entire pattern, if any. */
    if (pattern->start_node->op == RE_OP_CALL_REF)
        pattern->pattern_call_ref = (Py_ssize_t)pattern->start_node->values[0];
    else
        pattern->pattern_call_ref = -1;

    return TRUE;
}

/* Gets the required characters for a regex.
 *
 * In the event of an error, it just pretends that there are no required
 * characters.
 */
Py_LOCAL_INLINE(void) get_required_chars(PyObject* required_chars, RE_CODE**
  req_chars, size_t* req_length) {
    Py_ssize_t len;
    RE_CODE* chars;
    Py_ssize_t i;

    *req_chars = NULL;
    *req_length = 0;

    len = PyTuple_GET_SIZE(required_chars);
    if (len < 1 || PyErr_Occurred()) {
        PyErr_Clear();
        return;
    }

    chars = (RE_CODE*)re_alloc((size_t)len * sizeof(RE_CODE));
    if (!chars)
        goto error;

    for (i = 0; i < len; i++) {
        PyObject* o;
        size_t value;

        /* PyTuple_GET_ITEM borrows the reference. */
        o = PyTuple_GET_ITEM(required_chars, i);

        value = PyLong_AsUnsignedLong(o);
        if ((Py_ssize_t)value == -1 && PyErr_Occurred())
            goto error;

        chars[i] = (RE_CODE)value;
        if (chars[i] != value)
            goto error;
    }

    *req_chars = chars;
    *req_length = (size_t)len;

    return;

error:
    PyErr_Clear();
    re_dealloc(chars);
}

/* Makes a STRING node. */
Py_LOCAL_INLINE(RE_Node*) make_STRING_node(PatternObject* pattern, RE_UINT8 op,
  size_t length, RE_CODE* chars) {
    Py_ssize_t step;
    RE_Node* node;
    size_t i;

    step = get_step(op);

    /* Create the node. */
    node = create_node(pattern, op, 0, step * (Py_ssize_t)length, length);
    if (!node)
        return NULL;

    node->status |= RE_STATUS_STRING;

    for (i = 0; i < length; i++)
        node->values[i] = chars[i];

    return node;
}

/* Scans all of the characters in the current locale for their properties.
*/ Py_LOCAL_INLINE(void) scan_locale_chars(RE_LocaleInfo* locale_info) { int c; for (c = 0; c < 0x100; c++) { unsigned short props = 0; if (isalnum(c)) props |= RE_LOCALE_ALNUM; if (isalpha(c)) props |= RE_LOCALE_ALPHA; if (iscntrl(c)) props |= RE_LOCALE_CNTRL; if (isdigit(c)) props |= RE_LOCALE_DIGIT; if (isgraph(c)) props |= RE_LOCALE_GRAPH; if (islower(c)) props |= RE_LOCALE_LOWER; if (isprint(c)) props |= RE_LOCALE_PRINT; if (ispunct(c)) props |= RE_LOCALE_PUNCT; if (isspace(c)) props |= RE_LOCALE_SPACE; if (isupper(c)) props |= RE_LOCALE_UPPER; locale_info->properties[c] = props; locale_info->uppercase[c] = (unsigned char)toupper(c); locale_info->lowercase[c] = (unsigned char)tolower(c); } } /* Compiles regular expression code to a PatternObject. * * The regular expression code is provided as a list and is then compiled to * 'nodes'. Various details about the regular expression are discovered during * compilation and stored in the PatternObject. */ static PyObject* re_compile(PyObject* self_, PyObject* args) { PyObject* pattern; Py_ssize_t flags = 0; PyObject* code_list; PyObject* groupindex; PyObject* indexgroup; PyObject* named_lists; PyObject* named_list_indexes; Py_ssize_t req_offset; PyObject* required_chars; Py_ssize_t req_flags; size_t public_group_count; Py_ssize_t code_len; RE_CODE* code; Py_ssize_t i; RE_CODE* req_chars; size_t req_length; PatternObject* self; BOOL unicode; BOOL locale; BOOL ascii; BOOL ok; if (!PyArg_ParseTuple(args, "OnOOOOOnOnn:re_compile", &pattern, &flags, &code_list, &groupindex, &indexgroup, &named_lists, &named_list_indexes, &req_offset, &required_chars, &req_flags, &public_group_count)) return NULL; /* Read the regex code. */ code_len = PyList_GET_SIZE(code_list); code = (RE_CODE*)re_alloc((size_t)code_len * sizeof(RE_CODE)); if (!code) return NULL; for (i = 0; i < code_len; i++) { PyObject* o; size_t value; /* PyList_GET_ITEM borrows a reference. 
*/ o = PyList_GET_ITEM(code_list, i); value = PyLong_AsUnsignedLong(o); if ((Py_ssize_t)value == -1 && PyErr_Occurred()) goto error; code[i] = (RE_CODE)value; if (code[i] != value) goto error; } /* Get the required characters. */ get_required_chars(required_chars, &req_chars, &req_length); /* Create the PatternObject. */ self = PyObject_NEW(PatternObject, &Pattern_Type); if (!self) { set_error(RE_ERROR_MEMORY, NULL); re_dealloc(req_chars); re_dealloc(code); return NULL; } /* Initialise the PatternObject. */ self->pattern = pattern; self->flags = flags; self->weakreflist = NULL; self->start_node = NULL; self->repeat_count = 0; self->true_group_count = 0; self->public_group_count = public_group_count; self->group_end_index = 0; self->groupindex = groupindex; self->indexgroup = indexgroup; self->named_lists = named_lists; self->named_lists_count = (size_t)PyDict_Size(named_lists); self->partial_named_lists[0] = NULL; self->partial_named_lists[1] = NULL; self->named_list_indexes = named_list_indexes; self->node_capacity = 0; self->node_count = 0; self->node_list = NULL; self->group_info_capacity = 0; self->group_info = NULL; self->call_ref_info_capacity = 0; self->call_ref_info_count = 0; self->call_ref_info = NULL; self->repeat_info_capacity = 0; self->repeat_info = NULL; self->groups_storage = NULL; self->repeats_storage = NULL; self->fuzzy_count = 0; self->recursive = FALSE; self->req_offset = req_offset; self->req_string = NULL; self->locale_info = NULL; Py_INCREF(self->pattern); Py_INCREF(self->groupindex); Py_INCREF(self->indexgroup); Py_INCREF(self->named_lists); Py_INCREF(self->named_list_indexes); /* Initialise the character encoding. 
*/ unicode = (flags & RE_FLAG_UNICODE) != 0; locale = (flags & RE_FLAG_LOCALE) != 0; ascii = (flags & RE_FLAG_ASCII) != 0; if (!unicode && !locale && !ascii) { if (PyBytes_Check(self->pattern)) ascii = RE_FLAG_ASCII; else unicode = RE_FLAG_UNICODE; } if (unicode) self->encoding = &unicode_encoding; else if (locale) self->encoding = &locale_encoding; else if (ascii) self->encoding = &ascii_encoding; /* Compile the regular expression code to nodes. */ ok = compile_to_nodes(code, code + code_len, self); /* We no longer need the regular expression code. */ re_dealloc(code); if (!ok) { Py_DECREF(self); re_dealloc(req_chars); return NULL; } /* Make a node for the required string, if there's one. */ if (req_chars) { /* Remove the FULLCASE flag if it's not a Unicode pattern or not * ignoring case. */ if (!(self->flags & RE_FLAG_UNICODE) || !(self->flags & RE_FLAG_IGNORECASE)) req_flags &= ~RE_FLAG_FULLCASE; if (self->flags & RE_FLAG_REVERSE) { switch (req_flags) { case 0: self->req_string = make_STRING_node(self, RE_OP_STRING_REV, req_length, req_chars); break; case RE_FLAG_IGNORECASE | RE_FLAG_FULLCASE: self->req_string = make_STRING_node(self, RE_OP_STRING_FLD_REV, req_length, req_chars); break; case RE_FLAG_IGNORECASE: self->req_string = make_STRING_node(self, RE_OP_STRING_IGN_REV, req_length, req_chars); break; } } else { switch (req_flags) { case 0: self->req_string = make_STRING_node(self, RE_OP_STRING, req_length, req_chars); break; case RE_FLAG_IGNORECASE | RE_FLAG_FULLCASE: self->req_string = make_STRING_node(self, RE_OP_STRING_FLD, req_length, req_chars); break; case RE_FLAG_IGNORECASE: self->req_string = make_STRING_node(self, RE_OP_STRING_IGN, req_length, req_chars); break; } } re_dealloc(req_chars); } if (locale) { /* Store info about the characters in the locale for locale-sensitive * matching. 
*/ self->locale_info = re_alloc(sizeof(RE_LocaleInfo)); if (!self->locale_info) { Py_DECREF(self); return NULL; } scan_locale_chars(self->locale_info); } return (PyObject*)self; error: re_dealloc(code); set_error(RE_ERROR_ILLEGAL, NULL); return NULL; } /* Gets the size of the codewords. */ static PyObject* get_code_size(PyObject* self, PyObject* unused) { return Py_BuildValue("n", sizeof(RE_CODE)); } /* Gets the property dict. */ static PyObject* get_properties(PyObject* self_, PyObject* args) { Py_INCREF(property_dict); return property_dict; } /* Folds the case of a string. */ static PyObject* fold_case(PyObject* self_, PyObject* args) { RE_StringInfo str_info; Py_UCS4 (*char_at)(void* text, Py_ssize_t pos); RE_EncodingTable* encoding; RE_LocaleInfo locale_info; Py_ssize_t folded_charsize; void (*set_char_at)(void* text, Py_ssize_t pos, Py_UCS4 ch); Py_ssize_t buf_size; void* folded; Py_ssize_t folded_len; PyObject* result; Py_ssize_t flags; PyObject* string; if (!PyArg_ParseTuple(args, "nO:fold_case", &flags, &string)) return NULL; if (!(flags & RE_FLAG_IGNORECASE)) { Py_INCREF(string); return string; } /* Get the string. */ if (!get_string(string, &str_info)) return NULL; /* Get the function for reading from the original string. */ switch (str_info.charsize) { case 1: char_at = bytes1_char_at; break; case 2: char_at = bytes2_char_at; break; case 4: char_at = bytes4_char_at; break; default: release_buffer(&str_info); return NULL; } /* What's the encoding? */ if (flags & RE_FLAG_UNICODE) encoding = &unicode_encoding; else if (flags & RE_FLAG_LOCALE) { encoding = &locale_encoding; scan_locale_chars(&locale_info); } else if (flags & RE_FLAG_ASCII) encoding = &ascii_encoding; else encoding = &unicode_encoding; #if PY_VERSION_HEX >= 0x03030000 /* Initially assume that the folded string will have the same width as the * original string (usually true). 
*/ folded_charsize = str_info.charsize; /* When folding a Unicode string, some codepoints in the range U+00..U+FF * are mapped to codepoints in the range U+0100..U+FFFF. */ if (encoding == &unicode_encoding && str_info.charsize == 1) folded_charsize = 2; #else /* The folded string will have the same width as the original string. */ folded_charsize = str_info.charsize; #endif /* Get the function for writing to the folded string. */ switch (folded_charsize) { case 1: set_char_at = bytes1_set_char_at; break; case 2: set_char_at = bytes2_set_char_at; break; case 4: set_char_at = bytes4_set_char_at; break; default: release_buffer(&str_info); return NULL; } /* Allocate a buffer for the folded string. */ if (flags & RE_FLAG_FULLCASE) /* When using full case-folding with Unicode, some single codepoints * are mapped to multiple codepoints. */ buf_size = str_info.length * RE_MAX_FOLDED; else buf_size = str_info.length; folded = re_alloc((size_t)(buf_size * folded_charsize)); if (!folded) { release_buffer(&str_info); return NULL; } /* Fold the case of the string. */ folded_len = 0; if (flags & RE_FLAG_FULLCASE) { /* Full case-folding. */ int (*full_case_fold)(RE_LocaleInfo* locale_info, Py_UCS4 ch, Py_UCS4* folded); Py_ssize_t i; Py_UCS4 codepoints[RE_MAX_FOLDED]; full_case_fold = encoding->full_case_fold; for (i = 0; i < str_info.length; i++) { int count; int j; count = full_case_fold(&locale_info, char_at(str_info.characters, i), codepoints); for (j = 0; j < count; j++) set_char_at(folded, folded_len + j, codepoints[j]); folded_len += count; } } else { /* Simple case-folding. */ Py_UCS4 (*simple_case_fold)(RE_LocaleInfo* locale_info, Py_UCS4 ch); Py_ssize_t i; simple_case_fold = encoding->simple_case_fold; for (i = 0; i < str_info.length; i++) { Py_UCS4 ch; ch = simple_case_fold(&locale_info, char_at(str_info.characters, i)); set_char_at(folded, i, ch); } folded_len = str_info.length; } /* Build the result string. 
     */
    if (str_info.is_unicode)
        result = build_unicode_value(folded, folded_len, folded_charsize);
    else
        result = build_bytes_value(folded, folded_len, folded_charsize);

    re_dealloc(folded);

    /* Release the original string's buffer. */
    release_buffer(&str_info);

    return result;
}

/* Returns a tuple of the Unicode characters that expand on full case-folding.
 */
static PyObject* get_expand_on_folding(PyObject* self, PyObject* unused) {
    int count;
    PyObject* result;
    int i;

    /* How many characters are there? */
    count = sizeof(re_expand_on_folding) / sizeof(re_expand_on_folding[0]);

    /* Put all the characters in a tuple. */
    result = PyTuple_New(count);
    if (!result)
        return NULL;

    for (i = 0; i < count; i++) {
#if PY_VERSION_HEX >= 0x03030000
        Py_UCS4 codepoint;
#else
        Py_UNICODE codepoint;
#endif
        PyObject* item;

        codepoint = re_expand_on_folding[i];

        item = build_unicode_value(&codepoint, 1, sizeof(codepoint));
        if (!item)
            goto error;

        /* PyTuple_SetItem steals the reference. */
        PyTuple_SetItem(result, i, item);
    }

    return result;

error:
    Py_DECREF(result);
    return NULL;
}

/* Returns whether a character has a given value for a Unicode property. */
static PyObject* has_property_value(PyObject* self_, PyObject* args) {
    BOOL v;

    Py_ssize_t property_value;
    Py_ssize_t character;
    if (!PyArg_ParseTuple(args, "nn:has_property_value", &property_value,
      &character))
        return NULL;

    v = unicode_has_property((RE_CODE)property_value, (Py_UCS4)character) ? 1 :
      0;

    return Py_BuildValue("n", v);
}

/* Returns a list of all the simple cases of a character.
 *
 * If full case-folding is turned on and the character also expands on full
 * case-folding, a None is appended to the list.
 */
static PyObject* get_all_cases(PyObject* self_, PyObject* args) {
    RE_EncodingTable* encoding;
    RE_LocaleInfo locale_info;
    int count;
    Py_UCS4 cases[RE_MAX_CASES];
    PyObject* result;
    int i;
    Py_UCS4 folded[RE_MAX_FOLDED];

    Py_ssize_t flags;
    Py_ssize_t character;
    if (!PyArg_ParseTuple(args, "nn:get_all_cases", &flags, &character))
        return NULL;

    /* What's the encoding? */
    if (flags & RE_FLAG_UNICODE)
        encoding = &unicode_encoding;
    else if (flags & RE_FLAG_LOCALE) {
        encoding = &locale_encoding;
        scan_locale_chars(&locale_info);
    } else if (flags & RE_FLAG_ASCII)
        encoding = &ascii_encoding;
    else
        encoding = &unicode_encoding;

    /* Get all the simple cases. */
    count = encoding->all_cases(&locale_info, (Py_UCS4)character, cases);

    result = PyList_New(count);
    if (!result)
        return NULL;

    for (i = 0; i < count; i++) {
        PyObject* item;

        item = Py_BuildValue("n", cases[i]);
        if (!item)
            goto error;

        /* PyList_SetItem steals the reference. */
        PyList_SetItem(result, i, item);
    }

    /* If the character also expands on full case-folding, append a None. */
    if ((flags & RE_FULL_CASE_FOLDING) == RE_FULL_CASE_FOLDING) {
        count = encoding->full_case_fold(&locale_info, (Py_UCS4)character,
          folded);
        if (count > 1)
            PyList_Append(result, Py_None);
    }

    return result;

error:
    Py_DECREF(result);
    return NULL;
}

/* The table of the module's functions. */
static PyMethodDef _functions[] = {
    {"compile", (PyCFunction)re_compile, METH_VARARGS},
    {"get_code_size", (PyCFunction)get_code_size, METH_NOARGS},
    {"get_properties", (PyCFunction)get_properties, METH_VARARGS},
    {"fold_case", (PyCFunction)fold_case, METH_VARARGS},
    {"get_expand_on_folding", (PyCFunction)get_expand_on_folding, METH_NOARGS},
    {"has_property_value", (PyCFunction)has_property_value, METH_VARARGS},
    {"get_all_cases", (PyCFunction)get_all_cases, METH_VARARGS},
    {NULL, NULL}
};

/* Initialises the property dictionary.
*/ Py_LOCAL_INLINE(BOOL) init_property_dict(void) { size_t value_set_count; size_t i; PyObject** value_dicts; property_dict = NULL; /* How many value sets are there? */ value_set_count = 0; for (i = 0; i < sizeof(re_property_values) / sizeof(re_property_values[0]); i++) { RE_PropertyValue* value; value = &re_property_values[i]; if (value->value_set >= value_set_count) value_set_count = (size_t)value->value_set + 1; } /* Quick references for the value sets. */ value_dicts = (PyObject**)re_alloc(value_set_count * sizeof(value_dicts[0])); if (!value_dicts) return FALSE; memset(value_dicts, 0, value_set_count * sizeof(value_dicts[0])); /* Build the property values dictionaries. */ for (i = 0; i < sizeof(re_property_values) / sizeof(re_property_values[0]); i++) { RE_PropertyValue* value; PyObject* v; int status; value = &re_property_values[i]; if (!value_dicts[value->value_set]) { value_dicts[value->value_set] = PyDict_New(); if (!value_dicts[value->value_set]) goto error; } v = Py_BuildValue("i", value->id); if (!v) goto error; status = PyDict_SetItemString(value_dicts[value->value_set], re_strings[value->name], v); Py_DECREF(v); if (status < 0) goto error; } /* Build the property dictionary. */ property_dict = PyDict_New(); if (!property_dict) goto error; for (i = 0; i < sizeof(re_properties) / sizeof(re_properties[0]); i++) { RE_Property* property; PyObject* v; int status; property = &re_properties[i]; v = Py_BuildValue("iO", property->id, value_dicts[property->value_set]); if (!v) goto error; status = PyDict_SetItemString(property_dict, re_strings[property->name], v); Py_DECREF(v); if (status < 0) goto error; } /* DECREF the value sets. Any unused ones will be deallocated. */ for (i = 0; i < value_set_count; i++) Py_XDECREF(value_dicts[i]); re_dealloc(value_dicts); return TRUE; error: Py_XDECREF(property_dict); /* DECREF the value sets. 
*/ for (i = 0; i < value_set_count; i++) Py_XDECREF(value_dicts[i]); re_dealloc(value_dicts); return FALSE; } /* The module definition. */ static struct PyModuleDef remodule = { PyModuleDef_HEAD_INIT, "_" RE_MODULE, NULL, -1, _functions, NULL, NULL, NULL, NULL }; /* Initialises the module. */ PyMODINIT_FUNC PyInit__regex(void) { PyObject* m; PyObject* d; PyObject* x; #if defined(VERBOSE) /* Unbuffered in case it crashes! */ setvbuf(stdout, NULL, _IONBF, 0); #endif /* Initialise Pattern_Type. */ Pattern_Type.tp_dealloc = pattern_dealloc; Pattern_Type.tp_repr = pattern_repr; Pattern_Type.tp_flags = Py_TPFLAGS_DEFAULT; Pattern_Type.tp_doc = pattern_doc; Pattern_Type.tp_weaklistoffset = offsetof(PatternObject, weakreflist); Pattern_Type.tp_methods = pattern_methods; Pattern_Type.tp_members = pattern_members; Pattern_Type.tp_getset = pattern_getset; /* Initialise Match_Type. */ Match_Type.tp_dealloc = match_dealloc; Match_Type.tp_repr = match_repr; Match_Type.tp_as_mapping = &match_as_mapping; Match_Type.tp_flags = Py_TPFLAGS_DEFAULT; Match_Type.tp_doc = match_doc; Match_Type.tp_methods = match_methods; Match_Type.tp_members = match_members; Match_Type.tp_getset = match_getset; /* Initialise Scanner_Type. */ Scanner_Type.tp_dealloc = scanner_dealloc; Scanner_Type.tp_flags = Py_TPFLAGS_DEFAULT; Scanner_Type.tp_doc = scanner_doc; Scanner_Type.tp_iter = scanner_iter; Scanner_Type.tp_iternext = scanner_iternext; Scanner_Type.tp_methods = scanner_methods; Scanner_Type.tp_members = scanner_members; /* Initialise Splitter_Type. */ Splitter_Type.tp_dealloc = splitter_dealloc; Splitter_Type.tp_flags = Py_TPFLAGS_DEFAULT; Splitter_Type.tp_doc = splitter_doc; Splitter_Type.tp_iter = splitter_iter; Splitter_Type.tp_iternext = splitter_iternext; Splitter_Type.tp_methods = splitter_methods; Splitter_Type.tp_members = splitter_members; /* Initialise Capture_Type. 
     */
    Capture_Type.tp_dealloc = capture_dealloc;
    Capture_Type.tp_str = capture_str;
    Capture_Type.tp_as_mapping = &capture_as_mapping;
    Capture_Type.tp_flags = Py_TPFLAGS_DEFAULT;
    Capture_Type.tp_methods = capture_methods;

    /* Initialize object types */
    if (PyType_Ready(&Pattern_Type) < 0)
        return NULL;
    if (PyType_Ready(&Match_Type) < 0)
        return NULL;
    if (PyType_Ready(&Scanner_Type) < 0)
        return NULL;
    if (PyType_Ready(&Splitter_Type) < 0)
        return NULL;
    if (PyType_Ready(&Capture_Type) < 0)
        return NULL;

    error_exception = NULL;

    m = PyModule_Create(&remodule);
    if (!m)
        return NULL;

    d = PyModule_GetDict(m);

    x = PyLong_FromLong(RE_MAGIC);
    if (x) {
        PyDict_SetItemString(d, "MAGIC", x);
        Py_DECREF(x);
    }

    x = PyLong_FromLong(sizeof(RE_CODE));
    if (x) {
        PyDict_SetItemString(d, "CODE_SIZE", x);
        Py_DECREF(x);
    }

    x = PyUnicode_FromString(copyright);
    if (x) {
        PyDict_SetItemString(d, "copyright", x);
        Py_DECREF(x);
    }

    /* Initialise the property dictionary. */
    if (!init_property_dict())
        return NULL;

    return m;
}

/* vim:ts=4:sw=4:et */

regex-2016.01.10/Python3/_regex.h

/*
 * Secret Labs' Regular Expression Engine
 *
 * regular expression matching engine
 *
 * Copyright (c) 1997-2001 by Secret Labs AB.  All rights reserved.
 *
 * NOTE: This file is generated by regex.py.  If you need
 * to change anything in here, edit regex.py and run it.
 *
 * 2010-01-16 mrab Re-written
 */

/* Supports Unicode version 8.0.0. */

#define RE_MAGIC 20100116

#include "_regex_unicode.h"

/* Operators.
*/ #define RE_OP_FAILURE 0 #define RE_OP_SUCCESS 1 #define RE_OP_ANY 2 #define RE_OP_ANY_ALL 3 #define RE_OP_ANY_ALL_REV 4 #define RE_OP_ANY_REV 5 #define RE_OP_ANY_U 6 #define RE_OP_ANY_U_REV 7 #define RE_OP_ATOMIC 8 #define RE_OP_BOUNDARY 9 #define RE_OP_BRANCH 10 #define RE_OP_CALL_REF 11 #define RE_OP_CHARACTER 12 #define RE_OP_CHARACTER_IGN 13 #define RE_OP_CHARACTER_IGN_REV 14 #define RE_OP_CHARACTER_REV 15 #define RE_OP_CONDITIONAL 16 #define RE_OP_DEFAULT_BOUNDARY 17 #define RE_OP_DEFAULT_END_OF_WORD 18 #define RE_OP_DEFAULT_START_OF_WORD 19 #define RE_OP_END 20 #define RE_OP_END_OF_LINE 21 #define RE_OP_END_OF_LINE_U 22 #define RE_OP_END_OF_STRING 23 #define RE_OP_END_OF_STRING_LINE 24 #define RE_OP_END_OF_STRING_LINE_U 25 #define RE_OP_END_OF_WORD 26 #define RE_OP_FUZZY 27 #define RE_OP_GRAPHEME_BOUNDARY 28 #define RE_OP_GREEDY_REPEAT 29 #define RE_OP_GROUP 30 #define RE_OP_GROUP_CALL 31 #define RE_OP_GROUP_EXISTS 32 #define RE_OP_KEEP 33 #define RE_OP_LAZY_REPEAT 34 #define RE_OP_LOOKAROUND 35 #define RE_OP_NEXT 36 #define RE_OP_PROPERTY 37 #define RE_OP_PROPERTY_IGN 38 #define RE_OP_PROPERTY_IGN_REV 39 #define RE_OP_PROPERTY_REV 40 #define RE_OP_PRUNE 41 #define RE_OP_RANGE 42 #define RE_OP_RANGE_IGN 43 #define RE_OP_RANGE_IGN_REV 44 #define RE_OP_RANGE_REV 45 #define RE_OP_REF_GROUP 46 #define RE_OP_REF_GROUP_FLD 47 #define RE_OP_REF_GROUP_FLD_REV 48 #define RE_OP_REF_GROUP_IGN 49 #define RE_OP_REF_GROUP_IGN_REV 50 #define RE_OP_REF_GROUP_REV 51 #define RE_OP_SEARCH_ANCHOR 52 #define RE_OP_SET_DIFF 53 #define RE_OP_SET_DIFF_IGN 54 #define RE_OP_SET_DIFF_IGN_REV 55 #define RE_OP_SET_DIFF_REV 56 #define RE_OP_SET_INTER 57 #define RE_OP_SET_INTER_IGN 58 #define RE_OP_SET_INTER_IGN_REV 59 #define RE_OP_SET_INTER_REV 60 #define RE_OP_SET_SYM_DIFF 61 #define RE_OP_SET_SYM_DIFF_IGN 62 #define RE_OP_SET_SYM_DIFF_IGN_REV 63 #define RE_OP_SET_SYM_DIFF_REV 64 #define RE_OP_SET_UNION 65 #define RE_OP_SET_UNION_IGN 66 #define RE_OP_SET_UNION_IGN_REV 67 #define 
RE_OP_SET_UNION_REV 68 #define RE_OP_SKIP 69 #define RE_OP_START_OF_LINE 70 #define RE_OP_START_OF_LINE_U 71 #define RE_OP_START_OF_STRING 72 #define RE_OP_START_OF_WORD 73 #define RE_OP_STRING 74 #define RE_OP_STRING_FLD 75 #define RE_OP_STRING_FLD_REV 76 #define RE_OP_STRING_IGN 77 #define RE_OP_STRING_IGN_REV 78 #define RE_OP_STRING_REV 79 #define RE_OP_STRING_SET 80 #define RE_OP_STRING_SET_FLD 81 #define RE_OP_STRING_SET_FLD_REV 82 #define RE_OP_STRING_SET_IGN 83 #define RE_OP_STRING_SET_IGN_REV 84 #define RE_OP_STRING_SET_REV 85 #define RE_OP_BODY_END 86 #define RE_OP_BODY_START 87 #define RE_OP_END_ATOMIC 88 #define RE_OP_END_CONDITIONAL 89 #define RE_OP_END_FUZZY 90 #define RE_OP_END_GREEDY_REPEAT 91 #define RE_OP_END_GROUP 92 #define RE_OP_END_LAZY_REPEAT 93 #define RE_OP_END_LOOKAROUND 94 #define RE_OP_GREEDY_REPEAT_ONE 95 #define RE_OP_GROUP_RETURN 96 #define RE_OP_LAZY_REPEAT_ONE 97 #define RE_OP_MATCH_BODY 98 #define RE_OP_MATCH_TAIL 99 #define RE_OP_START_GROUP 100 char* re_op_text[] = { "RE_OP_FAILURE", "RE_OP_SUCCESS", "RE_OP_ANY", "RE_OP_ANY_ALL", "RE_OP_ANY_ALL_REV", "RE_OP_ANY_REV", "RE_OP_ANY_U", "RE_OP_ANY_U_REV", "RE_OP_ATOMIC", "RE_OP_BOUNDARY", "RE_OP_BRANCH", "RE_OP_CALL_REF", "RE_OP_CHARACTER", "RE_OP_CHARACTER_IGN", "RE_OP_CHARACTER_IGN_REV", "RE_OP_CHARACTER_REV", "RE_OP_CONDITIONAL", "RE_OP_DEFAULT_BOUNDARY", "RE_OP_DEFAULT_END_OF_WORD", "RE_OP_DEFAULT_START_OF_WORD", "RE_OP_END", "RE_OP_END_OF_LINE", "RE_OP_END_OF_LINE_U", "RE_OP_END_OF_STRING", "RE_OP_END_OF_STRING_LINE", "RE_OP_END_OF_STRING_LINE_U", "RE_OP_END_OF_WORD", "RE_OP_FUZZY", "RE_OP_GRAPHEME_BOUNDARY", "RE_OP_GREEDY_REPEAT", "RE_OP_GROUP", "RE_OP_GROUP_CALL", "RE_OP_GROUP_EXISTS", "RE_OP_KEEP", "RE_OP_LAZY_REPEAT", "RE_OP_LOOKAROUND", "RE_OP_NEXT", "RE_OP_PROPERTY", "RE_OP_PROPERTY_IGN", "RE_OP_PROPERTY_IGN_REV", "RE_OP_PROPERTY_REV", "RE_OP_PRUNE", "RE_OP_RANGE", "RE_OP_RANGE_IGN", "RE_OP_RANGE_IGN_REV", "RE_OP_RANGE_REV", "RE_OP_REF_GROUP", "RE_OP_REF_GROUP_FLD", 
"RE_OP_REF_GROUP_FLD_REV", "RE_OP_REF_GROUP_IGN", "RE_OP_REF_GROUP_IGN_REV", "RE_OP_REF_GROUP_REV", "RE_OP_SEARCH_ANCHOR", "RE_OP_SET_DIFF", "RE_OP_SET_DIFF_IGN", "RE_OP_SET_DIFF_IGN_REV", "RE_OP_SET_DIFF_REV", "RE_OP_SET_INTER", "RE_OP_SET_INTER_IGN", "RE_OP_SET_INTER_IGN_REV", "RE_OP_SET_INTER_REV", "RE_OP_SET_SYM_DIFF", "RE_OP_SET_SYM_DIFF_IGN", "RE_OP_SET_SYM_DIFF_IGN_REV", "RE_OP_SET_SYM_DIFF_REV", "RE_OP_SET_UNION", "RE_OP_SET_UNION_IGN", "RE_OP_SET_UNION_IGN_REV", "RE_OP_SET_UNION_REV", "RE_OP_SKIP", "RE_OP_START_OF_LINE", "RE_OP_START_OF_LINE_U", "RE_OP_START_OF_STRING", "RE_OP_START_OF_WORD", "RE_OP_STRING", "RE_OP_STRING_FLD", "RE_OP_STRING_FLD_REV", "RE_OP_STRING_IGN", "RE_OP_STRING_IGN_REV", "RE_OP_STRING_REV", "RE_OP_STRING_SET", "RE_OP_STRING_SET_FLD", "RE_OP_STRING_SET_FLD_REV", "RE_OP_STRING_SET_IGN", "RE_OP_STRING_SET_IGN_REV", "RE_OP_STRING_SET_REV", "RE_OP_BODY_END", "RE_OP_BODY_START", "RE_OP_END_ATOMIC", "RE_OP_END_CONDITIONAL", "RE_OP_END_FUZZY", "RE_OP_END_GREEDY_REPEAT", "RE_OP_END_GROUP", "RE_OP_END_LAZY_REPEAT", "RE_OP_END_LOOKAROUND", "RE_OP_GREEDY_REPEAT_ONE", "RE_OP_GROUP_RETURN", "RE_OP_LAZY_REPEAT_ONE", "RE_OP_MATCH_BODY", "RE_OP_MATCH_TAIL", "RE_OP_START_GROUP", }; #define RE_FLAG_ASCII 0x80 #define RE_FLAG_BESTMATCH 0x1000 #define RE_FLAG_DEBUG 0x200 #define RE_FLAG_DOTALL 0x10 #define RE_FLAG_ENHANCEMATCH 0x8000 #define RE_FLAG_FULLCASE 0x4000 #define RE_FLAG_IGNORECASE 0x2 #define RE_FLAG_LOCALE 0x4 #define RE_FLAG_MULTILINE 0x8 #define RE_FLAG_POSIX 0x10000 #define RE_FLAG_REVERSE 0x400 #define RE_FLAG_TEMPLATE 0x1 #define RE_FLAG_UNICODE 0x20 #define RE_FLAG_VERBOSE 0x40 #define RE_FLAG_VERSION0 0x2000 #define RE_FLAG_VERSION1 0x100 #define RE_FLAG_WORD 0x800 regex-2016.01.10/Python3/_regex_core.py0000666000000000000000000041221412644551563015564 0ustar 00000000000000# # Secret Labs' Regular Expression Engine core module # # Copyright (c) 1998-2001 by Secret Labs AB. All rights reserved. 
# # This version of the SRE library can be redistributed under CNRI's # Python 1.6 license. For any other use, please contact Secret Labs # AB (info@pythonware.com). # # Portions of this engine have been developed in cooperation with # CNRI. Hewlett-Packard provided funding for 1.6 integration and # other compatibility work. # # 2010-01-16 mrab Python front-end re-written and extended import string import sys import unicodedata from collections import defaultdict import _regex __all__ = ["A", "ASCII", "B", "BESTMATCH", "D", "DEBUG", "E", "ENHANCEMATCH", "F", "FULLCASE", "I", "IGNORECASE", "L", "LOCALE", "M", "MULTILINE", "P", "POSIX", "R", "REVERSE", "S", "DOTALL", "T", "TEMPLATE", "U", "UNICODE", "V0", "VERSION0", "V1", "VERSION1", "W", "WORD", "X", "VERBOSE", "error", "Scanner"] # The regex exception. class error(Exception): def __init__(self, message, pattern=None, pos=None): newline = '\n' if isinstance(pattern, str) else b'\n' self.msg = message self.pattern = pattern self.pos = pos if pattern is not None and pos is not None: self.lineno = pattern.count(newline, 0, pos) + 1 self.colno = pos - pattern.rfind(newline, 0, pos) message = "{} at position {}".format(message, pos) if newline in pattern: message += " (line {}, column {})".format(self.lineno, self.colno) Exception.__init__(self, message) # The exception for when a positional flag has been turned on in the old # behaviour. class _UnscopedFlagSet(Exception): pass # The exception for when parsing fails and we want to try something else. class ParseError(Exception): pass # The exception for when there isn't a valid first set. class _FirstSetError(Exception): pass # Flags. A = ASCII = 0x80 # Assume ASCII locale. B = BESTMATCH = 0x1000 # Best fuzzy match. D = DEBUG = 0x200 # Print parsed pattern. E = ENHANCEMATCH = 0x8000 # Attempt to improve the fit after finding the first # fuzzy match. F = FULLCASE = 0x4000 # Unicode full case-folding. I = IGNORECASE = 0x2 # Ignore case. 
L = LOCALE = 0x4          # Assume current 8-bit locale.
M = MULTILINE = 0x8       # Make anchors look for newline.
P = POSIX = 0x10000       # POSIX-style matching (leftmost longest).
R = REVERSE = 0x400       # Search backwards.
S = DOTALL = 0x10         # Make dot match newline.
U = UNICODE = 0x20        # Assume Unicode locale.
V0 = VERSION0 = 0x2000    # Old legacy behaviour.
V1 = VERSION1 = 0x100     # New enhanced behaviour.
W = WORD = 0x800          # Default Unicode word breaks.
X = VERBOSE = 0x40        # Ignore whitespace and comments.
T = TEMPLATE = 0x1        # Template (present because re module has it).

DEFAULT_VERSION = VERSION0

_ALL_VERSIONS = VERSION0 | VERSION1
_ALL_ENCODINGS = ASCII | LOCALE | UNICODE

# The default flags for the various versions.
DEFAULT_FLAGS = {VERSION0: 0, VERSION1: FULLCASE}

# The mask for the flags.
GLOBAL_FLAGS = (_ALL_ENCODINGS | _ALL_VERSIONS | BESTMATCH | DEBUG |
  ENHANCEMATCH | POSIX | REVERSE)
SCOPED_FLAGS = FULLCASE | IGNORECASE | MULTILINE | DOTALL | WORD | VERBOSE

ALPHA = frozenset(string.ascii_letters)
DIGITS = frozenset(string.digits)
ALNUM = ALPHA | DIGITS
OCT_DIGITS = frozenset(string.octdigits)
HEX_DIGITS = frozenset(string.hexdigits)
SPECIAL_CHARS = frozenset("()|?*+{^$.[\\#") | frozenset([""])
NAMED_CHAR_PART = ALNUM | frozenset(" -")
PROPERTY_NAME_PART = ALNUM | frozenset(" &_-.")
SET_OPS = ("||", "~~", "&&", "--")

# The width of the code words inside the regex engine.
BYTES_PER_CODE = _regex.get_code_size()
BITS_PER_CODE = BYTES_PER_CODE * 8

# The repeat count which represents infinity.
UNLIMITED = (1 << BITS_PER_CODE) - 1

# The regular expression flags.
REGEX_FLAGS = {"a": ASCII, "b": BESTMATCH, "e": ENHANCEMATCH, "f": FULLCASE,
  "i": IGNORECASE, "L": LOCALE, "m": MULTILINE, "p": POSIX, "r": REVERSE,
  "s": DOTALL, "u": UNICODE, "V0": VERSION0, "V1": VERSION1, "w": WORD,
  "x": VERBOSE}

# The case flags.
CASE_FLAGS = FULLCASE | IGNORECASE
NOCASE = 0
FULLIGNORECASE = FULLCASE | IGNORECASE

FULL_CASE_FOLDING = UNICODE | FULLIGNORECASE

CASE_FLAGS_COMBINATIONS = {0: 0, FULLCASE: 0, IGNORECASE: IGNORECASE,
  FULLIGNORECASE: FULLIGNORECASE}

# The number of digits in hexadecimal escapes.
HEX_ESCAPES = {"x": 2, "u": 4, "U": 8}

# A singleton which indicates a comment within a pattern.
COMMENT = object()
FLAGS = object()

# The names of the opcodes.
OPCODES = """
FAILURE
SUCCESS
ANY
ANY_ALL
ANY_ALL_REV
ANY_REV
ANY_U
ANY_U_REV
ATOMIC
BOUNDARY
BRANCH
CALL_REF
CHARACTER
CHARACTER_IGN
CHARACTER_IGN_REV
CHARACTER_REV
CONDITIONAL
DEFAULT_BOUNDARY
DEFAULT_END_OF_WORD
DEFAULT_START_OF_WORD
END
END_OF_LINE
END_OF_LINE_U
END_OF_STRING
END_OF_STRING_LINE
END_OF_STRING_LINE_U
END_OF_WORD
FUZZY
GRAPHEME_BOUNDARY
GREEDY_REPEAT
GROUP
GROUP_CALL
GROUP_EXISTS
KEEP
LAZY_REPEAT
LOOKAROUND
NEXT
PROPERTY
PROPERTY_IGN
PROPERTY_IGN_REV
PROPERTY_REV
PRUNE
RANGE
RANGE_IGN
RANGE_IGN_REV
RANGE_REV
REF_GROUP
REF_GROUP_FLD
REF_GROUP_FLD_REV
REF_GROUP_IGN
REF_GROUP_IGN_REV
REF_GROUP_REV
SEARCH_ANCHOR
SET_DIFF
SET_DIFF_IGN
SET_DIFF_IGN_REV
SET_DIFF_REV
SET_INTER
SET_INTER_IGN
SET_INTER_IGN_REV
SET_INTER_REV
SET_SYM_DIFF
SET_SYM_DIFF_IGN
SET_SYM_DIFF_IGN_REV
SET_SYM_DIFF_REV
SET_UNION
SET_UNION_IGN
SET_UNION_IGN_REV
SET_UNION_REV
SKIP
START_OF_LINE
START_OF_LINE_U
START_OF_STRING
START_OF_WORD
STRING
STRING_FLD
STRING_FLD_REV
STRING_IGN
STRING_IGN_REV
STRING_REV
STRING_SET
STRING_SET_FLD
STRING_SET_FLD_REV
STRING_SET_IGN
STRING_SET_IGN_REV
STRING_SET_REV
"""

# Define the opcodes in a namespace.
class Namespace:
    pass

OP = Namespace()
for i, op in enumerate(OPCODES.split()):
    setattr(OP, op, i)

def _shrink_cache(cache_dict, args_dict, locale_sensitive, max_length, divisor=5):
    """Make room in the given cache.

    Args:
        cache_dict: The cache dictionary to modify.
        args_dict: The dictionary of named list args used by patterns.
        locale_sensitive: The dictionary of locale-sensitivity info for
            patterns, keyed by (pattern_type, pattern).
        max_length: Maximum # of entries in cache_dict before it is shrunk.
        divisor: Cache will shrink to max_length - 1/divisor*max_length items.
    """
    # Toss out a fraction of the entries at random to make room for new ones.
    # A random algorithm was chosen as opposed to simply cache_dict.popitem()
    # as popitem could penalize the same regular expression repeatedly based
    # on its internal hash value. Being random should spread the cache miss
    # love around.
    cache_keys = tuple(cache_dict.keys())
    overage = len(cache_keys) - max_length
    if overage < 0:
        # Cache is already within limits. Normally this should not happen
        # but it could due to multithreading.
        return

    number_to_toss = max_length // divisor + overage

    # The import is done here to avoid a circular dependency.
    import random
    if not hasattr(random, 'sample'):
        # Do nothing while resolving the circular dependency:
        #  re->random->warnings->tokenize->string->re
        return

    for doomed_key in random.sample(cache_keys, number_to_toss):
        try:
            del cache_dict[doomed_key]
        except KeyError:
            # Ignore problems if the cache changed from another thread.
            pass

    # Rebuild the arguments and locale-sensitivity dictionaries.
    args_dict.clear()
    sensitivity_dict = {}
    for pattern, pattern_type, flags, args, default_version, locale in tuple(cache_dict):
        args_dict[pattern, pattern_type, flags, default_version, locale] = args
        try:
            sensitivity_dict[pattern_type, pattern] = locale_sensitive[pattern_type, pattern]
        except KeyError:
            pass

    locale_sensitive.clear()
    locale_sensitive.update(sensitivity_dict)

def _fold_case(info, string):
    "Folds the case of a string."
    flags = info.flags
    if (flags & _ALL_ENCODINGS) == 0:
        flags |= info.guess_encoding

    return _regex.fold_case(flags, string)

def is_cased(info, char):
    "Checks whether a character is cased."
    return len(_regex.get_all_cases(info.flags, char)) > 1

def _compile_firstset(info, fs):
    "Compiles the firstset for the pattern."
    if not fs or None in fs:
        return []

    # If we ignore the case, for simplicity we won't build a firstset.
members = set() for i in fs: if isinstance(i, Character) and not i.positive: return [] if i.case_flags: if isinstance(i, Character): if is_cased(info, i.value): return [] elif isinstance(i, SetBase): return [] members.add(i.with_flags(case_flags=NOCASE)) # Build the firstset. fs = SetUnion(info, list(members), zerowidth=True) fs = fs.optimise(info, in_set=True) # Compile the firstset. return fs.compile(bool(info.flags & REVERSE)) def _flatten_code(code): "Flattens the code from a list of tuples." flat_code = [] for c in code: flat_code.extend(c) return flat_code def make_character(info, value, in_set=False): "Makes a character literal." if in_set: # A character set is built case-sensitively. return Character(value) return Character(value, case_flags=info.flags & CASE_FLAGS) def make_ref_group(info, name, position): "Makes a group reference." return RefGroup(info, name, position, case_flags=info.flags & CASE_FLAGS) def make_string_set(info, name): "Makes a string set." return StringSet(info, name, case_flags=info.flags & CASE_FLAGS) def make_property(info, prop, in_set): "Makes a property." if in_set: return prop return prop.with_flags(case_flags=info.flags & CASE_FLAGS) def _parse_pattern(source, info): "Parses a pattern, eg. 'a|b|c'." branches = [parse_sequence(source, info)] while source.match("|"): branches.append(parse_sequence(source, info)) if len(branches) == 1: return branches[0] return Branch(branches) def parse_sequence(source, info): "Parses a sequence, eg. 'abc'." sequence = [] applied = False while True: # Get literal characters followed by an element. characters, case_flags, element = parse_literal_and_element(source, info) if not element: # No element, just a literal. We've also reached the end of the # sequence. append_literal(characters, case_flags, sequence) break if element is COMMENT or element is FLAGS: append_literal(characters, case_flags, sequence) elif type(element) is tuple: # It looks like we've found a quantifier. 
ch, saved_pos = element counts = parse_quantifier(source, info, ch) if counts: # It _is_ a quantifier. apply_quantifier(source, info, counts, characters, case_flags, ch, saved_pos, applied, sequence) applied = True else: # It's not a quantifier. Maybe it's a fuzzy constraint. constraints = parse_fuzzy(source, ch) if constraints: # It _is_ a fuzzy constraint. apply_constraint(source, info, constraints, characters, case_flags, saved_pos, applied, sequence) applied = True else: # The element was just a literal. characters.append(ord(ch)) append_literal(characters, case_flags, sequence) applied = False else: # We have a literal followed by something else. append_literal(characters, case_flags, sequence) sequence.append(element) applied = False return make_sequence(sequence) def apply_quantifier(source, info, counts, characters, case_flags, ch, saved_pos, applied, sequence): if characters: # The quantifier applies to the last character. append_literal(characters[ : -1], case_flags, sequence) element = Character(characters[-1], case_flags=case_flags) else: # The quantifier applies to the last item in the sequence. if applied: raise error("multiple repeat", source.string, saved_pos) if not sequence: raise error("nothing to repeat", source.string, saved_pos) element = sequence.pop() min_count, max_count = counts saved_pos = source.pos ch = source.get() if ch == "?": # The "?" suffix that means it's a lazy repeat. repeated = LazyRepeat elif ch == "+": # The "+" suffix that means it's a possessive repeat. repeated = PossessiveRepeat else: # No suffix means that it's a greedy repeat. source.pos = saved_pos repeated = GreedyRepeat # Ignore the quantifier if it applies to a zero-width item or the number of # repeats is fixed at 1. 
if not element.is_empty() and (min_count != 1 or max_count != 1): element = repeated(element, min_count, max_count) sequence.append(element) def apply_constraint(source, info, constraints, characters, case_flags, saved_pos, applied, sequence): if characters: # The constraint applies to the last character. append_literal(characters[ : -1], case_flags, sequence) element = Character(characters[-1], case_flags=case_flags) sequence.append(Fuzzy(element, constraints)) else: # The constraint applies to the last item in the sequence. if applied or not sequence: raise error("nothing for fuzzy constraint", source.string, saved_pos) element = sequence.pop() # If a group is marked as fuzzy then put all of the fuzzy part in the # group. if isinstance(element, Group): element.subpattern = Fuzzy(element.subpattern, constraints) sequence.append(element) else: sequence.append(Fuzzy(element, constraints)) def append_literal(characters, case_flags, sequence): if characters: sequence.append(Literal(characters, case_flags=case_flags)) def PossessiveRepeat(element, min_count, max_count): "Builds a possessive repeat." return Atomic(GreedyRepeat(element, min_count, max_count)) _QUANTIFIERS = {"?": (0, 1), "*": (0, None), "+": (1, None)} def parse_quantifier(source, info, ch): "Parses a quantifier." q = _QUANTIFIERS.get(ch) if q: # It's a quantifier. return q if ch == "{": # Looks like a limited repeated element, eg. 'a{2,3}'. counts = parse_limited_quantifier(source) if counts: return counts return None def is_above_limit(count): "Checks whether a count is above the maximum." return count is not None and count >= UNLIMITED def parse_limited_quantifier(source): "Parses a limited quantifier." saved_pos = source.pos min_count = parse_count(source) if source.match(","): max_count = parse_count(source) # No minimum means 0 and no maximum means unlimited. 
        min_count = int(min_count or 0)
        max_count = int(max_count) if max_count else None

        if max_count is not None and min_count > max_count:
            raise error("min repeat greater than max repeat", source.string,
              saved_pos)
    else:
        if not min_count:
            source.pos = saved_pos
            return None

        min_count = max_count = int(min_count)

    if is_above_limit(min_count) or is_above_limit(max_count):
        raise error("repeat count too big", source.string, saved_pos)

    if not source.match("}"):
        source.pos = saved_pos
        return None

    return min_count, max_count

def parse_fuzzy(source, ch):
    "Parses a fuzzy setting, if present."
    if ch != "{":
        return None

    saved_pos = source.pos

    constraints = {}
    try:
        parse_fuzzy_item(source, constraints)
        while source.match(","):
            parse_fuzzy_item(source, constraints)
    except ParseError:
        source.pos = saved_pos
        return None

    if not source.match("}"):
        raise error("expected }", source.string, source.pos)

    return constraints

def parse_fuzzy_item(source, constraints):
    "Parses a fuzzy setting item."
    saved_pos = source.pos
    try:
        parse_cost_constraint(source, constraints)
    except ParseError:
        source.pos = saved_pos

        parse_cost_equation(source, constraints)

def parse_cost_constraint(source, constraints):
    "Parses a cost constraint."
    saved_pos = source.pos
    ch = source.get()
    if ch in ALPHA:
        # Syntax: constraint [("<=" | "<") cost]
        constraint = parse_constraint(source, constraints, ch)

        max_inc = parse_fuzzy_compare(source)

        if max_inc is None:
            # No maximum cost.
            constraints[constraint] = 0, None
        else:
            # There's a maximum cost.
            cost_pos = source.pos
            max_cost = int(parse_count(source))

            # Inclusive or exclusive limit?
            if not max_inc:
                max_cost -= 1

            if max_cost < 0:
                raise error("bad fuzzy cost limit", source.string, cost_pos)

            constraints[constraint] = 0, max_cost
    elif ch in DIGITS:
        # Syntax: cost ("<=" | "<") constraint ("<=" | "<") cost
        source.pos = saved_pos
        try:
            # Minimum cost.
            min_cost = int(parse_count(source))

            min_inc = parse_fuzzy_compare(source)
            if min_inc is None:
                raise ParseError()

            constraint = parse_constraint(source, constraints, source.get())

            max_inc = parse_fuzzy_compare(source)
            if max_inc is None:
                raise ParseError()

            # Maximum cost.
            cost_pos = source.pos
            max_cost = int(parse_count(source))

            # Inclusive or exclusive limits?
            if not min_inc:
                min_cost += 1
            if not max_inc:
                max_cost -= 1

            if not 0 <= min_cost <= max_cost:
                raise error("bad fuzzy cost limit", source.string, cost_pos)

            constraints[constraint] = min_cost, max_cost
        except ValueError:
            raise ParseError()
    else:
        raise ParseError()

def parse_constraint(source, constraints, ch):
    "Parses a constraint."
    if ch not in "deis":
        raise error("bad fuzzy constraint", source.string, source.pos)

    if ch in constraints:
        raise error("repeated fuzzy constraint", source.string, source.pos)

    return ch

def parse_fuzzy_compare(source):
    "Parses a cost comparator."
    if source.match("<="):
        return True
    elif source.match("<"):
        return False
    else:
        return None

def parse_cost_equation(source, constraints):
    "Parses a cost equation."
    if "cost" in constraints:
        raise error("more than one cost equation", source.string, source.pos)

    cost = {}

    parse_cost_term(source, cost)
    while source.match("+"):
        parse_cost_term(source, cost)

    max_inc = parse_fuzzy_compare(source)
    if max_inc is None:
        raise error("missing fuzzy cost limit", source.string, source.pos)

    max_cost = int(parse_count(source))

    if not max_inc:
        max_cost -= 1

    if max_cost < 0:
        raise error("bad fuzzy cost limit", source.string, source.pos)

    cost["max"] = max_cost

    constraints["cost"] = cost

def parse_cost_term(source, cost):
    "Parses a cost equation term."
    coeff = parse_count(source)
    ch = source.get()
    if ch not in "dis":
        raise ParseError()

    if ch in cost:
        raise error("repeated fuzzy cost", source.string, source.pos)

    cost[ch] = int(coeff or 1)

def parse_count(source):
    "Parses a quantifier's count, which can be empty."
return source.get_while(DIGITS) def parse_literal_and_element(source, info): """Parses a literal followed by an element. The element is FLAGS if it's an inline flag or None if it has reached the end of a sequence. """ characters = [] case_flags = info.flags & CASE_FLAGS while True: saved_pos = source.pos ch = source.get() if ch in SPECIAL_CHARS: if ch in ")|": # The end of a sequence. At the end of the pattern ch is "". source.pos = saved_pos return characters, case_flags, None elif ch == "\\": # An escape sequence outside a set. element = parse_escape(source, info, False) return characters, case_flags, element elif ch == "(": # A parenthesised subpattern or a flag. element = parse_paren(source, info) if element and element is not COMMENT: return characters, case_flags, element elif ch == ".": # Any character. if info.flags & DOTALL: element = AnyAll() elif info.flags & WORD: element = AnyU() else: element = Any() return characters, case_flags, element elif ch == "[": # A character set. element = parse_set(source, info) return characters, case_flags, element elif ch == "^": # The start of a line or the string. if info.flags & MULTILINE: if info.flags & WORD: element = StartOfLineU() else: element = StartOfLine() else: element = StartOfString() return characters, case_flags, element elif ch == "$": # The end of a line or the string. if info.flags & MULTILINE: if info.flags & WORD: element = EndOfLineU() else: element = EndOfLine() else: if info.flags & WORD: element = EndOfStringLineU() else: element = EndOfStringLine() return characters, case_flags, element elif ch in "?*+{": # Looks like a quantifier. return characters, case_flags, (ch, saved_pos) else: # A literal. characters.append(ord(ch)) else: # A literal. characters.append(ord(ch)) def parse_paren(source, info): """Parses a parenthesised subpattern or a flag. Returns FLAGS if it's an inline flag. """ saved_pos = source.pos ch = source.get() if ch == "?": # (?... 
        saved_pos_2 = source.pos
        ch = source.get()
        if ch == "<":
            # (?<...
            saved_pos_3 = source.pos
            ch = source.get()
            if ch in ("=", "!"):
                # (?<=... or (?<!...: lookbehind.
                return parse_lookaround(source, info, True, ch == "=")

            # (?<...: a named capture group.
            source.pos = saved_pos_3
            name = parse_name(source)
            group = info.open_group(name)
            source.expect(">")
            saved_flags = info.flags
            try:
                subpattern = _parse_pattern(source, info)
                source.expect(")")
            finally:
                info.flags = saved_flags
                source.ignore_space = bool(info.flags & VERBOSE)

            info.close_group()
            return Group(info, group, subpattern)
        if ch in ("=", "!"):
            # (?=... or (?!...: lookahead.
            return parse_lookaround(source, info, False, ch == "=")
        if ch == "P":
            # (?P...: a Python extension.
            return parse_extension(source, info)
        if ch == "#":
            # (?#...: a comment.
            return parse_comment(source)
        if ch == "(":
            # (?(...: a conditional subpattern.
            return parse_conditional(source, info)
        if ch == ">":
            # (?>...: an atomic subpattern.
            return parse_atomic(source, info)
        if ch == "|":
            # (?|...: a common/reset groups branch.
            return parse_common(source, info)
        if ch == "R" or "0" <= ch <= "9":
            # (?R...: probably a call to a group.
            return parse_call_group(source, info, ch, saved_pos_2)
        if ch == "&":
            # (?&...: a call to a named group.
            return parse_call_named_group(source, info, saved_pos_2)

        # (?...: probably a flags subpattern.
        source.pos = saved_pos_2
        return parse_flags_subpattern(source, info)

    if ch == "*":
        # (*...
        saved_pos_2 = source.pos
        word = source.get_while(set(")>"), include=False)
        if word[ : 1].isalpha():
            verb = VERBS.get(word)
            if not verb:
                raise error("unknown verb", source.string, saved_pos_2)

            source.expect(")")

            return verb

    # (...: an unnamed capture group.
    source.pos = saved_pos
    group = info.open_group()
    saved_flags = info.flags
    try:
        subpattern = _parse_pattern(source, info)
        source.expect(")")
    finally:
        info.flags = saved_flags
        source.ignore_space = bool(info.flags & VERBOSE)

    info.close_group()

    return Group(info, group, subpattern)

def parse_extension(source, info):
    "Parses a Python extension."
    saved_pos = source.pos
    ch = source.get()
    if ch == "<":
        # (?P<...: a named capture group.
name = parse_name(source) group = info.open_group(name) source.expect(">") saved_flags = info.flags try: subpattern = _parse_pattern(source, info) source.expect(")") finally: info.flags = saved_flags source.ignore_space = bool(info.flags & VERBOSE) info.close_group() return Group(info, group, subpattern) if ch == "=": # (?P=...: a named group reference. name = parse_name(source, allow_numeric=True) source.expect(")") if info.is_open_group(name): raise error("cannot refer to an open group", source.string, saved_pos) return make_ref_group(info, name, saved_pos) if ch == ">" or ch == "&": # (?P>...: a call to a group. return parse_call_named_group(source, info, saved_pos) source.pos = saved_pos raise error("unknown extension", source.string, saved_pos) def parse_comment(source): "Parses a comment." source.skip_while(set(")"), include=False) source.expect(")") return COMMENT def parse_lookaround(source, info, behind, positive): "Parses a lookaround." saved_flags = info.flags try: subpattern = _parse_pattern(source, info) source.expect(")") finally: info.flags = saved_flags source.ignore_space = bool(info.flags & VERBOSE) return LookAround(behind, positive, subpattern) def parse_conditional(source, info): "Parses a conditional subpattern." saved_flags = info.flags saved_pos = source.pos ch = source.get() if ch == "?": # (?(?... ch = source.get() if ch in ("=", "!"): # (?(?=... or (?(?!...: lookahead conditional. return parse_lookaround_conditional(source, info, False, ch == "=") if ch == "<": # (?(?<... ch = source.get() if ch in ("=", "!"): # (?(?<=... 
or (?(?"), include=False) if not name: raise error("missing group name", source.string, source.pos) if name.isdigit(): min_group = 0 if allow_group_0 else 1 if not allow_numeric or int(name) < min_group: raise error("bad character in group name", source.string, source.pos) else: if not name.isidentifier(): raise error("bad character in group name", source.string, source.pos) return name def is_octal(string): "Checks whether a string is octal." return all(ch in OCT_DIGITS for ch in string) def is_decimal(string): "Checks whether a string is decimal." return all(ch in DIGITS for ch in string) def is_hexadecimal(string): "Checks whether a string is hexadecimal." return all(ch in HEX_DIGITS for ch in string) def parse_escape(source, info, in_set): "Parses an escape sequence." saved_ignore = source.ignore_space source.ignore_space = False ch = source.get() source.ignore_space = saved_ignore if not ch: # A backslash at the end of the pattern. raise error("bad escape (end of pattern)", source.string, source.pos) if ch in HEX_ESCAPES: # A hexadecimal escape sequence. return parse_hex_escape(source, info, HEX_ESCAPES[ch], in_set, ch) elif ch == "g" and not in_set: # A group reference. saved_pos = source.pos try: return parse_group_ref(source, info) except error: # Invalid as a group reference, so assume it's a literal. source.pos = saved_pos return make_character(info, ord(ch), in_set) elif ch == "G" and not in_set: # A search anchor. return SearchAnchor() elif ch == "L" and not in_set: # A string set. return parse_string_set(source, info) elif ch == "N": # A named codepoint. return parse_named_char(source, info, in_set) elif ch in "pP": # A Unicode property, positive or negative. return parse_property(source, info, ch == "p", in_set) elif ch == "X" and not in_set: # A grapheme cluster. return Grapheme() elif ch in ALPHA: # An alphabetic escape sequence. # Positional escapes aren't allowed inside a character set. 
if not in_set: if info.flags & WORD: value = WORD_POSITION_ESCAPES.get(ch) else: value = POSITION_ESCAPES.get(ch) if value: return value value = CHARSET_ESCAPES.get(ch) if value: return value value = CHARACTER_ESCAPES.get(ch) if value: return Character(ord(value)) return make_character(info, ord(ch), in_set) elif ch in DIGITS: # A numeric escape sequence. return parse_numeric_escape(source, info, ch, in_set) else: # A literal. return make_character(info, ord(ch), in_set) def parse_numeric_escape(source, info, ch, in_set): "Parses a numeric escape sequence." if in_set or ch == "0": # Octal escape sequence, max 3 digits. return parse_octal_escape(source, info, [ch], in_set) # At least 1 digit, so either octal escape or group. digits = ch saved_pos = source.pos ch = source.get() if ch in DIGITS: # At least 2 digits, so either octal escape or group. digits += ch saved_pos = source.pos ch = source.get() if is_octal(digits) and ch in OCT_DIGITS: # 3 octal digits, so octal escape sequence. encoding = info.flags & _ALL_ENCODINGS if encoding == ASCII or encoding == LOCALE: octal_mask = 0xFF else: octal_mask = 0x1FF value = int(digits + ch, 8) & octal_mask return make_character(info, value) # Group reference. source.pos = saved_pos if info.is_open_group(digits): raise error("cannot refer to an open group", source.string, source.pos) return make_ref_group(info, digits, source.pos) def parse_octal_escape(source, info, digits, in_set): "Parses an octal escape sequence." 
saved_pos = source.pos ch = source.get() while len(digits) < 3 and ch in OCT_DIGITS: digits.append(ch) saved_pos = source.pos ch = source.get() source.pos = saved_pos try: value = int("".join(digits), 8) return make_character(info, value, in_set) except ValueError: if digits[0] in OCT_DIGITS: raise error("incomplete escape \\%s" % ''.join(digits), source.string, source.pos) else: raise error("bad escape \\%s" % digits[0], source.string, source.pos) def parse_hex_escape(source, info, expected_len, in_set, type): "Parses a hex escape sequence." digits = [] for i in range(expected_len): ch = source.get() if ch not in HEX_DIGITS: raise error("incomplete escape \\%s%s" % (type, ''.join(digits)), source.string, source.pos) digits.append(ch) value = int("".join(digits), 16) return make_character(info, value, in_set) def parse_group_ref(source, info): "Parses a group reference." source.expect("<") saved_pos = source.pos name = parse_name(source, True) source.expect(">") if info.is_open_group(name): raise error("cannot refer to an open group", source.string, source.pos) return make_ref_group(info, name, saved_pos) def parse_string_set(source, info): "Parses a string set reference." source.expect("<") name = parse_name(source, True) source.expect(">") if name is None or name not in info.kwargs: raise error("undefined named list", source.string, source.pos) return make_string_set(info, name) def parse_named_char(source, info, in_set): "Parses a named character." saved_pos = source.pos if source.match("{"): name = source.get_while(NAMED_CHAR_PART) if source.match("}"): try: value = unicodedata.lookup(name) return make_character(info, ord(value), in_set) except KeyError: raise error("undefined character name", source.string, source.pos) source.pos = saved_pos return make_character(info, ord("N"), in_set) def parse_property(source, info, positive, in_set): "Parses a Unicode property." 
saved_pos = source.pos ch = source.get() if ch == "{": negate = source.match("^") prop_name, name = parse_property_name(source) if source.match("}"): # It's correctly delimited. prop = lookup_property(prop_name, name, positive != negate, source) return make_property(info, prop, in_set) elif ch and ch in "CLMNPSZ": # An abbreviated property, eg \pL. prop = lookup_property(None, ch, positive, source) return make_property(info, prop, in_set) # Not a property, so treat as a literal "p" or "P". source.pos = saved_pos ch = "p" if positive else "P" return make_character(info, ord(ch), in_set) def parse_property_name(source): "Parses a property name, which may be qualified." name = source.get_while(PROPERTY_NAME_PART) saved_pos = source.pos ch = source.get() if ch and ch in ":=": prop_name = name name = source.get_while(ALNUM | set(" &_-./")).strip() if name: # Name after the ":" or "=", so it's a qualified name. saved_pos = source.pos else: # No name after the ":" or "=", so assume it's an unqualified name. prop_name, name = None, prop_name else: prop_name = None source.pos = saved_pos return prop_name, name def parse_set(source, info): "Parses a character set." version = (info.flags & _ALL_VERSIONS) or DEFAULT_VERSION saved_ignore = source.ignore_space source.ignore_space = False # Negative set? negate = source.match("^") try: if version == VERSION0: item = parse_set_imp_union(source, info) else: item = parse_set_union(source, info) if not source.match("]"): raise error("missing ]", source.string, source.pos) finally: source.ignore_space = saved_ignore if negate: item = item.with_flags(positive=not item.positive) item = item.with_flags(case_flags=info.flags & CASE_FLAGS) return item def parse_set_union(source, info): "Parses a set union ([x||y])." 
items = [parse_set_symm_diff(source, info)] while source.match("||"): items.append(parse_set_symm_diff(source, info)) if len(items) == 1: return items[0] return SetUnion(info, items) def parse_set_symm_diff(source, info): "Parses a set symmetric difference ([x~~y])." items = [parse_set_inter(source, info)] while source.match("~~"): items.append(parse_set_inter(source, info)) if len(items) == 1: return items[0] return SetSymDiff(info, items) def parse_set_inter(source, info): "Parses a set intersection ([x&&y])." items = [parse_set_diff(source, info)] while source.match("&&"): items.append(parse_set_diff(source, info)) if len(items) == 1: return items[0] return SetInter(info, items) def parse_set_diff(source, info): "Parses a set difference ([x--y])." items = [parse_set_imp_union(source, info)] while source.match("--"): items.append(parse_set_imp_union(source, info)) if len(items) == 1: return items[0] return SetDiff(info, items) def parse_set_imp_union(source, info): "Parses a set implicit union ([xy])." version = (info.flags & _ALL_VERSIONS) or DEFAULT_VERSION items = [parse_set_member(source, info)] while True: saved_pos = source.pos if source.match("]"): # End of the set. source.pos = saved_pos break if version == VERSION1 and any(source.match(op) for op in SET_OPS): # The new behaviour has set operators. source.pos = saved_pos break items.append(parse_set_member(source, info)) if len(items) == 1: return items[0] return SetUnion(info, items) def parse_set_member(source, info): "Parses a member in a character set." # Parse a set item. start = parse_set_item(source, info) saved_pos1 = source.pos if (not isinstance(start, Character) or not start.positive or not source.match("-")): # It's not the start of a range. return start version = (info.flags & _ALL_VERSIONS) or DEFAULT_VERSION # It looks like the start of a range of characters. 
saved_pos2 = source.pos if version == VERSION1 and source.match("-"): # It's actually the set difference operator '--', so return the # character. source.pos = saved_pos1 return start if source.match("]"): # We've reached the end of the set, so return both the character and # hyphen. source.pos = saved_pos2 return SetUnion(info, [start, Character(ord("-"))]) # Parse a set item. end = parse_set_item(source, info) if not isinstance(end, Character) or not end.positive: # It's not a range, so return the character, hyphen and property. return SetUnion(info, [start, Character(ord("-")), end]) # It _is_ a range. if start.value > end.value: raise error("bad character range", source.string, source.pos) if start.value == end.value: return start return Range(start.value, end.value) def parse_set_item(source, info): "Parses an item in a character set." version = (info.flags & _ALL_VERSIONS) or DEFAULT_VERSION if source.match("\\"): # An escape sequence in a set. return parse_escape(source, info, True) saved_pos = source.pos if source.match("[:"): # Looks like a POSIX character class. try: return parse_posix_class(source, info) except ParseError: # Not a POSIX character class. source.pos = saved_pos if version == VERSION1 and source.match("["): # It's the start of a nested set. # Negative set? negate = source.match("^") item = parse_set_union(source, info) if not source.match("]"): raise error("missing ]", source.string, source.pos) if negate: item = item.with_flags(positive=not item.positive) return item ch = source.get() if not ch: raise error("unterminated character set", source.string, source.pos) return Character(ord(ch)) def parse_posix_class(source, info): "Parses a POSIX character class." negate = source.match("^") prop_name, name = parse_property_name(source) if not source.match(":]"): raise ParseError() return lookup_property(prop_name, name, not negate, source, posix=True) def float_to_rational(flt): "Converts a float to a rational pair." 
int_part = int(flt) error = flt - int_part if abs(error) < 0.0001: return int_part, 1 den, num = float_to_rational(1.0 / error) return int_part * den + num, den def numeric_to_rational(numeric): "Converts a numeric string to a rational string, if possible." if numeric[ : 1] == "-": sign, numeric = numeric[0], numeric[1 : ] else: sign = "" parts = numeric.split("/") if len(parts) == 2: num, den = float_to_rational(float(parts[0]) / float(parts[1])) elif len(parts) == 1: num, den = float_to_rational(float(parts[0])) else: raise ValueError() result = "{}{}/{}".format(sign, num, den) if result.endswith("/1"): return result[ : -2] return result def standardise_name(name): "Standardises a property or value name." try: return numeric_to_rational("".join(name)) except (ValueError, ZeroDivisionError): return "".join(ch for ch in name if ch not in "_- ").upper() _posix_classes = set('ALNUM DIGIT PUNCT XDIGIT'.split()) def lookup_property(property, value, positive, source=None, posix=False): "Looks up a property." # Normalise the names (which may still be lists). property = standardise_name(property) if property else None value = standardise_name(value) if (property, value) == ("GENERALCATEGORY", "ASSIGNED"): property, value, positive = "GENERALCATEGORY", "UNASSIGNED", not positive if posix and not property and value.upper() in _posix_classes: value = 'POSIX' + value if property: # Both the property and the value are provided. prop = PROPERTIES.get(property) if not prop: if not source: raise error("unknown property") raise error("unknown property", source.string, source.pos) prop_id, value_dict = prop val_id = value_dict.get(value) if val_id is None: if not source: raise error("unknown property value") raise error("unknown property value", source.string, source.pos) if "YES" in value_dict and val_id == 0: positive, val_id = not positive, 1 return Property((prop_id << 16) | val_id, positive) # Only the value is provided. # It might be the name of a GC, script or block value. 
    for property in ("GC", "SCRIPT", "BLOCK"):
        prop_id, value_dict = PROPERTIES.get(property)
        val_id = value_dict.get(value)
        if val_id is not None:
            return Property((prop_id << 16) | val_id, positive)

    # It might be the name of a binary property.
    prop = PROPERTIES.get(value)
    if prop:
        prop_id, value_dict = prop
        if "YES" in value_dict:
            return Property((prop_id << 16) | 1, positive)

    # It might be the name of a binary property starting with a prefix.
    if value.startswith("IS"):
        prop = PROPERTIES.get(value[2 : ])
        if prop:
            prop_id, value_dict = prop
            if "YES" in value_dict:
                return Property((prop_id << 16) | 1, positive)

    # It might be the name of a script or block starting with a prefix.
    for prefix, property in (("IS", "SCRIPT"), ("IN", "BLOCK")):
        if value.startswith(prefix):
            prop_id, value_dict = PROPERTIES.get(property)
            val_id = value_dict.get(value[2 : ])
            if val_id is not None:
                return Property((prop_id << 16) | val_id, positive)

    # Unknown property.
    if not source:
        raise error("unknown property")

    raise error("unknown property", source.string, source.pos)

def _compile_replacement(source, pattern, is_unicode):
    "Compiles a replacement template escape sequence."
    ch = source.get()
    if ch in ALPHA:
        # An alphabetic escape sequence.
        value = CHARACTER_ESCAPES.get(ch)
        if value:
            return False, [ord(value)]

        if ch in HEX_ESCAPES and (ch == "x" or is_unicode):
            # A hexadecimal escape sequence.
            return False, [parse_repl_hex_escape(source, HEX_ESCAPES[ch], ch)]

        if ch == "g":
            # A group reference.
            return True, [compile_repl_group(source, pattern)]

        if ch == "N" and is_unicode:
            # A named character.
            value = parse_repl_named_char(source)
            if value is not None:
                return False, [value]

        return False, [ord("\\"), ord(ch)]

    if isinstance(source.sep, bytes):
        octal_mask = 0xFF
    else:
        octal_mask = 0x1FF

    if ch == "0":
        # An octal escape sequence.
        digits = ch
        while len(digits) < 3:
            saved_pos = source.pos
            ch = source.get()
            if ch not in OCT_DIGITS:
                source.pos = saved_pos
                break

            digits += ch

        return False, [int(digits, 8) & octal_mask]

    if ch in DIGITS:
        # Either an octal escape sequence (3 digits) or a group reference (max
        # 2 digits).
        digits = ch
        saved_pos = source.pos
        ch = source.get()
        if ch in DIGITS:
            digits += ch
            saved_pos = source.pos
            ch = source.get()
            if ch and is_octal(digits + ch):
                # An octal escape sequence.
                return False, [int(digits + ch, 8) & octal_mask]

        # A group reference.
        source.pos = saved_pos
        return True, [int(digits)]

    if ch == "\\":
        # An escaped backslash is a backslash.
        return False, [ord("\\")]

    if not ch:
        # A trailing backslash.
        raise error("bad escape (end of pattern)", source.string, source.pos)

    # An escaped non-backslash is a backslash followed by the literal.
    return False, [ord("\\"), ord(ch)]

def parse_repl_hex_escape(source, expected_len, type):
    "Parses a hex escape sequence in a replacement string."
    digits = []
    for i in range(expected_len):
        ch = source.get()
        if ch not in HEX_DIGITS:
            raise error("incomplete escape \\%s%s" % (type, ''.join(digits)),
              source.string, source.pos)
        digits.append(ch)

    return int("".join(digits), 16)

def parse_repl_named_char(source):
    "Parses a named character in a replacement string."
    saved_pos = source.pos
    if source.match("{"):
        name = source.get_while(ALPHA | set(" "))

        if source.match("}"):
            try:
                value = unicodedata.lookup(name)
                return ord(value)
            except KeyError:
                raise error("undefined character name", source.string,
                  source.pos)

    source.pos = saved_pos
    return None

def compile_repl_group(source, pattern):
    "Compiles a replacement template group reference."
source.expect("<") name = parse_name(source, True, True) source.expect(">") if name.isdigit(): index = int(name) if not 0 <= index <= pattern.groups: raise error("invalid group reference", source.string, source.pos) return index try: return pattern.groupindex[name] except KeyError: raise IndexError("unknown group") # The regular expression is parsed into a syntax tree. The different types of # node are defined below. INDENT = " " POSITIVE_OP = 0x1 ZEROWIDTH_OP = 0x2 FUZZY_OP = 0x4 REVERSE_OP = 0x8 REQUIRED_OP = 0x10 POS_TEXT = {False: "NON-MATCH", True: "MATCH"} CASE_TEXT = {NOCASE: "", IGNORECASE: " SIMPLE_IGNORE_CASE", FULLCASE: "", FULLIGNORECASE: " FULL_IGNORE_CASE"} def make_sequence(items): if len(items) == 1: return items[0] return Sequence(items) # Common base class for all nodes. class RegexBase: def __init__(self): self._key = self.__class__ def with_flags(self, positive=None, case_flags=None, zerowidth=None): if positive is None: positive = self.positive else: positive = bool(positive) if case_flags is None: case_flags = self.case_flags else: case_flags = CASE_FLAGS_COMBINATIONS[case_flags & CASE_FLAGS] if zerowidth is None: zerowidth = self.zerowidth else: zerowidth = bool(zerowidth) if (positive == self.positive and case_flags == self.case_flags and zerowidth == self.zerowidth): return self return self.rebuild(positive, case_flags, zerowidth) def fix_groups(self, pattern, reverse, fuzzy): pass def optimise(self, info): return self def pack_characters(self, info): return self def remove_captures(self): return self def is_atomic(self): return True def can_be_affix(self): return True def contains_group(self): return False def get_firstset(self, reverse): raise _FirstSetError() def has_simple_start(self): return False def compile(self, reverse=False, fuzzy=False): return self._compile(reverse, fuzzy) def dump(self, indent, reverse): self._dump(indent, reverse) def is_empty(self): return False def __hash__(self): return hash(self._key) def __eq__(self, 
other): return type(self) is type(other) and self._key == other._key def __ne__(self, other): return not self.__eq__(other) def get_required_string(self, reverse): return self.max_width(), None # Base class for zero-width nodes. class ZeroWidthBase(RegexBase): def __init__(self, positive=True): RegexBase.__init__(self) self.positive = bool(positive) self._key = self.__class__, self.positive def get_firstset(self, reverse): return set([None]) def _compile(self, reverse, fuzzy): flags = 0 if self.positive: flags |= POSITIVE_OP if fuzzy: flags |= FUZZY_OP if reverse: flags |= REVERSE_OP return [(self._opcode, flags)] def _dump(self, indent, reverse): print("{}{} {}".format(INDENT * indent, self._op_name, POS_TEXT[self.positive])) def max_width(self): return 0 class Any(RegexBase): _opcode = {False: OP.ANY, True: OP.ANY_REV} _op_name = "ANY" def has_simple_start(self): return True def _compile(self, reverse, fuzzy): flags = 0 if fuzzy: flags |= FUZZY_OP return [(self._opcode[reverse], flags)] def _dump(self, indent, reverse): print("{}{}".format(INDENT * indent, self._op_name)) def max_width(self): return 1 class AnyAll(Any): _opcode = {False: OP.ANY_ALL, True: OP.ANY_ALL_REV} _op_name = "ANY_ALL" class AnyU(Any): _opcode = {False: OP.ANY_U, True: OP.ANY_U_REV} _op_name = "ANY_U" class Atomic(RegexBase): def __init__(self, subpattern): RegexBase.__init__(self) self.subpattern = subpattern def fix_groups(self, pattern, reverse, fuzzy): self.subpattern.fix_groups(pattern, reverse, fuzzy) def optimise(self, info): self.subpattern = self.subpattern.optimise(info) if self.subpattern.is_empty(): return self.subpattern return self def pack_characters(self, info): self.subpattern = self.subpattern.pack_characters(info) return self def remove_captures(self): self.subpattern = self.subpattern.remove_captures() return self def can_be_affix(self): return self.subpattern.can_be_affix() def contains_group(self): return self.subpattern.contains_group() def get_firstset(self, 
reverse): return self.subpattern.get_firstset(reverse) def has_simple_start(self): return self.subpattern.has_simple_start() def _compile(self, reverse, fuzzy): return ([(OP.ATOMIC, )] + self.subpattern.compile(reverse, fuzzy) + [(OP.END, )]) def _dump(self, indent, reverse): print("{}ATOMIC".format(INDENT * indent)) self.subpattern.dump(indent + 1, reverse) def is_empty(self): return self.subpattern.is_empty() def __eq__(self, other): return (type(self) is type(other) and self.subpattern == other.subpattern) def max_width(self): return self.subpattern.max_width() def get_required_string(self, reverse): return self.subpattern.get_required_string(reverse) class Boundary(ZeroWidthBase): _opcode = OP.BOUNDARY _op_name = "BOUNDARY" class Branch(RegexBase): def __init__(self, branches): RegexBase.__init__(self) self.branches = branches def fix_groups(self, pattern, reverse, fuzzy): for b in self.branches: b.fix_groups(pattern, reverse, fuzzy) def optimise(self, info): # Flatten branches within branches. branches = Branch._flatten_branches(info, self.branches) # Move any common prefix or suffix out of the branches. prefix, branches = Branch._split_common_prefix(info, branches) # Try to reduce adjacent single-character branches to sets. 
branches = Branch._reduce_to_set(info, branches) if len(branches) > 1: sequence = [Branch(branches)] else: sequence = branches return make_sequence(prefix + sequence) def pack_characters(self, info): self.branches = [b.pack_characters(info) for b in self.branches] return self def remove_captures(self): self.branches = [b.remove_captures() for b in self.branches] return self def is_atomic(self): return all(b.is_atomic() for b in self.branches) def can_be_affix(self): return all(b.can_be_affix() for b in self.branches) def contains_group(self): return any(b.contains_group() for b in self.branches) def get_firstset(self, reverse): fs = set() for b in self.branches: fs |= b.get_firstset(reverse) return fs or set([None]) def _compile(self, reverse, fuzzy): code = [(OP.BRANCH, )] for b in self.branches: code.extend(b.compile(reverse, fuzzy)) code.append((OP.NEXT, )) code[-1] = (OP.END, ) return code def _dump(self, indent, reverse): print("{}BRANCH".format(INDENT * indent)) self.branches[0].dump(indent + 1, reverse) for b in self.branches[1 : ]: print("{}OR".format(INDENT * indent)) b.dump(indent + 1, reverse) @staticmethod def _flatten_branches(info, branches): # Flatten the branches so that there aren't branches of branches. new_branches = [] for b in branches: b = b.optimise(info) if isinstance(b, Branch): new_branches.extend(b.branches) else: new_branches.append(b) return new_branches @staticmethod def _split_common_prefix(info, branches): # Common leading items can be moved out of the branches. # Get the items in the branches. alternatives = [] for b in branches: if isinstance(b, Sequence): alternatives.append(b.items) else: alternatives.append([b]) # What is the maximum possible length of the prefix? max_count = min(len(a) for a in alternatives) # What is the longest common prefix? 
prefix = alternatives[0] pos = 0 end_pos = max_count while pos < end_pos and prefix[pos].can_be_affix() and all(a[pos] == prefix[pos] for a in alternatives): pos += 1 count = pos if info.flags & UNICODE: # We need to check that we're not splitting a sequence of # characters which could form part of full case-folding. count = pos while count > 0 and not all(Branch._can_split(a, count) for a in alternatives): count -= 1 # No common prefix is possible. if count == 0: return [], branches # Rebuild the branches. new_branches = [] for a in alternatives: new_branches.append(make_sequence(a[count : ])) return prefix[ : count], new_branches @staticmethod def _split_common_suffix(info, branches): # Common trailing items can be moved out of the branches. # Get the items in the branches. alternatives = [] for b in branches: if isinstance(b, Sequence): alternatives.append(b.items) else: alternatives.append([b]) # What is the maximum possible length of the suffix? max_count = min(len(a) for a in alternatives) # What is the longest common suffix? suffix = alternatives[0] pos = -1 end_pos = -1 - max_count while pos > end_pos and suffix[pos].can_be_affix() and all(a[pos] == suffix[pos] for a in alternatives): pos -= 1 count = -1 - pos if info.flags & UNICODE: # We need to check that we're not splitting a sequence of # characters which could form part of full case-folding. while count > 0 and not all(Branch._can_split_rev(a, count) for a in alternatives): count -= 1 # No common suffix is possible. if count == 0: return [], branches # Rebuild the branches. new_branches = [] for a in alternatives: new_branches.append(make_sequence(a[ : -count])) return suffix[-count : ], new_branches @staticmethod def _can_split(items, count): # Check the characters either side of the proposed split. if not Branch._is_full_case(items, count - 1): return True if not Branch._is_full_case(items, count): return True # Check whether a 1-1 split would be OK. 
if Branch._is_folded(items[count - 1 : count + 1]): return False # Check whether a 1-2 split would be OK. if (Branch._is_full_case(items, count + 2) and Branch._is_folded(items[count - 1 : count + 2])): return False # Check whether a 2-1 split would be OK. if (Branch._is_full_case(items, count - 2) and Branch._is_folded(items[count - 2 : count + 1])): return False return True @staticmethod def _can_split_rev(items, count): end = len(items) # Check the characters either side of the proposed split. if not Branch._is_full_case(items, end - count): return True if not Branch._is_full_case(items, end - count - 1): return True # Check whether a 1-1 split would be OK. if Branch._is_folded(items[end - count - 1 : end - count + 1]): return False # Check whether a 1-2 split would be OK. if (Branch._is_full_case(items, end - count + 2) and Branch._is_folded(items[end - count - 1 : end - count + 2])): return False # Check whether a 2-1 split would be OK. if (Branch._is_full_case(items, end - count - 2) and Branch._is_folded(items[end - count - 2 : end - count + 1])): return False return True @staticmethod def _merge_common_prefixes(info, branches): # Branches with the same case-sensitive character prefix can be grouped # together if they are separated only by other branches with a # character prefix. prefixed = defaultdict(list) order = {} new_branches = [] for b in branches: if Branch._is_simple_character(b): # Branch starts with a simple character. prefixed[b.value].append([b]) order.setdefault(b.value, len(order)) elif (isinstance(b, Sequence) and b.items and Branch._is_simple_character(b.items[0])): # Branch starts with a simple character. 
prefixed[b.items[0].value].append(b.items) order.setdefault(b.items[0].value, len(order)) else: Branch._flush_char_prefix(info, prefixed, order, new_branches) new_branches.append(b) Branch._flush_char_prefix(info, prefixed, order, new_branches) return new_branches @staticmethod def _is_simple_character(c): return isinstance(c, Character) and c.positive and not c.case_flags @staticmethod def _reduce_to_set(info, branches): # Can the branches be reduced to a set? new_branches = [] items = set() case_flags = NOCASE for b in branches: if isinstance(b, (Character, Property, SetBase)): # Branch starts with a single character. if b.case_flags != case_flags: # Different case sensitivity, so flush. Branch._flush_set_members(info, items, case_flags, new_branches) case_flags = b.case_flags items.add(b.with_flags(case_flags=NOCASE)) else: Branch._flush_set_members(info, items, case_flags, new_branches) new_branches.append(b) Branch._flush_set_members(info, items, case_flags, new_branches) return new_branches @staticmethod def _flush_char_prefix(info, prefixed, order, new_branches): # Flush the prefixed branches. if not prefixed: return for value, branches in sorted(prefixed.items(), key=lambda pair: order[pair[0]]): if len(branches) == 1: new_branches.append(make_sequence(branches[0])) else: subbranches = [] optional = False for b in branches: if len(b) > 1: subbranches.append(make_sequence(b[1 : ])) elif not optional: subbranches.append(Sequence()) optional = True sequence = Sequence([Character(value), Branch(subbranches)]) new_branches.append(sequence.optimise(info)) prefixed.clear() order.clear() @staticmethod def _flush_set_members(info, items, case_flags, new_branches): # Flush the set members. 
if not items: return if len(items) == 1: item = list(items)[0] else: item = SetUnion(info, list(items)).optimise(info) new_branches.append(item.with_flags(case_flags=case_flags)) items.clear() @staticmethod def _is_full_case(items, i): if not 0 <= i < len(items): return False item = items[i] return (isinstance(item, Character) and item.positive and (item.case_flags & FULLIGNORECASE) == FULLIGNORECASE) @staticmethod def _is_folded(items): if len(items) < 2: return False for i in items: if (not isinstance(i, Character) or not i.positive or not i.case_flags): return False folded = "".join(chr(i.value) for i in items) folded = _regex.fold_case(FULL_CASE_FOLDING, folded) # Get the characters which expand to multiple codepoints on folding. expanding_chars = _regex.get_expand_on_folding() for c in expanding_chars: if folded == _regex.fold_case(FULL_CASE_FOLDING, c): return True return False def is_empty(self): return all(b.is_empty() for b in self.branches) def __eq__(self, other): return type(self) is type(other) and self.branches == other.branches def max_width(self): return max(b.max_width() for b in self.branches) class CallGroup(RegexBase): def __init__(self, info, group, position): RegexBase.__init__(self) self.info = info self.group = group self.position = position self._key = self.__class__, self.group def fix_groups(self, pattern, reverse, fuzzy): try: self.group = int(self.group) except ValueError: try: self.group = self.info.group_index[self.group] except KeyError: raise error("invalid group reference", pattern, self.position) if not 0 <= self.group <= self.info.group_count: raise error("unknown group", pattern, self.position) if self.group > 0 and self.info.open_group_count[self.group] > 1: raise error("ambiguous group reference", pattern, self.position) self.info.group_calls.append((self, reverse, fuzzy)) self._key = self.__class__, self.group def remove_captures(self): raise error("group reference not allowed", pattern, self.position) def _compile(self, 
reverse, fuzzy): return [(OP.GROUP_CALL, self.call_ref)] def _dump(self, indent, reverse): print("{}GROUP_CALL {}".format(INDENT * indent, self.group)) def __eq__(self, other): return type(self) is type(other) and self.group == other.group def max_width(self): return UNLIMITED class Character(RegexBase): _opcode = {(NOCASE, False): OP.CHARACTER, (IGNORECASE, False): OP.CHARACTER_IGN, (FULLCASE, False): OP.CHARACTER, (FULLIGNORECASE, False): OP.CHARACTER_IGN, (NOCASE, True): OP.CHARACTER_REV, (IGNORECASE, True): OP.CHARACTER_IGN_REV, (FULLCASE, True): OP.CHARACTER_REV, (FULLIGNORECASE, True): OP.CHARACTER_IGN_REV} def __init__(self, value, positive=True, case_flags=NOCASE, zerowidth=False): RegexBase.__init__(self) self.value = value self.positive = bool(positive) self.case_flags = CASE_FLAGS_COMBINATIONS[case_flags] self.zerowidth = bool(zerowidth) if (self.positive and (self.case_flags & FULLIGNORECASE) == FULLIGNORECASE): self.folded = _regex.fold_case(FULL_CASE_FOLDING, chr(self.value)) else: self.folded = chr(self.value) self._key = (self.__class__, self.value, self.positive, self.case_flags, self.zerowidth) def rebuild(self, positive, case_flags, zerowidth): return Character(self.value, positive, case_flags, zerowidth) def optimise(self, info, in_set=False): return self def get_firstset(self, reverse): return set([self]) def has_simple_start(self): return True def _compile(self, reverse, fuzzy): flags = 0 if self.positive: flags |= POSITIVE_OP if self.zerowidth: flags |= ZEROWIDTH_OP if fuzzy: flags |= FUZZY_OP code = PrecompiledCode([self._opcode[self.case_flags, reverse], flags, self.value]) if len(self.folded) > 1: # The character expands on full case-folding. 
code = Branch([code, String([ord(c) for c in self.folded], case_flags=self.case_flags)]) return code.compile(reverse, fuzzy) def _dump(self, indent, reverse): display = ascii(chr(self.value)).lstrip("bu") print("{}CHARACTER {} {}{}".format(INDENT * indent, POS_TEXT[self.positive], display, CASE_TEXT[self.case_flags])) def matches(self, ch): return (ch == self.value) == self.positive def max_width(self): return len(self.folded) def get_required_string(self, reverse): if not self.positive: return 1, None self.folded_characters = tuple(ord(c) for c in self.folded) return 0, self class Conditional(RegexBase): def __init__(self, info, group, yes_item, no_item, position): RegexBase.__init__(self) self.info = info self.group = group self.yes_item = yes_item self.no_item = no_item self.position = position def fix_groups(self, pattern, reverse, fuzzy): try: self.group = int(self.group) except ValueError: try: self.group = self.info.group_index[self.group] except KeyError: if self.group == 'DEFINE': # 'DEFINE' is a special name unless there's a group with # that name. 
self.group = 0 else: raise error("unknown group", pattern, self.position) if not 0 <= self.group <= self.info.group_count: raise error("invalid group reference", pattern, self.position) self.yes_item.fix_groups(pattern, reverse, fuzzy) self.no_item.fix_groups(pattern, reverse, fuzzy) def optimise(self, info): yes_item = self.yes_item.optimise(info) no_item = self.no_item.optimise(info) return Conditional(info, self.group, yes_item, no_item, self.position) def pack_characters(self, info): self.yes_item = self.yes_item.pack_characters(info) self.no_item = self.no_item.pack_characters(info) return self def remove_captures(self): self.yes_item = self.yes_item.remove_captures() self.no_item = self.no_item.remove_captures() return self def is_atomic(self): return self.yes_item.is_atomic() and self.no_item.is_atomic() def can_be_affix(self): return self.yes_item.can_be_affix() and self.no_item.can_be_affix() def contains_group(self): return self.yes_item.contains_group() or self.no_item.contains_group() def get_firstset(self, reverse): return (self.yes_item.get_firstset(reverse) | self.no_item.get_firstset(reverse)) def _compile(self, reverse, fuzzy): code = [(OP.GROUP_EXISTS, self.group)] code.extend(self.yes_item.compile(reverse, fuzzy)) add_code = self.no_item.compile(reverse, fuzzy) if add_code: code.append((OP.NEXT, )) code.extend(add_code) code.append((OP.END, )) return code def _dump(self, indent, reverse): print("{}GROUP_EXISTS {}".format(INDENT * indent, self.group)) self.yes_item.dump(indent + 1, reverse) if not self.no_item.is_empty(): print("{}OR".format(INDENT * indent)) self.no_item.dump(indent + 1, reverse) def is_empty(self): return self.yes_item.is_empty() and self.no_item.is_empty() def __eq__(self, other): return type(self) is type(other) and (self.group, self.yes_item, self.no_item) == (other.group, other.yes_item, other.no_item) def max_width(self): return max(self.yes_item.max_width(), self.no_item.max_width()) class DefaultBoundary(ZeroWidthBase): _opcode =
OP.DEFAULT_BOUNDARY _op_name = "DEFAULT_BOUNDARY" class DefaultEndOfWord(ZeroWidthBase): _opcode = OP.DEFAULT_END_OF_WORD _op_name = "DEFAULT_END_OF_WORD" class DefaultStartOfWord(ZeroWidthBase): _opcode = OP.DEFAULT_START_OF_WORD _op_name = "DEFAULT_START_OF_WORD" class EndOfLine(ZeroWidthBase): _opcode = OP.END_OF_LINE _op_name = "END_OF_LINE" class EndOfLineU(EndOfLine): _opcode = OP.END_OF_LINE_U _op_name = "END_OF_LINE_U" class EndOfString(ZeroWidthBase): _opcode = OP.END_OF_STRING _op_name = "END_OF_STRING" class EndOfStringLine(ZeroWidthBase): _opcode = OP.END_OF_STRING_LINE _op_name = "END_OF_STRING_LINE" class EndOfStringLineU(EndOfStringLine): _opcode = OP.END_OF_STRING_LINE_U _op_name = "END_OF_STRING_LINE_U" class EndOfWord(ZeroWidthBase): _opcode = OP.END_OF_WORD _op_name = "END_OF_WORD" class Failure(ZeroWidthBase): _op_name = "FAILURE" def _compile(self, reverse, fuzzy): return [(OP.FAILURE, )] class Fuzzy(RegexBase): def __init__(self, subpattern, constraints=None): RegexBase.__init__(self) if constraints is None: constraints = {} self.subpattern = subpattern self.constraints = constraints # If an error type is mentioned in the cost equation, then its maximum # defaults to unlimited. if "cost" in constraints: for e in "dis": if e in constraints["cost"]: constraints.setdefault(e, (0, None)) # If any error type is mentioned, then all the error maxima default to # 0, otherwise they default to unlimited. if set(constraints) & set("dis"): for e in "dis": constraints.setdefault(e, (0, 0)) else: for e in "dis": constraints.setdefault(e, (0, None)) # The maximum of the generic error type defaults to unlimited. constraints.setdefault("e", (0, None)) # The cost equation defaults to equal costs. Also, the cost of any # error type not mentioned in the cost equation defaults to 0. 
if "cost" in constraints: for e in "dis": constraints["cost"].setdefault(e, 0) else: constraints["cost"] = {"d": 1, "i": 1, "s": 1, "max": constraints["e"][1]} def fix_groups(self, pattern, reverse, fuzzy): self.subpattern.fix_groups(pattern, reverse, True) def pack_characters(self, info): self.subpattern = self.subpattern.pack_characters(info) return self def remove_captures(self): self.subpattern = self.subpattern.remove_captures() return self def is_atomic(self): return self.subpattern.is_atomic() def contains_group(self): return self.subpattern.contains_group() def _compile(self, reverse, fuzzy): # The individual limits. arguments = [] for e in "dise": v = self.constraints[e] arguments.append(v[0]) arguments.append(UNLIMITED if v[1] is None else v[1]) # The coeffs of the cost equation. for e in "dis": arguments.append(self.constraints["cost"][e]) # The maximum of the cost equation. v = self.constraints["cost"]["max"] arguments.append(UNLIMITED if v is None else v) flags = 0 if reverse: flags |= REVERSE_OP return ([(OP.FUZZY, flags) + tuple(arguments)] + self.subpattern.compile(reverse, True) + [(OP.END,)]) def _dump(self, indent, reverse): constraints = self._constraints_to_string() if constraints: constraints = " " + constraints print("{}FUZZY{}".format(INDENT * indent, constraints)) self.subpattern.dump(indent + 1, reverse) def is_empty(self): return self.subpattern.is_empty() def __eq__(self, other): return (type(self) is type(other) and self.subpattern == other.subpattern) def max_width(self): return UNLIMITED def _constraints_to_string(self): constraints = [] for name in "ids": min, max = self.constraints[name] if max == 0: continue con = "" if min > 0: con = "{}<=".format(min) con += name if max is not None: con += "<={}".format(max) constraints.append(con) cost = [] for name in "ids": coeff = self.constraints["cost"][name] if coeff > 0: cost.append("{}{}".format(coeff, name)) limit = self.constraints["cost"]["max"] if limit is not None and limit > 0: 
cost = "{}<={}".format("+".join(cost), limit) constraints.append(cost) return ",".join(constraints) class Grapheme(RegexBase): def _compile(self, reverse, fuzzy): # Match at least 1 character until a grapheme boundary is reached. Note # that this is the same whether matching forwards or backwards. grapheme_matcher = Atomic(Sequence([LazyRepeat(AnyAll(), 1, None), GraphemeBoundary()])) return grapheme_matcher.compile(reverse, fuzzy) def _dump(self, indent, reverse): print("{}GRAPHEME".format(INDENT * indent)) def max_width(self): return UNLIMITED class GraphemeBoundary: def compile(self, reverse, fuzzy): return [(OP.GRAPHEME_BOUNDARY, 1)] class GreedyRepeat(RegexBase): _opcode = OP.GREEDY_REPEAT _op_name = "GREEDY_REPEAT" def __init__(self, subpattern, min_count, max_count): RegexBase.__init__(self) self.subpattern = subpattern self.min_count = min_count self.max_count = max_count def fix_groups(self, pattern, reverse, fuzzy): self.subpattern.fix_groups(pattern, reverse, fuzzy) def optimise(self, info): subpattern = self.subpattern.optimise(info) return type(self)(subpattern, self.min_count, self.max_count) def pack_characters(self, info): self.subpattern = self.subpattern.pack_characters(info) return self def remove_captures(self): self.subpattern = self.subpattern.remove_captures() return self def is_atomic(self): return self.min_count == self.max_count and self.subpattern.is_atomic() def contains_group(self): return self.subpattern.contains_group() def get_firstset(self, reverse): fs = self.subpattern.get_firstset(reverse) if self.min_count == 0: fs.add(None) return fs def _compile(self, reverse, fuzzy): repeat = [self._opcode, self.min_count] if self.max_count is None: repeat.append(UNLIMITED) else: repeat.append(self.max_count) subpattern = self.subpattern.compile(reverse, fuzzy) if not subpattern: return [] return ([tuple(repeat)] + subpattern + [(OP.END, )]) def _dump(self, indent, reverse): if self.max_count is None: limit = "INF" else: limit = 
self.max_count print("{}{} {} {}".format(INDENT * indent, self._op_name, self.min_count, limit)) self.subpattern.dump(indent + 1, reverse) def is_empty(self): return self.subpattern.is_empty() def __eq__(self, other): return type(self) is type(other) and (self.subpattern, self.min_count, self.max_count) == (other.subpattern, other.min_count, other.max_count) def max_width(self): if self.max_count is None: return UNLIMITED return self.subpattern.max_width() * self.max_count def get_required_string(self, reverse): max_count = UNLIMITED if self.max_count is None else self.max_count if self.min_count == 0: w = self.subpattern.max_width() * max_count return min(w, UNLIMITED), None ofs, req = self.subpattern.get_required_string(reverse) if req: return ofs, req w = self.subpattern.max_width() * max_count return min(w, UNLIMITED), None class Group(RegexBase): def __init__(self, info, group, subpattern): RegexBase.__init__(self) self.info = info self.group = group self.subpattern = subpattern self.call_ref = None def fix_groups(self, pattern, reverse, fuzzy): self.info.defined_groups[self.group] = (self, reverse, fuzzy) self.subpattern.fix_groups(pattern, reverse, fuzzy) def optimise(self, info): subpattern = self.subpattern.optimise(info) return Group(self.info, self.group, subpattern) def pack_characters(self, info): self.subpattern = self.subpattern.pack_characters(info) return self def remove_captures(self): return self.subpattern.remove_captures() def is_atomic(self): return self.subpattern.is_atomic() def can_be_affix(self): return False def contains_group(self): return True def get_firstset(self, reverse): return self.subpattern.get_firstset(reverse) def has_simple_start(self): return self.subpattern.has_simple_start() def _compile(self, reverse, fuzzy): code = [] key = self.group, reverse, fuzzy ref = self.info.call_refs.get(key) if ref is not None: code += [(OP.CALL_REF, ref)] public_group = private_group = self.group if private_group < 0: public_group = 
self.info.private_groups[private_group] private_group = self.info.group_count - private_group code += ([(OP.GROUP, private_group, public_group)] + self.subpattern.compile(reverse, fuzzy) + [(OP.END, )]) if ref is not None: code += [(OP.END, )] return code def _dump(self, indent, reverse): group = self.group if group < 0: group = self.info.private_groups[group] print("{}GROUP {}".format(INDENT * indent, group)) self.subpattern.dump(indent + 1, reverse) def __eq__(self, other): return (type(self) is type(other) and (self.group, self.subpattern) == (other.group, other.subpattern)) def max_width(self): return self.subpattern.max_width() def get_required_string(self, reverse): return self.subpattern.get_required_string(reverse) class Keep(ZeroWidthBase): _opcode = OP.KEEP _op_name = "KEEP" class LazyRepeat(GreedyRepeat): _opcode = OP.LAZY_REPEAT _op_name = "LAZY_REPEAT" class LookAround(RegexBase): _dir_text = {False: "AHEAD", True: "BEHIND"} def __init__(self, behind, positive, subpattern): RegexBase.__init__(self) self.behind = bool(behind) self.positive = bool(positive) self.subpattern = subpattern def fix_groups(self, pattern, reverse, fuzzy): self.subpattern.fix_groups(pattern, self.behind, fuzzy) def optimise(self, info): subpattern = self.subpattern.optimise(info) if self.positive and subpattern.is_empty(): return subpattern return LookAround(self.behind, self.positive, subpattern) def pack_characters(self, info): self.subpattern = self.subpattern.pack_characters(info) return self def remove_captures(self): return self.subpattern.remove_captures() def is_atomic(self): return self.subpattern.is_atomic() def can_be_affix(self): return self.subpattern.can_be_affix() def contains_group(self): return self.subpattern.contains_group() def _compile(self, reverse, fuzzy): return ([(OP.LOOKAROUND, int(self.positive), int(not self.behind))] + self.subpattern.compile(self.behind) + [(OP.END, )]) def _dump(self, indent, reverse): print("{}LOOK{} {}".format(INDENT * indent,
self._dir_text[self.behind], POS_TEXT[self.positive])) self.subpattern.dump(indent + 1, self.behind) def is_empty(self): return self.positive and self.subpattern.is_empty() def __eq__(self, other): return type(self) is type(other) and (self.behind, self.positive, self.subpattern) == (other.behind, other.positive, other.subpattern) def max_width(self): return 0 class LookAroundConditional(RegexBase): _dir_text = {False: "AHEAD", True: "BEHIND"} def __init__(self, behind, positive, subpattern, yes_item, no_item): RegexBase.__init__(self) self.behind = bool(behind) self.positive = bool(positive) self.subpattern = subpattern self.yes_item = yes_item self.no_item = no_item def fix_groups(self, pattern, reverse, fuzzy): self.subpattern.fix_groups(pattern, reverse, fuzzy) self.yes_item.fix_groups(pattern, reverse, fuzzy) self.no_item.fix_groups(pattern, reverse, fuzzy) def optimise(self, info): subpattern = self.subpattern.optimise(info) yes_item = self.yes_item.optimise(info) no_item = self.no_item.optimise(info) return LookAroundConditional(self.behind, self.positive, subpattern, yes_item, no_item) def pack_characters(self, info): self.subpattern = self.subpattern.pack_characters(info) self.yes_item = self.yes_item.pack_characters(info) self.no_item = self.no_item.pack_characters(info) return self def remove_captures(self): self.subpattern = self.subpattern.remove_captures() self.yes_item = self.yes_item.remove_captures() self.no_item = self.no_item.remove_captures() return self def is_atomic(self): return (self.subpattern.is_atomic() and self.yes_item.is_atomic() and self.no_item.is_atomic()) def can_be_affix(self): return (self.subpattern.can_be_affix() and self.yes_item.can_be_affix() and self.no_item.can_be_affix()) def contains_group(self): return (self.subpattern.contains_group() or self.yes_item.contains_group() or self.no_item.contains_group()) def get_firstset(self, reverse): return (self.subpattern.get_firstset(reverse) | self.no_item.get_firstset(reverse)) def
_compile(self, reverse, fuzzy): code = [(OP.CONDITIONAL, int(self.positive), int(not self.behind))] code.extend(self.subpattern.compile(self.behind, fuzzy)) code.append((OP.NEXT, )) code.extend(self.yes_item.compile(reverse, fuzzy)) add_code = self.no_item.compile(reverse, fuzzy) if add_code: code.append((OP.NEXT, )) code.extend(add_code) code.append((OP.END, )) return code def _dump(self, indent, reverse): print("{}CONDITIONAL {} {}".format(INDENT * indent, self._dir_text[self.behind], POS_TEXT[self.positive])) self.subpattern.dump(indent + 1, self.behind) print("{}EITHER".format(INDENT * indent)) self.yes_item.dump(indent + 1, reverse) if not self.no_item.is_empty(): print("{}OR".format(INDENT * indent)) self.no_item.dump(indent + 1, reverse) def is_empty(self): return (self.subpattern.is_empty() and self.yes_item.is_empty() and self.no_item.is_empty()) def __eq__(self, other): return type(self) is type(other) and (self.subpattern, self.yes_item, self.no_item) == (other.subpattern, other.yes_item, other.no_item) def max_width(self): return max(self.yes_item.max_width(), self.no_item.max_width()) def get_required_string(self, reverse): return self.max_width(), None class PrecompiledCode(RegexBase): def __init__(self, code): self.code = code def _compile(self, reverse, fuzzy): return [tuple(self.code)] class Property(RegexBase): _opcode = {(NOCASE, False): OP.PROPERTY, (IGNORECASE, False): OP.PROPERTY_IGN, (FULLCASE, False): OP.PROPERTY, (FULLIGNORECASE, False): OP.PROPERTY_IGN, (NOCASE, True): OP.PROPERTY_REV, (IGNORECASE, True): OP.PROPERTY_IGN_REV, (FULLCASE, True): OP.PROPERTY_REV, (FULLIGNORECASE, True): OP.PROPERTY_IGN_REV} def __init__(self, value, positive=True, case_flags=NOCASE, zerowidth=False): RegexBase.__init__(self) self.value = value self.positive = bool(positive) self.case_flags = CASE_FLAGS_COMBINATIONS[case_flags] self.zerowidth = bool(zerowidth) self._key = (self.__class__, self.value, self.positive, self.case_flags, self.zerowidth) def
rebuild(self, positive, case_flags, zerowidth): return Property(self.value, positive, case_flags, zerowidth) def optimise(self, info, in_set=False): return self def get_firstset(self, reverse): return set([self]) def has_simple_start(self): return True def _compile(self, reverse, fuzzy): flags = 0 if self.positive: flags |= POSITIVE_OP if self.zerowidth: flags |= ZEROWIDTH_OP if fuzzy: flags |= FUZZY_OP return [(self._opcode[self.case_flags, reverse], flags, self.value)] def _dump(self, indent, reverse): prop = PROPERTY_NAMES[self.value >> 16] name, value = prop[0], prop[1][self.value & 0xFFFF] print("{}PROPERTY {} {}:{}{}".format(INDENT * indent, POS_TEXT[self.positive], name, value, CASE_TEXT[self.case_flags])) def matches(self, ch): return _regex.has_property_value(self.value, ch) == self.positive def max_width(self): return 1 class Prune(ZeroWidthBase): _op_name = "PRUNE" def _compile(self, reverse, fuzzy): return [(OP.PRUNE, )] class Range(RegexBase): _opcode = {(NOCASE, False): OP.RANGE, (IGNORECASE, False): OP.RANGE_IGN, (FULLCASE, False): OP.RANGE, (FULLIGNORECASE, False): OP.RANGE_IGN, (NOCASE, True): OP.RANGE_REV, (IGNORECASE, True): OP.RANGE_IGN_REV, (FULLCASE, True): OP.RANGE_REV, (FULLIGNORECASE, True): OP.RANGE_IGN_REV} _op_name = "RANGE" def __init__(self, lower, upper, positive=True, case_flags=NOCASE, zerowidth=False): RegexBase.__init__(self) self.lower = lower self.upper = upper self.positive = bool(positive) self.case_flags = CASE_FLAGS_COMBINATIONS[case_flags] self.zerowidth = bool(zerowidth) self._key = (self.__class__, self.lower, self.upper, self.positive, self.case_flags, self.zerowidth) def rebuild(self, positive, case_flags, zerowidth): return Range(self.lower, self.upper, positive, case_flags, zerowidth) def optimise(self, info, in_set=False): # Is the range case-sensitive? if not self.positive or not (self.case_flags & IGNORECASE) or in_set: return self # Is full case-folding possible? 
if (not (info.flags & UNICODE) or (self.case_flags & FULLIGNORECASE) != FULLIGNORECASE): return self # Get the characters which expand to multiple codepoints on folding. expanding_chars = _regex.get_expand_on_folding() # Get the folded characters in the range. items = [] for ch in expanding_chars: if self.lower <= ord(ch) <= self.upper: folded = _regex.fold_case(FULL_CASE_FOLDING, ch) items.append(String([ord(c) for c in folded], case_flags=self.case_flags)) if not items: # We can fall back to simple case-folding. return self if len(items) < self.upper - self.lower + 1: # Not all the characters are covered by the full case-folding. items.insert(0, self) return Branch(items) def _compile(self, reverse, fuzzy): flags = 0 if self.positive: flags |= POSITIVE_OP if self.zerowidth: flags |= ZEROWIDTH_OP if fuzzy: flags |= FUZZY_OP return [(self._opcode[self.case_flags, reverse], flags, self.lower, self.upper)] def _dump(self, indent, reverse): display_lower = ascii(chr(self.lower)).lstrip("bu") display_upper = ascii(chr(self.upper)).lstrip("bu") print("{}RANGE {} {} {}{}".format(INDENT * indent, POS_TEXT[self.positive], display_lower, display_upper, CASE_TEXT[self.case_flags])) def matches(self, ch): return (self.lower <= ch <= self.upper) == self.positive def max_width(self): return 1 class RefGroup(RegexBase): _opcode = {(NOCASE, False): OP.REF_GROUP, (IGNORECASE, False): OP.REF_GROUP_IGN, (FULLCASE, False): OP.REF_GROUP, (FULLIGNORECASE, False): OP.REF_GROUP_FLD, (NOCASE, True): OP.REF_GROUP_REV, (IGNORECASE, True): OP.REF_GROUP_IGN_REV, (FULLCASE, True): OP.REF_GROUP_REV, (FULLIGNORECASE, True): OP.REF_GROUP_FLD_REV} def __init__(self, info, group, position, case_flags=NOCASE): RegexBase.__init__(self) self.info = info self.group = group self.position = position self.case_flags = CASE_FLAGS_COMBINATIONS[case_flags] self._key = self.__class__, self.group, self.case_flags def fix_groups(self, pattern, reverse, fuzzy): try: self.group = int(self.group) except 
ValueError: try: self.group = self.info.group_index[self.group] except KeyError: raise error("unknown group", pattern, self.position) if not 1 <= self.group <= self.info.group_count: raise error("invalid group reference", pattern, self.position) self._key = self.__class__, self.group, self.case_flags def remove_captures(self): raise error("group reference not allowed", None, self.position) def _compile(self, reverse, fuzzy): flags = 0 if fuzzy: flags |= FUZZY_OP return [(self._opcode[self.case_flags, reverse], flags, self.group)] def _dump(self, indent, reverse): print("{}REF_GROUP {}{}".format(INDENT * indent, self.group, CASE_TEXT[self.case_flags])) def max_width(self): return UNLIMITED class SearchAnchor(ZeroWidthBase): _opcode = OP.SEARCH_ANCHOR _op_name = "SEARCH_ANCHOR" class Sequence(RegexBase): def __init__(self, items=None): RegexBase.__init__(self) if items is None: items = [] self.items = items def fix_groups(self, pattern, reverse, fuzzy): for s in self.items: s.fix_groups(pattern, reverse, fuzzy) def optimise(self, info): # Flatten the sequences. items = [] for s in self.items: s = s.optimise(info) if isinstance(s, Sequence): items.extend(s.items) else: items.append(s) return make_sequence(items) def pack_characters(self, info): "Packs sequences of characters into strings." items = [] characters = [] case_flags = NOCASE for s in self.items: if type(s) is Character and s.positive: if s.case_flags != case_flags: # Different case sensitivity, so flush, unless neither the # previous nor the new character are cased. if s.case_flags or is_cased(info, s.value): Sequence._flush_characters(info, characters, case_flags, items) case_flags = s.case_flags characters.append(s.value) elif type(s) is String or type(s) is Literal: if s.case_flags != case_flags: # Different case sensitivity, so flush, unless neither # the previous nor the new string are cased.
if s.case_flags or any(is_cased(info, c) for c in characters): Sequence._flush_characters(info, characters, case_flags, items) case_flags = s.case_flags characters.extend(s.characters) else: Sequence._flush_characters(info, characters, case_flags, items) items.append(s.pack_characters(info)) Sequence._flush_characters(info, characters, case_flags, items) return make_sequence(items) def remove_captures(self): self.items = [s.remove_captures() for s in self.items] return self def is_atomic(self): return all(s.is_atomic() for s in self.items) def can_be_affix(self): return False def contains_group(self): return any(s.contains_group() for s in self.items) def get_firstset(self, reverse): fs = set() items = self.items if reverse: items = items[::-1] for s in items: fs |= s.get_firstset(reverse) if None not in fs: return fs fs.discard(None) return fs | set([None]) def has_simple_start(self): return bool(self.items) and self.items[0].has_simple_start() def _compile(self, reverse, fuzzy): seq = self.items if reverse: seq = seq[::-1] code = [] for s in seq: code.extend(s.compile(reverse, fuzzy)) return code def _dump(self, indent, reverse): for s in self.items: s.dump(indent, reverse) @staticmethod def _flush_characters(info, characters, case_flags, items): if not characters: return # Disregard case_flags if all of the characters are case-less.
if case_flags & IGNORECASE: if not any(is_cased(info, c) for c in characters): case_flags = NOCASE if len(characters) == 1: items.append(Character(characters[0], case_flags=case_flags)) else: items.append(String(characters, case_flags=case_flags)) characters[:] = [] def is_empty(self): return all(i.is_empty() for i in self.items) def __eq__(self, other): return type(self) is type(other) and self.items == other.items def max_width(self): return sum(s.max_width() for s in self.items) def get_required_string(self, reverse): seq = self.items if reverse: seq = seq[::-1] offset = 0 for s in seq: ofs, req = s.get_required_string(reverse) offset += ofs if req: return offset, req return offset, None class SetBase(RegexBase): def __init__(self, info, items, positive=True, case_flags=NOCASE, zerowidth=False): RegexBase.__init__(self) self.info = info self.items = tuple(items) self.positive = bool(positive) self.case_flags = CASE_FLAGS_COMBINATIONS[case_flags] self.zerowidth = bool(zerowidth) self.char_width = 1 self._key = (self.__class__, self.items, self.positive, self.case_flags, self.zerowidth) def rebuild(self, positive, case_flags, zerowidth): return type(self)(self.info, self.items, positive, case_flags, zerowidth).optimise(self.info) def get_firstset(self, reverse): return set([self]) def has_simple_start(self): return True def _compile(self, reverse, fuzzy): flags = 0 if self.positive: flags |= POSITIVE_OP if self.zerowidth: flags |= ZEROWIDTH_OP if fuzzy: flags |= FUZZY_OP code = [(self._opcode[self.case_flags, reverse], flags)] for m in self.items: code.extend(m.compile()) code.append((OP.END, )) return code def _dump(self, indent, reverse): print("{}{} {}{}".format(INDENT * indent, self._op_name, POS_TEXT[self.positive], CASE_TEXT[self.case_flags])) for i in self.items: i.dump(indent + 1, reverse) def _handle_case_folding(self, info, in_set): # Is the set case-sensitive? 
if not self.positive or not (self.case_flags & IGNORECASE) or in_set: return self # Is full case-folding possible? if (not (self.info.flags & UNICODE) or (self.case_flags & FULLIGNORECASE) != FULLIGNORECASE): return self # Get the characters which expand to multiple codepoints on folding. expanding_chars = _regex.get_expand_on_folding() # Get the folded characters in the set. items = [] seen = set() for ch in expanding_chars: if self.matches(ord(ch)): folded = _regex.fold_case(FULL_CASE_FOLDING, ch) if folded not in seen: items.append(String([ord(c) for c in folded], case_flags=self.case_flags)) seen.add(folded) if not items: # We can fall back to simple case-folding. return self return Branch([self] + items) def max_width(self): # Is the set case-sensitive? if not self.positive or not (self.case_flags & IGNORECASE): return 1 # Is full case-folding possible? if (not (self.info.flags & UNICODE) or (self.case_flags & FULLIGNORECASE) != FULLIGNORECASE): return 1 # Get the characters which expand to multiple codepoints on folding. expanding_chars = _regex.get_expand_on_folding() # Get the folded characters in the set. 
seen = set() for ch in expanding_chars: if self.matches(ord(ch)): folded = _regex.fold_case(FULL_CASE_FOLDING, ch) seen.add(folded) if not seen: return 1 return max(len(folded) for folded in seen) class SetDiff(SetBase): _opcode = {(NOCASE, False): OP.SET_DIFF, (IGNORECASE, False): OP.SET_DIFF_IGN, (FULLCASE, False): OP.SET_DIFF, (FULLIGNORECASE, False): OP.SET_DIFF_IGN, (NOCASE, True): OP.SET_DIFF_REV, (IGNORECASE, True): OP.SET_DIFF_IGN_REV, (FULLCASE, True): OP.SET_DIFF_REV, (FULLIGNORECASE, True): OP.SET_DIFF_IGN_REV} _op_name = "SET_DIFF" def optimise(self, info, in_set=False): items = self.items if len(items) > 2: items = [items[0], SetUnion(info, items[1 : ])] if len(items) == 1: return items[0].with_flags(case_flags=self.case_flags, zerowidth=self.zerowidth).optimise(info, in_set) self.items = tuple(m.optimise(info, in_set=True) for m in items) return self._handle_case_folding(info, in_set) def matches(self, ch): m = self.items[0].matches(ch) and not self.items[1].matches(ch) return m == self.positive class SetInter(SetBase): _opcode = {(NOCASE, False): OP.SET_INTER, (IGNORECASE, False): OP.SET_INTER_IGN, (FULLCASE, False): OP.SET_INTER, (FULLIGNORECASE, False): OP.SET_INTER_IGN, (NOCASE, True): OP.SET_INTER_REV, (IGNORECASE, True): OP.SET_INTER_IGN_REV, (FULLCASE, True): OP.SET_INTER_REV, (FULLIGNORECASE, True): OP.SET_INTER_IGN_REV} _op_name = "SET_INTER" def optimise(self, info, in_set=False): items = [] for m in self.items: m = m.optimise(info, in_set=True) if isinstance(m, SetInter) and m.positive: # Intersection in intersection. 
items.extend(m.items) else: items.append(m) if len(items) == 1: return items[0].with_flags(case_flags=self.case_flags, zerowidth=self.zerowidth).optimise(info, in_set) self.items = tuple(items) return self._handle_case_folding(info, in_set) def matches(self, ch): m = all(i.matches(ch) for i in self.items) return m == self.positive class SetSymDiff(SetBase): _opcode = {(NOCASE, False): OP.SET_SYM_DIFF, (IGNORECASE, False): OP.SET_SYM_DIFF_IGN, (FULLCASE, False): OP.SET_SYM_DIFF, (FULLIGNORECASE, False): OP.SET_SYM_DIFF_IGN, (NOCASE, True): OP.SET_SYM_DIFF_REV, (IGNORECASE, True): OP.SET_SYM_DIFF_IGN_REV, (FULLCASE, True): OP.SET_SYM_DIFF_REV, (FULLIGNORECASE, True): OP.SET_SYM_DIFF_IGN_REV} _op_name = "SET_SYM_DIFF" def optimise(self, info, in_set=False): items = [] for m in self.items: m = m.optimise(info, in_set=True) if isinstance(m, SetSymDiff) and m.positive: # Symmetric difference in symmetric difference. items.extend(m.items) else: items.append(m) if len(items) == 1: return items[0].with_flags(case_flags=self.case_flags, zerowidth=self.zerowidth).optimise(info, in_set) self.items = tuple(items) return self._handle_case_folding(info, in_set) def matches(self, ch): m = False for i in self.items: m = m != i.matches(ch) return m == self.positive class SetUnion(SetBase): _opcode = {(NOCASE, False): OP.SET_UNION, (IGNORECASE, False): OP.SET_UNION_IGN, (FULLCASE, False): OP.SET_UNION, (FULLIGNORECASE, False): OP.SET_UNION_IGN, (NOCASE, True): OP.SET_UNION_REV, (IGNORECASE, True): OP.SET_UNION_IGN_REV, (FULLCASE, True): OP.SET_UNION_REV, (FULLIGNORECASE, True): OP.SET_UNION_IGN_REV} _op_name = "SET_UNION" def optimise(self, info, in_set=False): items = [] for m in self.items: m = m.optimise(info, in_set=True) if isinstance(m, SetUnion) and m.positive: # Union in union. 
items.extend(m.items) else: items.append(m) if len(items) == 1: i = items[0] return i.with_flags(positive=i.positive == self.positive, case_flags=self.case_flags, zerowidth=self.zerowidth).optimise(info, in_set) self.items = tuple(items) return self._handle_case_folding(info, in_set) def _compile(self, reverse, fuzzy): flags = 0 if self.positive: flags |= POSITIVE_OP if self.zerowidth: flags |= ZEROWIDTH_OP if fuzzy: flags |= FUZZY_OP characters, others = defaultdict(list), [] for m in self.items: if isinstance(m, Character): characters[m.positive].append(m.value) else: others.append(m) code = [(self._opcode[self.case_flags, reverse], flags)] for positive, values in characters.items(): flags = 0 if positive: flags |= POSITIVE_OP if len(values) == 1: code.append((OP.CHARACTER, flags, values[0])) else: code.append((OP.STRING, flags, len(values)) + tuple(values)) for m in others: code.extend(m.compile()) code.append((OP.END, )) return code def matches(self, ch): m = any(i.matches(ch) for i in self.items) return m == self.positive class Skip(ZeroWidthBase): _op_name = "SKIP" _opcode = OP.SKIP class StartOfLine(ZeroWidthBase): _opcode = OP.START_OF_LINE _op_name = "START_OF_LINE" class StartOfLineU(StartOfLine): _opcode = OP.START_OF_LINE_U _op_name = "START_OF_LINE_U" class StartOfString(ZeroWidthBase): _opcode = OP.START_OF_STRING _op_name = "START_OF_STRING" class StartOfWord(ZeroWidthBase): _opcode = OP.START_OF_WORD _op_name = "START_OF_WORD" class String(RegexBase): _opcode = {(NOCASE, False): OP.STRING, (IGNORECASE, False): OP.STRING_IGN, (FULLCASE, False): OP.STRING, (FULLIGNORECASE, False): OP.STRING_FLD, (NOCASE, True): OP.STRING_REV, (IGNORECASE, True): OP.STRING_IGN_REV, (FULLCASE, True): OP.STRING_REV, (FULLIGNORECASE, True): OP.STRING_FLD_REV} def __init__(self, characters, case_flags=NOCASE): self.characters = tuple(characters) self.case_flags = CASE_FLAGS_COMBINATIONS[case_flags] if (self.case_flags & FULLIGNORECASE) == FULLIGNORECASE: folded_characters 
= [] for char in self.characters: folded = _regex.fold_case(FULL_CASE_FOLDING, chr(char)) folded_characters.extend(ord(c) for c in folded) else: folded_characters = self.characters self.folded_characters = tuple(folded_characters) self.required = False self._key = self.__class__, self.characters, self.case_flags def get_firstset(self, reverse): if reverse: pos = -1 else: pos = 0 return set([Character(self.characters[pos], case_flags=self.case_flags)]) def has_simple_start(self): return True def _compile(self, reverse, fuzzy): flags = 0 if fuzzy: flags |= FUZZY_OP if self.required: flags |= REQUIRED_OP return [(self._opcode[self.case_flags, reverse], flags, len(self.folded_characters)) + self.folded_characters] def _dump(self, indent, reverse): display = ascii("".join(chr(c) for c in self.characters)).lstrip("bu") print("{}STRING {}{}".format(INDENT * indent, display, CASE_TEXT[self.case_flags])) def max_width(self): return len(self.folded_characters) def get_required_string(self, reverse): return 0, self class Literal(String): def _dump(self, indent, reverse): for c in self.characters: display = ascii(chr(c)).lstrip("bu") print("{}CHARACTER MATCH {}{}".format(INDENT * indent, display, CASE_TEXT[self.case_flags])) class StringSet(RegexBase): _opcode = {(NOCASE, False): OP.STRING_SET, (IGNORECASE, False): OP.STRING_SET_IGN, (FULLCASE, False): OP.STRING_SET, (FULLIGNORECASE, False): OP.STRING_SET_FLD, (NOCASE, True): OP.STRING_SET_REV, (IGNORECASE, True): OP.STRING_SET_IGN_REV, (FULLCASE, True): OP.STRING_SET_REV, (FULLIGNORECASE, True): OP.STRING_SET_FLD_REV} def __init__(self, info, name, case_flags=NOCASE): self.info = info self.name = name self.case_flags = CASE_FLAGS_COMBINATIONS[case_flags] self._key = self.__class__, self.name, self.case_flags self.set_key = (name, self.case_flags) if self.set_key not in info.named_lists_used: info.named_lists_used[self.set_key] = len(info.named_lists_used) def _compile(self, reverse, fuzzy): index = 
self.info.named_lists_used[self.set_key] items = self.info.kwargs[self.name] case_flags = self.case_flags if not items: return [] encoding = self.info.flags & _ALL_ENCODINGS fold_flags = encoding | case_flags if fuzzy: choices = [self._folded(fold_flags, i) for i in items] # Sort from longest to shortest. choices.sort(key=lambda s: (-len(s), s)) branches = [] for string in choices: branches.append(Sequence([Character(c, case_flags=case_flags) for c in string])) if len(branches) > 1: branch = Branch(branches) else: branch = branches[0] branch = branch.optimise(self.info).pack_characters(self.info) return branch.compile(reverse, fuzzy) else: min_len = min(len(i) for i in items) max_len = max(len(self._folded(fold_flags, i)) for i in items) return [(self._opcode[case_flags, reverse], index, min_len, max_len)] def _dump(self, indent, reverse): print("{}STRING_SET {}{}".format(INDENT * indent, self.name, CASE_TEXT[self.case_flags])) def _folded(self, fold_flags, item): if isinstance(item, str): return [ord(c) for c in _regex.fold_case(fold_flags, item)] else: return list(item) def _flatten(self, s): # Flattens the branches. if isinstance(s, Branch): for b in s.branches: self._flatten(b) elif isinstance(s, Sequence) and s.items: seq = s.items while isinstance(seq[-1], Sequence): seq[-1 : ] = seq[-1].items n = 0 while n < len(seq) and isinstance(seq[n], Character): n += 1 if n > 1: seq[ : n] = [String([c.value for c in seq[ : n]], case_flags=self.case_flags)] self._flatten(seq[-1]) def max_width(self): if not self.info.kwargs[self.name]: return 0 if self.case_flags & IGNORECASE: fold_flags = (self.info.flags & _ALL_ENCODINGS) | self.case_flags return max(len(_regex.fold_case(fold_flags, i)) for i in self.info.kwargs[self.name]) else: return max(len(i) for i in self.info.kwargs[self.name]) class Source: "Scanner for the regular expression source string." 
def __init__(self, string): if isinstance(string, str): self.string = string self.char_type = chr else: self.string = string.decode("latin-1") self.char_type = lambda c: bytes([c]) self.pos = 0 self.ignore_space = False self.sep = string[ : 0] def get(self): string = self.string pos = self.pos try: if self.ignore_space: while True: if string[pos].isspace(): # Skip over the whitespace. pos += 1 elif string[pos] == "#": # Skip over the comment to the end of the line. pos = string.index("\n", pos) else: break ch = string[pos] self.pos = pos + 1 return ch except IndexError: # We've reached the end of the string. self.pos = pos return string[ : 0] except ValueError: # The comment extended to the end of the string. self.pos = len(string) return string[ : 0] def get_many(self, count=1): string = self.string pos = self.pos try: if self.ignore_space: substring = [] while len(substring) < count: while True: if string[pos].isspace(): # Skip over the whitespace. pos += 1 elif string[pos] == "#": # Skip over the comment to the end of the line. pos = string.index("\n", pos) else: break substring.append(string[pos]) pos += 1 substring = "".join(substring) else: substring = string[pos : pos + count] pos += len(substring) self.pos = pos return substring except IndexError: # We've reached the end of the string. self.pos = len(string) return "".join(substring) except ValueError: # The comment extended to the end of the string. self.pos = len(string) return "".join(substring) def get_while(self, test_set, include=True): string = self.string pos = self.pos if self.ignore_space: try: substring = [] while True: if string[pos].isspace(): # Skip over the whitespace. pos += 1 elif string[pos] == "#": # Skip over the comment to the end of the line. pos = string.index("\n", pos) elif (string[pos] in test_set) == include: substring.append(string[pos]) pos += 1 else: break self.pos = pos except IndexError: # We've reached the end of the string. 
self.pos = len(string) except ValueError: # The comment extended to the end of the string. self.pos = len(string) return "".join(substring) else: try: while (string[pos] in test_set) == include: pos += 1 substring = string[self.pos : pos] self.pos = pos return substring except IndexError: # We've reached the end of the string. substring = string[self.pos : pos] self.pos = pos return substring def skip_while(self, test_set, include=True): string = self.string pos = self.pos try: if self.ignore_space: while True: if string[pos].isspace(): # Skip over the whitespace. pos += 1 elif string[pos] == "#": # Skip over the comment to the end of the line. pos = string.index("\n", pos) elif (string[pos] in test_set) == include: pos += 1 else: break else: while (string[pos] in test_set) == include: pos += 1 self.pos = pos except IndexError: # We've reached the end of the string. self.pos = len(string) except ValueError: # The comment extended to the end of the string. self.pos = len(string) def match(self, substring): string = self.string pos = self.pos if self.ignore_space: try: for c in substring: while True: if string[pos].isspace(): # Skip over the whitespace. pos += 1 elif string[pos] == "#": # Skip over the comment to the end of the line. pos = string.index("\n", pos) else: break if string[pos] != c: return False pos += 1 self.pos = pos return True except IndexError: # We've reached the end of the string. return False except ValueError: # The comment extended to the end of the string. 
return False else: if not string.startswith(substring, pos): return False self.pos = pos + len(substring) return True def expect(self, substring): if not self.match(substring): raise error("missing {}".format(substring), self.string, self.pos) def at_end(self): string = self.string pos = self.pos try: if self.ignore_space: while True: if string[pos].isspace(): pos += 1 elif string[pos] == "#": pos = string.index("\n", pos) else: break return pos >= len(string) except IndexError: # We've reached the end of the string. return True except ValueError: # The comment extended to the end of the string. return True class Info: "Info about the regular expression." def __init__(self, flags=0, char_type=None, kwargs={}): flags |= DEFAULT_FLAGS[(flags & _ALL_VERSIONS) or DEFAULT_VERSION] self.flags = flags self.global_flags = flags self.inline_locale = False self.kwargs = kwargs self.group_count = 0 self.group_index = {} self.group_name = {} self.char_type = char_type self.named_lists_used = {} self.open_groups = [] self.open_group_count = {} self.defined_groups = {} self.group_calls = [] self.private_groups = {} def open_group(self, name=None): group = self.group_index.get(name) if group is None: while True: self.group_count += 1 if name is None or self.group_count not in self.group_name: break group = self.group_count if name: self.group_index[name] = group self.group_name[group] = name if group in self.open_groups: # We have a nested named group. We'll assign it a private group # number, initially negative until we can assign a proper # (positive) number. group_alias = -(len(self.private_groups) + 1) self.private_groups[group_alias] = group group = group_alias self.open_groups.append(group) self.open_group_count[group] = self.open_group_count.get(group, 0) + 1 return group def close_group(self): self.open_groups.pop() def is_open_group(self, name): # In version 1, a group reference can refer to an open group. We'll # just pretend the group isn't open. 
version = (self.flags & _ALL_VERSIONS) or DEFAULT_VERSION if version == VERSION1: return False if name.isdigit(): group = int(name) else: group = self.group_index.get(name) return group in self.open_groups def _check_group_features(info, parsed): """Checks whether the reverse and fuzzy features of the group calls match the groups which they call. """ call_refs = {} additional_groups = [] for call, reverse, fuzzy in info.group_calls: # Look up the reference of this group call. key = (call.group, reverse, fuzzy) ref = call_refs.get(key) if ref is None: # This group doesn't have a reference yet, so look up its features. if call.group == 0: # Calling the pattern as a whole. rev = bool(info.flags & REVERSE) fuz = isinstance(parsed, Fuzzy) if (rev, fuz) != (reverse, fuzzy): # The pattern as a whole doesn't have the features we want, # so we'll need to make a copy of it with the desired # features. additional_groups.append((parsed, reverse, fuzzy)) else: # Calling a capture group. def_info = info.defined_groups[call.group] group = def_info[0] if def_info[1 : ] != (reverse, fuzzy): # The group doesn't have the features we want, so we'll # need to make a copy of it with the desired features. additional_groups.append((group, reverse, fuzzy)) ref = len(call_refs) call_refs[key] = ref call.call_ref = ref info.call_refs = call_refs info.additional_groups = additional_groups def _get_required_string(parsed, flags): "Gets the required string and related info of a parsed pattern." req_offset, required = parsed.get_required_string(bool(flags & REVERSE)) if required: required.required = True if req_offset >= UNLIMITED: req_offset = -1 req_flags = required.case_flags if not (flags & UNICODE): req_flags &= ~UNICODE req_chars = required.folded_characters else: req_offset = 0 req_chars = () req_flags = 0 return req_offset, req_chars, req_flags class Scanner: def __init__(self, lexicon, flags=0): self.lexicon = lexicon # Combine phrases into a compound pattern. 
patterns = [] for phrase, action in lexicon: # Parse the regular expression. source = Source(phrase) info = Info(flags, source.char_type) source.ignore_space = bool(info.flags & VERBOSE) parsed = _parse_pattern(source, info) if not source.at_end(): raise error("unbalanced parenthesis", source.string, source.pos) # We want to forbid capture groups within each phrase. patterns.append(parsed.remove_captures()) # Combine all the subpatterns into one pattern. info = Info(flags) patterns = [Group(info, g + 1, p) for g, p in enumerate(patterns)] parsed = Branch(patterns) # Optimise the compound pattern. parsed = parsed.optimise(info) parsed = parsed.pack_characters(info) # Get the required string. req_offset, req_chars, req_flags = _get_required_string(parsed, info.flags) # Check the features of the groups. _check_group_features(info, parsed) # Complain if there are any group calls. They are not supported by the # Scanner class. if info.call_refs: raise error("recursive regex not supported by Scanner", source.string, source.pos) reverse = bool(info.flags & REVERSE) # Compile the compound pattern. The result is a list of tuples. code = parsed.compile(reverse) + [(OP.SUCCESS, )] # Flatten the code into a list of ints. code = _flatten_code(code) if not parsed.has_simple_start(): # Get the first set, if possible. try: fs_code = _compile_firstset(info, parsed.get_firstset(reverse)) fs_code = _flatten_code(fs_code) code = fs_code + code except _FirstSetError: pass # Check the global flags for conflicts. version = (info.flags & _ALL_VERSIONS) or DEFAULT_VERSION if version not in (0, VERSION0, VERSION1): raise ValueError("VERSION0 and VERSION1 flags are mutually incompatible") # Create the PatternObject. # # Local flags like IGNORECASE affect the code generation, but aren't # needed by the PatternObject itself. Conversely, global flags like # LOCALE _don't_ affect the code generation but _are_ needed by the # PatternObject. 
self.scanner = _regex.compile(None, (flags & GLOBAL_FLAGS) | version, code, {}, {}, {}, [], req_offset, req_chars, req_flags, len(patterns)) def scan(self, string): result = [] append = result.append match = self.scanner.scanner(string).match i = 0 while True: m = match() if not m: break j = m.end() if i == j: break action = self.lexicon[m.lastindex - 1][1] if hasattr(action, '__call__'): self.match = m action = action(self, m.group()) if action is not None: append(action) i = j return result, string[i : ] # Get the known properties dict. PROPERTIES = _regex.get_properties() # Build the inverse of the properties dict. PROPERTY_NAMES = {} for prop_name, (prop_id, values) in PROPERTIES.items(): name, prop_values = PROPERTY_NAMES.get(prop_id, ("", {})) name = max(name, prop_name, key=len) PROPERTY_NAMES[prop_id] = name, prop_values for val_name, val_id in values.items(): prop_values[val_id] = max(prop_values.get(val_id, ""), val_name, key=len) # Character escape sequences. CHARACTER_ESCAPES = { "a": "\a", "b": "\b", "f": "\f", "n": "\n", "r": "\r", "t": "\t", "v": "\v", } # Predefined character set escape sequences. CHARSET_ESCAPES = { "d": lookup_property(None, "Digit", True), "D": lookup_property(None, "Digit", False), "s": lookup_property(None, "Space", True), "S": lookup_property(None, "Space", False), "w": lookup_property(None, "Word", True), "W": lookup_property(None, "Word", False), } # Positional escape sequences. POSITION_ESCAPES = { "A": StartOfString(), "b": Boundary(), "B": Boundary(False), "K": Keep(), "m": StartOfWord(), "M": EndOfWord(), "Z": EndOfString(), } # Positional escape sequences when WORD flag set. WORD_POSITION_ESCAPES = dict(POSITION_ESCAPES) WORD_POSITION_ESCAPES.update({ "b": DefaultBoundary(), "B": DefaultBoundary(False), "m": DefaultStartOfWord(), "M": DefaultEndOfWord(), }) # Regex control verbs. 
VERBS = { "FAIL": Failure(), "F": Failure(), "PRUNE": Prune(), "SKIP": Skip(), }
regex-2016.01.10/Python3/_regex_unicode.c
/* For Unicode version 8.0.0 */
#include "_regex_unicode.h"
#define RE_BLANK_MASK ((1 << RE_PROP_ZL) | (1 << RE_PROP_ZP))
#define RE_GRAPH_MASK ((1 << RE_PROP_CC) | (1 << RE_PROP_CS) | (1 << RE_PROP_CN))
#define RE_WORD_MASK (RE_PROP_M_MASK | (1 << RE_PROP_ND) | (1 << RE_PROP_PC))
typedef struct RE_AllCases { RE_INT32 diffs[RE_MAX_CASES - 1]; } RE_AllCases;
typedef struct RE_FullCaseFolding { RE_INT32 diff; RE_UINT16 codepoints[RE_MAX_FOLDED - 1]; } RE_FullCaseFolding;
/* strings. */
char* re_strings[] = { "-1/2", "0", "1", "1/10", "1/12", "1/16", "1/2", "1/3", "1/4", "1/5", "1/6", "1/7", "1/8", "1/9", "10", "100", "1000", "10000", "100000", "1000000", "100000000", "10000000000", "1000000000000", "103", "107", "11", "11/12", "11/2", "118", "12", "122", "129", "13", "13/2", "130", "132", "133", "14", "15", "15/2", "16", "17", "17/2", "18", "19", "2", "2/3", "2/5", "20", "200", "2000", "20000", "200000", "202", "21", "214", "216", "216000", "218", "22", "220", "222", "224", "226", "228", "23", "230", "232", "233", "234", "24", "240", "25", "26", "27", "28", "29", "3", "3/16", "3/2", "3/4", "3/5", "3/8", "30", "300", "3000", "30000", "300000", "31", "32", "33", "34", "35", "36", "37", "38", "39", "4", "4/5", "40", "400", "4000", "40000", "400000", "41", "42", "43", "432000", "44", "45", "46", "47", "48", "49", "5", "5/12", "5/2", "5/6", "5/8", "50", "500", "5000", "50000", "500000", "6", "60", "600", "6000", "60000", "600000", "7", "7/12", "7/2", "7/8", "70", "700", "7000", "70000", "700000", "8", "80", "800", "8000", "80000", "800000", "84", "9", "9/2", "90", "900", "9000", "90000", "900000", "91", "A", "ABOVE", "ABOVELEFT", "ABOVERIGHT", "AEGEANNUMBERS", "AGHB", "AHEX", "AHOM", "AI", "AIN", "AL", "ALAPH", "ALCHEMICAL", "ALCHEMICALSYMBOLS", "ALEF", "ALETTER", "ALNUM", "ALPHA",
"ALPHABETIC", "ALPHABETICPF", "ALPHABETICPRESENTATIONFORMS", "ALPHANUMERIC", "AMBIGUOUS", "AN", "ANATOLIANHIEROGLYPHS", "ANCIENTGREEKMUSIC", "ANCIENTGREEKMUSICALNOTATION", "ANCIENTGREEKNUMBERS", "ANCIENTSYMBOLS", "ANY", "AR", "ARAB", "ARABIC", "ARABICEXTA", "ARABICEXTENDEDA", "ARABICLETTER", "ARABICMATH", "ARABICMATHEMATICALALPHABETICSYMBOLS", "ARABICNUMBER", "ARABICPFA", "ARABICPFB", "ARABICPRESENTATIONFORMSA", "ARABICPRESENTATIONFORMSB", "ARABICSUP", "ARABICSUPPLEMENT", "ARMENIAN", "ARMI", "ARMN", "ARROWS", "ASCII", "ASCIIHEXDIGIT", "ASSIGNED", "AT", "ATA", "ATAR", "ATB", "ATBL", "ATERM", "ATTACHEDABOVE", "ATTACHEDABOVERIGHT", "ATTACHEDBELOW", "ATTACHEDBELOWLEFT", "AVAGRAHA", "AVESTAN", "AVST", "B", "B2", "BA", "BALI", "BALINESE", "BAMU", "BAMUM", "BAMUMSUP", "BAMUMSUPPLEMENT", "BASICLATIN", "BASS", "BASSAVAH", "BATAK", "BATK", "BB", "BC", "BEH", "BELOW", "BELOWLEFT", "BELOWRIGHT", "BENG", "BENGALI", "BETH", "BIDIC", "BIDICLASS", "BIDICONTROL", "BIDIM", "BIDIMIRRORED", "BINDU", "BK", "BL", "BLANK", "BLK", "BLOCK", "BLOCKELEMENTS", "BN", "BOPO", "BOPOMOFO", "BOPOMOFOEXT", "BOPOMOFOEXTENDED", "BOTTOM", "BOTTOMANDRIGHT", "BOUNDARYNEUTRAL", "BOXDRAWING", "BR", "BRAH", "BRAHMI", "BRAHMIJOININGNUMBER", "BRAI", "BRAILLE", "BRAILLEPATTERNS", "BREAKAFTER", "BREAKBEFORE", "BREAKBOTH", "BREAKSYMBOLS", "BUGI", "BUGINESE", "BUHD", "BUHID", "BURUSHASKIYEHBARREE", "BYZANTINEMUSIC", "BYZANTINEMUSICALSYMBOLS", "C", "C&", "CAKM", "CAN", "CANADIANABORIGINAL", "CANADIANSYLLABICS", "CANONICAL", "CANONICALCOMBININGCLASS", "CANS", "CANTILLATIONMARK", "CARI", "CARIAN", "CARRIAGERETURN", "CASED", "CASEDLETTER", "CASEIGNORABLE", "CAUCASIANALBANIAN", "CB", "CC", "CCC", "CCC10", "CCC103", "CCC107", "CCC11", "CCC118", "CCC12", "CCC122", "CCC129", "CCC13", "CCC130", "CCC132", "CCC133", "CCC14", "CCC15", "CCC16", "CCC17", "CCC18", "CCC19", "CCC20", "CCC21", "CCC22", "CCC23", "CCC24", "CCC25", "CCC26", "CCC27", "CCC28", "CCC29", "CCC30", "CCC31", "CCC32", "CCC33", "CCC34", "CCC35", "CCC36", 
"CCC84", "CCC91", "CF", "CHAKMA", "CHAM", "CHANGESWHENCASEFOLDED", "CHANGESWHENCASEMAPPED", "CHANGESWHENLOWERCASED", "CHANGESWHENTITLECASED", "CHANGESWHENUPPERCASED", "CHER", "CHEROKEE", "CHEROKEESUP", "CHEROKEESUPPLEMENT", "CI", "CIRCLE", "CJ", "CJK", "CJKCOMPAT", "CJKCOMPATFORMS", "CJKCOMPATIBILITY", "CJKCOMPATIBILITYFORMS", "CJKCOMPATIBILITYIDEOGRAPHS", "CJKCOMPATIBILITYIDEOGRAPHSSUPPLEMENT", "CJKCOMPATIDEOGRAPHS", "CJKCOMPATIDEOGRAPHSSUP", "CJKEXTA", "CJKEXTB", "CJKEXTC", "CJKEXTD", "CJKEXTE", "CJKRADICALSSUP", "CJKRADICALSSUPPLEMENT", "CJKSTROKES", "CJKSYMBOLS", "CJKSYMBOLSANDPUNCTUATION", "CJKUNIFIEDIDEOGRAPHS", "CJKUNIFIEDIDEOGRAPHSEXTENSIONA", "CJKUNIFIEDIDEOGRAPHSEXTENSIONB", "CJKUNIFIEDIDEOGRAPHSEXTENSIONC", "CJKUNIFIEDIDEOGRAPHSEXTENSIOND", "CJKUNIFIEDIDEOGRAPHSEXTENSIONE", "CL", "CLOSE", "CLOSEPARENTHESIS", "CLOSEPUNCTUATION", "CM", "CN", "CNTRL", "CO", "COM", "COMBININGDIACRITICALMARKS", "COMBININGDIACRITICALMARKSEXTENDED", "COMBININGDIACRITICALMARKSFORSYMBOLS", "COMBININGDIACRITICALMARKSSUPPLEMENT", "COMBININGHALFMARKS", "COMBININGMARK", "COMBININGMARKSFORSYMBOLS", "COMMON", "COMMONINDICNUMBERFORMS", "COMMONSEPARATOR", "COMPAT", "COMPATJAMO", "COMPLEXCONTEXT", "CONDITIONALJAPANESESTARTER", "CONNECTORPUNCTUATION", "CONSONANT", "CONSONANTDEAD", "CONSONANTFINAL", "CONSONANTHEADLETTER", "CONSONANTKILLER", "CONSONANTMEDIAL", "CONSONANTPLACEHOLDER", "CONSONANTPRECEDINGREPHA", "CONSONANTPREFIXED", "CONSONANTSUBJOINED", "CONSONANTSUCCEEDINGREPHA", "CONSONANTWITHSTACKER", "CONTINGENTBREAK", "CONTROL", "CONTROLPICTURES", "COPT", "COPTIC", "COPTICEPACTNUMBERS", "COUNTINGROD", "COUNTINGRODNUMERALS", "CP", "CPRT", "CR", "CS", "CUNEIFORM", "CUNEIFORMNUMBERS", "CUNEIFORMNUMBERSANDPUNCTUATION", "CURRENCYSYMBOL", "CURRENCYSYMBOLS", "CWCF", "CWCM", "CWL", "CWT", "CWU", "CYPRIOT", "CYPRIOTSYLLABARY", "CYRILLIC", "CYRILLICEXTA", "CYRILLICEXTB", "CYRILLICEXTENDEDA", "CYRILLICEXTENDEDB", "CYRILLICSUP", "CYRILLICSUPPLEMENT", "CYRILLICSUPPLEMENTARY", "CYRL", "D", "DA", 
"DAL", "DALATHRISH", "DASH", "DASHPUNCTUATION", "DB", "DE", "DECIMAL", "DECIMALNUMBER", "DECOMPOSITIONTYPE", "DEFAULTIGNORABLECODEPOINT", "DEP", "DEPRECATED", "DESERET", "DEVA", "DEVANAGARI", "DEVANAGARIEXT", "DEVANAGARIEXTENDED", "DI", "DIA", "DIACRITIC", "DIACRITICALS", "DIACRITICALSEXT", "DIACRITICALSFORSYMBOLS", "DIACRITICALSSUP", "DIGIT", "DINGBATS", "DOMINO", "DOMINOTILES", "DOUBLEABOVE", "DOUBLEBELOW", "DOUBLEQUOTE", "DQ", "DSRT", "DT", "DUALJOINING", "DUPL", "DUPLOYAN", "E", "EA", "EARLYDYNASTICCUNEIFORM", "EASTASIANWIDTH", "EGYP", "EGYPTIANHIEROGLYPHS", "ELBA", "ELBASAN", "EMOTICONS", "EN", "ENC", "ENCLOSEDALPHANUM", "ENCLOSEDALPHANUMERICS", "ENCLOSEDALPHANUMERICSUPPLEMENT", "ENCLOSEDALPHANUMSUP", "ENCLOSEDCJK", "ENCLOSEDCJKLETTERSANDMONTHS", "ENCLOSEDIDEOGRAPHICSUP", "ENCLOSEDIDEOGRAPHICSUPPLEMENT", "ENCLOSINGMARK", "ES", "ET", "ETHI", "ETHIOPIC", "ETHIOPICEXT", "ETHIOPICEXTA", "ETHIOPICEXTENDED", "ETHIOPICEXTENDEDA", "ETHIOPICSUP", "ETHIOPICSUPPLEMENT", "EUROPEANNUMBER", "EUROPEANSEPARATOR", "EUROPEANTERMINATOR", "EX", "EXCLAMATION", "EXT", "EXTEND", "EXTENDER", "EXTENDNUMLET", "F", "FALSE", "FARSIYEH", "FE", "FEH", "FIN", "FINAL", "FINALPUNCTUATION", "FINALSEMKATH", "FIRSTSTRONGISOLATE", "FO", "FONT", "FORMAT", "FRA", "FRACTION", "FSI", "FULLWIDTH", "GAF", "GAMAL", "GC", "GCB", "GEMINATIONMARK", "GENERALCATEGORY", "GENERALPUNCTUATION", "GEOMETRICSHAPES", "GEOMETRICSHAPESEXT", "GEOMETRICSHAPESEXTENDED", "GEOR", "GEORGIAN", "GEORGIANSUP", "GEORGIANSUPPLEMENT", "GL", "GLAG", "GLAGOLITIC", "GLUE", "GOTH", "GOTHIC", "GRAN", "GRANTHA", "GRAPH", "GRAPHEMEBASE", "GRAPHEMECLUSTERBREAK", "GRAPHEMEEXTEND", "GRAPHEMELINK", "GRBASE", "GREEK", "GREEKANDCOPTIC", "GREEKEXT", "GREEKEXTENDED", "GREK", "GREXT", "GRLINK", "GUJARATI", "GUJR", "GURMUKHI", "GURU", "H", "H2", "H3", "HAH", "HALFANDFULLFORMS", "HALFMARKS", "HALFWIDTH", "HALFWIDTHANDFULLWIDTHFORMS", "HAMZAONHEHGOAL", "HAN", "HANG", "HANGUL", "HANGULCOMPATIBILITYJAMO", "HANGULJAMO", "HANGULJAMOEXTENDEDA", 
"HANGULJAMOEXTENDEDB", "HANGULSYLLABLES", "HANGULSYLLABLETYPE", "HANI", "HANO", "HANUNOO", "HATR", "HATRAN", "HE", "HEBR", "HEBREW", "HEBREWLETTER", "HEH", "HEHGOAL", "HETH", "HEX", "HEXDIGIT", "HIGHPRIVATEUSESURROGATES", "HIGHPUSURROGATES", "HIGHSURROGATES", "HIRA", "HIRAGANA", "HL", "HLUW", "HMNG", "HRKT", "HST", "HUNG", "HY", "HYPHEN", "ID", "IDC", "IDCONTINUE", "IDEO", "IDEOGRAPHIC", "IDEOGRAPHICDESCRIPTIONCHARACTERS", "IDS", "IDSB", "IDSBINARYOPERATOR", "IDST", "IDSTART", "IDSTRINARYOPERATOR", "IMPERIALARAMAIC", "IN", "INDICNUMBERFORMS", "INDICPOSITIONALCATEGORY", "INDICSYLLABICCATEGORY", "INFIXNUMERIC", "INHERITED", "INIT", "INITIAL", "INITIALPUNCTUATION", "INPC", "INSC", "INSCRIPTIONALPAHLAVI", "INSCRIPTIONALPARTHIAN", "INSEPARABLE", "INSEPERABLE", "INVISIBLESTACKER", "IOTASUBSCRIPT", "IPAEXT", "IPAEXTENSIONS", "IS", "ISO", "ISOLATED", "ITAL", "JAMO", "JAMOEXTA", "JAMOEXTB", "JAVA", "JAVANESE", "JG", "JL", "JOINC", "JOINCAUSING", "JOINCONTROL", "JOINER", "JOININGGROUP", "JOININGTYPE", "JT", "JV", "KA", "KAF", "KAITHI", "KALI", "KANA", "KANASUP", "KANASUPPLEMENT", "KANAVOICING", "KANBUN", "KANGXI", "KANGXIRADICALS", "KANNADA", "KAPH", "KATAKANA", "KATAKANAEXT", "KATAKANAORHIRAGANA", "KATAKANAPHONETICEXTENSIONS", "KAYAHLI", "KHAPH", "KHAR", "KHAROSHTHI", "KHMER", "KHMERSYMBOLS", "KHMR", "KHOJ", "KHOJKI", "KHUDAWADI", "KNDA", "KNOTTEDHEH", "KTHI", "KV", "L", "L&", "LAM", "LAMADH", "LANA", "LAO", "LAOO", "LATIN", "LATIN1", "LATIN1SUP", "LATIN1SUPPLEMENT", "LATINEXTA", "LATINEXTADDITIONAL", "LATINEXTB", "LATINEXTC", "LATINEXTD", "LATINEXTE", "LATINEXTENDEDA", "LATINEXTENDEDADDITIONAL", "LATINEXTENDEDB", "LATINEXTENDEDC", "LATINEXTENDEDD", "LATINEXTENDEDE", "LATN", "LB", "LC", "LE", "LEADINGJAMO", "LEFT", "LEFTANDRIGHT", "LEFTJOINING", "LEFTTORIGHT", "LEFTTORIGHTEMBEDDING", "LEFTTORIGHTISOLATE", "LEFTTORIGHTOVERRIDE", "LEPC", "LEPCHA", "LETTER", "LETTERLIKESYMBOLS", "LETTERNUMBER", "LF", "LIMB", "LIMBU", "LINA", "LINB", "LINEARA", "LINEARB", "LINEARBIDEOGRAMS", 
"LINEARBSYLLABARY", "LINEBREAK", "LINEFEED", "LINESEPARATOR", "LISU", "LL", "LM", "LO", "LOE", "LOGICALORDEREXCEPTION", "LOWER", "LOWERCASE", "LOWERCASELETTER", "LOWSURROGATES", "LRE", "LRI", "LRO", "LT", "LU", "LV", "LVSYLLABLE", "LVT", "LVTSYLLABLE", "LYCI", "LYCIAN", "LYDI", "LYDIAN", "M", "M&", "MAHAJANI", "MAHJ", "MAHJONG", "MAHJONGTILES", "MALAYALAM", "MAND", "MANDAIC", "MANDATORYBREAK", "MANI", "MANICHAEAN", "MANICHAEANALEPH", "MANICHAEANAYIN", "MANICHAEANBETH", "MANICHAEANDALETH", "MANICHAEANDHAMEDH", "MANICHAEANFIVE", "MANICHAEANGIMEL", "MANICHAEANHETH", "MANICHAEANHUNDRED", "MANICHAEANKAPH", "MANICHAEANLAMEDH", "MANICHAEANMEM", "MANICHAEANNUN", "MANICHAEANONE", "MANICHAEANPE", "MANICHAEANQOPH", "MANICHAEANRESH", "MANICHAEANSADHE", "MANICHAEANSAMEKH", "MANICHAEANTAW", "MANICHAEANTEN", "MANICHAEANTETH", "MANICHAEANTHAMEDH", "MANICHAEANTWENTY", "MANICHAEANWAW", "MANICHAEANYODH", "MANICHAEANZAYIN", "MARK", "MATH", "MATHALPHANUM", "MATHEMATICALALPHANUMERICSYMBOLS", "MATHEMATICALOPERATORS", "MATHOPERATORS", "MATHSYMBOL", "MB", "MC", "ME", "MED", "MEDIAL", "MEEM", "MEETEIMAYEK", "MEETEIMAYEKEXT", "MEETEIMAYEKEXTENSIONS", "MEND", "MENDEKIKAKUI", "MERC", "MERO", "MEROITICCURSIVE", "MEROITICHIEROGLYPHS", "MIAO", "MIDLETTER", "MIDNUM", "MIDNUMLET", "MIM", "MISCARROWS", "MISCELLANEOUSMATHEMATICALSYMBOLSA", "MISCELLANEOUSMATHEMATICALSYMBOLSB", "MISCELLANEOUSSYMBOLS", "MISCELLANEOUSSYMBOLSANDARROWS", "MISCELLANEOUSSYMBOLSANDPICTOGRAPHS", "MISCELLANEOUSTECHNICAL", "MISCMATHSYMBOLSA", "MISCMATHSYMBOLSB", "MISCPICTOGRAPHS", "MISCSYMBOLS", "MISCTECHNICAL", "ML", "MLYM", "MN", "MODI", "MODIFIERLETTER", "MODIFIERLETTERS", "MODIFIERSYMBOL", "MODIFIERTONELETTERS", "MODIFYINGLETTER", "MONG", "MONGOLIAN", "MRO", "MROO", "MTEI", "MULT", "MULTANI", "MUSIC", "MUSICALSYMBOLS", "MYANMAR", "MYANMAREXTA", "MYANMAREXTB", "MYANMAREXTENDEDA", "MYANMAREXTENDEDB", "MYMR", "N", "N&", "NA", "NABATAEAN", "NAN", "NAR", "NARB", "NARROW", "NB", "NBAT", "NCHAR", "ND", "NEUTRAL", "NEWLINE", 
"NEWTAILUE", "NEXTLINE", "NK", "NKO", "NKOO", "NL", "NO", "NOBLOCK", "NOBREAK", "NOJOININGGROUP", "NONCHARACTERCODEPOINT", "NONE", "NONJOINER", "NONJOINING", "NONSPACINGMARK", "NONSTARTER", "NOON", "NOTAPPLICABLE", "NOTREORDERED", "NR", "NS", "NSM", "NT", "NU", "NUKTA", "NUMBER", "NUMBERFORMS", "NUMBERJOINER", "NUMERIC", "NUMERICTYPE", "NUMERICVALUE", "NUN", "NV", "NYA", "OALPHA", "OCR", "ODI", "OGAM", "OGHAM", "OGREXT", "OIDC", "OIDS", "OLCHIKI", "OLCK", "OLDHUNGARIAN", "OLDITALIC", "OLDNORTHARABIAN", "OLDPERMIC", "OLDPERSIAN", "OLDSOUTHARABIAN", "OLDTURKIC", "OLETTER", "OLOWER", "OMATH", "ON", "OP", "OPENPUNCTUATION", "OPTICALCHARACTERRECOGNITION", "ORIYA", "ORKH", "ORNAMENTALDINGBATS", "ORYA", "OSMA", "OSMANYA", "OTHER", "OTHERALPHABETIC", "OTHERDEFAULTIGNORABLECODEPOINT", "OTHERGRAPHEMEEXTEND", "OTHERIDCONTINUE", "OTHERIDSTART", "OTHERLETTER", "OTHERLOWERCASE", "OTHERMATH", "OTHERNEUTRAL", "OTHERNUMBER", "OTHERPUNCTUATION", "OTHERSYMBOL", "OTHERUPPERCASE", "OUPPER", "OV", "OVERLAY", "OVERSTRUCK", "P", "P&", "PAHAWHHMONG", "PALM", "PALMYRENE", "PARAGRAPHSEPARATOR", "PATSYN", "PATTERNSYNTAX", "PATTERNWHITESPACE", "PATWS", "PAUC", "PAUCINHAU", "PC", "PD", "PDF", "PDI", "PE", "PERM", "PF", "PHAG", "PHAGSPA", "PHAISTOS", "PHAISTOSDISC", "PHLI", "PHLP", "PHNX", "PHOENICIAN", "PHONETICEXT", "PHONETICEXTENSIONS", "PHONETICEXTENSIONSSUPPLEMENT", "PHONETICEXTSUP", "PI", "PLAYINGCARDS", "PLRD", "PO", "POPDIRECTIONALFORMAT", "POPDIRECTIONALISOLATE", "POSIXALNUM", "POSIXDIGIT", "POSIXPUNCT", "POSIXXDIGIT", "POSTFIXNUMERIC", "PP", "PR", "PREFIXNUMERIC", "PREPEND", "PRINT", "PRIVATEUSE", "PRIVATEUSEAREA", "PRTI", "PS", "PSALTERPAHLAVI", "PUA", "PUNCT", "PUNCTUATION", "PUREKILLER", "QAAC", "QAAI", "QAF", "QAPH", "QMARK", "QU", "QUOTATION", "QUOTATIONMARK", "R", "RADICAL", "REGIONALINDICATOR", "REGISTERSHIFTER", "REH", "REJANG", "REVERSEDPE", "RI", "RIGHT", "RIGHTJOINING", "RIGHTTOLEFT", "RIGHTTOLEFTEMBEDDING", "RIGHTTOLEFTISOLATE", "RIGHTTOLEFTOVERRIDE", "RJNG", "RLE", "RLI", 
"RLO", "ROHINGYAYEH", "RUMI", "RUMINUMERALSYMBOLS", "RUNIC", "RUNR", "S", "S&", "SA", "SAD", "SADHE", "SAMARITAN", "SAMR", "SARB", "SAUR", "SAURASHTRA", "SB", "SC", "SCONTINUE", "SCRIPT", "SD", "SE", "SEEN", "SEGMENTSEPARATOR", "SEMKATH", "SENTENCEBREAK", "SEP", "SEPARATOR", "SG", "SGNW", "SHARADA", "SHAVIAN", "SHAW", "SHIN", "SHORTHANDFORMATCONTROLS", "SHRD", "SIDD", "SIDDHAM", "SIGNWRITING", "SIND", "SINGLEQUOTE", "SINH", "SINHALA", "SINHALAARCHAICNUMBERS", "SK", "SM", "SMALL", "SMALLFORMS", "SMALLFORMVARIANTS", "SML", "SO", "SOFTDOTTED", "SORA", "SORASOMPENG", "SP", "SPACE", "SPACESEPARATOR", "SPACINGMARK", "SPACINGMODIFIERLETTERS", "SPECIALS", "SQ", "SQR", "SQUARE", "ST", "STERM", "STRAIGHTWAW", "SUB", "SUND", "SUNDANESE", "SUNDANESESUP", "SUNDANESESUPPLEMENT", "SUP", "SUPARROWSA", "SUPARROWSB", "SUPARROWSC", "SUPER", "SUPERANDSUB", "SUPERSCRIPTSANDSUBSCRIPTS", "SUPMATHOPERATORS", "SUPPLEMENTALARROWSA", "SUPPLEMENTALARROWSB", "SUPPLEMENTALARROWSC", "SUPPLEMENTALMATHEMATICALOPERATORS", "SUPPLEMENTALPUNCTUATION", "SUPPLEMENTALSYMBOLSANDPICTOGRAPHS", "SUPPLEMENTARYPRIVATEUSEAREAA", "SUPPLEMENTARYPRIVATEUSEAREAB", "SUPPUAA", "SUPPUAB", "SUPPUNCTUATION", "SUPSYMBOLSANDPICTOGRAPHS", "SURROGATE", "SUTTONSIGNWRITING", "SWASHKAF", "SY", "SYLLABLEMODIFIER", "SYLO", "SYLOTINAGRI", "SYMBOL", "SYRC", "SYRIAC", "SYRIACWAW", "T", "TAGALOG", "TAGB", "TAGBANWA", "TAGS", "TAH", "TAILE", "TAITHAM", "TAIVIET", "TAIXUANJING", "TAIXUANJINGSYMBOLS", "TAKR", "TAKRI", "TALE", "TALU", "TAMIL", "TAML", "TAVT", "TAW", "TEHMARBUTA", "TEHMARBUTAGOAL", "TELU", "TELUGU", "TERM", "TERMINALPUNCTUATION", "TETH", "TFNG", "TGLG", "THAA", "THAANA", "THAI", "TIBETAN", "TIBT", "TIFINAGH", "TIRH", "TIRHUTA", "TITLECASELETTER", "TONELETTER", "TONEMARK", "TOP", "TOPANDBOTTOM", "TOPANDBOTTOMANDRIGHT", "TOPANDLEFT", "TOPANDLEFTANDRIGHT", "TOPANDRIGHT", "TRAILINGJAMO", "TRANSPARENT", "TRANSPORTANDMAP", "TRANSPORTANDMAPSYMBOLS", "TRUE", "U", "UCAS", "UCASEXT", "UGAR", "UGARITIC", "UIDEO", "UNASSIGNED", 
"UNIFIEDCANADIANABORIGINALSYLLABICS", "UNIFIEDCANADIANABORIGINALSYLLABICSEXTENDED", "UNIFIEDIDEOGRAPH", "UNKNOWN", "UP", "UPPER", "UPPERCASE", "UPPERCASELETTER", "V", "VAI", "VAII", "VARIATIONSELECTOR", "VARIATIONSELECTORS", "VARIATIONSELECTORSSUPPLEMENT", "VEDICEXT", "VEDICEXTENSIONS", "VERT", "VERTICAL", "VERTICALFORMS", "VIRAMA", "VISARGA", "VISUALORDERLEFT", "VOWEL", "VOWELDEPENDENT", "VOWELINDEPENDENT", "VOWELJAMO", "VR", "VS", "VSSUP", "W", "WARA", "WARANGCITI", "WAW", "WB", "WHITESPACE", "WIDE", "WJ", "WORD", "WORDBREAK", "WORDJOINER", "WS", "WSPACE", "XDIGIT", "XIDC", "XIDCONTINUE", "XIDS", "XIDSTART", "XPEO", "XSUX", "XX", "Y", "YEH", "YEHBARREE", "YEHWITHTAIL", "YES", "YI", "YIII", "YIJING", "YIJINGHEXAGRAMSYMBOLS", "YIRADICALS", "YISYLLABLES", "YUDH", "YUDHHE", "Z", "Z&", "ZAIN", "ZHAIN", "ZINH", "ZL", "ZP", "ZS", "ZW", "ZWSPACE", "ZYYY", "ZZZZ", }; /* strings: 12240 bytes. */ /* properties. */ RE_Property re_properties[] = { { 547, 0, 0}, { 544, 0, 0}, { 252, 1, 1}, { 251, 1, 1}, {1081, 2, 2}, {1079, 2, 2}, {1259, 3, 3}, {1254, 3, 3}, { 566, 4, 4}, { 545, 4, 4}, {1087, 5, 5}, {1078, 5, 5}, { 823, 6, 6}, { 172, 7, 6}, { 171, 7, 6}, { 767, 8, 6}, { 766, 8, 6}, {1227, 9, 6}, {1226, 9, 6}, { 294, 10, 6}, { 296, 11, 6}, { 350, 11, 6}, { 343, 12, 6}, { 433, 12, 6}, { 345, 13, 6}, { 435, 13, 6}, { 344, 14, 6}, { 434, 14, 6}, { 341, 15, 6}, { 431, 15, 6}, { 342, 16, 6}, { 432, 16, 6}, { 636, 17, 6}, { 632, 17, 6}, { 628, 18, 6}, { 627, 18, 6}, {1267, 19, 6}, {1266, 19, 6}, {1265, 20, 6}, {1264, 20, 6}, { 458, 21, 6}, { 466, 21, 6}, { 567, 22, 6}, { 575, 22, 6}, { 565, 23, 6}, { 569, 23, 6}, { 568, 24, 6}, { 576, 24, 6}, {1255, 25, 6}, {1262, 25, 6}, {1117, 25, 6}, { 244, 26, 6}, { 242, 26, 6}, { 671, 27, 6}, { 669, 27, 6}, { 451, 28, 6}, { 625, 29, 6}, {1044, 30, 6}, {1041, 30, 6}, {1188, 31, 6}, {1187, 31, 6}, { 971, 32, 6}, { 952, 32, 6}, { 612, 33, 6}, { 611, 33, 6}, { 204, 34, 6}, { 160, 34, 6}, { 964, 35, 6}, { 933, 35, 6}, { 630, 36, 6}, { 629, 36, 6}, { 
468, 37, 6}, { 467, 37, 6}, { 523, 38, 6}, { 521, 38, 6}, { 970, 39, 6}, { 951, 39, 6}, { 976, 40, 6}, { 977, 40, 6}, { 909, 41, 6}, { 895, 41, 6}, { 966, 42, 6}, { 938, 42, 6}, { 634, 43, 6}, { 633, 43, 6}, { 637, 44, 6}, { 635, 44, 6}, {1046, 45, 6}, {1223, 46, 6}, {1219, 46, 6}, { 965, 47, 6}, { 935, 47, 6}, { 460, 48, 6}, { 459, 48, 6}, {1113, 49, 6}, {1082, 49, 6}, { 765, 50, 6}, { 764, 50, 6}, { 968, 51, 6}, { 940, 51, 6}, { 967, 52, 6}, { 939, 52, 6}, {1126, 53, 6}, {1232, 54, 6}, {1248, 54, 6}, { 989, 55, 6}, { 990, 55, 6}, { 988, 56, 6}, { 987, 56, 6}, { 598, 57, 7}, { 622, 57, 7}, { 243, 58, 8}, { 234, 58, 8}, { 288, 59, 9}, { 300, 59, 9}, { 457, 60, 10}, { 482, 60, 10}, { 489, 61, 11}, { 487, 61, 11}, { 673, 62, 12}, { 667, 62, 12}, { 674, 63, 13}, { 675, 63, 13}, { 757, 64, 14}, { 732, 64, 14}, { 928, 65, 15}, { 921, 65, 15}, { 929, 66, 16}, { 931, 66, 16}, { 246, 67, 6}, { 245, 67, 6}, { 641, 68, 17}, { 648, 68, 17}, { 642, 69, 18}, { 649, 69, 18}, { 175, 70, 6}, { 170, 70, 6}, { 183, 71, 6}, { 250, 72, 6}, { 564, 73, 6}, {1027, 74, 6}, {1258, 75, 6}, {1263, 76, 6}, {1019, 77, 6}, {1018, 78, 6}, {1020, 79, 6}, {1021, 80, 6}, }; /* properties: 588 bytes. */ /* property values. 
*/ RE_PropertyValue re_property_values[] = { {1220, 0, 0}, { 383, 0, 0}, {1228, 0, 1}, { 774, 0, 1}, { 768, 0, 2}, { 761, 0, 2}, {1200, 0, 3}, { 773, 0, 3}, { 865, 0, 4}, { 762, 0, 4}, { 969, 0, 5}, { 763, 0, 5}, { 913, 0, 6}, { 863, 0, 6}, { 505, 0, 7}, { 831, 0, 7}, {1119, 0, 8}, { 830, 0, 8}, { 456, 0, 9}, { 896, 0, 9}, { 473, 0, 9}, { 747, 0, 10}, { 904, 0, 10}, { 973, 0, 11}, { 905, 0, 11}, {1118, 0, 12}, {1291, 0, 12}, { 759, 0, 13}, {1289, 0, 13}, { 986, 0, 14}, {1290, 0, 14}, { 415, 0, 15}, { 299, 0, 15}, { 384, 0, 15}, { 537, 0, 16}, { 338, 0, 16}, {1028, 0, 17}, { 385, 0, 17}, {1153, 0, 18}, { 425, 0, 18}, { 452, 0, 19}, { 994, 0, 19}, { 955, 0, 20}, {1031, 0, 20}, { 381, 0, 21}, { 997, 0, 21}, { 401, 0, 22}, { 993, 0, 22}, { 974, 0, 23}, {1015, 0, 23}, { 828, 0, 24}, {1107, 0, 24}, { 429, 0, 25}, {1079, 0, 25}, { 867, 0, 26}, {1106, 0, 26}, { 975, 0, 27}, {1112, 0, 27}, { 647, 0, 28}, {1012, 0, 28}, { 532, 0, 29}, { 999, 0, 29}, { 963, 0, 30}, { 281, 0, 30}, { 282, 0, 30}, { 745, 0, 31}, { 708, 0, 31}, { 709, 0, 31}, { 822, 0, 32}, { 783, 0, 32}, { 392, 0, 32}, { 784, 0, 32}, { 924, 0, 33}, { 885, 0, 33}, { 886, 0, 33}, {1035, 0, 34}, { 981, 0, 34}, {1034, 0, 34}, { 982, 0, 34}, {1160, 0, 35}, {1068, 0, 35}, {1069, 0, 35}, {1089, 0, 36}, {1284, 0, 36}, {1285, 0, 36}, { 295, 0, 37}, { 733, 0, 37}, { 205, 0, 38}, { 906, 1, 0}, { 893, 1, 0}, { 228, 1, 1}, { 203, 1, 1}, { 718, 1, 2}, { 717, 1, 2}, { 716, 1, 2}, { 725, 1, 3}, { 719, 1, 3}, { 727, 1, 4}, { 721, 1, 4}, { 657, 1, 5}, { 656, 1, 5}, {1120, 1, 6}, { 866, 1, 6}, { 387, 1, 7}, { 469, 1, 7}, { 571, 1, 8}, { 570, 1, 8}, { 438, 1, 9}, { 444, 1, 10}, { 443, 1, 10}, { 445, 1, 10}, { 199, 1, 11}, { 606, 1, 12}, { 186, 1, 13}, {1162, 1, 14}, { 198, 1, 15}, { 197, 1, 15}, {1193, 1, 16}, { 902, 1, 17}, {1073, 1, 18}, { 791, 1, 19}, { 188, 1, 20}, { 187, 1, 20}, { 463, 1, 21}, { 240, 1, 22}, { 579, 1, 23}, { 577, 1, 24}, { 957, 1, 25}, {1179, 1, 26}, {1186, 1, 27}, { 688, 1, 28}, { 789, 1, 29}, {1104, 1, 30}, 
{1194, 1, 31}, { 713, 1, 32}, {1195, 1, 33}, { 879, 1, 34}, { 553, 1, 35}, { 594, 1, 36}, { 662, 1, 36}, { 509, 1, 37}, { 515, 1, 38}, { 514, 1, 38}, { 347, 1, 39}, {1221, 1, 40}, {1215, 1, 40}, { 286, 1, 40}, { 937, 1, 41}, {1066, 1, 42}, {1165, 1, 43}, { 601, 1, 44}, { 277, 1, 45}, {1167, 1, 46}, { 698, 1, 47}, { 871, 1, 48}, {1222, 1, 49}, {1216, 1, 49}, { 750, 1, 50}, {1170, 1, 51}, { 899, 1, 52}, { 699, 1, 53}, { 275, 1, 54}, {1171, 1, 55}, { 388, 1, 56}, { 470, 1, 56}, { 223, 1, 57}, {1130, 1, 58}, { 231, 1, 59}, { 744, 1, 60}, { 941, 1, 61}, {1132, 1, 62}, {1131, 1, 62}, {1236, 1, 63}, {1235, 1, 63}, {1009, 1, 64}, {1008, 1, 64}, {1010, 1, 65}, {1011, 1, 65}, { 390, 1, 66}, { 472, 1, 66}, { 726, 1, 67}, { 720, 1, 67}, { 573, 1, 68}, { 572, 1, 68}, { 548, 1, 69}, {1035, 1, 69}, {1139, 1, 70}, {1138, 1, 70}, { 430, 1, 71}, { 389, 1, 72}, { 471, 1, 72}, { 393, 1, 72}, { 746, 1, 73}, { 925, 1, 74}, { 202, 1, 75}, { 826, 1, 76}, { 827, 1, 76}, { 855, 1, 77}, { 860, 1, 77}, { 416, 1, 78}, { 956, 1, 79}, { 934, 1, 79}, { 498, 1, 80}, { 497, 1, 80}, { 262, 1, 81}, { 253, 1, 82}, { 549, 1, 83}, { 852, 1, 84}, { 859, 1, 84}, { 474, 1, 85}, { 850, 1, 86}, { 856, 1, 86}, {1141, 1, 87}, {1134, 1, 87}, { 269, 1, 88}, { 268, 1, 88}, {1142, 1, 89}, {1135, 1, 89}, { 851, 1, 90}, { 857, 1, 90}, {1144, 1, 91}, {1140, 1, 91}, { 853, 1, 92}, { 849, 1, 92}, { 558, 1, 93}, { 728, 1, 94}, { 722, 1, 94}, { 418, 1, 95}, { 555, 1, 96}, { 554, 1, 96}, {1197, 1, 97}, { 512, 1, 98}, { 510, 1, 98}, { 441, 1, 99}, { 439, 1, 99}, {1145, 1, 100}, {1151, 1, 100}, { 368, 1, 101}, { 367, 1, 101}, { 687, 1, 102}, { 686, 1, 102}, { 631, 1, 103}, { 627, 1, 103}, { 371, 1, 104}, { 370, 1, 104}, { 617, 1, 105}, { 690, 1, 106}, { 256, 1, 107}, { 593, 1, 108}, { 398, 1, 108}, { 685, 1, 109}, { 258, 1, 110}, { 257, 1, 110}, { 369, 1, 111}, { 693, 1, 112}, { 691, 1, 112}, { 502, 1, 113}, { 501, 1, 113}, { 356, 1, 114}, { 354, 1, 114}, { 373, 1, 115}, { 362, 1, 115}, {1279, 1, 116}, {1278, 1, 116}, { 
372, 1, 117}, { 353, 1, 117}, {1281, 1, 118}, {1280, 1, 119}, { 760, 1, 120}, {1230, 1, 121}, { 442, 1, 122}, { 440, 1, 122}, { 225, 1, 123}, { 868, 1, 124}, { 729, 1, 125}, { 723, 1, 125}, {1159, 1, 126}, { 395, 1, 127}, { 640, 1, 127}, {1001, 1, 128}, {1077, 1, 129}, { 465, 1, 130}, { 464, 1, 130}, { 694, 1, 131}, {1050, 1, 132}, { 595, 1, 133}, { 663, 1, 133}, { 666, 1, 134}, { 883, 1, 135}, { 881, 1, 135}, { 340, 1, 136}, { 882, 1, 137}, { 880, 1, 137}, {1172, 1, 138}, { 837, 1, 139}, { 836, 1, 139}, { 513, 1, 140}, { 511, 1, 140}, { 730, 1, 141}, { 724, 1, 141}, { 349, 1, 142}, { 348, 1, 142}, { 835, 1, 143}, { 597, 1, 144}, { 592, 1, 144}, { 596, 1, 145}, { 664, 1, 145}, { 615, 1, 146}, { 613, 1, 147}, { 614, 1, 147}, { 769, 1, 148}, {1029, 1, 149}, {1033, 1, 149}, {1028, 1, 149}, { 358, 1, 150}, { 360, 1, 150}, { 174, 1, 151}, { 173, 1, 151}, { 195, 1, 152}, { 193, 1, 152}, {1233, 1, 153}, {1248, 1, 153}, {1239, 1, 154}, { 391, 1, 155}, { 586, 1, 155}, { 357, 1, 156}, { 355, 1, 156}, {1110, 1, 157}, {1109, 1, 157}, { 196, 1, 158}, { 194, 1, 158}, { 588, 1, 159}, { 585, 1, 159}, {1121, 1, 160}, { 756, 1, 161}, { 755, 1, 162}, { 158, 1, 163}, { 181, 1, 164}, { 182, 1, 165}, {1003, 1, 166}, {1002, 1, 166}, { 780, 1, 167}, { 292, 1, 168}, { 419, 1, 169}, { 944, 1, 170}, { 561, 1, 171}, { 946, 1, 172}, {1218, 1, 173}, { 947, 1, 174}, { 461, 1, 175}, {1093, 1, 176}, { 962, 1, 177}, { 493, 1, 178}, { 297, 1, 179}, { 753, 1, 180}, { 437, 1, 181}, { 638, 1, 182}, { 985, 1, 183}, { 888, 1, 184}, { 603, 1, 185}, {1007, 1, 186}, { 782, 1, 187}, { 843, 1, 188}, { 842, 1, 189}, { 697, 1, 190}, { 948, 1, 191}, { 945, 1, 192}, { 794, 1, 193}, { 217, 1, 194}, { 651, 1, 195}, { 650, 1, 196}, {1032, 1, 197}, { 949, 1, 198}, { 943, 1, 199}, {1065, 1, 200}, {1064, 1, 200}, { 265, 1, 201}, { 679, 1, 202}, {1115, 1, 203}, { 339, 1, 204}, { 785, 1, 205}, {1092, 1, 206}, {1105, 1, 207}, { 702, 1, 208}, { 876, 1, 209}, { 703, 1, 210}, { 563, 1, 211}, {1199, 1, 212}, {1099, 1, 213}, { 
864, 1, 214}, {1176, 1, 215}, { 161, 1, 216}, {1252, 1, 217}, { 992, 1, 218}, { 426, 1, 219}, { 428, 1, 220}, { 427, 1, 220}, { 488, 1, 221}, { 491, 1, 222}, { 178, 1, 223}, { 227, 1, 224}, { 226, 1, 224}, { 872, 1, 225}, { 230, 1, 226}, { 983, 1, 227}, { 844, 1, 228}, { 683, 1, 229}, { 682, 1, 229}, { 485, 1, 230}, {1096, 1, 231}, { 280, 1, 232}, { 279, 1, 232}, { 878, 1, 233}, { 877, 1, 233}, { 180, 1, 234}, { 179, 1, 234}, {1174, 1, 235}, {1173, 1, 235}, { 421, 1, 236}, { 420, 1, 236}, { 825, 1, 237}, { 824, 1, 237}, {1154, 1, 238}, { 839, 1, 239}, { 191, 1, 240}, { 190, 1, 240}, { 788, 1, 241}, { 787, 1, 241}, { 476, 1, 242}, { 475, 1, 242}, {1013, 1, 243}, { 499, 1, 244}, { 500, 1, 244}, { 504, 1, 245}, { 503, 1, 245}, { 854, 1, 246}, { 858, 1, 246}, { 494, 1, 247}, { 959, 1, 248}, {1212, 1, 249}, {1211, 1, 249}, { 167, 1, 250}, { 166, 1, 250}, { 551, 1, 251}, { 550, 1, 251}, {1143, 1, 252}, {1136, 1, 252}, {1146, 1, 253}, {1152, 1, 253}, { 374, 1, 254}, { 363, 1, 254}, { 375, 1, 255}, { 364, 1, 255}, { 376, 1, 256}, { 365, 1, 256}, { 377, 1, 257}, { 366, 1, 257}, { 359, 1, 258}, { 361, 1, 258}, {1168, 1, 259}, {1234, 1, 260}, {1249, 1, 260}, {1147, 1, 261}, {1149, 1, 261}, {1148, 1, 262}, {1150, 1, 262}, {1224, 2, 0}, {1295, 2, 0}, { 394, 2, 1}, {1294, 2, 1}, { 715, 2, 2}, { 731, 2, 2}, { 570, 2, 3}, { 574, 2, 3}, { 438, 2, 4}, { 446, 2, 4}, { 199, 2, 5}, { 201, 2, 5}, { 606, 2, 6}, { 605, 2, 6}, { 186, 2, 7}, { 185, 2, 7}, {1162, 2, 8}, {1161, 2, 8}, {1193, 2, 9}, {1192, 2, 9}, { 463, 2, 10}, { 462, 2, 10}, { 240, 2, 11}, { 239, 2, 11}, { 579, 2, 12}, { 580, 2, 12}, { 577, 2, 13}, { 578, 2, 13}, { 957, 2, 14}, { 960, 2, 14}, {1179, 2, 15}, {1180, 2, 15}, {1186, 2, 16}, {1185, 2, 16}, { 688, 2, 17}, { 704, 2, 17}, { 789, 2, 18}, { 862, 2, 18}, {1104, 2, 19}, {1103, 2, 19}, {1194, 2, 20}, { 713, 2, 21}, { 714, 2, 21}, {1195, 2, 22}, {1196, 2, 22}, { 879, 2, 23}, { 884, 2, 23}, { 553, 2, 24}, { 552, 2, 24}, { 592, 2, 25}, { 591, 2, 25}, { 509, 2, 26}, { 508, 2, 
26}, { 347, 2, 27}, { 346, 2, 27}, { 285, 2, 28}, { 289, 2, 28}, { 937, 2, 29}, { 936, 2, 29}, {1066, 2, 30}, {1067, 2, 30}, { 698, 2, 31}, { 700, 2, 31}, { 871, 2, 32}, { 870, 2, 32}, { 617, 2, 33}, { 616, 2, 33}, { 690, 2, 34}, { 681, 2, 34}, { 256, 2, 35}, { 255, 2, 35}, { 590, 2, 36}, { 599, 2, 36}, {1276, 2, 37}, {1277, 2, 37}, { 944, 2, 38}, { 661, 2, 38}, { 561, 2, 39}, { 560, 2, 39}, { 461, 2, 40}, { 481, 2, 40}, { 644, 2, 41}, {1288, 2, 41}, {1038, 2, 41}, {1165, 2, 42}, {1191, 2, 42}, { 601, 2, 43}, { 600, 2, 43}, { 277, 2, 44}, { 276, 2, 44}, {1167, 2, 45}, {1166, 2, 45}, { 750, 2, 46}, { 749, 2, 46}, {1170, 2, 47}, {1177, 2, 47}, { 754, 2, 48}, { 752, 2, 48}, {1218, 2, 49}, {1217, 2, 49}, {1093, 2, 50}, {1094, 2, 50}, { 962, 2, 51}, { 961, 2, 51}, { 436, 2, 52}, { 423, 2, 52}, { 268, 2, 53}, { 267, 2, 53}, { 275, 2, 54}, { 274, 2, 54}, { 418, 2, 55}, { 417, 2, 55}, {1037, 2, 55}, { 899, 2, 56}, {1178, 2, 56}, { 558, 2, 57}, { 557, 2, 57}, {1197, 2, 58}, {1190, 2, 58}, {1159, 2, 59}, {1158, 2, 59}, { 947, 2, 60}, {1268, 2, 60}, { 697, 2, 61}, { 696, 2, 61}, { 223, 2, 62}, { 222, 2, 62}, { 426, 2, 63}, {1269, 2, 63}, {1007, 2, 64}, {1006, 2, 64}, {1001, 2, 65}, {1000, 2, 65}, { 902, 2, 66}, { 903, 2, 66}, {1130, 2, 67}, {1129, 2, 67}, { 744, 2, 68}, { 743, 2, 68}, { 941, 2, 69}, { 942, 2, 69}, {1230, 2, 70}, {1231, 2, 70}, {1077, 2, 71}, {1076, 2, 71}, { 694, 2, 72}, { 680, 2, 72}, {1050, 2, 73}, {1059, 2, 73}, { 780, 2, 74}, { 779, 2, 74}, { 292, 2, 75}, { 291, 2, 75}, { 782, 2, 76}, { 781, 2, 76}, { 340, 2, 77}, {1171, 2, 78}, { 712, 2, 78}, {1172, 2, 79}, {1181, 2, 79}, { 217, 2, 80}, { 218, 2, 80}, { 491, 2, 81}, { 490, 2, 81}, {1073, 2, 82}, {1074, 2, 82}, { 760, 2, 83}, { 225, 2, 84}, { 224, 2, 84}, { 666, 2, 85}, { 665, 2, 85}, { 835, 2, 86}, { 874, 2, 86}, { 638, 2, 87}, { 200, 2, 87}, { 948, 2, 88}, {1075, 2, 88}, { 651, 2, 89}, {1030, 2, 89}, { 650, 2, 90}, {1004, 2, 90}, { 949, 2, 91}, { 958, 2, 91}, { 679, 2, 92}, { 706, 2, 92}, { 231, 2, 93}, 
{ 232, 2, 93}, { 265, 2, 94}, { 264, 2, 94}, { 791, 2, 95}, { 790, 2, 95}, { 339, 2, 96}, { 283, 2, 96}, { 842, 2, 97}, { 840, 2, 97}, { 843, 2, 98}, { 841, 2, 98}, { 844, 2, 99}, {1014, 2, 99}, {1092, 2, 100}, {1097, 2, 100}, {1115, 2, 101}, {1114, 2, 101}, {1176, 2, 102}, {1175, 2, 102}, { 297, 2, 103}, { 159, 2, 103}, { 230, 2, 104}, { 229, 2, 104}, { 485, 2, 105}, { 484, 2, 105}, { 493, 2, 106}, { 492, 2, 106}, { 563, 2, 107}, { 562, 2, 107}, { 983, 2, 108}, { 620, 2, 108}, { 702, 2, 109}, { 701, 2, 109}, { 753, 2, 110}, { 751, 2, 110}, { 785, 2, 111}, { 786, 2, 111}, { 794, 2, 112}, { 793, 2, 112}, { 839, 2, 113}, { 838, 2, 113}, { 864, 2, 114}, { 872, 2, 115}, { 873, 2, 115}, { 945, 2, 116}, { 891, 2, 116}, { 888, 2, 117}, { 894, 2, 117}, { 985, 2, 118}, { 984, 2, 118}, { 992, 2, 119}, { 991, 2, 119}, { 946, 2, 120}, { 998, 2, 120}, {1032, 2, 121}, {1005, 2, 121}, {1099, 2, 122}, {1098, 2, 122}, { 703, 2, 123}, {1101, 2, 123}, {1199, 2, 124}, {1198, 2, 124}, {1252, 2, 125}, {1251, 2, 125}, { 161, 2, 126}, { 178, 2, 127}, { 619, 2, 127}, { 603, 2, 128}, { 602, 2, 128}, { 876, 2, 129}, { 875, 2, 129}, { 943, 2, 130}, { 623, 2, 130}, {1100, 2, 131}, {1091, 2, 131}, { 692, 2, 132}, { 621, 2, 132}, { 963, 3, 0}, {1270, 3, 0}, { 479, 3, 1}, { 480, 3, 1}, {1102, 3, 2}, {1122, 3, 2}, { 607, 3, 3}, { 618, 3, 3}, { 424, 3, 4}, { 748, 3, 5}, { 898, 3, 6}, { 904, 3, 6}, { 522, 3, 7}, {1047, 3, 8}, {1052, 3, 8}, { 537, 3, 9}, { 535, 3, 9}, { 690, 3, 10}, { 677, 3, 10}, { 169, 3, 11}, { 734, 3, 11}, { 845, 3, 12}, { 861, 3, 12}, { 846, 3, 13}, { 863, 3, 13}, { 847, 3, 14}, { 829, 3, 14}, { 927, 3, 15}, { 922, 3, 15}, { 524, 3, 16}, { 519, 3, 16}, { 963, 4, 0}, {1270, 4, 0}, { 424, 4, 1}, { 748, 4, 2}, { 415, 4, 3}, { 383, 4, 3}, { 522, 4, 4}, { 519, 4, 4}, {1047, 4, 5}, {1052, 4, 5}, {1119, 4, 6}, {1107, 4, 6}, { 708, 4, 7}, {1229, 4, 8}, {1164, 4, 9}, { 775, 4, 10}, { 777, 4, 11}, {1026, 4, 12}, {1023, 4, 12}, { 963, 5, 0}, {1270, 5, 0}, { 424, 5, 1}, { 748, 5, 2}, { 522, 
5, 3}, { 519, 5, 3}, {1088, 5, 4}, {1083, 5, 4}, { 537, 5, 5}, { 535, 5, 5}, {1116, 5, 6}, { 766, 5, 7}, { 763, 5, 7}, {1226, 5, 8}, {1225, 5, 8}, { 950, 5, 9}, { 734, 5, 9}, { 927, 5, 10}, { 922, 5, 10}, { 211, 5, 11}, { 206, 5, 11}, {1126, 5, 12}, {1125, 5, 12}, { 379, 5, 13}, { 378, 5, 13}, {1080, 5, 14}, {1079, 5, 14}, { 905, 6, 0}, { 885, 6, 0}, { 525, 6, 0}, { 526, 6, 0}, {1275, 6, 1}, {1271, 6, 1}, {1164, 6, 1}, {1213, 6, 1}, { 916, 7, 0}, { 887, 7, 0}, { 735, 7, 1}, { 708, 7, 1}, {1246, 7, 2}, {1229, 7, 2}, {1209, 7, 3}, {1164, 7, 3}, { 776, 7, 4}, { 775, 7, 4}, { 778, 7, 5}, { 777, 7, 5}, { 739, 8, 0}, { 708, 8, 0}, {1055, 8, 1}, {1045, 8, 1}, { 516, 8, 2}, { 495, 8, 2}, { 517, 8, 3}, { 506, 8, 3}, { 518, 8, 4}, { 507, 8, 4}, { 192, 8, 5}, { 177, 8, 5}, { 396, 8, 6}, { 425, 8, 6}, { 986, 8, 7}, { 219, 8, 7}, {1085, 8, 8}, {1068, 8, 8}, {1255, 8, 9}, {1261, 8, 9}, { 972, 8, 10}, { 953, 8, 10}, { 261, 8, 11}, { 254, 8, 11}, { 913, 8, 12}, { 920, 8, 12}, { 189, 8, 13}, { 164, 8, 13}, { 742, 8, 14}, { 772, 8, 14}, {1058, 8, 15}, {1062, 8, 15}, { 740, 8, 16}, { 770, 8, 16}, {1056, 8, 17}, {1060, 8, 17}, {1016, 8, 18}, { 995, 8, 18}, { 741, 8, 19}, { 771, 8, 19}, {1057, 8, 20}, {1061, 8, 20}, { 534, 8, 21}, { 540, 8, 21}, {1017, 8, 22}, { 996, 8, 22}, { 917, 9, 0}, { 1, 9, 0}, { 918, 9, 0}, { 979, 9, 1}, { 2, 9, 1}, { 978, 9, 1}, { 923, 9, 2}, { 130, 9, 2}, { 901, 9, 2}, { 684, 9, 3}, { 139, 9, 3}, { 707, 9, 3}, {1240, 9, 4}, { 146, 9, 4}, {1247, 9, 4}, { 301, 9, 5}, { 14, 9, 5}, { 304, 9, 6}, { 25, 9, 6}, { 306, 9, 7}, { 29, 9, 7}, { 309, 9, 8}, { 32, 9, 8}, { 313, 9, 9}, { 37, 9, 9}, { 314, 9, 10}, { 38, 9, 10}, { 315, 9, 11}, { 40, 9, 11}, { 316, 9, 12}, { 41, 9, 12}, { 317, 9, 13}, { 43, 9, 13}, { 318, 9, 14}, { 44, 9, 14}, { 319, 9, 15}, { 48, 9, 15}, { 320, 9, 16}, { 54, 9, 16}, { 321, 9, 17}, { 59, 9, 17}, { 322, 9, 18}, { 65, 9, 18}, { 323, 9, 19}, { 70, 9, 19}, { 324, 9, 20}, { 72, 9, 20}, { 325, 9, 21}, { 73, 9, 21}, { 326, 9, 22}, { 74, 9, 22}, { 327, 
9, 23}, { 75, 9, 23}, { 328, 9, 24}, { 76, 9, 24}, { 329, 9, 25}, { 83, 9, 25}, { 330, 9, 26}, { 88, 9, 26}, { 331, 9, 27}, { 89, 9, 27}, { 332, 9, 28}, { 90, 9, 28}, { 333, 9, 29}, { 91, 9, 29}, { 334, 9, 30}, { 92, 9, 30}, { 335, 9, 31}, { 93, 9, 31}, { 336, 9, 32}, { 145, 9, 32}, { 337, 9, 33}, { 153, 9, 33}, { 302, 9, 34}, { 23, 9, 34}, { 303, 9, 35}, { 24, 9, 35}, { 305, 9, 36}, { 28, 9, 36}, { 307, 9, 37}, { 30, 9, 37}, { 308, 9, 38}, { 31, 9, 38}, { 310, 9, 39}, { 34, 9, 39}, { 311, 9, 40}, { 35, 9, 40}, { 214, 9, 41}, { 53, 9, 41}, { 209, 9, 41}, { 212, 9, 42}, { 55, 9, 42}, { 207, 9, 42}, { 213, 9, 43}, { 56, 9, 43}, { 208, 9, 43}, { 237, 9, 44}, { 58, 9, 44}, { 249, 9, 44}, { 236, 9, 45}, { 60, 9, 45}, { 219, 9, 45}, { 238, 9, 46}, { 61, 9, 46}, { 263, 9, 46}, { 736, 9, 47}, { 62, 9, 47}, { 708, 9, 47}, {1053, 9, 48}, { 63, 9, 48}, {1045, 9, 48}, { 156, 9, 49}, { 64, 9, 49}, { 164, 9, 49}, { 155, 9, 50}, { 66, 9, 50}, { 154, 9, 50}, { 157, 9, 51}, { 67, 9, 51}, { 184, 9, 51}, { 478, 9, 52}, { 68, 9, 52}, { 453, 9, 52}, { 477, 9, 53}, { 69, 9, 53}, { 448, 9, 53}, { 655, 9, 54}, { 71, 9, 54}, { 658, 9, 54}, { 312, 9, 55}, { 36, 9, 55}, { 215, 9, 56}, { 49, 9, 56}, { 210, 9, 56}, { 910, 10, 0}, { 287, 10, 1}, { 284, 10, 1}, { 397, 10, 2}, { 386, 10, 2}, { 536, 10, 3}, { 907, 10, 4}, { 893, 10, 4}, { 646, 10, 5}, { 645, 10, 5}, { 833, 10, 6}, { 832, 10, 6}, { 531, 10, 7}, { 530, 10, 7}, { 660, 10, 8}, { 659, 10, 8}, { 351, 10, 9}, { 496, 10, 9}, {1137, 10, 10}, {1133, 10, 10}, {1128, 10, 11}, {1238, 10, 12}, {1237, 10, 12}, {1256, 10, 13}, { 892, 10, 14}, { 890, 10, 14}, {1108, 10, 15}, {1111, 10, 15}, {1124, 10, 16}, {1123, 10, 16}, { 539, 10, 17}, { 538, 10, 17}, { 897, 11, 0}, { 885, 11, 0}, { 176, 11, 1}, { 154, 11, 1}, { 587, 11, 2}, { 581, 11, 2}, {1256, 11, 3}, {1250, 11, 3}, { 541, 11, 4}, { 525, 11, 4}, { 892, 11, 5}, { 887, 11, 5}, { 908, 12, 0}, { 163, 12, 1}, { 165, 12, 2}, { 168, 12, 3}, { 235, 12, 4}, { 241, 12, 5}, { 449, 12, 6}, { 450, 12, 7}, 
{ 486, 12, 8}, { 529, 12, 9}, { 533, 12, 10}, { 542, 12, 11}, { 543, 12, 12}, { 584, 12, 13}, { 589, 12, 14}, {1184, 12, 14}, { 604, 12, 15}, { 608, 12, 16}, { 609, 12, 17}, { 610, 12, 18}, { 678, 12, 19}, { 689, 12, 20}, { 705, 12, 21}, { 710, 12, 22}, { 711, 12, 23}, { 834, 12, 24}, { 848, 12, 25}, { 915, 12, 26}, { 930, 12, 27}, { 997, 12, 28}, {1039, 12, 29}, {1040, 12, 30}, {1049, 12, 31}, {1051, 12, 32}, {1071, 12, 33}, {1072, 12, 34}, {1084, 12, 35}, {1086, 12, 36}, {1095, 12, 37}, {1155, 12, 38}, {1169, 12, 39}, {1182, 12, 40}, {1183, 12, 41}, {1189, 12, 42}, {1253, 12, 43}, {1163, 12, 44}, {1272, 12, 45}, {1273, 12, 46}, {1274, 12, 47}, {1282, 12, 48}, {1283, 12, 49}, {1286, 12, 50}, {1287, 12, 51}, { 695, 12, 52}, { 528, 12, 53}, { 278, 12, 54}, { 527, 12, 55}, { 932, 12, 56}, {1063, 12, 57}, {1127, 12, 58}, { 795, 12, 59}, { 796, 12, 60}, { 797, 12, 61}, { 798, 12, 62}, { 799, 12, 63}, { 800, 12, 64}, { 801, 12, 65}, { 802, 12, 66}, { 803, 12, 67}, { 804, 12, 68}, { 805, 12, 69}, { 806, 12, 70}, { 807, 12, 71}, { 808, 12, 72}, { 809, 12, 73}, { 810, 12, 74}, { 811, 12, 75}, { 812, 12, 76}, { 813, 12, 77}, { 814, 12, 78}, { 815, 12, 79}, { 816, 12, 80}, { 817, 12, 81}, { 818, 12, 82}, { 819, 12, 83}, { 820, 12, 84}, { 821, 12, 85}, { 912, 13, 0}, {1214, 13, 0}, { 670, 13, 1}, { 281, 13, 1}, { 483, 13, 2}, { 447, 13, 2}, {1054, 13, 3}, {1045, 13, 3}, { 738, 13, 4}, { 708, 13, 4}, {1210, 13, 5}, {1164, 13, 5}, {1224, 14, 0}, {1270, 14, 0}, { 955, 14, 1}, { 954, 14, 1}, { 381, 14, 2}, { 378, 14, 2}, {1043, 14, 3}, {1042, 14, 3}, { 559, 14, 4}, { 556, 14, 4}, { 914, 14, 5}, { 919, 14, 5}, { 520, 14, 6}, { 519, 14, 6}, { 273, 14, 7}, {1156, 14, 7}, { 643, 14, 8}, { 658, 14, 8}, {1025, 14, 9}, {1024, 14, 9}, {1022, 14, 10}, {1015, 14, 10}, { 927, 14, 11}, { 922, 14, 11}, { 172, 14, 12}, { 164, 14, 12}, { 630, 14, 13}, { 626, 14, 13}, { 652, 14, 14}, { 639, 14, 14}, { 653, 14, 14}, { 625, 14, 15}, { 624, 14, 15}, { 392, 14, 16}, { 382, 14, 16}, { 271, 14, 17}, { 
233, 14, 17}, { 270, 14, 18}, { 221, 14, 18}, {1117, 14, 19}, {1116, 14, 19}, { 792, 14, 20}, { 248, 14, 20}, { 293, 14, 21}, { 424, 14, 21}, { 758, 14, 22}, { 748, 14, 22}, { 414, 14, 23}, { 298, 14, 23}, { 399, 14, 24}, {1070, 14, 24}, { 176, 14, 25}, { 162, 14, 25}, { 272, 14, 26}, { 220, 14, 26}, {1153, 14, 27}, {1090, 14, 27}, {1293, 14, 28}, {1292, 14, 28}, { 900, 14, 29}, { 904, 14, 29}, {1260, 14, 30}, {1257, 14, 30}, { 668, 14, 31}, { 676, 14, 32}, { 675, 14, 33}, { 582, 14, 34}, { 583, 14, 35}, { 380, 14, 36}, { 422, 14, 36}, { 607, 14, 37}, { 618, 14, 37}, { 400, 14, 38}, { 352, 14, 38}, {1047, 14, 39}, {1052, 14, 39}, { 910, 15, 0}, { 927, 15, 1}, { 922, 15, 1}, { 473, 15, 2}, { 466, 15, 2}, { 455, 15, 3}, { 454, 15, 3}, { 889, 16, 0}, { 0, 16, 1}, { 1, 16, 2}, { 5, 16, 3}, { 4, 16, 4}, { 3, 16, 5}, { 13, 16, 6}, { 12, 16, 7}, { 11, 16, 8}, { 10, 16, 9}, { 78, 16, 10}, { 9, 16, 11}, { 8, 16, 12}, { 7, 16, 13}, { 82, 16, 14}, { 47, 16, 15}, { 115, 16, 16}, { 6, 16, 17}, { 131, 16, 18}, { 81, 16, 19}, { 118, 16, 20}, { 46, 16, 21}, { 80, 16, 22}, { 98, 16, 23}, { 117, 16, 24}, { 133, 16, 25}, { 26, 16, 26}, { 2, 16, 27}, { 79, 16, 28}, { 45, 16, 29}, { 116, 16, 30}, { 77, 16, 31}, { 132, 16, 32}, { 97, 16, 33}, { 147, 16, 34}, { 114, 16, 35}, { 27, 16, 36}, { 124, 16, 37}, { 33, 16, 38}, { 130, 16, 39}, { 39, 16, 40}, { 139, 16, 41}, { 42, 16, 42}, { 146, 16, 43}, { 14, 16, 44}, { 25, 16, 45}, { 29, 16, 46}, { 32, 16, 47}, { 37, 16, 48}, { 38, 16, 49}, { 40, 16, 50}, { 41, 16, 51}, { 43, 16, 52}, { 44, 16, 53}, { 48, 16, 54}, { 54, 16, 55}, { 59, 16, 56}, { 65, 16, 57}, { 70, 16, 58}, { 72, 16, 59}, { 73, 16, 60}, { 74, 16, 61}, { 75, 16, 62}, { 76, 16, 63}, { 83, 16, 64}, { 88, 16, 65}, { 89, 16, 66}, { 90, 16, 67}, { 91, 16, 68}, { 92, 16, 69}, { 93, 16, 70}, { 94, 16, 71}, { 95, 16, 72}, { 96, 16, 73}, { 99, 16, 74}, { 104, 16, 75}, { 105, 16, 76}, { 106, 16, 77}, { 108, 16, 78}, { 109, 16, 79}, { 110, 16, 80}, { 111, 16, 81}, { 112, 16, 82}, { 113, 
16, 83}, { 119, 16, 84}, { 125, 16, 85}, { 134, 16, 86}, { 140, 16, 87}, { 148, 16, 88}, { 15, 16, 89}, { 49, 16, 90}, { 84, 16, 91}, { 100, 16, 92}, { 120, 16, 93}, { 126, 16, 94}, { 135, 16, 95}, { 141, 16, 96}, { 149, 16, 97}, { 16, 16, 98}, { 50, 16, 99}, { 85, 16, 100}, { 101, 16, 101}, { 121, 16, 102}, { 127, 16, 103}, { 136, 16, 104}, { 142, 16, 105}, { 150, 16, 106}, { 17, 16, 107}, { 51, 16, 108}, { 86, 16, 109}, { 102, 16, 110}, { 122, 16, 111}, { 128, 16, 112}, { 137, 16, 113}, { 143, 16, 114}, { 151, 16, 115}, { 18, 16, 116}, { 52, 16, 117}, { 57, 16, 118}, { 87, 16, 119}, { 103, 16, 120}, { 107, 16, 121}, { 123, 16, 122}, { 129, 16, 123}, { 138, 16, 124}, { 144, 16, 125}, { 152, 16, 126}, { 19, 16, 127}, { 20, 16, 128}, { 21, 16, 129}, { 22, 16, 130}, { 887, 17, 0}, {1053, 17, 1}, { 736, 17, 2}, {1242, 17, 3}, { 737, 17, 4}, {1203, 17, 5}, { 259, 17, 6}, {1204, 17, 7}, {1208, 17, 8}, {1206, 17, 9}, {1207, 17, 10}, { 260, 17, 11}, {1205, 17, 12}, { 980, 17, 13}, { 963, 18, 0}, { 247, 18, 1}, {1241, 18, 2}, { 216, 18, 3}, { 923, 18, 4}, {1240, 18, 5}, {1036, 18, 6}, { 654, 18, 7}, {1245, 18, 8}, {1244, 18, 9}, {1243, 18, 10}, { 408, 18, 11}, { 402, 18, 12}, { 403, 18, 13}, { 413, 18, 14}, { 410, 18, 15}, { 409, 18, 16}, { 412, 18, 17}, { 411, 18, 18}, { 407, 18, 19}, { 404, 18, 20}, { 405, 18, 21}, { 869, 18, 22}, {1201, 18, 23}, {1202, 18, 24}, { 546, 18, 25}, { 290, 18, 26}, {1048, 18, 27}, {1157, 18, 28}, { 406, 18, 29}, { 911, 18, 30}, { 672, 18, 31}, { 926, 18, 32}, { 924, 18, 33}, { 266, 18, 34}, }; /* property values: 5648 bytes. */ /* Codepoints which expand on full case-folding. 
*/ RE_UINT16 re_expand_on_folding[] = { 223, 304, 329, 496, 912, 944, 1415, 7830, 7831, 7832, 7833, 7834, 7838, 8016, 8018, 8020, 8022, 8064, 8065, 8066, 8067, 8068, 8069, 8070, 8071, 8072, 8073, 8074, 8075, 8076, 8077, 8078, 8079, 8080, 8081, 8082, 8083, 8084, 8085, 8086, 8087, 8088, 8089, 8090, 8091, 8092, 8093, 8094, 8095, 8096, 8097, 8098, 8099, 8100, 8101, 8102, 8103, 8104, 8105, 8106, 8107, 8108, 8109, 8110, 8111, 8114, 8115, 8116, 8118, 8119, 8124, 8130, 8131, 8132, 8134, 8135, 8140, 8146, 8147, 8150, 8151, 8162, 8163, 8164, 8166, 8167, 8178, 8179, 8180, 8182, 8183, 8188, 64256, 64257, 64258, 64259, 64260, 64261, 64262, 64275, 64276, 64277, 64278, 64279, }; /* expand_on_folding: 208 bytes. */ /* General_Category. */ static RE_UINT8 re_general_category_stage_1[] = { 0, 1, 2, 3, 4, 5, 6, 7, 7, 8, 7, 7, 7, 7, 7, 7, 7, 7, 7, 9, 10, 11, 7, 7, 7, 7, 12, 13, 14, 14, 14, 15, 16, 17, 18, 19, 20, 21, 22, 21, 23, 21, 21, 21, 21, 24, 21, 21, 21, 21, 21, 21, 21, 21, 25, 26, 21, 21, 27, 28, 21, 29, 30, 31, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 32, 7, 33, 34, 7, 35, 21, 21, 21, 21, 21, 36, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 
21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 37, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 38, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 38, }; static RE_UINT8 re_general_category_stage_2[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 34, 35, 36, 37, 38, 39, 34, 34, 34, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 64, 65, 66, 67, 68, 69, 70, 71, 69, 72, 73, 69, 69, 64, 74, 64, 64, 75, 76, 77, 78, 79, 80, 81, 82, 69, 83, 84, 85, 86, 87, 88, 89, 69, 69, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 90, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 91, 92, 34, 34, 34, 34, 34, 34, 34, 34, 93, 34, 34, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 106, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 34, 34, 109, 110, 111, 112, 34, 
34, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 123, 34, 34, 130, 123, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 123, 123, 141, 123, 123, 123, 142, 143, 144, 145, 146, 147, 148, 123, 123, 149, 123, 150, 151, 152, 153, 123, 123, 154, 123, 123, 123, 155, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 34, 34, 34, 34, 34, 34, 34, 156, 157, 34, 158, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 34, 34, 34, 34, 34, 34, 34, 34, 159, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 34, 34, 34, 34, 160, 123, 123, 123, 34, 34, 34, 34, 161, 162, 163, 164, 123, 123, 123, 123, 123, 123, 165, 166, 167, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 168, 169, 123, 123, 123, 123, 123, 123, 69, 170, 171, 172, 173, 123, 174, 123, 175, 176, 177, 178, 179, 180, 181, 182, 69, 69, 69, 69, 183, 184, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 34, 185, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 186, 187, 123, 123, 188, 189, 190, 191, 192, 123, 69, 193, 69, 69, 194, 195, 69, 196, 197, 198, 199, 200, 201, 202, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 203, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 204, 34, 205, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 206, 123, 123, 34, 34, 34, 34, 207, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 208, 123, 209, 210, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 211, }; static RE_UINT16 re_general_category_stage_3[] = { 0, 0, 1, 2, 3, 4, 5, 6, 0, 0, 7, 8, 9, 10, 11, 12, 13, 13, 13, 14, 15, 13, 13, 16, 17, 18, 19, 20, 21, 22, 13, 23, 13, 13, 13, 24, 25, 11, 11, 11, 11, 26, 11, 27, 28, 
29, 30, 31, 32, 32, 32, 32, 32, 32, 32, 33, 34, 35, 36, 11, 37, 38, 13, 39, 9, 9, 9, 11, 11, 11, 13, 13, 40, 13, 13, 13, 41, 13, 13, 13, 13, 13, 13, 42, 9, 43, 44, 11, 45, 46, 32, 47, 48, 49, 50, 51, 52, 53, 49, 49, 54, 32, 55, 56, 49, 49, 49, 49, 49, 57, 58, 59, 60, 61, 49, 32, 62, 49, 49, 49, 49, 49, 63, 64, 65, 49, 66, 67, 49, 68, 69, 70, 49, 71, 72, 72, 72, 72, 49, 73, 72, 72, 74, 32, 75, 49, 49, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 82, 83, 90, 91, 92, 93, 94, 95, 96, 83, 97, 98, 99, 87, 100, 101, 82, 83, 102, 103, 104, 87, 105, 106, 107, 108, 109, 110, 111, 93, 112, 113, 114, 83, 115, 116, 117, 87, 118, 119, 114, 83, 120, 121, 122, 87, 123, 119, 114, 49, 124, 125, 126, 87, 127, 128, 129, 49, 130, 131, 132, 93, 133, 134, 49, 49, 135, 136, 137, 72, 72, 138, 139, 140, 141, 142, 143, 72, 72, 144, 145, 146, 147, 148, 49, 149, 150, 151, 152, 32, 153, 154, 155, 72, 72, 49, 49, 156, 157, 158, 159, 160, 161, 162, 163, 9, 9, 164, 49, 49, 165, 49, 49, 49, 49, 49, 49, 49, 49, 49, 49, 49, 49, 166, 167, 49, 49, 166, 49, 49, 168, 169, 170, 49, 49, 49, 169, 49, 49, 49, 171, 172, 173, 49, 174, 9, 9, 9, 9, 9, 175, 176, 49, 49, 49, 49, 49, 49, 49, 49, 49, 49, 49, 49, 49, 177, 49, 178, 179, 49, 49, 49, 49, 180, 181, 182, 183, 49, 184, 49, 185, 182, 186, 49, 49, 49, 187, 188, 189, 190, 191, 192, 190, 49, 49, 193, 49, 49, 194, 49, 49, 195, 49, 49, 49, 49, 196, 49, 197, 198, 199, 200, 49, 201, 73, 49, 49, 202, 49, 203, 204, 205, 205, 49, 206, 49, 49, 49, 207, 208, 209, 190, 190, 210, 211, 72, 72, 72, 72, 212, 49, 49, 213, 214, 158, 215, 216, 217, 49, 218, 65, 49, 49, 219, 220, 49, 49, 221, 222, 223, 65, 49, 224, 72, 72, 72, 72, 225, 226, 227, 228, 11, 11, 229, 27, 27, 27, 230, 231, 11, 232, 27, 27, 32, 32, 32, 233, 13, 13, 13, 13, 13, 13, 13, 13, 13, 234, 13, 13, 13, 13, 13, 13, 235, 236, 235, 235, 236, 237, 235, 238, 239, 239, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 72, 257, 258, 259, 260, 261, 262, 263, 264, 265, 
266, 266, 267, 268, 269, 205, 270, 271, 205, 272, 273, 273, 273, 273, 273, 273, 273, 273, 274, 205, 275, 205, 205, 205, 205, 276, 205, 277, 273, 278, 205, 279, 280, 281, 205, 205, 282, 72, 281, 72, 265, 265, 265, 283, 205, 205, 205, 205, 284, 265, 205, 205, 205, 205, 205, 205, 205, 205, 205, 205, 205, 285, 286, 205, 205, 287, 205, 205, 205, 205, 205, 205, 288, 205, 205, 205, 205, 205, 205, 205, 289, 290, 265, 291, 205, 205, 292, 273, 293, 273, 294, 295, 273, 273, 273, 296, 273, 297, 205, 205, 205, 273, 298, 205, 205, 299, 205, 300, 205, 301, 302, 303, 304, 72, 9, 9, 305, 11, 11, 306, 307, 308, 13, 13, 13, 13, 13, 13, 309, 310, 11, 11, 311, 49, 49, 49, 312, 313, 49, 314, 315, 315, 315, 315, 32, 32, 316, 317, 318, 319, 320, 72, 72, 72, 205, 321, 205, 205, 205, 205, 205, 322, 205, 205, 205, 205, 205, 323, 72, 324, 325, 326, 327, 328, 134, 49, 49, 49, 49, 329, 176, 49, 49, 49, 49, 330, 331, 49, 201, 134, 49, 49, 49, 49, 197, 332, 49, 50, 205, 205, 322, 49, 205, 333, 334, 205, 335, 336, 205, 205, 334, 205, 205, 336, 205, 205, 205, 333, 49, 49, 49, 196, 205, 205, 205, 205, 49, 49, 49, 49, 49, 196, 72, 72, 49, 337, 49, 49, 49, 49, 49, 49, 149, 205, 205, 205, 282, 49, 49, 224, 338, 49, 339, 72, 13, 13, 340, 341, 13, 342, 49, 49, 49, 49, 343, 344, 31, 345, 346, 347, 13, 13, 13, 348, 349, 350, 351, 352, 72, 72, 72, 353, 354, 49, 355, 356, 49, 49, 49, 357, 358, 49, 49, 359, 360, 190, 32, 361, 65, 49, 362, 49, 363, 364, 49, 149, 75, 49, 49, 365, 366, 367, 368, 369, 49, 49, 370, 371, 372, 373, 49, 374, 49, 49, 49, 375, 376, 377, 378, 379, 380, 381, 315, 11, 11, 382, 383, 11, 11, 11, 11, 11, 49, 49, 384, 190, 49, 49, 385, 49, 386, 49, 49, 202, 387, 387, 387, 387, 387, 387, 387, 387, 388, 388, 388, 388, 388, 388, 388, 388, 49, 49, 49, 49, 49, 49, 201, 49, 49, 49, 49, 49, 49, 203, 72, 72, 389, 390, 391, 392, 393, 49, 49, 49, 49, 49, 49, 394, 395, 396, 49, 49, 49, 49, 49, 397, 72, 49, 49, 49, 49, 398, 49, 49, 194, 72, 72, 399, 32, 400, 32, 401, 402, 403, 404, 405, 49, 49, 49, 49, 
49, 49, 49, 406, 407, 2, 3, 4, 5, 408, 409, 410, 49, 411, 49, 197, 412, 413, 414, 415, 416, 49, 170, 417, 201, 201, 72, 72, 49, 49, 49, 49, 49, 49, 49, 50, 418, 265, 265, 419, 266, 266, 266, 420, 421, 324, 422, 72, 72, 205, 205, 423, 72, 72, 72, 72, 72, 72, 72, 72, 49, 149, 49, 49, 49, 99, 424, 425, 49, 49, 426, 49, 427, 49, 49, 428, 49, 429, 49, 49, 430, 431, 72, 72, 9, 9, 432, 11, 11, 49, 49, 49, 49, 201, 190, 72, 72, 72, 72, 72, 49, 49, 194, 49, 49, 49, 433, 72, 49, 49, 49, 314, 49, 196, 194, 72, 434, 49, 49, 435, 49, 436, 49, 437, 49, 197, 438, 72, 72, 72, 49, 439, 49, 440, 49, 441, 72, 72, 72, 72, 49, 49, 49, 442, 265, 443, 265, 265, 444, 445, 49, 446, 447, 448, 49, 449, 49, 450, 72, 72, 451, 49, 452, 453, 49, 49, 49, 454, 49, 455, 49, 456, 49, 457, 458, 72, 72, 72, 72, 72, 49, 49, 49, 49, 459, 72, 72, 72, 9, 9, 9, 460, 11, 11, 11, 461, 72, 72, 72, 72, 72, 72, 265, 462, 463, 49, 49, 464, 465, 443, 466, 467, 217, 49, 49, 468, 469, 49, 459, 190, 470, 49, 471, 472, 473, 49, 49, 474, 217, 49, 49, 475, 476, 477, 478, 479, 49, 96, 480, 481, 72, 72, 72, 72, 482, 483, 484, 49, 49, 485, 486, 190, 487, 82, 83, 97, 488, 489, 490, 491, 49, 49, 49, 492, 493, 190, 72, 72, 49, 49, 494, 495, 496, 497, 72, 72, 49, 49, 49, 498, 499, 190, 72, 72, 49, 49, 500, 501, 190, 72, 72, 72, 49, 502, 503, 504, 72, 72, 72, 72, 72, 72, 9, 9, 11, 11, 146, 505, 72, 72, 72, 72, 49, 49, 49, 459, 49, 203, 72, 72, 72, 72, 72, 72, 266, 266, 266, 266, 266, 266, 506, 507, 49, 49, 49, 49, 385, 72, 72, 72, 49, 49, 197, 72, 72, 72, 72, 72, 49, 49, 49, 49, 314, 72, 72, 72, 49, 49, 49, 459, 49, 197, 367, 72, 72, 72, 72, 72, 72, 49, 201, 508, 49, 49, 49, 509, 510, 511, 512, 513, 49, 72, 72, 72, 72, 72, 72, 72, 49, 49, 49, 49, 73, 514, 515, 516, 467, 517, 72, 72, 72, 72, 72, 72, 518, 72, 72, 72, 72, 72, 72, 72, 49, 49, 49, 49, 49, 49, 50, 149, 459, 519, 520, 72, 72, 72, 72, 72, 205, 205, 205, 205, 205, 205, 205, 323, 205, 205, 521, 205, 205, 205, 522, 523, 524, 205, 525, 205, 205, 205, 526, 72, 205, 205, 
205, 205, 527, 72, 72, 72, 205, 205, 205, 205, 205, 282, 265, 528, 9, 529, 11, 530, 531, 532, 235, 9, 533, 534, 535, 536, 537, 9, 529, 11, 538, 539, 11, 540, 541, 542, 543, 9, 544, 11, 9, 529, 11, 530, 531, 11, 235, 9, 533, 543, 9, 544, 11, 9, 529, 11, 545, 9, 546, 547, 548, 549, 11, 550, 9, 551, 552, 553, 554, 11, 555, 9, 556, 11, 557, 558, 558, 558, 32, 32, 32, 559, 32, 32, 560, 561, 562, 563, 46, 72, 72, 72, 72, 72, 49, 49, 49, 49, 564, 565, 72, 72, 566, 49, 567, 568, 569, 570, 571, 572, 573, 202, 574, 202, 72, 72, 72, 575, 205, 205, 324, 205, 205, 205, 205, 205, 205, 322, 333, 576, 576, 576, 205, 323, 173, 205, 333, 205, 205, 205, 324, 205, 205, 281, 72, 72, 72, 72, 577, 205, 578, 205, 205, 281, 526, 303, 72, 72, 205, 205, 205, 205, 205, 205, 205, 579, 205, 205, 205, 205, 205, 205, 205, 321, 205, 205, 580, 205, 205, 205, 205, 205, 205, 205, 205, 205, 205, 422, 581, 322, 205, 205, 205, 205, 205, 205, 205, 322, 205, 205, 205, 205, 205, 582, 72, 72, 324, 205, 205, 205, 583, 174, 205, 205, 583, 205, 584, 72, 72, 72, 72, 72, 72, 526, 72, 72, 72, 72, 72, 72, 582, 72, 72, 72, 422, 72, 72, 72, 49, 49, 49, 49, 49, 314, 72, 72, 49, 49, 49, 73, 49, 49, 49, 49, 49, 201, 49, 49, 49, 49, 49, 49, 49, 49, 518, 72, 72, 72, 72, 72, 49, 201, 72, 72, 72, 72, 72, 72, 585, 72, 586, 586, 586, 586, 586, 586, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 72, 388, 388, 388, 388, 388, 388, 388, 587, }; static RE_UINT8 re_general_category_stage_4[] = { 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 2, 4, 5, 6, 2, 7, 7, 7, 7, 7, 2, 8, 9, 10, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 12, 13, 14, 15, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 17, 18, 19, 1, 20, 20, 21, 22, 23, 24, 25, 26, 27, 15, 2, 28, 29, 27, 30, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 31, 11, 11, 11, 32, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 33, 16, 16, 16, 16, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 34, 34, 34, 34, 34, 34, 34, 34, 16, 32, 32, 32, 32, 32, 32, 32, 11, 34, 34, 16, 34, 32, 32, 
11, 34, 11, 16, 11, 11, 34, 32, 11, 32, 16, 11, 34, 32, 32, 32, 11, 34, 16, 32, 11, 34, 11, 34, 34, 32, 35, 32, 16, 36, 36, 37, 34, 38, 37, 34, 34, 34, 34, 34, 34, 34, 34, 16, 32, 34, 38, 32, 11, 32, 32, 32, 32, 32, 32, 16, 16, 16, 11, 34, 32, 34, 34, 11, 32, 32, 32, 32, 32, 16, 16, 39, 16, 16, 16, 16, 16, 40, 40, 40, 40, 40, 40, 40, 40, 40, 41, 41, 40, 40, 40, 40, 40, 40, 41, 41, 41, 41, 41, 41, 41, 40, 40, 42, 41, 41, 41, 42, 42, 41, 41, 41, 41, 41, 41, 41, 41, 43, 43, 43, 43, 43, 43, 43, 43, 32, 32, 42, 32, 44, 45, 16, 10, 44, 44, 41, 46, 11, 47, 47, 11, 34, 11, 11, 11, 11, 11, 11, 11, 11, 48, 11, 11, 11, 11, 16, 16, 16, 16, 16, 16, 16, 16, 16, 34, 16, 11, 32, 16, 32, 32, 32, 32, 16, 16, 32, 49, 34, 32, 34, 11, 32, 50, 43, 43, 51, 32, 32, 32, 11, 34, 34, 34, 34, 34, 34, 16, 48, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 47, 52, 2, 2, 2, 53, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 54, 55, 56, 57, 58, 43, 43, 43, 43, 43, 43, 43, 43, 43, 43, 43, 43, 43, 43, 59, 60, 61, 43, 60, 44, 44, 44, 44, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 62, 44, 44, 36, 63, 64, 44, 44, 44, 44, 44, 65, 65, 65, 8, 9, 66, 2, 67, 43, 43, 43, 43, 43, 61, 68, 2, 69, 36, 36, 36, 36, 70, 43, 43, 7, 7, 7, 7, 7, 2, 2, 36, 71, 36, 36, 36, 36, 36, 36, 36, 36, 36, 72, 43, 43, 43, 73, 50, 43, 43, 74, 75, 76, 43, 43, 36, 7, 7, 7, 7, 7, 36, 77, 78, 2, 2, 2, 2, 2, 2, 2, 79, 70, 36, 36, 36, 36, 36, 36, 36, 43, 43, 43, 43, 43, 80, 81, 36, 36, 36, 36, 43, 43, 43, 43, 43, 71, 44, 44, 44, 44, 44, 44, 44, 7, 7, 7, 7, 7, 36, 36, 36, 36, 36, 36, 36, 36, 70, 43, 43, 43, 43, 40, 21, 2, 82, 44, 44, 36, 36, 36, 43, 43, 75, 43, 43, 43, 43, 75, 43, 75, 43, 43, 44, 2, 2, 2, 2, 2, 2, 2, 64, 36, 36, 36, 36, 70, 43, 44, 64, 44, 44, 44, 44, 44, 44, 44, 44, 36, 36, 62, 44, 44, 44, 44, 44, 44, 58, 43, 43, 43, 43, 43, 43, 43, 83, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 83, 71, 84, 85, 43, 43, 43, 83, 84, 85, 84, 70, 43, 43, 43, 36, 36, 36, 36, 36, 43, 2, 7, 7, 7, 7, 7, 86, 36, 36, 36, 36, 36, 36, 36, 70, 84, 
81, 36, 36, 36, 62, 81, 62, 81, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 62, 36, 36, 36, 62, 62, 44, 36, 36, 44, 71, 84, 85, 43, 80, 87, 88, 87, 85, 62, 44, 44, 44, 87, 44, 44, 36, 81, 36, 43, 44, 7, 7, 7, 7, 7, 36, 20, 27, 27, 27, 57, 44, 44, 58, 83, 81, 36, 36, 62, 44, 81, 62, 36, 81, 62, 36, 44, 80, 84, 85, 80, 44, 58, 80, 58, 43, 44, 58, 44, 44, 44, 81, 36, 62, 62, 44, 44, 44, 7, 7, 7, 7, 7, 43, 36, 70, 44, 44, 44, 44, 44, 58, 83, 81, 36, 36, 36, 36, 81, 36, 81, 36, 36, 36, 36, 36, 36, 62, 36, 81, 36, 36, 44, 71, 84, 85, 43, 43, 58, 83, 87, 85, 44, 62, 44, 44, 44, 44, 44, 44, 44, 66, 44, 44, 44, 81, 44, 44, 44, 58, 84, 81, 36, 36, 36, 62, 81, 62, 36, 81, 36, 36, 44, 71, 85, 85, 43, 80, 87, 88, 87, 85, 44, 44, 44, 44, 83, 44, 44, 36, 81, 78, 27, 27, 27, 44, 44, 44, 44, 44, 71, 81, 36, 36, 62, 44, 36, 62, 36, 36, 44, 81, 62, 62, 36, 44, 81, 62, 44, 36, 62, 44, 36, 36, 36, 36, 36, 36, 44, 44, 84, 83, 88, 44, 84, 88, 84, 85, 44, 62, 44, 44, 87, 44, 44, 44, 44, 27, 89, 67, 67, 57, 90, 44, 44, 83, 84, 81, 36, 36, 36, 62, 36, 62, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 44, 81, 43, 83, 84, 88, 43, 80, 43, 43, 44, 44, 44, 58, 80, 36, 62, 44, 44, 44, 44, 44, 44, 27, 27, 27, 89, 58, 84, 81, 36, 36, 36, 62, 36, 36, 36, 81, 36, 36, 44, 71, 85, 84, 84, 88, 83, 88, 84, 43, 44, 44, 44, 87, 88, 44, 44, 44, 62, 81, 62, 44, 44, 44, 44, 44, 44, 36, 36, 36, 36, 36, 62, 81, 84, 85, 43, 80, 84, 88, 84, 85, 62, 44, 44, 44, 87, 44, 44, 44, 81, 27, 27, 27, 44, 56, 36, 36, 36, 44, 84, 81, 36, 36, 36, 36, 36, 36, 36, 36, 62, 44, 36, 36, 36, 36, 81, 36, 36, 36, 36, 81, 44, 36, 36, 36, 62, 44, 80, 44, 87, 84, 43, 80, 80, 84, 84, 84, 84, 44, 84, 64, 44, 44, 44, 44, 44, 81, 36, 36, 36, 36, 36, 36, 36, 70, 36, 43, 43, 43, 80, 44, 91, 36, 36, 36, 75, 43, 43, 43, 61, 7, 7, 7, 7, 7, 2, 44, 44, 81, 62, 62, 81, 62, 62, 81, 44, 44, 44, 36, 36, 81, 36, 36, 36, 81, 36, 81, 81, 44, 36, 81, 36, 70, 36, 43, 43, 43, 58, 71, 44, 36, 36, 62, 82, 43, 43, 43, 44, 7, 7, 7, 7, 7, 44, 36, 36, 77, 67, 2, 
2, 2, 2, 2, 2, 2, 92, 92, 67, 43, 67, 67, 67, 7, 7, 7, 7, 7, 27, 27, 27, 27, 27, 50, 50, 50, 4, 4, 84, 36, 36, 36, 36, 81, 36, 36, 36, 36, 36, 36, 36, 36, 36, 62, 44, 58, 43, 43, 43, 43, 43, 43, 83, 43, 43, 61, 43, 36, 36, 70, 43, 43, 43, 43, 43, 58, 43, 43, 43, 43, 43, 43, 43, 43, 43, 80, 67, 67, 67, 67, 76, 67, 67, 90, 67, 2, 2, 92, 67, 21, 64, 44, 44, 36, 36, 36, 36, 36, 93, 85, 43, 83, 43, 43, 43, 85, 83, 85, 71, 7, 7, 7, 7, 7, 2, 2, 2, 36, 36, 36, 84, 43, 36, 36, 43, 71, 84, 94, 93, 84, 84, 84, 36, 70, 43, 71, 36, 36, 36, 36, 36, 36, 83, 85, 83, 84, 84, 85, 93, 7, 7, 7, 7, 7, 84, 85, 67, 11, 11, 11, 48, 44, 44, 48, 44, 36, 36, 36, 36, 36, 63, 69, 36, 36, 36, 36, 36, 62, 36, 36, 44, 36, 36, 36, 62, 62, 36, 36, 44, 62, 36, 36, 44, 36, 36, 36, 62, 62, 36, 36, 44, 36, 36, 36, 36, 36, 36, 36, 62, 36, 36, 36, 36, 36, 36, 36, 36, 36, 62, 58, 43, 2, 2, 2, 2, 95, 27, 27, 27, 27, 27, 27, 27, 27, 27, 96, 44, 67, 67, 67, 67, 67, 44, 44, 44, 11, 11, 11, 44, 16, 16, 16, 44, 97, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 63, 72, 98, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 99, 100, 44, 36, 36, 36, 36, 36, 63, 2, 101, 102, 36, 36, 36, 62, 44, 44, 44, 36, 36, 36, 36, 36, 36, 62, 36, 36, 43, 80, 44, 44, 44, 44, 44, 36, 43, 61, 64, 44, 44, 44, 44, 36, 43, 44, 44, 44, 44, 44, 44, 62, 43, 44, 44, 44, 44, 44, 44, 36, 36, 43, 85, 43, 43, 43, 84, 84, 84, 84, 83, 85, 43, 43, 43, 43, 43, 2, 86, 2, 66, 70, 44, 7, 7, 7, 7, 7, 44, 44, 44, 27, 27, 27, 27, 27, 44, 44, 44, 2, 2, 2, 103, 2, 60, 43, 68, 36, 104, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 44, 44, 44, 44, 36, 36, 36, 36, 70, 62, 44, 44, 36, 36, 36, 44, 44, 44, 44, 44, 36, 36, 36, 36, 36, 36, 36, 62, 43, 83, 84, 85, 83, 84, 44, 44, 84, 83, 84, 84, 85, 43, 44, 44, 90, 44, 2, 7, 7, 7, 7, 7, 36, 36, 36, 36, 36, 36, 36, 44, 36, 36, 36, 36, 36, 36, 44, 44, 36, 36, 36, 36, 36, 44, 44, 44, 7, 7, 7, 7, 7, 96, 44, 67, 67, 67, 67, 67, 67, 67, 67, 67, 36, 36, 36, 70, 83, 85, 44, 2, 36, 36, 93, 83, 43, 43, 43, 80, 83, 83, 85, 
43, 43, 43, 83, 84, 84, 85, 43, 43, 43, 43, 80, 58, 2, 2, 2, 86, 2, 2, 2, 44, 43, 43, 43, 43, 43, 43, 43, 105, 43, 43, 94, 36, 36, 36, 36, 36, 36, 36, 83, 43, 43, 83, 83, 84, 84, 83, 94, 36, 36, 36, 44, 44, 92, 67, 67, 67, 67, 50, 43, 43, 43, 43, 67, 67, 67, 67, 90, 44, 43, 94, 36, 36, 36, 36, 36, 36, 93, 43, 43, 84, 43, 85, 43, 36, 36, 36, 36, 83, 43, 84, 85, 85, 43, 84, 44, 44, 44, 44, 2, 2, 36, 36, 84, 84, 84, 84, 43, 43, 43, 43, 84, 43, 44, 54, 2, 2, 7, 7, 7, 7, 7, 44, 81, 36, 36, 36, 36, 36, 40, 40, 40, 2, 2, 2, 2, 2, 44, 44, 44, 44, 43, 61, 43, 43, 43, 43, 43, 43, 83, 43, 43, 43, 71, 36, 70, 36, 36, 84, 71, 62, 43, 44, 44, 44, 16, 16, 16, 16, 16, 16, 40, 40, 40, 40, 40, 40, 40, 45, 16, 16, 16, 16, 16, 16, 45, 16, 16, 16, 16, 16, 16, 16, 16, 106, 40, 40, 43, 43, 43, 44, 44, 44, 43, 43, 32, 32, 32, 16, 16, 16, 16, 32, 16, 16, 16, 16, 11, 11, 11, 11, 16, 16, 16, 44, 11, 11, 11, 44, 16, 16, 16, 16, 48, 48, 48, 48, 16, 16, 16, 16, 16, 16, 16, 44, 16, 16, 16, 16, 107, 107, 107, 107, 16, 16, 108, 16, 11, 11, 109, 110, 41, 16, 108, 16, 11, 11, 109, 41, 16, 16, 44, 16, 11, 11, 111, 41, 16, 16, 16, 16, 11, 11, 112, 41, 44, 16, 108, 16, 11, 11, 109, 113, 114, 114, 114, 114, 114, 115, 65, 65, 116, 116, 116, 2, 117, 118, 117, 118, 2, 2, 2, 2, 119, 65, 65, 120, 2, 2, 2, 2, 121, 122, 2, 123, 124, 2, 125, 126, 2, 2, 2, 2, 2, 9, 124, 2, 2, 2, 2, 127, 65, 65, 68, 65, 65, 65, 65, 65, 128, 44, 27, 27, 27, 8, 125, 129, 27, 27, 27, 27, 27, 8, 125, 100, 40, 40, 40, 40, 40, 40, 82, 44, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 130, 43, 43, 43, 43, 43, 43, 131, 51, 132, 51, 132, 43, 43, 43, 43, 43, 80, 44, 44, 44, 44, 44, 44, 44, 67, 133, 67, 134, 67, 34, 11, 16, 11, 32, 134, 67, 49, 11, 11, 67, 67, 67, 133, 133, 133, 11, 11, 135, 11, 11, 35, 36, 39, 67, 16, 11, 8, 8, 49, 16, 16, 26, 67, 136, 27, 27, 27, 27, 27, 27, 27, 27, 101, 101, 101, 101, 101, 101, 101, 101, 101, 137, 138, 101, 139, 67, 44, 44, 8, 8, 140, 67, 67, 8, 67, 67, 140, 26, 67, 140, 67, 67, 67, 140, 
67, 67, 67, 67, 67, 67, 67, 8, 67, 140, 140, 67, 67, 67, 67, 67, 67, 67, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 67, 67, 67, 67, 4, 4, 67, 67, 8, 67, 67, 67, 141, 142, 67, 67, 67, 67, 67, 67, 67, 67, 140, 67, 67, 67, 67, 67, 67, 26, 8, 8, 8, 8, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 8, 8, 8, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 90, 44, 44, 67, 67, 67, 90, 44, 44, 44, 44, 27, 27, 27, 27, 27, 27, 67, 67, 67, 67, 67, 67, 67, 27, 27, 27, 67, 67, 67, 26, 67, 67, 67, 67, 26, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 8, 8, 8, 8, 67, 67, 67, 67, 67, 67, 67, 26, 67, 67, 67, 67, 4, 4, 4, 4, 4, 4, 4, 27, 27, 27, 27, 27, 27, 27, 67, 67, 67, 67, 67, 67, 8, 8, 125, 143, 8, 8, 8, 8, 8, 8, 8, 4, 4, 4, 4, 4, 8, 125, 144, 144, 144, 144, 144, 144, 144, 144, 144, 144, 143, 8, 8, 8, 8, 8, 8, 8, 4, 4, 8, 8, 8, 8, 8, 8, 8, 8, 4, 8, 8, 8, 140, 26, 8, 8, 140, 67, 67, 67, 44, 67, 67, 67, 67, 67, 67, 67, 67, 44, 67, 67, 67, 67, 67, 67, 67, 67, 67, 44, 56, 67, 67, 67, 67, 67, 90, 67, 67, 67, 67, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 67, 67, 11, 11, 11, 11, 11, 11, 11, 47, 16, 16, 16, 16, 16, 16, 16, 108, 32, 11, 32, 34, 34, 34, 34, 11, 32, 32, 34, 16, 16, 16, 40, 11, 32, 32, 136, 67, 67, 134, 34, 145, 43, 32, 44, 44, 54, 2, 95, 2, 16, 16, 16, 53, 44, 44, 53, 44, 36, 36, 36, 36, 44, 44, 44, 52, 64, 44, 44, 44, 44, 44, 44, 58, 36, 36, 36, 62, 44, 44, 44, 44, 36, 36, 36, 62, 36, 36, 36, 62, 2, 117, 117, 2, 121, 122, 117, 2, 2, 2, 2, 6, 2, 103, 117, 2, 117, 4, 4, 4, 4, 2, 2, 86, 2, 2, 2, 2, 2, 116, 2, 2, 103, 146, 44, 44, 44, 44, 44, 44, 67, 67, 67, 67, 67, 56, 67, 67, 67, 67, 44, 44, 44, 44, 44, 44, 67, 67, 67, 44, 44, 44, 44, 44, 67, 67, 67, 67, 67, 67, 44, 44, 1, 2, 147, 148, 4, 4, 4, 4, 4, 67, 4, 4, 4, 4, 149, 150, 151, 101, 101, 101, 101, 43, 43, 84, 152, 40, 40, 67, 101, 153, 63, 67, 36, 36, 36, 62, 58, 154, 155, 69, 36, 36, 36, 36, 36, 63, 40, 69, 44, 44, 81, 36, 36, 36, 36, 36, 67, 27, 27, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 90, 27, 27, 
27, 27, 27, 67, 67, 67, 67, 67, 67, 67, 27, 27, 27, 27, 156, 27, 27, 27, 27, 27, 27, 27, 36, 36, 104, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 157, 2, 7, 7, 7, 7, 7, 36, 44, 44, 32, 32, 32, 32, 32, 32, 32, 70, 51, 158, 43, 43, 43, 43, 43, 86, 32, 32, 32, 32, 32, 32, 40, 43, 36, 36, 36, 101, 101, 101, 101, 101, 43, 2, 2, 2, 44, 44, 44, 44, 41, 41, 41, 155, 40, 40, 40, 40, 41, 32, 32, 32, 32, 32, 32, 32, 16, 32, 32, 32, 32, 32, 32, 32, 45, 16, 16, 16, 34, 34, 34, 32, 32, 32, 32, 32, 42, 159, 34, 35, 32, 32, 16, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 11, 11, 44, 11, 11, 32, 32, 44, 44, 44, 44, 44, 44, 44, 81, 40, 35, 36, 36, 36, 71, 36, 71, 36, 70, 36, 36, 36, 93, 85, 83, 67, 67, 44, 44, 27, 27, 27, 67, 160, 44, 44, 44, 36, 36, 2, 2, 44, 44, 44, 44, 84, 36, 36, 36, 36, 36, 36, 36, 36, 36, 84, 84, 84, 84, 84, 84, 84, 84, 80, 44, 44, 44, 44, 2, 43, 36, 36, 36, 2, 72, 72, 44, 36, 36, 36, 43, 43, 43, 43, 2, 36, 36, 36, 70, 43, 43, 43, 43, 43, 84, 44, 44, 44, 44, 44, 54, 36, 70, 84, 43, 43, 84, 83, 84, 161, 2, 2, 2, 2, 2, 2, 52, 7, 7, 7, 7, 7, 44, 44, 2, 36, 36, 70, 69, 36, 36, 36, 36, 7, 7, 7, 7, 7, 36, 36, 62, 36, 36, 36, 36, 70, 43, 43, 83, 85, 83, 85, 80, 44, 44, 44, 44, 36, 70, 36, 36, 36, 36, 83, 44, 7, 7, 7, 7, 7, 44, 2, 2, 69, 36, 36, 77, 67, 93, 83, 36, 71, 43, 71, 70, 71, 36, 36, 43, 70, 62, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 81, 104, 2, 36, 36, 36, 36, 36, 93, 43, 84, 2, 104, 162, 80, 44, 44, 44, 44, 81, 36, 36, 62, 81, 36, 36, 62, 81, 36, 36, 62, 44, 44, 44, 44, 16, 16, 16, 16, 16, 110, 40, 40, 16, 16, 16, 44, 44, 44, 44, 44, 36, 93, 85, 84, 83, 161, 85, 44, 36, 36, 44, 44, 44, 44, 44, 44, 36, 36, 36, 62, 44, 81, 36, 36, 163, 163, 163, 163, 163, 163, 163, 163, 164, 164, 164, 164, 164, 164, 164, 164, 16, 16, 16, 108, 44, 44, 44, 44, 44, 53, 16, 16, 44, 44, 81, 71, 36, 36, 36, 36, 165, 36, 36, 36, 36, 36, 36, 62, 36, 36, 62, 62, 36, 81, 62, 36, 36, 36, 36, 36, 36, 41, 41, 41, 41, 41, 41, 41, 41, 44, 44, 44, 44, 44, 44, 44, 44, 81, 36, 36, 36, 36, 36, 
36, 36, 36, 36, 36, 36, 36, 36, 144, 44, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 160, 44, 2, 2, 2, 166, 126, 44, 44, 44, 6, 167, 168, 144, 144, 144, 144, 144, 144, 144, 126, 166, 126, 2, 123, 169, 2, 64, 2, 2, 149, 144, 144, 126, 2, 170, 8, 171, 66, 2, 44, 44, 36, 36, 62, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 62, 79, 54, 2, 3, 2, 4, 5, 6, 2, 16, 16, 16, 16, 16, 17, 18, 125, 126, 4, 2, 36, 36, 36, 36, 36, 69, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 40, 44, 36, 36, 36, 44, 36, 36, 36, 44, 36, 36, 36, 44, 36, 62, 44, 20, 172, 57, 130, 26, 8, 140, 90, 44, 44, 44, 44, 79, 65, 67, 44, 36, 36, 36, 36, 36, 36, 81, 36, 36, 36, 36, 36, 36, 62, 36, 81, 2, 64, 44, 173, 27, 27, 27, 27, 27, 27, 44, 56, 67, 67, 67, 67, 101, 101, 139, 27, 89, 67, 67, 67, 67, 67, 67, 67, 67, 27, 90, 44, 90, 44, 44, 44, 44, 44, 44, 44, 67, 67, 67, 67, 67, 67, 50, 44, 174, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 44, 44, 27, 27, 44, 44, 44, 44, 44, 44, 148, 36, 36, 36, 36, 175, 44, 44, 36, 36, 36, 43, 43, 80, 44, 44, 36, 36, 36, 36, 36, 36, 36, 54, 36, 36, 44, 44, 36, 36, 36, 36, 176, 101, 101, 44, 44, 44, 44, 44, 11, 11, 11, 11, 16, 16, 16, 16, 36, 36, 44, 44, 44, 44, 44, 54, 36, 36, 36, 44, 62, 36, 36, 36, 36, 36, 36, 81, 62, 44, 62, 81, 36, 36, 36, 54, 27, 27, 27, 27, 36, 36, 36, 77, 156, 27, 27, 27, 44, 44, 44, 173, 27, 27, 27, 27, 36, 62, 36, 44, 44, 173, 27, 27, 36, 36, 36, 27, 27, 27, 44, 54, 36, 36, 36, 36, 36, 44, 44, 54, 36, 36, 36, 36, 44, 44, 27, 36, 44, 27, 27, 27, 27, 27, 27, 27, 70, 43, 58, 80, 44, 44, 43, 43, 36, 36, 81, 36, 81, 36, 36, 36, 36, 36, 44, 44, 43, 80, 44, 58, 27, 27, 27, 27, 44, 44, 44, 44, 2, 2, 2, 2, 64, 44, 44, 44, 36, 36, 36, 36, 36, 36, 177, 30, 36, 36, 36, 36, 36, 36, 177, 27, 36, 36, 36, 36, 78, 36, 36, 36, 36, 36, 70, 80, 44, 173, 27, 27, 2, 2, 2, 64, 44, 44, 44, 44, 36, 36, 36, 44, 54, 2, 2, 2, 36, 36, 36, 44, 27, 27, 27, 27, 36, 62, 44, 44, 27, 27, 27, 27, 36, 44, 44, 44, 54, 2, 64, 44, 44, 44, 44, 44, 173, 27, 27, 
27, 36, 36, 36, 36, 62, 44, 44, 44, 11, 47, 44, 44, 44, 44, 44, 44, 16, 108, 44, 44, 44, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 96, 85, 94, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 43, 43, 43, 43, 43, 43, 43, 61, 2, 2, 2, 44, 27, 27, 27, 7, 7, 7, 7, 7, 44, 44, 44, 44, 44, 44, 44, 58, 84, 85, 43, 83, 85, 61, 178, 2, 2, 44, 44, 44, 44, 44, 44, 44, 43, 71, 36, 36, 36, 36, 36, 36, 36, 36, 36, 70, 43, 43, 85, 43, 43, 43, 80, 7, 7, 7, 7, 7, 2, 2, 44, 44, 44, 44, 44, 44, 36, 70, 2, 62, 44, 44, 44, 44, 36, 93, 84, 43, 43, 43, 43, 83, 94, 36, 63, 2, 2, 43, 61, 44, 7, 7, 7, 7, 7, 63, 63, 2, 173, 27, 27, 27, 27, 27, 27, 27, 27, 27, 96, 44, 44, 44, 44, 44, 36, 36, 36, 36, 36, 36, 84, 85, 43, 84, 83, 43, 2, 2, 2, 44, 36, 36, 36, 62, 62, 36, 36, 81, 36, 36, 36, 36, 36, 36, 36, 81, 36, 36, 36, 36, 63, 44, 44, 44, 36, 36, 36, 36, 36, 36, 36, 70, 84, 85, 43, 43, 43, 80, 44, 44, 43, 84, 81, 36, 36, 36, 62, 81, 83, 84, 88, 87, 88, 87, 84, 44, 62, 44, 44, 87, 44, 44, 81, 36, 36, 84, 44, 43, 43, 43, 80, 44, 43, 43, 80, 44, 44, 44, 44, 44, 84, 85, 43, 43, 83, 83, 84, 85, 83, 43, 36, 72, 44, 44, 44, 44, 36, 36, 36, 36, 36, 36, 36, 93, 84, 43, 43, 44, 84, 84, 43, 85, 61, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 36, 36, 43, 44, 84, 85, 43, 43, 43, 83, 85, 85, 61, 2, 62, 44, 44, 44, 44, 44, 36, 36, 36, 36, 36, 70, 85, 84, 43, 43, 43, 85, 44, 44, 44, 44, 36, 36, 36, 36, 36, 44, 58, 43, 84, 43, 43, 85, 43, 43, 44, 44, 7, 7, 7, 7, 7, 27, 2, 92, 27, 96, 44, 44, 44, 44, 44, 81, 101, 101, 101, 101, 101, 101, 101, 175, 2, 2, 64, 44, 44, 44, 44, 44, 43, 43, 61, 44, 44, 44, 44, 44, 43, 43, 43, 61, 2, 2, 67, 67, 40, 40, 92, 44, 44, 44, 44, 44, 7, 7, 7, 7, 7, 173, 27, 27, 27, 81, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 44, 44, 81, 36, 93, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 88, 43, 74, 40, 40, 40, 40, 40, 40, 36, 44, 44, 44, 44, 44, 44, 44, 36, 36, 36, 36, 36, 44, 50, 61, 65, 65, 44, 44, 44, 44, 44, 44, 67, 67, 67, 90, 56, 67, 67, 67, 67, 67, 179, 85, 43, 
67, 179, 84, 84, 180, 65, 65, 65, 181, 43, 43, 43, 76, 50, 43, 43, 43, 67, 67, 67, 67, 67, 67, 67, 43, 43, 67, 67, 67, 67, 67, 90, 44, 44, 44, 67, 43, 76, 44, 44, 44, 44, 44, 27, 44, 44, 44, 44, 44, 44, 44, 11, 11, 11, 11, 11, 16, 16, 16, 16, 16, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 16, 16, 16, 108, 16, 16, 16, 16, 16, 11, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 47, 11, 44, 47, 48, 47, 48, 11, 47, 11, 11, 11, 11, 16, 16, 53, 53, 16, 16, 16, 53, 16, 16, 16, 16, 16, 16, 16, 11, 48, 11, 47, 48, 11, 11, 11, 47, 11, 11, 11, 47, 16, 16, 16, 16, 16, 11, 48, 11, 47, 11, 11, 47, 47, 44, 11, 11, 11, 47, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 11, 11, 11, 11, 11, 16, 16, 16, 16, 16, 16, 16, 16, 44, 11, 11, 11, 11, 31, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 33, 16, 16, 16, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 31, 16, 16, 16, 16, 33, 16, 16, 16, 11, 11, 11, 11, 31, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 33, 16, 16, 16, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 31, 16, 16, 16, 16, 33, 16, 16, 16, 11, 11, 11, 11, 31, 16, 16, 16, 16, 33, 16, 16, 16, 32, 44, 7, 7, 7, 7, 7, 7, 7, 7, 7, 43, 43, 43, 76, 67, 50, 43, 43, 43, 43, 43, 43, 43, 43, 76, 67, 67, 67, 50, 67, 67, 67, 67, 67, 67, 67, 76, 21, 2, 2, 44, 44, 44, 44, 44, 44, 44, 58, 43, 43, 36, 36, 62, 173, 27, 27, 27, 27, 43, 43, 43, 80, 44, 44, 44, 44, 36, 36, 81, 36, 36, 36, 36, 36, 81, 62, 62, 81, 81, 36, 36, 36, 36, 62, 36, 36, 81, 81, 44, 44, 44, 62, 44, 81, 81, 81, 81, 36, 81, 62, 62, 81, 81, 81, 81, 81, 81, 62, 62, 81, 36, 62, 36, 36, 36, 62, 36, 36, 81, 36, 62, 62, 36, 36, 36, 36, 36, 81, 36, 36, 81, 36, 81, 36, 36, 81, 36, 36, 8, 44, 44, 44, 44, 44, 44, 44, 56, 67, 67, 67, 67, 67, 67, 67, 44, 44, 44, 67, 67, 67, 67, 67, 67, 90, 44, 44, 44, 44, 44, 44, 67, 67, 67, 67, 67, 25, 41, 41, 67, 67, 56, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 90, 44, 67, 67, 90, 44, 44, 44, 44, 44, 67, 67, 67, 67, 44, 44, 44, 44, 67, 67, 67, 67, 67, 67, 67, 44, 79, 44, 
44, 44, 44, 44, 44, 44, 65, 65, 65, 65, 65, 65, 65, 65, 164, 164, 164, 164, 164, 164, 164, 44, }; static RE_UINT8 re_general_category_stage_5[] = { 15, 15, 12, 23, 23, 23, 25, 23, 20, 21, 23, 24, 23, 19, 9, 9, 24, 24, 24, 23, 23, 1, 1, 1, 1, 20, 23, 21, 26, 22, 26, 2, 2, 2, 2, 20, 24, 21, 24, 15, 25, 25, 27, 23, 26, 27, 5, 28, 24, 16, 27, 26, 27, 24, 11, 11, 26, 11, 5, 29, 11, 23, 1, 24, 1, 2, 2, 24, 2, 1, 2, 5, 5, 5, 1, 3, 3, 2, 5, 2, 4, 4, 26, 26, 4, 26, 6, 6, 0, 0, 4, 2, 1, 23, 1, 0, 0, 1, 24, 1, 27, 6, 7, 7, 0, 4, 0, 2, 0, 23, 19, 0, 0, 27, 27, 25, 0, 6, 19, 6, 23, 6, 6, 23, 5, 0, 5, 23, 23, 0, 16, 16, 23, 25, 27, 27, 16, 0, 4, 5, 5, 6, 6, 5, 23, 5, 6, 16, 6, 4, 4, 6, 6, 27, 5, 27, 27, 5, 0, 16, 6, 0, 0, 5, 4, 0, 6, 8, 8, 8, 8, 6, 23, 4, 0, 8, 8, 0, 11, 27, 27, 0, 0, 25, 23, 27, 5, 8, 8, 5, 23, 11, 11, 0, 19, 5, 12, 5, 5, 20, 21, 0, 10, 10, 10, 5, 19, 23, 5, 4, 7, 0, 2, 4, 3, 3, 2, 0, 3, 26, 2, 26, 0, 26, 1, 26, 26, 0, 12, 12, 12, 16, 19, 19, 28, 29, 20, 28, 13, 14, 16, 12, 23, 28, 29, 23, 23, 22, 22, 23, 24, 20, 21, 23, 23, 12, 11, 4, 21, 4, 25, 0, 6, 7, 7, 6, 1, 27, 27, 1, 27, 2, 2, 27, 10, 1, 2, 10, 10, 11, 24, 27, 27, 20, 21, 27, 21, 24, 21, 20, 2, 6, 20, 0, 27, 4, 5, 10, 19, 20, 21, 21, 27, 10, 19, 4, 10, 4, 6, 26, 26, 4, 27, 11, 4, 23, 7, 23, 26, 1, 25, 27, 8, 23, 4, 8, 18, 18, 17, 17, 5, 24, 23, 20, 19, 22, 22, 20, 22, 22, 24, 19, 24, 0, 24, 26, 0, 11, 6, 11, 10, 0, 23, 10, 5, 11, 23, 16, 27, 8, 8, 16, 16, 6, }; /* General_Category: 9628 bytes. 
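Note on the lookup that follows: re_get_general_category walks the five stage tables by peeling bit fields off the code point, most significant bits first. The sketch below is not part of the generated tables; it only reproduces the shift arithmetic (11, 7, 4, 1) so that the field widths are easy to see. The names gc_fields and gc_split are hypothetical, introduced purely for illustration, and the range check is an added assumption rather than something the generated function performs.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper type: the five indices consumed by the lookup.
typedef struct {
    uint32_t s1;  // ch >> 11: index into stage 1
    uint32_t s2;  // next 4 bits: offset inside a stage-2 block of 16
    uint32_t s3;  // next 3 bits: offset inside a stage-3 block of 8
    uint32_t s4;  // next 3 bits: offset inside a stage-4 block of 8
    uint32_t s5;  // last bit: offset inside a stage-5 pair
} gc_fields;

// Split a code point the same way the generated function does,
// but keep the intermediate fields instead of indexing any table.
gc_fields gc_split(uint32_t ch) {
    gc_fields r;
    uint32_t code;

    assert(ch <= 0x10FFFF);  // assumption: callers pass valid code points

    r.s1 = ch >> 11;
    code = ch ^ (r.s1 << 11);  // clear the bits just consumed
    r.s2 = code >> 7;
    code ^= r.s2 << 7;
    r.s3 = code >> 4;
    code ^= r.s3 << 4;
    r.s4 = code >> 1;
    code ^= r.s4 << 1;
    r.s5 = code;
    return r;
}
```

Shifting the fields back and OR-ing them together, (s1 << 11) | (s2 << 7) | (s3 << 4) | (s4 << 1) | s5, gives the original code point again, which is a quick way to check that no bits are dropped between stages.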
*/ RE_UINT32 re_get_general_category(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 11; code = ch ^ (f << 11); pos = (RE_UINT32)re_general_category_stage_1[f] << 4; f = code >> 7; code ^= f << 7; pos = (RE_UINT32)re_general_category_stage_2[pos + f] << 3; f = code >> 4; code ^= f << 4; pos = (RE_UINT32)re_general_category_stage_3[pos + f] << 3; f = code >> 1; code ^= f << 1; pos = (RE_UINT32)re_general_category_stage_4[pos + f] << 1; value = re_general_category_stage_5[pos + code]; return value; } /* Block. */ static RE_UINT8 re_block_stage_1[] = { 0, 1, 2, 3, 4, 5, 5, 5, 5, 5, 6, 7, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 16, 16, 16, 16, 18, 16, 19, 20, 21, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 23, 24, 25, 16, 16, 26, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 27, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, }; static RE_UINT8 re_block_stage_2[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 8, 9, 10, 11, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 28, 29, 30, 31, 31, 32, 32, 32, 33, 34, 34, 34, 34, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 50, 51, 51, 52, 53, 54, 55, 56, 56, 57, 57, 58, 59, 60, 
61, 62, 62, 63, 64, 65, 65, 66, 67, 68, 68, 69, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 82, 83, 83, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 85, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 87, 87, 87, 87, 87, 87, 87, 87, 87, 88, 89, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 103, 104, 104, 104, 104, 104, 104, 104, 105, 106, 106, 106, 106, 106, 106, 106, 106, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 108, 108, 108, 108, 109, 110, 110, 110, 110, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 119, 126, 126, 126, 119, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 119, 119, 137, 119, 119, 119, 138, 139, 140, 141, 142, 143, 144, 119, 119, 145, 119, 146, 147, 148, 149, 119, 119, 150, 119, 119, 119, 151, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 152, 152, 152, 152, 152, 152, 152, 152, 153, 154, 155, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 156, 156, 156, 156, 156, 156, 156, 156, 157, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 158, 158, 158, 158, 158, 119, 119, 119, 119, 119, 119, 119, 119, 119, 
119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 159, 159, 159, 159, 160, 161, 162, 163, 119, 119, 119, 119, 119, 119, 164, 165, 166, 166, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 167, 168, 119, 119, 119, 119, 119, 119, 169, 169, 170, 170, 171, 119, 172, 119, 173, 173, 173, 173, 173, 173, 173, 173, 174, 174, 174, 174, 174, 175, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 176, 177, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 178, 178, 119, 119, 179, 180, 181, 181, 182, 182, 183, 183, 183, 183, 183, 183, 184, 185, 186, 187, 188, 188, 189, 189, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 191, 192, 192, 192, 192, 192, 192, 192, 192, 192, 192, 192, 192, 192, 192, 192, 192, 192, 192, 192, 192, 192, 192, 192, 192, 192, 192, 192, 192, 192, 192, 192, 192, 193, 194, 195, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 196, 197, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 198, 198, 198, 198, 199, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 200, 119, 201, 202, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 203, 203, 203, 203, 203, 203, 
203, 203, 203, 203, 203, 203, 203, 203, 203, 203, 203, 203, 203, 203, 203, 203, 203, 203, 203, 203, 203, 203, 203, 203, 203, 203, 204, 204, 204, 204, 204, 204, 204, 204, 204, 204, 204, 204, 204, 204, 204, 204, 204, 204, 204, 204, 204, 204, 204, 204, 204, 204, 204, 204, 204, 204, 204, 204, }; static RE_UINT16 re_block_stage_3[] = { 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 9, 9, 9, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11, 11, 11, 11, 12, 12, 12, 12, 12, 12, 12, 12, 13, 13, 13, 13, 13, 14, 14, 14, 15, 15, 15, 15, 16, 16, 16, 16, 17, 17, 17, 17, 18, 18, 19, 19, 19, 19, 20, 20, 20, 20, 20, 20, 21, 21, 21, 21, 21, 21, 21, 21, 22, 22, 22, 22, 22, 22, 22, 22, 23, 23, 23, 23, 23, 23, 23, 23, 24, 24, 24, 24, 24, 24, 24, 24, 25, 25, 25, 25, 25, 25, 25, 25, 26, 26, 26, 26, 26, 26, 26, 26, 27, 27, 27, 27, 27, 27, 27, 27, 28, 28, 28, 28, 28, 28, 28, 28, 29, 29, 29, 29, 29, 29, 29, 29, 30, 30, 30, 30, 30, 30, 30, 30, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 33, 33, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 35, 35, 35, 35, 35, 35, 36, 36, 36, 36, 36, 36, 36, 36, 37, 37, 37, 37, 37, 37, 37, 37, 38, 38, 39, 39, 39, 39, 39, 39, 40, 40, 40, 40, 40, 40, 40, 40, 41, 41, 42, 42, 42, 42, 42, 42, 43, 43, 44, 44, 45, 45, 46, 46, 47, 47, 47, 47, 47, 47, 47, 47, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 49, 49, 49, 49, 49, 50, 50, 50, 50, 50, 51, 51, 51, 52, 52, 52, 52, 52, 52, 53, 53, 54, 54, 55, 55, 55, 55, 55, 55, 55, 55, 55, 56, 56, 56, 56, 56, 57, 57, 57, 57, 57, 57, 57, 57, 58, 58, 58, 58, 59, 59, 59, 59, 60, 60, 60, 60, 60, 61, 61, 61, 19, 19, 19, 19, 62, 63, 63, 63, 64, 64, 64, 64, 64, 64, 64, 64, 65, 65, 65, 65, 66, 66, 66, 66, 67, 67, 67, 67, 67, 67, 67, 67, 68, 68, 68, 68, 68, 68, 68, 68, 69, 69, 69, 69, 69, 69, 69, 70, 70, 70, 71, 71, 71, 72, 72, 72, 73, 73, 73, 
73, 73, 74, 74, 74, 74, 75, 75, 75, 75, 75, 75, 75, 76, 76, 76, 76, 76, 76, 76, 76, 77, 77, 77, 77, 77, 77, 77, 77, 78, 78, 78, 78, 79, 79, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 81, 81, 81, 81, 81, 81, 81, 81, 82, 82, 83, 83, 83, 83, 83, 83, 84, 84, 84, 84, 84, 84, 84, 84, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 86, 86, 86, 87, 88, 88, 88, 88, 88, 88, 88, 88, 89, 89, 89, 89, 89, 89, 89, 89, 90, 90, 90, 90, 90, 90, 90, 90, 91, 91, 91, 91, 91, 91, 91, 91, 92, 92, 92, 92, 92, 92, 92, 92, 93, 93, 93, 93, 93, 93, 94, 94, 95, 95, 95, 95, 95, 95, 95, 95, 96, 96, 96, 97, 97, 97, 97, 97, 98, 98, 98, 98, 98, 98, 99, 99, 100, 100, 100, 100, 100, 100, 100, 100, 101, 101, 101, 101, 101, 101, 101, 101, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 19, 103, 104, 104, 104, 104, 105, 105, 105, 105, 105, 105, 106, 106, 106, 106, 106, 106, 107, 107, 107, 108, 108, 108, 108, 108, 108, 109, 110, 110, 111, 111, 111, 112, 113, 113, 113, 113, 113, 113, 113, 113, 114, 114, 114, 114, 114, 114, 114, 114, 115, 115, 115, 115, 115, 115, 115, 115, 115, 115, 115, 115, 116, 116, 116, 116, 117, 117, 117, 117, 117, 117, 117, 117, 118, 118, 118, 118, 118, 118, 118, 118, 118, 119, 119, 119, 119, 120, 120, 120, 121, 121, 121, 121, 121, 121, 121, 121, 121, 121, 121, 121, 122, 122, 122, 122, 122, 122, 123, 123, 123, 123, 123, 123, 124, 124, 125, 125, 125, 125, 125, 125, 125, 125, 125, 125, 125, 125, 125, 125, 126, 126, 126, 127, 128, 128, 128, 128, 129, 129, 129, 129, 129, 129, 130, 130, 131, 131, 131, 132, 132, 132, 133, 133, 134, 134, 134, 134, 134, 134, 135, 135, 136, 136, 136, 136, 136, 136, 137, 137, 138, 138, 138, 138, 138, 138, 139, 139, 140, 140, 140, 141, 141, 141, 141, 142, 142, 142, 142, 142, 143, 143, 143, 143, 144, 144, 144, 144, 144, 144, 144, 144, 144, 144, 144, 145, 145, 145, 145, 145, 146, 146, 146, 146, 146, 146, 146, 146, 147, 147, 147, 147, 147, 147, 147, 147, 148, 148, 148, 148, 148, 148, 148, 148, 149, 149, 149, 149, 149, 149, 149, 149, 150, 150, 
150, 150, 150, 150, 150, 150, 151, 151, 151, 151, 151, 152, 152, 152, 152, 152, 152, 152, 152, 152, 152, 152, 153, 154, 155, 156, 156, 157, 157, 158, 158, 158, 158, 158, 158, 158, 158, 158, 159, 159, 159, 159, 159, 159, 159, 159, 159, 159, 159, 159, 159, 159, 159, 160, 161, 161, 161, 161, 161, 161, 161, 161, 162, 162, 162, 162, 162, 162, 162, 162, 163, 163, 163, 163, 164, 164, 164, 164, 164, 165, 165, 165, 165, 166, 166, 166, 19, 19, 19, 19, 19, 19, 19, 19, 167, 167, 168, 168, 168, 168, 169, 169, 170, 170, 170, 171, 171, 172, 172, 172, 173, 173, 174, 174, 174, 174, 19, 19, 175, 175, 175, 175, 175, 176, 176, 176, 177, 177, 177, 19, 19, 19, 19, 19, 178, 178, 178, 179, 179, 179, 179, 19, 180, 180, 180, 180, 180, 180, 180, 180, 181, 181, 181, 181, 182, 182, 183, 183, 184, 184, 184, 19, 19, 19, 185, 185, 186, 186, 187, 187, 19, 19, 19, 19, 188, 188, 189, 189, 189, 189, 189, 189, 190, 190, 190, 190, 190, 190, 191, 191, 192, 192, 19, 19, 193, 193, 193, 193, 194, 194, 194, 194, 195, 195, 196, 196, 197, 197, 197, 19, 19, 19, 19, 19, 198, 198, 198, 198, 198, 19, 19, 19, 199, 199, 199, 199, 199, 199, 199, 199, 19, 19, 19, 19, 19, 19, 200, 200, 201, 201, 201, 201, 201, 201, 201, 201, 202, 202, 202, 202, 202, 203, 203, 203, 204, 204, 204, 204, 204, 205, 205, 205, 206, 206, 206, 206, 206, 206, 207, 207, 208, 208, 208, 208, 208, 19, 19, 19, 209, 209, 209, 210, 210, 210, 210, 210, 211, 211, 211, 211, 211, 211, 211, 211, 212, 212, 212, 212, 212, 212, 19, 19, 213, 213, 213, 213, 213, 213, 213, 213, 214, 214, 214, 214, 214, 214, 19, 19, 215, 215, 215, 215, 215, 19, 19, 19, 216, 216, 216, 216, 19, 19, 19, 19, 19, 19, 217, 217, 217, 217, 217, 217, 19, 19, 19, 19, 218, 218, 218, 218, 219, 219, 219, 219, 219, 219, 219, 219, 220, 220, 220, 220, 220, 220, 220, 220, 221, 221, 221, 221, 221, 221, 221, 221, 221, 221, 221, 221, 221, 19, 19, 19, 222, 222, 222, 222, 222, 222, 222, 222, 222, 222, 222, 19, 19, 19, 19, 19, 223, 223, 223, 223, 223, 223, 223, 223, 224, 224, 224, 224, 224, 224, 224, 
224, 224, 224, 224, 224, 225, 225, 225, 19, 19, 19, 19, 19, 19, 226, 226, 226, 227, 227, 227, 227, 227, 227, 227, 227, 227, 19, 19, 19, 19, 19, 19, 19, 228, 228, 228, 228, 228, 228, 228, 228, 228, 228, 19, 19, 19, 19, 19, 19, 229, 229, 229, 229, 229, 229, 229, 229, 230, 230, 230, 230, 230, 230, 230, 230, 230, 230, 231, 19, 19, 19, 19, 19, 232, 232, 232, 232, 232, 232, 232, 232, 233, 233, 233, 233, 233, 233, 233, 233, 234, 234, 234, 234, 234, 19, 19, 19, 235, 235, 235, 235, 235, 235, 236, 236, 237, 237, 237, 237, 237, 237, 237, 237, 238, 238, 238, 238, 238, 238, 238, 238, 238, 238, 238, 19, 19, 19, 19, 19, 239, 239, 239, 239, 239, 239, 239, 239, 239, 239, 239, 239, 239, 239, 19, 19, 240, 240, 240, 240, 240, 240, 240, 240, 241, 241, 241, 242, 242, 242, 242, 242, 242, 242, 243, 243, 243, 243, 243, 243, 244, 244, 244, 244, 244, 244, 244, 244, 245, 245, 245, 245, 245, 245, 245, 245, 246, 246, 246, 246, 246, 246, 246, 246, 247, 247, 247, 247, 247, 248, 248, 248, 249, 249, 249, 249, 249, 249, 249, 249, 250, 250, 250, 250, 250, 250, 250, 250, 251, 251, 251, 251, 251, 251, 251, 251, 252, 252, 252, 252, 252, 252, 252, 252, 253, 253, 253, 253, 253, 253, 253, 253, 254, 254, 254, 254, 254, 254, 254, 254, 254, 254, 254, 254, 254, 254, 19, 19, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 257, 257, 257, 257, 257, 257, 257, 257, 257, 257, 257, 257, 257, 257, 257, 257, 257, 19, 19, 19, 19, 19, 258, 258, 258, 258, 258, 258, 258, 258, 258, 258, 19, 19, 19, 19, 19, 19, 259, 259, 259, 259, 259, 259, 259, 259, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 19, 261, 261, 261, 261, 261, 261, 261, 261, 262, 262, 262, 262, 262, 262, 262, 262, }; static RE_UINT16 re_block_stage_4[] = { 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8, 9, 9, 9, 9, 10, 10, 10, 10, 11, 11, 11, 11, 12, 12, 12, 12, 13, 13, 13, 13, 14, 14, 14, 14, 
15, 15, 15, 15, 16, 16, 16, 16, 17, 17, 17, 17, 18, 18, 18, 18, 19, 19, 19, 19, 20, 20, 20, 20, 21, 21, 21, 21, 22, 22, 22, 22, 23, 23, 23, 23, 24, 24, 24, 24, 25, 25, 25, 25, 26, 26, 26, 26, 27, 27, 27, 27, 28, 28, 28, 28, 29, 29, 29, 29, 30, 30, 30, 30, 31, 31, 31, 31, 32, 32, 32, 32, 33, 33, 33, 33, 34, 34, 34, 34, 35, 35, 35, 35, 36, 36, 36, 36, 37, 37, 37, 37, 38, 38, 38, 38, 39, 39, 39, 39, 40, 40, 40, 40, 41, 41, 41, 41, 42, 42, 42, 42, 43, 43, 43, 43, 44, 44, 44, 44, 45, 45, 45, 45, 46, 46, 46, 46, 47, 47, 47, 47, 48, 48, 48, 48, 49, 49, 49, 49, 50, 50, 50, 50, 51, 51, 51, 51, 52, 52, 52, 52, 53, 53, 53, 53, 54, 54, 54, 54, 55, 55, 55, 55, 56, 56, 56, 56, 57, 57, 57, 57, 58, 58, 58, 58, 59, 59, 59, 59, 60, 60, 60, 60, 61, 61, 61, 61, 62, 62, 62, 62, 63, 63, 63, 63, 64, 64, 64, 64, 65, 65, 65, 65, 66, 66, 66, 66, 67, 67, 67, 67, 68, 68, 68, 68, 69, 69, 69, 69, 70, 70, 70, 70, 71, 71, 71, 71, 72, 72, 72, 72, 73, 73, 73, 73, 74, 74, 74, 74, 75, 75, 75, 75, 76, 76, 76, 76, 77, 77, 77, 77, 78, 78, 78, 78, 79, 79, 79, 79, 80, 80, 80, 80, 81, 81, 81, 81, 82, 82, 82, 82, 83, 83, 83, 83, 84, 84, 84, 84, 85, 85, 85, 85, 86, 86, 86, 86, 87, 87, 87, 87, 88, 88, 88, 88, 89, 89, 89, 89, 90, 90, 90, 90, 91, 91, 91, 91, 92, 92, 92, 92, 93, 93, 93, 93, 94, 94, 94, 94, 95, 95, 95, 95, 96, 96, 96, 96, 97, 97, 97, 97, 98, 98, 98, 98, 99, 99, 99, 99, 100, 100, 100, 100, 101, 101, 101, 101, 102, 102, 102, 102, 103, 103, 103, 103, 104, 104, 104, 104, 105, 105, 105, 105, 106, 106, 106, 106, 107, 107, 107, 107, 108, 108, 108, 108, 109, 109, 109, 109, 110, 110, 110, 110, 111, 111, 111, 111, 112, 112, 112, 112, 113, 113, 113, 113, 114, 114, 114, 114, 115, 115, 115, 115, 116, 116, 116, 116, 117, 117, 117, 117, 118, 118, 118, 118, 119, 119, 119, 119, 120, 120, 120, 120, 121, 121, 121, 121, 122, 122, 122, 122, 123, 123, 123, 123, 124, 124, 124, 124, 125, 125, 125, 125, 126, 126, 126, 126, 127, 127, 127, 127, 128, 128, 128, 128, 129, 129, 129, 129, 130, 130, 130, 130, 131, 131, 131, 131, 
132, 132, 132, 132, 133, 133, 133, 133, 134, 134, 134, 134, 135, 135, 135, 135, 136, 136, 136, 136, 137, 137, 137, 137, 138, 138, 138, 138, 139, 139, 139, 139, 140, 140, 140, 140, 141, 141, 141, 141, 142, 142, 142, 142, 143, 143, 143, 143, 144, 144, 144, 144, 145, 145, 145, 145, 146, 146, 146, 146, 147, 147, 147, 147, 148, 148, 148, 148, 149, 149, 149, 149, 150, 150, 150, 150, 151, 151, 151, 151, 152, 152, 152, 152, 153, 153, 153, 153, 154, 154, 154, 154, 155, 155, 155, 155, 156, 156, 156, 156, 157, 157, 157, 157, 158, 158, 158, 158, 159, 159, 159, 159, 160, 160, 160, 160, 161, 161, 161, 161, 162, 162, 162, 162, 163, 163, 163, 163, 164, 164, 164, 164, 165, 165, 165, 165, 166, 166, 166, 166, 167, 167, 167, 167, 168, 168, 168, 168, 169, 169, 169, 169, 170, 170, 170, 170, 171, 171, 171, 171, 172, 172, 172, 172, 173, 173, 173, 173, 174, 174, 174, 174, 175, 175, 175, 175, 176, 176, 176, 176, 177, 177, 177, 177, 178, 178, 178, 178, 179, 179, 179, 179, 180, 180, 180, 180, 181, 181, 181, 181, 182, 182, 182, 182, 183, 183, 183, 183, 184, 184, 184, 184, 185, 185, 185, 185, 186, 186, 186, 186, 187, 187, 187, 187, 188, 188, 188, 188, 189, 189, 189, 189, 190, 190, 190, 190, 191, 191, 191, 191, 192, 192, 192, 192, 193, 193, 193, 193, 194, 194, 194, 194, 195, 195, 195, 195, 196, 196, 196, 196, 197, 197, 197, 197, 198, 198, 198, 198, 199, 199, 199, 199, 200, 200, 200, 200, 201, 201, 201, 201, 202, 202, 202, 202, 203, 203, 203, 203, 204, 204, 204, 204, 205, 205, 205, 205, 206, 206, 206, 206, 207, 207, 207, 207, 208, 208, 208, 208, 209, 209, 209, 209, 210, 210, 210, 210, 211, 211, 211, 211, 212, 212, 212, 212, 213, 213, 213, 213, 214, 214, 214, 214, 215, 215, 215, 215, 216, 216, 216, 216, 217, 217, 217, 217, 218, 218, 218, 218, 219, 219, 219, 219, 220, 220, 220, 220, 221, 221, 221, 221, 222, 222, 222, 222, 223, 223, 223, 223, 224, 224, 224, 224, 225, 225, 225, 225, 226, 226, 226, 226, 227, 227, 227, 227, 228, 228, 228, 228, 229, 229, 229, 229, 230, 230, 230, 230, 231, 231, 231, 231, 
232, 232, 232, 232, 233, 233, 233, 233, 234, 234, 234, 234, 235, 235, 235, 235, 236, 236, 236, 236, 237, 237, 237, 237, 238, 238, 238, 238, 239, 239, 239, 239, 240, 240, 240, 240, 241, 241, 241, 241, 242, 242, 242, 242, 243, 243, 243, 243, 244, 244, 244, 244, 245, 245, 245, 245, 246, 246, 246, 246, 247, 247, 247, 247, 248, 248, 248, 248, 249, 249, 249, 249, 250, 250, 250, 250, 251, 251, 251, 251, 252, 252, 252, 252, 253, 253, 253, 253, 254, 254, 254, 254, 255, 255, 255, 255, 256, 256, 256, 256, 257, 257, 257, 257, 258, 258, 258, 258, 259, 259, 259, 259, 260, 260, 260, 260, 261, 261, 261, 261, 262, 262, 262, 262, }; static RE_UINT16 re_block_stage_5[] = { 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8, 9, 9, 9, 9, 10, 10, 10, 10, 11, 11, 11, 11, 12, 12, 12, 12, 13, 13, 13, 13, 14, 14, 14, 14, 15, 15, 15, 15, 16, 16, 16, 16, 17, 17, 17, 17, 18, 18, 18, 18, 19, 19, 19, 19, 0, 0, 0, 0, 20, 20, 20, 20, 21, 21, 21, 21, 22, 22, 22, 22, 23, 23, 23, 23, 24, 24, 24, 24, 25, 25, 25, 25, 26, 26, 26, 26, 27, 27, 27, 27, 28, 28, 28, 28, 29, 29, 29, 29, 30, 30, 30, 30, 31, 31, 31, 31, 32, 32, 32, 32, 33, 33, 33, 33, 34, 34, 34, 34, 35, 35, 35, 35, 36, 36, 36, 36, 37, 37, 37, 37, 38, 38, 38, 38, 39, 39, 39, 39, 40, 40, 40, 40, 41, 41, 41, 41, 42, 42, 42, 42, 43, 43, 43, 43, 44, 44, 44, 44, 45, 45, 45, 45, 46, 46, 46, 46, 47, 47, 47, 47, 48, 48, 48, 48, 49, 49, 49, 49, 50, 50, 50, 50, 51, 51, 51, 51, 52, 52, 52, 52, 53, 53, 53, 53, 54, 54, 54, 54, 55, 55, 55, 55, 56, 56, 56, 56, 57, 57, 57, 57, 58, 58, 58, 58, 59, 59, 59, 59, 60, 60, 60, 60, 61, 61, 61, 61, 62, 62, 62, 62, 63, 63, 63, 63, 64, 64, 64, 64, 65, 65, 65, 65, 66, 66, 66, 66, 67, 67, 67, 67, 68, 68, 68, 68, 69, 69, 69, 69, 70, 70, 70, 70, 71, 71, 71, 71, 72, 72, 72, 72, 73, 73, 73, 73, 74, 74, 74, 74, 75, 75, 75, 75, 76, 76, 76, 76, 77, 77, 77, 77, 78, 78, 78, 78, 79, 79, 79, 79, 80, 80, 80, 80, 81, 81, 81, 81, 82, 82, 82, 82, 83, 83, 83, 83, 84, 84, 84, 84, 85, 85, 85, 85, 
86, 86, 86, 86, 87, 87, 87, 87, 88, 88, 88, 88, 89, 89, 89, 89, 90, 90, 90, 90, 91, 91, 91, 91, 92, 92, 92, 92, 93, 93, 93, 93, 94, 94, 94, 94, 95, 95, 95, 95, 96, 96, 96, 96, 97, 97, 97, 97, 98, 98, 98, 98, 99, 99, 99, 99, 100, 100, 100, 100, 101, 101, 101, 101, 102, 102, 102, 102, 103, 103, 103, 103, 104, 104, 104, 104, 105, 105, 105, 105, 106, 106, 106, 106, 107, 107, 107, 107, 108, 108, 108, 108, 109, 109, 109, 109, 110, 110, 110, 110, 111, 111, 111, 111, 112, 112, 112, 112, 113, 113, 113, 113, 114, 114, 114, 114, 115, 115, 115, 115, 116, 116, 116, 116, 117, 117, 117, 117, 118, 118, 118, 118, 119, 119, 119, 119, 120, 120, 120, 120, 121, 121, 121, 121, 122, 122, 122, 122, 123, 123, 123, 123, 124, 124, 124, 124, 125, 125, 125, 125, 126, 126, 126, 126, 127, 127, 127, 127, 128, 128, 128, 128, 129, 129, 129, 129, 130, 130, 130, 130, 131, 131, 131, 131, 132, 132, 132, 132, 133, 133, 133, 133, 134, 134, 134, 134, 135, 135, 135, 135, 136, 136, 136, 136, 137, 137, 137, 137, 138, 138, 138, 138, 139, 139, 139, 139, 140, 140, 140, 140, 141, 141, 141, 141, 142, 142, 142, 142, 143, 143, 143, 143, 144, 144, 144, 144, 145, 145, 145, 145, 146, 146, 146, 146, 147, 147, 147, 147, 148, 148, 148, 148, 149, 149, 149, 149, 150, 150, 150, 150, 151, 151, 151, 151, 152, 152, 152, 152, 153, 153, 153, 153, 154, 154, 154, 154, 155, 155, 155, 155, 156, 156, 156, 156, 157, 157, 157, 157, 158, 158, 158, 158, 159, 159, 159, 159, 160, 160, 160, 160, 161, 161, 161, 161, 162, 162, 162, 162, 163, 163, 163, 163, 164, 164, 164, 164, 165, 165, 165, 165, 166, 166, 166, 166, 167, 167, 167, 167, 168, 168, 168, 168, 169, 169, 169, 169, 170, 170, 170, 170, 171, 171, 171, 171, 172, 172, 172, 172, 173, 173, 173, 173, 174, 174, 174, 174, 175, 175, 175, 175, 176, 176, 176, 176, 177, 177, 177, 177, 178, 178, 178, 178, 179, 179, 179, 179, 180, 180, 180, 180, 181, 181, 181, 181, 182, 182, 182, 182, 183, 183, 183, 183, 184, 184, 184, 184, 185, 185, 185, 185, 186, 186, 186, 186, 187, 187, 187, 187, 188, 188, 188, 
188, 189, 189, 189, 189, 190, 190, 190, 190, 191, 191, 191, 191, 192, 192, 192, 192, 193, 193, 193, 193, 194, 194, 194, 194, 195, 195, 195, 195, 196, 196, 196, 196, 197, 197, 197, 197, 198, 198, 198, 198, 199, 199, 199, 199, 200, 200, 200, 200, 201, 201, 201, 201, 202, 202, 202, 202, 203, 203, 203, 203, 204, 204, 204, 204, 205, 205, 205, 205, 206, 206, 206, 206, 207, 207, 207, 207, 208, 208, 208, 208, 209, 209, 209, 209, 210, 210, 210, 210, 211, 211, 211, 211, 212, 212, 212, 212, 213, 213, 213, 213, 214, 214, 214, 214, 215, 215, 215, 215, 216, 216, 216, 216, 217, 217, 217, 217, 218, 218, 218, 218, 219, 219, 219, 219, 220, 220, 220, 220, 221, 221, 221, 221, 222, 222, 222, 222, 223, 223, 223, 223, 224, 224, 224, 224, 225, 225, 225, 225, 226, 226, 226, 226, 227, 227, 227, 227, 228, 228, 228, 228, 229, 229, 229, 229, 230, 230, 230, 230, 231, 231, 231, 231, 232, 232, 232, 232, 233, 233, 233, 233, 234, 234, 234, 234, 235, 235, 235, 235, 236, 236, 236, 236, 237, 237, 237, 237, 238, 238, 238, 238, 239, 239, 239, 239, 240, 240, 240, 240, 241, 241, 241, 241, 242, 242, 242, 242, 243, 243, 243, 243, 244, 244, 244, 244, 245, 245, 245, 245, 246, 246, 246, 246, 247, 247, 247, 247, 248, 248, 248, 248, 249, 249, 249, 249, 250, 250, 250, 250, 251, 251, 251, 251, 252, 252, 252, 252, 253, 253, 253, 253, 254, 254, 254, 254, 255, 255, 255, 255, 256, 256, 256, 256, 257, 257, 257, 257, 258, 258, 258, 258, 259, 259, 259, 259, 260, 260, 260, 260, 261, 261, 261, 261, 262, 262, 262, 262, }; /* Block: 8720 bytes. */ RE_UINT32 re_get_block(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 12; code = ch ^ (f << 12); pos = (RE_UINT32)re_block_stage_1[f] << 5; f = code >> 7; code ^= f << 7; pos = (RE_UINT32)re_block_stage_2[pos + f] << 3; f = code >> 4; code ^= f << 4; pos = (RE_UINT32)re_block_stage_3[pos + f] << 2; f = code >> 2; code ^= f << 2; pos = (RE_UINT32)re_block_stage_4[pos + f] << 2; value = re_block_stage_5[pos + code]; return value; } /* Script. 
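Note on re_get_block above: each stage entry is a block number, and shifting it left by the block-size exponent turns it into the start offset of that block in the next table. Ranges with identical contents can then share one block, which is consistent with the long runs of repeated entries such as 16 in re_block_stage_1. The two-stage toy below shows that chaining on invented data; toy_stage_1, toy_stage_2 and toy_lookup are hypothetical names, and the values are not taken from the real tables.

```c
#include <stdint.h>

// Toy data, invented for illustration: 16 code points in blocks of 4.
// Ranges 0-3 and 8-11 have identical contents, so stage 1 points
// both of them at the same leaf block (block 0).
static const uint8_t toy_stage_1[] = { 0, 1, 0, 2 };
static const uint8_t toy_stage_2[] = {
    5, 5, 5, 5,   // leaf block 0, shared by two ranges
    5, 7, 7, 5,   // leaf block 1
    9, 9, 9, 9,   // leaf block 2
};

// Valid for ch in 0..15 only.
uint32_t toy_lookup(uint32_t ch) {
    uint32_t f = ch >> 2;                          // high bits select the stage-1 entry
    uint32_t code = ch ^ (f << 2);                 // low 2 bits are left over
    uint32_t pos = (uint32_t)toy_stage_1[f] << 2;  // block number -> block start
    return toy_stage_2[pos + code];
}
```

With this data the 16 values are stored in 12 leaf entries plus 4 index entries; the saving only becomes worthwhile when many ranges share a block, as they do for large unassigned or uniform stretches of the code space.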
*/ static RE_UINT8 re_script_stage_1[] = { 0, 1, 2, 3, 4, 5, 6, 7, 7, 8, 7, 7, 7, 7, 7, 7, 7, 7, 7, 9, 10, 11, 12, 12, 12, 12, 13, 14, 14, 14, 14, 15, 16, 17, 18, 19, 20, 14, 21, 14, 22, 14, 14, 14, 14, 23, 14, 14, 14, 14, 14, 14, 14, 14, 24, 25, 14, 14, 26, 27, 14, 28, 29, 30, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 31, 7, 32, 33, 7, 34, 14, 14, 14, 14, 14, 35, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 36, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 
14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, }; static RE_UINT8 re_script_stage_2[] = { 0, 1, 2, 2, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 32, 33, 34, 35, 36, 37, 37, 37, 37, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 2, 2, 53, 54, 55, 56, 57, 58, 59, 59, 59, 60, 61, 59, 59, 59, 59, 59, 59, 59, 62, 62, 59, 59, 59, 59, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 59, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 80, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 81, 82, 82, 82, 82, 82, 82, 82, 82, 82, 83, 84, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 97, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 71, 71, 99, 100, 101, 102, 103, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 98, 114, 115, 116, 117, 118, 119, 98, 120, 120, 121, 98, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 98, 98, 132, 98, 98, 98, 133, 134, 135, 136, 137, 138, 139, 98, 98, 140, 98, 141, 142, 143, 144, 98, 98, 145, 98, 98, 98, 146, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 147, 147, 147, 147, 147, 147, 147, 148, 149, 147, 150, 98, 98, 98, 98, 98, 151, 151, 151, 151, 151, 151, 151, 151, 152, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 153, 153, 153, 153, 154, 98, 98, 98, 155, 155, 155, 155, 156, 157, 158, 159, 98, 98, 98, 98, 98, 98, 160, 161, 162, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 163, 164, 98, 98, 98, 98, 98, 98, 59, 165, 166, 167, 168, 98, 169, 98, 
170, 171, 172, 59, 59, 173, 59, 174, 175, 175, 175, 175, 175, 176, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 177, 178, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 179, 180, 98, 98, 181, 182, 183, 184, 185, 98, 59, 59, 59, 59, 186, 187, 59, 188, 189, 190, 191, 192, 193, 194, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 195, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 196, 71, 197, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 198, 98, 98, 71, 71, 71, 71, 199, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 200, 98, 201, 202, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, }; static RE_UINT16 re_script_stage_3[] = { 0, 0, 0, 0, 1, 2, 1, 2, 0, 0, 3, 3, 4, 5, 4, 5, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 6, 0, 0, 7, 0, 8, 8, 8, 8, 8, 8, 8, 9, 10, 11, 12, 11, 11, 11, 13, 11, 14, 14, 14, 14, 14, 14, 14, 14, 15, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 16, 17, 18, 16, 17, 19, 20, 21, 21, 22, 21, 23, 24, 25, 26, 27, 27, 28, 29, 27, 30, 27, 27, 27, 27, 27, 31, 27, 27, 32, 33, 33, 33, 34, 27, 27, 27, 35, 35, 35, 36, 37, 37, 37, 38, 39, 39, 40, 41, 42, 43, 44, 44, 44, 44, 27, 45, 44, 44, 46, 27, 47, 47, 47, 47, 47, 48, 49, 47, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 123, 124, 123, 125, 44, 44, 126, 127, 128, 129, 130, 131, 44, 44, 132, 132, 132, 132, 133, 132, 134, 135, 132, 133, 132, 136, 136, 137, 44, 44, 138, 138, 138, 138, 138, 138, 138, 138, 138, 138, 139, 139, 140, 139, 139, 141, 142, 142, 142, 142, 142, 142, 142, 142, 143, 143, 143, 143, 144, 145, 143, 143, 144, 143, 143, 146, 147, 148, 143, 143, 143, 147, 143, 143, 143, 149, 143, 150, 143, 151, 152, 152, 152, 152, 152, 
153, 154, 154, 154, 154, 154, 154, 154, 154, 155, 156, 157, 157, 157, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 168, 168, 168, 168, 169, 170, 170, 171, 172, 173, 173, 173, 173, 173, 174, 173, 173, 175, 154, 154, 154, 154, 176, 177, 178, 179, 179, 180, 181, 182, 183, 184, 184, 185, 184, 186, 187, 168, 168, 188, 189, 190, 190, 190, 191, 190, 192, 193, 193, 194, 195, 44, 44, 44, 44, 196, 196, 196, 196, 197, 196, 196, 198, 199, 199, 199, 199, 200, 200, 200, 201, 202, 202, 202, 203, 204, 205, 205, 205, 44, 44, 44, 44, 206, 207, 208, 209, 4, 4, 210, 4, 4, 211, 212, 213, 4, 4, 4, 214, 8, 8, 8, 215, 11, 216, 11, 11, 216, 217, 11, 218, 11, 11, 11, 219, 219, 220, 11, 221, 222, 0, 0, 0, 0, 0, 223, 224, 225, 226, 0, 225, 44, 8, 8, 227, 0, 0, 228, 229, 230, 0, 4, 4, 231, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 232, 0, 0, 233, 44, 232, 44, 0, 0, 234, 234, 234, 234, 234, 234, 234, 234, 0, 0, 0, 0, 0, 0, 0, 235, 0, 236, 0, 237, 238, 239, 240, 44, 241, 241, 242, 241, 241, 242, 4, 4, 243, 243, 243, 243, 243, 243, 243, 244, 139, 139, 140, 245, 245, 245, 246, 247, 143, 248, 249, 249, 249, 249, 14, 14, 0, 0, 0, 0, 250, 44, 44, 44, 251, 252, 251, 251, 251, 251, 251, 253, 251, 251, 251, 251, 251, 251, 251, 251, 251, 251, 251, 251, 251, 254, 44, 255, 256, 0, 257, 258, 259, 260, 260, 260, 260, 261, 262, 263, 263, 263, 263, 264, 265, 266, 267, 268, 142, 142, 142, 142, 269, 0, 266, 270, 0, 0, 271, 263, 142, 269, 0, 0, 0, 0, 142, 272, 0, 0, 0, 0, 0, 263, 263, 273, 263, 263, 263, 263, 263, 274, 0, 0, 251, 251, 251, 254, 0, 0, 0, 0, 251, 251, 251, 251, 251, 254, 44, 44, 275, 275, 275, 275, 275, 275, 275, 275, 276, 275, 275, 275, 277, 278, 278, 278, 279, 279, 279, 279, 279, 279, 279, 279, 279, 279, 280, 44, 14, 14, 14, 14, 14, 14, 281, 281, 281, 281, 281, 282, 0, 0, 283, 4, 4, 4, 4, 4, 284, 4, 285, 286, 44, 44, 44, 287, 288, 288, 289, 290, 291, 291, 291, 292, 293, 293, 293, 293, 294, 295, 47, 296, 297, 297, 298, 299, 299, 300, 142, 301, 302, 302, 302, 
302, 303, 304, 138, 305, 306, 306, 306, 307, 308, 309, 138, 138, 310, 310, 310, 310, 311, 312, 313, 314, 315, 316, 249, 4, 4, 317, 318, 152, 152, 152, 152, 152, 313, 313, 319, 320, 142, 142, 321, 142, 322, 142, 142, 323, 44, 44, 44, 44, 44, 44, 44, 44, 251, 251, 251, 251, 251, 251, 324, 251, 251, 251, 251, 251, 251, 325, 44, 44, 326, 327, 21, 328, 329, 27, 27, 27, 27, 27, 27, 27, 330, 46, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 331, 44, 27, 27, 27, 27, 332, 27, 27, 333, 44, 44, 334, 8, 290, 335, 0, 0, 336, 337, 338, 27, 27, 27, 27, 27, 27, 27, 339, 340, 0, 1, 2, 1, 2, 341, 262, 263, 342, 142, 269, 343, 344, 345, 346, 347, 348, 349, 350, 351, 351, 44, 44, 348, 348, 348, 348, 348, 348, 348, 352, 353, 0, 0, 354, 11, 11, 11, 11, 355, 255, 356, 44, 44, 0, 0, 357, 358, 359, 360, 360, 360, 361, 362, 255, 363, 363, 364, 365, 366, 367, 367, 368, 369, 370, 371, 371, 372, 373, 44, 44, 374, 374, 374, 374, 374, 375, 375, 375, 376, 377, 378, 44, 44, 44, 44, 44, 379, 379, 380, 381, 381, 381, 382, 44, 383, 383, 383, 383, 383, 383, 383, 383, 383, 383, 383, 384, 383, 385, 386, 44, 387, 388, 388, 389, 390, 391, 392, 392, 393, 394, 395, 44, 44, 44, 396, 397, 398, 399, 400, 401, 44, 44, 44, 44, 402, 402, 403, 404, 403, 405, 403, 403, 406, 407, 408, 409, 410, 411, 412, 412, 413, 413, 44, 44, 414, 414, 415, 416, 417, 417, 417, 418, 419, 420, 421, 422, 423, 424, 425, 44, 44, 44, 44, 44, 426, 426, 426, 426, 427, 44, 44, 44, 428, 428, 428, 429, 428, 428, 428, 430, 44, 44, 44, 44, 44, 44, 27, 431, 432, 432, 432, 432, 433, 434, 432, 435, 436, 436, 436, 436, 437, 438, 439, 440, 441, 441, 441, 442, 443, 444, 444, 445, 446, 446, 446, 446, 447, 446, 448, 449, 450, 451, 450, 452, 44, 44, 44, 44, 453, 454, 455, 456, 456, 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466, 467, 467, 467, 467, 468, 469, 44, 44, 470, 470, 470, 471, 470, 472, 44, 44, 473, 473, 473, 473, 474, 475, 44, 44, 476, 476, 476, 477, 478, 44, 44, 44, 479, 480, 481, 479, 44, 44, 44, 44, 44, 44, 482, 482, 482, 482, 
482, 483, 44, 44, 44, 44, 484, 484, 484, 485, 486, 486, 486, 486, 486, 486, 486, 486, 486, 487, 44, 44, 44, 44, 44, 44, 486, 486, 486, 486, 486, 486, 488, 489, 486, 486, 486, 486, 490, 44, 44, 44, 491, 491, 491, 491, 491, 491, 491, 491, 491, 491, 492, 44, 44, 44, 44, 44, 493, 493, 493, 493, 493, 493, 493, 493, 493, 493, 493, 493, 494, 44, 44, 44, 281, 281, 281, 281, 281, 281, 281, 281, 281, 281, 281, 495, 496, 497, 498, 44, 44, 44, 44, 44, 44, 499, 500, 501, 502, 502, 502, 502, 503, 504, 505, 506, 502, 44, 44, 44, 44, 44, 44, 44, 507, 507, 507, 507, 508, 507, 507, 509, 510, 507, 44, 44, 44, 44, 44, 44, 511, 44, 44, 44, 44, 44, 44, 44, 512, 512, 512, 512, 512, 512, 513, 514, 515, 516, 271, 44, 44, 44, 44, 44, 0, 0, 0, 0, 0, 0, 0, 517, 0, 0, 518, 0, 0, 0, 519, 520, 521, 0, 522, 0, 0, 0, 523, 44, 11, 11, 11, 11, 524, 44, 44, 44, 0, 0, 0, 0, 0, 233, 0, 239, 0, 0, 0, 0, 0, 223, 0, 0, 0, 525, 526, 527, 528, 0, 0, 0, 529, 530, 0, 531, 532, 533, 0, 0, 0, 0, 236, 0, 0, 0, 0, 0, 0, 0, 0, 0, 534, 0, 0, 0, 535, 535, 535, 535, 535, 535, 535, 535, 536, 537, 538, 44, 44, 44, 44, 44, 539, 539, 539, 539, 539, 539, 539, 539, 539, 539, 539, 539, 540, 541, 44, 44, 542, 27, 543, 544, 545, 546, 547, 548, 549, 550, 551, 550, 44, 44, 44, 330, 0, 0, 255, 0, 0, 0, 0, 0, 0, 271, 225, 340, 340, 340, 0, 517, 552, 0, 225, 0, 0, 0, 255, 0, 0, 232, 44, 44, 44, 44, 553, 0, 554, 0, 0, 232, 523, 239, 44, 44, 0, 0, 0, 0, 0, 0, 0, 555, 0, 0, 528, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 556, 552, 271, 0, 0, 0, 0, 0, 0, 0, 271, 0, 0, 0, 0, 0, 557, 44, 44, 255, 0, 0, 0, 558, 290, 0, 0, 558, 0, 559, 44, 44, 44, 44, 44, 44, 523, 44, 44, 44, 44, 44, 44, 557, 44, 44, 44, 556, 44, 44, 44, 251, 251, 251, 251, 251, 560, 44, 44, 251, 251, 251, 561, 251, 251, 251, 251, 251, 324, 251, 251, 251, 251, 251, 251, 251, 251, 562, 44, 44, 44, 44, 44, 251, 324, 44, 44, 44, 44, 44, 44, 563, 44, 0, 0, 0, 0, 0, 0, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 44, }; static RE_UINT16 re_script_stage_4[] = { 0, 0, 0, 0, 1, 2, 2, 2, 2, 2, 
3, 0, 0, 0, 4, 0, 2, 2, 2, 2, 2, 3, 2, 2, 2, 2, 5, 0, 2, 5, 6, 0, 7, 7, 7, 7, 8, 9, 10, 11, 12, 13, 14, 15, 8, 8, 8, 8, 16, 8, 8, 8, 17, 18, 18, 18, 19, 19, 19, 19, 19, 20, 19, 19, 21, 22, 22, 22, 22, 22, 22, 22, 22, 23, 21, 22, 22, 22, 24, 21, 25, 26, 26, 26, 26, 26, 26, 26, 26, 26, 12, 12, 26, 26, 27, 12, 26, 28, 12, 12, 29, 30, 29, 31, 29, 29, 32, 33, 29, 29, 29, 29, 31, 29, 34, 7, 7, 35, 29, 29, 36, 29, 29, 29, 29, 29, 29, 30, 37, 37, 37, 38, 37, 37, 37, 37, 37, 37, 39, 40, 41, 41, 41, 41, 42, 12, 12, 12, 43, 43, 43, 43, 43, 43, 44, 12, 45, 45, 45, 45, 45, 45, 45, 46, 45, 45, 45, 47, 48, 48, 48, 48, 48, 48, 48, 49, 12, 12, 12, 12, 29, 50, 12, 12, 51, 29, 29, 29, 52, 52, 52, 52, 53, 52, 52, 52, 52, 54, 52, 52, 55, 56, 55, 57, 57, 55, 55, 55, 55, 55, 58, 55, 59, 60, 61, 55, 55, 57, 57, 62, 12, 63, 12, 64, 55, 60, 55, 55, 55, 55, 55, 12, 65, 65, 66, 67, 68, 69, 69, 69, 69, 69, 70, 69, 70, 71, 72, 70, 66, 67, 68, 72, 73, 12, 65, 74, 12, 75, 69, 69, 69, 72, 12, 12, 76, 76, 77, 78, 78, 77, 77, 77, 77, 77, 79, 77, 79, 76, 80, 77, 77, 78, 78, 80, 81, 12, 12, 12, 77, 82, 77, 77, 80, 12, 83, 12, 84, 84, 85, 86, 86, 85, 85, 85, 85, 85, 87, 85, 87, 84, 88, 85, 85, 86, 86, 88, 12, 89, 12, 90, 85, 89, 85, 85, 85, 85, 12, 12, 91, 92, 93, 91, 94, 95, 96, 94, 97, 98, 93, 91, 99, 99, 95, 91, 93, 91, 94, 95, 98, 97, 12, 12, 12, 91, 99, 99, 99, 99, 93, 12, 100, 101, 100, 102, 102, 100, 100, 100, 100, 100, 102, 100, 100, 100, 103, 101, 100, 102, 102, 103, 12, 104, 105, 12, 100, 106, 100, 100, 12, 12, 100, 100, 107, 107, 108, 109, 109, 108, 108, 108, 108, 108, 109, 108, 108, 107, 110, 108, 108, 109, 109, 110, 12, 111, 12, 112, 108, 113, 108, 108, 111, 12, 12, 12, 114, 114, 115, 116, 116, 115, 115, 115, 115, 115, 115, 115, 115, 115, 117, 114, 115, 116, 116, 117, 12, 118, 12, 118, 115, 119, 115, 115, 115, 120, 114, 115, 121, 122, 123, 123, 123, 124, 121, 123, 123, 123, 123, 123, 125, 123, 123, 126, 123, 124, 127, 128, 123, 129, 123, 123, 12, 121, 123, 123, 121, 130, 12, 12, 131, 132, 
132, 132, 132, 132, 132, 132, 132, 132, 133, 134, 132, 132, 132, 12, 135, 136, 137, 138, 12, 139, 140, 139, 140, 141, 142, 140, 139, 139, 143, 144, 139, 137, 139, 144, 139, 139, 144, 139, 145, 145, 145, 145, 145, 145, 146, 145, 145, 145, 145, 147, 146, 145, 145, 145, 145, 145, 145, 148, 145, 149, 150, 12, 151, 151, 151, 151, 152, 152, 152, 152, 152, 153, 12, 154, 152, 152, 155, 152, 156, 156, 156, 156, 157, 157, 157, 157, 157, 157, 158, 159, 157, 160, 158, 159, 158, 159, 157, 160, 158, 159, 157, 157, 157, 160, 157, 157, 157, 157, 160, 161, 157, 157, 157, 162, 157, 157, 159, 12, 163, 163, 163, 163, 163, 164, 163, 164, 165, 165, 165, 165, 166, 166, 166, 166, 166, 166, 166, 167, 168, 168, 168, 168, 168, 168, 169, 170, 168, 168, 171, 12, 172, 172, 172, 173, 172, 174, 12, 12, 175, 175, 175, 175, 175, 176, 12, 12, 177, 177, 177, 177, 177, 12, 12, 12, 178, 178, 178, 179, 179, 12, 12, 12, 180, 180, 180, 180, 180, 180, 180, 181, 180, 180, 181, 12, 182, 183, 184, 185, 184, 184, 186, 12, 184, 184, 184, 184, 184, 184, 12, 12, 184, 184, 185, 12, 165, 187, 12, 12, 188, 188, 188, 188, 188, 188, 188, 189, 188, 188, 188, 12, 190, 188, 188, 188, 191, 191, 191, 191, 191, 191, 191, 192, 191, 193, 12, 12, 194, 194, 194, 194, 194, 194, 194, 12, 194, 194, 195, 12, 194, 194, 196, 197, 198, 198, 198, 198, 198, 198, 198, 199, 200, 200, 200, 200, 200, 200, 200, 201, 200, 200, 200, 202, 200, 200, 203, 12, 200, 200, 200, 203, 7, 7, 7, 204, 205, 205, 205, 205, 205, 205, 205, 12, 205, 205, 205, 206, 207, 207, 207, 207, 208, 208, 208, 208, 208, 12, 12, 208, 209, 209, 209, 209, 209, 209, 210, 209, 209, 209, 211, 212, 213, 213, 213, 213, 207, 207, 12, 12, 214, 7, 7, 7, 215, 7, 216, 217, 0, 218, 219, 12, 2, 220, 221, 2, 2, 2, 2, 222, 223, 220, 224, 2, 2, 2, 225, 2, 2, 2, 2, 226, 7, 219, 12, 7, 8, 227, 8, 227, 8, 8, 228, 228, 8, 8, 8, 227, 8, 15, 8, 8, 8, 10, 8, 229, 10, 15, 8, 14, 0, 0, 0, 230, 0, 231, 0, 0, 232, 0, 0, 233, 0, 0, 0, 234, 2, 2, 2, 235, 236, 12, 12, 12, 0, 237, 238, 0, 4, 0, 0, 0, 0, 
0, 0, 4, 2, 2, 5, 12, 0, 0, 234, 12, 0, 234, 12, 12, 239, 239, 239, 239, 0, 240, 0, 0, 0, 241, 0, 0, 0, 0, 241, 242, 0, 0, 231, 0, 241, 12, 12, 12, 12, 12, 12, 0, 243, 243, 243, 243, 243, 243, 243, 244, 18, 18, 18, 18, 18, 12, 245, 18, 246, 246, 246, 246, 246, 246, 12, 247, 248, 12, 12, 247, 157, 160, 12, 12, 157, 160, 157, 160, 234, 12, 12, 12, 249, 249, 249, 249, 249, 249, 250, 249, 249, 12, 12, 12, 249, 251, 12, 12, 0, 0, 0, 12, 0, 252, 0, 0, 253, 249, 254, 255, 0, 0, 249, 0, 256, 257, 257, 257, 257, 257, 257, 257, 257, 258, 259, 260, 261, 262, 262, 262, 262, 262, 262, 262, 262, 262, 263, 261, 12, 264, 265, 265, 265, 265, 265, 265, 265, 265, 265, 266, 267, 156, 156, 156, 156, 156, 156, 268, 265, 265, 269, 12, 0, 12, 12, 12, 156, 156, 156, 270, 262, 262, 262, 271, 262, 262, 0, 0, 272, 272, 272, 272, 272, 272, 272, 273, 272, 274, 12, 12, 275, 275, 275, 275, 276, 276, 276, 276, 276, 276, 276, 12, 277, 277, 277, 277, 277, 277, 12, 12, 238, 2, 2, 2, 2, 2, 233, 2, 2, 2, 2, 278, 2, 2, 12, 12, 12, 279, 2, 2, 280, 280, 280, 280, 280, 280, 280, 12, 0, 0, 241, 12, 281, 281, 281, 281, 281, 281, 12, 12, 282, 282, 282, 282, 282, 283, 12, 284, 282, 282, 285, 12, 52, 52, 52, 286, 287, 287, 287, 287, 287, 287, 287, 288, 289, 289, 289, 289, 289, 12, 12, 290, 156, 156, 156, 291, 292, 292, 292, 292, 292, 292, 292, 293, 292, 292, 294, 295, 151, 151, 151, 296, 297, 297, 297, 297, 297, 298, 12, 12, 297, 297, 297, 299, 297, 297, 299, 297, 300, 300, 300, 300, 301, 12, 12, 12, 12, 12, 302, 300, 303, 303, 303, 303, 303, 304, 12, 12, 161, 160, 161, 160, 161, 160, 12, 12, 2, 2, 3, 2, 2, 305, 12, 12, 303, 303, 303, 306, 303, 303, 306, 12, 156, 12, 12, 12, 156, 268, 307, 156, 156, 156, 156, 12, 249, 249, 249, 251, 249, 249, 251, 12, 2, 308, 12, 12, 309, 22, 12, 25, 26, 27, 26, 310, 311, 312, 26, 26, 313, 12, 12, 12, 29, 29, 29, 314, 315, 29, 29, 29, 29, 29, 12, 12, 29, 29, 29, 313, 7, 7, 7, 316, 234, 0, 0, 0, 0, 234, 0, 12, 29, 317, 29, 29, 29, 29, 29, 318, 242, 0, 0, 0, 0, 319, 262, 262, 
262, 262, 262, 320, 321, 156, 321, 156, 321, 156, 321, 291, 0, 234, 0, 234, 12, 12, 242, 241, 322, 322, 322, 323, 322, 322, 322, 322, 322, 324, 322, 322, 322, 322, 324, 325, 322, 322, 322, 326, 322, 322, 324, 12, 234, 134, 0, 0, 0, 134, 0, 0, 8, 8, 8, 327, 327, 12, 12, 12, 0, 0, 0, 328, 329, 329, 329, 329, 329, 329, 329, 330, 331, 331, 331, 331, 332, 12, 12, 12, 216, 0, 0, 0, 333, 333, 333, 333, 333, 12, 12, 12, 334, 334, 334, 334, 334, 334, 335, 12, 336, 336, 336, 336, 336, 336, 337, 12, 338, 338, 338, 338, 338, 338, 338, 339, 340, 340, 340, 340, 340, 12, 340, 340, 340, 341, 12, 12, 342, 342, 342, 342, 343, 343, 343, 343, 344, 344, 344, 344, 344, 344, 344, 345, 344, 344, 345, 12, 346, 346, 346, 346, 346, 346, 12, 12, 347, 347, 347, 347, 347, 12, 12, 348, 349, 349, 349, 349, 349, 350, 12, 12, 349, 351, 12, 12, 349, 349, 12, 12, 352, 353, 354, 352, 352, 352, 352, 352, 352, 355, 356, 357, 358, 358, 358, 358, 358, 359, 358, 358, 360, 360, 360, 360, 361, 361, 361, 361, 361, 361, 361, 362, 12, 363, 361, 361, 364, 364, 364, 364, 365, 366, 367, 364, 368, 368, 368, 368, 368, 368, 368, 369, 370, 370, 370, 370, 370, 370, 371, 372, 373, 373, 373, 373, 374, 374, 374, 374, 374, 374, 12, 374, 375, 374, 374, 374, 376, 377, 12, 376, 376, 378, 378, 376, 376, 376, 376, 376, 376, 12, 379, 380, 376, 376, 12, 12, 376, 376, 381, 12, 382, 382, 382, 382, 383, 383, 383, 383, 384, 384, 384, 384, 384, 385, 386, 384, 384, 385, 12, 12, 387, 387, 387, 387, 387, 388, 389, 387, 390, 390, 390, 390, 390, 391, 390, 390, 392, 392, 392, 392, 393, 12, 392, 392, 394, 394, 394, 394, 395, 12, 396, 397, 12, 12, 396, 394, 398, 398, 398, 398, 398, 398, 399, 12, 400, 400, 400, 400, 401, 12, 12, 12, 401, 12, 402, 400, 29, 29, 29, 403, 404, 404, 404, 404, 404, 404, 404, 405, 406, 404, 404, 404, 12, 12, 12, 407, 408, 408, 408, 408, 409, 12, 12, 12, 410, 410, 410, 410, 410, 410, 411, 12, 410, 410, 412, 12, 413, 413, 413, 413, 413, 414, 413, 413, 413, 12, 12, 12, 415, 415, 415, 415, 415, 416, 12, 12, 417, 417, 
417, 417, 417, 417, 417, 418, 122, 123, 123, 123, 123, 130, 12, 12, 419, 419, 419, 419, 420, 419, 419, 419, 419, 419, 419, 421, 422, 423, 424, 425, 422, 422, 422, 425, 422, 422, 426, 12, 427, 427, 427, 427, 427, 427, 428, 12, 427, 427, 429, 12, 430, 431, 430, 432, 432, 430, 430, 430, 430, 430, 433, 430, 433, 431, 434, 430, 430, 432, 432, 434, 435, 436, 12, 431, 430, 437, 430, 435, 430, 435, 12, 12, 438, 438, 438, 438, 438, 438, 12, 12, 438, 438, 439, 12, 440, 440, 440, 440, 440, 441, 440, 440, 440, 440, 440, 441, 442, 442, 442, 442, 442, 443, 12, 12, 442, 442, 444, 12, 445, 445, 445, 445, 445, 445, 12, 12, 445, 445, 446, 12, 447, 447, 447, 447, 447, 447, 448, 449, 447, 447, 447, 12, 450, 450, 450, 450, 451, 12, 12, 452, 453, 453, 453, 453, 453, 453, 454, 12, 455, 455, 455, 455, 455, 455, 456, 12, 455, 455, 455, 457, 455, 458, 12, 12, 455, 12, 12, 12, 459, 459, 459, 459, 459, 459, 459, 460, 461, 461, 461, 461, 461, 462, 12, 12, 277, 277, 463, 12, 464, 464, 464, 464, 464, 464, 464, 465, 464, 464, 466, 467, 468, 468, 468, 468, 468, 468, 468, 469, 468, 469, 12, 12, 470, 470, 470, 470, 470, 471, 12, 12, 470, 470, 472, 470, 472, 470, 470, 470, 470, 470, 12, 473, 474, 474, 474, 474, 474, 475, 12, 12, 474, 474, 474, 476, 12, 12, 12, 477, 478, 12, 12, 12, 479, 479, 479, 479, 479, 479, 480, 12, 479, 479, 479, 481, 479, 479, 481, 12, 479, 479, 482, 479, 0, 241, 12, 12, 0, 234, 242, 0, 0, 483, 230, 0, 0, 0, 483, 7, 214, 484, 7, 0, 0, 0, 485, 230, 0, 0, 486, 12, 8, 227, 12, 12, 0, 0, 0, 231, 487, 488, 242, 231, 0, 0, 489, 242, 0, 242, 0, 0, 0, 489, 234, 242, 0, 231, 0, 231, 0, 0, 489, 234, 0, 490, 240, 0, 231, 0, 0, 0, 0, 0, 0, 240, 491, 491, 491, 491, 491, 491, 491, 12, 12, 12, 492, 491, 493, 491, 491, 491, 494, 494, 494, 494, 494, 495, 494, 494, 494, 496, 12, 12, 29, 497, 29, 29, 498, 499, 497, 29, 403, 29, 500, 12, 501, 51, 500, 497, 498, 499, 500, 500, 498, 499, 403, 29, 403, 29, 497, 502, 29, 29, 503, 29, 29, 29, 29, 12, 497, 497, 503, 29, 0, 0, 0, 486, 12, 240, 0, 0, 504, 
12, 12, 12, 0, 0, 489, 0, 486, 12, 12, 12, 0, 486, 12, 12, 0, 0, 12, 12, 0, 0, 0, 241, 249, 505, 12, 12, 249, 506, 12, 12, 251, 12, 12, 12, 507, 12, 12, 12, }; static RE_UINT8 re_script_stage_5[] = { 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 2, 1, 2, 1, 1, 1, 1, 1, 35, 35, 41, 41, 41, 41, 3, 3, 3, 3, 1, 3, 3, 3, 0, 0, 3, 3, 3, 3, 1, 3, 0, 0, 0, 0, 3, 1, 3, 1, 3, 3, 3, 0, 3, 0, 3, 3, 3, 3, 0, 3, 3, 3, 55, 55, 55, 55, 55, 55, 4, 4, 4, 4, 4, 41, 41, 4, 0, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 0, 0, 1, 5, 0, 0, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 0, 6, 0, 0, 0, 7, 7, 7, 7, 7, 1, 7, 7, 1, 7, 7, 7, 7, 7, 7, 1, 1, 0, 7, 1, 7, 7, 7, 41, 41, 41, 7, 7, 41, 7, 7, 7, 8, 8, 8, 8, 8, 8, 0, 8, 8, 8, 8, 0, 0, 8, 8, 8, 9, 9, 9, 9, 9, 9, 0, 0, 66, 66, 66, 66, 66, 66, 66, 0, 82, 82, 82, 82, 82, 82, 0, 0, 82, 82, 82, 0, 95, 95, 95, 95, 0, 0, 95, 0, 7, 0, 0, 0, 0, 0, 0, 7, 10, 10, 10, 10, 10, 41, 41, 10, 1, 1, 10, 10, 11, 11, 11, 11, 0, 11, 11, 11, 11, 0, 0, 11, 11, 0, 11, 11, 11, 0, 11, 0, 0, 0, 11, 11, 11, 11, 0, 0, 11, 11, 11, 0, 0, 0, 0, 11, 11, 11, 0, 11, 0, 12, 12, 12, 12, 12, 12, 0, 0, 0, 0, 12, 12, 0, 0, 12, 12, 12, 12, 12, 12, 0, 12, 12, 0, 12, 12, 0, 12, 12, 0, 0, 0, 12, 0, 0, 12, 0, 12, 0, 0, 0, 12, 12, 0, 13, 13, 13, 13, 13, 13, 13, 13, 13, 0, 13, 13, 0, 13, 13, 13, 13, 0, 0, 13, 0, 0, 0, 0, 0, 13, 13, 0, 13, 0, 0, 0, 14, 14, 14, 14, 14, 14, 14, 14, 0, 0, 14, 14, 0, 14, 14, 14, 14, 0, 0, 0, 0, 14, 14, 14, 14, 0, 14, 0, 0, 15, 15, 0, 15, 15, 15, 15, 15, 15, 0, 15, 0, 15, 15, 15, 15, 0, 0, 0, 15, 15, 0, 0, 0, 0, 15, 15, 0, 0, 0, 15, 15, 15, 15, 16, 16, 16, 16, 0, 16, 16, 16, 16, 0, 16, 16, 16, 16, 0, 0, 0, 16, 16, 0, 16, 16, 16, 0, 0, 0, 16, 16, 0, 17, 17, 17, 17, 17, 17, 17, 17, 0, 17, 17, 17, 17, 0, 0, 0, 17, 17, 0, 0, 0, 17, 0, 0, 0, 17, 17, 0, 18, 18, 18, 18, 18, 18, 18, 18, 0, 18, 18, 18, 18, 18, 0, 0, 0, 0, 18, 0, 0, 18, 18, 18, 18, 0, 0, 0, 0, 19, 19, 0, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 0, 19, 19, 0, 19, 0, 19, 0, 0, 0, 0, 19, 0, 0, 0, 0, 19, 19, 0, 19, 0, 19, 0, 
0, 0, 0, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 0, 0, 0, 0, 1, 0, 21, 21, 0, 21, 0, 0, 21, 21, 0, 21, 0, 0, 21, 0, 0, 21, 21, 21, 21, 0, 21, 21, 21, 0, 21, 0, 21, 0, 0, 21, 21, 21, 21, 0, 21, 21, 21, 0, 0, 22, 22, 22, 22, 0, 22, 22, 22, 22, 0, 0, 0, 22, 0, 22, 22, 22, 1, 1, 1, 1, 22, 22, 0, 23, 23, 23, 23, 24, 24, 24, 24, 24, 24, 0, 24, 0, 24, 0, 0, 24, 24, 24, 1, 25, 25, 25, 25, 26, 26, 26, 26, 26, 0, 26, 26, 26, 26, 0, 0, 26, 26, 26, 0, 0, 26, 26, 26, 26, 0, 0, 0, 27, 27, 27, 27, 27, 27, 0, 0, 28, 28, 28, 28, 29, 29, 29, 29, 29, 0, 0, 0, 30, 30, 30, 30, 30, 30, 30, 1, 1, 1, 30, 30, 30, 0, 0, 0, 42, 42, 42, 42, 42, 0, 42, 42, 42, 0, 0, 0, 43, 43, 43, 43, 43, 1, 1, 0, 44, 44, 44, 44, 45, 45, 45, 45, 45, 0, 45, 45, 31, 31, 31, 31, 31, 31, 0, 0, 32, 32, 1, 1, 32, 1, 32, 32, 32, 32, 32, 32, 32, 32, 32, 0, 32, 32, 0, 0, 28, 28, 0, 0, 46, 46, 46, 46, 46, 46, 46, 0, 46, 0, 0, 0, 47, 47, 47, 47, 47, 47, 0, 0, 47, 0, 0, 0, 56, 56, 56, 56, 56, 56, 0, 0, 56, 56, 56, 0, 0, 0, 56, 56, 54, 54, 54, 54, 0, 0, 54, 54, 78, 78, 78, 78, 78, 78, 78, 0, 78, 0, 0, 78, 78, 78, 0, 0, 41, 41, 41, 0, 62, 62, 62, 62, 62, 0, 0, 0, 67, 67, 67, 67, 93, 93, 93, 93, 68, 68, 68, 68, 0, 0, 0, 68, 68, 68, 0, 0, 0, 68, 68, 68, 69, 69, 69, 69, 41, 41, 41, 1, 41, 1, 41, 41, 41, 1, 1, 1, 1, 41, 1, 1, 41, 1, 1, 0, 41, 41, 0, 0, 2, 2, 3, 3, 3, 3, 3, 4, 2, 3, 3, 3, 3, 3, 2, 2, 3, 3, 3, 2, 4, 2, 2, 2, 2, 2, 2, 3, 3, 3, 0, 0, 0, 3, 0, 3, 0, 3, 3, 3, 41, 41, 1, 1, 1, 0, 1, 1, 1, 2, 0, 0, 1, 1, 1, 2, 1, 1, 1, 0, 2, 0, 0, 0, 41, 0, 0, 0, 1, 1, 3, 1, 1, 1, 2, 2, 53, 53, 53, 53, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 57, 57, 57, 57, 57, 57, 57, 0, 0, 55, 55, 55, 58, 58, 58, 58, 0, 0, 0, 58, 58, 0, 0, 0, 36, 36, 36, 36, 36, 36, 0, 36, 36, 36, 0, 0, 1, 36, 1, 36, 1, 36, 36, 36, 36, 36, 41, 41, 41, 41, 25, 25, 0, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 0, 0, 41, 41, 1, 1, 33, 33, 33, 1, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 1, 0, 35, 35, 35, 35, 35, 35, 35, 35, 35, 0, 0, 0, 25, 25, 25, 25, 25, 25, 0, 35, 35, 35, 0, 
25, 25, 25, 1, 34, 34, 34, 0, 37, 37, 37, 37, 37, 0, 0, 0, 37, 37, 37, 0, 83, 83, 83, 83, 70, 70, 70, 70, 84, 84, 84, 84, 2, 2, 0, 0, 0, 0, 0, 2, 59, 59, 59, 59, 65, 65, 65, 65, 71, 71, 71, 71, 71, 0, 0, 0, 0, 0, 71, 71, 71, 71, 0, 0, 10, 10, 0, 0, 72, 72, 72, 72, 72, 72, 1, 72, 73, 73, 73, 73, 0, 0, 0, 73, 25, 0, 0, 0, 85, 85, 85, 85, 85, 85, 0, 1, 85, 85, 0, 0, 0, 0, 85, 85, 23, 23, 23, 0, 77, 77, 77, 77, 77, 77, 77, 0, 77, 77, 0, 0, 79, 79, 79, 79, 79, 79, 79, 0, 0, 0, 0, 79, 86, 86, 86, 86, 86, 86, 86, 0, 2, 3, 0, 0, 86, 86, 0, 0, 0, 0, 0, 25, 2, 2, 2, 0, 0, 0, 0, 5, 6, 0, 6, 0, 6, 6, 0, 6, 6, 0, 6, 6, 7, 7, 0, 0, 7, 7, 1, 1, 0, 0, 7, 7, 41, 41, 4, 4, 7, 0, 7, 7, 7, 0, 0, 1, 1, 1, 34, 34, 34, 34, 1, 1, 0, 0, 25, 25, 48, 48, 48, 48, 0, 48, 48, 48, 48, 48, 48, 0, 48, 48, 0, 48, 48, 48, 0, 0, 3, 0, 0, 0, 1, 41, 0, 0, 74, 74, 74, 74, 74, 0, 0, 0, 75, 75, 75, 75, 75, 0, 0, 0, 38, 38, 38, 38, 39, 39, 39, 39, 39, 39, 39, 0, 120, 120, 120, 120, 120, 120, 120, 0, 49, 49, 49, 49, 49, 49, 0, 49, 60, 60, 60, 60, 60, 60, 0, 0, 40, 40, 40, 40, 50, 50, 50, 50, 51, 51, 51, 51, 51, 51, 0, 0, 106, 106, 106, 106, 103, 103, 103, 103, 0, 0, 0, 103, 110, 110, 110, 110, 110, 110, 110, 0, 110, 110, 0, 0, 52, 52, 52, 52, 52, 52, 0, 0, 52, 0, 52, 52, 52, 52, 0, 52, 52, 0, 0, 0, 52, 0, 0, 52, 87, 87, 87, 87, 87, 87, 0, 87, 118, 118, 118, 118, 117, 117, 117, 117, 117, 117, 117, 0, 0, 0, 0, 117, 128, 128, 128, 128, 128, 128, 128, 0, 128, 128, 0, 0, 0, 0, 0, 128, 64, 64, 64, 64, 0, 0, 0, 64, 76, 76, 76, 76, 76, 76, 0, 0, 0, 0, 0, 76, 98, 98, 98, 98, 97, 97, 97, 97, 0, 0, 97, 97, 61, 61, 61, 61, 0, 61, 61, 0, 0, 61, 61, 61, 61, 61, 61, 0, 0, 0, 0, 61, 61, 0, 0, 0, 88, 88, 88, 88, 116, 116, 116, 116, 112, 112, 112, 112, 112, 112, 112, 0, 0, 0, 0, 112, 80, 80, 80, 80, 80, 80, 0, 0, 0, 80, 80, 80, 89, 89, 89, 89, 89, 89, 0, 0, 90, 90, 90, 90, 90, 90, 90, 0, 121, 121, 121, 121, 121, 121, 0, 0, 0, 121, 121, 121, 121, 0, 0, 0, 91, 91, 91, 91, 91, 0, 0, 0, 130, 130, 130, 130, 130, 130, 130, 0, 0, 
0, 130, 130, 7, 7, 7, 0, 94, 94, 94, 94, 94, 94, 0, 0, 0, 0, 94, 94, 0, 0, 0, 94, 92, 92, 92, 92, 92, 92, 0, 0, 101, 101, 101, 101, 101, 0, 0, 0, 101, 101, 0, 0, 96, 96, 96, 96, 96, 0, 96, 96, 111, 111, 111, 111, 111, 111, 111, 0, 100, 100, 100, 100, 100, 100, 0, 0, 109, 109, 109, 109, 109, 109, 0, 109, 109, 109, 0, 0, 129, 129, 129, 129, 129, 129, 129, 0, 129, 0, 129, 129, 129, 129, 0, 129, 129, 129, 0, 0, 123, 123, 123, 123, 123, 123, 123, 0, 123, 123, 0, 0, 107, 107, 107, 107, 0, 107, 107, 107, 107, 0, 0, 107, 107, 0, 107, 107, 107, 107, 0, 0, 107, 0, 0, 0, 0, 0, 0, 107, 0, 0, 107, 107, 124, 124, 124, 124, 124, 124, 0, 0, 122, 122, 122, 122, 122, 122, 0, 0, 114, 114, 114, 114, 114, 0, 0, 0, 114, 114, 0, 0, 102, 102, 102, 102, 102, 102, 0, 0, 126, 126, 126, 126, 126, 126, 0, 0, 0, 126, 126, 126, 125, 125, 125, 125, 125, 125, 125, 0, 0, 0, 0, 125, 119, 119, 119, 119, 119, 0, 0, 0, 63, 63, 63, 63, 63, 63, 0, 0, 63, 63, 63, 0, 63, 0, 0, 0, 81, 81, 81, 81, 81, 81, 81, 0, 127, 127, 127, 127, 127, 127, 127, 0, 84, 0, 0, 0, 115, 115, 115, 115, 115, 115, 115, 0, 115, 115, 0, 0, 0, 0, 115, 115, 104, 104, 104, 104, 104, 104, 0, 0, 108, 108, 108, 108, 108, 108, 0, 0, 108, 108, 0, 108, 0, 108, 108, 108, 99, 99, 99, 99, 99, 0, 0, 0, 99, 99, 99, 0, 0, 0, 0, 99, 34, 33, 0, 0, 105, 105, 105, 105, 105, 105, 105, 0, 105, 0, 0, 0, 105, 105, 0, 0, 1, 1, 1, 41, 1, 41, 41, 41, 1, 1, 41, 41, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 131, 131, 131, 131, 0, 0, 0, 131, 0, 131, 131, 131, 113, 113, 113, 113, 113, 0, 0, 113, 113, 113, 113, 0, 0, 7, 7, 7, 0, 7, 7, 0, 7, 0, 0, 7, 0, 7, 0, 7, 0, 0, 7, 0, 7, 0, 7, 0, 7, 7, 0, 7, 33, 1, 1, 0, 36, 36, 36, 0, 36, 0, 0, 0, 0, 1, 0, 0, }; /* Script: 10928 bytes. 
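Note on the lookup that follows: re_get_script walks the five stage tables
by peeling index bits off the code point. Judging only from the shifts in
that function, the split appears to be: the bits above bit 11 for stage 1,
then 4, 3, 2 and 2 bits for stages 2 to 5. The sketch below illustrates
just that bit split, without the tables. The names SplitIndex,
split_code_point and join_code_point are hypothetical and are not part of
this module.

```c
#include <stdint.h>

// Hypothetical illustration only: the five per-stage indices that a
// lookup shaped like re_get_script would derive from one code point.
typedef struct {
    uint32_t i1; // ch >> 11: selects a block of 2048 code points
    uint32_t i2; // next 4 bits: one of 16 sub-blocks of 128
    uint32_t i3; // next 3 bits: one of 8 sub-blocks of 16
    uint32_t i4; // next 2 bits: one of 4 groups of 4
    uint32_t i5; // low 2 bits: position inside the final group
} SplitIndex;

// Split a code point with masks. This yields the same values as the
// shift-and-XOR steps in re_get_script, which clear the bits already used.
static SplitIndex split_code_point(uint32_t ch)
{
    SplitIndex s;
    s.i1 = ch >> 11;
    s.i2 = (ch >> 7) & 0xF;
    s.i3 = (ch >> 4) & 0x7;
    s.i4 = (ch >> 2) & 0x3;
    s.i5 = ch & 0x3;
    return s;
}

// Recombine the indices; the split loses no information.
static uint32_t join_code_point(SplitIndex s)
{
    return (s.i1 << 11) | (s.i2 << 7) | (s.i3 << 4) | (s.i4 << 2) | s.i5;
}
```

In the real lookup each index is added to an offset read from the previous
stage (shifted left by the width of the next index), so identical blocks
can share storage. That sharing is presumably why the tables stay small.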
*/ RE_UINT32 re_get_script(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 11; code = ch ^ (f << 11); pos = (RE_UINT32)re_script_stage_1[f] << 4; f = code >> 7; code ^= f << 7; pos = (RE_UINT32)re_script_stage_2[pos + f] << 3; f = code >> 4; code ^= f << 4; pos = (RE_UINT32)re_script_stage_3[pos + f] << 2; f = code >> 2; code ^= f << 2; pos = (RE_UINT32)re_script_stage_4[pos + f] << 2; value = re_script_stage_5[pos + code]; return value; } /* Word_Break. */ static RE_UINT8 re_word_break_stage_1[] = { 0, 1, 2, 3, 4, 4, 4, 4, 4, 4, 5, 6, 6, 7, 4, 8, 9, 10, 11, 12, 13, 4, 14, 4, 4, 4, 4, 15, 4, 16, 17, 18, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 19, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, }; static RE_UINT8 re_word_break_stage_2[] = { 0, 1, 2, 2, 2, 3, 4, 5, 2, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 2, 2, 31, 32, 33, 34, 35, 2, 2, 2, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 2, 50, 2, 2, 51, 52, 53, 54, 55, 56, 57, 57, 57, 57, 57, 58, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 59, 60, 61, 62, 63, 57, 57, 57, 64, 65, 66, 67, 57, 68, 69, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 
57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 2, 2, 2, 2, 2, 2, 2, 2, 2, 70, 2, 2, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 83, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 84, 85, 2, 2, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 57, 96, 97, 98, 2, 99, 100, 57, 2, 2, 101, 57, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 57, 57, 57, 57, 57, 57, 112, 113, 114, 115, 116, 117, 118, 57, 57, 119, 57, 120, 121, 122, 123, 57, 57, 124, 57, 57, 57, 125, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 2, 2, 2, 2, 2, 2, 2, 126, 127, 2, 128, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 2, 2, 2, 2, 2, 2, 2, 2, 129, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 2, 2, 2, 2, 130, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 2, 2, 2, 2, 131, 132, 133, 134, 57, 57, 57, 57, 57, 57, 135, 136, 137, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 138, 139, 57, 57, 57, 57, 57, 57, 57, 57, 140, 141, 142, 57, 57, 57, 143, 144, 145, 2, 2, 146, 147, 148, 57, 57, 57, 57, 149, 150, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 2, 151, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 152, 153, 57, 57, 57, 57, 154, 155, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 156, 57, 157, 158, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, }; static RE_UINT8 re_word_break_stage_3[] = { 0, 1, 0, 0, 2, 3, 4, 5, 6, 7, 
7, 8, 6, 7, 7, 9, 10, 0, 0, 0, 0, 11, 12, 13, 7, 7, 14, 7, 7, 7, 14, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 15, 7, 16, 0, 17, 18, 0, 0, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 20, 21, 22, 23, 7, 7, 24, 7, 7, 7, 7, 7, 7, 7, 7, 7, 25, 7, 26, 27, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 6, 7, 7, 7, 14, 28, 6, 7, 7, 7, 7, 29, 30, 19, 19, 19, 19, 31, 32, 0, 33, 33, 33, 34, 35, 0, 36, 37, 19, 38, 7, 7, 7, 7, 7, 39, 19, 19, 4, 40, 41, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 42, 43, 44, 45, 4, 46, 0, 47, 48, 7, 7, 7, 19, 19, 19, 49, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 50, 19, 51, 0, 4, 52, 7, 7, 7, 39, 53, 54, 7, 7, 50, 55, 56, 57, 0, 0, 7, 7, 7, 58, 0, 0, 0, 0, 0, 0, 0, 0, 7, 7, 17, 0, 0, 0, 0, 0, 59, 19, 19, 19, 60, 7, 7, 7, 7, 7, 7, 61, 19, 19, 62, 7, 63, 4, 6, 7, 64, 65, 66, 7, 7, 67, 68, 69, 70, 71, 72, 73, 63, 4, 74, 0, 75, 76, 66, 7, 7, 67, 77, 78, 79, 80, 81, 82, 83, 4, 84, 0, 75, 25, 24, 7, 7, 67, 85, 69, 31, 86, 87, 0, 63, 4, 0, 28, 75, 65, 66, 7, 7, 67, 85, 69, 70, 80, 88, 73, 63, 4, 28, 0, 89, 90, 91, 92, 93, 90, 7, 94, 95, 96, 97, 0, 83, 4, 0, 0, 98, 20, 67, 7, 7, 67, 7, 99, 100, 96, 101, 9, 63, 4, 0, 0, 75, 20, 67, 7, 7, 67, 102, 69, 100, 96, 101, 103, 63, 4, 104, 0, 75, 20, 67, 7, 7, 7, 7, 105, 100, 106, 72, 107, 63, 4, 0, 108, 109, 7, 14, 108, 7, 7, 24, 110, 14, 111, 112, 19, 83, 4, 113, 0, 0, 0, 0, 0, 0, 0, 114, 115, 72, 116, 4, 117, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 114, 118, 0, 119, 4, 117, 0, 0, 0, 0, 87, 0, 0, 120, 4, 117, 121, 122, 7, 6, 7, 7, 7, 17, 30, 19, 100, 123, 19, 30, 19, 19, 19, 124, 125, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 59, 19, 116, 4, 117, 88, 126, 127, 119, 128, 0, 129, 31, 4, 130, 7, 7, 7, 7, 25, 131, 7, 7, 7, 7, 7, 132, 7, 7, 7, 7, 7, 7, 7, 7, 7, 91, 14, 91, 7, 7, 7, 7, 7, 91, 7, 7, 7, 7, 91, 14, 91, 7, 14, 7, 7, 7, 7, 7, 7, 7, 91, 7, 7, 7, 7, 7, 7, 7, 7, 133, 0, 0, 0, 0, 7, 7, 0, 0, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 134, 134, 6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 
7, 7, 7, 7, 7, 7, 7, 7, 7, 65, 7, 7, 6, 7, 7, 9, 7, 7, 7, 7, 7, 7, 7, 7, 7, 90, 7, 87, 7, 20, 135, 0, 7, 7, 135, 0, 7, 7, 136, 0, 7, 20, 137, 0, 0, 0, 0, 0, 0, 0, 138, 19, 19, 19, 139, 140, 4, 117, 0, 0, 0, 141, 4, 117, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 0, 7, 7, 7, 7, 7, 142, 7, 7, 7, 7, 7, 7, 7, 7, 134, 0, 7, 7, 7, 14, 19, 139, 19, 139, 83, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 117, 0, 0, 0, 0, 7, 7, 143, 139, 0, 0, 0, 0, 0, 0, 144, 116, 19, 19, 19, 70, 4, 117, 4, 117, 0, 0, 19, 116, 0, 0, 0, 0, 0, 0, 0, 0, 145, 7, 7, 7, 7, 7, 146, 19, 145, 147, 4, 117, 0, 59, 139, 0, 148, 7, 7, 7, 62, 149, 4, 52, 7, 7, 7, 7, 50, 19, 139, 0, 7, 7, 7, 7, 146, 19, 19, 0, 4, 150, 4, 52, 7, 7, 7, 134, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 151, 19, 19, 152, 153, 120, 7, 7, 7, 7, 7, 7, 7, 7, 19, 19, 19, 19, 19, 19, 119, 138, 7, 7, 134, 134, 7, 7, 7, 7, 134, 134, 7, 154, 7, 7, 7, 134, 7, 7, 7, 7, 7, 7, 20, 155, 156, 17, 157, 147, 7, 17, 156, 17, 0, 158, 0, 159, 160, 161, 0, 162, 163, 0, 164, 0, 165, 166, 28, 107, 0, 0, 7, 17, 0, 0, 0, 0, 0, 0, 19, 19, 19, 19, 167, 0, 168, 108, 110, 169, 18, 170, 7, 171, 172, 173, 0, 0, 7, 7, 7, 7, 7, 87, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 174, 7, 7, 7, 7, 7, 7, 74, 0, 0, 7, 7, 7, 7, 7, 14, 7, 7, 7, 7, 7, 14, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 17, 175, 176, 0, 7, 7, 7, 7, 25, 131, 7, 7, 7, 7, 7, 7, 7, 107, 0, 72, 7, 7, 14, 0, 14, 14, 14, 14, 14, 14, 14, 14, 19, 19, 19, 19, 0, 0, 0, 0, 0, 107, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 131, 0, 0, 0, 0, 129, 177, 93, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 178, 179, 179, 179, 179, 179, 179, 179, 179, 179, 179, 179, 180, 172, 7, 7, 7, 7, 134, 6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 14, 0, 0, 7, 7, 7, 9, 0, 0, 0, 0, 0, 0, 179, 179, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 179, 179, 179, 179, 179, 181, 179, 179, 179, 179, 179, 179, 179, 179, 179, 179, 179, 0, 0, 0, 0, 0, 7, 17, 0, 0, 0, 0, 0, 0, 0, 0, 7, 7, 7, 7, 7, 134, 7, 17, 7, 7, 4, 182, 0, 0, 7, 7, 7, 
7, 7, 143, 151, 183, 7, 7, 7, 50, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 120, 0, 0, 0, 107, 7, 108, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 66, 7, 7, 7, 134, 7, 0, 0, 0, 0, 0, 0, 0, 107, 7, 184, 185, 7, 7, 39, 0, 0, 0, 7, 7, 7, 7, 7, 7, 147, 0, 27, 7, 7, 7, 7, 7, 146, 19, 124, 0, 4, 117, 19, 19, 27, 186, 4, 52, 7, 7, 50, 119, 7, 7, 143, 19, 139, 0, 7, 7, 7, 17, 60, 7, 7, 7, 7, 7, 39, 19, 167, 107, 4, 117, 140, 0, 4, 117, 7, 7, 7, 7, 7, 62, 116, 0, 185, 187, 4, 117, 0, 0, 0, 188, 0, 0, 0, 0, 0, 0, 127, 189, 81, 0, 0, 0, 7, 39, 190, 0, 191, 191, 191, 0, 14, 14, 7, 7, 7, 7, 7, 132, 134, 0, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 39, 192, 4, 117, 7, 7, 7, 7, 147, 0, 7, 7, 14, 193, 7, 7, 7, 7, 7, 147, 14, 0, 193, 194, 33, 195, 196, 197, 198, 33, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 74, 0, 0, 0, 193, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 134, 0, 0, 7, 7, 7, 7, 7, 7, 7, 7, 108, 7, 7, 7, 7, 7, 7, 0, 0, 0, 0, 0, 7, 147, 19, 19, 199, 0, 19, 19, 200, 0, 0, 201, 202, 0, 0, 0, 20, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 203, 204, 3, 0, 205, 6, 7, 7, 8, 6, 7, 7, 9, 206, 179, 179, 179, 179, 179, 179, 207, 7, 7, 7, 14, 108, 108, 108, 208, 0, 0, 0, 209, 7, 102, 7, 7, 14, 7, 7, 210, 7, 134, 7, 134, 0, 0, 0, 0, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 9, 0, 0, 0, 0, 0, 0, 0, 0, 7, 7, 7, 7, 7, 7, 17, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 140, 7, 7, 7, 17, 7, 7, 7, 7, 7, 7, 87, 0, 167, 0, 0, 0, 7, 7, 7, 7, 0, 0, 7, 7, 7, 9, 7, 7, 7, 7, 50, 115, 7, 7, 7, 134, 7, 7, 7, 7, 147, 7, 169, 0, 0, 0, 0, 0, 7, 7, 7, 134, 4, 117, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 7, 7, 7, 7, 0, 7, 7, 7, 7, 7, 7, 147, 0, 0, 0, 7, 7, 7, 7, 7, 7, 14, 0, 7, 7, 134, 0, 7, 0, 0, 0, 134, 67, 7, 7, 7, 7, 25, 211, 7, 7, 134, 0, 7, 7, 14, 0, 7, 7, 7, 14, 0, 0, 0, 0, 0, 0, 0, 0, 7, 7, 212, 0, 7, 7, 134, 0, 7, 7, 7, 74, 0, 0, 0, 0, 0, 0, 0, 0, 7, 7, 7, 7, 7, 7, 7, 174, 0, 0, 0, 0, 0, 0, 0, 0, 213, 138, 102, 6, 7, 7, 147, 79, 0, 0, 0, 0, 7, 7, 7, 17, 7, 7, 7, 17, 0, 0, 0, 0, 7, 6, 7, 7, 214, 0, 0, 0, 7, 7, 7, 7, 7, 7, 
134, 0, 7, 7, 134, 0, 7, 7, 9, 0, 7, 7, 74, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 7, 7, 7, 7, 7, 7, 7, 7, 87, 0, 0, 0, 0, 0, 0, 7, 7, 7, 7, 7, 7, 9, 0, 7, 7, 7, 7, 7, 7, 9, 0, 148, 7, 7, 7, 7, 7, 7, 19, 116, 0, 0, 0, 83, 4, 0, 72, 148, 7, 7, 7, 7, 7, 19, 215, 0, 0, 7, 7, 7, 87, 4, 117, 148, 7, 7, 7, 143, 19, 216, 4, 0, 0, 7, 7, 7, 7, 217, 0, 148, 7, 7, 7, 7, 7, 39, 19, 218, 219, 4, 220, 0, 0, 0, 0, 7, 7, 24, 7, 7, 146, 19, 0, 0, 0, 0, 0, 0, 0, 0, 0, 14, 170, 7, 25, 7, 87, 7, 7, 7, 7, 7, 143, 19, 115, 4, 117, 98, 65, 66, 7, 7, 67, 85, 69, 70, 80, 97, 172, 221, 124, 124, 0, 7, 7, 7, 7, 7, 7, 19, 19, 222, 0, 4, 117, 0, 0, 0, 0, 7, 7, 7, 7, 7, 143, 119, 19, 167, 0, 0, 187, 0, 0, 0, 0, 7, 7, 7, 7, 7, 7, 19, 19, 223, 0, 4, 117, 0, 0, 0, 0, 7, 7, 7, 7, 7, 39, 19, 0, 4, 117, 0, 0, 0, 0, 0, 0, 0, 0, 0, 144, 19, 139, 4, 117, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 7, 7, 7, 7, 7, 7, 7, 4, 117, 0, 107, 0, 0, 0, 0, 0, 0, 0, 0, 7, 7, 7, 7, 7, 7, 7, 87, 7, 7, 7, 74, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 14, 0, 0, 7, 7, 7, 7, 7, 7, 7, 7, 147, 0, 0, 0, 0, 0, 0, 0, 7, 7, 7, 7, 7, 14, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 7, 7, 7, 7, 7, 7, 7, 14, 0, 0, 0, 0, 0, 0, 0, 7, 7, 7, 7, 7, 7, 7, 87, 7, 7, 7, 14, 4, 117, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 7, 7, 134, 124, 0, 7, 7, 7, 7, 7, 7, 116, 0, 147, 0, 4, 117, 193, 7, 7, 172, 7, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 7, 7, 7, 7, 7, 7, 7, 17, 0, 62, 19, 19, 19, 19, 116, 0, 72, 148, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 224, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 9, 7, 17, 7, 87, 7, 225, 226, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 144, 227, 228, 229, 230, 139, 0, 0, 0, 231, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 219, 0, 0, 0, 0, 0, 0, 0, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 20, 7, 7, 7, 7, 7, 7, 7, 7, 20, 232, 233, 7, 234, 102, 7, 7, 7, 7, 7, 7, 7, 25, 235, 20, 20, 7, 7, 7, 236, 155, 108, 67, 7, 7, 7, 7, 7, 7, 
7, 7, 7, 134, 7, 7, 7, 67, 7, 7, 132, 7, 7, 7, 132, 7, 7, 20, 7, 7, 7, 20, 7, 7, 14, 7, 7, 7, 14, 7, 7, 7, 67, 7, 7, 7, 67, 7, 7, 132, 237, 4, 4, 4, 4, 4, 4, 19, 19, 19, 19, 19, 19, 116, 59, 19, 19, 19, 19, 19, 124, 140, 0, 238, 0, 0, 59, 30, 19, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 7, 7, 7, 7, 7, 7, 7, 17, 0, 116, 0, 0, 0, 0, 0, 102, 7, 7, 7, 239, 6, 132, 240, 168, 241, 239, 154, 239, 132, 132, 82, 7, 24, 7, 147, 242, 24, 7, 147, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 7, 7, 74, 7, 7, 7, 74, 7, 7, 7, 74, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 243, 244, 244, 244, 245, 0, 0, 0, 166, 166, 166, 166, 166, 166, 166, 166, 166, 166, 166, 166, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 0, 0, }; static RE_UINT8 re_word_break_stage_4[] = { 0, 0, 1, 2, 3, 4, 0, 5, 6, 6, 7, 0, 8, 9, 9, 9, 10, 11, 10, 0, 0, 12, 13, 14, 0, 15, 13, 0, 9, 10, 16, 17, 16, 18, 9, 19, 0, 20, 21, 21, 9, 22, 17, 23, 0, 24, 10, 22, 25, 9, 9, 25, 26, 21, 27, 9, 28, 0, 29, 0, 30, 21, 21, 31, 32, 31, 33, 33, 34, 0, 35, 36, 37, 38, 0, 39, 40, 41, 42, 21, 43, 44, 45, 9, 9, 46, 21, 47, 21, 48, 49, 27, 50, 51, 0, 52, 53, 9, 40, 8, 9, 54, 55, 0, 50, 9, 21, 16, 56, 0, 57, 21, 21, 58, 58, 59, 58, 0, 60, 21, 21, 9, 54, 61, 58, 21, 54, 62, 58, 8, 9, 51, 51, 9, 22, 9, 20, 17, 16, 61, 21, 63, 63, 64, 0, 60, 0, 25, 16, 0, 30, 8, 10, 65, 22, 66, 16, 49, 40, 60, 63, 59, 67, 0, 8, 20, 0, 62, 27, 68, 22, 8, 31, 59, 19, 0, 0, 69, 70, 8, 10, 17, 22, 16, 66, 22, 65, 19, 16, 69, 40, 69, 49, 59, 19, 60, 21, 8, 16, 46, 21, 49, 0, 32, 9, 8, 0, 13, 66, 0, 10, 46, 49, 64, 0, 65, 17, 9, 69, 8, 9, 28, 71, 60, 21, 72, 69, 0, 67, 21, 40, 0, 21, 40, 73, 0, 31, 74, 21, 59, 59, 0, 0, 75, 67, 69, 9, 58, 21, 74, 0, 71, 59, 69, 49, 63, 30, 74, 69, 21, 76, 59, 0, 28, 10, 9, 10, 30, 9, 16, 54, 74, 54, 0, 77, 0, 0, 21, 21, 0, 0, 67, 60, 78, 79, 0, 9, 42, 0, 30, 21, 45, 9, 21, 9, 0, 80, 9, 21, 27, 73, 8, 40, 21, 45, 53, 54, 81, 82, 82, 9, 20, 17, 22, 9, 17, 0, 83, 84, 0, 0, 
85, 86, 87, 0, 11, 88, 89, 0, 88, 37, 90, 37, 37, 74, 0, 13, 65, 8, 16, 22, 25, 16, 9, 0, 8, 16, 13, 0, 17, 65, 42, 27, 0, 91, 92, 93, 94, 95, 95, 96, 95, 95, 96, 50, 0, 21, 97, 98, 98, 42, 9, 65, 28, 9, 59, 60, 59, 74, 69, 17, 99, 8, 10, 40, 59, 65, 9, 0, 100, 101, 33, 33, 34, 33, 102, 103, 101, 104, 89, 11, 88, 0, 105, 5, 106, 9, 107, 0, 108, 109, 0, 0, 110, 95, 111, 17, 19, 112, 0, 10, 25, 19, 51, 10, 16, 58, 32, 9, 99, 40, 14, 21, 113, 42, 13, 45, 19, 69, 74, 114, 19, 54, 69, 21, 25, 74, 19, 94, 0, 16, 32, 37, 0, 59, 30, 115, 37, 116, 21, 40, 30, 69, 59, 13, 66, 8, 22, 25, 8, 10, 8, 25, 10, 9, 62, 0, 74, 66, 51, 82, 0, 82, 8, 8, 8, 0, 117, 118, 118, 14, 0, }; static RE_UINT8 re_word_break_stage_5[] = { 0, 0, 0, 0, 0, 0, 5, 6, 6, 4, 0, 0, 0, 0, 1, 0, 0, 0, 0, 2, 13, 0, 14, 0, 15, 15, 15, 15, 15, 15, 12, 13, 0, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 0, 0, 0, 0, 16, 0, 6, 0, 0, 0, 0, 11, 0, 0, 9, 0, 0, 0, 11, 0, 12, 11, 11, 0, 0, 0, 0, 11, 11, 0, 0, 0, 12, 11, 0, 0, 0, 11, 0, 11, 0, 7, 7, 7, 7, 11, 0, 11, 11, 11, 11, 13, 11, 0, 0, 11, 12, 11, 11, 0, 11, 11, 11, 0, 7, 7, 7, 11, 11, 0, 11, 0, 0, 0, 13, 0, 0, 0, 7, 7, 7, 7, 7, 0, 7, 0, 7, 7, 0, 3, 3, 3, 3, 3, 3, 3, 0, 3, 3, 3, 11, 12, 0, 0, 0, 9, 9, 9, 9, 9, 9, 0, 0, 13, 13, 0, 0, 7, 7, 7, 0, 9, 0, 0, 0, 11, 11, 11, 7, 15, 15, 0, 15, 13, 0, 11, 11, 7, 11, 11, 11, 0, 11, 7, 7, 7, 9, 0, 7, 7, 11, 11, 7, 7, 0, 7, 7, 15, 15, 11, 11, 11, 0, 0, 11, 0, 0, 0, 9, 11, 7, 11, 11, 11, 11, 7, 7, 7, 11, 0, 0, 13, 0, 11, 0, 7, 7, 11, 7, 11, 7, 7, 7, 7, 7, 0, 0, 0, 0, 0, 7, 7, 11, 7, 7, 0, 0, 15, 15, 7, 0, 0, 7, 7, 7, 11, 0, 0, 0, 0, 11, 0, 11, 11, 0, 0, 7, 0, 0, 11, 7, 0, 0, 0, 0, 7, 7, 0, 0, 7, 11, 0, 0, 7, 0, 7, 0, 7, 0, 15, 15, 0, 0, 7, 0, 0, 0, 0, 7, 0, 7, 15, 15, 7, 7, 11, 0, 7, 7, 7, 7, 9, 0, 11, 7, 11, 0, 7, 7, 7, 11, 7, 11, 11, 0, 0, 11, 0, 11, 7, 7, 9, 9, 14, 14, 0, 0, 14, 0, 0, 12, 6, 6, 9, 9, 9, 9, 9, 0, 16, 0, 0, 0, 13, 0, 0, 0, 9, 0, 9, 9, 0, 10, 10, 10, 10, 10, 0, 0, 0, 7, 7, 10, 10, 0, 0, 0, 10, 10, 10, 10, 10, 10, 
10, 0, 7, 7, 0, 11, 11, 11, 7, 11, 11, 7, 7, 0, 0, 3, 7, 3, 3, 0, 3, 3, 3, 0, 3, 0, 3, 3, 0, 3, 13, 0, 0, 12, 0, 16, 16, 16, 13, 12, 0, 0, 11, 0, 0, 9, 0, 0, 0, 14, 0, 0, 12, 13, 0, 0, 10, 10, 10, 10, 7, 7, 0, 9, 9, 9, 7, 0, 15, 15, 15, 15, 11, 0, 7, 7, 7, 9, 9, 9, 9, 7, 0, 0, 8, 8, 8, 8, 8, 8, }; /* Word_Break: 4424 bytes. */ RE_UINT32 re_get_word_break(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 12; code = ch ^ (f << 12); pos = (RE_UINT32)re_word_break_stage_1[f] << 5; f = code >> 7; code ^= f << 7; pos = (RE_UINT32)re_word_break_stage_2[pos + f] << 4; f = code >> 3; code ^= f << 3; pos = (RE_UINT32)re_word_break_stage_3[pos + f] << 1; f = code >> 2; code ^= f << 2; pos = (RE_UINT32)re_word_break_stage_4[pos + f] << 2; value = re_word_break_stage_5[pos + code]; return value; } /* Grapheme_Cluster_Break. */ static RE_UINT8 re_grapheme_cluster_break_stage_1[] = { 0, 1, 2, 2, 2, 3, 4, 5, 6, 2, 2, 7, 2, 8, 9, 10, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 11, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, }; static RE_UINT8 re_grapheme_cluster_break_stage_2[] = { 0, 1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1, 17, 1, 1, 1, 18, 19, 20, 21, 22, 23, 24, 1, 1, 25, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 26, 27, 1, 1, 28, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 29, 1, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 34, 35, 36, 37, 38, 39, 40, 34, 35, 36, 37, 38, 39, 40, 34, 35, 36, 37, 38, 39, 40, 34, 35, 36, 37, 38, 39, 40, 34, 35, 36, 37, 38, 39, 40, 34, 41, 42, 42, 42, 42, 42, 42, 42, 42, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 43, 1, 1, 44, 45, 1, 46, 47, 48, 1, 1, 1, 1, 1, 1, 49, 1, 1, 1, 1, 1, 50, 51, 52, 53, 54, 55, 56, 57, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 58, 59, 1, 1, 1, 60, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 61, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 62, 63, 1, 1, 1, 1, 1, 1, 1, 64, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 65, 1, 1, 1, 1, 1, 1, 1, 1, 66, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 42, 67, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_grapheme_cluster_break_stage_3[] = { 0, 1, 2, 2, 2, 2, 2, 3, 1, 1, 4, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 5, 5, 5, 5, 5, 5, 5, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 6, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 7, 5, 8, 9, 2, 2, 2, 10, 11, 2, 2, 12, 5, 2, 13, 2, 2, 2, 2, 2, 14, 15, 2, 3, 16, 2, 5, 17, 2, 2, 2, 2, 2, 18, 13, 2, 2, 12, 19, 2, 20, 21, 2, 2, 22, 2, 2, 2, 2, 2, 2, 2, 2, 23, 5, 24, 2, 2, 25, 26, 27, 28, 2, 29, 2, 2, 30, 31, 32, 28, 2, 33, 2, 2, 34, 35, 16, 2, 36, 33, 2, 2, 34, 37, 2, 28, 2, 29, 2, 2, 38, 31, 39, 28, 2, 40, 2, 2, 41, 42, 32, 2, 2, 43, 2, 2, 44, 45, 46, 28, 2, 29, 2, 2, 47, 48, 46, 28, 2, 29, 2, 2, 41, 49, 32, 28, 2, 50, 2, 2, 2, 51, 52, 2, 50, 2, 2, 2, 53, 54, 2, 2, 2, 2, 2, 2, 55, 56, 2, 2, 2, 2, 57, 2, 58, 2, 2, 2, 59, 60, 61, 5, 62, 63, 2, 2, 2, 2, 2, 64, 65, 2, 66, 13, 67, 68, 69, 2, 2, 2, 2, 2, 2, 70, 70, 70, 70, 70, 70, 71, 71, 71, 71, 72, 73, 73, 73, 73, 73, 2, 2, 2, 2, 2, 64, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 74, 2, 74, 2, 28, 2, 28, 2, 2, 2, 75, 76, 77, 2, 2, 78, 2, 2, 2, 2, 2, 2, 2, 2, 2, 79, 2, 2, 2, 2, 2, 2, 2, 80, 81, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 82, 2, 2, 2, 83, 84, 85, 2, 2, 2, 86, 2, 2, 2, 2, 87, 2, 2, 88, 89, 2, 12, 19, 90, 2, 91, 2, 2, 2, 92, 93, 2, 2, 94, 95, 2, 2, 2, 2, 2, 2, 2, 2, 2, 96, 97, 98, 2, 
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 5, 5, 5, 99, 100, 2, 101, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 5, 5, 13, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 102, 103, 2, 2, 2, 2, 2, 2, 2, 102, 2, 2, 2, 2, 2, 2, 5, 5, 2, 2, 104, 2, 2, 2, 2, 2, 2, 105, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 102, 106, 2, 44, 2, 2, 2, 2, 2, 103, 107, 2, 108, 2, 2, 2, 2, 2, 109, 2, 2, 110, 111, 2, 5, 103, 2, 2, 112, 2, 113, 93, 70, 114, 24, 2, 2, 115, 116, 2, 117, 2, 2, 2, 118, 119, 120, 2, 2, 121, 2, 2, 2, 122, 16, 2, 123, 124, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 125, 2, 126, 127, 128, 129, 128, 130, 128, 126, 127, 128, 129, 128, 130, 128, 126, 127, 128, 129, 128, 130, 128, 126, 127, 128, 129, 128, 130, 128, 126, 127, 128, 129, 128, 130, 128, 126, 127, 128, 129, 128, 130, 128, 126, 127, 128, 129, 128, 130, 128, 126, 127, 128, 129, 128, 130, 128, 126, 127, 128, 129, 128, 130, 128, 126, 127, 128, 129, 128, 130, 128, 126, 127, 128, 129, 128, 130, 128, 126, 127, 128, 129, 128, 130, 128, 126, 127, 128, 129, 128, 130, 128, 126, 127, 128, 129, 128, 130, 128, 126, 127, 128, 129, 128, 130, 128, 126, 127, 128, 129, 128, 130, 128, 128, 129, 128, 130, 128, 126, 127, 128, 129, 128, 131, 71, 132, 73, 73, 133, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 134, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 5, 2, 5, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 44, 2, 2, 2, 2, 2, 135, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 69, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 13, 2, 2, 2, 2, 2, 2, 2, 2, 136, 2, 2, 2, 2, 2, 2, 2, 2, 137, 2, 2, 138, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 46, 2, 139, 2, 2, 140, 141, 2, 2, 102, 90, 2, 2, 142, 2, 2, 2, 2, 143, 2, 144, 145, 2, 2, 2, 146, 90, 2, 2, 147, 148, 2, 2, 2, 2, 2, 149, 150, 2, 2, 2, 2, 2, 2, 2, 2, 2, 102, 151, 2, 93, 2, 2, 30, 152, 32, 153, 145, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 154, 155, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 102, 156, 13, 157, 2, 2, 2, 2, 2, 158, 13, 2, 2, 2, 2, 2, 159, 160, 2, 2, 2, 2, 2, 64, 161, 2, 2, 2, 2, 
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 145, 2, 2, 2, 141, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 162, 163, 164, 102, 143, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 165, 166, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 167, 168, 169, 2, 170, 2, 2, 2, 2, 2, 2, 2, 2, 2, 74, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 5, 5, 5, 171, 5, 5, 62, 117, 172, 12, 7, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 141, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 173, 174, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 1, }; static RE_UINT8 re_grapheme_cluster_break_stage_4[] = { 0, 0, 1, 2, 0, 0, 0, 0, 3, 3, 3, 3, 3, 3, 3, 4, 3, 3, 3, 5, 6, 6, 6, 6, 7, 6, 8, 3, 9, 6, 6, 6, 6, 6, 6, 10, 11, 10, 3, 3, 0, 12, 3, 3, 6, 6, 13, 14, 3, 3, 7, 6, 15, 3, 3, 3, 3, 16, 6, 17, 6, 18, 19, 8, 20, 3, 3, 3, 6, 6, 13, 3, 3, 16, 6, 6, 6, 3, 3, 3, 3, 16, 10, 6, 6, 9, 9, 8, 3, 3, 9, 3, 7, 6, 6, 6, 21, 3, 3, 3, 3, 3, 22, 23, 24, 6, 25, 26, 9, 6, 3, 3, 16, 3, 3, 3, 27, 3, 3, 3, 3, 3, 3, 28, 24, 29, 30, 31, 3, 7, 3, 3, 32, 3, 3, 3, 3, 3, 3, 23, 33, 7, 18, 8, 8, 20, 3, 3, 24, 10, 34, 31, 3, 3, 3, 19, 3, 16, 3, 3, 35, 3, 3, 3, 3, 3, 3, 22, 36, 37, 38, 31, 25, 3, 3, 3, 3, 3, 3, 16, 25, 39, 19, 8, 3, 11, 3, 3, 3, 3, 3, 40, 41, 42, 38, 8, 24, 23, 38, 31, 37, 3, 3, 3, 3, 3, 35, 7, 43, 44, 45, 46, 47, 6, 13, 3, 3, 7, 6, 13, 47, 6, 10, 15, 3, 3, 6, 8, 3, 3, 8, 3, 3, 48, 20, 37, 9, 6, 6, 21, 6, 19, 3, 9, 6, 6, 9, 6, 6, 6, 6, 15, 3, 35, 3, 3, 3, 3, 3, 9, 49, 6, 32, 33, 3, 37, 8, 16, 9, 15, 3, 3, 35, 33, 3, 20, 3, 3, 3, 20, 50, 50, 50, 50, 51, 51, 51, 51, 51, 51, 52, 52, 52, 52, 52, 52, 16, 15, 3, 3, 3, 53, 6, 54, 45, 41, 24, 6, 6, 3, 3, 20, 3, 3, 7, 55, 3, 3, 20, 3, 21, 46, 25, 3, 41, 45, 24, 3, 3, 7, 56, 3, 3, 57, 6, 13, 44, 9, 6, 25, 46, 6, 6, 18, 6, 6, 6, 13, 6, 58, 3, 3, 3, 49, 21, 25, 41, 58, 3, 3, 59, 3, 3, 3, 60, 54, 53, 8, 3, 22, 54, 61, 54, 3, 3, 3, 3, 45, 45, 6, 6, 43, 3, 3, 13, 6, 6, 6, 49, 6, 15, 20, 37, 15, 8, 3, 6, 8, 3, 6, 3, 3, 4, 62, 3, 3, 0, 63, 3, 3, 3, 7, 8, 
3, 3, 3, 3, 3, 16, 6, 3, 3, 11, 3, 13, 6, 6, 8, 35, 35, 7, 3, 64, 65, 3, 3, 66, 3, 3, 3, 3, 45, 45, 45, 45, 15, 3, 3, 3, 16, 6, 8, 3, 7, 6, 6, 50, 50, 50, 67, 7, 43, 54, 25, 58, 3, 3, 3, 3, 20, 3, 3, 3, 3, 9, 21, 65, 33, 3, 3, 7, 3, 3, 68, 3, 3, 3, 15, 19, 18, 15, 16, 3, 3, 64, 54, 3, 69, 3, 3, 64, 26, 36, 31, 70, 71, 71, 71, 71, 71, 71, 70, 71, 71, 71, 71, 71, 71, 70, 71, 71, 70, 71, 71, 71, 3, 3, 3, 51, 72, 73, 52, 52, 52, 52, 3, 3, 3, 3, 35, 0, 0, 0, 3, 3, 16, 13, 3, 9, 11, 3, 6, 3, 3, 13, 7, 74, 3, 3, 3, 3, 3, 6, 6, 6, 13, 3, 3, 46, 21, 33, 5, 13, 3, 3, 3, 3, 7, 6, 24, 6, 15, 3, 3, 7, 3, 3, 3, 64, 43, 6, 21, 58, 3, 16, 15, 3, 3, 3, 46, 54, 49, 3, 3, 46, 6, 13, 3, 25, 30, 30, 66, 37, 16, 6, 15, 56, 6, 75, 61, 49, 3, 3, 3, 43, 8, 45, 53, 3, 3, 3, 8, 46, 6, 21, 61, 3, 3, 7, 26, 6, 53, 3, 3, 43, 53, 6, 3, 76, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 77, 3, 3, 3, 11, 0, 3, 3, 3, 3, 78, 8, 60, 79, 0, 80, 6, 13, 9, 6, 3, 3, 3, 16, 8, 6, 13, 7, 6, 3, 15, 3, 3, 3, 81, 82, 82, 82, 82, 82, 82, }; static RE_UINT8 re_grapheme_cluster_break_stage_5[] = { 3, 3, 3, 3, 3, 3, 2, 3, 3, 1, 3, 3, 0, 0, 0, 0, 0, 0, 0, 3, 0, 3, 0, 0, 4, 4, 4, 4, 0, 0, 0, 4, 4, 4, 0, 0, 0, 4, 4, 4, 4, 4, 0, 4, 0, 4, 4, 0, 3, 3, 0, 0, 4, 4, 4, 0, 3, 0, 0, 0, 4, 0, 0, 0, 0, 0, 4, 4, 4, 3, 0, 4, 4, 0, 0, 4, 4, 0, 4, 4, 0, 4, 0, 0, 4, 4, 4, 6, 0, 0, 4, 6, 4, 0, 6, 6, 6, 4, 4, 4, 4, 6, 6, 6, 6, 4, 6, 6, 0, 4, 6, 6, 4, 0, 4, 6, 4, 0, 0, 6, 6, 0, 0, 6, 6, 4, 0, 0, 0, 4, 4, 6, 6, 4, 4, 0, 4, 6, 0, 6, 0, 0, 4, 0, 4, 6, 6, 0, 0, 0, 6, 6, 6, 0, 6, 6, 6, 0, 4, 4, 4, 0, 6, 4, 6, 6, 4, 6, 6, 0, 4, 6, 6, 6, 4, 4, 4, 0, 4, 0, 6, 6, 6, 6, 6, 6, 6, 4, 0, 4, 0, 6, 0, 4, 0, 4, 4, 6, 4, 4, 7, 7, 7, 7, 8, 8, 8, 8, 9, 9, 9, 9, 4, 4, 6, 4, 4, 4, 6, 6, 4, 4, 3, 0, 4, 6, 6, 4, 0, 6, 4, 6, 6, 0, 0, 0, 4, 4, 6, 0, 0, 6, 4, 4, 6, 4, 6, 4, 4, 4, 3, 3, 3, 3, 3, 0, 0, 0, 0, 6, 6, 4, 4, 6, 6, 6, 0, 0, 7, 0, 0, 0, 4, 6, 0, 0, 0, 6, 4, 0, 10, 11, 11, 11, 11, 11, 11, 11, 8, 8, 8, 0, 0, 0, 0, 9, 6, 4, 6, 0, 4, 6, 4, 6, 0, 6, 6, 6, 6, 6, 
6, 0, 0, 4, 6, 4, 4, 4, 4, 3, 3, 3, 3, 4, 0, 0, 5, 5, 5, 5, 5, 5, }; /* Grapheme_Cluster_Break: 2640 bytes. */ RE_UINT32 re_get_grapheme_cluster_break(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 13; code = ch ^ (f << 13); pos = (RE_UINT32)re_grapheme_cluster_break_stage_1[f] << 5; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_grapheme_cluster_break_stage_2[pos + f] << 4; f = code >> 4; code ^= f << 4; pos = (RE_UINT32)re_grapheme_cluster_break_stage_3[pos + f] << 2; f = code >> 2; code ^= f << 2; pos = (RE_UINT32)re_grapheme_cluster_break_stage_4[pos + f] << 2; value = re_grapheme_cluster_break_stage_5[pos + code]; return value; } /* Sentence_Break. */ static RE_UINT8 re_sentence_break_stage_1[] = { 0, 1, 2, 3, 4, 5, 5, 5, 5, 6, 7, 5, 5, 8, 9, 10, 11, 12, 13, 14, 15, 9, 16, 9, 9, 9, 9, 17, 9, 18, 19, 20, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 21, 22, 23, 9, 9, 24, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 25, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, }; static RE_UINT8 re_sentence_break_stage_2[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 17, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 33, 33, 36, 33, 37, 33, 33, 38, 39, 40, 33, 41, 42, 33, 33, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 43, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 
17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 44, 17, 17, 17, 17, 45, 17, 46, 47, 48, 49, 50, 51, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 52, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 17, 53, 54, 17, 55, 56, 57, 58, 59, 60, 61, 62, 63, 17, 64, 65, 66, 67, 68, 69, 33, 33, 33, 70, 71, 72, 73, 74, 75, 76, 77, 78, 33, 79, 33, 33, 33, 33, 33, 17, 17, 17, 80, 81, 82, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 17, 17, 17, 17, 83, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 17, 17, 84, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 17, 17, 85, 86, 33, 33, 33, 87, 88, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 89, 33, 33, 33, 33, 90, 91, 33, 92, 93, 94, 95, 33, 33, 96, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 97, 33, 33, 33, 33, 33, 98, 33, 33, 99, 33, 33, 33, 33, 100, 33, 33, 33, 33, 33, 33, 33, 33, 33, 17, 17, 17, 17, 17, 17, 101, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 102, 103, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 104, 33, 33, 33, 33, 33, 33, 33, 33, 33, 17, 17, 105, 33, 33, 33, 33, 33, 106, 107, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, }; static RE_UINT16 re_sentence_break_stage_3[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 8, 16, 17, 18, 19, 20, 21, 22, 23, 23, 23, 24, 25, 26, 27, 28, 29, 30, 18, 8, 31, 8, 32, 8, 8, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 41, 41, 44, 45, 46, 47, 48, 41, 41, 49, 50, 51, 52, 53, 54, 55, 55, 56, 55, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 71, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 85, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 55, 99, 100, 101, 55, 102, 103, 104, 105, 106, 107, 108, 55, 41, 109, 110, 111, 112, 29, 113, 114, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 115, 41, 116, 117, 118, 41, 119, 41, 120, 121, 122, 29, 29, 
123, 96, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 124, 125, 41, 41, 126, 127, 128, 129, 130, 41, 131, 132, 133, 134, 41, 41, 135, 41, 136, 41, 137, 138, 139, 140, 141, 41, 142, 143, 55, 144, 41, 145, 146, 147, 148, 55, 55, 149, 131, 150, 151, 152, 153, 41, 154, 41, 155, 156, 157, 55, 55, 158, 159, 18, 18, 18, 18, 18, 18, 23, 160, 8, 8, 8, 8, 161, 8, 8, 8, 162, 163, 164, 165, 163, 166, 167, 168, 169, 170, 171, 172, 173, 55, 174, 175, 176, 177, 178, 30, 179, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 180, 181, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 182, 30, 183, 55, 55, 184, 185, 55, 55, 186, 187, 55, 55, 55, 55, 188, 55, 189, 190, 29, 191, 192, 193, 8, 8, 8, 194, 18, 195, 41, 196, 197, 198, 198, 23, 199, 200, 201, 55, 55, 55, 55, 55, 202, 203, 96, 41, 204, 96, 41, 114, 205, 206, 41, 41, 207, 208, 55, 209, 41, 41, 41, 41, 41, 137, 55, 55, 41, 41, 41, 41, 41, 41, 137, 55, 41, 41, 41, 41, 210, 55, 209, 211, 212, 213, 8, 214, 215, 41, 41, 216, 217, 218, 8, 219, 220, 221, 55, 222, 223, 224, 41, 225, 226, 131, 227, 228, 50, 229, 230, 231, 58, 232, 233, 234, 41, 235, 236, 237, 41, 238, 239, 240, 241, 242, 243, 244, 18, 18, 41, 245, 41, 41, 41, 41, 41, 246, 247, 248, 41, 41, 41, 249, 41, 41, 250, 55, 251, 252, 253, 41, 41, 254, 255, 41, 41, 256, 209, 41, 257, 41, 258, 259, 260, 261, 262, 263, 41, 41, 41, 264, 265, 2, 266, 267, 268, 138, 269, 270, 271, 272, 273, 55, 41, 41, 41, 208, 55, 55, 41, 56, 55, 55, 55, 274, 55, 55, 55, 55, 231, 41, 275, 276, 41, 209, 277, 278, 279, 41, 280, 55, 29, 281, 282, 41, 279, 133, 55, 55, 41, 283, 41, 284, 55, 55, 55, 55, 41, 197, 137, 258, 55, 55, 55, 55, 285, 286, 137, 197, 138, 55, 55, 287, 137, 250, 55, 55, 41, 288, 55, 55, 289, 290, 291, 231, 231, 55, 104, 292, 41, 137, 137, 293, 254, 55, 55, 55, 41, 41, 294, 55, 29, 295, 18, 296, 152, 297, 298, 299, 152, 300, 301, 302, 152, 303, 304, 305, 152, 232, 306, 55, 307, 308, 55, 55, 309, 310, 311, 312, 313, 71, 314, 315, 55, 55, 55, 55, 55, 55, 55, 55, 41, 47, 316, 55, 55, 55, 55, 55, 41, 
317, 318, 55, 41, 47, 319, 55, 41, 320, 133, 55, 321, 322, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 29, 18, 323, 55, 55, 55, 55, 55, 55, 41, 324, 41, 41, 41, 41, 250, 55, 55, 55, 41, 41, 41, 207, 41, 41, 41, 41, 41, 41, 284, 55, 55, 55, 55, 55, 41, 207, 55, 55, 55, 55, 55, 55, 41, 41, 325, 55, 55, 55, 55, 55, 41, 324, 138, 326, 55, 55, 209, 327, 41, 328, 329, 330, 122, 55, 55, 55, 41, 41, 331, 332, 333, 55, 55, 55, 334, 55, 55, 55, 55, 55, 55, 55, 41, 41, 41, 335, 336, 337, 55, 55, 55, 55, 55, 338, 339, 340, 55, 55, 55, 55, 341, 55, 55, 55, 55, 55, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 342, 343, 355, 345, 356, 357, 358, 349, 359, 360, 361, 362, 363, 364, 191, 365, 366, 367, 368, 23, 369, 23, 370, 371, 372, 55, 55, 41, 41, 41, 41, 41, 41, 373, 55, 374, 375, 376, 377, 378, 379, 55, 55, 55, 380, 381, 381, 382, 55, 55, 55, 55, 55, 55, 383, 55, 55, 55, 55, 41, 41, 41, 41, 41, 41, 197, 55, 41, 56, 41, 41, 41, 41, 41, 41, 279, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 334, 55, 55, 279, 55, 55, 55, 55, 55, 55, 55, 384, 385, 385, 385, 55, 55, 55, 55, 23, 23, 23, 23, 23, 23, 23, 386, }; static RE_UINT8 re_sentence_break_stage_4[] = { 0, 0, 1, 2, 0, 0, 0, 0, 3, 4, 5, 6, 7, 7, 8, 9, 10, 11, 11, 11, 11, 11, 12, 13, 14, 15, 15, 15, 15, 15, 16, 13, 0, 17, 0, 0, 0, 0, 0, 0, 18, 0, 19, 20, 0, 21, 19, 0, 11, 11, 11, 11, 11, 22, 11, 23, 15, 15, 15, 15, 15, 24, 15, 15, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 26, 26, 26, 26, 27, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 28, 29, 30, 31, 32, 33, 28, 31, 34, 28, 25, 31, 29, 31, 32, 26, 35, 34, 36, 28, 31, 26, 26, 26, 26, 27, 25, 25, 25, 25, 30, 31, 25, 25, 25, 25, 25, 25, 25, 15, 33, 30, 26, 23, 25, 25, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 37, 15, 15, 15, 15, 15, 15, 15, 15, 38, 36, 39, 40, 36, 36, 41, 0, 0, 0, 15, 42, 0, 43, 0, 0, 0, 0, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 25, 45, 46, 47, 0, 48, 22, 49, 32, 11, 11, 11, 50, 11, 11, 15, 15, 15, 
15, 15, 15, 15, 15, 51, 33, 34, 25, 25, 25, 25, 25, 25, 15, 52, 30, 32, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 15, 15, 15, 15, 53, 44, 54, 25, 25, 25, 25, 25, 28, 26, 26, 29, 25, 25, 25, 25, 25, 25, 25, 25, 10, 11, 11, 11, 11, 11, 11, 11, 11, 22, 55, 56, 14, 15, 15, 15, 15, 15, 15, 15, 15, 15, 57, 0, 58, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 59, 60, 59, 0, 0, 36, 36, 36, 36, 36, 36, 61, 0, 36, 0, 0, 0, 62, 63, 0, 64, 44, 44, 65, 66, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 67, 44, 44, 44, 44, 44, 7, 7, 68, 69, 70, 36, 36, 36, 36, 36, 36, 36, 36, 71, 44, 72, 44, 73, 74, 75, 7, 7, 76, 77, 78, 0, 0, 79, 80, 36, 36, 36, 36, 36, 36, 36, 44, 44, 44, 44, 44, 44, 65, 81, 36, 36, 36, 36, 36, 82, 44, 44, 83, 0, 0, 0, 7, 7, 76, 36, 36, 36, 36, 36, 36, 36, 67, 44, 44, 41, 84, 0, 36, 36, 36, 36, 36, 82, 85, 44, 44, 86, 86, 87, 0, 0, 0, 0, 36, 36, 36, 36, 36, 36, 86, 0, 0, 0, 0, 0, 0, 0, 0, 0, 36, 36, 36, 36, 36, 88, 0, 0, 89, 44, 44, 44, 44, 44, 44, 44, 44, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 82, 90, 44, 44, 44, 44, 86, 44, 36, 36, 82, 91, 7, 7, 81, 36, 36, 36, 86, 81, 36, 77, 77, 36, 36, 36, 36, 36, 92, 36, 43, 40, 41, 90, 44, 93, 93, 94, 0, 89, 0, 95, 82, 96, 7, 7, 41, 0, 0, 0, 58, 81, 61, 97, 77, 36, 36, 36, 36, 36, 92, 36, 92, 98, 41, 74, 65, 89, 93, 87, 99, 0, 81, 43, 0, 96, 7, 7, 75, 100, 0, 0, 58, 81, 36, 95, 95, 36, 36, 36, 36, 36, 92, 36, 92, 81, 41, 90, 44, 59, 59, 87, 88, 0, 0, 0, 82, 96, 7, 7, 0, 0, 55, 0, 58, 81, 36, 77, 77, 36, 36, 36, 44, 93, 93, 87, 0, 101, 0, 95, 82, 96, 7, 7, 55, 0, 0, 0, 102, 81, 61, 40, 92, 41, 98, 92, 97, 88, 61, 40, 36, 36, 41, 101, 65, 101, 74, 87, 88, 89, 0, 0, 0, 96, 7, 7, 0, 0, 0, 0, 44, 81, 36, 92, 92, 36, 36, 36, 36, 36, 92, 36, 36, 36, 41, 103, 44, 74, 74, 87, 0, 60, 61, 0, 82, 96, 7, 7, 0, 0, 0, 0, 58, 81, 36, 92, 92, 36, 36, 36, 36, 36, 92, 36, 36, 81, 41, 90, 44, 74, 74, 87, 0, 60, 0, 104, 82, 96, 7, 7, 98, 0, 0, 0, 36, 36, 36, 36, 36, 36, 61, 103, 44, 74, 74, 94, 0, 89, 0, 97, 82, 96, 7, 7, 0, 0, 40, 
36, 101, 81, 36, 36, 36, 61, 40, 36, 36, 36, 36, 36, 95, 36, 36, 55, 36, 61, 105, 89, 44, 106, 44, 44, 0, 96, 7, 7, 101, 0, 0, 0, 81, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 80, 44, 65, 0, 36, 67, 44, 65, 7, 7, 107, 0, 98, 77, 43, 55, 0, 36, 81, 36, 81, 108, 40, 81, 80, 44, 59, 83, 36, 43, 44, 87, 7, 7, 107, 36, 88, 0, 0, 0, 0, 0, 87, 0, 7, 7, 107, 0, 0, 109, 110, 111, 36, 36, 81, 36, 36, 36, 36, 36, 36, 36, 36, 88, 58, 44, 44, 44, 44, 74, 36, 86, 44, 44, 58, 44, 44, 44, 44, 44, 44, 44, 44, 112, 0, 105, 0, 0, 0, 0, 0, 0, 36, 36, 67, 44, 44, 44, 44, 113, 7, 7, 114, 0, 36, 82, 75, 82, 90, 73, 44, 75, 86, 70, 36, 36, 82, 44, 44, 85, 7, 7, 115, 87, 11, 50, 0, 116, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 61, 36, 36, 36, 92, 41, 36, 61, 92, 41, 36, 36, 92, 41, 36, 36, 36, 36, 36, 36, 36, 36, 92, 41, 36, 61, 92, 41, 36, 36, 36, 61, 36, 36, 36, 36, 36, 36, 92, 41, 36, 36, 36, 36, 36, 36, 36, 36, 61, 58, 117, 9, 118, 0, 0, 0, 0, 0, 36, 36, 36, 36, 0, 0, 0, 0, 11, 11, 11, 11, 11, 119, 15, 39, 36, 36, 36, 120, 36, 36, 36, 36, 121, 36, 36, 36, 36, 36, 122, 123, 36, 36, 61, 40, 36, 36, 88, 0, 36, 36, 36, 92, 82, 112, 0, 0, 36, 36, 36, 36, 82, 124, 0, 0, 36, 36, 36, 36, 82, 0, 0, 0, 36, 36, 36, 92, 125, 0, 0, 0, 36, 36, 36, 36, 36, 44, 44, 44, 44, 44, 44, 44, 44, 97, 0, 100, 7, 7, 107, 0, 0, 0, 0, 0, 126, 0, 127, 128, 7, 7, 107, 0, 36, 36, 36, 36, 36, 36, 0, 0, 36, 36, 129, 0, 36, 36, 36, 36, 36, 36, 36, 36, 36, 41, 0, 0, 36, 36, 36, 36, 36, 36, 36, 61, 44, 44, 44, 0, 44, 44, 44, 0, 0, 91, 7, 7, 36, 36, 36, 36, 36, 36, 36, 41, 36, 88, 0, 0, 36, 36, 36, 0, 36, 36, 36, 36, 36, 36, 41, 0, 7, 7, 107, 0, 36, 36, 36, 36, 36, 67, 44, 0, 36, 36, 36, 36, 36, 86, 44, 65, 44, 44, 44, 44, 44, 44, 44, 93, 7, 7, 107, 0, 7, 7, 107, 0, 0, 97, 130, 0, 44, 44, 44, 65, 44, 70, 36, 36, 36, 36, 36, 36, 44, 70, 36, 0, 7, 7, 114, 131, 0, 0, 89, 44, 44, 0, 0, 0, 113, 36, 36, 36, 36, 36, 36, 36, 86, 44, 44, 75, 7, 7, 76, 36, 36, 82, 44, 44, 44, 0, 0, 0, 36, 44, 44, 44, 44, 44, 9, 118, 7, 7, 107, 81, 
7, 7, 76, 36, 36, 36, 36, 36, 36, 36, 36, 132, 0, 0, 0, 0, 65, 44, 44, 44, 44, 44, 70, 80, 82, 133, 87, 0, 44, 44, 44, 44, 44, 87, 0, 44, 25, 25, 25, 25, 25, 34, 15, 27, 15, 15, 11, 11, 15, 39, 11, 119, 15, 15, 11, 11, 15, 15, 11, 11, 15, 39, 11, 119, 15, 15, 134, 134, 15, 15, 11, 11, 15, 15, 15, 39, 15, 15, 11, 11, 15, 135, 11, 136, 46, 135, 11, 137, 15, 46, 11, 0, 15, 15, 11, 137, 46, 135, 11, 137, 138, 138, 139, 140, 141, 142, 143, 143, 0, 144, 145, 146, 0, 0, 147, 148, 0, 149, 148, 0, 0, 0, 0, 150, 62, 151, 62, 62, 21, 0, 0, 152, 0, 0, 0, 147, 15, 15, 15, 42, 0, 0, 0, 0, 44, 44, 44, 44, 44, 44, 44, 44, 112, 0, 0, 0, 48, 153, 154, 155, 23, 116, 10, 119, 0, 156, 49, 157, 11, 38, 158, 33, 0, 159, 39, 160, 0, 0, 0, 0, 161, 38, 88, 0, 0, 0, 0, 0, 0, 0, 143, 0, 0, 0, 0, 0, 0, 0, 147, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 162, 11, 11, 15, 15, 39, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 143, 123, 0, 143, 143, 143, 5, 0, 0, 0, 147, 0, 0, 0, 0, 0, 0, 0, 163, 143, 143, 0, 0, 0, 0, 4, 143, 143, 143, 143, 143, 123, 0, 0, 0, 0, 0, 0, 0, 143, 0, 0, 0, 0, 0, 0, 0, 0, 5, 11, 11, 11, 22, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 24, 31, 164, 26, 32, 25, 29, 15, 33, 25, 42, 153, 165, 54, 0, 0, 0, 15, 166, 0, 21, 36, 36, 36, 36, 36, 36, 0, 97, 0, 0, 0, 89, 36, 36, 36, 36, 36, 61, 0, 0, 36, 61, 36, 61, 36, 61, 36, 61, 143, 143, 143, 5, 0, 0, 0, 5, 143, 143, 5, 167, 0, 0, 0, 118, 168, 0, 0, 0, 0, 0, 0, 0, 169, 81, 143, 143, 5, 143, 143, 170, 81, 36, 82, 44, 81, 41, 36, 88, 36, 36, 36, 36, 36, 61, 60, 81, 0, 81, 36, 36, 36, 36, 36, 36, 36, 36, 36, 41, 81, 36, 36, 36, 36, 36, 36, 61, 0, 0, 0, 0, 36, 36, 36, 36, 36, 36, 61, 0, 0, 0, 0, 0, 36, 36, 36, 36, 36, 36, 36, 88, 0, 0, 0, 0, 36, 36, 36, 36, 36, 36, 36, 171, 36, 36, 36, 172, 36, 36, 36, 36, 7, 7, 76, 0, 0, 0, 0, 0, 25, 25, 25, 173, 65, 44, 44, 174, 25, 25, 25, 25, 25, 25, 25, 175, 36, 36, 36, 36, 176, 9, 0, 0, 0, 0, 0, 0, 0, 97, 36, 36, 177, 25, 25, 25, 27, 25, 25, 25, 25, 25, 25, 25, 15, 15, 26, 30, 25, 25, 178, 179, 25, 27, 25, 25, 25, 
25, 31, 119, 11, 25, 0, 0, 0, 0, 0, 0, 0, 97, 180, 36, 181, 181, 67, 36, 36, 36, 36, 36, 67, 44, 0, 0, 0, 0, 0, 0, 36, 36, 36, 36, 36, 131, 0, 0, 75, 36, 36, 36, 36, 36, 36, 36, 44, 112, 0, 131, 7, 7, 107, 0, 44, 44, 44, 44, 75, 36, 97, 55, 36, 82, 44, 176, 36, 36, 36, 36, 36, 67, 44, 44, 44, 0, 0, 0, 36, 36, 36, 36, 36, 36, 36, 88, 36, 36, 36, 36, 67, 44, 44, 44, 112, 0, 148, 97, 7, 7, 107, 0, 36, 80, 36, 36, 7, 7, 76, 61, 36, 36, 86, 44, 44, 65, 0, 0, 67, 36, 36, 87, 7, 7, 107, 182, 36, 36, 36, 36, 36, 61, 183, 75, 36, 36, 36, 36, 90, 73, 70, 82, 129, 0, 0, 0, 0, 0, 97, 41, 36, 36, 67, 44, 184, 185, 0, 0, 81, 61, 81, 61, 81, 61, 0, 0, 36, 61, 36, 61, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 24, 15, 15, 39, 0, 0, 15, 15, 15, 15, 67, 44, 186, 87, 7, 7, 107, 0, 36, 0, 0, 0, 36, 36, 36, 36, 36, 61, 97, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 0, 36, 36, 36, 41, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 41, 0, 15, 24, 0, 0, 187, 15, 0, 188, 36, 36, 92, 36, 36, 61, 36, 43, 95, 92, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 41, 0, 0, 0, 0, 0, 0, 0, 97, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 189, 36, 36, 36, 36, 40, 36, 36, 36, 36, 36, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 36, 36, 36, 0, 44, 44, 44, 44, 190, 4, 123, 0, 44, 44, 44, 44, 191, 170, 143, 143, 143, 192, 123, 0, 6, 193, 194, 195, 141, 0, 0, 0, 36, 92, 36, 36, 36, 36, 36, 36, 36, 36, 36, 196, 57, 0, 5, 6, 0, 0, 197, 9, 14, 15, 15, 15, 15, 15, 16, 198, 199, 200, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 82, 40, 36, 40, 36, 40, 36, 40, 88, 0, 0, 0, 0, 0, 0, 201, 0, 36, 36, 36, 81, 36, 36, 36, 36, 36, 61, 36, 36, 36, 36, 61, 95, 36, 36, 36, 41, 36, 36, 36, 41, 0, 0, 0, 0, 0, 0, 0, 99, 36, 36, 36, 36, 88, 0, 0, 0, 112, 0, 0, 0, 0, 0, 0, 0, 36, 36, 61, 0, 36, 36, 36, 36, 36, 36, 36, 36, 36, 82, 65, 0, 36, 36, 36, 36, 36, 36, 36, 41, 36, 0, 36, 36, 81, 41, 0, 0, 11, 11, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 36, 36, 36, 36, 36, 36, 0, 0, 36, 36, 36, 36, 36, 0, 0, 0, 0, 0, 0, 0, 36, 41, 92, 36, 36, 36, 36, 36, 36, 
36, 36, 36, 36, 95, 88, 77, 36, 36, 36, 36, 61, 41, 0, 0, 36, 36, 36, 36, 36, 36, 0, 40, 86, 60, 0, 44, 36, 81, 81, 36, 36, 36, 36, 36, 36, 0, 65, 89, 0, 0, 0, 0, 0, 131, 0, 0, 36, 185, 0, 0, 0, 0, 0, 0, 36, 36, 36, 36, 61, 0, 0, 0, 36, 36, 88, 0, 0, 0, 0, 0, 11, 11, 11, 11, 22, 0, 0, 0, 15, 15, 15, 15, 24, 0, 0, 0, 36, 36, 36, 36, 36, 36, 44, 44, 44, 186, 118, 0, 0, 0, 0, 0, 0, 96, 7, 7, 0, 0, 0, 89, 36, 36, 36, 36, 44, 44, 65, 202, 148, 0, 0, 0, 36, 36, 36, 36, 36, 36, 88, 0, 7, 7, 107, 0, 36, 67, 44, 44, 44, 203, 7, 7, 182, 0, 0, 0, 36, 36, 36, 36, 36, 36, 36, 36, 67, 104, 0, 0, 70, 204, 101, 205, 7, 7, 206, 172, 36, 36, 36, 36, 95, 36, 36, 36, 36, 36, 36, 44, 44, 44, 207, 118, 36, 61, 92, 95, 36, 36, 36, 95, 36, 36, 208, 0, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 67, 44, 44, 65, 0, 7, 7, 107, 0, 44, 81, 36, 77, 77, 36, 36, 36, 44, 93, 93, 87, 88, 89, 0, 81, 82, 101, 44, 112, 44, 112, 0, 0, 44, 95, 0, 0, 7, 7, 107, 0, 36, 36, 36, 67, 44, 87, 44, 44, 209, 0, 182, 130, 130, 130, 36, 87, 124, 88, 0, 0, 7, 7, 107, 0, 36, 36, 67, 44, 44, 44, 0, 0, 36, 36, 36, 36, 36, 36, 41, 58, 44, 44, 44, 0, 7, 7, 107, 78, 7, 7, 107, 0, 0, 0, 0, 97, 36, 36, 36, 36, 36, 36, 88, 0, 36, 61, 0, 0, 0, 0, 0, 0, 7, 7, 107, 131, 0, 0, 0, 0, 36, 36, 36, 41, 44, 205, 0, 0, 36, 36, 36, 36, 44, 186, 118, 0, 36, 118, 0, 0, 7, 7, 107, 0, 97, 36, 36, 36, 36, 36, 0, 81, 36, 88, 0, 0, 86, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 65, 0, 0, 0, 89, 113, 36, 36, 36, 41, 0, 0, 0, 0, 0, 0, 0, 36, 36, 61, 0, 36, 36, 36, 88, 36, 36, 88, 0, 36, 36, 41, 210, 62, 0, 0, 0, 0, 0, 0, 0, 0, 58, 87, 58, 211, 62, 212, 44, 65, 58, 44, 0, 0, 0, 0, 0, 0, 0, 101, 87, 0, 0, 0, 0, 101, 112, 0, 0, 0, 0, 0, 0, 11, 11, 11, 11, 11, 11, 155, 15, 15, 15, 15, 15, 15, 11, 11, 11, 11, 11, 11, 155, 15, 135, 15, 15, 15, 15, 11, 11, 11, 11, 11, 11, 155, 15, 15, 15, 15, 15, 15, 49, 48, 213, 10, 49, 11, 155, 166, 14, 15, 14, 15, 15, 11, 11, 11, 11, 11, 11, 155, 15, 15, 15, 15, 15, 15, 50, 22, 10, 11, 49, 11, 214, 15, 15, 15, 15, 
15, 15, 50, 22, 11, 156, 162, 11, 214, 15, 15, 15, 15, 15, 15, 11, 11, 11, 11, 11, 11, 155, 15, 15, 15, 15, 15, 15, 11, 11, 11, 155, 15, 15, 15, 15, 155, 15, 15, 15, 15, 15, 15, 11, 11, 11, 11, 11, 11, 155, 15, 15, 15, 15, 15, 15, 11, 11, 11, 11, 15, 39, 11, 11, 11, 11, 11, 11, 214, 15, 15, 15, 15, 15, 24, 15, 33, 11, 11, 11, 11, 11, 22, 15, 15, 15, 15, 15, 15, 135, 15, 11, 11, 11, 11, 11, 11, 214, 15, 15, 15, 15, 15, 24, 15, 33, 11, 11, 15, 15, 135, 15, 11, 11, 11, 11, 11, 11, 214, 15, 15, 15, 15, 15, 24, 15, 27, 96, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 44, 44, 44, 44, 44, 65, 89, 44, 44, 44, 44, 112, 0, 99, 0, 0, 0, 112, 118, 0, 0, 0, 89, 44, 58, 44, 44, 44, 0, 0, 0, 0, 36, 88, 0, 0, 44, 65, 0, 0, 36, 81, 36, 36, 36, 36, 36, 36, 98, 77, 81, 36, 61, 36, 108, 0, 104, 97, 108, 81, 98, 77, 108, 108, 98, 77, 61, 36, 61, 36, 81, 43, 36, 36, 95, 36, 36, 36, 36, 0, 81, 81, 95, 36, 36, 36, 36, 0, 0, 0, 0, 0, 11, 11, 11, 11, 11, 11, 119, 0, 11, 11, 11, 11, 11, 11, 119, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 163, 123, 0, 20, 0, 0, 0, 0, 0, 0, 0, 62, 62, 62, 62, 62, 62, 62, 62, 44, 44, 44, 44, 0, 0, 0, 0, }; static RE_UINT8 re_sentence_break_stage_5[] = { 0, 0, 0, 0, 0, 6, 2, 6, 6, 1, 0, 0, 6, 12, 13, 0, 0, 0, 0, 13, 13, 13, 0, 0, 14, 14, 11, 0, 10, 10, 10, 10, 10, 10, 14, 0, 0, 0, 0, 12, 0, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 13, 0, 13, 0, 0, 0, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 13, 0, 4, 0, 0, 6, 0, 0, 0, 0, 0, 7, 13, 0, 5, 0, 0, 0, 7, 0, 0, 8, 8, 8, 0, 8, 8, 8, 7, 7, 7, 7, 0, 8, 7, 8, 7, 7, 8, 7, 8, 7, 7, 8, 7, 8, 8, 7, 8, 7, 8, 7, 7, 7, 8, 8, 7, 8, 7, 8, 8, 7, 8, 8, 8, 7, 7, 8, 8, 8, 7, 7, 7, 8, 7, 7, 9, 9, 9, 9, 9, 9, 7, 7, 7, 7, 9, 9, 9, 7, 7, 0, 0, 0, 0, 9, 9, 9, 9, 0, 0, 7, 0, 0, 0, 9, 0, 9, 0, 3, 3, 3, 3, 9, 0, 8, 7, 0, 0, 7, 7, 7, 7, 0, 8, 0, 0, 8, 0, 8, 0, 8, 8, 8, 8, 0, 8, 7, 7, 7, 8, 8, 7, 0, 8, 8, 7, 0, 3, 3, 3, 8, 7, 0, 9, 0, 0, 0, 14, 0, 0, 0, 12, 0, 0, 0, 3, 3, 3, 3, 3, 0, 3, 0, 3, 3, 0, 9, 9, 9, 0, 5, 5, 5, 5, 5, 5, 0, 0, 14, 14, 0, 0, 3, 3, 3, 0, 5, 0, 0, 12, 9, 9, 9, 3, 10, 
10, 0, 10, 10, 0, 9, 9, 3, 9, 9, 9, 12, 9, 3, 3, 3, 5, 0, 3, 3, 9, 9, 3, 3, 0, 3, 3, 3, 3, 9, 9, 10, 10, 9, 9, 9, 0, 0, 9, 12, 12, 12, 0, 0, 0, 0, 5, 9, 3, 9, 9, 0, 9, 9, 9, 9, 9, 3, 3, 3, 9, 0, 0, 14, 12, 9, 0, 3, 3, 9, 3, 9, 3, 3, 3, 3, 3, 0, 0, 9, 0, 0, 0, 0, 0, 0, 3, 3, 9, 3, 3, 12, 12, 10, 10, 9, 0, 9, 9, 3, 0, 0, 3, 3, 3, 9, 0, 9, 9, 0, 9, 0, 0, 10, 10, 0, 0, 0, 9, 0, 9, 9, 0, 0, 3, 0, 0, 9, 3, 0, 0, 0, 0, 3, 3, 0, 0, 3, 9, 0, 9, 3, 3, 0, 0, 9, 0, 0, 0, 3, 0, 3, 0, 3, 0, 10, 10, 0, 0, 0, 9, 0, 9, 0, 3, 0, 3, 0, 3, 13, 13, 13, 13, 3, 3, 3, 0, 0, 0, 3, 3, 3, 9, 10, 10, 12, 12, 10, 10, 3, 3, 0, 8, 0, 0, 0, 0, 12, 0, 12, 0, 0, 0, 8, 8, 0, 0, 9, 0, 12, 9, 6, 9, 9, 9, 9, 9, 9, 13, 13, 0, 0, 0, 3, 12, 12, 0, 9, 0, 3, 3, 0, 0, 14, 12, 14, 12, 0, 3, 3, 3, 5, 0, 9, 3, 9, 0, 12, 12, 12, 12, 0, 0, 12, 12, 9, 9, 12, 12, 3, 9, 9, 0, 0, 8, 0, 8, 7, 0, 7, 7, 8, 0, 7, 0, 8, 0, 0, 0, 6, 6, 6, 6, 6, 6, 6, 5, 3, 3, 5, 5, 0, 0, 0, 14, 14, 0, 0, 0, 13, 13, 13, 13, 11, 0, 0, 0, 4, 4, 5, 5, 5, 5, 5, 6, 0, 13, 13, 0, 12, 12, 0, 0, 0, 13, 13, 12, 0, 0, 0, 6, 5, 0, 5, 5, 0, 13, 13, 7, 0, 0, 0, 8, 0, 0, 7, 8, 8, 8, 7, 7, 8, 0, 8, 0, 8, 8, 0, 7, 9, 7, 0, 0, 0, 8, 7, 7, 0, 0, 7, 0, 9, 9, 9, 8, 0, 0, 8, 8, 0, 0, 13, 13, 8, 7, 7, 8, 7, 8, 7, 3, 7, 7, 0, 7, 0, 0, 12, 9, 0, 0, 13, 0, 6, 14, 12, 0, 0, 13, 13, 13, 9, 9, 0, 12, 9, 0, 12, 12, 8, 7, 9, 3, 3, 3, 0, 9, 7, 7, 3, 3, 3, 3, 0, 12, 0, 0, 8, 7, 9, 0, 0, 8, 7, 8, 7, 9, 7, 7, 7, 9, 9, 9, 3, 9, 0, 12, 12, 12, 0, 0, 9, 3, 12, 12, 9, 9, 9, 3, 3, 0, 3, 3, 3, 12, 0, 0, 0, 7, 0, 9, 3, 9, 9, 9, 13, 13, 14, 14, 0, 14, 0, 14, 14, 0, 13, 0, 0, 13, 0, 14, 12, 12, 14, 13, 13, 13, 13, 13, 13, 0, 9, 0, 0, 5, 0, 0, 14, 0, 0, 13, 0, 13, 13, 12, 13, 13, 14, 0, 9, 9, 0, 5, 5, 5, 0, 5, 12, 12, 3, 0, 10, 10, 9, 12, 12, 0, 3, 12, 0, 0, 10, 10, 9, 0, 12, 12, 0, 12, 9, 12, 0, 0, 3, 0, 12, 12, 0, 3, 3, 12, 3, 3, 3, 5, 5, 5, 5, 3, 0, 8, 8, 0, 8, 0, 7, 7, }; /* Sentence_Break: 6372 bytes. 
*/ RE_UINT32 re_get_sentence_break(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 12; code = ch ^ (f << 12); pos = (RE_UINT32)re_sentence_break_stage_1[f] << 4; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_sentence_break_stage_2[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_sentence_break_stage_3[pos + f] << 3; f = code >> 2; code ^= f << 2; pos = (RE_UINT32)re_sentence_break_stage_4[pos + f] << 2; value = re_sentence_break_stage_5[pos + code]; return value; } /* Math. */ static RE_UINT8 re_math_stage_1[] = { 0, 1, 2, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, }; static RE_UINT8 re_math_stage_2[] = { 0, 1, 1, 1, 2, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 5, 1, 1, 6, 1, 1, }; static RE_UINT8 re_math_stage_3[] = { 0, 1, 1, 2, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 4, 5, 6, 7, 1, 8, 9, 10, 1, 6, 6, 11, 1, 1, 1, 1, 1, 1, 1, 12, 1, 1, 13, 14, 1, 1, 1, 1, 15, 16, 17, 18, 1, 1, 1, 1, 1, 1, 19, 1, }; static RE_UINT8 re_math_stage_4[] = { 0, 1, 2, 3, 0, 4, 5, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 7, 8, 0, 0, 0, 0, 0, 0, 0, 9, 10, 11, 12, 13, 0, 14, 15, 16, 17, 18, 0, 19, 20, 21, 22, 23, 23, 23, 23, 23, 23, 23, 23, 24, 25, 0, 26, 27, 28, 29, 30, 0, 0, 0, 0, 0, 31, 32, 33, 34, 0, 35, 36, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 23, 23, 0, 19, 37, 0, 0, 0, 0, 0, 0, 38, 0, 0, 0, 0, 0, 0, 0, 0, 0, 39, 0, 0, 0, 0, 1, 3, 3, 0, 0, 0, 0, 40, 23, 23, 41, 23, 42, 43, 44, 23, 45, 46, 47, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 48, 23, 23, 23, 23, 23, 23, 23, 23, 49, 23, 44, 50, 51, 52, 53, 54, 0, 55, }; static RE_UINT8 re_math_stage_5[] = { 0, 0, 0, 0, 0, 8, 0, 112, 0, 0, 0, 64, 0, 0, 0, 80, 0, 16, 2, 0, 0, 0, 128, 0, 0, 0, 39, 0, 0, 0, 115, 0, 192, 1, 0, 0, 0, 0, 64, 0, 0, 0, 28, 0, 17, 0, 4, 0, 30, 0, 0, 124, 0, 124, 0, 0, 0, 0, 255, 31, 98, 248, 0, 0, 132, 
252, 47, 63, 16, 179, 251, 241, 255, 11, 0, 0, 0, 0, 255, 255, 255, 126, 195, 240, 255, 255, 255, 47, 48, 0, 240, 255, 255, 255, 255, 255, 0, 15, 0, 0, 3, 0, 0, 0, 0, 0, 0, 16, 0, 0, 0, 248, 255, 255, 191, 0, 0, 0, 1, 240, 7, 0, 0, 0, 3, 192, 255, 240, 195, 140, 15, 0, 148, 31, 0, 255, 96, 0, 0, 0, 5, 0, 0, 0, 15, 224, 0, 0, 159, 31, 0, 0, 0, 2, 0, 0, 126, 1, 0, 0, 4, 30, 0, 0, 255, 255, 223, 255, 255, 255, 255, 223, 100, 222, 255, 235, 239, 255, 255, 255, 191, 231, 223, 223, 255, 255, 255, 123, 95, 252, 253, 255, 63, 255, 255, 255, 255, 207, 255, 255, 150, 254, 247, 10, 132, 234, 150, 170, 150, 247, 247, 94, 255, 251, 255, 15, 238, 251, 255, 15, 0, 0, 3, 0, }; /* Math: 538 bytes. */ RE_UINT32 re_get_math(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 15; code = ch ^ (f << 15); pos = (RE_UINT32)re_math_stage_1[f] << 4; f = code >> 11; code ^= f << 11; pos = (RE_UINT32)re_math_stage_2[pos + f] << 3; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_math_stage_3[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_math_stage_4[pos + f] << 5; pos += code; value = (re_math_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Alphabetic. 
*/ static RE_UINT8 re_alphabetic_stage_1[] = { 0, 1, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, }; static RE_UINT8 re_alphabetic_stage_2[] = { 0, 1, 2, 3, 4, 5, 6, 7, 7, 8, 7, 7, 7, 7, 7, 7, 7, 7, 7, 9, 10, 11, 7, 7, 7, 7, 12, 13, 13, 13, 13, 14, 15, 16, 17, 18, 19, 13, 20, 13, 21, 13, 13, 13, 13, 22, 13, 13, 13, 13, 13, 13, 13, 13, 23, 24, 13, 13, 25, 13, 13, 26, 27, 13, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 28, 7, 29, 30, 7, 31, 13, 13, 13, 13, 13, 32, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, }; static RE_UINT8 re_alphabetic_stage_3[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1, 17, 18, 19, 1, 20, 21, 22, 23, 24, 25, 26, 27, 1, 28, 29, 30, 31, 31, 32, 31, 31, 31, 31, 31, 31, 31, 33, 34, 35, 31, 36, 37, 31, 31, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 38, 1, 1, 1, 1, 1, 1, 1, 1, 1, 39, 1, 1, 1, 1, 40, 1, 41, 42, 43, 44, 45, 46, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 47, 31, 31, 31, 31, 31, 31, 31, 31, 31, 1, 48, 49, 1, 50, 51, 52, 53, 54, 55, 56, 57, 58, 1, 59, 60, 61, 62, 63, 64, 31, 31, 31, 65, 66, 67, 68, 69, 70, 71, 72, 73, 31, 74, 31, 31, 31, 31, 31, 1, 1, 1, 75, 76, 77, 31, 31, 1, 1, 1, 1, 78, 31, 31, 31, 31, 31, 31, 31, 1, 1, 79, 31, 1, 1, 80, 81, 31, 31, 31, 82, 83, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 84, 31, 31, 31, 31, 31, 31, 31, 85, 86, 87, 88, 89, 31, 31, 31, 31, 31, 90, 31, 31, 91, 31, 31, 31, 31, 31, 31, 1, 1, 1, 1, 1, 1, 92, 1, 1, 1, 1, 1, 1, 1, 1, 93, 94, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 95, 31, 1, 1, 96, 31, 31, 31, 31, 31, }; static RE_UINT8 re_alphabetic_stage_4[] = { 0, 0, 1, 1, 0, 2, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 6, 0, 0, 7, 8, 9, 10, 4, 11, 4, 4, 4, 4, 12, 4, 4, 4, 4, 13, 14, 15, 16, 17, 18, 19, 20, 4, 21, 22, 4, 4, 23, 24, 25, 4, 26, 4, 4, 27, 28, 29, 30, 31, 32, 0, 0, 33, 0, 34, 4, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 47, 51, 52, 53, 54, 55, 0, 
56, 57, 58, 59, 60, 61, 62, 63, 60, 64, 65, 66, 67, 68, 69, 70, 15, 71, 72, 0, 73, 74, 75, 0, 76, 0, 77, 78, 79, 80, 0, 0, 4, 81, 25, 82, 83, 4, 84, 85, 4, 4, 86, 4, 87, 88, 89, 4, 90, 4, 91, 0, 92, 4, 4, 93, 15, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 94, 1, 4, 4, 95, 96, 97, 97, 98, 4, 99, 100, 0, 0, 4, 4, 101, 4, 102, 4, 103, 104, 105, 25, 106, 4, 107, 108, 0, 109, 4, 104, 110, 0, 111, 0, 0, 4, 112, 113, 0, 4, 114, 4, 115, 4, 103, 116, 117, 0, 0, 0, 118, 4, 4, 4, 4, 4, 4, 0, 119, 93, 4, 120, 117, 4, 121, 122, 123, 0, 0, 0, 124, 125, 0, 0, 0, 126, 127, 128, 4, 129, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 130, 4, 108, 4, 131, 104, 4, 4, 4, 4, 132, 4, 84, 4, 133, 134, 135, 135, 4, 0, 136, 0, 0, 0, 0, 0, 0, 137, 138, 15, 4, 139, 15, 4, 85, 140, 141, 4, 4, 142, 71, 0, 25, 4, 4, 4, 4, 4, 103, 0, 0, 4, 4, 4, 4, 4, 4, 103, 0, 4, 4, 4, 4, 31, 0, 25, 117, 143, 144, 4, 145, 4, 4, 4, 92, 146, 147, 4, 4, 148, 149, 0, 146, 150, 16, 4, 97, 4, 4, 59, 151, 28, 102, 152, 80, 4, 153, 136, 154, 4, 134, 155, 156, 4, 104, 157, 158, 159, 160, 85, 161, 4, 4, 4, 162, 4, 4, 4, 4, 4, 163, 164, 109, 4, 4, 4, 165, 4, 4, 166, 0, 167, 168, 169, 4, 4, 27, 170, 4, 4, 117, 25, 4, 171, 4, 16, 172, 0, 0, 0, 173, 4, 4, 4, 80, 0, 1, 1, 174, 4, 104, 175, 0, 176, 177, 178, 0, 4, 4, 4, 71, 0, 0, 4, 33, 0, 0, 0, 0, 0, 0, 0, 0, 80, 4, 179, 0, 4, 25, 102, 71, 117, 4, 180, 0, 4, 4, 4, 4, 117, 0, 0, 0, 4, 181, 4, 59, 0, 0, 0, 0, 4, 134, 103, 16, 0, 0, 0, 0, 182, 183, 103, 134, 104, 0, 0, 184, 103, 166, 0, 0, 4, 185, 0, 0, 186, 97, 0, 80, 80, 0, 77, 187, 4, 103, 103, 152, 27, 0, 0, 0, 4, 4, 129, 0, 4, 152, 4, 152, 4, 4, 188, 0, 147, 32, 25, 129, 4, 152, 25, 189, 4, 4, 190, 0, 191, 192, 0, 0, 193, 194, 4, 129, 38, 47, 195, 59, 0, 0, 0, 0, 0, 0, 0, 0, 4, 4, 196, 0, 0, 0, 0, 0, 4, 197, 198, 0, 4, 104, 199, 0, 4, 103, 0, 0, 200, 162, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 4, 201, 0, 0, 0, 0, 0, 0, 4, 32, 4, 4, 4, 4, 166, 0, 0, 0, 4, 4, 4, 142, 4, 4, 4, 4, 4, 4, 59, 0, 0, 0, 0, 0, 4, 142, 0, 0, 0, 0, 0, 0, 4, 4, 202, 
0, 0, 0, 0, 0, 4, 32, 104, 0, 0, 0, 25, 155, 4, 134, 59, 203, 92, 0, 0, 0, 4, 4, 204, 104, 170, 0, 0, 0, 205, 0, 0, 0, 0, 0, 0, 0, 4, 4, 4, 206, 207, 0, 0, 0, 4, 4, 208, 4, 209, 210, 211, 4, 212, 213, 214, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 215, 216, 85, 208, 208, 131, 131, 217, 217, 218, 0, 4, 4, 4, 4, 4, 4, 187, 0, 211, 219, 220, 221, 222, 223, 0, 0, 0, 25, 224, 224, 108, 0, 0, 0, 4, 4, 4, 4, 4, 4, 134, 0, 4, 33, 4, 4, 4, 4, 4, 4, 117, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 205, 0, 0, 117, 0, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_alphabetic_stage_5[] = { 0, 0, 0, 0, 254, 255, 255, 7, 0, 4, 32, 4, 255, 255, 127, 255, 255, 255, 255, 255, 195, 255, 3, 0, 31, 80, 0, 0, 32, 0, 0, 0, 0, 0, 223, 188, 64, 215, 255, 255, 251, 255, 255, 255, 255, 255, 191, 255, 3, 252, 255, 255, 255, 255, 254, 255, 255, 255, 127, 2, 254, 255, 255, 255, 255, 0, 0, 0, 0, 0, 255, 191, 182, 0, 255, 255, 255, 7, 7, 0, 0, 0, 255, 7, 255, 255, 255, 254, 0, 192, 255, 255, 255, 255, 239, 31, 254, 225, 0, 156, 0, 0, 255, 255, 0, 224, 255, 255, 255, 255, 3, 0, 0, 252, 255, 255, 255, 7, 48, 4, 255, 255, 255, 252, 255, 31, 0, 0, 255, 255, 255, 1, 255, 255, 31, 0, 248, 3, 255, 255, 255, 255, 255, 239, 255, 223, 225, 255, 15, 0, 254, 255, 239, 159, 249, 255, 255, 253, 197, 227, 159, 89, 128, 176, 15, 0, 3, 0, 238, 135, 249, 255, 255, 253, 109, 195, 135, 25, 2, 94, 0, 0, 63, 0, 238, 191, 251, 255, 255, 253, 237, 227, 191, 27, 1, 0, 15, 0, 0, 2, 238, 159, 249, 255, 159, 25, 192, 176, 15, 0, 2, 0, 236, 199, 61, 214, 24, 199, 255, 195, 199, 29, 129, 0, 239, 223, 253, 255, 255, 253, 255, 227, 223, 29, 96, 7, 15, 0, 0, 0, 238, 223, 253, 255, 255, 253, 239, 227, 223, 29, 96, 64, 15, 0, 6, 0, 255, 255, 255, 231, 223, 93, 128, 128, 15, 0, 0, 252, 236, 255, 127, 252, 255, 255, 251, 47, 127, 128, 95, 255, 0, 0, 12, 0, 255, 255, 255, 7, 127, 32, 0, 0, 150, 37, 240, 254, 174, 236, 255, 59, 95, 32, 0, 240, 1, 0, 0, 0, 255, 254, 255, 255, 255, 31, 254, 255, 3, 255, 255, 254, 255, 255, 255, 31, 255, 255, 127, 249, 231, 193, 
255, 255, 127, 64, 0, 48, 191, 32, 255, 255, 255, 255, 255, 247, 255, 61, 127, 61, 255, 61, 255, 255, 255, 255, 61, 127, 61, 255, 127, 255, 255, 255, 61, 255, 255, 255, 255, 135, 255, 255, 0, 0, 255, 255, 63, 63, 255, 159, 255, 255, 255, 199, 255, 1, 255, 223, 15, 0, 255, 255, 15, 0, 255, 223, 13, 0, 255, 255, 207, 255, 255, 1, 128, 16, 255, 255, 255, 0, 255, 7, 255, 255, 255, 255, 63, 0, 255, 255, 255, 127, 255, 15, 255, 1, 255, 63, 31, 0, 255, 15, 255, 255, 255, 3, 0, 0, 255, 255, 255, 15, 254, 255, 31, 0, 128, 0, 0, 0, 255, 255, 239, 255, 239, 15, 0, 0, 255, 243, 0, 252, 191, 255, 3, 0, 0, 224, 0, 252, 255, 255, 255, 63, 0, 222, 111, 0, 128, 255, 31, 0, 63, 63, 255, 170, 255, 255, 223, 95, 220, 31, 207, 15, 255, 31, 220, 31, 0, 0, 2, 128, 0, 0, 255, 31, 132, 252, 47, 62, 80, 189, 255, 243, 224, 67, 0, 0, 255, 1, 0, 0, 0, 0, 192, 255, 255, 127, 255, 255, 31, 120, 12, 0, 255, 128, 0, 0, 255, 255, 127, 0, 127, 127, 127, 127, 0, 128, 0, 0, 224, 0, 0, 0, 254, 3, 62, 31, 255, 255, 127, 224, 224, 255, 255, 255, 255, 63, 254, 255, 255, 127, 0, 0, 255, 31, 255, 255, 0, 12, 0, 0, 255, 127, 240, 143, 0, 0, 128, 255, 252, 255, 255, 255, 255, 249, 255, 255, 255, 63, 255, 0, 187, 247, 255, 255, 0, 0, 252, 40, 255, 255, 7, 0, 255, 255, 247, 255, 223, 255, 0, 124, 255, 63, 0, 0, 255, 255, 127, 196, 5, 0, 0, 56, 255, 255, 60, 0, 126, 126, 126, 0, 127, 127, 255, 255, 63, 0, 255, 255, 255, 7, 0, 0, 15, 0, 255, 255, 127, 248, 255, 255, 255, 63, 255, 255, 255, 255, 255, 3, 127, 0, 248, 224, 255, 253, 127, 95, 219, 255, 255, 255, 0, 0, 248, 255, 255, 255, 252, 255, 0, 0, 255, 15, 0, 0, 223, 255, 192, 255, 255, 255, 252, 252, 252, 28, 255, 239, 255, 255, 127, 255, 255, 183, 255, 63, 255, 63, 255, 255, 1, 0, 15, 255, 62, 0, 255, 0, 255, 255, 63, 253, 255, 255, 255, 255, 191, 145, 255, 255, 55, 0, 255, 255, 255, 192, 111, 240, 239, 254, 31, 0, 0, 0, 63, 0, 0, 0, 255, 255, 71, 0, 30, 0, 0, 20, 255, 255, 251, 255, 255, 255, 159, 0, 127, 189, 255, 191, 255, 1, 255, 255, 159, 25, 129, 224, 
179, 0, 0, 0, 255, 255, 63, 127, 0, 0, 0, 63, 17, 0, 0, 0, 255, 255, 255, 227, 0, 0, 0, 128, 127, 0, 0, 0, 248, 255, 255, 224, 31, 0, 255, 255, 3, 0, 0, 0, 255, 7, 255, 31, 255, 1, 255, 67, 255, 255, 223, 255, 255, 255, 255, 223, 100, 222, 255, 235, 239, 255, 255, 255, 191, 231, 223, 223, 255, 255, 255, 123, 95, 252, 253, 255, 63, 255, 255, 255, 253, 255, 255, 247, 255, 253, 255, 255, 247, 15, 0, 0, 150, 254, 247, 10, 132, 234, 150, 170, 150, 247, 247, 94, 255, 251, 255, 15, 238, 251, 255, 15, 255, 3, 255, 255, }; /* Alphabetic: 2085 bytes. */ RE_UINT32 re_get_alphabetic(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_alphabetic_stage_1[f] << 5; f = code >> 11; code ^= f << 11; pos = (RE_UINT32)re_alphabetic_stage_2[pos + f] << 3; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_alphabetic_stage_3[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_alphabetic_stage_4[pos + f] << 5; pos += code; value = (re_alphabetic_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Lowercase. 
*/ static RE_UINT8 re_lowercase_stage_1[] = { 0, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, }; static RE_UINT8 re_lowercase_stage_2[] = { 0, 1, 2, 3, 3, 3, 3, 3, 3, 3, 4, 3, 3, 3, 3, 5, 6, 7, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 8, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, }; static RE_UINT8 re_lowercase_stage_3[] = { 0, 1, 2, 3, 4, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 6, 6, 6, 6, 6, 6, 6, 6, 6, 8, 9, 10, 11, 12, 6, 6, 13, 6, 6, 6, 6, 6, 6, 6, 14, 15, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 16, 17, 6, 6, 6, 18, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 19, 6, 6, 6, 20, 6, 6, 6, 6, 21, 6, 6, 6, 6, 6, 6, 6, 22, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 23, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 24, 25, 26, 27, 6, 6, 6, 6, 6, 6, 6, 6, }; static RE_UINT8 re_lowercase_stage_4[] = { 0, 0, 0, 1, 0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 5, 13, 14, 15, 16, 17, 18, 19, 0, 0, 20, 21, 22, 23, 24, 25, 0, 26, 15, 5, 27, 5, 28, 5, 5, 29, 0, 30, 31, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 32, 15, 15, 15, 15, 15, 15, 0, 0, 5, 5, 5, 5, 33, 5, 5, 5, 34, 35, 36, 37, 35, 38, 39, 40, 0, 0, 0, 41, 42, 0, 0, 0, 43, 44, 45, 26, 46, 0, 0, 0, 0, 0, 0, 0, 0, 0, 26, 47, 0, 26, 48, 49, 5, 5, 5, 50, 15, 51, 0, 0, 0, 0, 0, 0, 0, 0, 5, 52, 53, 0, 0, 0, 0, 54, 5, 55, 56, 57, 0, 58, 0, 26, 59, 60, 15, 15, 0, 0, 61, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 62, 63, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 15, 64, 0, 0, 0, 0, 0, 0, 15, 0, 65, 66, 67, 31, 68, 69, 70, 71, 72, 73, 74, 75, 76, 65, 66, 77, 31, 68, 78, 63, 71, 79, 80, 81, 82, 78, 83, 26, 84, 71, 85, 0, }; static RE_UINT8 re_lowercase_stage_5[] = { 0, 0, 0, 0, 254, 255, 255, 7, 0, 4, 32, 4, 0, 0, 0, 128, 255, 255, 127, 255, 170, 170, 170, 170, 170, 170, 170, 85, 85, 171, 170, 170, 170, 170, 170, 212, 41, 49, 36, 78, 42, 45, 81, 230, 64, 82, 85, 181, 170, 170, 41, 170, 170, 170, 250, 147, 133, 170, 255, 255, 255, 255, 255, 255, 255, 255, 239, 255, 255, 255, 255, 1, 3, 0, 0, 0, 31, 0, 0, 0, 
32, 0, 0, 0, 0, 0, 138, 60, 0, 0, 1, 0, 0, 240, 255, 255, 255, 127, 227, 170, 170, 170, 47, 25, 0, 0, 255, 255, 2, 168, 170, 170, 84, 213, 170, 170, 170, 170, 0, 0, 254, 255, 255, 255, 255, 0, 0, 0, 0, 0, 0, 63, 170, 170, 234, 191, 255, 0, 63, 0, 255, 0, 255, 0, 63, 0, 255, 0, 255, 0, 255, 63, 255, 0, 223, 64, 220, 0, 207, 0, 255, 0, 220, 0, 0, 0, 2, 128, 0, 0, 255, 31, 0, 196, 8, 0, 0, 128, 16, 50, 192, 67, 0, 0, 16, 0, 0, 0, 255, 3, 0, 0, 255, 255, 255, 127, 98, 21, 218, 63, 26, 80, 8, 0, 191, 32, 0, 0, 170, 42, 0, 0, 170, 170, 170, 58, 168, 170, 171, 170, 170, 170, 255, 149, 170, 80, 186, 170, 170, 2, 160, 0, 0, 0, 0, 7, 255, 255, 255, 247, 63, 0, 255, 255, 127, 0, 248, 0, 0, 255, 255, 255, 255, 255, 0, 0, 255, 255, 7, 0, 0, 0, 0, 252, 255, 255, 15, 0, 0, 192, 223, 255, 252, 255, 255, 15, 0, 0, 192, 235, 239, 255, 0, 0, 0, 252, 255, 255, 15, 0, 0, 192, 255, 255, 255, 0, 0, 0, 252, 255, 255, 15, 0, 0, 192, 255, 255, 255, 0, 192, 255, 255, 0, 0, 192, 255, 63, 0, 0, 0, 252, 255, 255, 247, 3, 0, 0, 240, 255, 255, 223, 15, 255, 127, 63, 0, 255, 253, 0, 0, 247, 11, 0, 0, }; /* Lowercase: 777 bytes. */ RE_UINT32 re_get_lowercase(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_lowercase_stage_1[f] << 4; f = code >> 12; code ^= f << 12; pos = (RE_UINT32)re_lowercase_stage_2[pos + f] << 4; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_lowercase_stage_3[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_lowercase_stage_4[pos + f] << 5; pos += code; value = (re_lowercase_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Uppercase. 
*/ static RE_UINT8 re_uppercase_stage_1[] = { 0, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, }; static RE_UINT8 re_uppercase_stage_2[] = { 0, 1, 2, 3, 4, 5, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 6, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 7, 8, 9, 1, 10, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 11, 1, 1, 1, 12, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_uppercase_stage_3[] = { 0, 1, 2, 3, 4, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 6, 6, 8, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 9, 10, 6, 11, 6, 6, 12, 6, 6, 6, 6, 6, 6, 6, 13, 6, 6, 6, 6, 6, 6, 6, 6, 6, 14, 15, 6, 6, 6, 6, 6, 6, 6, 16, 6, 6, 6, 6, 17, 6, 6, 6, 6, 6, 6, 6, 18, 6, 6, 6, 19, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 20, 21, 22, 23, 6, 24, 6, 6, 6, 6, 6, 6, }; static RE_UINT8 re_uppercase_stage_4[] = { 0, 0, 1, 0, 0, 0, 2, 0, 3, 4, 5, 6, 7, 8, 9, 10, 3, 11, 12, 0, 0, 0, 0, 0, 0, 0, 0, 13, 14, 15, 16, 17, 18, 19, 0, 3, 20, 3, 21, 3, 3, 22, 23, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 18, 24, 0, 0, 0, 0, 0, 0, 18, 18, 25, 3, 3, 3, 3, 26, 3, 3, 3, 27, 28, 29, 30, 0, 31, 32, 33, 34, 35, 36, 19, 37, 0, 0, 0, 0, 0, 0, 0, 0, 38, 19, 0, 18, 39, 0, 40, 3, 3, 3, 41, 0, 0, 3, 42, 43, 0, 0, 0, 0, 44, 3, 45, 46, 47, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 18, 48, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 18, 49, 0, 0, 0, 0, 0, 0, 0, 18, 0, 0, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 50, 51, 52, 53, 63, 25, 56, 57, 53, 64, 65, 66, 67, 38, 39, 56, 68, 69, 0, 0, 56, 70, 70, 57, 0, 0, 0, }; static RE_UINT8 re_uppercase_stage_5[] = { 0, 0, 0, 0, 254, 255, 255, 7, 255, 255, 127, 127, 85, 85, 85, 85, 85, 85, 85, 170, 170, 84, 85, 85, 85, 85, 85, 43, 214, 206, 219, 177, 213, 210, 174, 17, 144, 164, 170, 74, 85, 85, 210, 85, 85, 85, 5, 108, 122, 85, 0, 0, 0, 0, 69, 128, 64, 215, 254, 255, 251, 15, 0, 0, 0, 128, 28, 85, 85, 85, 144, 230, 255, 255, 255, 255, 255, 255, 0, 0, 1, 84, 85, 85, 171, 42, 85, 85, 85, 85, 254, 255, 255, 255, 127, 0, 191, 
32, 0, 0, 255, 255, 63, 0, 85, 85, 21, 64, 0, 255, 0, 63, 0, 255, 0, 255, 0, 63, 0, 170, 0, 255, 0, 0, 0, 0, 0, 15, 0, 15, 0, 15, 0, 31, 0, 15, 132, 56, 39, 62, 80, 61, 15, 192, 32, 0, 0, 0, 8, 0, 0, 0, 0, 0, 192, 255, 255, 127, 0, 0, 157, 234, 37, 192, 5, 40, 4, 0, 85, 21, 0, 0, 85, 85, 85, 5, 84, 85, 84, 85, 85, 85, 0, 106, 85, 40, 69, 85, 85, 61, 95, 0, 255, 0, 0, 0, 255, 255, 7, 0, 255, 255, 255, 3, 0, 0, 240, 255, 255, 63, 0, 0, 0, 255, 255, 255, 3, 0, 0, 208, 100, 222, 63, 0, 0, 0, 255, 255, 255, 3, 0, 0, 176, 231, 223, 31, 0, 0, 0, 123, 95, 252, 1, 0, 0, 240, 255, 255, 63, 0, 0, 0, 3, 0, 0, 240, 1, 0, 0, 0, 252, 255, 255, 7, 0, 0, 0, 240, 255, 255, 31, 0, 255, 1, 0, 0, 0, 4, 0, 0, 255, 3, 255, 255, }; /* Uppercase: 701 bytes. */ RE_UINT32 re_get_uppercase(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_uppercase_stage_1[f] << 5; f = code >> 11; code ^= f << 11; pos = (RE_UINT32)re_uppercase_stage_2[pos + f] << 3; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_uppercase_stage_3[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_uppercase_stage_4[pos + f] << 5; pos += code; value = (re_uppercase_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Cased. 
*/ static RE_UINT8 re_cased_stage_1[] = { 0, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, }; static RE_UINT8 re_cased_stage_2[] = { 0, 1, 2, 3, 4, 5, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 6, 7, 1, 1, 1, 1, 1, 1, 1, 1, 1, 8, 9, 10, 1, 11, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 12, 1, 1, 1, 13, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_cased_stage_3[] = { 0, 1, 2, 3, 4, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 6, 6, 8, 6, 6, 6, 6, 6, 6, 6, 6, 6, 9, 10, 11, 12, 13, 6, 6, 14, 6, 6, 6, 6, 6, 6, 6, 15, 16, 6, 6, 6, 6, 6, 6, 6, 6, 17, 18, 6, 6, 6, 19, 6, 6, 6, 6, 6, 6, 6, 20, 6, 6, 6, 21, 6, 6, 6, 6, 22, 6, 6, 6, 6, 6, 6, 6, 23, 6, 6, 6, 24, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 25, 26, 27, 28, 6, 29, 6, 6, 6, 6, 6, 6, }; static RE_UINT8 re_cased_stage_4[] = { 0, 0, 1, 1, 0, 2, 3, 3, 4, 4, 4, 4, 4, 5, 6, 4, 4, 4, 4, 4, 7, 8, 9, 10, 0, 0, 11, 12, 13, 14, 4, 15, 4, 4, 4, 4, 16, 4, 4, 4, 4, 17, 18, 19, 20, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 21, 0, 0, 0, 0, 0, 0, 4, 4, 22, 4, 4, 4, 4, 4, 4, 0, 0, 4, 4, 4, 4, 4, 4, 4, 4, 22, 4, 23, 24, 4, 25, 26, 27, 0, 0, 0, 28, 29, 0, 0, 0, 30, 31, 32, 4, 33, 0, 0, 0, 0, 0, 0, 0, 0, 34, 4, 35, 4, 36, 37, 4, 4, 4, 4, 38, 4, 21, 0, 0, 0, 0, 0, 0, 0, 0, 4, 39, 24, 0, 0, 0, 0, 40, 4, 4, 41, 42, 0, 43, 0, 44, 5, 45, 4, 4, 0, 0, 46, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 4, 4, 47, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 48, 4, 48, 0, 0, 0, 0, 0, 4, 4, 0, 4, 4, 49, 4, 50, 51, 52, 4, 53, 54, 55, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 56, 57, 5, 49, 49, 36, 36, 58, 58, 59, 0, 0, 44, 60, 60, 35, 0, 0, 0, }; static RE_UINT8 re_cased_stage_5[] = { 0, 0, 0, 0, 254, 255, 255, 7, 0, 4, 32, 4, 255, 255, 127, 255, 255, 255, 255, 255, 255, 255, 255, 247, 240, 255, 255, 255, 255, 255, 239, 255, 255, 255, 255, 1, 3, 0, 0, 0, 31, 0, 0, 0, 32, 0, 0, 0, 0, 0, 207, 188, 64, 215, 255, 255, 251, 255, 255, 255, 255, 255, 191, 255, 3, 252, 255, 255, 255, 255, 254, 
255, 255, 255, 127, 0, 254, 255, 255, 255, 255, 0, 0, 0, 191, 32, 0, 0, 255, 255, 63, 63, 63, 63, 255, 170, 255, 255, 255, 63, 255, 255, 223, 95, 220, 31, 207, 15, 255, 31, 220, 31, 0, 0, 2, 128, 0, 0, 255, 31, 132, 252, 47, 62, 80, 189, 31, 242, 224, 67, 0, 0, 24, 0, 0, 0, 0, 0, 192, 255, 255, 3, 0, 0, 255, 127, 255, 255, 255, 255, 255, 127, 31, 120, 12, 0, 255, 63, 0, 0, 252, 255, 255, 255, 255, 120, 255, 255, 255, 63, 255, 0, 0, 0, 0, 7, 0, 0, 255, 255, 63, 0, 255, 255, 127, 0, 248, 0, 255, 255, 0, 0, 255, 255, 7, 0, 255, 255, 223, 255, 255, 255, 255, 223, 100, 222, 255, 235, 239, 255, 255, 255, 191, 231, 223, 223, 255, 255, 255, 123, 95, 252, 253, 255, 63, 255, 255, 255, 253, 255, 255, 247, 255, 253, 255, 255, 247, 15, 0, 0, 255, 3, 255, 255, }; /* Cased: 709 bytes. */ RE_UINT32 re_get_cased(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_cased_stage_1[f] << 5; f = code >> 11; code ^= f << 11; pos = (RE_UINT32)re_cased_stage_2[pos + f] << 3; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_cased_stage_3[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_cased_stage_4[pos + f] << 5; pos += code; value = (re_cased_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Case_Ignorable. 
*/ static RE_UINT8 re_case_ignorable_stage_1[] = { 0, 1, 2, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 4, 4, 4, 4, 4, }; static RE_UINT8 re_case_ignorable_stage_2[] = { 0, 1, 2, 3, 4, 5, 6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 8, 9, 7, 7, 7, 7, 7, 7, 7, 7, 7, 10, 11, 12, 13, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 14, 7, 7, 7, 7, 7, 7, 7, 7, 7, 15, 7, 7, 16, 17, 7, 18, 19, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 20, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, }; static RE_UINT8 re_case_ignorable_stage_3[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1, 1, 17, 1, 1, 1, 18, 19, 20, 21, 22, 23, 24, 1, 25, 26, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 27, 28, 29, 1, 30, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 31, 1, 1, 1, 32, 1, 33, 34, 35, 36, 37, 38, 1, 1, 1, 1, 1, 1, 1, 39, 1, 1, 40, 41, 1, 42, 43, 44, 1, 1, 1, 1, 1, 1, 45, 1, 1, 1, 1, 1, 46, 47, 48, 49, 50, 51, 52, 53, 1, 1, 54, 55, 1, 1, 1, 56, 1, 1, 1, 1, 57, 1, 1, 1, 1, 58, 59, 1, 1, 1, 1, 1, 1, 1, 60, 1, 1, 1, 1, 1, 61, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 62, 1, 1, 1, 1, 63, 64, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_case_ignorable_stage_4[] = { 0, 1, 2, 3, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5, 6, 6, 6, 6, 6, 7, 8, 0, 0, 0, 0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 10, 0, 11, 12, 13, 14, 15, 0, 16, 17, 0, 0, 18, 19, 20, 5, 21, 0, 0, 22, 0, 23, 24, 25, 26, 0, 0, 0, 0, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 33, 37, 38, 36, 33, 39, 35, 32, 40, 41, 35, 42, 0, 43, 0, 3, 44, 45, 35, 32, 40, 46, 35, 32, 0, 34, 35, 0, 0, 47, 0, 0, 48, 49, 0, 0, 50, 51, 0, 52, 53, 0, 54, 55, 56, 57, 0, 0, 58, 59, 60, 61, 0, 0, 33, 0, 0, 62, 0, 0, 0, 0, 0, 63, 63, 64, 64, 0, 65, 66, 0, 67, 0, 68, 0, 0, 69, 0, 0, 0, 70, 0, 0, 0, 0, 0, 0, 71, 0, 72, 73, 0, 74, 0, 0, 75, 76, 42, 77, 78, 79, 0, 80, 0, 81, 0, 82, 0, 0, 83, 84, 0, 85, 6, 86, 87, 6, 6, 88, 0, 0, 0, 0, 0, 89, 90, 91, 92, 93, 0, 94, 95, 0, 5, 96, 0, 0, 0, 97, 0, 0, 0, 98, 0, 0, 0, 99, 0, 0, 0, 6, 0, 100, 0, 0, 0, 0, 0, 0, 101, 
102, 0, 0, 103, 0, 0, 104, 105, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 82, 106, 0, 0, 107, 108, 0, 0, 109, 6, 78, 0, 17, 110, 0, 0, 52, 111, 112, 0, 0, 0, 0, 113, 114, 0, 115, 116, 0, 28, 117, 100, 112, 0, 118, 119, 120, 0, 121, 122, 123, 0, 0, 87, 0, 0, 0, 0, 124, 2, 0, 0, 0, 0, 125, 78, 0, 126, 127, 128, 0, 0, 0, 0, 129, 1, 2, 3, 17, 44, 0, 0, 130, 0, 0, 0, 0, 0, 0, 0, 131, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 132, 0, 0, 0, 0, 133, 134, 0, 0, 0, 0, 0, 112, 32, 135, 136, 129, 78, 137, 0, 0, 28, 138, 0, 139, 78, 140, 141, 0, 0, 142, 0, 0, 0, 0, 129, 143, 78, 33, 3, 144, 0, 0, 0, 0, 0, 0, 0, 0, 0, 145, 146, 0, 0, 0, 0, 0, 0, 147, 148, 0, 0, 149, 3, 0, 0, 150, 0, 0, 62, 151, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 152, 0, 153, 75, 0, 0, 0, 0, 0, 0, 0, 0, 0, 154, 0, 0, 0, 0, 0, 0, 0, 155, 75, 0, 0, 0, 0, 0, 156, 157, 158, 0, 0, 0, 0, 159, 0, 0, 0, 0, 0, 6, 160, 6, 161, 162, 163, 0, 0, 0, 0, 0, 0, 0, 0, 153, 0, 0, 0, 0, 0, 0, 0, 0, 87, 32, 6, 6, 6, 0, 0, 0, 0, 6, 6, 6, 6, 6, 6, 6, 127, }; static RE_UINT8 re_case_ignorable_stage_5[] = { 0, 0, 0, 0, 128, 64, 0, 4, 0, 0, 0, 64, 1, 0, 0, 0, 0, 161, 144, 1, 0, 0, 255, 255, 255, 255, 255, 255, 255, 255, 48, 4, 176, 0, 0, 0, 248, 3, 0, 0, 0, 0, 0, 2, 0, 0, 254, 255, 255, 255, 255, 191, 182, 0, 0, 0, 0, 0, 16, 0, 63, 0, 255, 23, 1, 248, 255, 255, 0, 0, 1, 0, 0, 0, 192, 191, 255, 61, 0, 0, 0, 128, 2, 0, 255, 7, 0, 0, 192, 255, 1, 0, 0, 248, 63, 4, 0, 0, 192, 255, 255, 63, 0, 0, 0, 0, 0, 14, 248, 255, 255, 255, 7, 0, 0, 0, 0, 0, 0, 20, 254, 33, 254, 0, 12, 0, 2, 0, 2, 0, 0, 0, 0, 0, 0, 16, 30, 32, 0, 0, 12, 0, 0, 0, 6, 0, 0, 0, 134, 57, 2, 0, 0, 0, 35, 0, 190, 33, 0, 0, 0, 0, 0, 144, 30, 32, 64, 0, 4, 0, 0, 0, 1, 32, 0, 0, 0, 0, 0, 192, 193, 61, 96, 0, 64, 48, 0, 0, 0, 4, 92, 0, 0, 0, 242, 7, 192, 127, 0, 0, 0, 0, 242, 27, 64, 63, 0, 0, 0, 0, 0, 3, 0, 0, 160, 2, 0, 0, 254, 127, 223, 224, 255, 254, 255, 255, 255, 31, 64, 0, 0, 0, 0, 224, 253, 102, 0, 0, 0, 195, 1, 0, 30, 0, 100, 32, 0, 32, 0, 0, 0, 224, 0, 0, 28, 0, 0, 0, 12, 0, 0, 
0, 176, 63, 64, 254, 143, 32, 0, 120, 0, 0, 8, 0, 0, 0, 0, 2, 0, 0, 135, 1, 4, 14, 0, 0, 128, 9, 0, 0, 64, 127, 229, 31, 248, 159, 128, 0, 255, 127, 15, 0, 0, 0, 0, 0, 208, 23, 0, 248, 15, 0, 3, 0, 0, 0, 60, 59, 0, 0, 64, 163, 3, 0, 0, 240, 207, 0, 0, 0, 0, 63, 0, 0, 247, 255, 253, 33, 16, 3, 0, 240, 255, 255, 255, 7, 0, 1, 0, 0, 0, 248, 255, 255, 63, 240, 0, 0, 0, 160, 3, 224, 0, 224, 0, 224, 0, 96, 0, 248, 0, 3, 144, 124, 0, 0, 223, 255, 2, 128, 0, 0, 255, 31, 255, 255, 1, 0, 0, 0, 0, 48, 0, 128, 3, 0, 0, 128, 0, 128, 0, 128, 0, 0, 32, 0, 0, 0, 0, 60, 62, 8, 0, 0, 0, 126, 0, 0, 0, 112, 0, 0, 32, 0, 0, 16, 0, 0, 0, 128, 247, 191, 0, 0, 0, 240, 0, 0, 3, 0, 0, 7, 0, 0, 68, 8, 0, 0, 96, 0, 0, 0, 16, 0, 0, 0, 255, 255, 3, 0, 192, 63, 0, 0, 128, 255, 3, 0, 0, 0, 200, 19, 0, 126, 102, 0, 8, 16, 0, 0, 0, 0, 1, 16, 0, 0, 157, 193, 2, 0, 0, 32, 0, 48, 88, 0, 32, 33, 0, 0, 0, 0, 252, 255, 255, 255, 8, 0, 255, 255, 0, 0, 0, 0, 36, 0, 0, 0, 0, 128, 8, 0, 0, 14, 0, 0, 0, 32, 0, 0, 192, 7, 110, 240, 0, 0, 0, 0, 0, 135, 0, 0, 0, 255, 127, 0, 0, 0, 0, 0, 120, 38, 128, 239, 31, 0, 0, 0, 8, 0, 0, 0, 192, 127, 0, 28, 0, 0, 0, 128, 211, 0, 248, 7, 0, 0, 192, 31, 31, 0, 0, 0, 248, 133, 13, 0, 0, 0, 0, 0, 60, 176, 1, 0, 0, 48, 0, 0, 248, 167, 0, 40, 191, 0, 188, 15, 0, 0, 0, 0, 31, 0, 0, 0, 127, 0, 0, 128, 255, 255, 0, 0, 0, 96, 128, 3, 248, 255, 231, 15, 0, 0, 0, 60, 0, 0, 28, 0, 0, 0, 255, 255, 127, 248, 255, 31, 32, 0, 16, 0, 0, 248, 254, 255, 0, 0, }; /* Case_Ignorable: 1474 bytes. 
*/ RE_UINT32 re_get_case_ignorable(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 15; code = ch ^ (f << 15); pos = (RE_UINT32)re_case_ignorable_stage_1[f] << 4; f = code >> 11; code ^= f << 11; pos = (RE_UINT32)re_case_ignorable_stage_2[pos + f] << 3; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_case_ignorable_stage_3[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_case_ignorable_stage_4[pos + f] << 5; pos += code; value = (re_case_ignorable_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Changes_When_Lowercased. */ static RE_UINT8 re_changes_when_lowercased_stage_1[] = { 0, 1, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, }; static RE_UINT8 re_changes_when_lowercased_stage_2[] = { 0, 1, 2, 3, 4, 5, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 6, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 7, 8, 9, 1, 10, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_changes_when_lowercased_stage_3[] = { 0, 1, 2, 3, 4, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 6, 6, 8, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 9, 10, 6, 11, 6, 6, 12, 6, 6, 6, 6, 6, 6, 6, 13, 6, 6, 6, 6, 6, 6, 6, 6, 6, 14, 15, 6, 6, 6, 6, 6, 6, 6, 16, 6, 6, 6, 6, 17, 6, 6, 6, 6, 6, 6, 6, 18, 6, 6, 6, 19, 6, 6, 6, 6, 6, 6, 6, }; static RE_UINT8 re_changes_when_lowercased_stage_4[] = { 0, 0, 1, 0, 0, 0, 2, 0, 3, 4, 5, 6, 7, 8, 9, 10, 3, 11, 12, 0, 0, 0, 0, 0, 0, 0, 0, 13, 14, 15, 16, 17, 18, 19, 0, 3, 20, 3, 21, 3, 3, 22, 23, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 18, 24, 0, 0, 0, 0, 0, 0, 18, 18, 25, 3, 3, 3, 3, 26, 3, 3, 3, 27, 28, 29, 30, 28, 31, 32, 33, 0, 34, 0, 19, 35, 0, 0, 0, 0, 0, 0, 0, 0, 36, 19, 0, 18, 37, 0, 38, 3, 3, 3, 39, 0, 0, 3, 40, 41, 0, 0, 0, 0, 42, 3, 43, 44, 45, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 18, 46, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 18, 47, 0, 0, 0, 0, 0, 0, 0, 18, 0, 0, }; static RE_UINT8 re_changes_when_lowercased_stage_5[] = { 0, 0, 
0, 0, 254, 255, 255, 7, 255, 255, 127, 127, 85, 85, 85, 85, 85, 85, 85, 170, 170, 84, 85, 85, 85, 85, 85, 43, 214, 206, 219, 177, 213, 210, 174, 17, 176, 173, 170, 74, 85, 85, 214, 85, 85, 85, 5, 108, 122, 85, 0, 0, 0, 0, 69, 128, 64, 215, 254, 255, 251, 15, 0, 0, 0, 128, 0, 85, 85, 85, 144, 230, 255, 255, 255, 255, 255, 255, 0, 0, 1, 84, 85, 85, 171, 42, 85, 85, 85, 85, 254, 255, 255, 255, 127, 0, 191, 32, 0, 0, 255, 255, 63, 0, 85, 85, 21, 64, 0, 255, 0, 63, 0, 255, 0, 255, 0, 63, 0, 170, 0, 255, 0, 0, 0, 255, 0, 31, 0, 31, 0, 15, 0, 31, 0, 31, 64, 12, 4, 0, 8, 0, 0, 0, 0, 0, 192, 255, 255, 127, 0, 0, 157, 234, 37, 192, 5, 40, 4, 0, 85, 21, 0, 0, 85, 85, 85, 5, 84, 85, 84, 85, 85, 85, 0, 106, 85, 40, 69, 85, 85, 61, 95, 0, 255, 0, 0, 0, 255, 255, 7, 0, }; /* Changes_When_Lowercased: 538 bytes. */ RE_UINT32 re_get_changes_when_lowercased(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 15; code = ch ^ (f << 15); pos = (RE_UINT32)re_changes_when_lowercased_stage_1[f] << 4; f = code >> 11; code ^= f << 11; pos = (RE_UINT32)re_changes_when_lowercased_stage_2[pos + f] << 3; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_changes_when_lowercased_stage_3[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_changes_when_lowercased_stage_4[pos + f] << 5; pos += code; value = (re_changes_when_lowercased_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Changes_When_Uppercased. 
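A minimal sketch of how one 32-bit block of the final stage is read. The four bytes 254, 255, 255, 7 are bytes 4 to 7 of the stage-5 table below (block number 1), and index 3 of the stage-4 table below selects that block for the range U+0060..U+007F. The helper names are hypothetical and are not part of this file.

```c
#include <assert.h>
#include <stdint.h>

// Block number 1 of the stage-5 table, copied from the source.
static const uint8_t block_1[4] = { 254, 255, 255, 7 };

// Same bit test as the last step of the lookup routine, applied to a
// single 32-bit block. `bit` is the low 5 bits of the code point.
static uint32_t block_bit(const uint8_t *block, uint32_t bit) {
    return (block[bit >> 3] >> (bit & 0x7)) & 0x1;
}

// Hypothetical wrapper, valid for code points in U+0060..U+007F only.
static uint32_t changes_when_uppercased_0060_007f(uint32_t ch) {
    assert(ch >= 0x60 && ch <= 0x7F);
    return block_bit(block_1, ch & 0x1F);
}
```

Bits 1 to 26 of the block are set, so the wrapper returns 1 for the letters a to z and 0 for the other code points in the range.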
*/ static RE_UINT8 re_changes_when_uppercased_stage_1[] = { 0, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, }; static RE_UINT8 re_changes_when_uppercased_stage_2[] = { 0, 1, 2, 3, 3, 3, 3, 3, 3, 3, 4, 3, 3, 3, 3, 5, 6, 7, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, }; static RE_UINT8 re_changes_when_uppercased_stage_3[] = { 0, 1, 2, 3, 4, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 6, 6, 6, 6, 6, 6, 6, 6, 6, 8, 9, 10, 6, 11, 6, 6, 12, 6, 6, 6, 6, 6, 6, 6, 13, 14, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 15, 16, 6, 6, 6, 17, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 18, 6, 6, 6, 19, 6, 6, 6, 6, 20, 6, 6, 6, 6, 6, 6, 6, 21, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 22, 6, 6, 6, 6, 6, 6, 6, }; static RE_UINT8 re_changes_when_uppercased_stage_4[] = { 0, 0, 0, 1, 0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 5, 13, 14, 15, 16, 0, 0, 0, 0, 0, 17, 18, 19, 20, 21, 22, 0, 23, 24, 5, 25, 5, 26, 5, 5, 27, 0, 28, 29, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 30, 0, 0, 0, 31, 0, 0, 0, 0, 5, 5, 5, 5, 32, 5, 5, 5, 33, 34, 35, 36, 24, 37, 38, 39, 0, 0, 40, 23, 41, 0, 0, 0, 0, 0, 0, 0, 0, 0, 23, 42, 0, 23, 43, 44, 5, 5, 5, 45, 24, 46, 0, 0, 0, 0, 0, 0, 0, 0, 5, 47, 48, 0, 0, 0, 0, 49, 5, 50, 51, 52, 0, 0, 0, 0, 53, 23, 24, 24, 0, 0, 54, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 55, 56, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 24, 57, 0, 0, 0, 0, 0, 0, 24, 0, }; static RE_UINT8 re_changes_when_uppercased_stage_5[] = { 0, 0, 0, 0, 254, 255, 255, 7, 0, 0, 32, 0, 0, 0, 0, 128, 255, 255, 127, 255, 170, 170, 170, 170, 170, 170, 170, 84, 85, 171, 170, 170, 170, 170, 170, 212, 41, 17, 36, 70, 42, 33, 81, 162, 96, 91, 85, 181, 170, 170, 45, 170, 168, 170, 10, 144, 133, 170, 223, 26, 107, 155, 38, 32, 137, 31, 4, 96, 32, 0, 0, 0, 0, 0, 138, 56, 0, 0, 1, 0, 0, 240, 255, 255, 255, 127, 227, 170, 170, 170, 47, 9, 0, 0, 255, 255, 255, 255, 255, 255, 2, 168, 170, 170, 84, 213, 170, 170, 170, 170, 0, 0, 254, 255, 255, 
255, 255, 0, 0, 0, 0, 0, 0, 63, 0, 0, 0, 34, 170, 170, 234, 15, 255, 0, 63, 0, 255, 0, 255, 0, 63, 0, 255, 0, 255, 0, 255, 63, 255, 255, 223, 80, 220, 16, 207, 0, 255, 0, 220, 16, 0, 64, 0, 0, 16, 0, 0, 0, 255, 3, 0, 0, 255, 255, 255, 127, 98, 21, 72, 0, 10, 80, 8, 0, 191, 32, 0, 0, 170, 42, 0, 0, 170, 170, 170, 10, 168, 170, 168, 170, 170, 170, 0, 148, 170, 16, 138, 170, 170, 2, 160, 0, 0, 0, 8, 0, 127, 0, 248, 0, 0, 255, 255, 255, 255, 255, 0, 0, 255, 255, 7, 0, }; /* Changes_When_Uppercased: 609 bytes. */ RE_UINT32 re_get_changes_when_uppercased(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_changes_when_uppercased_stage_1[f] << 4; f = code >> 12; code ^= f << 12; pos = (RE_UINT32)re_changes_when_uppercased_stage_2[pos + f] << 4; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_changes_when_uppercased_stage_3[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_changes_when_uppercased_stage_4[pos + f] << 5; pos += code; value = (re_changes_when_uppercased_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Changes_When_Titlecased. 
*/ static RE_UINT8 re_changes_when_titlecased_stage_1[] = { 0, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, }; static RE_UINT8 re_changes_when_titlecased_stage_2[] = { 0, 1, 2, 3, 3, 3, 3, 3, 3, 3, 4, 3, 3, 3, 3, 5, 6, 7, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, }; static RE_UINT8 re_changes_when_titlecased_stage_3[] = { 0, 1, 2, 3, 4, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 6, 6, 6, 6, 6, 6, 6, 6, 6, 8, 9, 10, 6, 11, 6, 6, 12, 6, 6, 6, 6, 6, 6, 6, 13, 14, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 15, 16, 6, 6, 6, 17, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 18, 6, 6, 6, 19, 6, 6, 6, 6, 20, 6, 6, 6, 6, 6, 6, 6, 21, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 22, 6, 6, 6, 6, 6, 6, 6, }; static RE_UINT8 re_changes_when_titlecased_stage_4[] = { 0, 0, 0, 1, 0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 5, 13, 14, 15, 16, 0, 0, 0, 0, 0, 17, 18, 19, 20, 21, 22, 0, 23, 24, 5, 25, 5, 26, 5, 5, 27, 0, 28, 29, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 30, 0, 0, 0, 31, 0, 0, 0, 0, 5, 5, 5, 5, 32, 5, 5, 5, 33, 34, 35, 36, 34, 37, 38, 39, 0, 0, 40, 23, 41, 0, 0, 0, 0, 0, 0, 0, 0, 0, 23, 42, 0, 23, 43, 44, 5, 5, 5, 45, 24, 46, 0, 0, 0, 0, 0, 0, 0, 0, 5, 47, 48, 0, 0, 0, 0, 49, 5, 50, 51, 52, 0, 0, 0, 0, 53, 23, 24, 24, 0, 0, 54, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 55, 56, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 24, 57, 0, 0, 0, 0, 0, 0, 24, 0, }; static RE_UINT8 re_changes_when_titlecased_stage_5[] = { 0, 0, 0, 0, 254, 255, 255, 7, 0, 0, 32, 0, 0, 0, 0, 128, 255, 255, 127, 255, 170, 170, 170, 170, 170, 170, 170, 84, 85, 171, 170, 170, 170, 170, 170, 212, 41, 17, 36, 70, 42, 33, 81, 162, 208, 86, 85, 181, 170, 170, 43, 170, 168, 170, 10, 144, 133, 170, 223, 26, 107, 155, 38, 32, 137, 31, 4, 96, 32, 0, 0, 0, 0, 0, 138, 56, 0, 0, 1, 0, 0, 240, 255, 255, 255, 127, 227, 170, 170, 170, 47, 9, 0, 0, 255, 255, 255, 255, 255, 255, 2, 168, 170, 170, 84, 213, 170, 170, 170, 170, 0, 0, 254, 255, 255, 
255, 255, 0, 0, 0, 0, 0, 0, 63, 0, 0, 0, 34, 170, 170, 234, 15, 255, 0, 63, 0, 255, 0, 255, 0, 63, 0, 255, 0, 255, 0, 255, 63, 255, 0, 223, 64, 220, 0, 207, 0, 255, 0, 220, 0, 0, 64, 0, 0, 16, 0, 0, 0, 255, 3, 0, 0, 255, 255, 255, 127, 98, 21, 72, 0, 10, 80, 8, 0, 191, 32, 0, 0, 170, 42, 0, 0, 170, 170, 170, 10, 168, 170, 168, 170, 170, 170, 0, 148, 170, 16, 138, 170, 170, 2, 160, 0, 0, 0, 8, 0, 127, 0, 248, 0, 0, 255, 255, 255, 255, 255, 0, 0, 255, 255, 7, 0, }; /* Changes_When_Titlecased: 609 bytes. */ RE_UINT32 re_get_changes_when_titlecased(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_changes_when_titlecased_stage_1[f] << 4; f = code >> 12; code ^= f << 12; pos = (RE_UINT32)re_changes_when_titlecased_stage_2[pos + f] << 4; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_changes_when_titlecased_stage_3[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_changes_when_titlecased_stage_4[pos + f] << 5; pos += code; value = (re_changes_when_titlecased_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Changes_When_Casefolded. 
*/ static RE_UINT8 re_changes_when_casefolded_stage_1[] = { 0, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, }; static RE_UINT8 re_changes_when_casefolded_stage_2[] = { 0, 1, 2, 3, 3, 3, 3, 3, 3, 3, 4, 3, 3, 3, 3, 5, 6, 7, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, }; static RE_UINT8 re_changes_when_casefolded_stage_3[] = { 0, 1, 2, 3, 4, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 6, 6, 8, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 9, 10, 6, 11, 6, 6, 12, 6, 6, 6, 6, 6, 6, 6, 13, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 14, 15, 6, 6, 6, 16, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 17, 6, 6, 6, 18, 6, 6, 6, 6, 19, 6, 6, 6, 6, 6, 6, 6, 20, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 21, 6, 6, 6, 6, 6, 6, 6, }; static RE_UINT8 re_changes_when_casefolded_stage_4[] = { 0, 0, 1, 0, 0, 2, 3, 0, 4, 5, 6, 7, 8, 9, 10, 11, 4, 12, 13, 0, 0, 0, 0, 0, 0, 0, 14, 15, 16, 17, 18, 19, 20, 21, 0, 4, 22, 4, 23, 4, 4, 24, 25, 0, 26, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 20, 27, 0, 0, 0, 0, 0, 0, 0, 0, 28, 4, 4, 4, 4, 29, 4, 4, 4, 30, 31, 32, 33, 20, 34, 35, 36, 0, 37, 0, 21, 38, 0, 0, 0, 0, 0, 0, 0, 0, 39, 21, 0, 20, 40, 0, 41, 4, 4, 4, 42, 0, 0, 4, 43, 44, 0, 0, 0, 0, 45, 4, 46, 47, 48, 0, 0, 0, 0, 0, 49, 20, 20, 0, 0, 50, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 20, 51, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 20, 52, 0, 0, 0, 0, 0, 0, 0, 20, 0, 0, }; static RE_UINT8 re_changes_when_casefolded_stage_5[] = { 0, 0, 0, 0, 254, 255, 255, 7, 0, 0, 32, 0, 255, 255, 127, 255, 85, 85, 85, 85, 85, 85, 85, 170, 170, 86, 85, 85, 85, 85, 85, 171, 214, 206, 219, 177, 213, 210, 174, 17, 176, 173, 170, 74, 85, 85, 214, 85, 85, 85, 5, 108, 122, 85, 0, 0, 32, 0, 0, 0, 0, 0, 69, 128, 64, 215, 254, 255, 251, 15, 0, 0, 4, 128, 99, 85, 85, 85, 179, 230, 255, 255, 255, 255, 255, 255, 0, 0, 1, 84, 85, 85, 171, 42, 85, 85, 85, 85, 254, 255, 255, 255, 127, 0, 128, 0, 0, 0, 191, 32, 0, 0, 0, 0, 0, 63, 85, 85, 21, 76, 0, 255, 0, 63, 0, 255, 0, 255, 0, 
63, 0, 170, 0, 255, 0, 0, 255, 255, 156, 31, 156, 31, 0, 15, 0, 31, 156, 31, 64, 12, 4, 0, 8, 0, 0, 0, 0, 0, 192, 255, 255, 127, 0, 0, 157, 234, 37, 192, 5, 40, 4, 0, 85, 21, 0, 0, 85, 85, 85, 5, 84, 85, 84, 85, 85, 85, 0, 106, 85, 40, 69, 85, 85, 61, 95, 0, 0, 0, 255, 255, 127, 0, 248, 0, 255, 0, 0, 0, 255, 255, 7, 0, }; /* Changes_When_Casefolded: 581 bytes. */ RE_UINT32 re_get_changes_when_casefolded(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_changes_when_casefolded_stage_1[f] << 4; f = code >> 12; code ^= f << 12; pos = (RE_UINT32)re_changes_when_casefolded_stage_2[pos + f] << 4; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_changes_when_casefolded_stage_3[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_changes_when_casefolded_stage_4[pos + f] << 5; pos += code; value = (re_changes_when_casefolded_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Changes_When_Casemapped. 
*/ static RE_UINT8 re_changes_when_casemapped_stage_1[] = { 0, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, }; static RE_UINT8 re_changes_when_casemapped_stage_2[] = { 0, 1, 2, 3, 3, 3, 3, 3, 3, 3, 4, 3, 3, 3, 3, 5, 6, 7, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, }; static RE_UINT8 re_changes_when_casemapped_stage_3[] = { 0, 1, 2, 3, 4, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 6, 6, 8, 6, 6, 6, 6, 6, 6, 6, 6, 6, 9, 10, 11, 6, 12, 6, 6, 13, 6, 6, 6, 6, 6, 6, 6, 14, 15, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 16, 17, 6, 6, 6, 18, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 19, 6, 6, 6, 20, 6, 6, 6, 6, 21, 6, 6, 6, 6, 6, 6, 6, 22, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 23, 6, 6, 6, 6, 6, 6, 6, }; static RE_UINT8 re_changes_when_casemapped_stage_4[] = { 0, 0, 1, 1, 0, 2, 3, 3, 4, 5, 4, 4, 6, 7, 8, 4, 4, 9, 10, 11, 12, 0, 0, 0, 0, 0, 13, 14, 15, 16, 17, 18, 4, 4, 4, 4, 19, 4, 4, 4, 4, 20, 21, 22, 23, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 24, 0, 0, 0, 0, 0, 0, 4, 4, 25, 0, 0, 0, 26, 0, 0, 0, 0, 4, 4, 4, 4, 27, 4, 4, 4, 25, 4, 28, 29, 4, 30, 31, 32, 0, 33, 34, 4, 35, 0, 0, 0, 0, 0, 0, 0, 0, 36, 4, 37, 4, 38, 39, 40, 4, 4, 4, 41, 4, 24, 0, 0, 0, 0, 0, 0, 0, 0, 4, 42, 43, 0, 0, 0, 0, 44, 4, 45, 46, 47, 0, 0, 0, 0, 48, 49, 4, 4, 0, 0, 50, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 4, 4, 51, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 52, 4, 52, 0, 0, 0, 0, 0, 4, 4, 0, }; static RE_UINT8 re_changes_when_casemapped_stage_5[] = { 0, 0, 0, 0, 254, 255, 255, 7, 0, 0, 32, 0, 255, 255, 127, 255, 255, 255, 255, 255, 255, 255, 255, 254, 255, 223, 255, 247, 255, 243, 255, 179, 240, 255, 255, 255, 253, 255, 15, 252, 255, 255, 223, 26, 107, 155, 38, 32, 137, 31, 4, 96, 32, 0, 0, 0, 0, 0, 207, 184, 64, 215, 255, 255, 251, 255, 255, 255, 255, 255, 227, 255, 255, 255, 191, 239, 3, 252, 255, 255, 255, 255, 254, 255, 255, 255, 127, 0, 254, 255, 255, 255, 255, 0, 0, 0, 191, 32, 0, 0, 255, 255, 63, 63, 0, 0, 0, 34, 255, 
255, 255, 79, 63, 63, 255, 170, 255, 255, 255, 63, 255, 255, 223, 95, 220, 31, 207, 15, 255, 31, 220, 31, 64, 12, 4, 0, 0, 64, 0, 0, 24, 0, 0, 0, 0, 0, 192, 255, 255, 3, 0, 0, 255, 127, 255, 255, 255, 255, 255, 127, 255, 255, 109, 192, 15, 120, 12, 0, 255, 63, 0, 0, 255, 255, 255, 15, 252, 255, 252, 255, 255, 255, 0, 254, 255, 56, 207, 255, 255, 63, 255, 0, 0, 0, 8, 0, 0, 0, 255, 255, 127, 0, 248, 0, 255, 255, 0, 0, 255, 255, 7, 0, }; /* Changes_When_Casemapped: 597 bytes. */ RE_UINT32 re_get_changes_when_casemapped(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_changes_when_casemapped_stage_1[f] << 4; f = code >> 12; code ^= f << 12; pos = (RE_UINT32)re_changes_when_casemapped_stage_2[pos + f] << 4; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_changes_when_casemapped_stage_3[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_changes_when_casemapped_stage_4[pos + f] << 5; pos += code; value = (re_changes_when_casemapped_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* ID_Start. 
*/ static RE_UINT8 re_id_start_stage_1[] = { 0, 1, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, }; static RE_UINT8 re_id_start_stage_2[] = { 0, 1, 2, 3, 4, 5, 6, 7, 7, 8, 7, 7, 7, 7, 7, 7, 7, 7, 7, 9, 10, 11, 7, 7, 7, 7, 12, 13, 13, 13, 13, 14, 15, 16, 17, 18, 19, 13, 20, 13, 21, 13, 13, 13, 13, 22, 13, 13, 13, 13, 13, 13, 13, 13, 23, 24, 13, 13, 25, 13, 13, 26, 13, 13, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 27, 7, 28, 29, 7, 30, 13, 13, 13, 13, 13, 31, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, }; static RE_UINT8 re_id_start_stage_3[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1, 17, 18, 19, 1, 20, 21, 22, 23, 24, 25, 26, 27, 1, 28, 29, 30, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 33, 31, 31, 34, 35, 31, 31, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 36, 1, 1, 1, 1, 1, 1, 1, 1, 1, 37, 1, 1, 1, 1, 38, 1, 39, 40, 41, 42, 43, 44, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 45, 31, 31, 31, 31, 31, 31, 31, 31, 31, 1, 46, 47, 1, 48, 49, 50, 51, 52, 53, 54, 55, 56, 1, 57, 58, 59, 60, 61, 62, 31, 31, 31, 63, 64, 65, 66, 67, 68, 69, 70, 71, 31, 72, 31, 31, 31, 31, 31, 1, 1, 1, 73, 74, 75, 31, 31, 1, 1, 1, 1, 76, 31, 31, 31, 31, 31, 31, 31, 1, 1, 77, 31, 1, 1, 78, 79, 31, 31, 31, 80, 81, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 82, 31, 31, 31, 31, 31, 31, 31, 83, 84, 85, 86, 87, 31, 31, 31, 31, 31, 88, 31, 1, 1, 1, 1, 1, 1, 89, 1, 1, 1, 1, 1, 1, 1, 1, 90, 91, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 92, 31, 1, 1, 93, 31, 31, 31, 31, 31, }; static RE_UINT8 re_id_start_stage_4[] = { 0, 0, 1, 1, 0, 2, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 6, 0, 0, 0, 7, 8, 9, 4, 10, 4, 4, 4, 4, 11, 4, 4, 4, 4, 12, 13, 14, 15, 0, 16, 17, 0, 4, 18, 19, 4, 4, 20, 21, 22, 23, 24, 4, 4, 25, 26, 27, 28, 29, 30, 0, 0, 31, 0, 0, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 45, 49, 50, 51, 52, 46, 0, 53, 54, 55, 56, 53, 57, 58, 59, 53, 60, 61, 
62, 63, 64, 65, 0, 14, 66, 65, 0, 67, 68, 69, 0, 70, 0, 71, 72, 73, 0, 0, 0, 4, 74, 75, 76, 77, 4, 78, 79, 4, 4, 80, 4, 81, 82, 83, 4, 84, 4, 85, 0, 23, 4, 4, 86, 14, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 87, 1, 4, 4, 88, 89, 90, 90, 91, 4, 92, 93, 0, 0, 4, 4, 94, 4, 95, 4, 96, 97, 0, 16, 98, 4, 99, 100, 0, 101, 4, 31, 0, 0, 102, 0, 0, 103, 92, 104, 0, 105, 106, 4, 107, 4, 108, 109, 110, 0, 0, 0, 111, 4, 4, 4, 4, 4, 4, 0, 0, 86, 4, 112, 110, 4, 113, 114, 115, 0, 0, 0, 116, 117, 0, 0, 0, 118, 119, 120, 4, 121, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 122, 97, 4, 4, 4, 4, 123, 4, 78, 4, 124, 101, 125, 125, 0, 126, 127, 14, 4, 128, 14, 4, 79, 103, 129, 4, 4, 130, 85, 0, 16, 4, 4, 4, 4, 4, 96, 0, 0, 4, 4, 4, 4, 4, 4, 96, 0, 4, 4, 4, 4, 72, 0, 16, 110, 131, 132, 4, 133, 110, 4, 4, 23, 134, 135, 4, 4, 136, 137, 0, 134, 138, 139, 4, 92, 135, 92, 0, 140, 26, 141, 65, 142, 32, 143, 144, 145, 4, 121, 146, 147, 4, 148, 149, 150, 151, 152, 79, 141, 4, 4, 4, 139, 4, 4, 4, 4, 4, 153, 154, 155, 4, 4, 4, 156, 4, 4, 157, 0, 158, 159, 160, 4, 4, 90, 161, 4, 4, 110, 16, 4, 162, 4, 15, 163, 0, 0, 0, 164, 4, 4, 4, 142, 0, 1, 1, 165, 4, 97, 166, 0, 167, 168, 169, 0, 4, 4, 4, 85, 0, 0, 4, 31, 0, 0, 0, 0, 0, 0, 0, 0, 142, 4, 170, 0, 4, 16, 171, 96, 110, 4, 172, 0, 4, 4, 4, 4, 110, 0, 0, 0, 4, 173, 4, 108, 0, 0, 0, 0, 4, 101, 96, 15, 0, 0, 0, 0, 174, 175, 96, 101, 97, 0, 0, 176, 96, 157, 0, 0, 4, 177, 0, 0, 178, 92, 0, 142, 142, 0, 71, 179, 4, 96, 96, 143, 90, 0, 0, 0, 4, 4, 121, 0, 4, 143, 4, 143, 105, 94, 0, 0, 105, 23, 16, 121, 105, 65, 16, 180, 105, 143, 181, 0, 182, 183, 0, 0, 184, 185, 97, 0, 48, 45, 186, 56, 0, 0, 0, 0, 0, 0, 0, 0, 4, 23, 187, 0, 0, 0, 0, 0, 4, 130, 188, 0, 4, 23, 189, 0, 4, 18, 0, 0, 157, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 4, 190, 0, 0, 0, 0, 0, 0, 4, 30, 4, 4, 4, 4, 157, 0, 0, 0, 4, 4, 4, 130, 4, 4, 4, 4, 4, 4, 108, 0, 0, 0, 0, 0, 4, 130, 0, 0, 0, 0, 0, 0, 4, 4, 65, 0, 0, 0, 0, 0, 4, 30, 97, 0, 0, 0, 16, 191, 4, 23, 108, 192, 23, 0, 0, 0, 4, 4, 193, 0, 161, 0, 0, 0, 56, 0, 
0, 0, 0, 0, 0, 0, 4, 4, 4, 194, 195, 0, 0, 0, 4, 4, 196, 4, 197, 198, 199, 4, 200, 201, 202, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 203, 204, 79, 196, 196, 122, 122, 205, 205, 146, 0, 4, 4, 4, 4, 4, 4, 179, 0, 199, 206, 207, 208, 209, 210, 0, 0, 4, 4, 4, 4, 4, 4, 101, 0, 4, 31, 4, 4, 4, 4, 4, 4, 110, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 56, 0, 0, 110, 0, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_id_start_stage_5[] = { 0, 0, 0, 0, 254, 255, 255, 7, 0, 4, 32, 4, 255, 255, 127, 255, 255, 255, 255, 255, 195, 255, 3, 0, 31, 80, 0, 0, 0, 0, 223, 188, 64, 215, 255, 255, 251, 255, 255, 255, 255, 255, 191, 255, 3, 252, 255, 255, 255, 255, 254, 255, 255, 255, 127, 2, 254, 255, 255, 255, 255, 0, 0, 0, 0, 0, 255, 255, 255, 7, 7, 0, 255, 7, 0, 0, 0, 192, 254, 255, 255, 255, 47, 0, 96, 192, 0, 156, 0, 0, 253, 255, 255, 255, 0, 0, 0, 224, 255, 255, 63, 0, 2, 0, 0, 252, 255, 255, 255, 7, 48, 4, 255, 255, 63, 4, 16, 1, 0, 0, 255, 255, 255, 1, 255, 255, 31, 0, 240, 255, 255, 255, 255, 255, 255, 35, 0, 0, 1, 255, 3, 0, 254, 255, 225, 159, 249, 255, 255, 253, 197, 35, 0, 64, 0, 176, 3, 0, 3, 0, 224, 135, 249, 255, 255, 253, 109, 3, 0, 0, 0, 94, 0, 0, 28, 0, 224, 191, 251, 255, 255, 253, 237, 35, 0, 0, 1, 0, 3, 0, 0, 2, 224, 159, 249, 255, 0, 0, 0, 176, 3, 0, 2, 0, 232, 199, 61, 214, 24, 199, 255, 3, 224, 223, 253, 255, 255, 253, 255, 35, 0, 0, 0, 7, 3, 0, 0, 0, 255, 253, 239, 35, 0, 0, 0, 64, 3, 0, 6, 0, 255, 255, 255, 39, 0, 64, 0, 128, 3, 0, 0, 252, 224, 255, 127, 252, 255, 255, 251, 47, 127, 0, 0, 0, 255, 255, 13, 0, 150, 37, 240, 254, 174, 236, 13, 32, 95, 0, 0, 240, 1, 0, 0, 0, 255, 254, 255, 255, 255, 31, 0, 0, 0, 31, 0, 0, 255, 7, 0, 128, 0, 0, 63, 60, 98, 192, 225, 255, 3, 64, 0, 0, 191, 32, 255, 255, 255, 255, 255, 247, 255, 61, 127, 61, 255, 61, 255, 255, 255, 255, 61, 127, 61, 255, 127, 255, 255, 255, 61, 255, 255, 255, 255, 7, 255, 255, 63, 63, 255, 159, 255, 255, 255, 199, 255, 1, 255, 223, 3, 0, 255, 255, 3, 0, 255, 223, 1, 0, 255, 255, 15, 0, 0, 0, 128, 16, 255, 255, 255, 0, 255, 
5, 255, 255, 255, 255, 63, 0, 255, 255, 255, 127, 255, 63, 31, 0, 255, 15, 255, 255, 255, 3, 0, 0, 255, 255, 127, 0, 128, 0, 0, 0, 224, 255, 255, 255, 224, 15, 0, 0, 248, 255, 255, 255, 1, 192, 0, 252, 63, 0, 0, 0, 15, 0, 0, 0, 0, 224, 0, 252, 255, 255, 255, 63, 0, 222, 99, 0, 63, 63, 255, 170, 255, 255, 223, 95, 220, 31, 207, 15, 255, 31, 220, 31, 0, 0, 2, 128, 0, 0, 255, 31, 132, 252, 47, 63, 80, 253, 255, 243, 224, 67, 0, 0, 255, 1, 0, 0, 255, 127, 255, 255, 31, 120, 12, 0, 255, 128, 0, 0, 127, 127, 127, 127, 224, 0, 0, 0, 254, 3, 62, 31, 255, 255, 127, 248, 255, 63, 254, 255, 255, 127, 0, 0, 255, 31, 255, 255, 0, 12, 0, 0, 255, 127, 0, 128, 0, 0, 128, 255, 252, 255, 255, 255, 255, 249, 255, 255, 255, 63, 255, 0, 187, 247, 255, 255, 7, 0, 0, 0, 0, 0, 252, 40, 63, 0, 255, 255, 255, 255, 255, 31, 255, 255, 7, 0, 0, 128, 0, 0, 223, 255, 0, 124, 247, 15, 0, 0, 255, 255, 127, 196, 255, 255, 98, 62, 5, 0, 0, 56, 255, 7, 28, 0, 126, 126, 126, 0, 127, 127, 255, 255, 15, 0, 255, 255, 127, 248, 255, 255, 255, 255, 255, 15, 255, 63, 255, 255, 255, 255, 255, 3, 127, 0, 248, 160, 255, 253, 127, 95, 219, 255, 255, 255, 0, 0, 248, 255, 255, 255, 252, 255, 0, 0, 255, 15, 0, 0, 223, 255, 192, 255, 255, 255, 252, 252, 252, 28, 255, 239, 255, 255, 127, 255, 255, 183, 255, 63, 255, 63, 255, 255, 1, 0, 255, 7, 255, 255, 15, 255, 62, 0, 255, 0, 255, 255, 63, 253, 255, 255, 255, 255, 191, 145, 255, 255, 55, 0, 255, 255, 255, 192, 1, 0, 239, 254, 31, 0, 0, 0, 255, 255, 71, 0, 30, 0, 0, 20, 255, 255, 251, 255, 255, 15, 0, 0, 127, 189, 255, 191, 255, 1, 255, 255, 0, 0, 1, 224, 176, 0, 0, 0, 0, 0, 0, 15, 16, 0, 0, 0, 0, 0, 0, 128, 255, 63, 0, 0, 248, 255, 255, 224, 31, 0, 1, 0, 255, 7, 255, 31, 255, 1, 255, 3, 255, 255, 223, 255, 255, 255, 255, 223, 100, 222, 255, 235, 239, 255, 255, 255, 191, 231, 223, 223, 255, 255, 255, 123, 95, 252, 253, 255, 63, 255, 255, 255, 253, 255, 255, 247, 255, 253, 255, 255, 150, 254, 247, 10, 132, 234, 150, 170, 150, 247, 247, 94, 255, 251, 255, 15, 238, 
251, 255, 15, }; /* ID_Start: 1997 bytes. */ RE_UINT32 re_get_id_start(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_id_start_stage_1[f] << 5; f = code >> 11; code ^= f << 11; pos = (RE_UINT32)re_id_start_stage_2[pos + f] << 3; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_id_start_stage_3[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_id_start_stage_4[pos + f] << 5; pos += code; value = (re_id_start_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* ID_Continue. */ static RE_UINT8 re_id_continue_stage_1[] = { 0, 1, 2, 3, 4, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 6, 6, 6, 6, 6, }; static RE_UINT8 re_id_continue_stage_2[] = { 0, 1, 2, 3, 4, 5, 6, 7, 7, 8, 7, 7, 7, 7, 7, 7, 7, 7, 7, 9, 10, 11, 7, 7, 7, 7, 12, 13, 13, 13, 13, 14, 15, 16, 17, 18, 19, 13, 20, 13, 21, 13, 13, 13, 13, 22, 13, 13, 13, 13, 13, 13, 13, 13, 23, 24, 13, 13, 25, 26, 13, 27, 13, 13, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 28, 7, 29, 30, 7, 31, 13, 13, 13, 13, 13, 32, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 33, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, }; static RE_UINT8 re_id_continue_stage_3[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1, 17, 18, 19, 1, 20, 21, 22, 23, 24, 25, 26, 27, 1, 28, 29, 30, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 33, 31, 31, 34, 35, 31, 31, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 36, 1, 1, 1, 1, 1, 1, 1, 1, 1, 37, 1, 1, 1, 1, 38, 1, 39, 40, 41, 42, 43, 44, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 45, 31, 31, 31, 31, 31, 31, 31, 31, 31, 1, 46, 47, 1, 48, 49, 50, 51, 52, 53, 54, 55, 56, 1, 57, 58, 59, 60, 61, 62, 31, 31, 31, 63, 64, 65, 66, 67, 68, 69, 70, 71, 31, 72, 31, 31, 31, 31, 31, 1, 1, 1, 73, 74, 75, 31, 31, 1, 1, 1, 1, 76, 31, 31, 31, 31, 31, 31, 31, 1, 1, 77, 31, 1, 1, 78, 79, 31, 31, 31, 80, 81, 31, 31, 31, 31, 31, 31, 31, 
31, 31, 31, 31, 82, 31, 31, 31, 31, 83, 84, 31, 85, 86, 87, 88, 31, 31, 89, 31, 31, 31, 31, 31, 90, 31, 31, 31, 31, 31, 91, 31, 1, 1, 1, 1, 1, 1, 92, 1, 1, 1, 1, 1, 1, 1, 1, 93, 94, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 95, 31, 1, 1, 96, 31, 31, 31, 31, 31, 31, 97, 31, 31, 31, 31, 31, 31, }; static RE_UINT8 re_id_continue_stage_4[] = { 0, 1, 2, 3, 0, 4, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 8, 6, 6, 6, 9, 10, 11, 6, 12, 6, 6, 6, 6, 13, 6, 6, 6, 6, 14, 15, 16, 17, 18, 19, 20, 21, 6, 6, 22, 6, 6, 23, 24, 25, 6, 26, 6, 6, 27, 6, 28, 6, 29, 30, 0, 0, 31, 0, 32, 6, 6, 6, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 43, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 57, 61, 62, 63, 64, 65, 66, 67, 16, 68, 69, 0, 70, 71, 72, 0, 73, 74, 75, 76, 77, 78, 79, 0, 6, 6, 80, 6, 81, 6, 82, 83, 6, 6, 84, 6, 85, 86, 87, 6, 88, 6, 61, 89, 90, 6, 6, 91, 16, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 92, 3, 6, 6, 93, 94, 31, 95, 96, 6, 6, 97, 98, 99, 6, 6, 100, 6, 101, 6, 102, 103, 104, 105, 106, 6, 107, 108, 0, 30, 6, 103, 109, 110, 111, 0, 0, 6, 6, 112, 113, 6, 6, 6, 95, 6, 100, 114, 81, 0, 0, 115, 116, 6, 6, 6, 6, 6, 6, 6, 117, 91, 6, 118, 81, 6, 119, 120, 121, 0, 122, 123, 124, 125, 0, 125, 126, 127, 128, 129, 6, 130, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 131, 103, 6, 6, 6, 6, 132, 6, 82, 6, 133, 134, 135, 135, 6, 136, 137, 16, 6, 138, 16, 6, 83, 139, 140, 6, 6, 141, 68, 0, 25, 6, 6, 6, 6, 6, 102, 0, 0, 6, 6, 6, 6, 6, 6, 102, 0, 6, 6, 6, 6, 142, 0, 25, 81, 143, 144, 6, 145, 6, 6, 6, 27, 146, 147, 6, 6, 148, 149, 0, 146, 6, 150, 6, 95, 6, 6, 151, 152, 6, 153, 95, 78, 6, 6, 154, 103, 6, 134, 155, 156, 6, 6, 157, 158, 159, 160, 83, 161, 6, 6, 6, 162, 6, 6, 6, 6, 6, 163, 164, 30, 6, 6, 6, 153, 6, 6, 165, 0, 166, 167, 168, 6, 6, 27, 169, 6, 6, 81, 25, 6, 170, 6, 150, 171, 90, 172, 173, 174, 6, 6, 6, 78, 1, 2, 3, 105, 6, 103, 175, 0, 176, 177, 178, 0, 6, 6, 6, 68, 0, 0, 6, 31, 0, 0, 0, 179, 0, 0, 0, 0, 78, 6, 180, 181, 6, 25, 101, 68, 81, 6, 182, 0, 6, 6, 6, 6, 81, 
98, 0, 0, 6, 183, 6, 184, 0, 0, 0, 0, 6, 134, 102, 150, 0, 0, 0, 0, 185, 186, 102, 134, 103, 0, 0, 187, 102, 165, 0, 0, 6, 188, 0, 0, 189, 190, 0, 78, 78, 0, 75, 191, 6, 102, 102, 192, 27, 0, 0, 0, 6, 6, 130, 0, 6, 192, 6, 192, 6, 6, 191, 193, 6, 68, 25, 194, 6, 195, 25, 196, 6, 6, 197, 0, 198, 100, 0, 0, 199, 200, 6, 201, 34, 43, 202, 203, 0, 0, 0, 0, 0, 0, 0, 0, 6, 6, 204, 0, 0, 0, 0, 0, 6, 205, 206, 0, 6, 6, 207, 0, 6, 100, 98, 0, 208, 112, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 6, 209, 0, 0, 0, 0, 0, 0, 6, 210, 6, 6, 6, 6, 165, 0, 0, 0, 6, 6, 6, 141, 6, 6, 6, 6, 6, 6, 184, 0, 0, 0, 0, 0, 6, 141, 0, 0, 0, 0, 0, 0, 6, 6, 191, 0, 0, 0, 0, 0, 6, 210, 103, 98, 0, 0, 25, 106, 6, 134, 211, 212, 90, 0, 0, 0, 6, 6, 213, 103, 214, 0, 0, 0, 215, 0, 0, 0, 0, 0, 0, 0, 6, 6, 6, 216, 217, 0, 0, 0, 0, 0, 0, 218, 219, 220, 0, 0, 0, 0, 221, 0, 0, 0, 0, 0, 6, 6, 195, 6, 222, 223, 224, 6, 225, 226, 227, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 228, 229, 83, 195, 195, 131, 131, 230, 230, 231, 6, 6, 232, 6, 233, 234, 235, 0, 0, 6, 6, 6, 6, 6, 6, 236, 0, 224, 237, 238, 239, 240, 241, 0, 0, 6, 6, 6, 6, 6, 6, 134, 0, 6, 31, 6, 6, 6, 6, 6, 6, 81, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 215, 0, 0, 81, 0, 0, 0, 0, 0, 0, 0, 6, 6, 6, 6, 6, 6, 6, 90, }; static RE_UINT8 re_id_continue_stage_5[] = { 0, 0, 0, 0, 0, 0, 255, 3, 254, 255, 255, 135, 254, 255, 255, 7, 0, 4, 160, 4, 255, 255, 127, 255, 255, 255, 255, 255, 195, 255, 3, 0, 31, 80, 0, 0, 255, 255, 223, 188, 192, 215, 255, 255, 251, 255, 255, 255, 255, 255, 191, 255, 251, 252, 255, 255, 255, 255, 254, 255, 255, 255, 127, 2, 254, 255, 255, 255, 255, 0, 254, 255, 255, 255, 255, 191, 182, 0, 255, 255, 255, 7, 7, 0, 0, 0, 255, 7, 255, 195, 255, 255, 255, 255, 239, 159, 255, 253, 255, 159, 0, 0, 255, 255, 255, 231, 255, 255, 255, 255, 3, 0, 255, 255, 63, 4, 255, 63, 0, 0, 255, 255, 255, 15, 255, 255, 31, 0, 248, 255, 255, 255, 207, 255, 254, 255, 239, 159, 249, 255, 255, 253, 197, 243, 159, 121, 128, 176, 207, 255, 3, 0, 238, 135, 249, 255, 255, 253, 109, 211, 
135, 57, 2, 94, 192, 255, 63, 0, 238, 191, 251, 255, 255, 253, 237, 243, 191, 59, 1, 0, 207, 255, 0, 2, 238, 159, 249, 255, 159, 57, 192, 176, 207, 255, 2, 0, 236, 199, 61, 214, 24, 199, 255, 195, 199, 61, 129, 0, 192, 255, 0, 0, 239, 223, 253, 255, 255, 253, 255, 227, 223, 61, 96, 7, 207, 255, 0, 0, 238, 223, 253, 255, 255, 253, 239, 243, 223, 61, 96, 64, 207, 255, 6, 0, 255, 255, 255, 231, 223, 125, 128, 128, 207, 255, 0, 252, 236, 255, 127, 252, 255, 255, 251, 47, 127, 132, 95, 255, 192, 255, 12, 0, 255, 255, 255, 7, 255, 127, 255, 3, 150, 37, 240, 254, 174, 236, 255, 59, 95, 63, 255, 243, 1, 0, 0, 3, 255, 3, 160, 194, 255, 254, 255, 255, 255, 31, 254, 255, 223, 255, 255, 254, 255, 255, 255, 31, 64, 0, 0, 0, 255, 3, 255, 255, 255, 255, 255, 63, 191, 32, 255, 255, 255, 255, 255, 247, 255, 61, 127, 61, 255, 61, 255, 255, 255, 255, 61, 127, 61, 255, 127, 255, 255, 255, 61, 255, 0, 254, 3, 0, 255, 255, 0, 0, 255, 255, 63, 63, 255, 159, 255, 255, 255, 199, 255, 1, 255, 223, 31, 0, 255, 255, 15, 0, 255, 223, 13, 0, 255, 255, 143, 48, 255, 3, 0, 0, 0, 56, 255, 3, 255, 255, 255, 0, 255, 7, 255, 255, 255, 255, 63, 0, 255, 255, 255, 127, 255, 15, 255, 15, 192, 255, 255, 255, 255, 63, 31, 0, 255, 15, 255, 255, 255, 3, 255, 7, 255, 255, 255, 159, 255, 3, 255, 3, 128, 0, 255, 63, 255, 15, 255, 3, 0, 248, 15, 0, 255, 227, 255, 255, 0, 0, 247, 255, 255, 255, 127, 3, 255, 255, 63, 240, 63, 63, 255, 170, 255, 255, 223, 95, 220, 31, 207, 15, 255, 31, 220, 31, 0, 0, 0, 128, 1, 0, 16, 0, 0, 0, 2, 128, 0, 0, 255, 31, 226, 255, 1, 0, 132, 252, 47, 63, 80, 253, 255, 243, 224, 67, 0, 0, 255, 1, 0, 0, 255, 127, 255, 255, 31, 248, 15, 0, 255, 128, 0, 128, 255, 255, 127, 0, 127, 127, 127, 127, 224, 0, 0, 0, 254, 255, 62, 31, 255, 255, 127, 254, 224, 255, 255, 255, 255, 63, 254, 255, 255, 127, 0, 0, 255, 31, 0, 0, 255, 31, 255, 255, 255, 15, 0, 0, 255, 255, 240, 191, 0, 0, 128, 255, 252, 255, 255, 255, 255, 249, 255, 255, 255, 63, 255, 0, 255, 0, 0, 0, 31, 0, 255, 3, 255, 255, 255, 40, 
255, 63, 255, 255, 1, 128, 255, 3, 255, 63, 255, 3, 255, 255, 127, 252, 7, 0, 0, 56, 255, 255, 124, 0, 126, 126, 126, 0, 127, 127, 255, 255, 63, 0, 255, 255, 255, 55, 255, 3, 15, 0, 255, 255, 127, 248, 255, 255, 255, 255, 255, 3, 127, 0, 248, 224, 255, 253, 127, 95, 219, 255, 255, 255, 0, 0, 248, 255, 255, 255, 252, 255, 0, 0, 255, 15, 255, 255, 24, 0, 0, 224, 0, 0, 0, 0, 223, 255, 252, 252, 252, 28, 255, 239, 255, 255, 127, 255, 255, 183, 255, 63, 255, 63, 0, 0, 0, 32, 255, 255, 1, 0, 1, 0, 0, 0, 15, 255, 62, 0, 255, 0, 255, 255, 15, 0, 0, 0, 63, 253, 255, 255, 255, 255, 191, 145, 255, 255, 55, 0, 255, 255, 255, 192, 111, 240, 239, 254, 255, 255, 15, 135, 127, 0, 0, 0, 255, 255, 7, 0, 192, 255, 0, 128, 255, 1, 255, 3, 255, 255, 223, 255, 255, 255, 79, 0, 31, 28, 255, 23, 255, 255, 251, 255, 127, 189, 255, 191, 255, 1, 255, 255, 255, 7, 255, 3, 159, 57, 129, 224, 207, 31, 31, 0, 191, 0, 255, 3, 255, 255, 63, 255, 1, 0, 0, 63, 17, 0, 255, 3, 255, 255, 255, 227, 255, 3, 0, 128, 255, 255, 255, 1, 15, 0, 255, 3, 248, 255, 255, 224, 31, 0, 255, 255, 0, 128, 255, 255, 3, 0, 0, 0, 255, 7, 255, 31, 255, 1, 255, 99, 224, 227, 7, 248, 231, 15, 0, 0, 0, 60, 0, 0, 28, 0, 0, 0, 255, 255, 255, 223, 100, 222, 255, 235, 239, 255, 255, 255, 191, 231, 223, 223, 255, 255, 255, 123, 95, 252, 253, 255, 63, 255, 255, 255, 253, 255, 255, 247, 255, 253, 255, 255, 247, 207, 255, 255, 255, 255, 127, 248, 255, 31, 32, 0, 16, 0, 0, 248, 254, 255, 0, 0, 31, 0, 127, 0, 150, 254, 247, 10, 132, 234, 150, 170, 150, 247, 247, 94, 255, 251, 255, 15, 238, 251, 255, 15, }; /* ID_Continue: 2186 bytes. 
*/ RE_UINT32 re_get_id_continue(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 15; code = ch ^ (f << 15); pos = (RE_UINT32)re_id_continue_stage_1[f] << 4; f = code >> 11; code ^= f << 11; pos = (RE_UINT32)re_id_continue_stage_2[pos + f] << 3; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_id_continue_stage_3[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_id_continue_stage_4[pos + f] << 5; pos += code; value = (re_id_continue_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* XID_Start. */ static RE_UINT8 re_xid_start_stage_1[] = { 0, 1, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, }; static RE_UINT8 re_xid_start_stage_2[] = { 0, 1, 2, 3, 4, 5, 6, 7, 7, 8, 7, 7, 7, 7, 7, 7, 7, 7, 7, 9, 10, 11, 7, 7, 7, 7, 12, 13, 13, 13, 13, 14, 15, 16, 17, 18, 19, 13, 20, 13, 21, 13, 13, 13, 13, 22, 13, 13, 13, 13, 13, 13, 13, 13, 23, 24, 13, 13, 25, 13, 13, 26, 13, 13, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 27, 7, 28, 29, 7, 30, 13, 13, 13, 13, 13, 31, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, }; static RE_UINT8 re_xid_start_stage_3[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1, 17, 18, 19, 1, 20, 21, 22, 23, 24, 25, 26, 27, 1, 28, 29, 30, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 33, 31, 31, 34, 35, 31, 31, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 36, 1, 1, 1, 1, 1, 1, 1, 1, 1, 37, 1, 1, 1, 1, 38, 1, 39, 40, 41, 42, 43, 44, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 45, 31, 31, 31, 31, 31, 31, 31, 31, 31, 1, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 1, 58, 59, 60, 61, 62, 63, 31, 31, 31, 64, 65, 66, 67, 68, 69, 70, 71, 72, 31, 73, 31, 31, 31, 31, 31, 1, 1, 1, 74, 75, 76, 31, 31, 1, 1, 1, 1, 77, 31, 31, 31, 31, 31, 31, 31, 1, 1, 78, 31, 1, 1, 79, 80, 31, 31, 31, 81, 82, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 83, 31, 31, 31, 31, 31, 31, 31, 84, 85, 86, 87, 88, 31, 31, 31, 
31, 31, 89, 31, 1, 1, 1, 1, 1, 1, 90, 1, 1, 1, 1, 1, 1, 1, 1, 91, 92, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 93, 31, 1, 1, 94, 31, 31, 31, 31, 31, }; static RE_UINT8 re_xid_start_stage_4[] = { 0, 0, 1, 1, 0, 2, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 6, 0, 0, 0, 7, 8, 9, 4, 10, 4, 4, 4, 4, 11, 4, 4, 4, 4, 12, 13, 14, 15, 0, 16, 17, 0, 4, 18, 19, 4, 4, 20, 21, 22, 23, 24, 4, 4, 25, 26, 27, 28, 29, 30, 0, 0, 31, 0, 0, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 45, 49, 50, 51, 52, 46, 0, 53, 54, 55, 56, 53, 57, 58, 59, 53, 60, 61, 62, 63, 64, 65, 0, 14, 66, 65, 0, 67, 68, 69, 0, 70, 0, 71, 72, 73, 0, 0, 0, 4, 74, 75, 76, 77, 4, 78, 79, 4, 4, 80, 4, 81, 82, 83, 4, 84, 4, 85, 0, 23, 4, 4, 86, 14, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 87, 1, 4, 4, 88, 89, 90, 90, 91, 4, 92, 93, 0, 0, 4, 4, 94, 4, 95, 4, 96, 97, 0, 16, 98, 4, 99, 100, 0, 101, 4, 31, 0, 0, 102, 0, 0, 103, 92, 104, 0, 105, 106, 4, 107, 4, 108, 109, 110, 0, 0, 0, 111, 4, 4, 4, 4, 4, 4, 0, 0, 86, 4, 112, 110, 4, 113, 114, 115, 0, 0, 0, 116, 117, 0, 0, 0, 118, 119, 120, 4, 121, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 122, 97, 4, 4, 4, 4, 123, 4, 78, 4, 124, 101, 125, 125, 0, 126, 127, 14, 4, 128, 14, 4, 79, 103, 129, 4, 4, 130, 85, 0, 16, 4, 4, 4, 4, 4, 96, 0, 0, 4, 4, 4, 4, 4, 4, 96, 0, 4, 4, 4, 4, 72, 0, 16, 110, 131, 132, 4, 133, 110, 4, 4, 23, 134, 135, 4, 4, 136, 137, 0, 134, 138, 139, 4, 92, 135, 92, 0, 140, 26, 141, 65, 142, 32, 143, 144, 145, 4, 121, 146, 147, 4, 148, 149, 150, 151, 152, 79, 141, 4, 4, 4, 139, 4, 4, 4, 4, 4, 153, 154, 155, 4, 4, 4, 156, 4, 4, 157, 0, 158, 159, 160, 4, 4, 90, 161, 4, 4, 4, 110, 32, 4, 4, 4, 4, 4, 110, 16, 4, 162, 4, 15, 163, 0, 0, 0, 164, 4, 4, 4, 142, 0, 1, 1, 165, 110, 97, 166, 0, 167, 168, 169, 0, 4, 4, 4, 85, 0, 0, 4, 31, 0, 0, 0, 0, 0, 0, 0, 0, 142, 4, 170, 0, 4, 16, 171, 96, 110, 4, 172, 0, 4, 4, 4, 4, 110, 0, 0, 0, 4, 173, 4, 108, 0, 0, 0, 0, 4, 101, 96, 15, 0, 0, 0, 0, 174, 175, 96, 101, 97, 0, 0, 176, 96, 157, 0, 0, 4, 177, 0, 0, 178, 
92, 0, 142, 142, 0, 71, 179, 4, 96, 96, 143, 90, 0, 0, 0, 4, 4, 121, 0, 4, 143, 4, 143, 105, 94, 0, 0, 105, 23, 16, 121, 105, 65, 16, 180, 105, 143, 181, 0, 182, 183, 0, 0, 184, 185, 97, 0, 48, 45, 186, 56, 0, 0, 0, 0, 0, 0, 0, 0, 4, 23, 187, 0, 0, 0, 0, 0, 4, 130, 188, 0, 4, 23, 189, 0, 4, 18, 0, 0, 157, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 4, 190, 0, 0, 0, 0, 0, 0, 4, 30, 4, 4, 4, 4, 157, 0, 0, 0, 4, 4, 4, 130, 4, 4, 4, 4, 4, 4, 108, 0, 0, 0, 0, 0, 4, 130, 0, 0, 0, 0, 0, 0, 4, 4, 65, 0, 0, 0, 0, 0, 4, 30, 97, 0, 0, 0, 16, 191, 4, 23, 108, 192, 23, 0, 0, 0, 4, 4, 193, 0, 161, 0, 0, 0, 56, 0, 0, 0, 0, 0, 0, 0, 4, 4, 4, 194, 195, 0, 0, 0, 4, 4, 196, 4, 197, 198, 199, 4, 200, 201, 202, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 203, 204, 79, 196, 196, 122, 122, 205, 205, 146, 0, 4, 4, 4, 4, 4, 4, 179, 0, 199, 206, 207, 208, 209, 210, 0, 0, 4, 4, 4, 4, 4, 4, 101, 0, 4, 31, 4, 4, 4, 4, 4, 4, 110, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 56, 0, 0, 110, 0, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_xid_start_stage_5[] = { 0, 0, 0, 0, 254, 255, 255, 7, 0, 4, 32, 4, 255, 255, 127, 255, 255, 255, 255, 255, 195, 255, 3, 0, 31, 80, 0, 0, 0, 0, 223, 184, 64, 215, 255, 255, 251, 255, 255, 255, 255, 255, 191, 255, 3, 252, 255, 255, 255, 255, 254, 255, 255, 255, 127, 2, 254, 255, 255, 255, 255, 0, 0, 0, 0, 0, 255, 255, 255, 7, 7, 0, 255, 7, 0, 0, 0, 192, 254, 255, 255, 255, 47, 0, 96, 192, 0, 156, 0, 0, 253, 255, 255, 255, 0, 0, 0, 224, 255, 255, 63, 0, 2, 0, 0, 252, 255, 255, 255, 7, 48, 4, 255, 255, 63, 4, 16, 1, 0, 0, 255, 255, 255, 1, 255, 255, 31, 0, 240, 255, 255, 255, 255, 255, 255, 35, 0, 0, 1, 255, 3, 0, 254, 255, 225, 159, 249, 255, 255, 253, 197, 35, 0, 64, 0, 176, 3, 0, 3, 0, 224, 135, 249, 255, 255, 253, 109, 3, 0, 0, 0, 94, 0, 0, 28, 0, 224, 191, 251, 255, 255, 253, 237, 35, 0, 0, 1, 0, 3, 0, 0, 2, 224, 159, 249, 255, 0, 0, 0, 176, 3, 0, 2, 0, 232, 199, 61, 214, 24, 199, 255, 3, 224, 223, 253, 255, 255, 253, 255, 35, 0, 0, 0, 7, 3, 0, 0, 0, 255, 253, 239, 35, 0, 0, 0, 64, 3, 0, 6, 0, 
255, 255, 255, 39, 0, 64, 0, 128, 3, 0, 0, 252, 224, 255, 127, 252, 255, 255, 251, 47, 127, 0, 0, 0, 255, 255, 5, 0, 150, 37, 240, 254, 174, 236, 5, 32, 95, 0, 0, 240, 1, 0, 0, 0, 255, 254, 255, 255, 255, 31, 0, 0, 0, 31, 0, 0, 255, 7, 0, 128, 0, 0, 63, 60, 98, 192, 225, 255, 3, 64, 0, 0, 191, 32, 255, 255, 255, 255, 255, 247, 255, 61, 127, 61, 255, 61, 255, 255, 255, 255, 61, 127, 61, 255, 127, 255, 255, 255, 61, 255, 255, 255, 255, 7, 255, 255, 63, 63, 255, 159, 255, 255, 255, 199, 255, 1, 255, 223, 3, 0, 255, 255, 3, 0, 255, 223, 1, 0, 255, 255, 15, 0, 0, 0, 128, 16, 255, 255, 255, 0, 255, 5, 255, 255, 255, 255, 63, 0, 255, 255, 255, 127, 255, 63, 31, 0, 255, 15, 255, 255, 255, 3, 0, 0, 255, 255, 127, 0, 128, 0, 0, 0, 224, 255, 255, 255, 224, 15, 0, 0, 248, 255, 255, 255, 1, 192, 0, 252, 63, 0, 0, 0, 15, 0, 0, 0, 0, 224, 0, 252, 255, 255, 255, 63, 0, 222, 99, 0, 63, 63, 255, 170, 255, 255, 223, 95, 220, 31, 207, 15, 255, 31, 220, 31, 0, 0, 2, 128, 0, 0, 255, 31, 132, 252, 47, 63, 80, 253, 255, 243, 224, 67, 0, 0, 255, 1, 0, 0, 255, 127, 255, 255, 31, 120, 12, 0, 255, 128, 0, 0, 127, 127, 127, 127, 224, 0, 0, 0, 254, 3, 62, 31, 255, 255, 127, 224, 255, 63, 254, 255, 255, 127, 0, 0, 255, 31, 255, 255, 0, 12, 0, 0, 255, 127, 0, 128, 0, 0, 128, 255, 252, 255, 255, 255, 255, 249, 255, 255, 255, 63, 255, 0, 187, 247, 255, 255, 7, 0, 0, 0, 0, 0, 252, 40, 63, 0, 255, 255, 255, 255, 255, 31, 255, 255, 7, 0, 0, 128, 0, 0, 223, 255, 0, 124, 247, 15, 0, 0, 255, 255, 127, 196, 255, 255, 98, 62, 5, 0, 0, 56, 255, 7, 28, 0, 126, 126, 126, 0, 127, 127, 255, 255, 15, 0, 255, 255, 127, 248, 255, 255, 255, 255, 255, 15, 255, 63, 255, 255, 255, 255, 255, 3, 127, 0, 248, 160, 255, 253, 127, 95, 219, 255, 255, 255, 0, 0, 248, 255, 255, 255, 252, 255, 0, 0, 255, 3, 0, 0, 138, 170, 192, 255, 255, 255, 252, 252, 252, 28, 255, 239, 255, 255, 127, 255, 255, 183, 255, 63, 255, 63, 255, 255, 1, 0, 255, 7, 255, 255, 15, 255, 62, 0, 255, 0, 255, 255, 63, 253, 255, 255, 255, 255, 191, 145, 
255, 255, 55, 0, 255, 255, 255, 192, 1, 0, 239, 254, 31, 0, 0, 0, 255, 255, 71, 0, 30, 0, 0, 20, 255, 255, 251, 255, 255, 15, 0, 0, 127, 189, 255, 191, 255, 1, 255, 255, 0, 0, 1, 224, 176, 0, 0, 0, 0, 0, 0, 15, 16, 0, 0, 0, 0, 0, 0, 128, 255, 63, 0, 0, 248, 255, 255, 224, 31, 0, 1, 0, 255, 7, 255, 31, 255, 1, 255, 3, 255, 255, 223, 255, 255, 255, 255, 223, 100, 222, 255, 235, 239, 255, 255, 255, 191, 231, 223, 223, 255, 255, 255, 123, 95, 252, 253, 255, 63, 255, 255, 255, 253, 255, 255, 247, 255, 253, 255, 255, 150, 254, 247, 10, 132, 234, 150, 170, 150, 247, 247, 94, 255, 251, 255, 15, 238, 251, 255, 15, }; /* XID_Start: 2005 bytes. */ RE_UINT32 re_get_xid_start(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_xid_start_stage_1[f] << 5; f = code >> 11; code ^= f << 11; pos = (RE_UINT32)re_xid_start_stage_2[pos + f] << 3; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_xid_start_stage_3[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_xid_start_stage_4[pos + f] << 5; pos += code; value = (re_xid_start_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* XID_Continue. 
*/ static RE_UINT8 re_xid_continue_stage_1[] = { 0, 1, 2, 3, 4, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 6, 6, 6, 6, 6, }; static RE_UINT8 re_xid_continue_stage_2[] = { 0, 1, 2, 3, 4, 5, 6, 7, 7, 8, 7, 7, 7, 7, 7, 7, 7, 7, 7, 9, 10, 11, 7, 7, 7, 7, 12, 13, 13, 13, 13, 14, 15, 16, 17, 18, 19, 13, 20, 13, 21, 13, 13, 13, 13, 22, 13, 13, 13, 13, 13, 13, 13, 13, 23, 24, 13, 13, 25, 26, 13, 27, 13, 13, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 28, 7, 29, 30, 7, 31, 13, 13, 13, 13, 13, 32, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 33, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, }; static RE_UINT8 re_xid_continue_stage_3[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1, 17, 18, 19, 1, 20, 21, 22, 23, 24, 25, 26, 27, 1, 28, 29, 30, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 33, 31, 31, 34, 35, 31, 31, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 36, 1, 1, 1, 1, 1, 1, 1, 1, 1, 37, 1, 1, 1, 1, 38, 1, 39, 40, 41, 42, 43, 44, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 45, 31, 31, 31, 31, 31, 31, 31, 31, 31, 1, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 1, 58, 59, 60, 61, 62, 63, 31, 31, 31, 64, 65, 66, 67, 68, 69, 70, 71, 72, 31, 73, 31, 31, 31, 31, 31, 1, 1, 1, 74, 75, 76, 31, 31, 1, 1, 1, 1, 77, 31, 31, 31, 31, 31, 31, 31, 1, 1, 78, 31, 1, 1, 79, 80, 31, 31, 31, 81, 82, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 83, 31, 31, 31, 31, 84, 85, 31, 86, 87, 88, 89, 31, 31, 90, 31, 31, 31, 31, 31, 91, 31, 31, 31, 31, 31, 92, 31, 1, 1, 1, 1, 1, 1, 93, 1, 1, 1, 1, 1, 1, 1, 1, 94, 95, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 96, 31, 1, 1, 97, 31, 31, 31, 31, 31, 31, 98, 31, 31, 31, 31, 31, 31, }; static RE_UINT8 re_xid_continue_stage_4[] = { 0, 1, 2, 3, 0, 4, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 8, 6, 6, 6, 9, 10, 11, 6, 12, 6, 6, 6, 6, 13, 6, 6, 6, 6, 14, 15, 16, 17, 18, 19, 20, 21, 6, 6, 22, 6, 6, 23, 24, 25, 6, 26, 6, 6, 27, 6, 28, 6, 29, 30, 0, 0, 31, 0, 32, 6, 6, 
6, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 43, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 57, 61, 62, 63, 64, 65, 66, 67, 16, 68, 69, 0, 70, 71, 72, 0, 73, 74, 75, 76, 77, 78, 79, 0, 6, 6, 80, 6, 81, 6, 82, 83, 6, 6, 84, 6, 85, 86, 87, 6, 88, 6, 61, 89, 90, 6, 6, 91, 16, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 92, 3, 6, 6, 93, 94, 31, 95, 96, 6, 6, 97, 98, 99, 6, 6, 100, 6, 101, 6, 102, 103, 104, 105, 106, 6, 107, 108, 0, 30, 6, 103, 109, 110, 111, 0, 0, 6, 6, 112, 113, 6, 6, 6, 95, 6, 100, 114, 81, 0, 0, 115, 116, 6, 6, 6, 6, 6, 6, 6, 117, 91, 6, 118, 81, 6, 119, 120, 121, 0, 122, 123, 124, 125, 0, 125, 126, 127, 128, 129, 6, 130, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 131, 103, 6, 6, 6, 6, 132, 6, 82, 6, 133, 134, 135, 135, 6, 136, 137, 16, 6, 138, 16, 6, 83, 139, 140, 6, 6, 141, 68, 0, 25, 6, 6, 6, 6, 6, 102, 0, 0, 6, 6, 6, 6, 6, 6, 102, 0, 6, 6, 6, 6, 142, 0, 25, 81, 143, 144, 6, 145, 6, 6, 6, 27, 146, 147, 6, 6, 148, 149, 0, 146, 6, 150, 6, 95, 6, 6, 151, 152, 6, 153, 95, 78, 6, 6, 154, 103, 6, 134, 155, 156, 6, 6, 157, 158, 159, 160, 83, 161, 6, 6, 6, 162, 6, 6, 6, 6, 6, 163, 164, 30, 6, 6, 6, 153, 6, 6, 165, 0, 166, 167, 168, 6, 6, 27, 169, 6, 6, 6, 81, 170, 6, 6, 6, 6, 6, 81, 25, 6, 171, 6, 150, 1, 90, 172, 173, 174, 6, 6, 6, 78, 1, 2, 3, 105, 6, 103, 175, 0, 176, 177, 178, 0, 6, 6, 6, 68, 0, 0, 6, 31, 0, 0, 0, 179, 0, 0, 0, 0, 78, 6, 180, 181, 6, 25, 101, 68, 81, 6, 182, 0, 6, 6, 6, 6, 81, 98, 0, 0, 6, 183, 6, 184, 0, 0, 0, 0, 6, 134, 102, 150, 0, 0, 0, 0, 185, 186, 102, 134, 103, 0, 0, 187, 102, 165, 0, 0, 6, 188, 0, 0, 189, 190, 0, 78, 78, 0, 75, 191, 6, 102, 102, 192, 27, 0, 0, 0, 6, 6, 130, 0, 6, 192, 6, 192, 6, 6, 191, 193, 6, 68, 25, 194, 6, 195, 25, 196, 6, 6, 197, 0, 198, 100, 0, 0, 199, 200, 6, 201, 34, 43, 202, 203, 0, 0, 0, 0, 0, 0, 0, 0, 6, 6, 204, 0, 0, 0, 0, 0, 6, 205, 206, 0, 6, 6, 207, 0, 6, 100, 98, 0, 208, 112, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 6, 209, 0, 0, 0, 0, 0, 0, 6, 210, 6, 6, 6, 6, 165, 0, 0, 0, 6, 6, 6, 141, 6, 
6, 6, 6, 6, 6, 184, 0, 0, 0, 0, 0, 6, 141, 0, 0, 0, 0, 0, 0, 6, 6, 191, 0, 0, 0, 0, 0, 6, 210, 103, 98, 0, 0, 25, 106, 6, 134, 211, 212, 90, 0, 0, 0, 6, 6, 213, 103, 214, 0, 0, 0, 215, 0, 0, 0, 0, 0, 0, 0, 6, 6, 6, 216, 217, 0, 0, 0, 0, 0, 0, 218, 219, 220, 0, 0, 0, 0, 221, 0, 0, 0, 0, 0, 6, 6, 195, 6, 222, 223, 224, 6, 225, 226, 227, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 228, 229, 83, 195, 195, 131, 131, 230, 230, 231, 6, 6, 232, 6, 233, 234, 235, 0, 0, 6, 6, 6, 6, 6, 6, 236, 0, 224, 237, 238, 239, 240, 241, 0, 0, 6, 6, 6, 6, 6, 6, 134, 0, 6, 31, 6, 6, 6, 6, 6, 6, 81, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 215, 0, 0, 81, 0, 0, 0, 0, 0, 0, 0, 6, 6, 6, 6, 6, 6, 6, 90, }; static RE_UINT8 re_xid_continue_stage_5[] = { 0, 0, 0, 0, 0, 0, 255, 3, 254, 255, 255, 135, 254, 255, 255, 7, 0, 4, 160, 4, 255, 255, 127, 255, 255, 255, 255, 255, 195, 255, 3, 0, 31, 80, 0, 0, 255, 255, 223, 184, 192, 215, 255, 255, 251, 255, 255, 255, 255, 255, 191, 255, 251, 252, 255, 255, 255, 255, 254, 255, 255, 255, 127, 2, 254, 255, 255, 255, 255, 0, 254, 255, 255, 255, 255, 191, 182, 0, 255, 255, 255, 7, 7, 0, 0, 0, 255, 7, 255, 195, 255, 255, 255, 255, 239, 159, 255, 253, 255, 159, 0, 0, 255, 255, 255, 231, 255, 255, 255, 255, 3, 0, 255, 255, 63, 4, 255, 63, 0, 0, 255, 255, 255, 15, 255, 255, 31, 0, 248, 255, 255, 255, 207, 255, 254, 255, 239, 159, 249, 255, 255, 253, 197, 243, 159, 121, 128, 176, 207, 255, 3, 0, 238, 135, 249, 255, 255, 253, 109, 211, 135, 57, 2, 94, 192, 255, 63, 0, 238, 191, 251, 255, 255, 253, 237, 243, 191, 59, 1, 0, 207, 255, 0, 2, 238, 159, 249, 255, 159, 57, 192, 176, 207, 255, 2, 0, 236, 199, 61, 214, 24, 199, 255, 195, 199, 61, 129, 0, 192, 255, 0, 0, 239, 223, 253, 255, 255, 253, 255, 227, 223, 61, 96, 7, 207, 255, 0, 0, 238, 223, 253, 255, 255, 253, 239, 243, 223, 61, 96, 64, 207, 255, 6, 0, 255, 255, 255, 231, 223, 125, 128, 128, 207, 255, 0, 252, 236, 255, 127, 252, 255, 255, 251, 47, 127, 132, 95, 255, 192, 255, 12, 0, 255, 255, 255, 7, 255, 127, 255, 3, 150, 37, 240, 
254, 174, 236, 255, 59, 95, 63, 255, 243, 1, 0, 0, 3, 255, 3, 160, 194, 255, 254, 255, 255, 255, 31, 254, 255, 223, 255, 255, 254, 255, 255, 255, 31, 64, 0, 0, 0, 255, 3, 255, 255, 255, 255, 255, 63, 191, 32, 255, 255, 255, 255, 255, 247, 255, 61, 127, 61, 255, 61, 255, 255, 255, 255, 61, 127, 61, 255, 127, 255, 255, 255, 61, 255, 0, 254, 3, 0, 255, 255, 0, 0, 255, 255, 63, 63, 255, 159, 255, 255, 255, 199, 255, 1, 255, 223, 31, 0, 255, 255, 15, 0, 255, 223, 13, 0, 255, 255, 143, 48, 255, 3, 0, 0, 0, 56, 255, 3, 255, 255, 255, 0, 255, 7, 255, 255, 255, 255, 63, 0, 255, 255, 255, 127, 255, 15, 255, 15, 192, 255, 255, 255, 255, 63, 31, 0, 255, 15, 255, 255, 255, 3, 255, 7, 255, 255, 255, 159, 255, 3, 255, 3, 128, 0, 255, 63, 255, 15, 255, 3, 0, 248, 15, 0, 255, 227, 255, 255, 0, 0, 247, 255, 255, 255, 127, 3, 255, 255, 63, 240, 63, 63, 255, 170, 255, 255, 223, 95, 220, 31, 207, 15, 255, 31, 220, 31, 0, 0, 0, 128, 1, 0, 16, 0, 0, 0, 2, 128, 0, 0, 255, 31, 226, 255, 1, 0, 132, 252, 47, 63, 80, 253, 255, 243, 224, 67, 0, 0, 255, 1, 0, 0, 255, 127, 255, 255, 31, 248, 15, 0, 255, 128, 0, 128, 255, 255, 127, 0, 127, 127, 127, 127, 224, 0, 0, 0, 254, 255, 62, 31, 255, 255, 127, 230, 224, 255, 255, 255, 255, 63, 254, 255, 255, 127, 0, 0, 255, 31, 0, 0, 255, 31, 255, 255, 255, 15, 0, 0, 255, 255, 240, 191, 0, 0, 128, 255, 252, 255, 255, 255, 255, 249, 255, 255, 255, 63, 255, 0, 255, 0, 0, 0, 31, 0, 255, 3, 255, 255, 255, 40, 255, 63, 255, 255, 1, 128, 255, 3, 255, 63, 255, 3, 255, 255, 127, 252, 7, 0, 0, 56, 255, 255, 124, 0, 126, 126, 126, 0, 127, 127, 255, 255, 63, 0, 255, 255, 255, 55, 255, 3, 15, 0, 255, 255, 127, 248, 255, 255, 255, 255, 255, 3, 127, 0, 248, 224, 255, 253, 127, 95, 219, 255, 255, 255, 0, 0, 248, 255, 240, 255, 255, 255, 255, 255, 252, 255, 255, 255, 24, 0, 0, 224, 0, 0, 0, 0, 138, 170, 252, 252, 252, 28, 255, 239, 255, 255, 127, 255, 255, 183, 255, 63, 255, 63, 0, 0, 0, 32, 255, 255, 1, 0, 1, 0, 0, 0, 15, 255, 62, 0, 255, 0, 255, 255, 15, 0, 0, 0, 63, 
253, 255, 255, 255, 255, 191, 145, 255, 255, 55, 0, 255, 255, 255, 192, 111, 240, 239, 254, 255, 255, 15, 135, 127, 0, 0, 0, 255, 255, 7, 0, 192, 255, 0, 128, 255, 1, 255, 3, 255, 255, 223, 255, 255, 255, 79, 0, 31, 28, 255, 23, 255, 255, 251, 255, 127, 189, 255, 191, 255, 1, 255, 255, 255, 7, 255, 3, 159, 57, 129, 224, 207, 31, 31, 0, 191, 0, 255, 3, 255, 255, 63, 255, 1, 0, 0, 63, 17, 0, 255, 3, 255, 255, 255, 227, 255, 3, 0, 128, 255, 255, 255, 1, 15, 0, 255, 3, 248, 255, 255, 224, 31, 0, 255, 255, 0, 128, 255, 255, 3, 0, 0, 0, 255, 7, 255, 31, 255, 1, 255, 99, 224, 227, 7, 248, 231, 15, 0, 0, 0, 60, 0, 0, 28, 0, 0, 0, 255, 255, 255, 223, 100, 222, 255, 235, 239, 255, 255, 255, 191, 231, 223, 223, 255, 255, 255, 123, 95, 252, 253, 255, 63, 255, 255, 255, 253, 255, 255, 247, 255, 253, 255, 255, 247, 207, 255, 255, 255, 255, 127, 248, 255, 31, 32, 0, 16, 0, 0, 248, 254, 255, 0, 0, 31, 0, 127, 0, 150, 254, 247, 10, 132, 234, 150, 170, 150, 247, 247, 94, 255, 251, 255, 15, 238, 251, 255, 15, }; /* XID_Continue: 2194 bytes. */ RE_UINT32 re_get_xid_continue(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 15; code = ch ^ (f << 15); pos = (RE_UINT32)re_xid_continue_stage_1[f] << 4; f = code >> 11; code ^= f << 11; pos = (RE_UINT32)re_xid_continue_stage_2[pos + f] << 3; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_xid_continue_stage_3[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_xid_continue_stage_4[pos + f] << 5; pos += code; value = (re_xid_continue_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Default_Ignorable_Code_Point. 
*/ static RE_UINT8 re_default_ignorable_code_point_stage_1[] = { 0, 1, 2, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 4, 2, 2, 2, 2, 2, }; static RE_UINT8 re_default_ignorable_code_point_stage_2[] = { 0, 1, 2, 3, 4, 1, 5, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 6, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 7, 1, 1, 8, 1, 1, 1, 1, 1, 9, 9, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_default_ignorable_code_point_stage_3[] = { 0, 1, 1, 2, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 4, 1, 1, 1, 1, 1, 5, 6, 1, 1, 1, 1, 1, 1, 1, 7, 1, 1, 1, 1, 1, 1, 1, 1, 8, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 9, 10, 1, 1, 1, 1, 11, 1, 1, 1, 1, 12, 1, 1, 1, 1, 1, 1, 13, 13, 13, 13, 13, 13, 13, 13, }; static RE_UINT8 re_default_ignorable_code_point_stage_4[] = { 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 8, 9, 0, 10, 0, 0, 0, 0, 0, 0, 0, 11, 0, 0, 0, 0, 10, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 5, 0, 12, 0, 0, 0, 0, 0, 13, 0, 0, 0, 0, 0, 14, 0, 0, 0, 0, 15, 15, 15, 15, 15, 15, 15, 15, }; static RE_UINT8 re_default_ignorable_code_point_stage_5[] = { 0, 0, 0, 0, 0, 32, 0, 0, 0, 128, 0, 0, 0, 0, 0, 16, 0, 0, 0, 128, 1, 0, 0, 0, 0, 0, 48, 0, 0, 120, 0, 0, 0, 248, 0, 0, 0, 124, 0, 0, 255, 255, 0, 0, 16, 0, 0, 0, 0, 0, 255, 1, 15, 0, 0, 0, 0, 0, 248, 7, 255, 255, 255, 255, }; /* Default_Ignorable_Code_Point: 370 bytes. 
*/ RE_UINT32 re_get_default_ignorable_code_point(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 15; code = ch ^ (f << 15); pos = (RE_UINT32)re_default_ignorable_code_point_stage_1[f] << 4; f = code >> 11; code ^= f << 11; pos = (RE_UINT32)re_default_ignorable_code_point_stage_2[pos + f] << 3; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_default_ignorable_code_point_stage_3[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_default_ignorable_code_point_stage_4[pos + f] << 5; pos += code; value = (re_default_ignorable_code_point_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Grapheme_Extend. */ static RE_UINT8 re_grapheme_extend_stage_1[] = { 0, 1, 2, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 4, 4, 4, 4, 4, }; static RE_UINT8 re_grapheme_extend_stage_2[] = { 0, 1, 2, 3, 4, 5, 6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 8, 9, 7, 7, 7, 7, 7, 7, 7, 7, 7, 10, 11, 12, 13, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 14, 7, 7, 7, 7, 7, 7, 7, 7, 7, 15, 7, 7, 16, 17, 7, 18, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 19, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, }; static RE_UINT8 re_grapheme_extend_stage_3[] = { 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 0, 0, 15, 0, 0, 0, 16, 17, 18, 19, 20, 21, 22, 0, 0, 23, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 24, 25, 0, 0, 26, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 27, 0, 28, 29, 30, 31, 0, 0, 0, 0, 0, 0, 0, 32, 0, 0, 33, 34, 0, 35, 36, 37, 0, 0, 0, 0, 0, 0, 38, 0, 0, 0, 0, 0, 39, 40, 41, 42, 43, 44, 45, 46, 0, 0, 47, 48, 0, 0, 0, 49, 0, 0, 0, 0, 50, 0, 0, 0, 0, 51, 52, 0, 0, 0, 0, 0, 0, 0, 53, 0, 0, 0, 0, 0, 54, 0, 0, 0, 0, 0, 0, 0, 0, 55, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_grapheme_extend_stage_4[] = { 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 4, 5, 6, 0, 7, 0, 8, 9, 0, 0, 10, 11, 12, 13, 14, 0, 0, 15, 0, 16, 17, 18, 19, 0, 0, 0, 0, 20, 21, 22, 23, 24, 
25, 26, 27, 24, 28, 29, 30, 31, 28, 29, 32, 24, 25, 33, 34, 24, 35, 36, 37, 0, 38, 39, 40, 24, 25, 41, 42, 24, 25, 36, 27, 24, 0, 0, 43, 0, 0, 44, 45, 0, 0, 46, 47, 0, 48, 49, 0, 50, 51, 52, 53, 0, 0, 54, 55, 56, 57, 0, 0, 0, 0, 0, 58, 0, 0, 0, 0, 0, 59, 59, 60, 60, 0, 61, 62, 0, 63, 0, 0, 0, 0, 64, 0, 0, 0, 65, 0, 0, 0, 0, 0, 0, 66, 0, 67, 68, 0, 69, 0, 0, 70, 71, 35, 16, 72, 73, 0, 74, 0, 75, 0, 0, 0, 0, 76, 77, 0, 0, 0, 0, 0, 0, 1, 78, 79, 0, 0, 0, 0, 0, 13, 80, 0, 0, 0, 0, 0, 0, 0, 81, 0, 0, 0, 82, 0, 0, 0, 1, 0, 83, 0, 0, 84, 0, 0, 0, 0, 0, 0, 85, 39, 0, 0, 86, 87, 88, 0, 0, 0, 0, 89, 90, 0, 91, 92, 0, 21, 93, 0, 94, 0, 95, 96, 29, 0, 97, 25, 98, 0, 0, 0, 0, 0, 0, 0, 99, 36, 0, 0, 0, 0, 0, 0, 0, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 39, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 100, 0, 0, 0, 0, 0, 0, 0, 38, 0, 0, 0, 101, 0, 0, 0, 0, 102, 103, 0, 0, 0, 0, 0, 88, 25, 104, 105, 82, 72, 106, 0, 0, 21, 107, 0, 108, 72, 109, 110, 0, 0, 111, 0, 0, 0, 0, 82, 112, 72, 26, 113, 114, 0, 0, 0, 0, 0, 0, 0, 0, 0, 115, 116, 0, 0, 0, 0, 0, 0, 117, 118, 0, 0, 119, 38, 0, 0, 120, 0, 0, 58, 121, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 122, 0, 123, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 124, 0, 0, 0, 0, 0, 0, 0, 125, 0, 0, 0, 0, 0, 0, 126, 127, 128, 0, 0, 0, 0, 129, 0, 0, 0, 0, 0, 1, 130, 1, 131, 132, 133, 0, 0, 0, 0, 0, 0, 0, 0, 123, 0, 1, 1, 1, 1, 1, 1, 1, 2, }; static RE_UINT8 re_grapheme_extend_stage_5[] = { 0, 0, 0, 0, 255, 255, 255, 255, 255, 255, 0, 0, 248, 3, 0, 0, 0, 0, 254, 255, 255, 255, 255, 191, 182, 0, 0, 0, 0, 0, 255, 7, 0, 248, 255, 255, 0, 0, 1, 0, 0, 0, 192, 159, 159, 61, 0, 0, 0, 0, 2, 0, 0, 0, 255, 255, 255, 7, 0, 0, 192, 255, 1, 0, 0, 248, 15, 0, 0, 0, 192, 251, 239, 62, 0, 0, 0, 0, 0, 14, 248, 255, 255, 255, 7, 0, 0, 0, 0, 0, 0, 20, 254, 33, 254, 0, 12, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 80, 30, 32, 128, 0, 6, 0, 0, 0, 0, 0, 0, 16, 134, 57, 2, 0, 0, 0, 35, 0, 190, 33, 0, 0, 0, 0, 0, 208, 30, 32, 192, 0, 4, 0, 0, 0, 0, 0, 0, 64, 1, 32, 128, 0, 1, 0, 0, 0, 0, 0, 0, 192, 193, 61, 96, 0, 0, 
0, 0, 144, 68, 48, 96, 0, 0, 132, 92, 128, 0, 0, 242, 7, 128, 127, 0, 0, 0, 0, 242, 27, 0, 63, 0, 0, 0, 0, 0, 3, 0, 0, 160, 2, 0, 0, 254, 127, 223, 224, 255, 254, 255, 255, 255, 31, 64, 0, 0, 0, 0, 224, 253, 102, 0, 0, 0, 195, 1, 0, 30, 0, 100, 32, 0, 32, 0, 0, 0, 224, 0, 0, 28, 0, 0, 0, 12, 0, 0, 0, 176, 63, 64, 254, 15, 32, 0, 56, 0, 0, 0, 2, 0, 0, 135, 1, 4, 14, 0, 0, 128, 9, 0, 0, 64, 127, 229, 31, 248, 159, 0, 0, 255, 127, 15, 0, 0, 0, 0, 0, 208, 23, 3, 0, 0, 0, 60, 59, 0, 0, 64, 163, 3, 0, 0, 240, 207, 0, 0, 0, 247, 255, 253, 33, 16, 3, 255, 255, 63, 240, 0, 48, 0, 0, 255, 255, 1, 0, 0, 128, 3, 0, 0, 0, 0, 128, 0, 252, 0, 0, 0, 0, 0, 6, 0, 128, 247, 63, 0, 0, 3, 0, 68, 8, 0, 0, 96, 0, 0, 0, 16, 0, 0, 0, 255, 255, 3, 0, 192, 63, 0, 0, 128, 255, 3, 0, 0, 0, 200, 19, 32, 0, 0, 0, 0, 126, 102, 0, 8, 16, 0, 0, 0, 0, 157, 193, 0, 48, 64, 0, 32, 33, 0, 0, 0, 0, 0, 32, 0, 0, 192, 7, 110, 240, 0, 0, 0, 0, 0, 135, 0, 0, 0, 255, 127, 0, 0, 0, 0, 0, 120, 6, 128, 239, 31, 0, 0, 0, 8, 0, 0, 0, 192, 127, 0, 28, 0, 0, 0, 128, 211, 0, 248, 7, 0, 0, 1, 0, 128, 0, 192, 31, 31, 0, 0, 0, 249, 165, 13, 0, 0, 0, 0, 128, 60, 176, 1, 0, 0, 48, 0, 0, 248, 167, 0, 40, 191, 0, 188, 15, 0, 0, 0, 0, 31, 0, 0, 0, 127, 0, 0, 128, 7, 0, 0, 0, 0, 96, 160, 195, 7, 248, 231, 15, 0, 0, 0, 60, 0, 0, 28, 0, 0, 0, 255, 255, 127, 248, 255, 31, 32, 0, 16, 0, 0, 248, 254, 255, 0, 0, }; /* Grapheme_Extend: 1274 bytes. */ RE_UINT32 re_get_grapheme_extend(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 15; code = ch ^ (f << 15); pos = (RE_UINT32)re_grapheme_extend_stage_1[f] << 4; f = code >> 11; code ^= f << 11; pos = (RE_UINT32)re_grapheme_extend_stage_2[pos + f] << 3; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_grapheme_extend_stage_3[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_grapheme_extend_stage_4[pos + f] << 5; pos += code; value = (re_grapheme_extend_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Grapheme_Base. 
*/ static RE_UINT8 re_grapheme_base_stage_1[] = { 0, 1, 2, 3, 4, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, }; static RE_UINT8 re_grapheme_base_stage_2[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 13, 13, 13, 13, 13, 14, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 15, 13, 16, 17, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 18, 19, 19, 19, 19, 19, 19, 19, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 19, 29, 30, 19, 19, 13, 31, 19, 19, 19, 32, 19, 19, 19, 19, 19, 19, 19, 19, 33, 34, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 35, 19, 19, 36, 19, 19, 19, 19, 37, 38, 39, 19, 19, 19, 40, 41, 42, 43, 44, 19, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 45, 13, 13, 13, 46, 47, 13, 13, 13, 13, 48, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 49, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, }; static RE_UINT8 re_grapheme_base_stage_3[] = { 0, 1, 2, 2, 2, 2, 3, 4, 2, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 2, 2, 30, 31, 32, 33, 2, 2, 2, 2, 2, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 2, 47, 2, 2, 48, 49, 50, 51, 2, 52, 2, 2, 2, 53, 54, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 55, 56, 57, 58, 59, 60, 61, 62, 2, 63, 64, 65, 66, 67, 68, 69, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 70, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 71, 2, 72, 2, 2, 73, 74, 2, 75, 76, 77, 78, 79, 80, 81, 82, 83, 2, 2, 2, 2, 2, 2, 2, 84, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 2, 2, 86, 87, 88, 89, 2, 2, 90, 91, 92, 93, 94, 95, 96, 53, 97, 98, 85, 99, 100, 101, 2, 102, 103, 85, 2, 2, 104, 85, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 85, 85, 115, 85, 85, 85, 116, 117, 118, 119, 120, 121, 122, 85, 85, 123, 85, 124, 125, 126, 127, 85, 85, 128, 85, 85, 85, 
129, 85, 85, 2, 2, 2, 2, 2, 2, 2, 130, 131, 2, 132, 85, 85, 85, 85, 85, 133, 85, 85, 85, 85, 85, 85, 85, 2, 2, 2, 2, 134, 85, 85, 85, 2, 2, 2, 2, 135, 136, 137, 138, 85, 85, 85, 85, 85, 85, 139, 140, 141, 85, 85, 85, 85, 85, 85, 85, 142, 143, 85, 85, 85, 85, 85, 85, 2, 144, 145, 146, 147, 85, 148, 85, 149, 150, 151, 2, 2, 152, 2, 153, 2, 2, 2, 2, 154, 155, 85, 85, 2, 156, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 157, 158, 85, 85, 159, 160, 161, 162, 163, 85, 2, 2, 2, 2, 164, 165, 2, 166, 167, 168, 169, 170, 171, 172, 85, 85, 85, 85, 2, 2, 2, 2, 2, 173, 2, 2, 2, 2, 2, 2, 2, 2, 174, 2, 175, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 176, 85, 85, 2, 2, 2, 2, 177, 85, 85, 85, }; static RE_UINT8 re_grapheme_base_stage_4[] = { 0, 0, 1, 1, 1, 1, 1, 2, 0, 0, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 4, 5, 1, 6, 1, 1, 1, 1, 1, 7, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 8, 1, 9, 8, 1, 10, 0, 0, 11, 12, 1, 13, 14, 15, 16, 1, 1, 13, 0, 1, 8, 1, 1, 1, 1, 1, 17, 18, 1, 19, 20, 1, 0, 21, 1, 1, 1, 1, 1, 22, 23, 1, 1, 13, 24, 1, 25, 26, 2, 1, 27, 0, 0, 0, 0, 1, 14, 0, 0, 0, 0, 28, 1, 1, 29, 30, 31, 32, 1, 33, 34, 35, 36, 37, 38, 39, 40, 41, 34, 35, 42, 43, 44, 15, 45, 46, 6, 35, 47, 48, 43, 39, 49, 50, 34, 35, 51, 52, 38, 39, 53, 54, 55, 56, 57, 58, 43, 15, 13, 59, 20, 35, 60, 61, 62, 39, 63, 64, 20, 35, 65, 66, 11, 39, 67, 64, 20, 1, 68, 69, 70, 39, 71, 72, 73, 1, 74, 75, 76, 15, 45, 8, 1, 1, 77, 78, 40, 0, 0, 79, 80, 81, 82, 83, 84, 0, 0, 1, 4, 1, 85, 86, 1, 87, 70, 88, 0, 0, 89, 90, 13, 0, 0, 1, 1, 87, 91, 1, 92, 8, 93, 94, 3, 1, 1, 95, 1, 1, 1, 1, 1, 1, 1, 96, 97, 1, 1, 96, 1, 1, 98, 99, 100, 1, 1, 1, 99, 1, 1, 1, 13, 1, 87, 1, 101, 1, 1, 1, 1, 1, 102, 1, 87, 1, 1, 1, 1, 1, 103, 3, 104, 1, 105, 1, 104, 3, 43, 1, 1, 1, 106, 107, 108, 101, 101, 13, 101, 1, 1, 1, 1, 1, 53, 1, 1, 109, 1, 1, 1, 1, 22, 1, 2, 110, 111, 112, 1, 19, 14, 1, 1, 40, 1, 101, 113, 1, 1, 1, 114, 1, 1, 1, 115, 116, 117, 101, 101, 19, 0, 0, 0, 0, 0, 118, 1, 1, 119, 120, 1, 13, 108, 121, 1, 122, 1, 1, 1, 123, 
124, 1, 1, 40, 125, 126, 1, 1, 1, 0, 0, 0, 0, 53, 127, 128, 129, 1, 1, 1, 1, 0, 0, 0, 0, 1, 102, 1, 1, 102, 130, 1, 19, 1, 1, 1, 131, 131, 132, 1, 133, 13, 1, 134, 1, 1, 1, 0, 32, 2, 87, 1, 2, 0, 0, 0, 0, 40, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 13, 1, 1, 75, 0, 13, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 135, 1, 136, 1, 126, 35, 104, 137, 0, 1, 1, 2, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 138, 1, 1, 95, 1, 1, 1, 134, 43, 1, 75, 139, 139, 139, 139, 0, 0, 1, 1, 1, 1, 117, 0, 0, 0, 1, 140, 1, 1, 1, 1, 1, 141, 1, 1, 1, 1, 1, 22, 0, 40, 1, 1, 101, 1, 8, 1, 1, 1, 1, 142, 1, 1, 1, 1, 1, 1, 143, 1, 19, 8, 1, 1, 1, 1, 2, 1, 1, 13, 1, 1, 141, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 22, 1, 1, 1, 1, 1, 1, 1, 1, 1, 22, 0, 0, 87, 1, 1, 1, 75, 1, 1, 1, 1, 1, 40, 0, 1, 1, 2, 144, 1, 19, 1, 1, 1, 1, 1, 145, 1, 1, 19, 53, 0, 0, 0, 146, 147, 1, 148, 101, 1, 1, 1, 53, 1, 1, 1, 1, 149, 101, 0, 150, 1, 1, 151, 1, 75, 152, 1, 87, 28, 1, 1, 153, 154, 155, 131, 2, 1, 1, 156, 157, 158, 84, 1, 159, 1, 1, 1, 160, 161, 162, 163, 22, 164, 165, 139, 1, 1, 1, 22, 1, 1, 1, 1, 1, 1, 1, 166, 101, 1, 1, 141, 1, 142, 1, 1, 40, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 19, 1, 1, 1, 1, 1, 1, 101, 0, 0, 75, 167, 1, 168, 169, 1, 1, 1, 1, 1, 1, 1, 104, 28, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 121, 1, 1, 53, 0, 0, 19, 0, 101, 0, 1, 1, 170, 171, 131, 1, 1, 1, 1, 1, 1, 1, 87, 8, 1, 1, 1, 1, 1, 1, 1, 1, 19, 1, 2, 172, 173, 139, 174, 159, 1, 100, 175, 19, 19, 0, 0, 176, 1, 1, 177, 1, 1, 1, 1, 87, 40, 43, 0, 0, 1, 1, 87, 1, 87, 1, 1, 1, 43, 8, 40, 1, 1, 141, 1, 13, 1, 1, 22, 1, 154, 1, 1, 178, 22, 0, 0, 1, 19, 101, 0, 0, 0, 0, 0, 1, 1, 53, 1, 1, 1, 179, 0, 1, 1, 1, 75, 1, 22, 53, 0, 180, 1, 1, 181, 1, 182, 1, 1, 1, 2, 146, 0, 0, 0, 1, 183, 1, 184, 1, 57, 0, 0, 0, 0, 1, 1, 1, 185, 1, 121, 1, 1, 43, 186, 1, 141, 53, 103, 1, 1, 1, 1, 0, 0, 1, 1, 187, 75, 1, 1, 1, 71, 1, 136, 1, 188, 1, 189, 190, 0, 0, 0, 0, 0, 1, 1, 1, 1, 103, 0, 0, 0, 1, 1, 1, 117, 1, 1, 1, 7, 0, 0, 0, 0, 0, 0, 1, 2, 20, 1, 1, 53, 191, 
121, 1, 0, 121, 1, 1, 192, 104, 1, 103, 101, 28, 1, 193, 15, 141, 1, 1, 194, 121, 1, 1, 195, 60, 1, 8, 14, 1, 6, 2, 196, 0, 0, 0, 0, 197, 154, 101, 1, 1, 2, 117, 101, 50, 34, 35, 198, 199, 200, 141, 0, 1, 1, 1, 201, 202, 101, 0, 0, 1, 1, 2, 203, 8, 40, 0, 0, 1, 1, 1, 204, 61, 101, 0, 0, 1, 1, 205, 206, 101, 0, 0, 0, 1, 101, 207, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 208, 0, 0, 0, 0, 1, 1, 1, 103, 1, 101, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 14, 1, 1, 1, 1, 141, 0, 0, 0, 1, 1, 2, 0, 0, 0, 0, 0, 1, 1, 1, 1, 75, 0, 0, 0, 1, 1, 1, 103, 1, 2, 155, 0, 0, 0, 0, 0, 0, 1, 19, 209, 1, 1, 1, 146, 22, 140, 6, 210, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 14, 1, 1, 2, 0, 28, 0, 0, 0, 0, 0, 0, 104, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 13, 87, 103, 211, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 22, 1, 1, 9, 1, 1, 1, 212, 0, 213, 1, 155, 1, 1, 1, 103, 0, 1, 1, 1, 1, 214, 0, 0, 0, 1, 1, 1, 1, 1, 75, 1, 104, 1, 1, 1, 1, 1, 131, 1, 1, 1, 3, 215, 29, 216, 1, 1, 1, 217, 218, 1, 219, 220, 20, 1, 1, 1, 1, 136, 1, 1, 1, 1, 1, 1, 1, 1, 1, 163, 1, 1, 1, 0, 0, 0, 221, 0, 0, 21, 131, 222, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 223, 0, 0, 0, 216, 1, 224, 225, 226, 227, 228, 229, 140, 40, 230, 40, 0, 0, 0, 104, 1, 1, 40, 1, 1, 1, 1, 1, 1, 141, 2, 8, 8, 8, 1, 22, 87, 1, 2, 1, 1, 1, 40, 1, 1, 13, 0, 0, 0, 0, 15, 1, 117, 1, 1, 13, 103, 104, 0, 0, 1, 1, 1, 1, 1, 1, 1, 140, 1, 1, 216, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 43, 87, 141, 1, 1, 1, 1, 1, 1, 1, 141, 1, 1, 1, 1, 1, 14, 0, 0, 40, 1, 1, 1, 53, 101, 1, 1, 53, 1, 19, 0, 0, 0, 0, 0, 0, 103, 0, 0, 0, 0, 0, 0, 14, 0, 0, 0, 43, 0, 0, 0, 1, 1, 1, 1, 1, 75, 0, 0, 1, 1, 1, 14, 1, 1, 1, 1, 1, 19, 1, 1, 1, 1, 1, 1, 1, 1, 104, 0, 0, 0, 0, 0, 1, 19, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_grapheme_base_stage_5[] = { 0, 0, 255, 255, 255, 127, 255, 223, 255, 252, 240, 215, 251, 255, 7, 252, 254, 255, 127, 254, 255, 230, 0, 64, 73, 0, 255, 7, 31, 0, 192, 255, 0, 200, 63, 64, 96, 194, 255, 63, 253, 255, 0, 224, 63, 0, 2, 0, 240, 7, 63, 4, 16, 1, 255, 65, 248, 255, 255, 235, 1, 
222, 1, 255, 243, 255, 237, 159, 249, 255, 255, 253, 197, 163, 129, 89, 0, 176, 195, 255, 255, 15, 232, 135, 109, 195, 1, 0, 0, 94, 28, 0, 232, 191, 237, 227, 1, 26, 3, 2, 236, 159, 237, 35, 129, 25, 255, 0, 232, 199, 61, 214, 24, 199, 255, 131, 198, 29, 238, 223, 255, 35, 30, 0, 0, 7, 0, 255, 236, 223, 239, 99, 155, 13, 6, 0, 255, 167, 193, 93, 0, 128, 63, 254, 236, 255, 127, 252, 251, 47, 127, 0, 3, 127, 13, 128, 127, 128, 150, 37, 240, 254, 174, 236, 13, 32, 95, 0, 255, 243, 95, 253, 255, 254, 255, 31, 32, 31, 0, 192, 191, 223, 2, 153, 255, 60, 225, 255, 155, 223, 191, 32, 255, 61, 127, 61, 61, 127, 61, 255, 127, 255, 255, 3, 63, 63, 255, 1, 3, 0, 99, 0, 79, 192, 191, 1, 240, 31, 255, 5, 120, 14, 251, 1, 241, 255, 255, 199, 127, 198, 191, 0, 26, 224, 7, 0, 240, 255, 47, 232, 251, 15, 252, 255, 195, 196, 191, 92, 12, 240, 48, 248, 255, 227, 8, 0, 2, 222, 111, 0, 255, 170, 223, 255, 207, 239, 220, 127, 255, 128, 207, 255, 63, 255, 0, 240, 12, 254, 127, 127, 255, 251, 15, 0, 127, 248, 224, 255, 8, 192, 252, 0, 128, 255, 187, 247, 159, 15, 15, 192, 252, 63, 63, 192, 12, 128, 55, 236, 255, 191, 255, 195, 255, 129, 25, 0, 247, 47, 255, 239, 98, 62, 5, 0, 0, 248, 255, 207, 126, 126, 126, 0, 223, 30, 248, 160, 127, 95, 219, 255, 247, 255, 127, 15, 252, 252, 252, 28, 0, 48, 255, 183, 135, 255, 143, 255, 15, 255, 15, 128, 63, 253, 191, 145, 191, 255, 55, 248, 255, 143, 255, 240, 239, 254, 31, 248, 7, 255, 3, 30, 0, 254, 128, 63, 135, 217, 127, 16, 119, 0, 63, 128, 44, 63, 127, 189, 237, 163, 158, 57, 1, 224, 6, 90, 242, 0, 3, 79, 7, 88, 255, 215, 64, 0, 67, 0, 7, 128, 32, 0, 255, 224, 255, 147, 95, 60, 24, 240, 35, 0, 100, 222, 239, 255, 191, 231, 223, 223, 255, 123, 95, 252, 128, 7, 239, 15, 159, 255, 150, 254, 247, 10, 132, 234, 150, 170, 150, 247, 247, 94, 238, 251, }; /* Grapheme_Base: 2544 bytes. 
*/ RE_UINT32 re_get_grapheme_base(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 15; code = ch ^ (f << 15); pos = (RE_UINT32)re_grapheme_base_stage_1[f] << 5; f = code >> 10; code ^= f << 10; pos = (RE_UINT32)re_grapheme_base_stage_2[pos + f] << 3; f = code >> 7; code ^= f << 7; pos = (RE_UINT32)re_grapheme_base_stage_3[pos + f] << 3; f = code >> 4; code ^= f << 4; pos = (RE_UINT32)re_grapheme_base_stage_4[pos + f] << 4; pos += code; value = (re_grapheme_base_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Grapheme_Link. */ static RE_UINT8 re_grapheme_link_stage_1[] = { 0, 1, 2, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_grapheme_link_stage_2[] = { 0, 0, 1, 2, 3, 4, 5, 0, 0, 0, 0, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 8, 0, 9, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_grapheme_link_stage_3[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 3, 0, 0, 4, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 7, 0, 0, 0, 0, 8, 0, 9, 10, 0, 0, 11, 0, 0, 0, 0, 0, 12, 9, 13, 14, 0, 15, 0, 16, 0, 0, 0, 0, 17, 0, 0, 0, 18, 19, 20, 14, 21, 22, 1, 0, 0, 23, 0, 17, 17, 24, 25, 0, }; static RE_UINT8 re_grapheme_link_stage_4[] = { 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 3, 0, 0, 4, 0, 0, 0, 0, 5, 0, 0, 6, 6, 0, 0, 0, 0, 7, 0, 0, 0, 0, 8, 0, 0, 4, 0, 0, 9, 0, 10, 0, 0, 0, 11, 12, 0, 0, 0, 0, 0, 13, 0, 0, 0, 8, 0, 0, 0, 0, 14, 0, 0, 0, 1, 0, 11, 0, 0, 0, 0, 12, 11, 0, 15, 0, 0, 0, 16, 0, 0, 0, 17, 0, 0, 0, 0, 0, 2, 0, 0, 18, 0, 0, 14, 0, 0, 0, 19, 0, 0, }; static RE_UINT8 re_grapheme_link_stage_5[] = { 0, 0, 0, 0, 0, 32, 0, 0, 0, 4, 0, 0, 0, 0, 0, 4, 16, 0, 0, 0, 0, 0, 0, 6, 0, 0, 16, 0, 0, 0, 4, 0, 1, 0, 0, 0, 0, 12, 0, 0, 0, 0, 12, 0, 0, 0, 0, 128, 64, 0, 0, 0, 
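/* Editorial note, not part of the generated table data: a sketch of how the
 * re_get_*() functions in this file read their five stage tables. Each
 * function peels a fixed number of high bits off the code point, uses them
 * to index the current stage, and shifts the entry left to form the base
 * offset into the next stage. Stage 5 is a bitmap: pos >> 3 selects the
 * byte and pos & 0x7 selects the bit within it.
 *
 * Worked by hand for re_get_grapheme_link(0x094D), using the shifts in that
 * function (14, 10, 7, 5) and the table entries as they appear here:
 *   f = 0x094D >> 14 = 0, code = 2381, pos = stage_1[0] << 4 = 0
 *   f = 2381 >> 10 = 2,   code = 333,  pos = stage_2[0 + 2] << 3 = 8
 *   f = 333 >> 7 = 2,     code = 77,   pos = stage_3[8 + 2] << 2 = 4
 *   f = 77 >> 5 = 2,      code = 13,   pos = stage_4[4 + 2] << 5 = 32
 *   pos = 32 + 13 = 45;   (stage_5[45 >> 3] >> (45 & 0x7)) & 0x1
 *                       = (32 >> 5) & 0x1 = 1
 */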
0, 0, 8, 0, 0, 0, 64, 0, 0, 0, 0, 2, 0, 0, 24, 0, 0, 0, 32, 0, 4, 0, 0, 0, 0, 8, 0, 0, }; /* Grapheme_Link: 404 bytes. */ RE_UINT32 re_get_grapheme_link(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 14; code = ch ^ (f << 14); pos = (RE_UINT32)re_grapheme_link_stage_1[f] << 4; f = code >> 10; code ^= f << 10; pos = (RE_UINT32)re_grapheme_link_stage_2[pos + f] << 3; f = code >> 7; code ^= f << 7; pos = (RE_UINT32)re_grapheme_link_stage_3[pos + f] << 2; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_grapheme_link_stage_4[pos + f] << 5; pos += code; value = (re_grapheme_link_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* White_Space. */ static RE_UINT8 re_white_space_stage_1[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_white_space_stage_2[] = { 0, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, }; static RE_UINT8 re_white_space_stage_3[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_white_space_stage_4[] = { 0, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 4, 5, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_white_space_stage_5[] = { 0, 62, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 32, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 255, 7, 0, 0, 0, 131, 0, 0, 0, 0, 0, 128, 0, 0, 0, 0, }; /* White_Space: 169 bytes. 
*/ RE_UINT32 re_get_white_space(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_white_space_stage_1[f] << 3; f = code >> 13; code ^= f << 13; pos = (RE_UINT32)re_white_space_stage_2[pos + f] << 4; f = code >> 9; code ^= f << 9; pos = (RE_UINT32)re_white_space_stage_3[pos + f] << 3; f = code >> 6; code ^= f << 6; pos = (RE_UINT32)re_white_space_stage_4[pos + f] << 6; pos += code; value = (re_white_space_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Bidi_Control. */ static RE_UINT8 re_bidi_control_stage_1[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_bidi_control_stage_2[] = { 0, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_bidi_control_stage_3[] = { 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_bidi_control_stage_4[] = { 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 2, 3, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_bidi_control_stage_5[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 16, 0, 0, 0, 0, 0, 192, 0, 0, 0, 124, 0, 0, 0, 0, 0, 0, 192, 3, 0, 0, }; /* Bidi_Control: 129 bytes. */ RE_UINT32 re_get_bidi_control(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_bidi_control_stage_1[f] << 4; f = code >> 12; code ^= f << 12; pos = (RE_UINT32)re_bidi_control_stage_2[pos + f] << 3; f = code >> 9; code ^= f << 9; pos = (RE_UINT32)re_bidi_control_stage_3[pos + f] << 3; f = code >> 6; code ^= f << 6; pos = (RE_UINT32)re_bidi_control_stage_4[pos + f] << 6; pos += code; value = (re_bidi_control_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Join_Control. 
*/ static RE_UINT8 re_join_control_stage_1[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_join_control_stage_2[] = { 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_join_control_stage_3[] = { 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_join_control_stage_4[] = { 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_join_control_stage_5[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 48, 0, 0, 0, 0, 0, 0, }; /* Join_Control: 97 bytes. */ RE_UINT32 re_get_join_control(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_join_control_stage_1[f] << 4; f = code >> 12; code ^= f << 12; pos = (RE_UINT32)re_join_control_stage_2[pos + f] << 3; f = code >> 9; code ^= f << 9; pos = (RE_UINT32)re_join_control_stage_3[pos + f] << 3; f = code >> 6; code ^= f << 6; pos = (RE_UINT32)re_join_control_stage_4[pos + f] << 6; pos += code; value = (re_join_control_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Dash. 
*/ static RE_UINT8 re_dash_stage_1[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_dash_stage_2[] = { 0, 1, 2, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, }; static RE_UINT8 re_dash_stage_3[] = { 0, 1, 2, 1, 1, 1, 1, 1, 1, 1, 3, 1, 4, 1, 1, 1, 5, 6, 1, 1, 1, 1, 1, 7, 8, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 9, }; static RE_UINT8 re_dash_stage_4[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 3, 1, 1, 1, 1, 1, 1, 1, 4, 1, 1, 1, 1, 1, 1, 1, 5, 6, 7, 1, 1, 1, 1, 1, 8, 1, 1, 1, 1, 1, 1, 1, 9, 3, 1, 1, 1, 1, 1, 1, 10, 1, 11, 1, 1, 1, 1, 1, 12, 13, 1, 1, 14, 1, 1, 1, }; static RE_UINT8 re_dash_stage_5[] = { 0, 0, 0, 0, 0, 32, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 64, 1, 0, 0, 0, 0, 0, 0, 0, 64, 0, 0, 0, 0, 0, 0, 0, 0, 0, 63, 0, 0, 0, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 8, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 128, 4, 0, 0, 0, 12, 0, 0, 0, 16, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 0, 0, 0, 0, 1, 8, 0, 0, 0, 0, 32, 0, 0, 0, 0, 0, 0, }; /* Dash: 297 bytes. */ RE_UINT32 re_get_dash(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_dash_stage_1[f] << 4; f = code >> 12; code ^= f << 12; pos = (RE_UINT32)re_dash_stage_2[pos + f] << 3; f = code >> 9; code ^= f << 9; pos = (RE_UINT32)re_dash_stage_3[pos + f] << 3; f = code >> 6; code ^= f << 6; pos = (RE_UINT32)re_dash_stage_4[pos + f] << 6; pos += code; value = (re_dash_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Hyphen. 
*/ static RE_UINT8 re_hyphen_stage_1[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_hyphen_stage_2[] = { 0, 1, 2, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, }; static RE_UINT8 re_hyphen_stage_3[] = { 0, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 4, 1, 1, 1, 1, 1, 1, 5, 6, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 7, }; static RE_UINT8 re_hyphen_stage_4[] = { 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 3, 1, 1, 1, 1, 1, 1, 1, 4, 1, 1, 1, 1, 1, 1, 1, 5, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 6, 1, 1, 1, 1, 1, 7, 1, 1, 8, 9, 1, 1, }; static RE_UINT8 re_hyphen_stage_5[] = { 0, 0, 0, 0, 0, 32, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 64, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 128, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 32, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 32, 0, 0, 0, }; /* Hyphen: 241 bytes. */ RE_UINT32 re_get_hyphen(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_hyphen_stage_1[f] << 4; f = code >> 12; code ^= f << 12; pos = (RE_UINT32)re_hyphen_stage_2[pos + f] << 3; f = code >> 9; code ^= f << 9; pos = (RE_UINT32)re_hyphen_stage_3[pos + f] << 3; f = code >> 6; code ^= f << 6; pos = (RE_UINT32)re_hyphen_stage_4[pos + f] << 6; pos += code; value = (re_hyphen_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Quotation_Mark. 
*/ static RE_UINT8 re_quotation_mark_stage_1[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_quotation_mark_stage_2[] = { 0, 1, 2, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_quotation_mark_stage_3[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 3, 4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 5, }; static RE_UINT8 re_quotation_mark_stage_4[] = { 0, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 4, 1, 1, 1, 1, 1, 1, 5, 1, 1, 1, 1, 1, 1, 1, 1, 6, 1, 1, 7, 8, 1, 1, }; static RE_UINT8 re_quotation_mark_stage_5[] = { 0, 0, 0, 0, 132, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 0, 8, 0, 0, 0, 255, 0, 0, 0, 6, 4, 0, 0, 0, 0, 0, 0, 0, 0, 240, 0, 224, 0, 0, 0, 0, 30, 0, 0, 0, 0, 0, 0, 0, 132, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 12, 0, 0, 0, }; /* Quotation_Mark: 209 bytes. */ RE_UINT32 re_get_quotation_mark(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_quotation_mark_stage_1[f] << 4; f = code >> 12; code ^= f << 12; pos = (RE_UINT32)re_quotation_mark_stage_2[pos + f] << 3; f = code >> 9; code ^= f << 9; pos = (RE_UINT32)re_quotation_mark_stage_3[pos + f] << 3; f = code >> 6; code ^= f << 6; pos = (RE_UINT32)re_quotation_mark_stage_4[pos + f] << 6; pos += code; value = (re_quotation_mark_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Terminal_Punctuation. 
*/ static RE_UINT8 re_terminal_punctuation_stage_1[] = { 0, 1, 2, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, }; static RE_UINT8 re_terminal_punctuation_stage_2[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 9, 10, 11, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 12, 13, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 14, 15, 9, 16, 9, 17, 18, 9, 9, 9, 19, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 20, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 21, 9, 9, 9, 9, 9, 9, 22, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, }; static RE_UINT8 re_terminal_punctuation_stage_3[] = { 0, 1, 1, 1, 1, 1, 2, 3, 1, 1, 1, 4, 5, 6, 7, 8, 9, 1, 10, 1, 1, 1, 1, 1, 1, 1, 1, 1, 11, 1, 12, 1, 13, 1, 1, 1, 1, 1, 14, 1, 1, 1, 1, 1, 15, 16, 17, 18, 19, 1, 20, 1, 1, 21, 22, 1, 23, 1, 1, 1, 1, 1, 1, 1, 24, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 25, 1, 1, 1, 26, 1, 1, 1, 1, 1, 1, 1, 1, 27, 1, 1, 28, 29, 1, 1, 30, 31, 32, 33, 34, 35, 1, 36, 1, 1, 1, 1, 37, 1, 38, 1, 1, 1, 1, 1, 1, 1, 1, 39, 40, 1, 41, 1, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 1, 1, 1, 1, 1, 52, 53, 1, 54, 1, 55, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 56, 57, 58, 1, 1, 41, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 59, 1, 1, }; static RE_UINT8 re_terminal_punctuation_stage_4[] = { 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 3, 0, 0, 0, 4, 0, 5, 0, 6, 0, 0, 0, 0, 0, 7, 0, 8, 0, 0, 0, 0, 0, 0, 9, 0, 10, 2, 0, 0, 0, 0, 11, 0, 0, 12, 0, 13, 0, 0, 0, 0, 0, 14, 0, 0, 0, 0, 15, 0, 0, 0, 16, 0, 0, 0, 17, 0, 18, 0, 0, 0, 0, 19, 0, 20, 0, 0, 0, 0, 0, 11, 0, 0, 21, 0, 0, 0, 0, 22, 0, 0, 23, 0, 24, 0, 25, 26, 0, 0, 27, 28, 0, 29, 0, 0, 0, 0, 0, 0, 24, 30, 0, 0, 0, 0, 0, 0, 31, 0, 0, 0, 32, 0, 0, 33, 0, 0, 34, 0, 0, 0, 0, 26, 0, 0, 0, 35, 0, 0, 0, 36, 37, 0, 0, 0, 38, 0, 0, 39, 0, 1, 0, 0, 40, 36, 0, 41, 0, 0, 0, 42, 0, 36, 0, 0, 0, 0, 0, 32, 0, 0, 0, 0, 43, 0, 
44, 0, 0, 45, 0, 0, 0, 0, 0, 46, 0, 0, 24, 47, 0, 0, 0, 48, 0, 0, 0, 49, 0, 0, 50, 0, 0, 0, 4, 0, 0, 0, 0, 51, 0, 0, 0, 29, 0, 0, 52, 0, 0, 0, 0, 0, 53, 0, 0, 0, 33, 0, 0, 0, 54, 0, 55, 56, 0, 57, 0, 0, 0, }; static RE_UINT8 re_terminal_punctuation_stage_5[] = { 0, 0, 0, 0, 2, 80, 0, 140, 0, 0, 0, 64, 128, 0, 0, 0, 0, 2, 0, 0, 8, 0, 0, 0, 0, 16, 0, 136, 0, 0, 16, 0, 255, 23, 0, 0, 0, 0, 0, 3, 0, 0, 255, 127, 48, 0, 0, 0, 0, 0, 0, 12, 0, 225, 7, 0, 0, 12, 0, 0, 254, 1, 0, 0, 0, 96, 0, 0, 0, 56, 0, 0, 0, 0, 96, 0, 0, 0, 112, 4, 60, 3, 0, 0, 0, 15, 0, 0, 0, 0, 0, 236, 0, 0, 0, 248, 0, 0, 0, 192, 0, 0, 0, 48, 128, 3, 0, 0, 0, 64, 0, 16, 2, 0, 0, 0, 6, 0, 0, 0, 0, 224, 0, 0, 0, 0, 248, 0, 0, 0, 192, 0, 0, 192, 0, 0, 0, 128, 0, 0, 0, 0, 0, 224, 0, 0, 0, 128, 0, 0, 3, 0, 0, 8, 0, 0, 0, 0, 247, 0, 18, 0, 0, 0, 0, 0, 1, 0, 0, 0, 128, 0, 0, 0, 63, 0, 0, 0, 0, 252, 0, 0, 0, 30, 128, 63, 0, 0, 3, 0, 0, 0, 14, 0, 0, 0, 96, 32, 0, 192, 0, 0, 0, 31, 60, 254, 255, 0, 0, 0, 0, 112, 0, 0, 31, 0, 0, 0, 32, 0, 0, 0, 128, 3, 16, 0, 0, 0, 128, 7, 0, 0, }; /* Terminal_Punctuation: 850 bytes. */ RE_UINT32 re_get_terminal_punctuation(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 15; code = ch ^ (f << 15); pos = (RE_UINT32)re_terminal_punctuation_stage_1[f] << 5; f = code >> 10; code ^= f << 10; pos = (RE_UINT32)re_terminal_punctuation_stage_2[pos + f] << 3; f = code >> 7; code ^= f << 7; pos = (RE_UINT32)re_terminal_punctuation_stage_3[pos + f] << 2; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_terminal_punctuation_stage_4[pos + f] << 5; pos += code; value = (re_terminal_punctuation_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Other_Math. 
*/ static RE_UINT8 re_other_math_stage_1[] = { 0, 1, 2, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, }; static RE_UINT8 re_other_math_stage_2[] = { 0, 1, 1, 1, 2, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 5, 1, 1, 6, 1, 1, }; static RE_UINT8 re_other_math_stage_3[] = { 0, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 4, 1, 5, 1, 6, 7, 8, 1, 9, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 10, 11, 1, 1, 1, 1, 12, 13, 14, 15, 1, 1, 1, 1, 1, 1, 16, 1, }; static RE_UINT8 re_other_math_stage_4[] = { 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 3, 4, 5, 6, 7, 8, 0, 9, 10, 11, 12, 13, 0, 14, 15, 16, 17, 18, 0, 0, 0, 0, 19, 20, 21, 0, 0, 0, 0, 0, 22, 23, 24, 25, 0, 26, 27, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25, 28, 0, 0, 0, 0, 29, 0, 30, 31, 0, 0, 0, 32, 0, 0, 0, 0, 0, 33, 0, 0, 0, 0, 0, 0, 34, 34, 35, 34, 36, 37, 38, 34, 39, 40, 41, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 42, 43, 44, 35, 35, 45, 45, 46, 46, 47, 34, 38, 48, 49, 50, 51, 52, 0, 0, }; static RE_UINT8 re_other_math_stage_5[] = { 0, 0, 0, 0, 0, 0, 0, 64, 0, 0, 39, 0, 0, 0, 51, 0, 0, 0, 64, 0, 0, 0, 28, 0, 1, 0, 0, 0, 30, 0, 0, 96, 0, 96, 0, 0, 0, 0, 255, 31, 98, 248, 0, 0, 132, 252, 47, 62, 16, 179, 251, 241, 224, 3, 0, 0, 0, 0, 224, 243, 182, 62, 195, 240, 255, 63, 235, 47, 48, 0, 0, 0, 0, 15, 0, 0, 0, 0, 176, 0, 0, 0, 1, 0, 4, 0, 0, 0, 3, 192, 127, 240, 193, 140, 15, 0, 148, 31, 0, 0, 96, 0, 0, 0, 5, 0, 0, 0, 15, 96, 0, 0, 192, 255, 0, 0, 248, 255, 255, 1, 0, 0, 0, 15, 0, 0, 0, 48, 10, 1, 0, 0, 0, 0, 0, 80, 255, 255, 255, 255, 255, 255, 223, 255, 255, 255, 255, 223, 100, 222, 255, 235, 239, 255, 255, 255, 191, 231, 223, 223, 255, 255, 255, 123, 95, 252, 253, 255, 63, 255, 255, 255, 253, 255, 255, 247, 255, 255, 255, 247, 255, 127, 255, 255, 255, 253, 255, 255, 247, 207, 255, 255, 150, 254, 247, 10, 132, 234, 150, 170, 150, 247, 247, 94, 
255, 251, 255, 15, 238, 251, 255, 15, }; /* Other_Math: 502 bytes. */ RE_UINT32 re_get_other_math(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 15; code = ch ^ (f << 15); pos = (RE_UINT32)re_other_math_stage_1[f] << 4; f = code >> 11; code ^= f << 11; pos = (RE_UINT32)re_other_math_stage_2[pos + f] << 3; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_other_math_stage_3[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_other_math_stage_4[pos + f] << 5; pos += code; value = (re_other_math_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Hex_Digit. */ static RE_UINT8 re_hex_digit_stage_1[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_hex_digit_stage_2[] = { 0, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_hex_digit_stage_3[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, }; static RE_UINT8 re_hex_digit_stage_4[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, }; static RE_UINT8 re_hex_digit_stage_5[] = { 0, 0, 0, 0, 0, 0, 255, 3, 126, 0, 0, 0, 126, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 255, 3, 126, 0, 0, 0, 126, 0, 0, 0, 0, 0, 0, 0, }; /* Hex_Digit: 129 bytes. */ RE_UINT32 re_get_hex_digit(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_hex_digit_stage_1[f] << 3; f = code >> 13; code ^= f << 13; pos = (RE_UINT32)re_hex_digit_stage_2[pos + f] << 3; f = code >> 10; code ^= f << 10; pos = (RE_UINT32)re_hex_digit_stage_3[pos + f] << 3; f = code >> 7; code ^= f << 7; pos = (RE_UINT32)re_hex_digit_stage_4[pos + f] << 7; pos += code; value = (re_hex_digit_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* ASCII_Hex_Digit. 
*/ static RE_UINT8 re_ascii_hex_digit_stage_1[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_ascii_hex_digit_stage_2[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_ascii_hex_digit_stage_3[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_ascii_hex_digit_stage_4[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_ascii_hex_digit_stage_5[] = { 0, 0, 0, 0, 0, 0, 255, 3, 126, 0, 0, 0, 126, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, }; /* ASCII_Hex_Digit: 97 bytes. */ RE_UINT32 re_get_ascii_hex_digit(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_ascii_hex_digit_stage_1[f] << 3; f = code >> 13; code ^= f << 13; pos = (RE_UINT32)re_ascii_hex_digit_stage_2[pos + f] << 3; f = code >> 10; code ^= f << 10; pos = (RE_UINT32)re_ascii_hex_digit_stage_3[pos + f] << 3; f = code >> 7; code ^= f << 7; pos = (RE_UINT32)re_ascii_hex_digit_stage_4[pos + f] << 7; pos += code; value = (re_ascii_hex_digit_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Other_Alphabetic. 
*/ static RE_UINT8 re_other_alphabetic_stage_1[] = { 0, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, }; static RE_UINT8 re_other_alphabetic_stage_2[] = { 0, 1, 2, 3, 4, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 8, 6, 6, 6, 6, 6, 6, 6, 6, 6, 9, 10, 11, 12, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 13, 6, 6, 6, 6, 6, 6, 6, 6, 6, 14, 6, 6, 6, 6, 6, 6, 15, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, }; static RE_UINT8 re_other_alphabetic_stage_3[] = { 0, 0, 0, 1, 0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 0, 0, 14, 0, 0, 0, 15, 16, 17, 18, 19, 20, 21, 0, 0, 0, 0, 0, 0, 22, 0, 0, 0, 0, 0, 0, 0, 0, 23, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 24, 0, 25, 26, 27, 28, 0, 0, 0, 0, 0, 0, 0, 29, 0, 0, 0, 0, 0, 0, 0, 30, 0, 0, 0, 0, 0, 0, 31, 0, 0, 0, 0, 0, 32, 33, 34, 35, 36, 37, 38, 39, 0, 0, 0, 40, 0, 0, 0, 41, 0, 0, 0, 0, 42, 0, 0, 0, 0, 43, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_other_alphabetic_stage_4[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 3, 0, 4, 0, 5, 6, 0, 0, 7, 8, 9, 10, 0, 0, 0, 11, 0, 0, 12, 13, 0, 0, 0, 0, 0, 14, 15, 16, 17, 18, 19, 20, 21, 18, 19, 20, 22, 23, 19, 20, 24, 18, 19, 20, 25, 18, 26, 20, 27, 0, 15, 20, 28, 18, 19, 20, 28, 18, 19, 20, 29, 18, 18, 0, 30, 31, 0, 32, 33, 0, 0, 34, 33, 0, 0, 0, 0, 35, 36, 37, 0, 0, 0, 38, 39, 40, 41, 0, 0, 0, 0, 0, 42, 0, 0, 0, 0, 0, 31, 31, 31, 31, 0, 43, 44, 0, 0, 0, 0, 0, 0, 45, 0, 0, 0, 46, 0, 0, 0, 0, 0, 0, 47, 0, 48, 49, 0, 0, 0, 0, 50, 51, 15, 0, 52, 53, 0, 54, 0, 55, 0, 0, 0, 0, 0, 31, 0, 0, 0, 0, 0, 0, 0, 56, 0, 0, 0, 0, 0, 43, 57, 58, 0, 0, 0, 0, 0, 0, 0, 57, 0, 0, 0, 59, 20, 0, 0, 0, 0, 60, 0, 0, 61, 62, 15, 0, 0, 63, 64, 0, 15, 62, 0, 0, 0, 65, 66, 0, 0, 67, 0, 68, 0, 0, 0, 0, 0, 0, 0, 69, 70, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 71, 0, 0, 0, 0, 72, 0, 0, 0, 0, 0, 0, 0, 52, 73, 74, 0, 26, 75, 0, 0, 52, 64, 0, 0, 52, 76, 0, 0, 0, 77, 0, 0, 0, 0, 42, 44, 15, 20, 21, 18, 0, 0, 0, 0, 0, 0, 0, 0, 0, 10, 61, 0, 0, 0, 0, 0, 0, 78, 
79, 0, 0, 80, 81, 0, 0, 82, 0, 0, 83, 84, 0, 0, 0, 0, 0, 0, 0, 85, 0, 0, 0, 0, 0, 0, 0, 0, 35, 86, 0, 0, 0, 0, 0, 0, 0, 0, 70, 0, 0, 0, 0, 10, 87, 87, 58, 0, 0, 0, }; static RE_UINT8 re_other_alphabetic_stage_5[] = { 0, 0, 0, 0, 32, 0, 0, 0, 0, 0, 255, 191, 182, 0, 0, 0, 0, 0, 255, 7, 0, 248, 255, 254, 0, 0, 1, 0, 0, 0, 192, 31, 158, 33, 0, 0, 0, 0, 2, 0, 0, 0, 255, 255, 192, 255, 1, 0, 0, 0, 192, 248, 239, 30, 0, 0, 248, 3, 255, 255, 15, 0, 0, 0, 0, 0, 0, 204, 255, 223, 224, 0, 12, 0, 0, 0, 14, 0, 0, 0, 0, 0, 0, 192, 159, 25, 128, 0, 135, 25, 2, 0, 0, 0, 35, 0, 191, 27, 0, 0, 159, 25, 192, 0, 4, 0, 0, 0, 199, 29, 128, 0, 223, 29, 96, 0, 223, 29, 128, 0, 0, 128, 95, 255, 0, 0, 12, 0, 0, 0, 242, 7, 0, 32, 0, 0, 0, 0, 242, 27, 0, 0, 254, 255, 3, 224, 255, 254, 255, 255, 255, 31, 0, 248, 127, 121, 0, 0, 192, 195, 133, 1, 30, 0, 124, 0, 0, 48, 0, 0, 0, 128, 0, 0, 192, 255, 255, 1, 0, 0, 0, 2, 0, 0, 255, 15, 255, 1, 0, 0, 128, 15, 0, 0, 224, 127, 254, 255, 31, 0, 31, 0, 0, 0, 0, 0, 224, 255, 7, 0, 0, 0, 254, 51, 0, 0, 128, 255, 3, 0, 240, 255, 63, 0, 128, 255, 31, 0, 255, 255, 255, 255, 255, 3, 0, 0, 0, 0, 240, 15, 248, 0, 0, 0, 3, 0, 0, 0, 0, 0, 240, 255, 192, 7, 0, 0, 128, 255, 7, 0, 0, 254, 127, 0, 8, 48, 0, 0, 0, 0, 157, 65, 0, 248, 32, 0, 248, 7, 0, 0, 0, 0, 0, 64, 0, 0, 192, 7, 110, 240, 0, 0, 0, 0, 0, 255, 63, 0, 0, 0, 0, 0, 255, 1, 0, 0, 248, 255, 0, 240, 159, 0, 0, 128, 63, 127, 0, 0, 0, 48, 0, 0, 255, 127, 1, 0, 0, 0, 0, 248, 63, 0, 0, 0, 0, 224, 255, 7, 0, 0, 0, 0, 127, 0, 255, 255, 255, 127, 255, 3, 255, 255, }; /* Other_Alphabetic: 945 bytes. 
*/ RE_UINT32 re_get_other_alphabetic(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_other_alphabetic_stage_1[f] << 5; f = code >> 11; code ^= f << 11; pos = (RE_UINT32)re_other_alphabetic_stage_2[pos + f] << 3; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_other_alphabetic_stage_3[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_other_alphabetic_stage_4[pos + f] << 5; pos += code; value = (re_other_alphabetic_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Ideographic. */ static RE_UINT8 re_ideographic_stage_1[] = { 0, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_ideographic_stage_2[] = { 0, 0, 0, 0, 0, 0, 1, 2, 2, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 6, 2, 7, 8, 2, 9, 0, 0, 0, 0, 0, 10, }; static RE_UINT8 re_ideographic_stage_3[] = { 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 4, 0, 2, 5, 0, 0, 0, 0, 0, 2, 2, 2, 2, 2, 2, 6, 2, 2, 2, 2, 2, 2, 2, 2, 7, 8, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 9, 0, 2, 2, 10, 0, 0, 0, 0, 0, }; static RE_UINT8 re_ideographic_stage_4[] = { 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 0, 0, 0, 0, 0, 0, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 0, 0, 3, 3, 3, 3, 3, 3, 4, 0, 3, 3, 3, 5, 3, 3, 6, 0, 3, 3, 3, 3, 3, 3, 7, 0, 3, 8, 3, 3, 3, 3, 3, 3, 9, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 10, 0, 0, 9, 0, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_ideographic_stage_5[] = { 0, 0, 0, 0, 192, 0, 0, 0, 254, 3, 0, 7, 255, 255, 255, 255, 255, 255, 63, 0, 255, 63, 255, 255, 255, 255, 255, 3, 255, 255, 127, 0, 255, 255, 31, 0, 255, 255, 255, 63, 3, 0, 0, 0, }; /* Ideographic: 333 bytes. 
*/ RE_UINT32 re_get_ideographic(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_ideographic_stage_1[f] << 5; f = code >> 11; code ^= f << 11; pos = (RE_UINT32)re_ideographic_stage_2[pos + f] << 3; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_ideographic_stage_3[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_ideographic_stage_4[pos + f] << 5; pos += code; value = (re_ideographic_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Diacritic. */ static RE_UINT8 re_diacritic_stage_1[] = { 0, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, }; static RE_UINT8 re_diacritic_stage_2[] = { 0, 1, 2, 3, 4, 5, 6, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 7, 8, 4, 4, 4, 4, 4, 4, 4, 4, 4, 9, 10, 11, 12, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 13, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 14, 4, 4, 15, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, }; static RE_UINT8 re_diacritic_stage_3[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1, 1, 1, 1, 1, 1, 17, 1, 18, 19, 20, 21, 22, 1, 23, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 24, 1, 25, 1, 26, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 27, 28, 29, 30, 31, 32, 1, 1, 1, 1, 1, 1, 1, 33, 1, 1, 34, 35, 1, 1, 36, 1, 1, 1, 1, 1, 1, 1, 37, 1, 1, 1, 1, 1, 38, 39, 40, 41, 42, 43, 44, 45, 1, 1, 46, 1, 1, 1, 1, 47, 1, 48, 1, 1, 1, 1, 1, 1, 49, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_diacritic_stage_4[] = { 0, 0, 1, 2, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 5, 5, 5, 5, 6, 7, 8, 0, 0, 0, 0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 10, 0, 11, 12, 13, 0, 0, 0, 14, 0, 0, 0, 15, 16, 0, 4, 17, 0, 0, 18, 0, 19, 20, 0, 0, 0, 0, 0, 0, 21, 0, 22, 23, 24, 0, 22, 25, 0, 0, 22, 25, 0, 0, 22, 25, 0, 0, 22, 25, 0, 0, 0, 25, 0, 0, 0, 25, 0, 0, 22, 25, 0, 0, 0, 25, 0, 0, 0, 26, 0, 0, 0, 27, 0, 0, 0, 28, 0, 20, 29, 0, 0, 30, 0, 31, 0, 0, 32, 0, 0, 33, 0, 0, 0, 0, 0, 0, 0, 0, 0, 34, 0, 0, 35, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 36, 0, 37, 0, 0, 0, 38, 39, 40, 0, 41, 0, 0, 0, 42, 0, 43, 0, 0, 4, 44, 0, 45, 5, 17, 0, 0, 46, 47, 0, 0, 0, 0, 0, 48, 49, 50, 0, 0, 0, 0, 0, 0, 0, 51, 0, 52, 0, 0, 0, 0, 0, 0, 0, 53, 0, 0, 54, 0, 0, 22, 0, 0, 0, 55, 56, 0, 0, 57, 58, 59, 0, 0, 60, 0, 0, 20, 0, 0, 0, 0, 0, 0, 39, 61, 0, 62, 63, 0, 0, 63, 2, 64, 0, 0, 0, 65, 0, 15, 66, 67, 0, 0, 68, 0, 0, 0, 0, 69, 1, 0, 0, 0, 0, 0, 0, 0, 0, 70, 0, 0, 0, 0, 0, 0, 0, 1, 2, 71, 72, 0, 0, 73, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 74, 0, 0, 0, 0, 0, 75, 0, 0, 0, 76, 0, 63, 0, 0, 77, 0, 0, 78, 0, 0, 0, 0, 0, 79, 0, 22, 25, 80, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 81, 0, 0, 0, 0, 0, 0, 15, 2, 0, 0, 15, 0, 0, 0, 42, 0, 0, 0, 82, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 83, 0, 0, 0, 0, 84, 0, 0, 0, 0, 0, 0, 85, 86, 87, 0, 0, 0, 0, 0, 0, 0, 0, 88, 0, }; static RE_UINT8 re_diacritic_stage_5[] = { 0, 0, 0, 0, 0, 0, 0, 64, 1, 0, 0, 0, 0, 129, 144, 1, 0, 0, 255, 255, 255, 255, 255, 255, 255, 127, 255, 224, 7, 0, 48, 4, 48, 0, 0, 0, 248, 0, 0, 0, 0, 0, 0, 2, 0, 0, 254, 255, 251, 255, 255, 191, 22, 0, 0, 0, 0, 248, 135, 1, 0, 0, 0, 128, 97, 28, 0, 0, 255, 7, 0, 0, 192, 255, 1, 0, 0, 248, 63, 0, 0, 0, 0, 3, 248, 255, 255, 127, 0, 0, 0, 16, 0, 32, 30, 0, 0, 0, 2, 0, 0, 32, 0, 0, 0, 4, 0, 0, 128, 95, 0, 0, 0, 31, 0, 0, 0, 0, 160, 194, 220, 0, 0, 0, 64, 0, 0, 0, 0, 0, 128, 6, 128, 191, 0, 12, 0, 254, 15, 32, 0, 0, 0, 14, 0, 0, 224, 159, 0, 0, 255, 63, 0, 0, 16, 0, 16, 0, 0, 0, 0, 248, 15, 0, 0, 12, 0, 0, 0, 0, 192, 0, 0, 0, 0, 63, 255, 33, 16, 3, 0, 240, 255, 255, 240, 255, 0, 0, 0, 0, 32, 224, 0, 0, 0, 160, 3, 224, 0, 224, 0, 224, 0, 96, 0, 128, 3, 0, 0, 128, 0, 0, 0, 252, 0, 0, 0, 0, 0, 30, 0, 128, 0, 176, 0, 0, 0, 48, 0, 0, 3, 0, 0, 0, 128, 255, 3, 0, 0, 0, 0, 1, 0, 0, 255, 255, 3, 0, 0, 120, 0, 0, 0, 0, 8, 0, 32, 0, 0, 0, 0, 0, 0, 56, 7, 0, 0, 0, 0, 0, 64, 0, 0, 0, 0, 248, 0, 48, 0, 0, 255, 255, 0, 0, 0, 0, 1, 0, 0, 0, 0, 192, 8, 0, 0, 0, 96, 0, 0, 0, 0, 0, 0, 6, 0, 0, 24, 0, 1, 28, 0, 0, 0, 0, 96, 0, 0, 6, 0, 
0, 192, 31, 31, 0, 12, 0, 0, 0, 0, 8, 0, 0, 0, 0, 31, 0, 0, 128, 255, 255, 128, 227, 7, 248, 231, 15, 0, 0, 0, 60, 0, 0, 0, 0, 127, 0, }; /* Diacritic: 997 bytes. */ RE_UINT32 re_get_diacritic(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_diacritic_stage_1[f] << 5; f = code >> 11; code ^= f << 11; pos = (RE_UINT32)re_diacritic_stage_2[pos + f] << 3; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_diacritic_stage_3[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_diacritic_stage_4[pos + f] << 5; pos += code; value = (re_diacritic_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Extender. */ static RE_UINT8 re_extender_stage_1[] = { 0, 1, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, }; static RE_UINT8 re_extender_stage_2[] = { 0, 1, 2, 3, 2, 2, 4, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 5, 6, 2, 2, 2, 2, 2, 2, 2, 2, 2, 7, 2, 2, 8, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 9, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, }; static RE_UINT8 re_extender_stage_3[] = { 0, 1, 2, 1, 1, 1, 3, 4, 1, 1, 1, 1, 1, 1, 5, 1, 1, 1, 1, 1, 1, 1, 1, 1, 6, 1, 7, 1, 8, 1, 1, 1, 9, 1, 1, 1, 1, 1, 1, 1, 10, 1, 1, 1, 1, 1, 11, 1, 1, 12, 13, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 14, 1, 1, 1, 15, 1, 16, 1, 1, 1, 1, 1, 17, 1, 1, 1, 1, }; static RE_UINT8 re_extender_stage_4[] = { 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 5, 0, 0, 0, 5, 0, 6, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 0, 0, 0, 9, 0, 10, 0, 0, 0, 0, 11, 12, 0, 0, 13, 0, 0, 14, 15, 0, 0, 0, 0, 0, 0, 0, 16, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 17, 5, 0, 0, 0, 18, 0, 0, 19, 20, 0, 0, 0, 18, 0, 0, 0, 0, 0, 0, 19, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 21, 0, 0, 0, 22, 0, 0, 0, 0, 0, }; static RE_UINT8 re_extender_stage_5[] = { 0, 0, 0, 0, 0, 0, 128, 0, 0, 0, 3, 0, 1, 0, 0, 0, 0, 0, 0, 4, 64, 0, 0, 
0, 0, 4, 0, 0, 8, 0, 0, 0, 128, 0, 0, 0, 0, 0, 64, 0, 0, 0, 0, 8, 32, 0, 0, 0, 0, 0, 62, 0, 0, 0, 0, 96, 0, 0, 0, 112, 0, 0, 32, 0, 0, 16, 0, 0, 0, 128, 0, 0, 0, 0, 1, 0, 0, 0, 0, 32, 0, 0, 24, 0, 192, 1, 0, 0, 12, 0, 0, 0, }; /* Extender: 414 bytes. */ RE_UINT32 re_get_extender(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 15; code = ch ^ (f << 15); pos = (RE_UINT32)re_extender_stage_1[f] << 4; f = code >> 11; code ^= f << 11; pos = (RE_UINT32)re_extender_stage_2[pos + f] << 3; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_extender_stage_3[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_extender_stage_4[pos + f] << 5; pos += code; value = (re_extender_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Other_Lowercase. */ static RE_UINT8 re_other_lowercase_stage_1[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_other_lowercase_stage_2[] = { 0, 1, 2, 3, 3, 3, 3, 3, 3, 3, 4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, }; static RE_UINT8 re_other_lowercase_stage_3[] = { 0, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 2, 4, 2, 5, 2, 2, 2, 6, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 7, 2, 8, 2, 2, }; static RE_UINT8 re_other_lowercase_stage_4[] = { 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 2, 3, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5, 6, 7, 0, 0, 8, 9, 0, 0, 10, 0, 0, 0, 0, 0, 11, 0, 0, 0, 0, 0, 12, 0, 0, 0, 0, 0, 0, 0, 0, 13, 0, 0, 14, 0, 15, 0, 0, 0, 0, 0, 16, 0, 0, }; static RE_UINT8 re_other_lowercase_stage_5[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 4, 0, 0, 0, 0, 0, 0, 255, 1, 3, 0, 0, 0, 31, 0, 0, 0, 32, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 240, 255, 255, 255, 255, 255, 255, 255, 7, 0, 1, 0, 0, 0, 248, 255, 255, 255, 255, 0, 0, 0, 0, 0, 0, 2, 128, 0, 0, 255, 31, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 255, 255, 0, 0, 255, 255, 255, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 48, 0, 0, 0, 48, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 240, 0, 
0, 0, 0, }; /* Other_Lowercase: 297 bytes. */ RE_UINT32 re_get_other_lowercase(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_other_lowercase_stage_1[f] << 4; f = code >> 12; code ^= f << 12; pos = (RE_UINT32)re_other_lowercase_stage_2[pos + f] << 3; f = code >> 9; code ^= f << 9; pos = (RE_UINT32)re_other_lowercase_stage_3[pos + f] << 3; f = code >> 6; code ^= f << 6; pos = (RE_UINT32)re_other_lowercase_stage_4[pos + f] << 6; pos += code; value = (re_other_lowercase_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Other_Uppercase. */ static RE_UINT8 re_other_uppercase_stage_1[] = { 0, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_other_uppercase_stage_2[] = { 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, }; static RE_UINT8 re_other_uppercase_stage_3[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 2, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_other_uppercase_stage_4[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 1, 0, 0, 3, 4, 4, 5, 0, 0, 0, }; static RE_UINT8 re_other_uppercase_stage_5[] = { 0, 0, 0, 0, 255, 255, 0, 0, 0, 0, 192, 255, 0, 0, 255, 255, 255, 3, 255, 255, 255, 3, 0, 0, }; /* Other_Uppercase: 162 bytes. 
*/ RE_UINT32 re_get_other_uppercase(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 15; code = ch ^ (f << 15); pos = (RE_UINT32)re_other_uppercase_stage_1[f] << 4; f = code >> 11; code ^= f << 11; pos = (RE_UINT32)re_other_uppercase_stage_2[pos + f] << 3; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_other_uppercase_stage_3[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_other_uppercase_stage_4[pos + f] << 5; pos += code; value = (re_other_uppercase_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Noncharacter_Code_Point. */ static RE_UINT8 re_noncharacter_code_point_stage_1[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_noncharacter_code_point_stage_2[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, }; static RE_UINT8 re_noncharacter_code_point_stage_3[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 0, 0, 0, 0, 0, 0, 0, 2, }; static RE_UINT8 re_noncharacter_code_point_stage_4[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 2, }; static RE_UINT8 re_noncharacter_code_point_stage_5[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 255, 255, 255, 255, 0, 0, 0, 0, 0, 0, 0, 0, 0, 192, }; /* Noncharacter_Code_Point: 121 bytes. */ RE_UINT32 re_get_noncharacter_code_point(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_noncharacter_code_point_stage_1[f] << 4; f = code >> 12; code ^= f << 12; pos = (RE_UINT32)re_noncharacter_code_point_stage_2[pos + f] << 3; f = code >> 9; code ^= f << 9; pos = (RE_UINT32)re_noncharacter_code_point_stage_3[pos + f] << 3; f = code >> 6; code ^= f << 6; pos = (RE_UINT32)re_noncharacter_code_point_stage_4[pos + f] << 6; pos += code; value = (re_noncharacter_code_point_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Other_Grapheme_Extend. 
*/ static RE_UINT8 re_other_grapheme_extend_stage_1[] = { 0, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, }; static RE_UINT8 re_other_grapheme_extend_stage_2[] = { 0, 1, 2, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 4, 1, 5, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 6, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_other_grapheme_extend_stage_3[] = { 0, 0, 0, 0, 1, 2, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 0, 7, 8, 0, 0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_other_grapheme_extend_stage_4[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 0, 0, 0, 0, 1, 2, 1, 2, 0, 0, 0, 3, 1, 2, 0, 4, 5, 0, 0, 0, 0, 0, 0, 0, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 1, 2, 0, 0, 0, 0, 8, 0, 0, 0, 9, 0, 0, 0, 0, 0, 0, 10, 0, 0, }; static RE_UINT8 re_other_grapheme_extend_stage_5[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 64, 0, 0, 128, 0, 0, 0, 0, 0, 4, 0, 96, 0, 0, 0, 0, 0, 0, 128, 0, 128, 0, 0, 0, 0, 0, 48, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 192, 0, 0, 0, 0, 0, 192, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 32, 0, 0, 0, 0, 0, 128, 0, 0, 0, 0, 0, 0, 32, 192, 7, 0, }; /* Other_Grapheme_Extend: 289 bytes. */ RE_UINT32 re_get_other_grapheme_extend(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_other_grapheme_extend_stage_1[f] << 4; f = code >> 12; code ^= f << 12; pos = (RE_UINT32)re_other_grapheme_extend_stage_2[pos + f] << 3; f = code >> 9; code ^= f << 9; pos = (RE_UINT32)re_other_grapheme_extend_stage_3[pos + f] << 3; f = code >> 6; code ^= f << 6; pos = (RE_UINT32)re_other_grapheme_extend_stage_4[pos + f] << 6; pos += code; value = (re_other_grapheme_extend_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* IDS_Binary_Operator. 
*/ static RE_UINT8 re_ids_binary_operator_stage_1[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_ids_binary_operator_stage_2[] = { 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_ids_binary_operator_stage_3[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, }; static RE_UINT8 re_ids_binary_operator_stage_4[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, }; static RE_UINT8 re_ids_binary_operator_stage_5[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 243, 15, }; /* IDS_Binary_Operator: 97 bytes. */ RE_UINT32 re_get_ids_binary_operator(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_ids_binary_operator_stage_1[f] << 4; f = code >> 12; code ^= f << 12; pos = (RE_UINT32)re_ids_binary_operator_stage_2[pos + f] << 3; f = code >> 9; code ^= f << 9; pos = (RE_UINT32)re_ids_binary_operator_stage_3[pos + f] << 3; f = code >> 6; code ^= f << 6; pos = (RE_UINT32)re_ids_binary_operator_stage_4[pos + f] << 6; pos += code; value = (re_ids_binary_operator_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* IDS_Trinary_Operator. */ static RE_UINT8 re_ids_trinary_operator_stage_1[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_ids_trinary_operator_stage_2[] = { 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_ids_trinary_operator_stage_3[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, }; static RE_UINT8 re_ids_trinary_operator_stage_4[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, }; static RE_UINT8 re_ids_trinary_operator_stage_5[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 12, 0, }; /* IDS_Trinary_Operator: 97 bytes. 
*/ RE_UINT32 re_get_ids_trinary_operator(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_ids_trinary_operator_stage_1[f] << 4; f = code >> 12; code ^= f << 12; pos = (RE_UINT32)re_ids_trinary_operator_stage_2[pos + f] << 3; f = code >> 9; code ^= f << 9; pos = (RE_UINT32)re_ids_trinary_operator_stage_3[pos + f] << 3; f = code >> 6; code ^= f << 6; pos = (RE_UINT32)re_ids_trinary_operator_stage_4[pos + f] << 6; pos += code; value = (re_ids_trinary_operator_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Radical. */ static RE_UINT8 re_radical_stage_1[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_radical_stage_2[] = { 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_radical_stage_3[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, }; static RE_UINT8 re_radical_stage_4[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 3, 2, 2, 2, 2, 2, 2, 4, 0, }; static RE_UINT8 re_radical_stage_5[] = { 0, 0, 0, 0, 255, 255, 255, 251, 255, 255, 255, 255, 255, 255, 15, 0, 255, 255, 63, 0, }; /* Radical: 117 bytes. */ RE_UINT32 re_get_radical(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_radical_stage_1[f] << 4; f = code >> 12; code ^= f << 12; pos = (RE_UINT32)re_radical_stage_2[pos + f] << 3; f = code >> 9; code ^= f << 9; pos = (RE_UINT32)re_radical_stage_3[pos + f] << 4; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_radical_stage_4[pos + f] << 5; pos += code; value = (re_radical_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Unified_Ideograph. 
*/ static RE_UINT8 re_unified_ideograph_stage_1[] = { 0, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_unified_ideograph_stage_2[] = { 0, 0, 0, 1, 2, 3, 3, 3, 3, 4, 0, 0, 0, 0, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 6, 7, 8, 0, 0, 0, }; static RE_UINT8 re_unified_ideograph_stage_3[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 0, 0, 0, 0, 0, 4, 0, 0, 1, 1, 1, 5, 1, 1, 1, 1, 1, 1, 1, 6, 7, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 8, }; static RE_UINT8 re_unified_ideograph_stage_4[] = { 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 0, 1, 1, 1, 1, 1, 1, 1, 3, 4, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 5, 1, 1, 1, 1, 1, 1, 1, 1, 6, 1, 1, 1, 7, 1, 1, 1, 1, 1, 1, 1, 1, 1, 8, 0, 0, 0, 0, 0, }; static RE_UINT8 re_unified_ideograph_stage_5[] = { 0, 0, 0, 0, 0, 0, 0, 0, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 63, 0, 255, 255, 63, 0, 0, 0, 0, 0, 0, 192, 26, 128, 154, 3, 0, 0, 255, 255, 127, 0, 0, 0, 0, 0, 255, 255, 255, 255, 255, 255, 31, 0, 255, 255, 255, 63, 255, 255, 255, 255, 255, 255, 255, 255, 3, 0, 0, 0, }; /* Unified_Ideograph: 281 bytes. */ RE_UINT32 re_get_unified_ideograph(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_unified_ideograph_stage_1[f] << 4; f = code >> 12; code ^= f << 12; pos = (RE_UINT32)re_unified_ideograph_stage_2[pos + f] << 3; f = code >> 9; code ^= f << 9; pos = (RE_UINT32)re_unified_ideograph_stage_3[pos + f] << 3; f = code >> 6; code ^= f << 6; pos = (RE_UINT32)re_unified_ideograph_stage_4[pos + f] << 6; pos += code; value = (re_unified_ideograph_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Other_Default_Ignorable_Code_Point. 
*/ static RE_UINT8 re_other_default_ignorable_code_point_stage_1[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, }; static RE_UINT8 re_other_default_ignorable_code_point_stage_2[] = { 0, 1, 2, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 6, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, }; static RE_UINT8 re_other_default_ignorable_code_point_stage_3[] = { 0, 1, 0, 0, 0, 0, 0, 0, 2, 0, 0, 3, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 7, 8, 8, 8, 8, 8, 8, 8, }; static RE_UINT8 re_other_default_ignorable_code_point_stage_4[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 6, 7, 8, 0, 9, 9, 0, 0, 0, 10, 9, 9, 9, 9, 9, 9, 9, 9, }; static RE_UINT8 re_other_default_ignorable_code_point_stage_5[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 128, 0, 0, 0, 0, 0, 0, 0, 0, 0, 128, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 48, 0, 0, 0, 0, 0, 32, 0, 0, 0, 0, 0, 0, 0, 16, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 255, 1, 253, 255, 255, 255, 0, 0, 0, 0, 255, 255, 255, 255, 255, 255, 255, 255, 0, 0, 0, 0, 0, 0, 255, 255, }; /* Other_Default_Ignorable_Code_Point: 281 bytes. */ RE_UINT32 re_get_other_default_ignorable_code_point(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_other_default_ignorable_code_point_stage_1[f] << 4; f = code >> 12; code ^= f << 12; pos = (RE_UINT32)re_other_default_ignorable_code_point_stage_2[pos + f] << 3; f = code >> 9; code ^= f << 9; pos = (RE_UINT32)re_other_default_ignorable_code_point_stage_3[pos + f] << 3; f = code >> 6; code ^= f << 6; pos = (RE_UINT32)re_other_default_ignorable_code_point_stage_4[pos + f] << 6; pos += code; value = (re_other_default_ignorable_code_point_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Deprecated. 
*/ static RE_UINT8 re_deprecated_stage_1[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, }; static RE_UINT8 re_deprecated_stage_2[] = { 0, 1, 2, 3, 4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 5, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, }; static RE_UINT8 re_deprecated_stage_3[] = { 0, 1, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 5, 0, 0, 6, 0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_deprecated_stage_4[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 5, 0, 0, 0, 0, 0, 6, 0, 0, 0, 0, 0, 0, 7, 0, 0, 8, 0, 0, 0, 0, }; static RE_UINT8 re_deprecated_stage_5[] = { 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 8, 0, 0, 0, 128, 2, 24, 0, 0, 0, 0, 252, 0, 0, 0, 6, 0, 0, 2, 0, 0, 0, 0, 0, 0, 128, }; /* Deprecated: 230 bytes. */ RE_UINT32 re_get_deprecated(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 15; code = ch ^ (f << 15); pos = (RE_UINT32)re_deprecated_stage_1[f] << 4; f = code >> 11; code ^= f << 11; pos = (RE_UINT32)re_deprecated_stage_2[pos + f] << 3; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_deprecated_stage_3[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_deprecated_stage_4[pos + f] << 5; pos += code; value = (re_deprecated_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Soft_Dotted. 
*/ static RE_UINT8 re_soft_dotted_stage_1[] = { 0, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_soft_dotted_stage_2[] = { 0, 1, 1, 2, 3, 4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 5, 1, 1, 1, 1, 1, }; static RE_UINT8 re_soft_dotted_stage_3[] = { 0, 1, 2, 3, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 7, 5, 8, 9, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 10, 5, 5, 5, 5, 5, 5, 5, 11, 12, 13, 5, }; static RE_UINT8 re_soft_dotted_stage_4[] = { 0, 0, 0, 1, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 3, 4, 5, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 0, 0, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 9, 10, 11, 0, 0, 0, 12, 0, 0, 0, 0, 13, 0, 0, 0, 0, 14, 0, 0, 0, 0, 0, 0, 15, 0, 0, 0, 0, 0, 0, 0, 0, 16, 0, 0, 0, 0, 0, 17, 18, 0, 19, 20, 0, 21, 0, 22, 23, 0, 24, 0, 17, 18, 0, 19, 20, 0, 21, 0, 0, 0, }; static RE_UINT8 re_soft_dotted_stage_5[] = { 0, 0, 0, 0, 0, 6, 0, 0, 0, 128, 0, 0, 0, 2, 0, 0, 0, 1, 0, 0, 0, 0, 0, 32, 0, 0, 4, 0, 0, 0, 8, 0, 0, 0, 64, 1, 4, 0, 0, 0, 0, 0, 64, 0, 16, 1, 0, 0, 0, 32, 0, 0, 0, 8, 0, 0, 0, 0, 2, 0, 0, 3, 0, 0, 0, 0, 0, 16, 12, 0, 0, 0, 0, 0, 192, 0, 0, 12, 0, 0, 0, 0, 0, 192, 0, 0, 12, 0, 192, 0, 0, 0, 0, 0, 0, 12, 0, 192, 0, 0, }; /* Soft_Dotted: 342 bytes. */ RE_UINT32 re_get_soft_dotted(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 15; code = ch ^ (f << 15); pos = (RE_UINT32)re_soft_dotted_stage_1[f] << 4; f = code >> 11; code ^= f << 11; pos = (RE_UINT32)re_soft_dotted_stage_2[pos + f] << 3; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_soft_dotted_stage_3[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_soft_dotted_stage_4[pos + f] << 5; pos += code; value = (re_soft_dotted_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Logical_Order_Exception. 
*/ static RE_UINT8 re_logical_order_exception_stage_1[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_logical_order_exception_stage_2[] = { 0, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, }; static RE_UINT8 re_logical_order_exception_stage_3[] = { 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, }; static RE_UINT8 re_logical_order_exception_stage_4[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0, 0, 0, }; static RE_UINT8 re_logical_order_exception_stage_5[] = { 0, 0, 0, 0, 0, 0, 0, 0, 31, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 224, 4, 0, 0, 0, 0, 0, 0, 96, 26, }; /* Logical_Order_Exception: 145 bytes. */ RE_UINT32 re_get_logical_order_exception(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_logical_order_exception_stage_1[f] << 4; f = code >> 12; code ^= f << 12; pos = (RE_UINT32)re_logical_order_exception_stage_2[pos + f] << 3; f = code >> 9; code ^= f << 9; pos = (RE_UINT32)re_logical_order_exception_stage_3[pos + f] << 3; f = code >> 6; code ^= f << 6; pos = (RE_UINT32)re_logical_order_exception_stage_4[pos + f] << 6; pos += code; value = (re_logical_order_exception_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Other_ID_Start. 
*/ static RE_UINT8 re_other_id_start_stage_1[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_other_id_start_stage_2[] = { 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_other_id_start_stage_3[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_other_id_start_stage_4[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, }; static RE_UINT8 re_other_id_start_stage_5[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 64, 0, 0, 0, 0, 0, 24, 0, 0, 0, 0, }; /* Other_ID_Start: 113 bytes. */ RE_UINT32 re_get_other_id_start(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_other_id_start_stage_1[f] << 3; f = code >> 13; code ^= f << 13; pos = (RE_UINT32)re_other_id_start_stage_2[pos + f] << 4; f = code >> 9; code ^= f << 9; pos = (RE_UINT32)re_other_id_start_stage_3[pos + f] << 3; f = code >> 6; code ^= f << 6; pos = (RE_UINT32)re_other_id_start_stage_4[pos + f] << 6; pos += code; value = (re_other_id_start_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Other_ID_Continue. */ static RE_UINT8 re_other_id_continue_stage_1[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_other_id_continue_stage_2[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_other_id_continue_stage_3[] = { 0, 1, 2, 2, 2, 2, 2, 2, 2, 3, 2, 2, 4, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, }; static RE_UINT8 re_other_id_continue_stage_4[] = { 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, }; static RE_UINT8 re_other_id_continue_stage_5[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 128, 0, 128, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 254, 3, 0, 0, 0, 0, 4, 0, 0, 0, 0, }; /* Other_ID_Continue: 145 bytes. 
*/ RE_UINT32 re_get_other_id_continue(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_other_id_continue_stage_1[f] << 3; f = code >> 13; code ^= f << 13; pos = (RE_UINT32)re_other_id_continue_stage_2[pos + f] << 4; f = code >> 9; code ^= f << 9; pos = (RE_UINT32)re_other_id_continue_stage_3[pos + f] << 3; f = code >> 6; code ^= f << 6; pos = (RE_UINT32)re_other_id_continue_stage_4[pos + f] << 6; pos += code; value = (re_other_id_continue_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* STerm. */ static RE_UINT8 re_sterm_stage_1[] = { 0, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, }; static RE_UINT8 re_sterm_stage_2[] = { 0, 1, 2, 3, 4, 5, 6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 8, 9, 7, 7, 7, 7, 7, 7, 7, 7, 7, 10, 7, 11, 12, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 13, 7, 7, 7, 7, 7, 7, 7, 7, 7, 14, 7, 7, 7, 15, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, }; static RE_UINT8 re_sterm_stage_3[] = { 0, 1, 1, 1, 1, 2, 3, 4, 1, 5, 1, 1, 1, 1, 1, 1, 6, 1, 1, 7, 1, 1, 8, 9, 10, 11, 12, 13, 14, 1, 1, 1, 15, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 16, 1, 17, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 18, 1, 19, 1, 20, 21, 22, 23, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 24, 25, 1, 1, 26, 1, 1, 1, 1, 1, 27, 28, 29, 1, 1, 30, 31, 32, 1, 1, 33, 34, 1, 1, 1, 1, 1, 1, 1, 1, 35, 1, 1, 1, 1, 1, 36, 1, 1, 1, 1, 1, }; static RE_UINT8 re_sterm_stage_4[] = { 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0, 0, 0, 4, 0, 5, 0, 0, 0, 0, 0, 0, 6, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 8, 0, 0, 0, 0, 0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 0, 0, 10, 0, 0, 0, 0, 0, 11, 0, 0, 0, 0, 0, 0, 12, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 13, 0, 0, 0, 0, 14, 0, 0, 0, 0, 0, 0, 15, 0, 16, 0, 0, 0, 0, 0, 17, 18, 0, 0, 0, 0, 0, 0, 19, 0, 0, 0, 0, 0, 0, 20, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 21, 0, 0, 0, 
0, 0, 0, 22, 0, 0, 0, 23, 0, 0, 21, 0, 0, 24, 0, 0, 0, 0, 25, 0, 0, 0, 26, 0, 0, 0, 0, 27, 0, 0, 0, 0, 0, 0, 0, 28, 0, 0, 29, 0, 0, 0, 0, 0, 1, 0, 0, 30, 0, 0, 0, 0, 0, 0, 23, 0, 0, 0, 0, 0, 0, 0, 31, 0, 0, 16, 32, 0, 0, 0, 33, 0, 0, 0, 34, 0, 0, 35, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 36, 0, 0, 0, 37, 0, 0, 0, 0, 0, 0, 38, 0, 0, 0, 0, 0, 0, 0, 0, 0, 21, 0, 0, 0, 39, 0, 40, 41, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 42, 0, 0, 0, }; static RE_UINT8 re_sterm_stage_5[] = { 0, 0, 0, 0, 2, 64, 0, 128, 0, 2, 0, 0, 0, 0, 0, 128, 0, 0, 16, 0, 7, 0, 0, 0, 0, 0, 0, 2, 48, 0, 0, 0, 0, 12, 0, 0, 132, 1, 0, 0, 0, 64, 0, 0, 0, 0, 96, 0, 8, 2, 0, 0, 0, 15, 0, 0, 0, 0, 0, 204, 0, 0, 0, 24, 0, 0, 0, 192, 0, 0, 0, 48, 128, 3, 0, 0, 0, 64, 0, 16, 4, 0, 0, 0, 0, 192, 0, 0, 0, 0, 136, 0, 0, 0, 192, 0, 0, 128, 0, 0, 0, 3, 0, 0, 0, 0, 0, 224, 0, 0, 3, 0, 0, 8, 0, 0, 0, 0, 196, 0, 2, 0, 0, 0, 128, 1, 0, 0, 3, 0, 0, 0, 14, 0, 0, 0, 96, 32, 0, 192, 0, 0, 0, 27, 12, 254, 255, 0, 6, 0, 0, 0, 0, 0, 0, 112, 0, 0, 32, 0, 0, 0, 128, 1, 16, 0, 0, 0, 0, 1, 0, 0, }; /* STerm: 709 bytes. */ RE_UINT32 re_get_sterm(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_sterm_stage_1[f] << 5; f = code >> 11; code ^= f << 11; pos = (RE_UINT32)re_sterm_stage_2[pos + f] << 3; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_sterm_stage_3[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_sterm_stage_4[pos + f] << 5; pos += code; value = (re_sterm_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Variation_Selector. 
*/ static RE_UINT8 re_variation_selector_stage_1[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, }; static RE_UINT8 re_variation_selector_stage_2[] = { 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_variation_selector_stage_3[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 3, 0, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_variation_selector_stage_4[] = { 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 3, 3, 4, }; static RE_UINT8 re_variation_selector_stage_5[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 56, 0, 0, 0, 0, 0, 0, 255, 255, 0, 0, 0, 0, 0, 0, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 0, 0, }; /* Variation_Selector: 169 bytes. */ RE_UINT32 re_get_variation_selector(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_variation_selector_stage_1[f] << 4; f = code >> 12; code ^= f << 12; pos = (RE_UINT32)re_variation_selector_stage_2[pos + f] << 3; f = code >> 9; code ^= f << 9; pos = (RE_UINT32)re_variation_selector_stage_3[pos + f] << 3; f = code >> 6; code ^= f << 6; pos = (RE_UINT32)re_variation_selector_stage_4[pos + f] << 6; pos += code; value = (re_variation_selector_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Pattern_White_Space. 
*/ static RE_UINT8 re_pattern_white_space_stage_1[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_pattern_white_space_stage_2[] = { 0, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_pattern_white_space_stage_3[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_pattern_white_space_stage_4[] = { 0, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_pattern_white_space_stage_5[] = { 0, 62, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 32, 0, 0, 0, 0, 0, 0, 0, 0, 192, 0, 0, 0, 3, 0, 0, }; /* Pattern_White_Space: 129 bytes. */ RE_UINT32 re_get_pattern_white_space(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_pattern_white_space_stage_1[f] << 4; f = code >> 12; code ^= f << 12; pos = (RE_UINT32)re_pattern_white_space_stage_2[pos + f] << 3; f = code >> 9; code ^= f << 9; pos = (RE_UINT32)re_pattern_white_space_stage_3[pos + f] << 3; f = code >> 6; code ^= f << 6; pos = (RE_UINT32)re_pattern_white_space_stage_4[pos + f] << 6; pos += code; value = (re_pattern_white_space_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Pattern_Syntax. 
*/ static RE_UINT8 re_pattern_syntax_stage_1[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_pattern_syntax_stage_2[] = { 0, 1, 1, 1, 2, 3, 4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 5, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_pattern_syntax_stage_3[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 3, 4, 4, 5, 4, 4, 6, 4, 4, 4, 4, 1, 1, 7, 1, 8, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 9, 10, 1, }; static RE_UINT8 re_pattern_syntax_stage_4[] = { 0, 1, 2, 2, 0, 3, 4, 4, 0, 0, 0, 0, 0, 0, 0, 0, 5, 6, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 0, 0, 0, 0, 0, 8, 8, 8, 9, 10, 8, 8, 8, 8, 8, 8, 8, 0, 0, 0, 0, 11, 12, 0, 0, 0, 0, 0, 0, 0, 13, 0, 0, 0, 0, 0, 0, 0, 0, 14, 0, 0, 0, 0, 0, }; static RE_UINT8 re_pattern_syntax_stage_5[] = { 0, 0, 0, 0, 254, 255, 0, 252, 1, 0, 0, 120, 254, 90, 67, 136, 0, 0, 128, 0, 0, 0, 255, 255, 255, 0, 255, 127, 254, 255, 239, 127, 255, 255, 255, 255, 255, 255, 63, 0, 0, 0, 240, 255, 14, 255, 255, 255, 1, 0, 1, 0, 0, 0, 0, 192, 96, 0, 0, 0, }; /* Pattern_Syntax: 277 bytes. */ RE_UINT32 re_get_pattern_syntax(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_pattern_syntax_stage_1[f] << 5; f = code >> 11; code ^= f << 11; pos = (RE_UINT32)re_pattern_syntax_stage_2[pos + f] << 3; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_pattern_syntax_stage_3[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_pattern_syntax_stage_4[pos + f] << 5; pos += code; value = (re_pattern_syntax_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Hangul_Syllable_Type. 
*/ static RE_UINT8 re_hangul_syllable_type_stage_1[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_hangul_syllable_type_stage_2[] = { 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 3, 4, 5, 6, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_hangul_syllable_type_stage_3[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 4, 5, 6, 7, 8, 9, 10, 4, 5, 6, 7, 8, 9, 10, 4, 5, 6, 7, 8, 9, 10, 4, 5, 6, 7, 8, 9, 10, 4, 5, 6, 7, 8, 9, 10, 4, 5, 6, 7, 8, 9, 10, 4, 5, 6, 7, 8, 9, 10, 4, 5, 6, 7, 8, 9, 10, 4, 5, 6, 7, 8, 9, 10, 4, 5, 6, 7, 8, 9, 10, 4, 5, 6, 7, 8, 9, 10, 4, 5, 6, 7, 8, 9, 10, 4, 5, 6, 11, }; static RE_UINT8 re_hangul_syllable_type_stage_4[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 4, 5, 6, 6, 7, 6, 6, 6, 5, 6, 6, 7, 6, 6, 6, 5, 6, 6, 7, 6, 6, 6, 5, 6, 6, 7, 6, 6, 6, 5, 6, 6, 7, 6, 6, 6, 5, 6, 6, 7, 6, 6, 6, 5, 6, 6, 7, 6, 6, 6, 5, 6, 6, 7, 6, 6, 6, 5, 6, 6, 7, 6, 6, 6, 5, 6, 6, 7, 6, 6, 6, 5, 6, 6, 7, 6, 6, 6, 5, 6, 6, 7, 6, 6, 6, 5, 6, 6, 7, 6, 6, 6, 5, 6, 6, 7, 6, 6, 6, 5, 6, 6, 7, 6, 6, 6, 5, 6, 6, 7, 6, 6, 6, 6, 5, 6, 6, 8, 0, 2, 2, 9, 10, 3, 3, 3, 3, 3, 11, }; static RE_UINT8 re_hangul_syllable_type_stage_5[] = { 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 1, 1, 1, 1, 1, 0, 0, 0, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 5, 5, 5, 5, 5, 5, 5, 0, 0, 0, 0, 2, 2, 2, 2, 2, 2, 2, 0, 0, 0, 0, 3, 3, 3, 3, 3, 3, 3, 3, 3, 0, 0, 0, 0, }; /* Hangul_Syllable_Type: 497 bytes. 
*/ RE_UINT32 re_get_hangul_syllable_type(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_hangul_syllable_type_stage_1[f] << 5; f = code >> 11; code ^= f << 11; pos = (RE_UINT32)re_hangul_syllable_type_stage_2[pos + f] << 4; f = code >> 7; code ^= f << 7; pos = (RE_UINT32)re_hangul_syllable_type_stage_3[pos + f] << 4; f = code >> 3; code ^= f << 3; pos = (RE_UINT32)re_hangul_syllable_type_stage_4[pos + f] << 3; value = re_hangul_syllable_type_stage_5[pos + code]; return value; } /* Bidi_Class. */ static RE_UINT8 re_bidi_class_stage_1[] = { 0, 1, 2, 3, 4, 5, 5, 5, 5, 5, 6, 5, 5, 5, 5, 7, 8, 9, 5, 5, 5, 5, 10, 5, 5, 5, 5, 11, 5, 12, 13, 14, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 15, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 15, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 15, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 15, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 15, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 15, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 15, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 15, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 15, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 15, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 15, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 15, 16, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 15, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 15, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 15, }; static RE_UINT8 re_bidi_class_stage_2[] = { 0, 1, 2, 2, 2, 3, 4, 5, 2, 6, 2, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 2, 2, 2, 2, 30, 31, 32, 2, 2, 2, 2, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 2, 46, 2, 2, 2, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 53, 53, 53, 58, 53, 53, 2, 2, 53, 53, 53, 53, 59, 60, 2, 61, 62, 63, 64, 65, 53, 66, 67, 68, 2, 69, 70, 71, 72, 73, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 74, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 75, 2, 2, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 2, 86, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 87, 88, 88, 88, 89, 90, 91, 92, 93, 94, 2, 2, 95, 96, 2, 97, 98, 2, 2, 2, 2, 2, 2, 2, 2, 2, 99, 99, 100, 99, 101, 102, 103, 99, 99, 99, 99, 99, 104, 99, 99, 99, 105, 106, 107, 108, 109, 110, 111, 2, 2, 112, 2, 113, 114, 115, 116, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 117, 118, 2, 2, 2, 2, 2, 2, 2, 2, 119, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 120, 2, 2, 2, 2, 2, 2, 2, 2, 121, 122, 123, 2, 124, 2, 2, 2, 2, 2, 2, 125, 126, 127, 2, 2, 2, 2, 128, 129, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 99, 130, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 88, 131, 99, 99, 132, 133, 134, 2, 2, 2, 53, 53, 53, 53, 135, 136, 53, 137, 138, 139, 140, 141, 142, 143, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 144, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 144, 145, 145, 146, 147, 145, 145, 145, 145, 145, 145, 145, 145, 145, 145, 145, 145, 145, 145, 145, 145, 145, 145, 145, 145, 145, 145, 145, 145, 145, 145, 145, 145, }; static RE_UINT8 re_bidi_class_stage_3[] = { 0, 1, 2, 3, 4, 5, 4, 6, 7, 8, 9, 10, 11, 12, 11, 12, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 13, 14, 14, 15, 16, 17, 17, 17, 17, 17, 17, 17, 18, 19, 11, 11, 11, 11, 11, 11, 20, 21, 11, 11, 11, 11, 11, 11, 11, 22, 23, 17, 24, 25, 26, 26, 26, 27, 28, 29, 29, 30, 17, 31, 32, 29, 29, 29, 29, 29, 33, 34, 35, 29, 36, 29, 17, 28, 29, 29, 29, 29, 29, 37, 32, 26, 26, 38, 39, 26, 40, 41, 26, 26, 42, 26, 26, 26, 26, 29, 29, 29, 29, 43, 17, 44, 11, 11, 45, 46, 47, 48, 11, 49, 11, 11, 50, 51, 11, 48, 52, 53, 11, 11, 50, 
54, 49, 11, 55, 53, 11, 11, 50, 56, 11, 48, 57, 49, 11, 11, 58, 51, 59, 48, 11, 60, 11, 11, 11, 61, 11, 11, 62, 63, 11, 11, 64, 65, 66, 48, 67, 49, 11, 11, 50, 68, 11, 48, 11, 49, 11, 11, 11, 51, 11, 48, 11, 11, 11, 11, 11, 69, 70, 11, 11, 11, 11, 11, 71, 72, 11, 11, 11, 11, 11, 11, 73, 74, 11, 11, 11, 11, 75, 11, 76, 11, 11, 11, 77, 78, 79, 17, 80, 59, 11, 11, 11, 11, 11, 81, 82, 11, 83, 63, 84, 85, 86, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 81, 11, 11, 11, 87, 11, 11, 11, 11, 11, 11, 4, 11, 11, 11, 11, 11, 11, 11, 88, 89, 11, 11, 11, 11, 11, 11, 11, 90, 11, 90, 11, 48, 11, 48, 11, 11, 11, 91, 92, 93, 11, 87, 94, 11, 11, 11, 11, 11, 11, 11, 11, 11, 95, 11, 11, 11, 11, 11, 11, 11, 96, 97, 98, 11, 11, 11, 11, 11, 11, 11, 11, 99, 16, 16, 11, 100, 11, 11, 11, 101, 102, 103, 11, 11, 11, 104, 11, 11, 11, 11, 105, 11, 11, 106, 60, 11, 107, 105, 108, 11, 109, 11, 11, 11, 110, 108, 11, 11, 111, 112, 11, 11, 11, 11, 11, 11, 11, 11, 11, 113, 114, 115, 11, 11, 11, 11, 17, 17, 17, 116, 11, 11, 11, 117, 118, 119, 119, 120, 121, 16, 122, 123, 124, 125, 126, 127, 128, 11, 129, 129, 129, 17, 17, 63, 130, 131, 132, 133, 134, 16, 11, 11, 135, 16, 16, 16, 16, 16, 16, 16, 16, 136, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 137, 11, 11, 11, 5, 16, 138, 16, 16, 16, 16, 16, 139, 16, 16, 140, 11, 139, 11, 16, 16, 141, 142, 11, 11, 11, 11, 143, 16, 16, 16, 144, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 145, 16, 146, 16, 147, 148, 149, 150, 11, 11, 11, 11, 11, 11, 11, 151, 152, 11, 11, 11, 11, 11, 11, 11, 153, 11, 11, 11, 11, 11, 11, 17, 17, 16, 16, 16, 16, 154, 11, 11, 11, 16, 155, 16, 16, 16, 16, 16, 156, 16, 16, 16, 16, 16, 137, 11, 157, 158, 16, 159, 160, 11, 11, 11, 11, 11, 161, 4, 11, 11, 11, 11, 162, 11, 11, 11, 11, 16, 16, 156, 11, 11, 120, 11, 11, 11, 16, 11, 163, 11, 11, 11, 164, 150, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 165, 11, 11, 11, 11, 11, 99, 11, 166, 11, 11, 11, 11, 16, 16, 16, 16, 11, 16, 16, 16, 140, 11, 11, 11, 119, 11, 11, 11, 11, 
11, 153, 167, 11, 64, 11, 11, 11, 11, 11, 108, 16, 16, 149, 11, 11, 11, 11, 11, 168, 11, 11, 11, 11, 11, 11, 11, 169, 11, 170, 171, 11, 11, 11, 172, 11, 11, 11, 11, 173, 11, 17, 108, 11, 11, 174, 11, 175, 108, 11, 11, 44, 11, 11, 176, 11, 11, 177, 11, 11, 11, 178, 179, 180, 11, 11, 50, 11, 11, 11, 181, 49, 11, 68, 59, 11, 11, 11, 11, 11, 11, 182, 11, 11, 183, 184, 26, 26, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 185, 29, 29, 29, 29, 29, 29, 29, 29, 29, 8, 8, 186, 17, 87, 17, 16, 16, 187, 188, 29, 29, 29, 29, 29, 29, 29, 29, 189, 190, 3, 4, 5, 4, 5, 137, 11, 11, 11, 11, 11, 11, 11, 191, 192, 193, 11, 11, 11, 16, 16, 16, 16, 194, 157, 4, 11, 11, 11, 11, 86, 11, 11, 11, 11, 11, 11, 195, 142, 11, 11, 11, 11, 11, 11, 11, 196, 26, 26, 26, 26, 26, 26, 26, 26, 26, 197, 26, 26, 26, 26, 26, 26, 198, 26, 26, 199, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 200, 26, 26, 26, 26, 201, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 202, 203, 49, 11, 11, 204, 205, 14, 137, 153, 108, 11, 11, 206, 11, 11, 11, 11, 44, 11, 207, 208, 11, 11, 11, 209, 108, 11, 11, 210, 211, 11, 11, 11, 11, 11, 153, 212, 11, 11, 11, 11, 11, 11, 11, 11, 11, 153, 213, 11, 108, 11, 11, 50, 63, 11, 214, 208, 11, 11, 11, 215, 216, 11, 11, 11, 11, 11, 11, 217, 63, 68, 11, 11, 11, 11, 11, 218, 63, 11, 11, 11, 11, 11, 219, 220, 11, 11, 11, 11, 11, 81, 221, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 208, 11, 11, 11, 205, 11, 11, 11, 11, 153, 44, 11, 11, 11, 11, 11, 11, 11, 222, 223, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 224, 225, 226, 11, 227, 11, 11, 11, 11, 11, 16, 16, 16, 16, 228, 11, 11, 11, 16, 16, 16, 16, 16, 140, 11, 11, 11, 11, 11, 11, 11, 162, 11, 11, 11, 229, 11, 11, 166, 11, 11, 11, 230, 11, 11, 11, 231, 232, 232, 232, 17, 17, 17, 233, 17, 17, 80, 177, 173, 107, 234, 11, 11, 11, 11, 11, 26, 26, 26, 26, 26, 235, 26, 26, 29, 29, 29, 29, 29, 29, 29, 236, 16, 16, 157, 16, 16, 16, 16, 16, 16, 156, 237, 164, 164, 164, 16, 137, 238, 11, 11, 11, 11, 11, 133, 11, 16, 16, 16, 16, 16, 16, 16, 155, 
16, 16, 239, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 4, 194, 156, 16, 16, 16, 16, 16, 16, 16, 156, 16, 16, 16, 16, 16, 240, 11, 11, 157, 16, 16, 16, 241, 87, 16, 16, 241, 16, 242, 11, 11, 11, 11, 11, 11, 243, 11, 11, 11, 11, 11, 11, 240, 11, 11, 11, 4, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 244, 8, 8, 8, 8, 8, 8, 8, 8, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 8, }; static RE_UINT8 re_bidi_class_stage_4[] = { 0, 0, 1, 2, 0, 0, 0, 3, 4, 5, 6, 7, 8, 8, 9, 10, 11, 12, 12, 12, 12, 12, 13, 10, 12, 12, 13, 14, 0, 15, 0, 0, 0, 0, 0, 0, 16, 5, 17, 18, 19, 20, 21, 10, 12, 12, 12, 12, 12, 13, 12, 12, 12, 12, 22, 12, 23, 10, 10, 10, 12, 24, 10, 17, 10, 10, 10, 10, 25, 25, 25, 25, 12, 26, 12, 27, 12, 17, 12, 12, 12, 27, 12, 12, 28, 25, 29, 12, 12, 12, 27, 30, 31, 25, 25, 25, 25, 25, 25, 32, 33, 32, 34, 34, 34, 34, 34, 34, 35, 36, 37, 38, 25, 25, 39, 40, 40, 40, 40, 40, 40, 40, 41, 25, 35, 35, 42, 43, 44, 40, 40, 40, 40, 45, 25, 46, 25, 47, 48, 49, 8, 8, 50, 40, 51, 40, 40, 40, 40, 45, 25, 25, 34, 34, 52, 25, 25, 53, 54, 34, 34, 55, 32, 25, 25, 31, 31, 56, 34, 34, 31, 34, 41, 25, 25, 25, 57, 12, 12, 12, 12, 12, 58, 59, 60, 25, 59, 61, 60, 25, 12, 12, 62, 12, 12, 12, 61, 12, 12, 12, 12, 12, 12, 59, 60, 59, 12, 61, 63, 12, 64, 12, 65, 12, 12, 12, 65, 28, 66, 29, 29, 61, 12, 12, 60, 67, 59, 61, 68, 12, 12, 12, 12, 12, 12, 66, 12, 58, 12, 12, 58, 12, 12, 12, 59, 12, 12, 61, 13, 10, 69, 12, 59, 12, 12, 12, 12, 12, 12, 62, 59, 62, 70, 29, 12, 65, 12, 12, 12, 12, 10, 71, 12, 12, 12, 29, 12, 12, 58, 12, 62, 72, 12, 12, 61, 25, 57, 64, 12, 28, 25, 57, 61, 25, 67, 59, 12, 12, 25, 29, 12, 12, 29, 12, 12, 73, 74, 26, 60, 25, 25, 57, 25, 70, 12, 60, 25, 25, 60, 25, 25, 25, 25, 59, 12, 12, 12, 60, 70, 25, 65, 65, 12, 12, 29, 62, 60, 59, 12, 12, 58, 65, 12, 61, 12, 12, 12, 61, 10, 10, 26, 12, 75, 12, 12, 12, 12, 12, 13, 11, 62, 59, 12, 12, 12, 67, 25, 29, 12, 58, 60, 25, 25, 12, 64, 61, 10, 10, 76, 77, 12, 12, 61, 12, 57, 28, 59, 12, 58, 12, 60, 12, 11, 26, 12, 12, 12, 12, 
12, 23, 12, 28, 66, 12, 12, 58, 25, 57, 72, 60, 25, 59, 28, 25, 25, 66, 25, 25, 25, 57, 25, 12, 12, 12, 12, 70, 57, 59, 12, 12, 28, 25, 29, 12, 12, 12, 62, 29, 67, 29, 12, 58, 29, 73, 12, 12, 12, 25, 25, 62, 12, 12, 57, 25, 25, 25, 70, 25, 59, 61, 12, 59, 29, 12, 25, 29, 12, 25, 12, 12, 12, 78, 26, 12, 12, 24, 12, 12, 12, 24, 12, 12, 12, 22, 79, 79, 80, 81, 10, 10, 82, 83, 84, 85, 10, 10, 10, 86, 10, 10, 10, 10, 10, 87, 0, 88, 89, 0, 90, 8, 91, 71, 8, 8, 91, 71, 84, 84, 84, 84, 17, 71, 26, 12, 12, 20, 11, 23, 10, 78, 92, 93, 12, 12, 23, 12, 10, 11, 23, 26, 12, 12, 24, 12, 94, 10, 10, 10, 10, 26, 12, 12, 10, 20, 10, 10, 10, 10, 71, 12, 10, 71, 12, 12, 10, 10, 8, 8, 8, 8, 8, 12, 12, 12, 23, 10, 10, 10, 10, 24, 10, 23, 10, 10, 10, 26, 10, 10, 10, 10, 26, 24, 10, 10, 20, 10, 26, 12, 12, 12, 12, 12, 12, 10, 12, 24, 71, 28, 29, 12, 24, 10, 12, 12, 12, 28, 71, 12, 12, 12, 10, 10, 17, 10, 10, 12, 12, 12, 10, 10, 10, 12, 95, 11, 10, 10, 11, 12, 62, 29, 11, 23, 12, 24, 12, 12, 96, 11, 12, 12, 13, 12, 12, 12, 12, 71, 24, 10, 10, 10, 12, 13, 71, 12, 12, 12, 12, 13, 97, 25, 25, 98, 12, 12, 11, 12, 58, 58, 28, 12, 12, 65, 10, 12, 12, 12, 99, 12, 12, 10, 12, 12, 12, 59, 12, 12, 12, 62, 25, 29, 12, 28, 25, 25, 28, 62, 29, 59, 12, 61, 12, 12, 12, 12, 60, 57, 65, 65, 12, 12, 28, 12, 12, 59, 70, 66, 59, 62, 12, 61, 59, 61, 12, 12, 12, 100, 34, 34, 101, 34, 40, 40, 40, 102, 40, 40, 40, 103, 104, 105, 10, 106, 107, 71, 108, 12, 40, 40, 40, 109, 30, 5, 6, 7, 5, 110, 10, 71, 0, 0, 111, 112, 92, 12, 12, 12, 10, 10, 10, 11, 113, 8, 8, 8, 12, 62, 57, 12, 34, 34, 34, 114, 31, 33, 34, 25, 34, 34, 115, 52, 34, 33, 34, 34, 34, 34, 116, 10, 35, 35, 35, 35, 35, 35, 35, 117, 12, 12, 25, 25, 25, 57, 12, 12, 28, 57, 65, 12, 12, 28, 25, 60, 25, 59, 12, 12, 28, 12, 12, 12, 12, 62, 25, 57, 12, 12, 62, 59, 29, 70, 12, 12, 28, 25, 57, 12, 12, 62, 25, 59, 28, 25, 72, 28, 70, 12, 12, 12, 62, 29, 12, 67, 28, 25, 57, 73, 12, 12, 28, 61, 25, 67, 12, 12, 62, 67, 25, 12, 12, 12, 12, 65, 0, 12, 12, 12, 12, 28, 
29, 12, 118, 0, 119, 25, 57, 60, 25, 12, 12, 12, 62, 29, 120, 121, 12, 12, 12, 92, 12, 12, 12, 12, 92, 12, 13, 12, 12, 122, 8, 8, 8, 8, 25, 57, 28, 25, 60, 25, 25, 25, 25, 115, 34, 34, 123, 40, 40, 40, 10, 10, 10, 71, 8, 8, 124, 11, 10, 24, 10, 10, 10, 11, 12, 12, 10, 10, 12, 12, 10, 10, 10, 26, 10, 10, 11, 12, 12, 12, 12, 125, }; static RE_UINT8 re_bidi_class_stage_5[] = { 11, 11, 11, 11, 11, 8, 7, 8, 9, 7, 11, 11, 7, 7, 7, 8, 9, 10, 10, 4, 4, 4, 10, 10, 10, 10, 10, 3, 6, 3, 6, 6, 2, 2, 2, 2, 2, 2, 6, 10, 10, 10, 10, 10, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 10, 10, 10, 10, 11, 11, 7, 11, 11, 6, 10, 4, 4, 10, 10, 0, 10, 10, 11, 10, 10, 4, 4, 2, 2, 10, 0, 10, 10, 10, 2, 0, 10, 0, 10, 10, 0, 0, 0, 10, 10, 0, 10, 10, 10, 12, 12, 12, 12, 10, 10, 0, 0, 0, 0, 10, 0, 0, 0, 0, 12, 12, 12, 0, 0, 0, 10, 10, 4, 1, 12, 12, 12, 12, 12, 1, 12, 1, 12, 12, 1, 1, 1, 1, 1, 5, 5, 5, 5, 5, 5, 10, 10, 13, 4, 4, 13, 6, 13, 10, 10, 12, 12, 12, 13, 13, 13, 13, 13, 13, 13, 13, 12, 5, 5, 4, 5, 5, 13, 13, 13, 12, 13, 13, 13, 13, 13, 12, 12, 12, 5, 10, 12, 12, 13, 13, 12, 12, 10, 12, 12, 12, 12, 13, 13, 2, 2, 13, 13, 13, 12, 13, 13, 1, 1, 1, 12, 1, 1, 10, 10, 10, 10, 1, 1, 1, 1, 12, 12, 12, 12, 1, 1, 12, 12, 12, 0, 0, 0, 12, 0, 12, 0, 0, 0, 0, 12, 12, 12, 0, 12, 0, 0, 0, 0, 12, 12, 0, 0, 4, 4, 0, 0, 0, 4, 0, 12, 12, 0, 12, 0, 0, 12, 12, 12, 0, 12, 0, 4, 0, 0, 10, 4, 10, 0, 12, 0, 12, 12, 10, 10, 10, 0, 12, 0, 12, 0, 0, 12, 0, 12, 0, 12, 10, 10, 9, 0, 0, 0, 10, 10, 10, 12, 12, 12, 11, 0, 0, 10, 0, 10, 9, 9, 9, 9, 9, 9, 9, 11, 11, 11, 0, 1, 9, 7, 16, 17, 18, 14, 15, 6, 4, 4, 4, 4, 4, 10, 10, 10, 6, 10, 10, 10, 10, 10, 10, 9, 11, 11, 19, 20, 21, 22, 11, 11, 2, 0, 0, 0, 2, 2, 3, 3, 0, 10, 0, 0, 0, 0, 4, 0, 10, 10, 3, 4, 9, 10, 10, 10, 0, 12, 12, 10, 12, 12, 12, 10, 12, 12, 10, 10, 4, 4, 0, 0, 0, 1, 12, 1, 1, 3, 1, 1, 13, 13, 10, 10, 13, 10, 13, 13, 6, 10, 6, 0, 10, 6, 10, 10, 10, 10, 10, 4, 10, 10, 3, 3, 10, 4, 4, 10, 13, 13, 13, 11, 10, 4, 4, 0, 11, 10, 10, 10, 10, 10, 11, 11, 12, 2, 2, 2, 1, 1, 1, 
10, 12, 12, 12, 1, 1, 10, 10, 10, 5, 5, 5, 1, 0, 0, 0, 11, 11, 11, 11, 12, 10, 10, 12, 12, 12, 10, 0, 0, 0, 0, 2, 2, 10, 10, 13, 13, 2, 2, 2, 10, 0, 0, 11, 11, }; /* Bidi_Class: 3484 bytes. */ RE_UINT32 re_get_bidi_class(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 12; code = ch ^ (f << 12); pos = (RE_UINT32)re_bidi_class_stage_1[f] << 5; f = code >> 7; code ^= f << 7; pos = (RE_UINT32)re_bidi_class_stage_2[pos + f] << 3; f = code >> 4; code ^= f << 4; pos = (RE_UINT32)re_bidi_class_stage_3[pos + f] << 2; f = code >> 2; code ^= f << 2; pos = (RE_UINT32)re_bidi_class_stage_4[pos + f] << 2; value = re_bidi_class_stage_5[pos + code]; return value; } /* Canonical_Combining_Class. */ static RE_UINT8 re_canonical_combining_class_stage_1[] = { 0, 1, 2, 2, 2, 3, 2, 4, 5, 2, 2, 6, 2, 7, 8, 9, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, }; static RE_UINT8 re_canonical_combining_class_stage_2[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 10, 11, 12, 13, 0, 14, 0, 0, 0, 0, 0, 15, 0, 16, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 17, 18, 19, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 20, 0, 21, 22, 23, 0, 0, 0, 24, 0, 0, 25, 26, 27, 28, 0, 0, 0, 0, 0, 0, 0, 0, 0, 29, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 30, 0, 0, 0, 0, 0, 0, 0, 0, 0, 31, 32, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 33, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_canonical_combining_class_stage_3[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 0, 0, 0, 0, 0, 0, 0, 0, 5, 0, 0, 0, 0, 0, 0, 0, 6, 7, 8, 0, 9, 0, 10, 11, 0, 0, 12, 13, 
14, 15, 16, 0, 0, 0, 0, 17, 18, 19, 20, 0, 0, 0, 0, 21, 0, 22, 23, 0, 0, 22, 24, 0, 0, 22, 24, 0, 0, 22, 24, 0, 0, 22, 24, 0, 0, 0, 24, 0, 0, 0, 25, 0, 0, 22, 24, 0, 0, 0, 24, 0, 0, 0, 26, 0, 0, 27, 28, 0, 0, 29, 30, 0, 31, 32, 0, 33, 34, 0, 35, 0, 0, 36, 0, 0, 37, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 38, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 39, 39, 0, 0, 0, 0, 40, 0, 0, 0, 0, 0, 0, 41, 0, 0, 0, 42, 0, 0, 0, 0, 0, 0, 43, 0, 0, 44, 0, 45, 0, 0, 0, 46, 47, 48, 0, 49, 0, 50, 0, 51, 0, 0, 0, 0, 52, 53, 0, 0, 0, 0, 0, 0, 54, 55, 0, 0, 0, 0, 0, 0, 56, 57, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 58, 0, 0, 0, 59, 0, 0, 0, 60, 0, 61, 0, 0, 62, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 63, 64, 0, 0, 65, 0, 0, 0, 0, 0, 0, 0, 0, 66, 0, 0, 0, 0, 0, 47, 67, 0, 68, 69, 0, 0, 70, 71, 0, 0, 0, 0, 0, 0, 72, 73, 74, 0, 0, 0, 0, 0, 0, 0, 24, 0, 0, 0, 0, 0, 0, 0, 0, 75, 0, 0, 0, 0, 0, 0, 0, 0, 76, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 77, 0, 0, 0, 0, 0, 0, 0, 78, 0, 0, 0, 79, 0, 0, 0, 0, 80, 81, 0, 0, 0, 0, 0, 82, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 66, 59, 0, 83, 0, 0, 84, 85, 0, 70, 0, 0, 86, 0, 0, 87, 0, 0, 0, 0, 0, 88, 0, 22, 24, 89, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 90, 0, 0, 0, 0, 0, 0, 59, 91, 0, 0, 59, 0, 0, 0, 92, 0, 0, 0, 93, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 94, 0, 95, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 96, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 97, 98, 99, 0, 0, 0, 0, 100, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 101, 0, 0, 0, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_canonical_combining_class_stage_4[] = { 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 3, 4, 5, 6, 7, 4, 4, 8, 9, 10, 1, 11, 12, 13, 14, 15, 16, 17, 18, 1, 1, 1, 0, 0, 0, 0, 19, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 20, 21, 22, 1, 23, 4, 21, 24, 25, 26, 27, 28, 29, 30, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 31, 0, 0, 0, 32, 33, 34, 35, 1, 36, 0, 0, 0, 0, 37, 0, 0, 0, 0, 0, 0, 0, 0, 38, 1, 39, 14, 39, 40, 41, 0, 
0, 0, 0, 0, 0, 0, 0, 42, 0, 0, 0, 0, 0, 0, 0, 43, 36, 44, 45, 21, 45, 46, 0, 0, 0, 0, 0, 0, 0, 19, 1, 21, 0, 0, 0, 0, 0, 0, 0, 0, 38, 47, 1, 1, 48, 48, 49, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 50, 0, 51, 21, 43, 52, 53, 21, 35, 1, 0, 0, 0, 0, 0, 0, 0, 54, 0, 0, 0, 55, 56, 57, 0, 0, 0, 0, 0, 55, 0, 0, 0, 0, 0, 0, 0, 55, 0, 58, 0, 0, 0, 0, 59, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 60, 0, 0, 0, 61, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 62, 0, 0, 0, 63, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 64, 0, 0, 0, 0, 0, 0, 65, 66, 0, 0, 0, 0, 0, 67, 68, 69, 70, 71, 72, 0, 0, 0, 0, 0, 0, 0, 73, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 74, 75, 0, 0, 0, 0, 76, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 48, 0, 0, 0, 0, 0, 77, 0, 0, 0, 0, 0, 0, 59, 0, 0, 78, 0, 0, 79, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 80, 0, 0, 0, 0, 0, 0, 19, 81, 0, 77, 0, 0, 0, 0, 48, 1, 82, 0, 0, 0, 0, 1, 52, 15, 41, 0, 0, 0, 0, 0, 54, 0, 0, 0, 77, 0, 0, 0, 0, 0, 0, 0, 0, 19, 10, 1, 0, 0, 0, 0, 0, 83, 0, 0, 0, 0, 0, 0, 84, 0, 0, 83, 0, 0, 0, 0, 0, 0, 0, 0, 74, 0, 0, 0, 0, 0, 0, 85, 9, 12, 4, 86, 8, 87, 76, 0, 57, 49, 0, 21, 1, 21, 88, 89, 1, 1, 1, 1, 1, 1, 1, 1, 49, 0, 90, 0, 0, 0, 0, 91, 1, 92, 57, 78, 93, 94, 4, 57, 0, 0, 0, 0, 0, 0, 19, 49, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 95, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 96, 97, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 98, 0, 0, 0, 0, 19, 0, 1, 1, 49, 0, 0, 0, 0, 0, 0, 0, 38, 0, 0, 0, 0, 49, 0, 0, 0, 0, 59, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 49, 0, 0, 0, 0, 0, 51, 64, 0, 0, 0, 0, 0, 0, 0, 0, 95, 0, 0, 0, 0, 0, 0, 0, 74, 0, 0, 0, 77, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 99, 100, 57, 38, 78, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 59, 0, 0, 0, 0, 0, 0, 0, 0, 0, 101, 1, 14, 4, 12, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 76, 81, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 38, 85, 0, 0, 0, 0, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 103, 95, 0, 104, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 105, 0, 85, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 95, 77, 0, 0, 77, 0, 84, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 105, 0, 0, 0, 0, 106, 0, 0, 0, 0, 0, 0, 38, 1, 57, 1, 57, 0, 0, 107, 0, 0, 0, 0, 0, 0, 
0, 54, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 107, 0, 0, 0, 0, 95, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 87, 0, 0, 0, 0, 0, 0, 1, 85, 0, 0, 0, 0, 0, 0, 0, 0, 0, 108, 0, 109, 110, 111, 112, 0, 51, 4, 113, 48, 23, 0, 0, 0, 0, 0, 0, 0, 38, 49, 0, 0, 0, 0, 38, 57, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 113, 0, 0, }; static RE_UINT8 re_canonical_combining_class_stage_5[] = { 0, 0, 0, 0, 50, 50, 50, 50, 50, 51, 45, 45, 45, 45, 51, 43, 45, 45, 45, 45, 45, 41, 41, 45, 45, 45, 45, 41, 41, 45, 45, 45, 1, 1, 1, 1, 1, 45, 45, 45, 45, 50, 50, 50, 50, 54, 50, 45, 45, 45, 50, 50, 50, 45, 45, 0, 50, 50, 50, 45, 45, 45, 45, 50, 51, 45, 45, 50, 52, 53, 53, 52, 53, 53, 52, 50, 0, 0, 0, 50, 0, 45, 50, 50, 50, 50, 45, 50, 50, 50, 46, 45, 50, 50, 45, 45, 50, 46, 49, 50, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 14, 15, 16, 17, 0, 18, 0, 19, 20, 0, 50, 45, 0, 13, 25, 26, 27, 0, 0, 0, 0, 22, 23, 24, 25, 26, 27, 28, 29, 50, 50, 45, 45, 50, 45, 50, 50, 45, 30, 0, 0, 0, 0, 0, 50, 50, 50, 0, 0, 50, 50, 0, 45, 50, 50, 45, 0, 0, 0, 31, 0, 0, 50, 45, 50, 50, 45, 45, 50, 45, 45, 50, 45, 50, 45, 50, 50, 0, 50, 50, 0, 50, 0, 50, 50, 50, 50, 50, 0, 0, 0, 45, 45, 45, 0, 0, 0, 45, 50, 45, 45, 45, 22, 23, 24, 50, 2, 0, 0, 0, 0, 4, 0, 0, 0, 50, 45, 50, 50, 0, 0, 0, 0, 32, 33, 0, 0, 0, 4, 0, 34, 34, 4, 0, 35, 35, 35, 35, 36, 36, 0, 0, 37, 37, 37, 37, 45, 45, 0, 0, 0, 45, 0, 45, 0, 43, 0, 0, 0, 38, 39, 0, 40, 0, 0, 0, 0, 0, 39, 39, 39, 39, 0, 0, 39, 0, 50, 50, 4, 0, 50, 50, 0, 0, 45, 0, 0, 0, 0, 2, 0, 4, 4, 0, 0, 45, 0, 0, 4, 0, 0, 0, 0, 50, 0, 0, 0, 49, 0, 0, 0, 46, 50, 45, 45, 0, 0, 0, 50, 0, 0, 45, 0, 0, 4, 4, 0, 0, 2, 0, 50, 50, 50, 0, 50, 0, 1, 1, 1, 0, 0, 0, 50, 53, 42, 45, 41, 50, 50, 50, 52, 45, 50, 45, 50, 50, 1, 1, 1, 1, 1, 50, 0, 1, 1, 50, 45, 50, 1, 1, 0, 0, 0, 4, 0, 0, 44, 49, 51, 46, 47, 47, 0, 3, 3, 0, 50, 0, 50, 50, 45, 0, 0, 50, 0, 0, 21, 0, 0, 45, 0, 50, 50, 1, 45, 0, 0, 50, 45, 0, 0, 4, 2, 0, 0, 2, 4, 0, 0, 0, 4, 2, 0, 0, 1, 0, 0, 43, 43, 1, 1, 1, 0, 0, 0, 48, 43, 43, 43, 43, 43, 0, 45, 45, 45, 0, }; /* 
Canonical_Combining_Class: 2112 bytes. */ RE_UINT32 re_get_canonical_combining_class(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 13; code = ch ^ (f << 13); pos = (RE_UINT32)re_canonical_combining_class_stage_1[f] << 4; f = code >> 9; code ^= f << 9; pos = (RE_UINT32)re_canonical_combining_class_stage_2[pos + f] << 4; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_canonical_combining_class_stage_3[pos + f] << 3; f = code >> 2; code ^= f << 2; pos = (RE_UINT32)re_canonical_combining_class_stage_4[pos + f] << 2; value = re_canonical_combining_class_stage_5[pos + code]; return value; } /* Decomposition_Type. */ static RE_UINT8 re_decomposition_type_stage_1[] = { 0, 1, 2, 2, 2, 3, 4, 5, 6, 2, 2, 2, 2, 2, 7, 8, 2, 2, 2, 2, 2, 2, 2, 9, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, }; static RE_UINT8 re_decomposition_type_stage_2[] = { 0, 1, 2, 3, 4, 5, 6, 7, 7, 8, 9, 10, 11, 12, 13, 14, 15, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 16, 7, 17, 18, 19, 20, 21, 22, 23, 24, 7, 7, 7, 7, 7, 25, 7, 26, 27, 28, 29, 30, 31, 32, 33, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 34, 35, 7, 7, 7, 36, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 38, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 37, 39, 40, 41, 42, 43, 44, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 45, 46, 7, 47, 48, 49, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 50, 7, 7, 51, 
52, 53, 54, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 55, 7, 7, 56, 57, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 37, 37, 58, 7, 7, 7, 7, 7, }; static RE_UINT8 re_decomposition_type_stage_3[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 3, 5, 6, 7, 8, 9, 10, 11, 8, 12, 0, 0, 13, 14, 15, 16, 17, 18, 6, 19, 20, 21, 0, 0, 0, 0, 0, 0, 0, 22, 0, 23, 24, 0, 0, 0, 0, 0, 25, 0, 0, 26, 27, 14, 28, 14, 29, 30, 0, 31, 32, 33, 0, 33, 0, 32, 0, 34, 0, 0, 0, 0, 35, 36, 37, 38, 0, 0, 0, 0, 0, 0, 0, 0, 39, 0, 0, 0, 0, 0, 0, 0, 0, 0, 40, 0, 0, 0, 0, 41, 0, 0, 0, 0, 42, 43, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 33, 44, 0, 45, 0, 0, 0, 0, 0, 0, 46, 47, 0, 0, 0, 0, 0, 48, 0, 49, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 50, 51, 0, 0, 0, 52, 0, 0, 53, 0, 0, 0, 0, 0, 0, 0, 54, 0, 0, 0, 0, 0, 0, 0, 55, 0, 0, 0, 0, 0, 0, 0, 53, 0, 0, 0, 0, 0, 0, 0, 0, 56, 0, 0, 0, 0, 0, 57, 0, 0, 0, 0, 0, 0, 0, 57, 0, 58, 0, 0, 59, 0, 0, 0, 60, 61, 33, 62, 63, 60, 61, 33, 0, 0, 0, 0, 0, 0, 64, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 65, 66, 67, 0, 68, 69, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 70, 71, 72, 73, 74, 75, 0, 76, 73, 73, 0, 0, 0, 0, 6, 6, 6, 6, 6, 6, 6, 6, 6, 77, 6, 6, 6, 6, 6, 78, 6, 79, 6, 6, 79, 80, 6, 81, 6, 6, 6, 82, 83, 84, 6, 85, 86, 87, 88, 89, 90, 91, 0, 92, 93, 94, 95, 0, 0, 0, 0, 0, 96, 97, 98, 99, 100, 101, 102, 102, 103, 104, 105, 0, 106, 0, 0, 0, 107, 0, 108, 109, 110, 0, 111, 112, 112, 0, 113, 0, 0, 0, 114, 0, 0, 0, 115, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 116, 117, 102, 102, 102, 118, 116, 116, 119, 0, 120, 0, 0, 0, 0, 0, 0, 121, 0, 0, 0, 0, 0, 122, 0, 0, 0, 0, 0, 0, 0, 0, 0, 123, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 124, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 125, 0, 0, 0, 0, 0, 57, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 126, 0, 0, 127, 0, 0, 128, 129, 130, 131, 132, 0, 133, 129, 130, 131, 132, 0, 134, 
0, 0, 0, 135, 102, 102, 102, 102, 136, 137, 0, 0, 0, 0, 0, 0, 102, 136, 102, 102, 138, 139, 116, 140, 116, 116, 116, 116, 141, 116, 116, 140, 142, 142, 142, 142, 142, 143, 102, 144, 142, 142, 142, 142, 142, 142, 102, 145, 0, 0, 0, 0, 0, 0, 0, 0, 0, 146, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 147, 0, 0, 0, 0, 0, 0, 0, 148, 0, 0, 0, 0, 0, 149, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 21, 0, 0, 0, 0, 0, 81, 150, 151, 6, 6, 6, 81, 6, 6, 6, 6, 6, 6, 78, 0, 0, 152, 153, 154, 155, 156, 157, 158, 158, 159, 158, 160, 161, 0, 162, 163, 164, 165, 165, 165, 165, 165, 165, 166, 167, 167, 168, 169, 169, 169, 170, 171, 172, 165, 173, 174, 175, 0, 176, 177, 178, 179, 180, 167, 181, 182, 0, 0, 183, 0, 184, 0, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 194, 195, 196, 197, 198, 198, 198, 198, 198, 199, 200, 200, 200, 200, 201, 202, 203, 204, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 205, 206, 0, 0, 0, 0, 0, 0, 0, 207, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 46, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 208, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 104, 0, 0, 0, 0, 0, 0, 0, 0, 0, 207, 209, 0, 0, 0, 0, 210, 14, 0, 0, 0, 211, 211, 211, 211, 211, 212, 211, 211, 211, 213, 214, 215, 216, 211, 211, 211, 217, 218, 211, 219, 220, 221, 211, 211, 211, 211, 211, 211, 211, 211, 211, 211, 211, 211, 211, 211, 211, 211, 211, 211, 211, 211, 222, 211, 211, 211, 211, 211, 211, 211, 211, 211, 211, 211, 211, 211, 211, 211, 211, 211, 223, 211, 211, 211, 216, 211, 224, 225, 226, 227, 228, 229, 230, 231, 232, 231, 0, 0, 0, 0, 233, 102, 234, 142, 142, 0, 235, 0, 0, 236, 0, 0, 0, 0, 0, 0, 237, 142, 142, 238, 239, 240, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 81, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_decomposition_type_stage_4[] = { 0, 0, 0, 0, 1, 0, 2, 3, 4, 5, 6, 7, 8, 9, 8, 8, 10, 11, 10, 12, 10, 11, 10, 9, 8, 8, 8, 8, 13, 8, 8, 8, 8, 12, 8, 8, 14, 8, 10, 15, 16, 8, 17, 8, 12, 8, 8, 8, 8, 
8, 8, 15, 12, 0, 0, 18, 19, 0, 0, 0, 0, 20, 20, 21, 8, 8, 8, 22, 8, 13, 8, 8, 23, 12, 8, 8, 8, 8, 8, 13, 0, 13, 8, 8, 8, 0, 0, 0, 24, 24, 25, 0, 0, 0, 20, 5, 24, 25, 0, 0, 9, 19, 0, 0, 0, 19, 26, 27, 0, 21, 11, 22, 0, 0, 13, 8, 0, 0, 13, 11, 28, 29, 0, 0, 30, 5, 31, 0, 9, 18, 0, 11, 0, 0, 32, 0, 0, 13, 0, 0, 33, 0, 0, 0, 8, 13, 13, 8, 13, 8, 13, 8, 8, 12, 12, 0, 0, 3, 0, 0, 13, 11, 0, 0, 0, 34, 35, 0, 36, 0, 0, 0, 18, 0, 0, 0, 32, 19, 0, 0, 0, 0, 8, 8, 0, 0, 18, 19, 0, 0, 0, 9, 18, 27, 0, 0, 0, 0, 10, 27, 0, 0, 37, 19, 0, 0, 0, 12, 0, 19, 0, 0, 0, 0, 13, 19, 0, 0, 19, 0, 19, 18, 22, 0, 0, 0, 27, 11, 3, 0, 0, 0, 0, 0, 0, 5, 0, 0, 0, 1, 18, 0, 0, 32, 27, 18, 0, 19, 18, 38, 17, 0, 32, 0, 0, 0, 0, 27, 0, 0, 0, 0, 0, 25, 0, 27, 36, 36, 27, 0, 0, 0, 0, 0, 18, 32, 9, 0, 0, 0, 0, 0, 0, 39, 24, 24, 39, 24, 24, 24, 24, 40, 24, 24, 24, 24, 41, 42, 43, 0, 0, 0, 25, 0, 0, 0, 44, 24, 8, 8, 45, 0, 8, 8, 12, 0, 8, 12, 8, 12, 8, 8, 46, 46, 8, 8, 8, 12, 8, 22, 8, 47, 21, 22, 8, 8, 8, 13, 8, 10, 13, 22, 8, 48, 49, 50, 30, 0, 51, 3, 0, 0, 0, 30, 0, 52, 3, 53, 0, 54, 0, 3, 5, 0, 0, 3, 0, 3, 55, 24, 24, 24, 42, 42, 42, 43, 42, 42, 42, 56, 0, 0, 35, 0, 57, 34, 58, 59, 59, 60, 61, 62, 63, 64, 65, 66, 66, 67, 68, 59, 69, 61, 62, 0, 70, 70, 70, 70, 20, 20, 20, 20, 0, 0, 71, 0, 0, 0, 13, 0, 0, 0, 0, 27, 0, 0, 0, 10, 0, 19, 32, 19, 0, 36, 0, 72, 35, 0, 0, 0, 32, 37, 32, 0, 36, 0, 0, 10, 12, 12, 12, 0, 0, 0, 0, 8, 8, 0, 13, 12, 0, 0, 33, 0, 73, 73, 73, 73, 73, 20, 20, 20, 20, 74, 73, 73, 73, 73, 75, 0, 0, 0, 0, 35, 0, 30, 0, 0, 0, 0, 0, 19, 0, 0, 0, 76, 0, 0, 0, 44, 0, 0, 0, 3, 20, 5, 0, 0, 77, 0, 0, 0, 0, 26, 30, 0, 0, 0, 0, 36, 36, 36, 36, 36, 36, 46, 32, 0, 9, 22, 33, 12, 0, 19, 3, 78, 0, 37, 11, 79, 34, 20, 20, 20, 20, 20, 20, 30, 4, 24, 24, 24, 20, 73, 0, 0, 80, 73, 73, 73, 73, 73, 73, 75, 20, 20, 20, 81, 81, 81, 81, 81, 81, 81, 20, 20, 82, 81, 81, 81, 20, 20, 20, 83, 0, 0, 0, 55, 25, 0, 0, 0, 0, 0, 55, 0, 0, 0, 0, 24, 36, 10, 8, 11, 36, 33, 13, 8, 20, 30, 0, 0, 3, 20, 0, 46, 59, 59, 84, 
8, 8, 11, 8, 36, 9, 22, 8, 15, 85, 86, 86, 86, 86, 86, 86, 86, 86, 85, 85, 85, 87, 85, 86, 86, 88, 0, 0, 0, 89, 90, 91, 92, 85, 87, 86, 85, 85, 85, 93, 87, 94, 94, 94, 94, 94, 95, 95, 95, 95, 95, 95, 95, 95, 96, 97, 97, 97, 97, 97, 97, 97, 97, 97, 98, 99, 99, 99, 99, 99, 100, 94, 94, 101, 95, 95, 95, 95, 95, 95, 102, 97, 99, 99, 103, 104, 97, 105, 106, 107, 105, 108, 105, 104, 96, 95, 105, 96, 109, 110, 97, 111, 106, 112, 105, 95, 106, 113, 95, 96, 106, 0, 0, 94, 94, 94, 114, 115, 115, 116, 0, 115, 115, 115, 115, 115, 117, 118, 20, 119, 120, 120, 120, 120, 119, 120, 0, 121, 122, 123, 123, 124, 91, 125, 126, 90, 125, 127, 127, 127, 127, 126, 91, 125, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 126, 125, 126, 91, 128, 129, 130, 130, 130, 130, 130, 130, 130, 131, 132, 132, 132, 132, 132, 132, 132, 132, 132, 132, 133, 134, 132, 134, 132, 134, 132, 134, 135, 130, 136, 132, 133, 0, 0, 27, 19, 0, 0, 18, 0, 0, 0, 0, 13, 0, 0, 18, 36, 8, 19, 0, 0, 0, 0, 18, 8, 59, 59, 59, 59, 59, 137, 59, 59, 59, 59, 59, 137, 138, 139, 61, 137, 59, 59, 66, 61, 59, 61, 59, 59, 59, 66, 140, 61, 59, 137, 59, 137, 59, 59, 66, 140, 59, 141, 142, 59, 137, 59, 59, 59, 59, 62, 59, 59, 59, 59, 59, 142, 139, 143, 61, 59, 140, 59, 144, 0, 138, 145, 144, 61, 139, 143, 144, 144, 139, 143, 140, 59, 140, 59, 61, 141, 59, 59, 66, 59, 59, 59, 59, 0, 61, 61, 66, 59, 20, 20, 30, 0, 20, 20, 146, 75, 0, 0, 4, 0, 147, 0, 0, 0, 148, 0, 0, 0, 81, 81, 148, 0, 20, 20, 35, 0, 149, 0, 0, 0, }; static RE_UINT8 re_decomposition_type_stage_5[] = { 0, 0, 0, 0, 4, 0, 0, 0, 2, 0, 10, 0, 0, 0, 0, 2, 0, 0, 10, 10, 2, 2, 0, 0, 2, 10, 10, 0, 17, 17, 17, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 2, 2, 1, 1, 1, 2, 2, 0, 0, 1, 1, 2, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 2, 2, 2, 2, 2, 1, 1, 1, 1, 0, 1, 1, 1, 2, 2, 2, 10, 10, 10, 10, 10, 0, 0, 0, 0, 0, 2, 0, 0, 0, 1, 0, 2, 2, 2, 1, 1, 2, 2, 0, 2, 2, 2, 0, 0, 2, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 2, 2, 2, 2, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 
1, 2, 10, 10, 10, 0, 10, 10, 0, 10, 10, 10, 11, 11, 11, 11, 11, 11, 11, 11, 11, 0, 0, 0, 0, 10, 1, 1, 2, 1, 0, 1, 0, 1, 1, 2, 1, 2, 1, 1, 2, 0, 1, 1, 2, 2, 2, 2, 2, 4, 0, 4, 0, 0, 0, 0, 0, 4, 2, 0, 2, 2, 2, 0, 2, 0, 10, 10, 0, 0, 11, 0, 0, 0, 2, 2, 3, 2, 0, 2, 3, 3, 3, 3, 3, 3, 0, 3, 2, 0, 0, 3, 3, 3, 3, 3, 0, 0, 10, 2, 10, 0, 3, 0, 1, 0, 3, 0, 1, 1, 3, 3, 0, 3, 3, 2, 2, 2, 2, 3, 0, 2, 3, 0, 0, 0, 17, 17, 17, 17, 0, 17, 0, 0, 2, 2, 0, 2, 9, 9, 9, 9, 2, 2, 9, 9, 9, 9, 9, 0, 11, 10, 0, 0, 13, 0, 0, 0, 2, 0, 1, 12, 0, 0, 1, 12, 16, 9, 9, 9, 16, 16, 16, 16, 2, 16, 16, 16, 2, 2, 2, 16, 3, 3, 1, 1, 8, 7, 8, 7, 5, 6, 8, 7, 8, 7, 5, 6, 8, 7, 0, 0, 0, 0, 0, 8, 7, 5, 6, 8, 7, 8, 7, 8, 7, 8, 8, 7, 5, 8, 7, 5, 8, 8, 8, 8, 7, 7, 7, 7, 7, 7, 7, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 8, 8, 8, 8, 7, 7, 7, 7, 5, 5, 5, 7, 8, 0, 0, 5, 7, 5, 5, 7, 5, 7, 7, 5, 5, 7, 7, 5, 5, 7, 5, 5, 7, 7, 5, 7, 7, 5, 7, 5, 5, 5, 7, 0, 0, 5, 5, 5, 7, 7, 7, 5, 7, 5, 7, 8, 0, 0, 0, 12, 12, 12, 12, 12, 12, 0, 0, 12, 0, 0, 12, 12, 2, 2, 2, 15, 15, 15, 0, 15, 15, 15, 15, 8, 6, 8, 0, 8, 0, 8, 6, 8, 6, 8, 6, 8, 8, 7, 8, 7, 8, 7, 5, 6, 8, 7, 8, 6, 8, 7, 5, 7, 0, 0, 0, 0, 13, 13, 13, 13, 13, 13, 13, 13, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 0, 0, 0, 14, 14, 14, 0, 0, 0, 13, 13, 13, 0, 3, 0, 3, 3, 0, 0, 3, 0, 0, 3, 3, 0, 3, 3, 3, 0, 3, 0, 3, 0, 0, 0, 3, 3, 3, 0, 0, 3, 0, 3, 0, 3, 0, 0, 0, 3, 2, 2, 2, 9, 16, 0, 0, 0, 16, 16, 16, 0, 9, 9, 0, 0, }; /* Decomposition_Type: 2964 bytes. 
*/ RE_UINT32 re_get_decomposition_type(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 13; code = ch ^ (f << 13); pos = (RE_UINT32)re_decomposition_type_stage_1[f] << 5; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_decomposition_type_stage_2[pos + f] << 4; f = code >> 4; code ^= f << 4; pos = (RE_UINT32)re_decomposition_type_stage_3[pos + f] << 2; f = code >> 2; code ^= f << 2; pos = (RE_UINT32)re_decomposition_type_stage_4[pos + f] << 2; value = re_decomposition_type_stage_5[pos + code]; return value; } /* East_Asian_Width. */ static RE_UINT8 re_east_asian_width_stage_1[] = { 0, 1, 2, 3, 4, 5, 5, 5, 5, 5, 6, 5, 5, 7, 8, 9, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 11, 10, 10, 10, 12, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 13, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 13, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 14, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 15, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 15, }; static RE_UINT8 re_east_asian_width_stage_2[] = { 0, 1, 2, 3, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 7, 8, 9, 10, 11, 12, 13, 14, 5, 15, 5, 16, 5, 5, 17, 18, 19, 20, 21, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 23, 22, 22, 22, 22, 22, 22, 22, 22, 
22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 24, 5, 5, 5, 5, 25, 5, 5, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 26, 5, 5, 5, 5, 5, 5, 5, 5, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 22, 22, 5, 5, 5, 28, 29, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 30, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 31, 32, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 33, 5, 34, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 35, }; static RE_UINT8 re_east_asian_width_stage_3[] = { 0, 0, 1, 1, 1, 1, 1, 2, 0, 0, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 11, 0, 0, 0, 0, 0, 15, 16, 0, 0, 0, 0, 0, 0, 0, 9, 9, 0, 0, 0, 0, 0, 17, 18, 0, 0, 19, 19, 19, 19, 19, 19, 19, 0, 0, 20, 21, 20, 21, 0, 0, 0, 9, 19, 19, 19, 19, 9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 22, 22, 22, 22, 22, 22, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 23, 24, 25, 0, 0, 0, 26, 27, 0, 28, 0, 0, 0, 0, 0, 29, 30, 31, 0, 0, 32, 33, 34, 35, 34, 0, 36, 0, 37, 38, 0, 39, 40, 41, 42, 43, 44, 45, 0, 46, 47, 48, 49, 0, 0, 0, 0, 0, 44, 50, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 19, 19, 19, 19, 19, 19, 19, 19, 51, 19, 19, 19, 19, 19, 33, 19, 19, 52, 19, 53, 21, 54, 55, 56, 57, 0, 58, 59, 0, 0, 60, 0, 61, 0, 0, 62, 0, 62, 63, 19, 64, 19, 0, 0, 0, 65, 0, 38, 0, 66, 0, 0, 0, 0, 0, 0, 67, 0, 0, 0, 0, 0, 0, 0, 0, 0, 68, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 69, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 22, 70, 22, 22, 22, 22, 22, 71, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 72, 0, 73, 74, 22, 22, 75, 76, 22, 22, 22, 22, 77, 22, 22, 22, 22, 22, 22, 78, 22, 79, 76, 22, 22, 22, 22, 75, 22, 22, 80, 22, 22, 71, 22, 22, 75, 22, 22, 81, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 75, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 0, 
0, 0, 0, 22, 22, 22, 22, 22, 22, 22, 22, 82, 22, 22, 22, 83, 0, 0, 0, 0, 0, 0, 0, 0, 0, 22, 82, 0, 0, 0, 0, 0, 0, 0, 0, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 71, 0, 0, 0, 0, 0, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 84, 0, 22, 22, 85, 86, 0, 0, 0, 0, 0, 0, 0, 0, 0, 87, 88, 88, 88, 88, 88, 89, 90, 90, 90, 90, 91, 92, 93, 94, 65, 95, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 96, 19, 97, 19, 19, 19, 34, 19, 19, 96, 0, 0, 0, 0, 0, 0, 98, 22, 22, 80, 99, 95, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 79, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 0, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 97, }; static RE_UINT8 re_east_asian_width_stage_4[] = { 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 7, 0, 10, 0, 0, 11, 12, 11, 13, 14, 10, 9, 14, 8, 12, 9, 5, 15, 0, 0, 0, 16, 0, 12, 0, 0, 13, 12, 0, 17, 0, 11, 12, 9, 11, 7, 15, 13, 0, 0, 0, 0, 0, 0, 10, 5, 5, 5, 11, 0, 18, 17, 15, 11, 0, 7, 16, 7, 7, 7, 7, 17, 7, 7, 7, 19, 7, 14, 0, 20, 20, 20, 20, 18, 9, 14, 14, 9, 7, 0, 0, 8, 15, 12, 10, 0, 11, 0, 12, 17, 11, 0, 0, 0, 0, 21, 11, 12, 15, 15, 0, 12, 10, 0, 0, 22, 10, 12, 0, 12, 11, 12, 9, 7, 7, 7, 0, 7, 7, 14, 0, 0, 0, 15, 0, 0, 0, 14, 0, 10, 11, 0, 0, 0, 12, 0, 0, 8, 12, 18, 12, 15, 15, 10, 17, 18, 16, 7, 5, 0, 7, 0, 14, 0, 0, 11, 11, 10, 0, 0, 0, 14, 7, 13, 13, 13, 13, 0, 0, 0, 15, 15, 0, 0, 15, 0, 0, 0, 0, 0, 12, 0, 0, 23, 0, 7, 7, 19, 7, 7, 0, 0, 0, 13, 14, 0, 0, 13, 13, 0, 14, 14, 13, 18, 13, 14, 0, 0, 0, 13, 14, 0, 12, 0, 22, 15, 13, 0, 14, 0, 5, 5, 0, 0, 0, 19, 19, 9, 19, 0, 0, 0, 13, 0, 7, 7, 19, 19, 0, 7, 7, 0, 0, 0, 15, 0, 13, 7, 7, 0, 24, 1, 25, 0, 26, 0, 0, 0, 17, 14, 0, 20, 20, 27, 20, 20, 0, 0, 0, 20, 28, 0, 0, 20, 20, 20, 0, 29, 20, 20, 20, 20, 20, 20, 30, 31, 20, 20, 20, 20, 30, 31, 20, 0, 31, 20, 20, 20, 20, 20, 28, 20, 20, 30, 0, 20, 20, 7, 7, 20, 20, 20, 32, 20, 30, 0, 0, 20, 20, 28, 0, 30, 20, 20, 20, 20, 30, 20, 0, 33, 34, 34, 34, 
34, 34, 34, 34, 35, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 37, 38, 36, 38, 36, 38, 36, 38, 39, 34, 40, 36, 37, 28, 0, 0, 0, 7, 7, 9, 0, 7, 7, 7, 14, 30, 0, 0, 0, 20, 20, 32, 0, }; static RE_UINT8 re_east_asian_width_stage_5[] = { 0, 0, 0, 0, 5, 5, 5, 5, 5, 5, 5, 0, 0, 1, 5, 5, 1, 5, 5, 1, 1, 0, 1, 0, 5, 1, 1, 5, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 3, 3, 3, 3, 0, 2, 0, 0, 0, 1, 1, 0, 0, 3, 3, 0, 0, 0, 5, 5, 5, 5, 0, 0, 0, 5, 5, 0, 3, 3, 0, 3, 3, 3, 0, 0, 4, 3, 3, 3, 3, 3, 3, 0, 0, 3, 3, 3, 3, 0, 0, 0, 0, 4, 4, 4, 4, 4, 4, 4, 4, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0, 0, 0, 2, 2, 2, 0, 0, 0, 4, 4, 4, 0, }; /* East_Asian_Width: 1668 bytes. */ RE_UINT32 re_get_east_asian_width(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 12; code = ch ^ (f << 12); pos = (RE_UINT32)re_east_asian_width_stage_1[f] << 4; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_east_asian_width_stage_2[pos + f] << 4; f = code >> 4; code ^= f << 4; pos = (RE_UINT32)re_east_asian_width_stage_3[pos + f] << 2; f = code >> 2; code ^= f << 2; pos = (RE_UINT32)re_east_asian_width_stage_4[pos + f] << 2; value = re_east_asian_width_stage_5[pos + code]; return value; } /* Joining_Group. 
*/ static RE_UINT8 re_joining_group_stage_1[] = { 0, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_joining_group_stage_2[] = { 0, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, }; static RE_UINT8 re_joining_group_stage_3[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_joining_group_stage_4[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 0, 0, 0, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 0, 0, 21, 0, 22, 0, 0, 23, 24, 25, 26, 0, 0, 0, 27, 28, 29, 30, 31, 32, 33, 0, 0, 0, 0, 34, 35, 36, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 37, 38, 39, 40, 41, 42, 0, 0, }; static RE_UINT8 re_joining_group_stage_5[] = { 0, 0, 0, 0, 0, 0, 0, 0, 45, 0, 3, 3, 43, 3, 45, 3, 4, 41, 4, 4, 13, 13, 13, 6, 6, 31, 31, 35, 35, 33, 33, 39, 39, 1, 1, 11, 11, 55, 55, 55, 0, 9, 29, 19, 22, 24, 26, 16, 43, 45, 45, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 29, 0, 3, 3, 3, 0, 3, 43, 43, 45, 4, 4, 4, 4, 4, 4, 4, 4, 13, 13, 13, 13, 13, 13, 13, 6, 6, 6, 6, 6, 6, 6, 6, 6, 31, 31, 31, 31, 31, 31, 31, 31, 31, 35, 35, 35, 33, 33, 39, 1, 9, 9, 9, 9, 9, 9, 29, 29, 11, 38, 11, 19, 19, 19, 11, 11, 11, 11, 11, 11, 22, 22, 22, 22, 26, 26, 26, 26, 56, 21, 13, 41, 17, 17, 14, 43, 43, 43, 43, 43, 43, 43, 43, 55, 47, 55, 43, 45, 45, 46, 46, 0, 41, 0, 0, 0, 0, 0, 0, 0, 0, 6, 31, 0, 0, 35, 33, 1, 0, 0, 21, 2, 0, 5, 12, 12, 7, 7, 15, 44, 50, 18, 42, 42, 48, 49, 20, 23, 25, 27, 36, 10, 8, 28, 32, 34, 30, 7, 37, 40, 5, 12, 7, 0, 0, 0, 0, 0, 51, 52, 53, 4, 4, 4, 4, 4, 4, 4, 13, 13, 6, 6, 31, 35, 1, 1, 1, 9, 9, 11, 11, 11, 24, 24, 26, 26, 26, 22, 31, 31, 35, 13, 13, 35, 31, 13, 3, 3, 55, 55, 45, 43, 43, 54, 54, 13, 35, 35, 19, 4, 4, 13, 
39, 9, 29, 22, 24, 45, 45, 31, 43, 57, 0, 6, 33, 11, 58, 31, 1, 19, 0, 0, 0, 59, 61, 61, 65, 65, 62, 0, 83, 0, 85, 85, 0, 0, 66, 80, 84, 68, 68, 68, 69, 63, 81, 70, 71, 77, 60, 60, 73, 73, 76, 74, 74, 74, 75, 0, 0, 78, 0, 0, 0, 0, 0, 0, 72, 64, 79, 82, 67, }; /* Joining_Group: 586 bytes. */ RE_UINT32 re_get_joining_group(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 15; code = ch ^ (f << 15); pos = (RE_UINT32)re_joining_group_stage_1[f] << 4; f = code >> 11; code ^= f << 11; pos = (RE_UINT32)re_joining_group_stage_2[pos + f] << 4; f = code >> 7; code ^= f << 7; pos = (RE_UINT32)re_joining_group_stage_3[pos + f] << 4; f = code >> 3; code ^= f << 3; pos = (RE_UINT32)re_joining_group_stage_4[pos + f] << 3; value = re_joining_group_stage_5[pos + code]; return value; } /* Joining_Type. */ static RE_UINT8 re_joining_type_stage_1[] = { 0, 1, 2, 3, 4, 4, 4, 4, 4, 4, 5, 4, 4, 4, 4, 6, 7, 8, 4, 4, 4, 4, 9, 4, 4, 4, 4, 10, 4, 11, 12, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 13, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, }; static RE_UINT8 re_joining_type_stage_2[] = { 0, 1, 0, 0, 0, 0, 2, 0, 0, 3, 0, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 0, 0, 0, 0, 27, 0, 0, 0, 0, 0, 0, 0, 28, 29, 30, 31, 32, 0, 33, 34, 35, 36, 37, 38, 0, 39, 0, 0, 0, 0, 40, 41, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 42, 43, 44, 0, 0, 0, 0, 45, 46, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 47, 48, 0, 0, 49, 50, 51, 52, 53, 54, 0, 55, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 56, 0, 0, 0, 0, 0, 57, 43, 0, 58, 0, 0, 0, 59, 0, 60, 61, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 62, 63, 0, 64, 0, 0, 0, 0, 0, 0, 0, 0, 65, 66, 67, 68, 69, 70, 71, 0, 0, 72, 0, 73, 74, 75, 76, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 77, 78, 0, 0, 0, 0, 0, 0, 0, 0, 79, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 80, 0, 0, 0, 0, 0, 0, 0, 0, 81, 82, 83, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 84, 85, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 86, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 87, 0, 88, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_joining_type_stage_3[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 2, 2, 2, 2, 2, 2, 2, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 4, 2, 5, 6, 0, 0, 0, 0, 7, 8, 9, 10, 2, 11, 12, 13, 14, 15, 15, 16, 17, 18, 19, 20, 21, 22, 2, 23, 24, 25, 26, 0, 0, 27, 28, 29, 15, 30, 31, 0, 32, 33, 0, 34, 35, 0, 0, 0, 0, 36, 37, 0, 0, 38, 2, 39, 0, 0, 40, 41, 42, 43, 0, 44, 0, 0, 45, 46, 0, 43, 0, 47, 0, 0, 45, 48, 44, 0, 49, 47, 0, 0, 45, 50, 0, 43, 0, 44, 0, 0, 51, 46, 52, 43, 0, 53, 0, 0, 0, 54, 0, 0, 0, 28, 0, 0, 55, 56, 57, 43, 0, 44, 0, 0, 51, 58, 0, 43, 0, 44, 0, 0, 0, 46, 0, 43, 0, 0, 0, 0, 0, 59, 60, 0, 0, 0, 0, 0, 61, 62, 0, 0, 0, 0, 0, 0, 63, 64, 0, 0, 0, 0, 65, 0, 66, 0, 0, 0, 67, 68, 69, 2, 70, 52, 0, 0, 0, 0, 0, 71, 72, 0, 73, 28, 74, 75, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 71, 0, 0, 0, 76, 0, 76, 
0, 43, 0, 43, 0, 0, 0, 77, 78, 79, 0, 0, 80, 0, 15, 15, 15, 15, 15, 81, 82, 15, 83, 0, 0, 0, 0, 0, 0, 0, 84, 85, 0, 0, 0, 0, 0, 86, 0, 0, 0, 87, 88, 89, 0, 0, 0, 90, 0, 0, 0, 0, 91, 0, 0, 92, 53, 0, 93, 91, 94, 0, 95, 0, 0, 0, 96, 94, 0, 0, 97, 98, 0, 0, 0, 0, 0, 0, 0, 0, 0, 99, 100, 101, 0, 0, 0, 0, 2, 2, 2, 102, 103, 0, 104, 0, 0, 0, 105, 0, 0, 0, 0, 0, 0, 2, 2, 28, 0, 0, 0, 0, 0, 0, 20, 94, 0, 0, 0, 0, 0, 0, 0, 20, 0, 0, 0, 0, 0, 0, 2, 2, 0, 0, 106, 0, 0, 0, 0, 0, 0, 107, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 20, 108, 0, 55, 0, 0, 0, 0, 0, 94, 109, 0, 57, 0, 15, 15, 15, 110, 0, 0, 0, 0, 111, 0, 2, 94, 0, 0, 112, 0, 113, 94, 0, 0, 39, 0, 0, 114, 0, 0, 115, 0, 0, 0, 116, 117, 118, 0, 0, 45, 0, 0, 0, 119, 44, 0, 120, 52, 0, 0, 0, 0, 0, 0, 121, 0, 0, 122, 0, 0, 0, 0, 0, 0, 2, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 123, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 28, 0, 0, 0, 0, 0, 0, 0, 0, 124, 125, 0, 0, 126, 0, 0, 0, 0, 0, 0, 0, 0, 127, 128, 129, 0, 130, 131, 132, 0, 0, 0, 0, 0, 44, 0, 0, 133, 134, 0, 0, 20, 94, 0, 0, 135, 0, 0, 0, 0, 39, 0, 136, 137, 0, 0, 0, 138, 94, 0, 0, 139, 140, 0, 0, 0, 0, 0, 20, 141, 0, 0, 0, 0, 0, 0, 0, 0, 0, 20, 142, 0, 94, 0, 0, 45, 28, 0, 143, 137, 0, 0, 0, 144, 145, 0, 0, 0, 0, 0, 0, 146, 28, 120, 0, 0, 0, 0, 0, 147, 28, 0, 0, 0, 0, 0, 148, 149, 0, 0, 0, 0, 0, 71, 150, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 137, 0, 0, 0, 134, 0, 0, 0, 0, 20, 39, 0, 0, 0, 0, 0, 0, 0, 151, 91, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 152, 38, 153, 0, 106, 0, 0, 0, 0, 0, 0, 0, 0, 0, 76, 0, 0, 0, 2, 2, 2, 154, 2, 2, 70, 115, 111, 93, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 134, 0, 0, 44, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, }; static RE_UINT8 re_joining_type_stage_4[] = { 0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 2, 2, 3, 2, 4, 0, 5, 2, 2, 2, 2, 2, 2, 6, 7, 6, 0, 0, 2, 2, 8, 9, 10, 11, 12, 13, 14, 15, 15, 15, 16, 15, 17, 2, 0, 0, 0, 18, 19, 20, 15, 15, 15, 15, 21, 21, 21, 21, 22, 15, 15, 15, 15, 15, 23, 21, 21, 24, 25, 26, 2, 27, 2, 27, 28, 29, 0, 0, 18, 30, 0, 0, 0, 3, 31, 32, 
22, 33, 15, 15, 34, 23, 2, 2, 8, 35, 15, 15, 32, 15, 15, 15, 13, 36, 24, 36, 22, 15, 0, 37, 2, 2, 9, 0, 0, 0, 0, 0, 18, 15, 15, 15, 38, 2, 2, 0, 39, 0, 0, 37, 6, 2, 2, 5, 5, 4, 36, 25, 12, 15, 15, 40, 5, 0, 15, 15, 25, 41, 42, 43, 0, 0, 3, 2, 2, 2, 8, 0, 0, 0, 0, 0, 44, 9, 5, 2, 9, 1, 5, 2, 0, 0, 37, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 9, 5, 9, 0, 1, 7, 0, 0, 0, 7, 3, 27, 4, 4, 1, 0, 0, 5, 6, 9, 1, 0, 0, 0, 27, 0, 44, 0, 0, 44, 0, 0, 0, 9, 0, 0, 1, 0, 0, 0, 37, 9, 37, 28, 4, 0, 7, 0, 0, 0, 44, 0, 4, 0, 0, 44, 0, 37, 45, 0, 0, 1, 2, 8, 0, 0, 3, 2, 8, 1, 2, 6, 9, 0, 0, 2, 4, 0, 0, 4, 0, 0, 46, 1, 0, 5, 2, 2, 8, 2, 28, 0, 5, 2, 2, 5, 2, 2, 2, 2, 9, 0, 0, 0, 5, 28, 2, 7, 7, 0, 0, 4, 37, 5, 9, 0, 0, 44, 7, 0, 1, 37, 9, 0, 0, 0, 6, 2, 4, 0, 44, 5, 2, 2, 0, 0, 1, 0, 47, 48, 4, 15, 15, 0, 0, 0, 47, 15, 15, 15, 15, 49, 0, 8, 3, 9, 0, 44, 0, 5, 0, 0, 3, 27, 0, 0, 44, 2, 8, 45, 5, 2, 9, 3, 2, 2, 27, 2, 2, 2, 8, 2, 0, 0, 0, 0, 28, 8, 9, 0, 0, 3, 2, 4, 0, 0, 0, 37, 4, 6, 4, 0, 44, 4, 46, 0, 0, 0, 2, 2, 37, 0, 0, 8, 2, 2, 2, 28, 2, 9, 1, 0, 9, 4, 0, 2, 4, 0, 2, 0, 0, 3, 50, 0, 0, 37, 8, 2, 9, 37, 2, 0, 0, 37, 4, 0, 0, 7, 0, 8, 2, 2, 4, 44, 44, 3, 0, 51, 0, 0, 0, 0, 9, 0, 0, 0, 37, 2, 4, 0, 3, 2, 2, 3, 37, 4, 9, 0, 1, 0, 0, 0, 0, 5, 8, 7, 7, 0, 0, 3, 0, 0, 9, 28, 27, 9, 37, 0, 0, 0, 4, 0, 1, 9, 1, 0, 0, 0, 44, 0, 0, 5, 0, 0, 37, 8, 0, 5, 7, 0, 2, 0, 0, 8, 3, 15, 52, 53, 54, 14, 55, 15, 12, 56, 57, 47, 13, 24, 22, 12, 58, 56, 0, 0, 0, 0, 0, 20, 59, 0, 0, 2, 2, 2, 8, 0, 0, 3, 8, 7, 1, 0, 3, 2, 5, 2, 9, 0, 0, 3, 0, 0, 0, 0, 37, 2, 8, 0, 0, 37, 9, 4, 28, 0, 0, 3, 2, 8, 0, 0, 37, 2, 9, 3, 2, 45, 3, 28, 0, 0, 0, 37, 4, 0, 6, 3, 2, 8, 46, 0, 0, 3, 1, 2, 6, 0, 0, 37, 6, 2, 0, 0, 0, 0, 7, 0, 3, 4, 0, 8, 5, 2, 0, 2, 8, 3, 2, }; static RE_UINT8 re_joining_type_stage_5[] = { 0, 0, 0, 0, 0, 5, 0, 0, 5, 5, 5, 5, 0, 0, 0, 5, 5, 5, 0, 0, 0, 5, 5, 5, 5, 5, 0, 5, 0, 5, 5, 0, 5, 5, 5, 0, 5, 0, 0, 0, 2, 0, 3, 3, 3, 3, 2, 3, 2, 3, 2, 2, 2, 2, 2, 3, 3, 3, 3, 2, 2, 2, 2, 2, 1, 2, 2, 2, 3, 2, 2, 5, 0, 0, 2, 
2, 5, 3, 3, 3, 0, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 2, 3, 2, 2, 3, 2, 3, 2, 3, 2, 2, 3, 3, 0, 3, 5, 5, 5, 0, 0, 5, 5, 0, 5, 5, 5, 5, 3, 3, 2, 0, 0, 2, 3, 5, 2, 2, 2, 3, 3, 3, 2, 2, 3, 2, 3, 2, 3, 2, 0, 3, 2, 2, 3, 2, 2, 2, 0, 0, 5, 5, 2, 2, 2, 5, 0, 0, 1, 0, 3, 2, 0, 0, 3, 0, 3, 2, 2, 3, 3, 2, 2, 0, 0, 0, 0, 0, 5, 0, 5, 0, 5, 0, 0, 5, 0, 5, 0, 0, 0, 2, 0, 0, 1, 5, 2, 5, 2, 0, 0, 1, 5, 5, 2, 2, 4, 0, 2, 3, 0, 3, 0, 3, 3, 0, 0, 4, 3, 3, 2, 2, 2, 4, 2, 3, 0, 0, 3, 5, 5, 0, 3, 2, 3, 3, 3, 2, 2, 0, }; /* Joining_Type: 2292 bytes. */ RE_UINT32 re_get_joining_type(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 12; code = ch ^ (f << 12); pos = (RE_UINT32)re_joining_type_stage_1[f] << 5; f = code >> 7; code ^= f << 7; pos = (RE_UINT32)re_joining_type_stage_2[pos + f] << 3; f = code >> 4; code ^= f << 4; pos = (RE_UINT32)re_joining_type_stage_3[pos + f] << 2; f = code >> 2; code ^= f << 2; pos = (RE_UINT32)re_joining_type_stage_4[pos + f] << 2; value = re_joining_type_stage_5[pos + code]; return value; } /* Line_Break. 
*/ static RE_UINT8 re_line_break_stage_1[] = { 0, 1, 2, 3, 4, 5, 5, 5, 5, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 10, 17, 10, 10, 10, 10, 18, 10, 19, 20, 21, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 22, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 22, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 23, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, }; static RE_UINT8 re_line_break_stage_2[] = { 0, 1, 2, 2, 2, 3, 4, 5, 2, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 2, 2, 2, 2, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 2, 51, 2, 2, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 2, 2, 2, 70, 2, 2, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 87, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 88, 79, 79, 79, 79, 79, 79, 79, 79, 89, 2, 2, 90, 91, 2, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 
106, 107, 101, 102, 103, 104, 105, 106, 107, 101, 102, 103, 104, 105, 106, 107, 101, 102, 103, 104, 105, 106, 107, 101, 102, 103, 104, 105, 106, 107, 101, 102, 103, 104, 105, 106, 107, 101, 102, 103, 104, 105, 106, 107, 101, 102, 103, 104, 105, 106, 107, 101, 102, 103, 104, 105, 106, 107, 101, 102, 103, 104, 105, 106, 107, 101, 102, 103, 104, 105, 106, 107, 101, 102, 103, 104, 105, 106, 107, 101, 102, 103, 108, 109, 109, 109, 109, 109, 109, 109, 109, 109, 109, 109, 109, 109, 109, 109, 109, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 79, 79, 79, 79, 111, 112, 2, 2, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 110, 123, 124, 125, 2, 126, 127, 110, 2, 2, 128, 110, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 110, 110, 139, 110, 110, 110, 140, 141, 142, 143, 144, 145, 146, 110, 110, 147, 110, 148, 149, 150, 151, 110, 110, 152, 110, 110, 110, 153, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 2, 2, 2, 2, 2, 2, 2, 154, 155, 2, 156, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 2, 2, 2, 2, 157, 158, 159, 2, 160, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 2, 2, 2, 161, 162, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 2, 2, 2, 2, 163, 164, 165, 166, 110, 110, 110, 110, 110, 110, 167, 168, 169, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 170, 171, 110, 110, 110, 110, 110, 110, 2, 172, 173, 174, 175, 110, 176, 110, 177, 178, 179, 2, 2, 180, 2, 181, 2, 2, 2, 2, 182, 183, 110, 110, 110, 110, 110, 
110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 2, 184, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 185, 186, 110, 110, 187, 188, 189, 190, 191, 110, 79, 192, 79, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 204, 205, 110, 206, 207, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, }; static RE_UINT16 re_line_break_stage_3[] = { 0, 1, 2, 3, 4, 5, 4, 6, 7, 1, 8, 9, 4, 10, 4, 10, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 11, 12, 4, 4, 1, 1, 1, 1, 13, 14, 15, 16, 17, 4, 18, 4, 4, 4, 4, 4, 19, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 20, 4, 21, 20, 4, 22, 23, 1, 24, 25, 26, 27, 28, 29, 30, 4, 4, 31, 1, 32, 33, 4, 4, 4, 4, 4, 34, 35, 36, 37, 38, 4, 1, 39, 4, 4, 4, 4, 4, 40, 41, 36, 4, 31, 42, 4, 43, 44, 45, 4, 46, 47, 47, 47, 47, 4, 48, 47, 47, 49, 1, 50, 4, 4, 51, 1, 52, 53, 4, 54, 55, 56, 57, 58, 59, 60, 61, 62, 55, 56, 63, 64, 65, 66, 67, 68, 18, 56, 69, 70, 71, 60, 72, 73, 55, 56, 69, 74, 75, 60, 76, 77, 78, 79, 80, 81, 82, 66, 83, 84, 85, 56, 86, 87, 88, 60, 89, 90, 85, 56, 91, 87, 92, 60, 93, 90, 85, 4, 94, 95, 96, 60, 97, 98, 99, 4, 100, 101, 102, 66, 103, 104, 105, 105, 106, 107, 108, 47, 47, 109, 110, 111, 112, 113, 114, 47, 47, 115, 116, 36, 117, 118, 4, 119, 120, 121, 122, 1, 123, 124, 125, 47, 47, 105, 105, 105, 105, 126, 105, 105, 105, 105, 127, 4, 4, 128, 4, 4, 4, 129, 129, 129, 129, 129, 129, 130, 130, 130, 130, 131, 132, 132, 132, 132, 132, 4, 4, 4, 4, 133, 134, 4, 4, 133, 4, 4, 135, 136, 137, 4, 4, 4, 136, 4, 4, 4, 138, 139, 119, 4, 140, 4, 4, 4, 4, 4, 141, 142, 4, 4, 4, 4, 4, 4, 4, 142, 143, 4, 4, 4, 4, 144, 145, 146, 147, 4, 148, 4, 149, 146, 150, 105, 105, 105, 105, 105, 151, 152, 140, 153, 152, 4, 4, 4, 4, 4, 76, 4, 4, 154, 
4, 4, 4, 4, 155, 4, 45, 156, 156, 157, 105, 158, 159, 105, 105, 160, 105, 161, 162, 4, 4, 4, 163, 105, 105, 105, 164, 105, 165, 152, 152, 158, 166, 47, 47, 47, 47, 167, 4, 4, 168, 169, 170, 171, 172, 173, 4, 174, 36, 4, 4, 40, 175, 4, 4, 168, 176, 177, 36, 4, 178, 47, 47, 47, 47, 76, 179, 180, 181, 4, 4, 4, 4, 1, 1, 1, 182, 4, 141, 4, 4, 141, 183, 4, 184, 4, 4, 4, 185, 185, 186, 4, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 119, 197, 198, 199, 1, 1, 200, 201, 202, 203, 4, 4, 204, 205, 206, 207, 206, 4, 4, 4, 208, 4, 4, 209, 210, 211, 212, 213, 214, 215, 4, 216, 217, 218, 219, 4, 4, 220, 4, 221, 222, 223, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 224, 4, 4, 225, 47, 226, 47, 227, 227, 227, 227, 227, 227, 227, 227, 227, 228, 227, 227, 227, 227, 205, 227, 227, 229, 227, 230, 231, 232, 233, 234, 235, 4, 236, 237, 4, 238, 239, 4, 240, 241, 4, 242, 4, 243, 244, 245, 246, 247, 248, 4, 4, 4, 4, 249, 250, 251, 227, 252, 4, 4, 253, 4, 254, 4, 255, 256, 4, 4, 4, 221, 4, 257, 4, 4, 4, 4, 4, 258, 4, 259, 4, 260, 4, 261, 56, 262, 263, 47, 4, 4, 45, 4, 4, 45, 4, 4, 4, 4, 4, 4, 4, 4, 264, 265, 4, 4, 128, 4, 4, 4, 266, 267, 4, 225, 268, 268, 268, 268, 1, 1, 269, 270, 271, 272, 273, 47, 47, 47, 274, 275, 274, 274, 274, 274, 274, 276, 274, 274, 274, 274, 274, 274, 274, 274, 274, 274, 274, 274, 274, 277, 47, 278, 279, 280, 281, 282, 283, 274, 284, 274, 285, 286, 287, 274, 284, 274, 285, 288, 289, 274, 290, 291, 274, 274, 274, 274, 292, 274, 274, 293, 274, 274, 276, 294, 274, 292, 274, 274, 295, 274, 274, 274, 274, 274, 274, 274, 274, 274, 274, 292, 274, 274, 274, 274, 4, 4, 4, 4, 274, 296, 274, 274, 274, 274, 274, 274, 297, 274, 274, 274, 298, 4, 4, 178, 299, 4, 300, 47, 4, 4, 264, 301, 4, 302, 4, 4, 4, 4, 4, 303, 4, 4, 184, 76, 47, 47, 47, 304, 305, 4, 306, 307, 4, 4, 4, 308, 309, 4, 4, 168, 310, 152, 1, 311, 36, 4, 312, 4, 313, 314, 129, 315, 50, 4, 4, 316, 317, 318, 105, 319, 4, 4, 320, 321, 322, 323, 105, 105, 105, 105, 105, 105, 324, 325, 31, 326, 327, 328, 268, 4, 4, 4, 155, 4, 
4, 4, 4, 4, 4, 4, 329, 152, 330, 331, 332, 333, 332, 334, 332, 330, 331, 332, 333, 332, 334, 332, 330, 331, 332, 333, 332, 334, 332, 330, 331, 332, 333, 332, 334, 332, 330, 331, 332, 333, 332, 334, 332, 330, 331, 332, 333, 332, 334, 332, 330, 331, 332, 333, 332, 334, 332, 330, 331, 332, 333, 332, 334, 332, 333, 332, 335, 130, 336, 132, 132, 337, 338, 338, 338, 338, 338, 338, 338, 338, 47, 47, 47, 47, 47, 47, 47, 47, 225, 339, 340, 341, 342, 4, 4, 4, 4, 4, 4, 4, 262, 343, 4, 4, 4, 4, 4, 344, 47, 4, 4, 4, 4, 345, 4, 4, 76, 47, 47, 346, 1, 347, 1, 348, 349, 350, 351, 185, 4, 4, 4, 4, 4, 4, 4, 352, 353, 354, 274, 355, 274, 356, 357, 358, 4, 359, 4, 45, 360, 361, 362, 363, 364, 4, 137, 365, 184, 184, 47, 47, 4, 4, 4, 4, 4, 4, 4, 226, 366, 4, 4, 367, 4, 4, 4, 4, 119, 368, 71, 47, 47, 4, 4, 369, 4, 119, 4, 4, 4, 71, 33, 368, 4, 4, 370, 4, 226, 4, 4, 371, 4, 372, 4, 4, 373, 374, 47, 47, 4, 184, 152, 47, 47, 47, 47, 47, 4, 4, 76, 4, 4, 4, 375, 47, 4, 4, 4, 225, 4, 155, 76, 47, 376, 4, 4, 377, 4, 378, 4, 4, 4, 45, 304, 47, 47, 47, 4, 379, 4, 380, 4, 381, 47, 47, 47, 47, 4, 4, 4, 382, 4, 345, 4, 4, 383, 384, 4, 385, 76, 386, 4, 4, 4, 4, 47, 47, 4, 4, 387, 388, 4, 4, 4, 389, 4, 260, 4, 390, 4, 391, 392, 47, 47, 47, 47, 47, 4, 4, 4, 4, 145, 47, 47, 47, 4, 4, 4, 393, 4, 4, 4, 394, 47, 47, 47, 47, 47, 47, 4, 45, 173, 4, 4, 395, 396, 345, 397, 398, 173, 4, 4, 399, 400, 4, 145, 152, 173, 4, 313, 401, 402, 4, 4, 403, 173, 4, 4, 316, 404, 405, 20, 48, 4, 18, 406, 407, 47, 47, 47, 47, 408, 37, 409, 4, 4, 264, 410, 152, 411, 55, 56, 69, 74, 412, 413, 414, 4, 4, 4, 1, 415, 152, 47, 47, 4, 4, 264, 416, 417, 418, 47, 47, 4, 4, 4, 1, 419, 152, 47, 47, 4, 4, 31, 420, 152, 47, 47, 47, 105, 421, 160, 422, 47, 47, 47, 47, 47, 47, 4, 4, 4, 4, 36, 423, 47, 47, 47, 47, 4, 4, 4, 145, 4, 140, 47, 47, 47, 47, 47, 47, 4, 4, 4, 4, 4, 4, 45, 424, 4, 4, 4, 4, 370, 47, 47, 47, 4, 4, 4, 4, 4, 425, 4, 4, 426, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 427, 4, 4, 45, 47, 47, 47, 47, 47, 4, 4, 4, 4, 428, 4, 
4, 4, 4, 4, 4, 4, 225, 47, 47, 47, 4, 4, 4, 145, 4, 45, 429, 47, 47, 47, 47, 47, 47, 4, 184, 430, 4, 4, 4, 431, 432, 433, 18, 434, 4, 47, 47, 47, 47, 47, 47, 47, 4, 4, 4, 4, 48, 435, 1, 166, 398, 173, 47, 47, 47, 47, 47, 47, 436, 47, 47, 47, 47, 47, 47, 47, 4, 4, 4, 4, 4, 4, 226, 119, 145, 437, 438, 47, 47, 47, 47, 47, 4, 4, 4, 4, 4, 4, 4, 155, 4, 4, 21, 4, 4, 4, 439, 1, 440, 4, 441, 4, 4, 4, 145, 47, 4, 4, 4, 4, 442, 47, 47, 47, 4, 4, 4, 4, 4, 225, 4, 262, 4, 4, 4, 4, 4, 185, 4, 4, 4, 146, 443, 444, 445, 4, 4, 4, 446, 447, 4, 448, 449, 85, 4, 4, 4, 4, 260, 4, 4, 4, 4, 4, 4, 4, 4, 4, 450, 451, 451, 451, 1, 1, 1, 452, 1, 1, 453, 454, 455, 456, 23, 47, 47, 47, 47, 47, 4, 4, 4, 4, 457, 321, 47, 47, 445, 4, 458, 459, 460, 461, 462, 463, 464, 368, 465, 368, 47, 47, 47, 262, 274, 274, 278, 274, 274, 274, 274, 274, 274, 276, 292, 291, 291, 291, 274, 277, 466, 227, 467, 227, 227, 227, 468, 227, 227, 469, 47, 47, 47, 47, 470, 471, 472, 274, 274, 293, 473, 436, 47, 47, 274, 474, 274, 475, 274, 274, 274, 476, 274, 274, 477, 478, 274, 274, 274, 274, 479, 480, 481, 482, 483, 274, 274, 275, 274, 274, 484, 274, 274, 485, 274, 486, 274, 274, 274, 274, 274, 4, 4, 487, 274, 274, 274, 274, 274, 488, 297, 276, 4, 4, 4, 4, 4, 4, 4, 370, 4, 4, 4, 4, 4, 48, 47, 47, 368, 4, 4, 4, 76, 140, 4, 4, 76, 4, 184, 47, 47, 47, 47, 47, 47, 473, 47, 47, 47, 47, 47, 47, 489, 47, 47, 47, 488, 47, 47, 47, 274, 274, 274, 274, 274, 274, 274, 290, 490, 47, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 47, }; static RE_UINT8 re_line_break_stage_4[] = { 0, 0, 0, 0, 1, 2, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 5, 6, 7, 8, 9, 10, 11, 12, 12, 12, 12, 12, 13, 14, 15, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 16, 17, 14, 14, 14, 14, 14, 14, 16, 18, 19, 0, 0, 20, 0, 0, 0, 0, 0, 21, 22, 23, 24, 25, 26, 27, 14, 22, 28, 29, 28, 28, 26, 28, 30, 14, 14, 14, 24, 14, 14, 14, 14, 14, 14, 14, 24, 31, 28, 31, 14, 25, 14, 14, 14, 28, 28, 24, 32, 0, 0, 0, 0, 0, 0, 0, 33, 0, 0, 0, 0, 0, 0, 34, 34, 34, 35, 
0, 0, 0, 0, 0, 0, 14, 14, 14, 14, 36, 14, 14, 37, 36, 36, 14, 14, 14, 38, 38, 14, 14, 39, 14, 14, 14, 14, 14, 14, 14, 19, 0, 0, 0, 14, 14, 14, 39, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 38, 39, 14, 14, 14, 14, 14, 14, 14, 40, 41, 39, 9, 42, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 43, 19, 44, 0, 45, 36, 36, 36, 36, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 47, 36, 36, 46, 48, 38, 36, 36, 36, 36, 36, 14, 14, 14, 14, 49, 50, 13, 14, 0, 0, 0, 0, 0, 51, 52, 53, 14, 14, 14, 14, 14, 19, 0, 0, 12, 12, 12, 12, 12, 54, 55, 14, 44, 14, 14, 14, 14, 14, 14, 14, 14, 14, 56, 0, 0, 0, 44, 19, 0, 0, 44, 19, 44, 0, 0, 14, 12, 12, 12, 12, 12, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 39, 19, 14, 14, 14, 14, 14, 14, 14, 0, 0, 0, 0, 0, 52, 39, 14, 14, 14, 14, 0, 0, 0, 0, 0, 44, 36, 36, 36, 36, 36, 36, 36, 0, 0, 14, 14, 57, 38, 36, 36, 14, 14, 14, 0, 0, 19, 0, 0, 0, 0, 19, 0, 19, 0, 0, 36, 14, 14, 14, 14, 14, 14, 14, 38, 14, 14, 14, 14, 19, 0, 36, 38, 36, 36, 36, 36, 36, 36, 36, 36, 14, 14, 38, 36, 36, 36, 36, 36, 36, 42, 0, 0, 0, 0, 0, 0, 0, 0, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 0, 44, 0, 19, 0, 0, 0, 14, 14, 14, 14, 14, 0, 58, 12, 12, 12, 12, 12, 19, 0, 39, 14, 14, 14, 38, 39, 38, 39, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 38, 14, 14, 14, 38, 38, 36, 14, 14, 36, 44, 0, 0, 0, 52, 42, 52, 42, 0, 38, 36, 36, 36, 42, 36, 36, 14, 39, 14, 0, 36, 12, 12, 12, 12, 12, 14, 50, 14, 14, 49, 9, 36, 36, 42, 0, 39, 14, 14, 38, 36, 39, 38, 14, 39, 38, 14, 36, 52, 0, 0, 52, 36, 42, 52, 42, 0, 36, 42, 36, 36, 36, 39, 14, 38, 38, 36, 36, 36, 12, 12, 12, 12, 12, 0, 14, 19, 36, 36, 36, 36, 36, 42, 0, 39, 14, 14, 14, 14, 39, 38, 14, 39, 14, 14, 36, 44, 0, 0, 0, 0, 42, 0, 42, 0, 36, 38, 36, 36, 36, 36, 36, 36, 36, 9, 36, 36, 36, 39, 36, 36, 36, 42, 0, 39, 14, 14, 14, 38, 39, 0, 0, 52, 42, 52, 42, 0, 36, 36, 36, 36, 0, 36, 36, 14, 39, 14, 14, 14, 14, 36, 36, 36, 36, 36, 44, 39, 14, 14, 38, 36, 14, 38, 14, 14, 36, 39, 38, 38, 14, 36, 39, 38, 36, 14, 38, 36, 14, 14, 14, 14, 14, 14, 36, 36, 
0, 0, 52, 36, 0, 52, 0, 0, 36, 38, 36, 36, 42, 36, 36, 36, 36, 14, 14, 14, 14, 9, 38, 36, 36, 0, 0, 39, 14, 14, 14, 38, 14, 38, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 36, 39, 0, 0, 0, 52, 0, 52, 0, 0, 36, 36, 36, 42, 52, 14, 38, 36, 36, 36, 36, 36, 36, 14, 14, 14, 14, 42, 0, 39, 14, 14, 14, 38, 14, 14, 14, 39, 14, 14, 36, 44, 0, 36, 36, 42, 52, 36, 36, 36, 38, 39, 38, 36, 36, 36, 36, 36, 36, 14, 14, 14, 14, 14, 38, 39, 0, 0, 0, 52, 0, 52, 0, 0, 38, 36, 36, 36, 42, 36, 36, 36, 39, 14, 14, 14, 36, 59, 14, 14, 14, 36, 0, 39, 14, 14, 14, 14, 14, 14, 14, 14, 38, 36, 14, 14, 14, 14, 39, 14, 14, 14, 14, 39, 36, 14, 14, 14, 38, 36, 52, 36, 42, 0, 0, 52, 52, 0, 0, 0, 0, 36, 0, 38, 36, 36, 36, 36, 36, 60, 61, 61, 61, 61, 61, 61, 61, 61, 61, 61, 61, 61, 61, 61, 61, 61, 61, 61, 61, 61, 62, 36, 63, 61, 61, 61, 61, 61, 61, 61, 64, 12, 12, 12, 12, 12, 58, 36, 36, 60, 62, 62, 60, 62, 62, 60, 36, 36, 36, 61, 61, 60, 61, 61, 61, 60, 61, 60, 60, 36, 61, 60, 61, 61, 61, 61, 61, 61, 60, 61, 36, 61, 61, 62, 62, 61, 61, 61, 36, 12, 12, 12, 12, 12, 36, 61, 61, 32, 65, 29, 65, 66, 67, 68, 53, 53, 69, 56, 14, 0, 14, 14, 14, 14, 14, 43, 19, 19, 70, 70, 0, 14, 14, 14, 14, 39, 14, 14, 14, 14, 14, 14, 14, 14, 14, 38, 36, 42, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 14, 14, 19, 0, 0, 0, 0, 0, 42, 0, 0, 0, 0, 0, 0, 0, 0, 0, 52, 58, 14, 14, 14, 44, 14, 14, 38, 14, 65, 71, 14, 14, 72, 73, 36, 36, 12, 12, 12, 12, 12, 58, 14, 14, 12, 12, 12, 12, 12, 61, 61, 61, 14, 14, 14, 39, 36, 36, 39, 36, 74, 74, 74, 74, 74, 74, 74, 74, 75, 75, 75, 75, 75, 75, 75, 75, 75, 75, 75, 75, 76, 76, 76, 76, 76, 76, 76, 76, 76, 76, 76, 76, 14, 14, 14, 14, 38, 14, 14, 36, 14, 14, 14, 38, 38, 14, 14, 36, 38, 14, 14, 36, 14, 14, 14, 38, 38, 14, 14, 36, 14, 14, 14, 14, 14, 14, 14, 38, 14, 14, 14, 14, 14, 14, 14, 14, 14, 38, 42, 0, 27, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 36, 36, 36, 14, 14, 14, 36, 14, 14, 14, 36, 77, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 16, 78, 36, 14, 14, 14, 14, 14, 27, 58, 14, 14, 
14, 14, 14, 38, 36, 36, 36, 14, 14, 14, 14, 14, 14, 38, 14, 14, 0, 52, 36, 36, 36, 36, 36, 14, 0, 1, 41, 36, 36, 36, 36, 14, 0, 36, 36, 36, 36, 36, 36, 38, 0, 36, 36, 36, 36, 36, 36, 61, 61, 58, 79, 77, 80, 61, 36, 12, 12, 12, 12, 12, 36, 36, 36, 14, 53, 58, 29, 53, 19, 0, 73, 14, 14, 14, 14, 19, 38, 36, 36, 14, 14, 14, 36, 36, 36, 36, 36, 0, 0, 0, 0, 0, 0, 36, 36, 38, 36, 53, 12, 12, 12, 12, 12, 61, 61, 61, 61, 61, 61, 61, 36, 61, 61, 62, 36, 36, 36, 36, 36, 61, 61, 61, 61, 61, 61, 36, 36, 61, 61, 61, 61, 61, 36, 36, 36, 12, 12, 12, 12, 12, 62, 36, 61, 14, 14, 14, 19, 0, 0, 36, 14, 61, 61, 61, 61, 61, 61, 61, 62, 61, 61, 61, 61, 61, 61, 62, 42, 0, 0, 0, 0, 0, 0, 0, 52, 0, 0, 44, 14, 14, 14, 14, 14, 14, 14, 0, 0, 0, 0, 0, 0, 0, 0, 44, 14, 14, 14, 36, 36, 12, 12, 12, 12, 12, 58, 27, 58, 77, 14, 14, 14, 14, 19, 0, 0, 0, 0, 14, 14, 14, 14, 38, 36, 0, 44, 14, 14, 14, 14, 14, 14, 19, 0, 0, 0, 0, 0, 0, 14, 0, 0, 36, 36, 36, 36, 14, 14, 0, 0, 0, 0, 36, 81, 58, 58, 12, 12, 12, 12, 12, 36, 39, 14, 14, 14, 14, 14, 14, 14, 14, 58, 0, 44, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 44, 14, 19, 14, 14, 0, 44, 38, 0, 36, 36, 36, 0, 0, 0, 36, 36, 36, 0, 0, 14, 14, 14, 14, 39, 39, 39, 39, 14, 14, 14, 14, 14, 14, 14, 36, 14, 14, 38, 14, 14, 14, 14, 14, 14, 14, 36, 14, 14, 14, 39, 14, 36, 14, 38, 14, 14, 14, 32, 38, 58, 58, 58, 82, 58, 83, 0, 0, 82, 58, 84, 25, 85, 86, 85, 86, 28, 14, 87, 88, 89, 0, 0, 33, 50, 50, 50, 50, 7, 90, 91, 14, 14, 14, 92, 93, 91, 14, 14, 14, 14, 14, 14, 77, 58, 58, 27, 58, 94, 14, 38, 0, 0, 0, 0, 0, 14, 36, 25, 14, 14, 14, 16, 95, 24, 28, 25, 14, 14, 14, 16, 78, 23, 23, 23, 6, 23, 23, 23, 23, 23, 23, 23, 22, 23, 6, 23, 22, 23, 23, 23, 23, 23, 23, 23, 23, 52, 36, 36, 36, 36, 36, 36, 36, 14, 49, 24, 14, 49, 14, 14, 14, 14, 24, 14, 96, 14, 14, 14, 14, 24, 25, 14, 14, 14, 24, 14, 14, 14, 14, 28, 14, 14, 24, 14, 25, 28, 28, 28, 28, 28, 28, 14, 14, 28, 28, 28, 28, 28, 14, 14, 14, 14, 14, 14, 14, 24, 14, 36, 36, 14, 25, 25, 14, 14, 14, 14, 14, 25, 28, 14, 24, 25, 24, 14, 24, 
24, 23, 24, 14, 14, 25, 24, 28, 25, 24, 24, 24, 28, 28, 25, 25, 14, 14, 28, 28, 14, 14, 28, 14, 14, 14, 14, 14, 25, 14, 25, 14, 14, 25, 14, 14, 14, 14, 14, 14, 28, 14, 28, 28, 14, 28, 14, 28, 14, 28, 14, 28, 14, 14, 14, 14, 14, 14, 24, 14, 24, 14, 14, 14, 14, 14, 24, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 24, 14, 14, 14, 14, 14, 14, 14, 97, 14, 14, 14, 14, 70, 70, 14, 14, 14, 25, 14, 14, 14, 98, 14, 14, 14, 14, 14, 14, 16, 99, 14, 14, 98, 98, 14, 14, 14, 38, 36, 36, 14, 14, 14, 38, 36, 36, 36, 36, 14, 14, 14, 14, 14, 38, 36, 36, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 25, 28, 28, 25, 14, 14, 14, 14, 14, 14, 28, 28, 14, 14, 14, 14, 14, 28, 24, 28, 28, 28, 14, 14, 14, 14, 28, 14, 28, 14, 14, 28, 14, 28, 14, 14, 28, 25, 24, 14, 28, 28, 14, 14, 14, 14, 14, 14, 14, 14, 28, 28, 14, 14, 14, 14, 24, 98, 98, 24, 25, 24, 14, 14, 28, 14, 14, 98, 28, 100, 98, 98, 98, 14, 14, 14, 14, 101, 98, 14, 14, 25, 25, 14, 14, 14, 14, 14, 14, 28, 24, 28, 24, 102, 25, 28, 24, 14, 14, 14, 14, 14, 14, 14, 101, 14, 14, 14, 14, 14, 14, 14, 28, 14, 14, 14, 14, 14, 14, 101, 98, 98, 98, 98, 98, 102, 28, 103, 101, 98, 103, 102, 28, 98, 28, 102, 103, 98, 24, 14, 14, 28, 102, 28, 28, 103, 98, 98, 103, 98, 102, 103, 98, 98, 98, 100, 14, 98, 98, 98, 14, 14, 14, 14, 24, 14, 7, 85, 85, 5, 53, 14, 14, 70, 70, 70, 70, 70, 70, 70, 28, 28, 28, 28, 28, 28, 28, 14, 14, 14, 14, 14, 14, 14, 14, 16, 99, 14, 14, 14, 14, 14, 14, 14, 70, 70, 70, 70, 70, 14, 16, 104, 104, 104, 104, 104, 104, 104, 104, 104, 104, 99, 14, 14, 14, 14, 14, 14, 14, 14, 14, 70, 14, 14, 14, 24, 28, 28, 14, 14, 14, 14, 14, 36, 14, 14, 14, 14, 14, 14, 14, 14, 36, 14, 14, 14, 14, 14, 14, 14, 14, 14, 36, 39, 14, 14, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 14, 14, 14, 14, 14, 14, 14, 14, 14, 19, 0, 14, 36, 36, 105, 58, 77, 106, 14, 14, 14, 14, 36, 36, 36, 39, 41, 36, 36, 36, 36, 36, 36, 42, 14, 14, 14, 38, 14, 14, 14, 38, 85, 85, 85, 85, 85, 85, 85, 58, 58, 58, 58, 27, 107, 14, 85, 14, 85, 70, 70, 70, 70, 
58, 58, 56, 58, 27, 77, 14, 14, 108, 58, 77, 58, 109, 36, 36, 36, 36, 36, 36, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 110, 98, 98, 98, 98, 36, 36, 36, 36, 36, 36, 98, 98, 98, 36, 36, 36, 36, 36, 98, 98, 98, 98, 98, 98, 36, 36, 18, 111, 112, 98, 70, 70, 70, 70, 70, 98, 70, 70, 70, 70, 113, 114, 98, 98, 98, 98, 98, 0, 0, 0, 98, 98, 115, 98, 98, 112, 116, 98, 117, 118, 118, 118, 118, 98, 98, 98, 98, 118, 98, 98, 98, 98, 98, 98, 98, 118, 118, 118, 98, 98, 98, 119, 98, 98, 118, 120, 42, 121, 91, 116, 122, 118, 118, 118, 118, 98, 98, 98, 98, 98, 118, 119, 98, 112, 123, 116, 36, 36, 110, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 36, 110, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 124, 98, 98, 98, 98, 98, 124, 36, 36, 125, 125, 125, 125, 125, 125, 125, 125, 98, 98, 98, 98, 28, 28, 28, 28, 98, 98, 112, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 124, 36, 98, 98, 98, 124, 36, 36, 36, 36, 14, 14, 14, 14, 14, 14, 27, 106, 12, 12, 12, 12, 12, 14, 36, 36, 0, 44, 0, 0, 0, 0, 0, 14, 14, 14, 14, 14, 14, 14, 14, 0, 0, 27, 58, 58, 36, 36, 36, 36, 36, 36, 36, 39, 14, 14, 14, 14, 14, 44, 14, 44, 14, 19, 14, 14, 14, 19, 0, 0, 14, 14, 36, 36, 14, 14, 14, 14, 126, 36, 36, 36, 14, 14, 65, 53, 36, 36, 36, 36, 0, 14, 14, 14, 14, 14, 14, 14, 0, 0, 52, 36, 36, 36, 36, 58, 0, 14, 14, 14, 14, 14, 29, 36, 14, 14, 14, 0, 0, 0, 0, 58, 14, 14, 14, 19, 0, 0, 0, 0, 0, 0, 36, 36, 36, 36, 36, 39, 74, 74, 74, 74, 74, 74, 127, 36, 14, 19, 0, 0, 0, 0, 0, 0, 44, 14, 14, 27, 58, 14, 14, 39, 12, 12, 12, 12, 12, 36, 36, 14, 12, 12, 12, 12, 12, 61, 61, 62, 14, 14, 14, 14, 19, 0, 0, 0, 0, 0, 0, 52, 36, 36, 36, 36, 14, 19, 14, 14, 14, 14, 0, 36, 12, 12, 12, 12, 12, 36, 27, 58, 61, 62, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 60, 61, 61, 58, 14, 19, 52, 36, 36, 36, 36, 39, 14, 14, 38, 39, 14, 14, 38, 39, 14, 14, 38, 36, 36, 36, 36, 14, 19, 0, 0, 0, 1, 0, 36, 128, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 128, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 
129, 129, 129, 128, 129, 129, 129, 129, 129, 128, 129, 129, 129, 129, 129, 129, 129, 36, 36, 36, 36, 36, 36, 75, 75, 75, 130, 36, 131, 76, 76, 76, 76, 76, 76, 76, 76, 36, 36, 132, 132, 132, 132, 132, 132, 132, 132, 36, 39, 14, 14, 36, 36, 133, 134, 46, 46, 46, 46, 48, 46, 46, 46, 46, 46, 46, 47, 46, 46, 47, 47, 46, 133, 47, 46, 46, 46, 46, 46, 36, 39, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 104, 36, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 126, 36, 135, 136, 57, 137, 138, 36, 36, 36, 98, 98, 139, 104, 104, 104, 104, 104, 104, 104, 111, 139, 111, 98, 98, 98, 111, 78, 91, 53, 139, 104, 104, 111, 98, 98, 98, 124, 140, 141, 36, 36, 14, 14, 14, 14, 14, 14, 38, 142, 105, 98, 6, 98, 70, 98, 111, 111, 98, 98, 98, 98, 98, 91, 98, 143, 98, 98, 98, 98, 98, 139, 144, 98, 98, 98, 98, 98, 98, 139, 144, 139, 114, 70, 93, 145, 125, 125, 125, 125, 146, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 91, 36, 14, 14, 14, 36, 14, 14, 14, 36, 14, 14, 14, 36, 14, 38, 36, 22, 98, 140, 147, 14, 14, 14, 38, 36, 36, 36, 36, 42, 0, 148, 36, 14, 14, 14, 14, 14, 14, 39, 14, 14, 14, 14, 14, 14, 38, 14, 39, 58, 41, 36, 39, 14, 14, 14, 14, 14, 14, 36, 39, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 36, 36, 14, 14, 14, 14, 14, 14, 19, 36, 14, 14, 36, 36, 36, 36, 36, 36, 14, 14, 14, 0, 0, 52, 36, 36, 14, 14, 14, 14, 14, 14, 14, 81, 14, 14, 36, 36, 14, 14, 14, 14, 77, 14, 14, 36, 36, 36, 36, 36, 14, 14, 36, 36, 36, 36, 36, 39, 14, 14, 14, 36, 38, 14, 14, 14, 14, 14, 14, 39, 38, 36, 38, 39, 14, 14, 14, 81, 14, 14, 14, 14, 14, 38, 14, 36, 36, 39, 14, 14, 14, 14, 14, 14, 14, 14, 36, 81, 14, 14, 14, 14, 14, 36, 36, 39, 14, 14, 14, 14, 36, 36, 14, 14, 19, 0, 42, 52, 36, 36, 0, 0, 14, 14, 39, 14, 39, 14, 14, 14, 14, 14, 36, 36, 0, 52, 36, 42, 58, 58, 58, 58, 38, 36, 36, 36, 14, 14, 19, 52, 36, 39, 14, 14, 58, 58, 58, 149, 36, 36, 36, 36, 14, 14, 14, 36, 81, 58, 58, 58, 14, 38, 36, 36, 14, 14, 14, 14, 14, 36, 36, 36, 39, 14, 38, 36, 36, 36, 36, 36, 39, 14, 14, 14, 14, 38, 36, 36, 
36, 36, 36, 36, 14, 38, 36, 36, 36, 14, 14, 14, 14, 14, 14, 14, 0, 0, 0, 0, 0, 0, 0, 1, 77, 14, 14, 36, 14, 14, 14, 12, 12, 12, 12, 12, 36, 36, 36, 36, 36, 36, 36, 42, 0, 0, 0, 0, 0, 44, 14, 58, 58, 36, 36, 36, 36, 36, 36, 36, 0, 0, 52, 12, 12, 12, 12, 12, 58, 58, 36, 36, 36, 36, 36, 36, 14, 19, 32, 38, 36, 36, 36, 36, 44, 14, 27, 77, 77, 0, 44, 36, 12, 12, 12, 12, 12, 32, 27, 58, 14, 14, 14, 14, 14, 14, 0, 0, 0, 0, 0, 0, 58, 27, 77, 36, 14, 14, 14, 38, 38, 14, 14, 39, 14, 14, 14, 14, 27, 36, 36, 36, 0, 0, 0, 0, 0, 52, 36, 36, 0, 0, 39, 14, 14, 14, 38, 39, 38, 36, 36, 42, 36, 36, 39, 14, 14, 0, 36, 0, 0, 0, 52, 36, 0, 0, 52, 36, 36, 36, 36, 36, 0, 0, 14, 14, 36, 36, 36, 36, 0, 0, 0, 36, 0, 0, 0, 0, 150, 58, 53, 14, 27, 58, 58, 58, 58, 58, 58, 58, 14, 14, 0, 36, 1, 77, 38, 36, 36, 36, 36, 36, 0, 0, 0, 0, 36, 36, 36, 36, 61, 61, 61, 61, 61, 36, 60, 61, 12, 12, 12, 12, 12, 61, 58, 151, 14, 38, 36, 36, 36, 36, 36, 39, 58, 58, 41, 36, 36, 36, 36, 36, 14, 14, 14, 14, 152, 70, 114, 14, 14, 99, 14, 70, 70, 14, 14, 14, 14, 14, 14, 14, 16, 114, 14, 14, 14, 14, 14, 14, 14, 14, 14, 70, 12, 12, 12, 12, 12, 36, 36, 58, 0, 0, 1, 36, 36, 36, 36, 36, 0, 0, 0, 1, 58, 14, 14, 14, 14, 14, 77, 36, 36, 36, 36, 36, 12, 12, 12, 12, 12, 39, 14, 14, 14, 14, 14, 14, 36, 36, 39, 14, 19, 0, 0, 0, 0, 0, 0, 0, 98, 36, 36, 36, 36, 36, 36, 36, 14, 14, 14, 14, 14, 36, 19, 1, 0, 0, 36, 36, 36, 36, 36, 36, 14, 14, 19, 0, 0, 14, 19, 0, 0, 44, 19, 0, 0, 0, 14, 14, 14, 14, 14, 14, 14, 0, 0, 14, 14, 0, 44, 36, 36, 36, 36, 36, 36, 38, 39, 38, 39, 14, 38, 14, 14, 14, 14, 14, 14, 39, 39, 14, 14, 14, 39, 14, 14, 14, 14, 14, 14, 14, 14, 39, 14, 38, 39, 14, 14, 14, 38, 14, 14, 14, 38, 14, 14, 14, 14, 14, 14, 39, 14, 38, 14, 14, 38, 38, 36, 14, 14, 14, 14, 14, 14, 14, 14, 14, 36, 12, 12, 12, 12, 12, 12, 12, 12, 12, 0, 0, 0, 44, 14, 19, 0, 0, 0, 0, 0, 0, 0, 0, 44, 14, 14, 14, 19, 14, 14, 14, 14, 14, 14, 14, 44, 27, 58, 77, 36, 36, 36, 36, 36, 36, 36, 42, 0, 0, 14, 14, 38, 39, 14, 14, 14, 14, 39, 38, 38, 39, 39, 
14, 14, 14, 14, 38, 14, 14, 39, 39, 36, 36, 36, 38, 36, 39, 39, 39, 39, 14, 39, 38, 38, 39, 39, 39, 39, 39, 39, 38, 38, 39, 14, 38, 14, 14, 14, 38, 14, 14, 39, 14, 38, 38, 14, 14, 14, 14, 14, 39, 14, 14, 39, 14, 39, 14, 14, 39, 14, 14, 28, 28, 28, 28, 28, 28, 153, 36, 28, 28, 28, 28, 28, 28, 28, 38, 28, 28, 28, 28, 28, 14, 36, 36, 28, 28, 28, 28, 28, 153, 36, 36, 36, 36, 36, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 98, 124, 36, 36, 36, 36, 36, 36, 98, 98, 98, 98, 124, 36, 36, 36, 98, 98, 98, 98, 98, 98, 14, 98, 98, 98, 100, 101, 98, 98, 101, 98, 98, 98, 98, 98, 98, 100, 14, 14, 101, 101, 101, 98, 98, 98, 98, 100, 100, 101, 98, 98, 98, 98, 98, 98, 14, 14, 14, 101, 98, 98, 98, 98, 98, 98, 98, 100, 14, 14, 14, 14, 14, 14, 101, 98, 98, 98, 98, 98, 98, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 98, 98, 98, 98, 98, 110, 98, 98, 98, 98, 98, 98, 98, 14, 14, 14, 14, 98, 98, 98, 98, 14, 14, 14, 98, 98, 98, 14, 14, 14, 85, 155, 91, 14, 14, 124, 36, 36, 36, 36, 36, 36, 36, 98, 98, 124, 36, 36, 36, 36, 36, 42, 36, 36, 36, 36, 36, 36, 36, }; static RE_UINT8 re_line_break_stage_5[] = { 16, 16, 16, 18, 22, 20, 20, 21, 19, 6, 3, 12, 9, 10, 12, 3, 1, 36, 12, 9, 8, 15, 8, 7, 11, 11, 8, 8, 12, 12, 12, 6, 12, 1, 9, 36, 18, 2, 12, 16, 16, 29, 4, 1, 10, 9, 9, 9, 12, 25, 25, 12, 25, 3, 12, 18, 25, 25, 17, 12, 25, 1, 17, 25, 12, 17, 16, 4, 4, 4, 4, 16, 0, 0, 8, 12, 12, 0, 0, 12, 0, 8, 18, 0, 0, 16, 18, 16, 16, 12, 6, 16, 37, 37, 37, 0, 37, 12, 12, 10, 10, 10, 16, 6, 16, 0, 6, 6, 10, 11, 11, 12, 6, 12, 8, 6, 18, 18, 0, 10, 0, 24, 24, 24, 24, 0, 0, 9, 24, 12, 17, 17, 4, 17, 17, 18, 4, 6, 4, 12, 1, 2, 18, 17, 12, 4, 4, 0, 31, 31, 32, 32, 33, 33, 18, 12, 2, 0, 5, 24, 18, 9, 0, 18, 18, 4, 18, 28, 26, 25, 3, 3, 1, 3, 14, 14, 14, 18, 20, 20, 3, 25, 5, 5, 8, 1, 2, 5, 30, 12, 2, 25, 9, 12, 12, 14, 13, 13, 2, 12, 13, 12, 12, 13, 13, 25, 25, 13, 2, 1, 0, 6, 6, 18, 1, 18, 26, 26, 1, 0, 0, 13, 2, 13, 13, 5, 5, 1, 2, 2, 13, 16, 5, 13, 0, 38, 13, 38, 38, 13, 38, 0, 16, 5, 5, 
38, 38, 5, 13, 0, 38, 38, 10, 12, 31, 0, 34, 35, 35, 35, 32, 0, 0, 33, 27, 27, 0, 37, 16, 37, 8, 2, 2, 8, 6, 1, 2, 14, 13, 1, 13, 9, 10, 13, 0, 30, 13, 6, 13, 2, 12, 38, 38, 12, 9, 0, 23, 25, 14, 0, 16, 17, 18, 24, 1, 1, 25, 0, 39, 39, 3, 5, }; /* Line_Break: 8608 bytes. */ RE_UINT32 re_get_line_break(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 12; code = ch ^ (f << 12); pos = (RE_UINT32)re_line_break_stage_1[f] << 5; f = code >> 7; code ^= f << 7; pos = (RE_UINT32)re_line_break_stage_2[pos + f] << 3; f = code >> 4; code ^= f << 4; pos = (RE_UINT32)re_line_break_stage_3[pos + f] << 3; f = code >> 1; code ^= f << 1; pos = (RE_UINT32)re_line_break_stage_4[pos + f] << 1; value = re_line_break_stage_5[pos + code]; return value; } /* Numeric_Type. */ static RE_UINT8 re_numeric_type_stage_1[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 11, 11, 11, 12, 13, 14, 15, 11, 11, 11, 16, 11, 11, 11, 11, 11, 11, 17, 18, 19, 20, 11, 21, 22, 11, 11, 23, 11, 11, 11, 11, 11, 11, 11, 11, 24, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, }; static RE_UINT8 re_numeric_type_stage_2[] = { 0, 1, 1, 1, 1, 1, 2, 3, 1, 
4, 5, 6, 7, 8, 9, 10, 11, 1, 1, 12, 1, 1, 13, 14, 15, 16, 17, 18, 19, 1, 1, 1, 20, 21, 1, 1, 22, 1, 1, 23, 1, 1, 1, 1, 24, 1, 1, 1, 25, 26, 27, 1, 28, 1, 1, 1, 29, 1, 1, 30, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 31, 32, 1, 33, 1, 34, 1, 1, 35, 1, 36, 1, 1, 1, 1, 1, 37, 38, 1, 1, 39, 40, 1, 1, 1, 41, 1, 1, 1, 1, 1, 1, 1, 42, 1, 1, 1, 43, 1, 1, 44, 1, 1, 1, 1, 1, 1, 1, 1, 1, 45, 1, 1, 1, 46, 1, 1, 1, 1, 1, 1, 1, 47, 48, 1, 1, 1, 1, 1, 1, 1, 1, 49, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 50, 1, 51, 52, 53, 54, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 55, 1, 1, 1, 1, 1, 15, 1, 56, 57, 58, 59, 1, 1, 1, 60, 61, 62, 63, 64, 1, 65, 1, 66, 67, 54, 1, 68, 1, 69, 70, 71, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 72, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 73, 74, 1, 1, 1, 1, 1, 1, 1, 75, 1, 1, 1, 76, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 77, 1, 1, 1, 1, 1, 1, 1, 1, 78, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 79, 80, 1, 1, 1, 1, 1, 1, 1, 81, 82, 83, 1, 1, 1, 1, 1, 1, 1, 84, 1, 1, 1, 1, 1, 85, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 86, 1, 1, 1, 1, 1, 1, 87, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 84, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_numeric_type_stage_3[] = { 0, 1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 4, 0, 0, 0, 5, 0, 0, 0, 4, 0, 0, 0, 4, 0, 0, 0, 6, 0, 0, 0, 7, 0, 0, 0, 8, 0, 0, 0, 4, 0, 0, 0, 9, 0, 0, 0, 4, 0, 0, 1, 0, 0, 0, 1, 0, 0, 10, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 1, 0, 0, 0, 0, 0, 0, 11, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 12, 0, 0, 0, 0, 0, 0, 0, 13, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 14, 0, 0, 0, 0, 0, 15, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 15, 0, 0, 0, 0, 0, 0, 0, 0, 16, 17, 0, 0, 0, 0, 0, 18, 19, 20, 0, 0, 0, 0, 0, 0, 21, 22, 0, 0, 23, 0, 0, 0, 24, 25, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 26, 27, 28, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 29, 0, 0, 0, 0, 30, 31, 0, 30, 32, 0, 0, 
33, 0, 0, 0, 34, 0, 0, 0, 0, 35, 0, 0, 0, 0, 0, 0, 0, 0, 36, 0, 0, 0, 0, 0, 37, 0, 26, 0, 38, 39, 40, 41, 36, 0, 0, 42, 0, 0, 0, 0, 43, 0, 44, 45, 0, 0, 0, 0, 0, 0, 46, 0, 0, 0, 47, 0, 0, 0, 0, 0, 0, 0, 48, 0, 0, 0, 0, 0, 0, 0, 0, 49, 0, 0, 0, 50, 0, 0, 0, 51, 52, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 53, 0, 0, 54, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 55, 0, 44, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 56, 0, 0, 0, 0, 0, 0, 53, 0, 0, 0, 0, 0, 0, 0, 0, 44, 0, 0, 0, 0, 54, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 57, 0, 0, 0, 42, 0, 0, 0, 0, 0, 0, 0, 58, 59, 60, 0, 0, 0, 56, 0, 3, 0, 0, 0, 0, 0, 61, 0, 62, 0, 0, 0, 0, 1, 0, 3, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 63, 0, 55, 64, 26, 65, 66, 19, 67, 68, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 69, 0, 70, 71, 0, 0, 0, 72, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 73, 74, 0, 75, 0, 76, 77, 0, 0, 0, 0, 78, 79, 19, 0, 0, 80, 81, 82, 0, 0, 83, 0, 0, 73, 73, 0, 84, 0, 0, 0, 0, 0, 0, 0, 0, 0, 85, 0, 0, 0, 86, 0, 0, 0, 0, 0, 0, 87, 88, 0, 0, 0, 1, 0, 89, 0, 0, 0, 0, 1, 90, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 3, 0, 0, 91, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 92, 19, 19, 19, 93, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 94, 95, 0, 0, 0, 0, 0, 0, 0, 96, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 97, 98, 0, 0, 0, 0, 0, 0, 75, 0, 99, 0, 0, 0, 0, 0, 0, 0, 58, 0, 0, 43, 0, 0, 0, 100, 0, 58, 0, 0, 0, 0, 0, 0, 0, 35, 0, 0, 101, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 102, 103, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 42, 0, 0, 0, 0, 0, 0, 0, 60, 0, 0, 0, 48, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 36, 0, 0, 0, 0, }; static RE_UINT8 re_numeric_type_stage_4[] = { 0, 0, 0, 0, 0, 0, 1, 2, 0, 0, 3, 4, 1, 2, 0, 0, 5, 1, 0, 0, 5, 1, 6, 7, 5, 1, 8, 0, 5, 1, 9, 0, 5, 1, 0, 10, 5, 1, 11, 0, 1, 12, 13, 0, 0, 14, 15, 16, 0, 17, 18, 0, 1, 2, 19, 7, 0, 0, 1, 20, 1, 2, 1, 2, 0, 0, 21, 22, 23, 22, 0, 0, 0, 0, 19, 19, 19, 19, 19, 19, 24, 7, 0, 0, 23, 25, 26, 27, 19, 23, 25, 13, 0, 28, 29, 30, 0, 0, 31, 32, 23, 33, 34, 0, 0, 0, 0, 35, 36, 0, 0, 0, 37, 7, 0, 9, 0, 0, 38, 0, 19, 7, 
0, 0, 0, 19, 37, 19, 0, 0, 37, 19, 35, 0, 0, 0, 39, 0, 0, 0, 0, 40, 0, 0, 0, 35, 0, 0, 41, 42, 0, 0, 0, 43, 44, 0, 0, 0, 0, 36, 18, 0, 0, 36, 0, 18, 0, 0, 0, 0, 18, 0, 43, 0, 0, 0, 45, 0, 0, 0, 0, 46, 0, 0, 47, 43, 0, 0, 48, 0, 0, 0, 0, 0, 0, 39, 0, 0, 42, 42, 0, 0, 0, 40, 0, 0, 0, 17, 0, 49, 18, 0, 0, 0, 0, 45, 0, 43, 0, 0, 0, 0, 40, 0, 0, 0, 45, 0, 0, 45, 39, 0, 42, 0, 0, 0, 45, 43, 0, 0, 0, 0, 0, 18, 17, 19, 0, 0, 0, 0, 11, 0, 0, 39, 39, 18, 0, 0, 50, 0, 36, 19, 19, 19, 19, 19, 13, 0, 19, 19, 19, 18, 0, 51, 0, 0, 37, 19, 19, 13, 13, 0, 0, 0, 42, 40, 0, 0, 0, 0, 52, 0, 0, 0, 0, 19, 0, 0, 0, 37, 36, 19, 0, 0, 0, 0, 0, 53, 0, 0, 17, 13, 0, 0, 0, 54, 19, 19, 8, 19, 55, 0, 0, 0, 0, 0, 0, 56, 0, 0, 0, 57, 0, 53, 0, 0, 0, 37, 0, 0, 0, 0, 0, 8, 23, 25, 19, 10, 0, 0, 58, 59, 60, 1, 0, 0, 0, 0, 5, 1, 37, 19, 16, 0, 0, 0, 1, 61, 1, 12, 9, 0, 19, 10, 0, 0, 0, 0, 1, 62, 7, 0, 0, 0, 19, 19, 7, 0, 0, 5, 1, 1, 1, 1, 1, 1, 23, 63, 0, 0, 40, 0, 0, 0, 39, 43, 0, 43, 0, 40, 0, 35, 0, 0, 0, 42, }; static RE_UINT8 re_numeric_type_stage_5[] = { 0, 0, 0, 0, 0, 0, 0, 0, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 0, 0, 0, 0, 0, 2, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 3, 3, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 3, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 3, 3, 2, 0, 0, 0, 0, 0, 2, 0, 0, 0, 2, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 0, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 0, 0, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 0, 0, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 0, 0, 0, 0, 0, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 
0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 2, 2, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 3, 3, 3, 1, 1, 0, 0, 0, 0, 3, 3, 0, 1, 1, 1, 1, 1, 2, 2, 2, 1, 1, 0, 0, 0, }; /* Numeric_Type: 2304 bytes. */ RE_UINT32 re_get_numeric_type(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 12; code = ch ^ (f << 12); pos = (RE_UINT32)re_numeric_type_stage_1[f] << 4; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_numeric_type_stage_2[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_numeric_type_stage_3[pos + f] << 2; f = code >> 3; code ^= f << 3; pos = (RE_UINT32)re_numeric_type_stage_4[pos + f] << 3; value = re_numeric_type_stage_5[pos + code]; return value; } /* Numeric_Value. 
*/ static RE_UINT8 re_numeric_value_stage_1[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 11, 11, 11, 12, 13, 14, 15, 11, 11, 11, 16, 11, 11, 11, 11, 11, 11, 17, 18, 19, 20, 11, 21, 22, 11, 11, 23, 11, 11, 11, 11, 11, 11, 11, 11, 24, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, }; static RE_UINT8 re_numeric_value_stage_2[] = { 0, 1, 1, 1, 1, 1, 2, 3, 1, 4, 5, 6, 7, 8, 9, 10, 11, 1, 1, 12, 1, 1, 13, 14, 15, 16, 17, 18, 19, 1, 1, 1, 20, 21, 1, 1, 22, 1, 1, 23, 1, 1, 1, 1, 24, 1, 1, 1, 25, 26, 27, 1, 28, 1, 1, 1, 29, 1, 1, 30, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 31, 32, 1, 33, 1, 34, 1, 1, 35, 1, 36, 1, 1, 1, 1, 1, 37, 38, 1, 1, 39, 40, 1, 1, 1, 41, 1, 1, 1, 1, 1, 1, 1, 42, 1, 1, 1, 43, 1, 1, 44, 1, 1, 1, 1, 1, 1, 1, 1, 1, 45, 1, 1, 1, 46, 1, 1, 1, 1, 1, 1, 1, 47, 48, 1, 1, 1, 1, 1, 1, 1, 1, 49, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 50, 1, 51, 52, 53, 54, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 55, 1, 1, 1, 1, 1, 15, 1, 56, 57, 58, 59, 1, 1, 1, 60, 61, 62, 63, 64, 1, 65, 1, 66, 67, 54, 1, 68, 1, 69, 70, 71, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 72, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 73, 74, 1, 1, 1, 1, 1, 1, 1, 75, 1, 1, 1, 76, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 77, 1, 1, 1, 1, 1, 1, 1, 1, 78, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 79, 80, 1, 1, 1, 1, 1, 1, 1, 81, 82, 83, 1, 1, 1, 1, 1, 1, 1, 84, 1, 1, 1, 1, 1, 85, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 86, 1, 1, 1, 1, 1, 1, 87, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 88, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_numeric_value_stage_3[] = { 0, 1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 4, 0, 0, 0, 5, 0, 0, 0, 4, 0, 0, 0, 4, 0, 0, 0, 6, 0, 0, 0, 7, 0, 0, 0, 8, 0, 0, 0, 4, 0, 0, 0, 9, 0, 0, 0, 4, 0, 0, 1, 0, 0, 0, 1, 0, 0, 10, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 1, 0, 0, 0, 0, 0, 0, 11, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 12, 0, 0, 0, 0, 0, 0, 0, 13, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 14, 0, 0, 0, 0, 0, 13, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 13, 0, 0, 0, 0, 0, 0, 0, 0, 15, 3, 0, 0, 0, 0, 0, 16, 17, 18, 0, 0, 0, 0, 0, 0, 19, 20, 0, 0, 21, 0, 0, 0, 22, 23, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 24, 25, 26, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 27, 0, 0, 0, 0, 28, 29, 0, 28, 30, 0, 0, 31, 0, 0, 0, 32, 0, 0, 0, 0, 33, 0, 0, 0, 0, 0, 0, 0, 0, 34, 0, 0, 0, 0, 0, 35, 0, 36, 0, 37, 38, 39, 40, 41, 0, 0, 42, 0, 0, 0, 0, 43, 0, 44, 45, 0, 0, 0, 0, 0, 0, 46, 0, 0, 0, 47, 0, 0, 0, 0, 0, 0, 0, 48, 0, 0, 0, 0, 0, 0, 0, 0, 49, 0, 0, 0, 50, 0, 0, 0, 51, 52, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 53, 0, 0, 54, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 55, 0, 56, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 57, 0, 0, 0, 0, 0, 0, 58, 0, 0, 0, 0, 0, 0, 0, 0, 59, 0, 0, 0, 0, 60, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 61, 0, 0, 0, 62, 0, 0, 0, 0, 0, 0, 0, 63, 64, 65, 0, 0, 0, 66, 0, 3, 0, 0, 0, 0, 0, 67, 0, 68, 0, 0, 0, 0, 1, 0, 3, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 69, 0, 70, 71, 72, 73, 74, 75, 76, 77, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 78, 0, 79, 80, 0, 0, 0, 81, 
0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 82, 83, 0, 84, 0, 85, 86, 0, 0, 0, 0, 87, 88, 89, 0, 0, 90, 91, 92, 0, 0, 93, 0, 0, 94, 94, 0, 95, 0, 0, 0, 0, 0, 0, 0, 0, 0, 96, 0, 0, 0, 97, 0, 0, 0, 0, 0, 0, 98, 99, 0, 0, 0, 1, 0, 100, 0, 0, 0, 0, 1, 101, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 3, 0, 0, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 103, 104, 105, 106, 107, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 108, 109, 0, 0, 0, 0, 0, 0, 0, 110, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 111, 112, 0, 0, 0, 0, 0, 0, 113, 0, 114, 0, 0, 0, 0, 0, 0, 0, 115, 0, 0, 116, 0, 0, 0, 117, 0, 118, 0, 0, 0, 0, 0, 0, 0, 119, 0, 0, 120, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 121, 122, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 62, 0, 0, 0, 0, 0, 0, 0, 123, 0, 0, 0, 124, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 125, 0, 0, 0, 0, 0, 0, 0, 0, 126, 0, 0, 0, }; static RE_UINT8 re_numeric_value_stage_4[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 0, 0, 0, 0, 0, 4, 0, 5, 6, 1, 2, 3, 0, 0, 0, 0, 0, 0, 7, 8, 9, 0, 0, 0, 0, 0, 7, 8, 9, 0, 10, 11, 0, 0, 7, 8, 9, 12, 13, 0, 0, 0, 7, 8, 9, 14, 0, 0, 0, 0, 7, 8, 9, 0, 0, 1, 15, 0, 7, 8, 9, 16, 17, 0, 0, 1, 2, 18, 19, 20, 0, 0, 0, 0, 0, 21, 2, 22, 23, 24, 25, 0, 0, 0, 26, 27, 0, 0, 0, 1, 2, 3, 0, 1, 2, 3, 0, 0, 0, 0, 0, 1, 2, 28, 0, 0, 0, 0, 0, 29, 2, 3, 0, 0, 0, 0, 0, 30, 31, 32, 33, 34, 35, 36, 37, 34, 35, 36, 37, 38, 39, 40, 0, 0, 0, 0, 0, 34, 35, 36, 41, 42, 34, 35, 36, 41, 42, 34, 35, 36, 41, 42, 0, 0, 0, 43, 44, 45, 46, 2, 47, 0, 0, 0, 0, 0, 48, 49, 50, 34, 35, 51, 49, 50, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 52, 0, 53, 0, 0, 0, 0, 0, 0, 21, 2, 3, 0, 0, 0, 54, 0, 0, 0, 0, 0, 48, 55, 0, 0, 34, 35, 56, 0, 0, 0, 0, 0, 0, 0, 57, 58, 59, 60, 61, 62, 0, 0, 0, 0, 63, 64, 65, 66, 0, 67, 0, 0, 0, 0, 0, 0, 68, 0, 0, 0, 0, 0, 0, 0, 0, 0, 69, 0, 0, 0, 0, 0, 0, 0, 0, 70, 0, 0, 0, 0, 71, 72, 73, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 74, 0, 0, 0, 75, 0, 76, 0, 0, 0, 0, 0, 0, 0, 0, 0, 77, 78, 0, 0, 0, 0, 0, 0, 79, 0, 0, 80, 0, 0, 0, 0, 0, 0, 0, 0, 67, 0, 0, 0, 0, 0, 0, 0, 0, 81, 0, 0, 0, 0, 82, 0, 0, 0, 0, 0, 
0, 0, 83, 0, 0, 0, 0, 0, 0, 0, 0, 84, 85, 0, 0, 0, 0, 86, 87, 0, 88, 0, 0, 0, 0, 89, 80, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 90, 0, 0, 0, 0, 0, 5, 0, 5, 0, 0, 0, 0, 0, 0, 0, 91, 0, 0, 0, 0, 0, 0, 0, 0, 92, 0, 0, 0, 15, 75, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 93, 0, 0, 0, 94, 0, 0, 0, 0, 0, 0, 0, 0, 95, 0, 0, 0, 0, 95, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 96, 0, 0, 0, 0, 0, 0, 0, 0, 0, 97, 0, 98, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25, 0, 0, 0, 0, 0, 0, 0, 99, 68, 0, 0, 0, 0, 0, 0, 0, 75, 0, 0, 0, 100, 0, 0, 0, 0, 0, 0, 0, 0, 101, 0, 81, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 102, 0, 0, 0, 0, 0, 0, 103, 0, 0, 0, 48, 49, 104, 0, 0, 0, 0, 0, 0, 0, 0, 105, 106, 0, 0, 0, 0, 107, 0, 108, 0, 75, 0, 0, 0, 0, 0, 103, 0, 0, 0, 0, 0, 0, 0, 109, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 110, 0, 111, 8, 9, 57, 58, 112, 113, 114, 115, 116, 117, 118, 0, 0, 0, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 122, 131, 132, 0, 0, 0, 133, 0, 0, 0, 0, 0, 21, 2, 22, 23, 24, 134, 135, 0, 136, 0, 0, 0, 0, 0, 0, 0, 137, 0, 138, 0, 0, 0, 0, 0, 0, 0, 0, 0, 139, 140, 0, 0, 0, 0, 0, 0, 0, 0, 141, 142, 0, 0, 0, 0, 0, 0, 21, 143, 0, 111, 144, 145, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 111, 145, 0, 0, 0, 0, 0, 146, 147, 0, 0, 0, 0, 0, 0, 0, 0, 148, 34, 35, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 34, 163, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 164, 0, 0, 0, 0, 0, 0, 0, 165, 0, 0, 111, 145, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 34, 163, 0, 0, 21, 166, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 167, 168, 34, 35, 149, 150, 169, 152, 170, 171, 0, 0, 0, 0, 48, 49, 50, 172, 173, 174, 8, 9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 8, 9, 21, 2, 22, 23, 24, 175, 0, 0, 0, 0, 0, 0, 1, 2, 22, 0, 1, 2, 22, 23, 176, 0, 0, 0, 8, 9, 49, 177, 35, 178, 2, 179, 180, 181, 9, 182, 183, 182, 184, 185, 186, 187, 188, 189, 144, 190, 191, 192, 193, 194, 195, 196, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 197, 198, 199, 0, 0, 0, 0, 0, 0, 0, 34, 35, 149, 150, 200, 0, 0, 0, 0, 0, 0, 7, 8, 9, 1, 2, 201, 8, 9, 1, 2, 201, 8, 9, 0, 111, 8, 9, 0, 0, 0, 0, 202, 49, 104, 29, 0, 
0, 0, 0, 70, 0, 0, 0, 0, 0, 0, 0, 0, 203, 0, 0, 0, 0, 0, 0, 98, 0, 0, 0, 0, 0, 0, 0, 67, 0, 0, 0, 0, 0, 0, 0, 0, 0, 91, 0, 0, 0, 0, 0, 204, 0, 0, 88, 0, 0, 0, 88, 0, 0, 101, 0, 0, 0, 0, 73, 0, 0, 0, 0, 0, 0, 73, 0, 0, 0, 0, 0, 0, 0, 80, 0, 0, 0, 0, 0, 0, 0, 107, 0, 0, 0, 0, 205, 0, 0, 0, 0, 0, 0, 0, 0, 206, 0, 0, 0, }; static RE_UINT8 re_numeric_value_stage_5[] = { 0, 0, 0, 0, 2, 27, 29, 31, 33, 35, 37, 39, 41, 43, 0, 0, 0, 0, 29, 31, 0, 27, 0, 0, 12, 17, 22, 0, 0, 0, 2, 27, 29, 31, 33, 35, 37, 39, 41, 43, 3, 7, 10, 12, 22, 50, 0, 0, 0, 0, 12, 17, 22, 3, 7, 10, 44, 89, 98, 0, 27, 29, 31, 0, 44, 89, 98, 12, 17, 22, 0, 0, 41, 43, 17, 28, 30, 32, 34, 36, 38, 40, 42, 1, 0, 27, 29, 31, 41, 43, 44, 54, 64, 74, 84, 85, 86, 87, 88, 89, 107, 0, 0, 0, 0, 0, 51, 52, 53, 0, 0, 0, 41, 43, 27, 0, 2, 0, 0, 0, 8, 6, 5, 13, 21, 11, 15, 19, 23, 9, 24, 7, 14, 20, 25, 27, 27, 29, 31, 33, 35, 37, 39, 41, 43, 44, 45, 46, 84, 89, 93, 98, 98, 102, 107, 0, 0, 37, 84, 111, 116, 2, 0, 0, 47, 48, 49, 50, 51, 52, 53, 54, 0, 0, 2, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 27, 29, 31, 41, 43, 44, 2, 0, 0, 27, 29, 31, 33, 35, 37, 39, 41, 43, 44, 43, 44, 27, 29, 0, 17, 0, 0, 0, 0, 0, 2, 44, 54, 64, 0, 31, 33, 0, 0, 43, 44, 0, 0, 44, 54, 64, 74, 84, 85, 86, 87, 0, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 0, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 0, 35, 0, 0, 0, 0, 0, 29, 0, 0, 35, 0, 0, 39, 0, 0, 27, 0, 0, 39, 0, 0, 0, 107, 0, 31, 0, 0, 0, 43, 0, 0, 29, 0, 0, 0, 35, 0, 33, 0, 0, 0, 0, 128, 44, 0, 0, 0, 0, 0, 0, 98, 31, 0, 0, 0, 89, 0, 0, 0, 128, 0, 0, 0, 0, 0, 130, 0, 0, 29, 0, 41, 0, 37, 0, 0, 0, 44, 0, 98, 54, 64, 0, 0, 74, 0, 0, 0, 0, 31, 31, 31, 0, 0, 0, 33, 0, 0, 27, 0, 0, 0, 43, 54, 0, 0, 44, 0, 41, 0, 0, 0, 0, 0, 39, 0, 0, 0, 43, 0, 0, 0, 89, 0, 0, 0, 33, 0, 0, 0, 29, 0, 0, 98, 0, 0, 0, 0, 37, 0, 37, 0, 0, 0, 0, 0, 2, 0, 39, 41, 43, 2, 12, 17, 22, 3, 7, 10, 0, 0, 0, 0, 0, 31, 0, 0, 0, 44, 0, 37, 0, 37, 0, 44, 0, 0, 0, 0, 0, 27, 88, 89, 90, 91, 92, 93, 
94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 12, 17, 27, 35, 84, 93, 102, 111, 35, 44, 84, 89, 93, 98, 102, 35, 44, 84, 89, 93, 98, 107, 111, 44, 27, 27, 27, 29, 29, 29, 29, 35, 44, 44, 44, 44, 44, 64, 84, 84, 84, 84, 89, 91, 93, 93, 93, 93, 84, 17, 17, 21, 22, 0, 0, 0, 0, 0, 2, 12, 90, 91, 92, 93, 94, 95, 96, 97, 27, 35, 44, 84, 0, 88, 0, 0, 0, 0, 97, 0, 0, 27, 29, 44, 54, 89, 0, 0, 27, 29, 31, 44, 54, 89, 98, 107, 33, 35, 44, 54, 29, 31, 33, 33, 35, 44, 54, 89, 0, 0, 27, 44, 54, 89, 29, 31, 26, 17, 0, 0, 43, 44, 54, 64, 74, 84, 85, 86, 0, 0, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 119, 120, 122, 123, 124, 125, 126, 4, 9, 12, 13, 16, 17, 18, 21, 22, 24, 44, 54, 89, 98, 0, 27, 84, 0, 0, 27, 44, 54, 33, 44, 54, 89, 0, 0, 27, 35, 44, 84, 89, 98, 87, 88, 89, 90, 95, 96, 97, 17, 12, 13, 21, 0, 54, 64, 74, 84, 85, 86, 87, 88, 89, 98, 2, 27, 98, 0, 0, 0, 86, 87, 88, 0, 39, 41, 43, 33, 43, 27, 29, 31, 41, 43, 27, 29, 31, 33, 35, 29, 31, 31, 33, 35, 27, 29, 31, 31, 33, 35, 118, 121, 33, 35, 31, 31, 33, 33, 33, 33, 37, 39, 39, 39, 41, 41, 43, 43, 43, 43, 29, 31, 33, 35, 37, 27, 35, 35, 29, 31, 27, 29, 13, 21, 24, 13, 21, 7, 12, 9, 12, 12, 17, 13, 21, 74, 84, 33, 35, 37, 39, 41, 43, 0, 41, 43, 0, 44, 89, 107, 127, 128, 129, 130, 0, 0, 87, 88, 0, 0, 41, 43, 2, 27, 2, 2, 27, 29, 33, 0, 0, 0, 0, 0, 0, 64, 0, 33, 0, 0, 43, 0, 0, 0, }; /* Numeric_Value: 3228 bytes. 
*/ RE_UINT32 re_get_numeric_value(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 12; code = ch ^ (f << 12); pos = (RE_UINT32)re_numeric_value_stage_1[f] << 4; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_numeric_value_stage_2[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_numeric_value_stage_3[pos + f] << 3; f = code >> 2; code ^= f << 2; pos = (RE_UINT32)re_numeric_value_stage_4[pos + f] << 2; value = re_numeric_value_stage_5[pos + code]; return value; } /* Bidi_Mirrored. */ static RE_UINT8 re_bidi_mirrored_stage_1[] = { 0, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, }; static RE_UINT8 re_bidi_mirrored_stage_2[] = { 0, 1, 2, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 6, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, }; static RE_UINT8 re_bidi_mirrored_stage_3[] = { 0, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 3, 1, 1, 1, 1, 4, 5, 1, 6, 7, 8, 1, 9, 10, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 11, 1, 1, 1, 12, 1, 1, 1, 1, }; static RE_UINT8 re_bidi_mirrored_stage_4[] = { 0, 1, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 3, 3, 3, 3, 3, 5, 3, 3, 3, 3, 3, 6, 7, 8, 3, 3, 9, 3, 3, 10, 11, 12, 13, 14, 3, 3, 3, 3, 3, 3, 3, 3, 15, 3, 16, 3, 3, 3, 3, 3, 3, 17, 18, 19, 20, 21, 22, 3, 3, 3, 3, 23, 3, 3, 3, 3, 3, 3, 3, 24, 3, 3, 3, 3, 3, 3, 3, 3, 25, 3, 3, 26, 27, 3, 3, 3, 3, 3, 28, 29, 30, 31, 32, }; static RE_UINT8 re_bidi_mirrored_stage_5[] = { 0, 0, 0, 0, 0, 3, 0, 80, 0, 0, 0, 40, 0, 0, 0, 40, 0, 0, 0, 0, 0, 8, 0, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 60, 0, 0, 0, 24, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 96, 0, 0, 0, 0, 0, 0, 96, 0, 96, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 30, 63, 98, 188, 87, 248, 15, 250, 255, 31, 60, 128, 245, 207, 255, 255, 255, 159, 7, 1, 204, 255, 255, 193, 0, 62, 195, 255, 255, 63, 255, 255, 0, 15, 0, 0, 3, 6, 0, 0, 0, 0, 0, 0, 0, 255, 63, 0, 121, 59, 120, 112, 252, 255, 0, 0, 248, 255, 255, 249, 255, 
255, 0, 1, 63, 194, 55, 31, 58, 3, 240, 51, 0, 252, 255, 223, 83, 122, 48, 112, 0, 0, 128, 1, 48, 188, 25, 254, 255, 255, 255, 255, 207, 191, 255, 255, 255, 255, 127, 80, 124, 112, 136, 47, 60, 54, 0, 48, 255, 3, 0, 0, 0, 255, 243, 15, 0, 0, 0, 0, 0, 0, 0, 126, 48, 0, 0, 0, 0, 3, 0, 80, 0, 0, 0, 40, 0, 0, 0, 168, 13, 0, 0, 0, 0, 0, 0, 8, 0, 0, 0, 0, 0, 0, 32, 0, 0, 0, 0, 0, 0, 128, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 8, 0, 0, 0, 0, 0, 0, 0, }; /* Bidi_Mirrored: 489 bytes. */ RE_UINT32 re_get_bidi_mirrored(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_bidi_mirrored_stage_1[f] << 4; f = code >> 12; code ^= f << 12; pos = (RE_UINT32)re_bidi_mirrored_stage_2[pos + f] << 3; f = code >> 9; code ^= f << 9; pos = (RE_UINT32)re_bidi_mirrored_stage_3[pos + f] << 3; f = code >> 6; code ^= f << 6; pos = (RE_UINT32)re_bidi_mirrored_stage_4[pos + f] << 6; pos += code; value = (re_bidi_mirrored_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Indic_Positional_Category. 
*/ static RE_UINT8 re_indic_positional_category_stage_1[] = { 0, 1, 1, 1, 1, 2, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_indic_positional_category_stage_2[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 0, 0, 0, 0, 0, 0, 9, 0, 10, 11, 12, 13, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 14, 15, 16, 17, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 18, 0, 0, 0, 0, 0, 19, 20, 21, 22, 23, 24, 25, 26, 0, 0, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_indic_positional_category_stage_3[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 2, 3, 4, 5, 0, 6, 0, 0, 7, 8, 9, 5, 0, 10, 0, 0, 7, 11, 0, 0, 12, 10, 0, 0, 7, 13, 0, 5, 0, 6, 0, 0, 14, 15, 16, 5, 0, 17, 0, 0, 18, 19, 9, 0, 0, 20, 0, 0, 21, 22, 23, 5, 0, 6, 0, 0, 14, 24, 25, 5, 0, 6, 0, 0, 18, 26, 9, 5, 0, 27, 0, 0, 0, 28, 29, 0, 27, 0, 0, 0, 30, 31, 0, 0, 0, 0, 0, 0, 32, 33, 0, 0, 0, 0, 34, 0, 35, 0, 0, 0, 36, 37, 38, 39, 40, 41, 0, 0, 0, 0, 0, 42, 43, 0, 44, 45, 46, 47, 48, 0, 0, 0, 0, 0, 0, 0, 49, 0, 49, 0, 50, 0, 50, 0, 0, 0, 51, 52, 53, 0, 0, 0, 0, 54, 55, 0, 0, 0, 0, 0, 0, 0, 56, 57, 0, 0, 0, 0, 58, 0, 0, 0, 59, 60, 61, 0, 0, 0, 0, 0, 0, 0, 0, 62, 0, 0, 63, 64, 0, 65, 66, 67, 0, 68, 0, 0, 0, 69, 70, 0, 0, 71, 72, 0, 0, 0, 0, 0, 0, 0, 0, 0, 73, 74, 75, 76, 0, 77, 0, 0, 0, 0, 0, 78, 0, 0, 79, 80, 0, 81, 82, 0, 0, 83, 0, 84, 70, 0, 0, 1, 0, 0, 85, 86, 0, 87, 0, 0, 0, 88, 89, 90, 0, 0, 91, 0, 0, 0, 92, 93, 0, 94, 95, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 96, 0, 97, 0, 0, 98, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 99, 0, 0, 100, 101, 0, 0, 0, 67, 0, 0, 102, 0, 0, 0, 0, 103, 0, 104, 105, 0, 0, 0, 106, 67, 0, 0, 107, 108, 0, 0, 0, 0, 0, 109, 110, 0, 0, 0, 0, 0, 0, 0, 0, 0, 111, 112, 0, 6, 0, 0, 18, 113, 9, 114, 115, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 116, 117, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 118, 119, 120, 121, 0, 0, 0, 0, 0, 122, 123, 0, 0, 0, 0, 0, 124, 125, 0, 0, 0, 0, 0, 126, 127, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_indic_positional_category_stage_4[] = { 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 3, 4, 5, 6, 7, 1, 2, 8, 5, 9, 10, 7, 1, 6, 0, 0, 0, 0, 0, 6, 0, 0, 0, 0, 0, 0, 10, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 4, 5, 6, 3, 11, 12, 13, 14, 0, 0, 0, 0, 15, 0, 0, 0, 0, 10, 2, 0, 0, 0, 0, 0, 0, 5, 3, 0, 10, 16, 10, 17, 0, 1, 0, 18, 0, 0, 0, 0, 0, 5, 6, 7, 10, 19, 15, 5, 0, 0, 0, 0, 0, 0, 0, 3, 20, 5, 6, 3, 11, 21, 13, 22, 0, 0, 0, 0, 19, 0, 0, 0, 0, 0, 16, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 2, 23, 0, 24, 12, 25, 26, 0, 2, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 8, 23, 1, 27, 1, 1, 0, 0, 0, 10, 3, 0, 0, 0, 0, 28, 8, 23, 19, 29, 30, 1, 0, 0, 0, 15, 23, 0, 0, 0, 0, 8, 5, 3, 24, 12, 25, 26, 0, 0, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 16, 0, 15, 8, 1, 3, 3, 4, 31, 32, 33, 20, 8, 1, 1, 6, 3, 0, 0, 34, 34, 35, 10, 1, 1, 1, 16, 20, 8, 1, 1, 6, 10, 3, 0, 34, 34, 36, 0, 1, 1, 1, 0, 0, 0, 0, 0, 6, 0, 0, 0, 0, 0, 18, 18, 10, 0, 0, 4, 18, 37, 6, 38, 38, 1, 1, 2, 37, 1, 3, 1, 0, 0, 18, 6, 6, 6, 6, 6, 18, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 3, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 15, 20, 17, 39, 1, 1, 17, 23, 2, 18, 3, 0, 0, 0, 8, 6, 0, 0, 6, 3, 8, 23, 15, 8, 8, 8, 0, 10, 1, 16, 0, 0, 0, 0, 0, 0, 40, 41, 2, 8, 8, 5, 15, 0, 0, 0, 0, 0, 8, 20, 0, 0, 17, 3, 0, 0, 0, 0, 0, 0, 17, 0, 0, 0, 0, 0, 0, 0, 0, 0, 20, 1, 17, 6, 42, 43, 24, 25, 2, 20, 1, 1, 1, 1, 10, 0, 0, 0, 0, 10, 0, 1, 40, 44, 45, 2, 8, 0, 0, 8, 40, 8, 8, 5, 17, 0, 0, 8, 8, 46, 34, 8, 35, 8, 8, 23, 0, 0, 0, 8, 0, 
0, 0, 0, 0, 0, 10, 39, 20, 0, 0, 0, 0, 11, 40, 1, 17, 6, 3, 15, 2, 20, 1, 17, 7, 40, 24, 24, 41, 1, 1, 1, 1, 16, 18, 1, 1, 23, 0, 0, 0, 0, 0, 0, 0, 2, 1, 6, 47, 48, 24, 25, 19, 23, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 10, 7, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 23, 0, 0, 0, 0, 0, 0, 15, 6, 17, 9, 1, 23, 6, 0, 0, 0, 0, 2, 1, 8, 20, 20, 1, 8, 0, 0, 0, 0, 0, 0, 0, 0, 8, 4, 49, 8, 7, 1, 1, 1, 24, 17, 0, 0, 0, 0, 1, 16, 50, 6, 6, 1, 6, 6, 2, 51, 51, 51, 52, 0, 18, 0, 0, 0, 16, 0, 0, 0, 0, 0, 0, 0, 0, 16, 0, 10, 0, 0, 0, 15, 5, 2, 0, 0, 0, 0, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 8, 8, 8, 8, 8, 8, 8, 3, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 18, 6, 0, 0, 0, 0, 18, 6, 17, 6, 7, 0, 10, 8, 1, 6, 24, 2, 8, 53, 0, 0, 0, 0, 0, 0, 0, 0, 0, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 10, 1, 17, 54, 41, 40, 55, 3, 0, 0, 0, 0, 0, 10, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 15, 2, 0, 2, 1, 56, 57, 58, 46, 35, 1, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 11, 7, 9, 0, 0, 15, 0, 0, 0, 0, 0, 0, 15, 20, 8, 40, 23, 5, 0, 59, 6, 10, 52, 0, 0, 6, 7, 0, 0, 0, 0, 17, 3, 0, 0, 20, 23, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 6, 6, 6, 1, 1, 16, 0, 0, 0, 0, 4, 5, 7, 2, 5, 3, 0, 0, 1, 16, 0, 0, 0, 0, 0, 0, 0, 0, 0, 10, 1, 6, 41, 38, 17, 3, 16, 0, 0, 0, 0, 0, 0, 18, 0, 0, 0, 0, 0, 0, 0, 15, 9, 6, 6, 6, 1, 19, 23, 0, 0, 0, 0, 10, 3, 0, 0, 0, 0, 0, 0, 0, 8, 5, 1, 30, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 10, 4, 5, 7, 1, 17, 3, 0, 0, 2, 8, 23, 11, 12, 13, 33, 0, 0, 8, 0, 1, 1, 1, 16, 0, 1, 1, 16, 0, 0, 0, 0, 0, 4, 5, 6, 6, 39, 60, 33, 26, 2, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 15, 9, 6, 6, 0, 49, 32, 1, 5, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 0, 8, 5, 6, 6, 7, 2, 20, 5, 16, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 10, 20, 9, 6, 1, 1, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 18, 10, 8, 1, 6, 41, 7, 1, 0, 0, }; static RE_UINT8 re_indic_positional_category_stage_5[] = { 0, 0, 5, 5, 5, 1, 6, 0, 1, 2, 1, 6, 6, 6, 6, 5, 1, 1, 2, 1, 0, 5, 0, 2, 2, 0, 0, 4, 4, 6, 0, 1, 5, 0, 5, 6, 0, 6, 5, 8, 1, 5, 9, 0, 10, 6, 
1, 0, 2, 2, 4, 4, 4, 5, 7, 0, 8, 1, 8, 0, 8, 8, 9, 2, 4, 10, 4, 1, 3, 3, 3, 1, 3, 0, 5, 7, 7, 7, 6, 2, 6, 1, 2, 5, 9, 10, 4, 2, 1, 8, 8, 5, 1, 3, 6, 11, 7, 12, 2, 9, 13, 6, 13, 13, 13, 0, 11, 0, 5, 2, 2, 6, 6, 3, 3, 5, 5, 3, 0, 13, 5, 9, }; /* Indic_Positional_Category: 1842 bytes. */ RE_UINT32 re_get_indic_positional_category(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 13; code = ch ^ (f << 13); pos = (RE_UINT32)re_indic_positional_category_stage_1[f] << 5; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_indic_positional_category_stage_2[pos + f] << 4; f = code >> 4; code ^= f << 4; pos = (RE_UINT32)re_indic_positional_category_stage_3[pos + f] << 3; f = code >> 1; code ^= f << 1; pos = (RE_UINT32)re_indic_positional_category_stage_4[pos + f] << 1; value = re_indic_positional_category_stage_5[pos + code]; return value; } /* Indic_Syllabic_Category. */ static RE_UINT8 re_indic_syllabic_category_stage_1[] = { 0, 1, 2, 2, 2, 3, 2, 2, 4, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, }; static RE_UINT8 re_indic_syllabic_category_stage_2[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 1, 1, 1, 1, 1, 10, 1, 11, 12, 13, 14, 1, 1, 1, 15, 1, 1, 1, 1, 16, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 17, 18, 19, 20, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 21, 1, 1, 1, 1, 1, 22, 23, 24, 25, 26, 27, 28, 29, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_indic_syllabic_category_stage_3[] = { 0, 
0, 1, 2, 0, 0, 0, 0, 0, 0, 3, 4, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 12, 20, 21, 15, 16, 22, 23, 24, 25, 26, 27, 28, 16, 29, 30, 0, 12, 31, 14, 15, 16, 29, 32, 33, 12, 34, 35, 36, 37, 38, 39, 40, 25, 0, 41, 42, 16, 43, 44, 45, 12, 0, 46, 42, 16, 47, 44, 48, 12, 49, 46, 42, 8, 50, 51, 52, 12, 53, 54, 55, 8, 56, 57, 58, 25, 59, 60, 8, 61, 62, 63, 2, 0, 0, 64, 65, 66, 67, 68, 69, 0, 0, 0, 0, 70, 71, 72, 8, 73, 74, 75, 76, 77, 78, 79, 0, 0, 0, 8, 8, 80, 81, 82, 83, 84, 85, 86, 87, 0, 0, 0, 0, 0, 0, 88, 89, 90, 89, 90, 91, 88, 92, 8, 8, 93, 94, 95, 96, 2, 0, 97, 61, 98, 99, 25, 8, 100, 101, 8, 8, 102, 103, 104, 2, 0, 0, 8, 105, 8, 8, 106, 107, 108, 109, 2, 2, 0, 0, 0, 0, 0, 0, 110, 90, 8, 111, 112, 2, 0, 0, 113, 8, 114, 115, 8, 8, 116, 117, 8, 8, 118, 119, 120, 0, 0, 0, 0, 0, 0, 0, 0, 121, 122, 123, 124, 125, 0, 0, 0, 0, 0, 126, 127, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 128, 0, 0, 0, 129, 8, 130, 0, 8, 131, 132, 133, 134, 135, 8, 136, 137, 2, 138, 122, 139, 8, 140, 8, 141, 142, 0, 0, 143, 8, 8, 144, 145, 2, 146, 147, 148, 8, 149, 150, 151, 2, 8, 152, 8, 8, 8, 153, 154, 0, 155, 156, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 157, 158, 159, 2, 160, 161, 8, 162, 163, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 164, 90, 8, 165, 166, 167, 168, 169, 170, 8, 8, 171, 0, 0, 0, 0, 172, 8, 173, 174, 0, 175, 8, 176, 177, 178, 8, 179, 180, 2, 181, 182, 183, 184, 185, 186, 0, 0, 0, 0, 187, 188, 189, 190, 8, 191, 192, 2, 193, 15, 16, 29, 32, 40, 194, 195, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 196, 8, 8, 197, 198, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 199, 8, 200, 201, 202, 203, 0, 0, 199, 8, 8, 204, 205, 2, 0, 0, 190, 8, 206, 207, 2, 0, 0, 0, 8, 208, 209, 210, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_indic_syllabic_category_stage_4[] = { 0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 3, 0, 4, 0, 0, 0, 5, 0, 0, 0, 0, 6, 0, 0, 7, 8, 8, 8, 8, 9, 10, 10, 10, 10, 10, 10, 10, 10, 11, 12, 13, 13, 13, 14, 15, 
16, 10, 10, 17, 18, 2, 2, 19, 8, 10, 10, 20, 21, 8, 22, 22, 9, 10, 10, 10, 10, 23, 10, 24, 25, 26, 12, 13, 27, 27, 28, 0, 29, 0, 30, 26, 0, 0, 0, 20, 21, 31, 32, 23, 33, 26, 34, 35, 29, 27, 36, 0, 0, 37, 24, 0, 18, 2, 2, 38, 39, 0, 0, 20, 21, 8, 40, 40, 9, 10, 10, 23, 37, 26, 12, 13, 41, 41, 36, 0, 0, 42, 0, 13, 27, 27, 36, 0, 43, 0, 30, 42, 0, 0, 0, 44, 21, 31, 19, 45, 46, 33, 23, 47, 48, 49, 25, 10, 10, 26, 43, 35, 43, 50, 36, 0, 29, 0, 0, 7, 21, 8, 45, 45, 9, 10, 10, 10, 10, 26, 51, 13, 50, 50, 36, 0, 52, 49, 0, 20, 21, 8, 45, 10, 37, 26, 12, 0, 52, 0, 53, 54, 0, 0, 0, 10, 10, 49, 51, 13, 50, 50, 55, 0, 29, 0, 32, 0, 0, 56, 57, 58, 21, 8, 8, 8, 31, 25, 10, 30, 10, 10, 42, 10, 49, 59, 29, 13, 60, 13, 13, 43, 0, 0, 0, 37, 10, 10, 10, 10, 10, 10, 49, 13, 13, 61, 0, 13, 41, 62, 63, 33, 64, 24, 42, 0, 10, 37, 10, 37, 65, 25, 33, 13, 13, 41, 66, 13, 67, 62, 68, 2, 2, 3, 10, 2, 2, 2, 2, 2, 69, 70, 0, 10, 10, 37, 10, 10, 10, 10, 48, 16, 13, 13, 71, 72, 73, 74, 75, 76, 76, 77, 76, 76, 76, 76, 76, 76, 76, 76, 78, 0, 79, 0, 0, 80, 8, 81, 13, 13, 82, 83, 84, 2, 2, 3, 85, 86, 17, 87, 88, 89, 90, 91, 92, 93, 94, 10, 10, 95, 96, 62, 97, 2, 2, 98, 99, 100, 10, 10, 23, 11, 101, 0, 0, 100, 10, 10, 10, 11, 0, 0, 0, 102, 0, 0, 0, 103, 8, 8, 8, 8, 43, 13, 13, 13, 71, 104, 105, 106, 0, 0, 107, 108, 10, 10, 10, 13, 13, 109, 0, 110, 111, 112, 0, 113, 114, 114, 115, 116, 117, 0, 0, 10, 10, 10, 0, 13, 13, 13, 13, 118, 111, 119, 0, 10, 120, 13, 0, 10, 10, 10, 80, 100, 121, 111, 122, 123, 13, 13, 13, 13, 91, 124, 125, 126, 127, 8, 8, 10, 128, 13, 13, 13, 129, 10, 0, 130, 8, 131, 10, 132, 13, 133, 134, 2, 2, 135, 136, 10, 137, 13, 13, 138, 0, 0, 0, 10, 139, 13, 118, 111, 140, 0, 0, 2, 2, 3, 37, 141, 142, 142, 142, 143, 0, 0, 0, 144, 145, 143, 0, 0, 0, 0, 146, 147, 4, 0, 0, 0, 148, 0, 0, 5, 148, 0, 0, 0, 0, 0, 4, 40, 149, 150, 10, 120, 13, 0, 0, 10, 10, 10, 151, 152, 153, 154, 10, 155, 0, 0, 0, 156, 8, 8, 8, 131, 10, 10, 10, 10, 157, 13, 13, 13, 158, 0, 0, 142, 142, 142, 142, 2, 2, 159, 10, 
151, 114, 160, 119, 10, 120, 13, 161, 162, 0, 0, 0, 163, 8, 9, 100, 164, 13, 13, 165, 158, 0, 0, 0, 10, 166, 10, 10, 2, 2, 159, 49, 8, 131, 10, 10, 10, 10, 93, 13, 167, 168, 0, 0, 111, 111, 111, 169, 37, 0, 170, 92, 13, 13, 13, 96, 171, 0, 0, 0, 131, 10, 120, 13, 0, 172, 0, 0, 10, 10, 10, 86, 173, 10, 174, 111, 175, 13, 35, 176, 93, 52, 0, 71, 10, 37, 37, 10, 10, 0, 177, 178, 2, 2, 0, 0, 179, 180, 8, 8, 10, 10, 13, 13, 13, 181, 0, 0, 182, 183, 183, 183, 183, 184, 2, 2, 0, 0, 0, 185, 186, 8, 8, 9, 13, 13, 187, 0, 186, 100, 10, 10, 10, 120, 13, 13, 188, 189, 2, 2, 114, 190, 10, 10, 164, 0, 0, 0, 186, 8, 8, 8, 9, 10, 10, 10, 120, 13, 13, 13, 191, 0, 192, 67, 193, 2, 2, 2, 2, 194, 0, 0, 8, 8, 10, 10, 30, 10, 10, 10, 10, 10, 10, 13, 13, 195, 0, 0, 8, 49, 23, 30, 10, 10, 10, 30, 10, 10, 48, 0, 8, 8, 131, 10, 10, 10, 10, 150, 13, 13, 196, 0, 7, 21, 8, 22, 17, 197, 142, 145, 142, 145, 0, 0, 21, 8, 8, 100, 13, 13, 13, 198, 199, 107, 0, 0, 8, 8, 8, 131, 10, 10, 10, 120, 13, 99, 13, 200, 201, 0, 0, 0, 0, 0, 8, 99, 13, 13, 13, 202, 67, 0, 0, 0, 10, 10, 150, 203, 13, 204, 0, 0, 10, 10, 26, 205, 13, 13, 206, 0, 2, 2, 2, 0, }; static RE_UINT8 re_indic_syllabic_category_stage_5[] = { 0, 0, 0, 0, 0, 11, 0, 0, 33, 33, 33, 33, 33, 33, 0, 0, 11, 0, 0, 0, 0, 0, 28, 28, 0, 0, 0, 11, 1, 1, 1, 2, 8, 8, 8, 8, 8, 12, 12, 12, 12, 12, 12, 12, 12, 12, 9, 9, 4, 3, 9, 9, 9, 9, 9, 9, 9, 5, 9, 9, 0, 26, 26, 0, 0, 9, 9, 9, 8, 8, 9, 9, 0, 0, 33, 33, 0, 0, 8, 8, 0, 1, 1, 2, 0, 8, 8, 8, 8, 0, 0, 8, 12, 0, 12, 12, 12, 0, 12, 0, 0, 0, 12, 12, 12, 12, 0, 0, 9, 0, 0, 9, 9, 5, 13, 0, 0, 0, 0, 9, 12, 12, 0, 12, 8, 8, 8, 0, 0, 0, 0, 8, 0, 12, 12, 0, 4, 0, 9, 9, 9, 9, 9, 0, 9, 5, 0, 0, 0, 12, 12, 12, 1, 25, 11, 11, 0, 19, 0, 0, 8, 8, 0, 8, 9, 9, 0, 9, 0, 12, 0, 0, 0, 0, 9, 9, 0, 0, 1, 22, 8, 0, 8, 8, 8, 12, 0, 0, 0, 0, 0, 12, 12, 0, 0, 0, 12, 12, 12, 0, 9, 0, 9, 9, 0, 3, 9, 9, 0, 9, 9, 0, 0, 0, 12, 0, 0, 14, 14, 0, 9, 5, 16, 0, 0, 0, 13, 13, 13, 13, 13, 13, 0, 0, 1, 2, 0, 0, 5, 0, 9, 0, 9, 0, 9, 9, 6, 0, 24, 
24, 24, 24, 29, 1, 6, 0, 12, 0, 0, 12, 0, 12, 0, 12, 19, 19, 0, 0, 9, 0, 0, 0, 0, 1, 0, 0, 0, 28, 0, 28, 0, 4, 0, 0, 9, 9, 1, 2, 9, 9, 1, 1, 6, 3, 0, 0, 21, 21, 21, 21, 21, 18, 18, 18, 18, 18, 18, 18, 0, 18, 18, 18, 18, 0, 0, 0, 0, 0, 28, 0, 12, 8, 8, 8, 8, 8, 8, 9, 9, 9, 1, 24, 2, 7, 6, 19, 19, 19, 19, 12, 0, 0, 11, 0, 12, 12, 8, 8, 9, 9, 12, 12, 12, 12, 19, 19, 19, 12, 9, 24, 24, 12, 12, 9, 9, 24, 24, 24, 24, 24, 12, 12, 12, 9, 9, 9, 9, 12, 12, 12, 12, 12, 19, 9, 9, 9, 9, 24, 24, 24, 12, 24, 33, 33, 24, 24, 9, 9, 0, 0, 8, 8, 8, 12, 6, 0, 0, 0, 12, 0, 9, 9, 12, 12, 12, 8, 9, 27, 27, 28, 17, 29, 28, 28, 28, 6, 7, 28, 3, 0, 0, 0, 11, 12, 12, 12, 9, 18, 18, 18, 20, 20, 1, 20, 20, 20, 20, 20, 20, 20, 9, 28, 12, 12, 12, 10, 10, 10, 10, 10, 10, 10, 0, 0, 23, 23, 23, 23, 23, 0, 0, 0, 9, 20, 20, 20, 24, 24, 0, 0, 12, 12, 12, 9, 12, 19, 19, 20, 20, 20, 20, 0, 7, 9, 9, 9, 24, 24, 28, 28, 28, 0, 0, 28, 1, 1, 1, 17, 2, 8, 8, 8, 4, 9, 9, 9, 5, 12, 12, 12, 1, 17, 2, 8, 8, 8, 12, 12, 12, 18, 18, 18, 9, 9, 6, 7, 18, 18, 12, 12, 33, 33, 3, 12, 12, 12, 20, 20, 8, 8, 4, 9, 20, 20, 6, 6, 18, 18, 9, 9, 1, 1, 28, 4, 26, 26, 26, 0, 26, 26, 26, 26, 26, 26, 0, 0, 0, 0, 2, 2, 26, 0, 0, 0, 30, 31, 0, 0, 11, 11, 11, 11, 28, 0, 0, 0, 8, 8, 6, 12, 12, 12, 12, 1, 12, 12, 10, 10, 10, 10, 12, 12, 12, 12, 10, 18, 18, 12, 12, 12, 12, 18, 12, 1, 1, 2, 8, 8, 20, 9, 9, 9, 5, 0, 0, 0, 33, 33, 12, 12, 10, 10, 10, 24, 9, 9, 9, 20, 20, 20, 20, 6, 1, 1, 17, 2, 12, 12, 12, 4, 9, 18, 19, 19, 12, 9, 0, 12, 9, 9, 9, 19, 19, 19, 19, 0, 20, 20, 0, 0, 0, 0, 12, 24, 23, 24, 23, 0, 0, 2, 7, 0, 12, 8, 12, 12, 12, 12, 12, 20, 20, 20, 20, 9, 24, 6, 0, 0, 4, 4, 4, 0, 0, 0, 0, 7, 1, 1, 2, 14, 14, 8, 8, 8, 9, 9, 5, 0, 0, 0, 34, 34, 34, 34, 34, 34, 34, 34, 33, 33, 0, 0, 0, 32, 1, 1, 2, 8, 9, 5, 4, 0, 9, 9, 9, 7, 6, 0, 33, 33, 10, 12, 12, 12, 5, 3, 15, 15, 0, 0, 4, 9, 0, 33, 33, 33, 33, 0, 0, 0, 1, 5, 4, 25, 9, 4, 6, 0, 0, 0, 26, 26, 9, 9, 9, 1, 1, 2, 5, 4, 1, 1, 2, 5, 4, 0, 0, 0, 9, 1, 2, 5, 2, 9, 9, 9, 9, 9, 5, 4, 0, 19, 
19, 19, 9, 9, 9, 6, }; /* Indic_Syllabic_Category: 2448 bytes. */ RE_UINT32 re_get_indic_syllabic_category(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 13; code = ch ^ (f << 13); pos = (RE_UINT32)re_indic_syllabic_category_stage_1[f] << 5; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_indic_syllabic_category_stage_2[pos + f] << 4; f = code >> 4; code ^= f << 4; pos = (RE_UINT32)re_indic_syllabic_category_stage_3[pos + f] << 2; f = code >> 2; code ^= f << 2; pos = (RE_UINT32)re_indic_syllabic_category_stage_4[pos + f] << 2; value = re_indic_syllabic_category_stage_5[pos + code]; return value; } /* Alphanumeric. */ static RE_UINT8 re_alphanumeric_stage_1[] = { 0, 1, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, }; static RE_UINT8 re_alphanumeric_stage_2[] = { 0, 1, 2, 3, 4, 5, 6, 7, 7, 8, 7, 7, 7, 7, 7, 7, 7, 7, 7, 9, 10, 11, 7, 7, 7, 7, 12, 13, 13, 13, 13, 14, 15, 16, 17, 18, 19, 13, 20, 13, 21, 13, 13, 13, 13, 22, 13, 13, 13, 13, 13, 13, 13, 13, 23, 24, 13, 13, 25, 13, 13, 26, 27, 13, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 28, 7, 29, 30, 7, 31, 13, 13, 13, 13, 13, 32, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, }; static RE_UINT8 re_alphanumeric_stage_3[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1, 17, 18, 19, 1, 20, 21, 22, 23, 24, 25, 26, 27, 1, 28, 29, 30, 31, 31, 32, 31, 31, 31, 31, 31, 31, 31, 33, 34, 35, 31, 36, 37, 31, 31, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 38, 1, 1, 1, 1, 1, 1, 1, 1, 1, 39, 1, 1, 1, 1, 40, 1, 41, 42, 43, 44, 45, 46, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 47, 31, 31, 31, 31, 31, 31, 31, 31, 31, 1, 48, 49, 1, 50, 51, 52, 53, 54, 55, 56, 57, 58, 1, 59, 60, 61, 62, 63, 64, 31, 31, 31, 65, 66, 67, 68, 69, 70, 71, 72, 73, 31, 74, 31, 31, 31, 31, 31, 1, 1, 1, 75, 76, 77, 31, 31, 1, 1, 1, 1, 78, 31, 31, 31, 31, 31, 31, 31, 1, 1, 79, 31, 1, 1, 80, 81, 31, 31, 31, 82, 83, 
31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 84, 31, 31, 31, 31, 31, 31, 31, 85, 86, 87, 88, 89, 31, 31, 31, 31, 31, 90, 31, 31, 91, 31, 31, 31, 31, 31, 31, 1, 1, 1, 1, 1, 1, 92, 1, 1, 1, 1, 1, 1, 1, 1, 93, 94, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 95, 31, 1, 1, 96, 31, 31, 31, 31, 31, }; static RE_UINT8 re_alphanumeric_stage_4[] = { 0, 1, 2, 2, 0, 3, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 7, 0, 0, 8, 9, 10, 11, 5, 12, 5, 5, 5, 5, 13, 5, 5, 5, 5, 14, 15, 16, 17, 18, 19, 20, 21, 5, 22, 23, 5, 5, 24, 25, 26, 5, 27, 5, 5, 28, 5, 29, 30, 31, 32, 0, 0, 33, 0, 34, 5, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 47, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 61, 65, 66, 67, 68, 69, 70, 71, 16, 72, 73, 0, 74, 75, 76, 0, 77, 78, 79, 80, 81, 82, 0, 0, 5, 83, 84, 85, 86, 5, 87, 88, 5, 5, 89, 5, 90, 91, 92, 5, 93, 5, 94, 0, 95, 5, 5, 96, 16, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 97, 2, 5, 5, 98, 99, 100, 100, 101, 5, 102, 103, 78, 1, 5, 5, 104, 5, 105, 5, 106, 107, 108, 109, 110, 5, 111, 112, 0, 113, 5, 107, 114, 112, 115, 0, 0, 5, 116, 117, 0, 5, 118, 5, 119, 5, 106, 120, 121, 0, 0, 0, 122, 5, 5, 5, 5, 5, 5, 0, 123, 96, 5, 124, 121, 5, 125, 126, 127, 0, 0, 0, 128, 129, 0, 0, 0, 130, 131, 132, 5, 133, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 134, 5, 78, 5, 135, 107, 5, 5, 5, 5, 136, 5, 87, 5, 137, 138, 139, 139, 5, 0, 140, 0, 0, 0, 0, 0, 0, 141, 142, 16, 5, 143, 16, 5, 88, 144, 145, 5, 5, 146, 72, 0, 26, 5, 5, 5, 5, 5, 106, 0, 0, 5, 5, 5, 5, 5, 5, 106, 0, 5, 5, 5, 5, 31, 0, 26, 121, 147, 148, 5, 149, 5, 5, 5, 95, 150, 151, 5, 5, 152, 153, 0, 150, 154, 17, 5, 100, 5, 5, 155, 156, 5, 105, 157, 82, 5, 158, 159, 160, 5, 138, 161, 162, 5, 107, 163, 164, 165, 166, 88, 167, 5, 5, 5, 168, 5, 5, 5, 5, 5, 169, 170, 113, 5, 5, 5, 171, 5, 5, 172, 0, 173, 174, 175, 5, 5, 28, 176, 5, 5, 121, 26, 5, 177, 5, 17, 178, 0, 0, 0, 179, 5, 5, 5, 82, 1, 2, 2, 109, 5, 107, 180, 0, 181, 182, 183, 0, 5, 5, 5, 72, 0, 0, 5, 33, 0, 0, 0, 0, 0, 0, 0, 0, 82, 5, 
184, 0, 5, 26, 105, 72, 121, 5, 185, 0, 5, 5, 5, 5, 121, 78, 0, 0, 5, 186, 5, 187, 0, 0, 0, 0, 5, 138, 106, 17, 0, 0, 0, 0, 188, 189, 106, 138, 107, 0, 0, 190, 106, 172, 0, 0, 5, 191, 0, 0, 192, 100, 0, 82, 82, 0, 79, 193, 5, 106, 106, 157, 28, 0, 0, 0, 5, 5, 133, 0, 5, 157, 5, 157, 5, 5, 194, 56, 151, 32, 26, 195, 5, 196, 26, 197, 5, 5, 198, 0, 199, 200, 0, 0, 201, 202, 5, 195, 38, 47, 203, 187, 0, 0, 0, 0, 0, 0, 0, 0, 5, 5, 204, 0, 0, 0, 0, 0, 5, 205, 206, 0, 5, 107, 207, 0, 5, 106, 78, 0, 208, 168, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5, 5, 209, 0, 0, 0, 0, 0, 0, 5, 32, 5, 5, 5, 5, 172, 0, 0, 0, 5, 5, 5, 146, 5, 5, 5, 5, 5, 5, 187, 0, 0, 0, 0, 0, 5, 146, 0, 0, 0, 0, 0, 0, 5, 5, 210, 0, 0, 0, 0, 0, 5, 32, 107, 78, 0, 0, 26, 211, 5, 138, 155, 212, 95, 0, 0, 0, 5, 5, 213, 107, 176, 0, 0, 0, 214, 0, 0, 0, 0, 0, 0, 0, 5, 5, 5, 215, 216, 0, 0, 0, 5, 5, 217, 5, 218, 219, 220, 5, 221, 222, 223, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 224, 225, 88, 217, 217, 135, 135, 226, 226, 227, 5, 5, 5, 5, 5, 5, 5, 193, 0, 220, 228, 229, 230, 231, 232, 0, 0, 0, 26, 84, 84, 78, 0, 0, 0, 5, 5, 5, 5, 5, 5, 138, 0, 5, 33, 5, 5, 5, 5, 5, 5, 121, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 214, 0, 0, 121, 0, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_alphanumeric_stage_5[] = { 0, 0, 0, 0, 0, 0, 255, 3, 254, 255, 255, 7, 0, 4, 32, 4, 255, 255, 127, 255, 255, 255, 255, 255, 195, 255, 3, 0, 31, 80, 0, 0, 32, 0, 0, 0, 0, 0, 223, 188, 64, 215, 255, 255, 251, 255, 255, 255, 255, 255, 191, 255, 3, 252, 255, 255, 255, 255, 254, 255, 255, 255, 127, 2, 254, 255, 255, 255, 255, 0, 0, 0, 0, 0, 255, 191, 182, 0, 255, 255, 255, 7, 7, 0, 0, 0, 255, 7, 255, 255, 255, 254, 255, 195, 255, 255, 255, 255, 239, 31, 254, 225, 255, 159, 0, 0, 255, 255, 0, 224, 255, 255, 255, 255, 3, 0, 255, 7, 48, 4, 255, 255, 255, 252, 255, 31, 0, 0, 255, 255, 255, 1, 255, 255, 31, 0, 248, 3, 255, 255, 255, 255, 255, 239, 255, 223, 225, 255, 207, 255, 254, 255, 239, 159, 249, 255, 255, 253, 197, 227, 159, 89, 128, 176, 207, 255, 3, 0, 238, 135, 249, 
255, 255, 253, 109, 195, 135, 25, 2, 94, 192, 255, 63, 0, 238, 191, 251, 255, 255, 253, 237, 227, 191, 27, 1, 0, 207, 255, 0, 2, 238, 159, 249, 255, 159, 25, 192, 176, 207, 255, 2, 0, 236, 199, 61, 214, 24, 199, 255, 195, 199, 29, 129, 0, 192, 255, 0, 0, 239, 223, 253, 255, 255, 253, 255, 227, 223, 29, 96, 7, 207, 255, 0, 0, 238, 223, 253, 255, 255, 253, 239, 227, 223, 29, 96, 64, 207, 255, 6, 0, 255, 255, 255, 231, 223, 93, 128, 128, 207, 255, 0, 252, 236, 255, 127, 252, 255, 255, 251, 47, 127, 128, 95, 255, 192, 255, 12, 0, 255, 255, 255, 7, 127, 32, 255, 3, 150, 37, 240, 254, 174, 236, 255, 59, 95, 32, 255, 243, 1, 0, 0, 0, 255, 3, 0, 0, 255, 254, 255, 255, 255, 31, 254, 255, 3, 255, 255, 254, 255, 255, 255, 31, 255, 255, 127, 249, 255, 3, 255, 255, 231, 193, 255, 255, 127, 64, 255, 51, 191, 32, 255, 255, 255, 255, 255, 247, 255, 61, 127, 61, 255, 61, 255, 255, 255, 255, 61, 127, 61, 255, 127, 255, 255, 255, 61, 255, 255, 255, 255, 135, 255, 255, 0, 0, 255, 255, 63, 63, 255, 159, 255, 255, 255, 199, 255, 1, 255, 223, 15, 0, 255, 255, 15, 0, 255, 223, 13, 0, 255, 255, 207, 255, 255, 1, 128, 16, 255, 255, 255, 0, 255, 7, 255, 255, 255, 255, 63, 0, 255, 255, 255, 127, 255, 15, 255, 1, 192, 255, 255, 255, 255, 63, 31, 0, 255, 15, 255, 255, 255, 3, 255, 3, 255, 255, 255, 15, 254, 255, 31, 0, 128, 0, 0, 0, 255, 255, 239, 255, 239, 15, 255, 3, 255, 243, 255, 255, 191, 255, 3, 0, 255, 227, 255, 255, 255, 255, 255, 63, 0, 222, 111, 0, 128, 255, 31, 0, 63, 63, 255, 170, 255, 255, 223, 95, 220, 31, 207, 15, 255, 31, 220, 31, 0, 0, 2, 128, 0, 0, 255, 31, 132, 252, 47, 62, 80, 189, 255, 243, 224, 67, 0, 0, 255, 1, 0, 0, 0, 0, 192, 255, 255, 127, 255, 255, 31, 120, 12, 0, 255, 128, 0, 0, 255, 255, 127, 0, 127, 127, 127, 127, 0, 128, 0, 0, 224, 0, 0, 0, 254, 3, 62, 31, 255, 255, 127, 224, 224, 255, 255, 255, 255, 63, 254, 255, 255, 127, 0, 0, 255, 31, 255, 255, 255, 15, 0, 0, 255, 127, 240, 143, 0, 0, 128, 255, 252, 255, 255, 255, 255, 249, 255, 255, 255, 63, 255, 0, 187, 247, 
255, 255, 15, 0, 255, 3, 0, 0, 252, 40, 255, 255, 7, 0, 255, 255, 247, 255, 0, 128, 255, 3, 223, 255, 255, 127, 255, 63, 255, 3, 255, 255, 127, 196, 5, 0, 0, 56, 255, 255, 60, 0, 126, 126, 126, 0, 127, 127, 255, 255, 63, 0, 255, 255, 255, 7, 255, 3, 15, 0, 255, 255, 127, 248, 255, 255, 255, 63, 255, 255, 255, 255, 255, 3, 127, 0, 248, 224, 255, 253, 127, 95, 219, 255, 255, 255, 0, 0, 248, 255, 255, 255, 252, 255, 0, 0, 255, 15, 0, 0, 223, 255, 252, 252, 252, 28, 255, 239, 255, 255, 127, 255, 255, 183, 255, 63, 255, 63, 255, 255, 1, 0, 15, 255, 62, 0, 255, 0, 255, 255, 15, 0, 0, 0, 63, 253, 255, 255, 255, 255, 191, 145, 255, 255, 55, 0, 255, 255, 255, 192, 111, 240, 239, 254, 31, 0, 0, 0, 63, 0, 0, 0, 255, 1, 255, 3, 255, 255, 199, 255, 255, 255, 71, 0, 30, 0, 255, 23, 255, 255, 251, 255, 255, 255, 159, 0, 127, 189, 255, 191, 255, 1, 255, 255, 159, 25, 129, 224, 179, 0, 255, 3, 255, 255, 63, 127, 0, 0, 0, 63, 17, 0, 255, 3, 255, 255, 255, 227, 255, 3, 0, 128, 127, 0, 0, 0, 255, 63, 0, 0, 248, 255, 255, 224, 31, 0, 255, 255, 3, 0, 0, 0, 255, 7, 255, 31, 255, 1, 255, 67, 255, 255, 223, 255, 255, 255, 255, 223, 100, 222, 255, 235, 239, 255, 255, 255, 191, 231, 223, 223, 255, 255, 255, 123, 95, 252, 253, 255, 63, 255, 255, 255, 253, 255, 255, 247, 255, 253, 255, 255, 247, 207, 255, 255, 150, 254, 247, 10, 132, 234, 150, 170, 150, 247, 247, 94, 255, 251, 255, 15, 238, 251, 255, 15, }; /* Alphanumeric: 2117 bytes. */ RE_UINT32 re_get_alphanumeric(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_alphanumeric_stage_1[f] << 5; f = code >> 11; code ^= f << 11; pos = (RE_UINT32)re_alphanumeric_stage_2[pos + f] << 3; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_alphanumeric_stage_3[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_alphanumeric_stage_4[pos + f] << 5; pos += code; value = (re_alphanumeric_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Any. 
*/ RE_UINT32 re_get_any(RE_UINT32 ch) { return 1; } /* Blank. */ static RE_UINT8 re_blank_stage_1[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_blank_stage_2[] = { 0, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, }; static RE_UINT8 re_blank_stage_3[] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_blank_stage_4[] = { 0, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 4, 5, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_blank_stage_5[] = { 0, 2, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 255, 7, 0, 0, 0, 128, 0, 0, 0, 0, 0, 128, 0, 0, 0, 0, }; /* Blank: 169 bytes. */ RE_UINT32 re_get_blank(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_blank_stage_1[f] << 3; f = code >> 13; code ^= f << 13; pos = (RE_UINT32)re_blank_stage_2[pos + f] << 4; f = code >> 9; code ^= f << 9; pos = (RE_UINT32)re_blank_stage_3[pos + f] << 3; f = code >> 6; code ^= f << 6; pos = (RE_UINT32)re_blank_stage_4[pos + f] << 6; pos += code; value = (re_blank_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Graph. 
*/ static RE_UINT8 re_graph_stage_1[] = { 0, 1, 2, 3, 4, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 6, 4, 8, 4, 8, }; static RE_UINT8 re_graph_stage_2[] = { 0, 1, 2, 3, 4, 5, 6, 7, 7, 8, 7, 7, 7, 7, 7, 7, 7, 7, 7, 9, 10, 11, 7, 7, 7, 7, 12, 13, 7, 7, 7, 14, 15, 16, 17, 18, 19, 13, 20, 13, 21, 13, 13, 13, 13, 22, 13, 13, 13, 13, 13, 13, 13, 13, 23, 24, 13, 13, 25, 26, 13, 27, 28, 29, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 30, 7, 31, 32, 7, 33, 13, 13, 13, 13, 13, 34, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 35, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 36, }; static RE_UINT8 re_graph_stage_3[] = { 0, 1, 1, 2, 1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 1, 15, 16, 1, 1, 17, 18, 19, 20, 21, 22, 23, 24, 1, 25, 26, 27, 1, 28, 29, 1, 1, 1, 1, 1, 1, 30, 31, 32, 33, 34, 35, 36, 37, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 38, 1, 1, 1, 1, 1, 1, 1, 1, 1, 39, 1, 1, 1, 1, 40, 1, 41, 42, 43, 44, 45, 46, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 47, 48, 48, 48, 48, 48, 48, 48, 48, 1, 1, 49, 50, 1, 51, 52, 53, 54, 55, 56, 57, 58, 59, 1, 60, 61, 62, 63, 64, 65, 48, 66, 48, 67, 68, 69, 70, 71, 72, 73, 74, 75, 48, 76, 48, 48, 48, 48, 48, 1, 1, 1, 77, 78, 79, 48, 48, 1, 1, 1, 1, 80, 48, 48, 48, 48, 48, 48, 48, 1, 1, 81, 48, 1, 1, 82, 83, 48, 48, 48, 84, 85, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 86, 48, 48, 48, 87, 88, 89, 90, 91, 92, 93, 94, 1, 1, 95, 48, 48, 48, 48, 48, 96, 48, 48, 48, 48, 48, 97, 48, 98, 99, 100, 1, 1, 101, 102, 103, 104, 105, 48, 48, 48, 48, 48, 48, 1, 1, 1, 1, 1, 1, 106, 1, 1, 1, 1, 1, 1, 1, 1, 107, 108, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 109, 48, 1, 1, 110, 48, 48, 48, 48, 48, 111, 112, 48, 48, 48, 48, 48, 48, 1, 1, 1, 1, 1, 1, 1, 113, }; static RE_UINT8 re_graph_stage_4[] = { 0, 1, 2, 3, 0, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 4, 5, 6, 2, 2, 2, 7, 8, 1, 9, 2, 10, 11, 12, 2, 2, 2, 2, 2, 2, 2, 13, 2, 14, 2, 2, 
15, 2, 16, 2, 17, 18, 0, 0, 19, 0, 20, 2, 2, 2, 2, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 30, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 44, 48, 49, 50, 51, 52, 53, 54, 1, 55, 56, 0, 57, 58, 59, 0, 2, 2, 60, 61, 62, 12, 63, 0, 2, 2, 2, 2, 2, 2, 64, 2, 2, 2, 65, 2, 66, 67, 68, 2, 69, 2, 48, 70, 71, 2, 2, 72, 2, 2, 2, 2, 73, 2, 2, 74, 75, 76, 77, 78, 2, 2, 79, 80, 81, 2, 2, 82, 2, 83, 2, 84, 3, 85, 86, 87, 2, 88, 89, 2, 90, 2, 3, 91, 80, 17, 0, 0, 2, 2, 88, 70, 2, 2, 2, 92, 2, 93, 94, 2, 0, 0, 10, 95, 2, 2, 2, 2, 2, 2, 2, 96, 72, 2, 97, 79, 2, 98, 99, 100, 101, 102, 3, 103, 104, 3, 105, 106, 2, 2, 2, 2, 88, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 16, 2, 107, 108, 2, 2, 2, 2, 2, 2, 2, 2, 109, 110, 111, 112, 113, 2, 114, 3, 2, 2, 2, 2, 115, 2, 64, 2, 116, 76, 117, 117, 2, 2, 2, 118, 0, 119, 2, 2, 77, 2, 2, 2, 2, 2, 2, 84, 120, 1, 2, 1, 2, 8, 2, 2, 2, 121, 122, 2, 2, 114, 16, 2, 123, 3, 2, 2, 2, 2, 2, 2, 3, 2, 2, 2, 2, 2, 84, 2, 2, 2, 2, 2, 2, 2, 2, 84, 0, 2, 2, 2, 2, 124, 2, 125, 2, 2, 126, 2, 2, 2, 2, 2, 82, 2, 2, 2, 2, 2, 127, 0, 128, 2, 129, 2, 82, 2, 2, 130, 79, 2, 2, 131, 70, 2, 2, 132, 3, 2, 76, 133, 2, 2, 2, 134, 76, 135, 136, 2, 137, 2, 2, 2, 138, 2, 2, 2, 2, 2, 123, 139, 56, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2, 140, 2, 2, 71, 0, 141, 142, 143, 2, 2, 2, 144, 2, 2, 2, 105, 2, 145, 2, 146, 147, 71, 2, 148, 149, 2, 2, 2, 91, 1, 2, 2, 2, 2, 3, 150, 151, 152, 153, 154, 0, 2, 2, 2, 16, 155, 156, 2, 2, 157, 158, 105, 79, 0, 0, 0, 0, 70, 2, 106, 56, 2, 123, 83, 16, 159, 2, 160, 0, 2, 2, 2, 2, 79, 161, 0, 0, 2, 10, 2, 162, 0, 0, 0, 0, 2, 76, 84, 146, 0, 0, 0, 0, 163, 164, 165, 2, 3, 166, 0, 167, 168, 169, 0, 0, 2, 170, 145, 2, 171, 172, 173, 2, 2, 0, 2, 174, 2, 175, 110, 176, 177, 178, 0, 0, 2, 2, 179, 0, 2, 180, 2, 181, 0, 0, 0, 3, 0, 0, 0, 0, 2, 2, 182, 183, 2, 2, 184, 185, 2, 98, 123, 76, 2, 2, 140, 186, 187, 79, 0, 0, 188, 189, 2, 190, 21, 30, 191, 192, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 193, 0, 0, 0, 0, 0, 2, 110, 79, 0, 2, 2, 194, 0, 2, 82, 161, 0, 111, 88, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 195, 0, 0, 0, 0, 0, 0, 2, 74, 2, 2, 2, 2, 71, 0, 0, 0, 2, 2, 2, 196, 2, 2, 2, 2, 2, 2, 197, 0, 0, 0, 0, 0, 2, 198, 0, 0, 0, 0, 0, 0, 2, 2, 107, 0, 0, 0, 0, 0, 2, 74, 3, 199, 0, 0, 105, 200, 2, 2, 201, 202, 203, 0, 0, 0, 2, 2, 204, 3, 205, 0, 0, 0, 206, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2, 207, 208, 197, 0, 0, 2, 2, 2, 2, 2, 2, 2, 84, 2, 209, 2, 2, 2, 2, 2, 179, 2, 2, 210, 0, 0, 0, 0, 0, 2, 2, 76, 15, 0, 0, 0, 0, 2, 2, 98, 2, 12, 211, 212, 2, 213, 214, 215, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 216, 2, 2, 2, 2, 2, 2, 2, 2, 217, 2, 2, 2, 2, 2, 218, 219, 0, 0, 2, 2, 2, 2, 2, 2, 220, 0, 212, 221, 222, 223, 224, 225, 0, 226, 2, 88, 2, 2, 77, 227, 228, 84, 124, 114, 2, 88, 16, 0, 0, 229, 230, 16, 231, 0, 0, 0, 0, 0, 2, 2, 2, 119, 2, 212, 2, 2, 2, 2, 2, 2, 2, 2, 106, 232, 2, 2, 2, 77, 2, 2, 19, 0, 88, 2, 193, 2, 10, 233, 0, 0, 234, 0, 0, 0, 235, 0, 158, 0, 2, 2, 2, 2, 2, 2, 76, 0, 2, 19, 2, 2, 2, 2, 2, 2, 79, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 206, 0, 0, 79, 0, 0, 0, 0, 0, 0, 0, 236, 2, 2, 2, 0, 0, 0, 0, 2, 2, 2, 2, 2, 2, 2, 203, 2, 2, 2, 2, 2, 2, 2, 79, }; static RE_UINT8 re_graph_stage_5[] = { 0, 0, 0, 0, 254, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 127, 255, 255, 255, 252, 240, 215, 255, 255, 251, 255, 255, 255, 255, 255, 254, 255, 255, 255, 127, 254, 255, 230, 254, 255, 255, 0, 255, 255, 255, 7, 31, 0, 255, 255, 255, 223, 255, 191, 255, 255, 255, 231, 255, 255, 255, 255, 3, 0, 255, 255, 255, 7, 255, 63, 255, 127, 255, 255, 255, 79, 255, 255, 31, 0, 248, 255, 255, 255, 239, 159, 249, 255, 255, 253, 197, 243, 159, 121, 128, 176, 207, 255, 255, 15, 238, 135, 249, 255, 255, 253, 109, 211, 135, 57, 2, 94, 192, 255, 63, 0, 238, 191, 251, 255, 255, 253, 237, 243, 191, 59, 1, 0, 207, 255, 3, 2, 238, 159, 249, 255, 159, 57, 192, 176, 207, 255, 255, 0, 236, 199, 61, 214, 24, 199, 255, 195, 199, 61, 129, 0, 192, 255, 255, 7, 239, 223, 253, 255, 255, 253, 255, 227, 223, 61, 96, 7, 207, 255, 0, 255, 238, 223, 253, 255, 255, 253, 239, 243, 223, 
61, 96, 64, 207, 255, 6, 0, 255, 255, 255, 231, 223, 125, 128, 128, 207, 255, 63, 254, 236, 255, 127, 252, 255, 255, 251, 47, 127, 132, 95, 255, 192, 255, 28, 0, 255, 255, 255, 135, 255, 255, 255, 15, 150, 37, 240, 254, 174, 236, 255, 59, 95, 63, 255, 243, 255, 254, 255, 255, 255, 31, 254, 255, 255, 255, 255, 254, 255, 223, 255, 7, 191, 32, 255, 255, 255, 61, 127, 61, 255, 61, 255, 255, 255, 255, 61, 127, 61, 255, 127, 255, 255, 255, 61, 255, 255, 255, 255, 31, 255, 255, 255, 3, 255, 255, 63, 63, 254, 255, 255, 31, 255, 255, 255, 1, 255, 223, 31, 0, 255, 255, 127, 0, 255, 255, 15, 0, 255, 223, 13, 0, 255, 255, 255, 63, 255, 3, 255, 3, 255, 127, 255, 3, 255, 255, 255, 0, 255, 7, 255, 255, 255, 255, 63, 0, 255, 15, 255, 15, 241, 255, 255, 255, 255, 63, 31, 0, 255, 15, 255, 255, 255, 3, 255, 199, 255, 255, 255, 207, 255, 255, 255, 159, 255, 255, 15, 240, 255, 255, 255, 248, 255, 227, 255, 255, 255, 255, 127, 3, 255, 255, 63, 240, 63, 63, 255, 170, 255, 255, 223, 255, 223, 255, 207, 239, 255, 255, 220, 127, 0, 248, 255, 255, 255, 124, 255, 255, 223, 255, 243, 255, 255, 127, 255, 31, 0, 0, 255, 255, 255, 255, 1, 0, 127, 0, 0, 0, 255, 7, 0, 0, 255, 255, 207, 255, 255, 255, 63, 255, 255, 255, 255, 227, 255, 253, 3, 0, 0, 240, 0, 0, 255, 127, 255, 255, 255, 255, 15, 254, 255, 128, 1, 128, 127, 127, 127, 127, 7, 0, 0, 0, 255, 255, 255, 251, 0, 0, 255, 15, 224, 255, 255, 255, 255, 63, 254, 255, 15, 0, 255, 255, 255, 31, 255, 255, 127, 0, 255, 255, 255, 15, 0, 0, 255, 63, 255, 0, 0, 0, 128, 255, 255, 15, 255, 3, 31, 192, 255, 3, 255, 255, 15, 128, 255, 191, 255, 195, 255, 63, 255, 243, 7, 0, 0, 248, 126, 126, 126, 0, 127, 127, 255, 255, 63, 0, 255, 255, 255, 63, 255, 3, 127, 248, 255, 255, 255, 63, 255, 255, 127, 0, 248, 224, 255, 255, 127, 95, 219, 255, 255, 255, 3, 0, 248, 255, 255, 255, 252, 255, 255, 0, 0, 0, 0, 0, 255, 63, 255, 255, 247, 255, 127, 15, 223, 255, 252, 252, 252, 28, 127, 127, 0, 62, 255, 239, 255, 255, 127, 255, 255, 183, 255, 63, 255, 63, 135, 255, 255, 
255, 255, 255, 143, 255, 255, 31, 255, 15, 1, 0, 0, 0, 255, 255, 255, 191, 15, 255, 63, 0, 255, 3, 0, 0, 15, 128, 0, 0, 63, 253, 255, 255, 255, 255, 191, 145, 255, 255, 191, 255, 128, 255, 0, 0, 255, 255, 55, 248, 255, 255, 255, 143, 255, 255, 255, 131, 255, 255, 255, 240, 111, 240, 239, 254, 255, 255, 15, 135, 255, 0, 255, 1, 127, 248, 127, 0, 255, 255, 63, 254, 255, 255, 7, 255, 255, 255, 3, 30, 0, 254, 0, 0, 255, 1, 0, 0, 255, 255, 7, 0, 255, 255, 7, 252, 255, 63, 252, 255, 255, 255, 0, 128, 3, 0, 255, 255, 255, 1, 255, 3, 254, 255, 31, 0, 255, 255, 251, 255, 127, 189, 255, 191, 255, 3, 255, 255, 255, 7, 255, 3, 159, 57, 129, 224, 207, 31, 31, 0, 255, 0, 255, 3, 31, 0, 255, 3, 255, 255, 7, 128, 255, 127, 31, 0, 15, 0, 0, 0, 255, 127, 0, 0, 255, 195, 0, 0, 255, 63, 63, 0, 63, 0, 255, 251, 251, 255, 255, 224, 255, 255, 0, 0, 31, 0, 255, 255, 0, 128, 255, 255, 3, 0, 0, 0, 255, 7, 255, 31, 255, 1, 255, 243, 127, 254, 255, 255, 63, 0, 0, 0, 100, 222, 255, 235, 239, 255, 255, 255, 191, 231, 223, 223, 255, 255, 255, 123, 95, 252, 253, 255, 63, 255, 255, 255, 255, 207, 255, 255, 255, 15, 0, 248, 254, 255, 0, 0, 159, 255, 127, 0, 150, 254, 247, 10, 132, 234, 150, 170, 150, 247, 247, 94, 255, 251, 255, 15, 238, 251, 255, 15, 0, 0, 3, 0, 255, 127, 254, 255, 254, 255, 254, 255, 192, 255, 255, 255, 7, 0, 255, 255, 255, 1, 3, 0, 255, 31, 15, 0, 255, 63, 0, 0, 0, 0, 255, 1, 31, 0, 0, 0, 2, 0, 0, 0, }; /* Graph: 2334 bytes. */ RE_UINT32 re_get_graph(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 15; code = ch ^ (f << 15); pos = (RE_UINT32)re_graph_stage_1[f] << 4; f = code >> 11; code ^= f << 11; pos = (RE_UINT32)re_graph_stage_2[pos + f] << 3; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_graph_stage_3[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_graph_stage_4[pos + f] << 5; pos += code; value = (re_graph_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Print. 
*/ static RE_UINT8 re_print_stage_1[] = { 0, 1, 2, 3, 4, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 6, 4, 8, 4, 8, }; static RE_UINT8 re_print_stage_2[] = { 0, 1, 2, 3, 4, 5, 6, 7, 7, 8, 7, 7, 7, 7, 7, 7, 7, 7, 7, 9, 10, 11, 7, 7, 7, 7, 12, 13, 7, 7, 7, 14, 15, 16, 17, 18, 19, 13, 20, 13, 21, 13, 13, 13, 13, 22, 13, 13, 13, 13, 13, 13, 13, 13, 23, 24, 13, 13, 25, 26, 13, 27, 28, 29, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 30, 7, 31, 32, 7, 33, 13, 13, 13, 13, 13, 34, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 35, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 36, }; static RE_UINT8 re_print_stage_3[] = { 0, 1, 1, 2, 1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 1, 15, 16, 1, 1, 17, 18, 19, 20, 21, 22, 23, 24, 1, 25, 26, 27, 1, 28, 29, 1, 1, 1, 1, 1, 1, 30, 31, 32, 33, 34, 35, 36, 37, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 38, 1, 1, 1, 1, 1, 1, 1, 1, 1, 39, 1, 1, 1, 1, 40, 1, 41, 42, 43, 44, 45, 46, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 47, 48, 48, 48, 48, 48, 48, 48, 48, 1, 1, 49, 50, 1, 51, 52, 53, 54, 55, 56, 57, 58, 59, 1, 60, 61, 62, 63, 64, 65, 48, 66, 48, 67, 68, 69, 70, 71, 72, 73, 74, 75, 48, 76, 48, 48, 48, 48, 48, 1, 1, 1, 77, 78, 79, 48, 48, 1, 1, 1, 1, 80, 48, 48, 48, 48, 48, 48, 48, 1, 1, 81, 48, 1, 1, 82, 83, 48, 48, 48, 84, 85, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 86, 48, 48, 48, 87, 88, 89, 90, 91, 92, 93, 94, 1, 1, 95, 48, 48, 48, 48, 48, 96, 48, 48, 48, 48, 48, 97, 48, 98, 99, 100, 1, 1, 101, 102, 103, 104, 105, 48, 48, 48, 48, 48, 48, 1, 1, 1, 1, 1, 1, 106, 1, 1, 1, 1, 1, 1, 1, 1, 107, 108, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 109, 48, 1, 1, 110, 48, 48, 48, 48, 48, 111, 112, 48, 48, 48, 48, 48, 48, 1, 1, 1, 1, 1, 1, 1, 113, }; static RE_UINT8 re_print_stage_4[] = { 0, 1, 1, 2, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 4, 5, 1, 1, 1, 6, 7, 8, 9, 1, 10, 11, 12, 1, 1, 1, 1, 1, 1, 1, 13, 1, 14, 1, 1, 
15, 1, 16, 1, 17, 18, 0, 0, 19, 0, 20, 1, 1, 1, 1, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 30, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 44, 48, 49, 50, 51, 52, 53, 54, 8, 55, 56, 0, 57, 58, 59, 0, 1, 1, 60, 61, 62, 12, 63, 0, 1, 1, 1, 1, 1, 1, 64, 1, 1, 1, 65, 1, 66, 67, 68, 1, 69, 1, 48, 70, 71, 1, 1, 72, 1, 1, 1, 1, 70, 1, 1, 73, 74, 75, 76, 77, 1, 1, 78, 79, 80, 1, 1, 81, 1, 82, 1, 83, 2, 84, 85, 86, 1, 87, 88, 1, 89, 1, 2, 90, 79, 17, 0, 0, 1, 1, 87, 70, 1, 1, 1, 91, 1, 92, 93, 1, 0, 0, 10, 94, 1, 1, 1, 1, 1, 1, 1, 95, 72, 1, 96, 78, 1, 97, 98, 99, 1, 100, 1, 101, 102, 2, 103, 104, 1, 1, 1, 1, 87, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 16, 1, 105, 106, 1, 1, 1, 1, 1, 1, 1, 1, 107, 108, 109, 110, 111, 1, 112, 2, 1, 1, 1, 1, 113, 1, 64, 1, 114, 75, 115, 115, 1, 1, 1, 116, 0, 117, 1, 1, 76, 1, 1, 1, 1, 1, 1, 83, 118, 1, 1, 8, 1, 7, 1, 1, 1, 119, 120, 1, 1, 112, 16, 1, 121, 2, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 83, 1, 1, 1, 1, 1, 1, 1, 1, 83, 0, 1, 1, 1, 1, 122, 1, 123, 1, 1, 124, 1, 1, 1, 1, 1, 81, 1, 1, 1, 1, 1, 125, 0, 126, 1, 127, 1, 81, 1, 1, 128, 78, 1, 1, 129, 70, 1, 1, 130, 2, 1, 75, 131, 1, 1, 1, 132, 75, 133, 134, 1, 135, 1, 1, 1, 136, 1, 1, 1, 1, 1, 121, 137, 56, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 138, 1, 1, 71, 0, 139, 140, 141, 1, 1, 1, 142, 1, 1, 1, 103, 1, 143, 1, 144, 145, 71, 1, 146, 147, 1, 1, 1, 90, 8, 1, 1, 1, 1, 2, 148, 149, 150, 151, 152, 0, 1, 1, 1, 16, 153, 154, 1, 1, 155, 156, 103, 78, 0, 0, 0, 0, 70, 1, 104, 56, 1, 121, 82, 16, 157, 1, 158, 0, 1, 1, 1, 1, 78, 159, 0, 0, 1, 10, 1, 160, 0, 0, 0, 0, 1, 75, 83, 144, 0, 0, 0, 0, 161, 162, 163, 1, 2, 164, 0, 165, 166, 167, 0, 0, 1, 168, 143, 1, 169, 170, 171, 1, 1, 0, 1, 172, 1, 173, 108, 174, 175, 176, 0, 0, 1, 1, 177, 0, 1, 178, 1, 179, 0, 0, 0, 2, 0, 0, 0, 0, 1, 1, 180, 181, 1, 1, 182, 183, 1, 97, 121, 75, 1, 1, 138, 184, 185, 78, 0, 0, 186, 187, 1, 188, 21, 30, 189, 190, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 191, 0, 0, 0, 0, 0, 1, 108, 78, 0, 1, 1, 192, 0, 1, 81, 159, 0, 109, 87, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 193, 0, 0, 0, 0, 0, 0, 1, 73, 1, 1, 1, 1, 71, 0, 0, 0, 1, 1, 1, 194, 1, 1, 1, 1, 1, 1, 195, 0, 0, 0, 0, 0, 1, 196, 0, 0, 0, 0, 0, 0, 1, 1, 105, 0, 0, 0, 0, 0, 1, 73, 2, 197, 0, 0, 103, 198, 1, 1, 199, 200, 201, 0, 0, 0, 1, 1, 202, 2, 203, 0, 0, 0, 204, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 205, 206, 195, 0, 0, 1, 1, 1, 1, 1, 1, 1, 83, 1, 207, 1, 1, 1, 1, 1, 177, 1, 1, 208, 0, 0, 0, 0, 0, 1, 1, 75, 15, 0, 0, 0, 0, 1, 1, 97, 1, 12, 209, 210, 1, 211, 212, 213, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 214, 1, 1, 1, 1, 1, 1, 1, 1, 215, 1, 1, 1, 1, 1, 216, 217, 0, 0, 1, 1, 1, 1, 1, 1, 218, 0, 210, 219, 220, 221, 222, 223, 0, 224, 1, 87, 1, 1, 76, 225, 226, 83, 122, 112, 1, 87, 16, 0, 0, 227, 228, 16, 229, 0, 0, 0, 0, 0, 1, 1, 1, 117, 1, 210, 1, 1, 1, 1, 1, 1, 1, 1, 104, 230, 1, 1, 1, 76, 1, 1, 19, 0, 87, 1, 191, 1, 10, 231, 0, 0, 232, 0, 0, 0, 233, 0, 156, 0, 1, 1, 1, 1, 1, 1, 75, 0, 1, 19, 1, 1, 1, 1, 1, 1, 78, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 204, 0, 0, 78, 0, 0, 0, 0, 0, 0, 0, 234, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 201, 1, 1, 1, 1, 1, 1, 1, 78, }; static RE_UINT8 re_print_stage_5[] = { 0, 0, 0, 0, 255, 255, 255, 255, 255, 255, 255, 127, 255, 255, 255, 252, 240, 215, 255, 255, 251, 255, 255, 255, 255, 255, 254, 255, 255, 255, 127, 254, 254, 255, 255, 255, 255, 230, 254, 255, 255, 0, 255, 255, 255, 7, 31, 0, 255, 255, 255, 223, 255, 191, 255, 255, 255, 231, 255, 255, 255, 255, 3, 0, 255, 255, 255, 7, 255, 63, 255, 127, 255, 255, 255, 79, 255, 255, 31, 0, 248, 255, 255, 255, 239, 159, 249, 255, 255, 253, 197, 243, 159, 121, 128, 176, 207, 255, 255, 15, 238, 135, 249, 255, 255, 253, 109, 211, 135, 57, 2, 94, 192, 255, 63, 0, 238, 191, 251, 255, 255, 253, 237, 243, 191, 59, 1, 0, 207, 255, 3, 2, 238, 159, 249, 255, 159, 57, 192, 176, 207, 255, 255, 0, 236, 199, 61, 214, 24, 199, 255, 195, 199, 61, 129, 0, 192, 255, 255, 7, 239, 223, 253, 255, 255, 253, 255, 227, 223, 61, 96, 7, 207, 255, 0, 255, 238, 223, 253, 255, 255, 253, 239, 243, 223, 61, 
96, 64, 207, 255, 6, 0, 255, 255, 255, 231, 223, 125, 128, 128, 207, 255, 63, 254, 236, 255, 127, 252, 255, 255, 251, 47, 127, 132, 95, 255, 192, 255, 28, 0, 255, 255, 255, 135, 255, 255, 255, 15, 150, 37, 240, 254, 174, 236, 255, 59, 95, 63, 255, 243, 255, 254, 255, 255, 255, 31, 254, 255, 255, 255, 255, 254, 255, 223, 255, 7, 191, 32, 255, 255, 255, 61, 127, 61, 255, 61, 255, 255, 255, 255, 61, 127, 61, 255, 127, 255, 255, 255, 61, 255, 255, 255, 255, 31, 255, 255, 255, 3, 255, 255, 63, 63, 255, 255, 255, 1, 255, 223, 31, 0, 255, 255, 127, 0, 255, 255, 15, 0, 255, 223, 13, 0, 255, 255, 255, 63, 255, 3, 255, 3, 255, 127, 255, 3, 255, 255, 255, 0, 255, 7, 255, 255, 255, 255, 63, 0, 255, 15, 255, 15, 241, 255, 255, 255, 255, 63, 31, 0, 255, 15, 255, 255, 255, 3, 255, 199, 255, 255, 255, 207, 255, 255, 255, 159, 255, 255, 15, 240, 255, 255, 255, 248, 255, 227, 255, 255, 255, 255, 127, 3, 255, 255, 63, 240, 63, 63, 255, 170, 255, 255, 223, 255, 223, 255, 207, 239, 255, 255, 220, 127, 255, 252, 255, 255, 223, 255, 243, 255, 255, 127, 255, 31, 0, 0, 255, 255, 255, 255, 1, 0, 127, 0, 0, 0, 255, 7, 0, 0, 255, 255, 207, 255, 255, 255, 63, 255, 255, 255, 255, 227, 255, 253, 3, 0, 0, 240, 0, 0, 255, 127, 255, 255, 255, 255, 15, 254, 255, 128, 1, 128, 127, 127, 127, 127, 7, 0, 0, 0, 255, 255, 255, 251, 0, 0, 255, 15, 224, 255, 255, 255, 255, 63, 254, 255, 15, 0, 255, 255, 255, 31, 255, 255, 127, 0, 255, 255, 255, 15, 0, 0, 255, 63, 255, 0, 0, 0, 128, 255, 255, 15, 255, 3, 31, 192, 255, 3, 255, 255, 15, 128, 255, 191, 255, 195, 255, 63, 255, 243, 7, 0, 0, 248, 126, 126, 126, 0, 127, 127, 255, 255, 63, 0, 255, 255, 255, 63, 255, 3, 127, 248, 255, 255, 255, 63, 255, 255, 127, 0, 248, 224, 255, 255, 127, 95, 219, 255, 255, 255, 3, 0, 248, 255, 255, 255, 252, 255, 255, 0, 0, 0, 0, 0, 255, 63, 255, 255, 247, 255, 127, 15, 223, 255, 252, 252, 252, 28, 127, 127, 0, 62, 255, 239, 255, 255, 127, 255, 255, 183, 255, 63, 255, 63, 135, 255, 255, 255, 255, 255, 143, 255, 255, 31, 255, 15, 
1, 0, 0, 0, 255, 255, 255, 191, 15, 255, 63, 0, 255, 3, 0, 0, 15, 128, 0, 0, 63, 253, 255, 255, 255, 255, 191, 145, 255, 255, 191, 255, 128, 255, 0, 0, 255, 255, 55, 248, 255, 255, 255, 143, 255, 255, 255, 131, 255, 255, 255, 240, 111, 240, 239, 254, 255, 255, 15, 135, 255, 0, 255, 1, 127, 248, 127, 0, 255, 255, 63, 254, 255, 255, 7, 255, 255, 255, 3, 30, 0, 254, 0, 0, 255, 1, 0, 0, 255, 255, 7, 0, 255, 255, 7, 252, 255, 63, 252, 255, 255, 255, 0, 128, 3, 0, 255, 255, 255, 1, 255, 3, 254, 255, 31, 0, 255, 255, 251, 255, 127, 189, 255, 191, 255, 3, 255, 255, 255, 7, 255, 3, 159, 57, 129, 224, 207, 31, 31, 0, 255, 0, 255, 3, 31, 0, 255, 3, 255, 255, 7, 128, 255, 127, 31, 0, 15, 0, 0, 0, 255, 127, 0, 0, 255, 195, 0, 0, 255, 63, 63, 0, 63, 0, 255, 251, 251, 255, 255, 224, 255, 255, 0, 0, 31, 0, 255, 255, 0, 128, 255, 255, 3, 0, 0, 0, 255, 7, 255, 31, 255, 1, 255, 243, 127, 254, 255, 255, 63, 0, 0, 0, 100, 222, 255, 235, 239, 255, 255, 255, 191, 231, 223, 223, 255, 255, 255, 123, 95, 252, 253, 255, 63, 255, 255, 255, 255, 207, 255, 255, 255, 15, 0, 248, 254, 255, 0, 0, 159, 255, 127, 0, 150, 254, 247, 10, 132, 234, 150, 170, 150, 247, 247, 94, 255, 251, 255, 15, 238, 251, 255, 15, 0, 0, 3, 0, 255, 127, 254, 255, 254, 255, 254, 255, 192, 255, 255, 255, 7, 0, 255, 255, 255, 1, 3, 0, 255, 31, 15, 0, 255, 63, 0, 0, 0, 0, 255, 1, 31, 0, 0, 0, 2, 0, 0, 0, }; /* Print: 2326 bytes. */ RE_UINT32 re_get_print(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 15; code = ch ^ (f << 15); pos = (RE_UINT32)re_print_stage_1[f] << 4; f = code >> 11; code ^= f << 11; pos = (RE_UINT32)re_print_stage_2[pos + f] << 3; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_print_stage_3[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_print_stage_4[pos + f] << 5; pos += code; value = (re_print_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Word. 
*/ static RE_UINT8 re_word_stage_1[] = { 0, 1, 2, 3, 4, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 6, 6, 6, 6, 6, }; static RE_UINT8 re_word_stage_2[] = { 0, 1, 2, 3, 4, 5, 6, 7, 7, 8, 7, 7, 7, 7, 7, 7, 7, 7, 7, 9, 10, 11, 7, 7, 7, 7, 12, 13, 13, 13, 13, 14, 15, 16, 17, 18, 19, 13, 20, 13, 21, 13, 13, 13, 13, 22, 13, 13, 13, 13, 13, 13, 13, 13, 23, 24, 13, 13, 25, 26, 13, 27, 28, 13, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 29, 7, 30, 31, 7, 32, 13, 13, 13, 13, 13, 33, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 34, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, }; static RE_UINT8 re_word_stage_3[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1, 17, 18, 19, 1, 20, 21, 22, 23, 24, 25, 26, 27, 1, 28, 29, 30, 31, 31, 32, 31, 31, 31, 31, 31, 31, 31, 33, 34, 35, 31, 36, 37, 31, 31, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 38, 1, 1, 1, 1, 1, 1, 1, 1, 1, 39, 1, 1, 1, 1, 40, 1, 41, 42, 43, 44, 45, 46, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 47, 31, 31, 31, 31, 31, 31, 31, 31, 31, 1, 48, 49, 1, 50, 51, 52, 53, 54, 55, 56, 57, 58, 1, 59, 60, 61, 62, 63, 64, 31, 31, 31, 65, 66, 67, 68, 69, 70, 71, 72, 73, 31, 74, 31, 31, 31, 31, 31, 1, 1, 1, 75, 76, 77, 31, 31, 1, 1, 1, 1, 78, 31, 31, 31, 31, 31, 31, 31, 1, 1, 79, 31, 1, 1, 80, 81, 31, 31, 31, 82, 83, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 84, 31, 31, 31, 31, 85, 86, 31, 87, 88, 89, 90, 31, 31, 91, 31, 31, 31, 31, 31, 92, 31, 31, 31, 31, 31, 93, 31, 31, 94, 31, 31, 31, 31, 31, 31, 1, 1, 1, 1, 1, 1, 95, 1, 1, 1, 1, 1, 1, 1, 1, 96, 97, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 98, 31, 1, 1, 99, 31, 31, 31, 31, 31, 31, 100, 31, 31, 31, 31, 31, 31, }; static RE_UINT8 re_word_stage_4[] = { 0, 1, 2, 3, 0, 4, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 8, 6, 6, 6, 9, 10, 11, 6, 12, 6, 6, 6, 6, 11, 6, 6, 6, 6, 13, 14, 15, 16, 17, 18, 19, 20, 6, 6, 21, 6, 6, 22, 23, 24, 6, 25, 6, 6, 26, 6, 27, 6, 28, 29, 0, 0, 30, 0, 31, 6, 6, 
6, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 42, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 56, 60, 61, 62, 63, 64, 65, 66, 15, 67, 68, 0, 69, 70, 71, 0, 72, 73, 74, 75, 76, 77, 78, 0, 6, 6, 79, 6, 80, 6, 81, 82, 6, 6, 83, 6, 84, 85, 86, 6, 87, 6, 60, 0, 88, 6, 6, 89, 15, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 90, 3, 6, 6, 91, 92, 30, 93, 94, 6, 6, 95, 96, 97, 6, 6, 98, 6, 99, 6, 100, 101, 102, 103, 104, 6, 105, 106, 0, 29, 6, 101, 107, 106, 108, 0, 0, 6, 6, 109, 110, 6, 6, 6, 93, 6, 98, 111, 80, 0, 0, 112, 113, 6, 6, 6, 6, 6, 6, 6, 114, 89, 6, 115, 80, 6, 116, 117, 118, 119, 120, 121, 122, 123, 0, 24, 124, 125, 126, 127, 6, 128, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 129, 6, 96, 6, 130, 101, 6, 6, 6, 6, 131, 6, 81, 6, 132, 133, 134, 134, 6, 0, 135, 0, 0, 0, 0, 0, 0, 136, 137, 15, 6, 138, 15, 6, 82, 139, 140, 6, 6, 141, 67, 0, 24, 6, 6, 6, 6, 6, 100, 0, 0, 6, 6, 6, 6, 6, 6, 100, 0, 6, 6, 6, 6, 142, 0, 24, 80, 143, 144, 6, 145, 6, 6, 6, 26, 146, 147, 6, 6, 148, 149, 0, 146, 6, 150, 6, 93, 6, 6, 151, 152, 6, 153, 93, 77, 6, 6, 154, 101, 6, 133, 155, 156, 6, 6, 157, 158, 159, 160, 82, 161, 6, 6, 6, 162, 6, 6, 6, 6, 6, 163, 164, 29, 6, 6, 6, 153, 6, 6, 165, 0, 166, 167, 168, 6, 6, 26, 169, 6, 6, 80, 24, 6, 170, 6, 150, 171, 88, 172, 173, 174, 6, 6, 6, 77, 1, 2, 3, 103, 6, 101, 175, 0, 176, 177, 178, 0, 6, 6, 6, 67, 0, 0, 6, 30, 0, 0, 0, 179, 0, 0, 0, 0, 77, 6, 124, 180, 6, 24, 99, 67, 80, 6, 181, 0, 6, 6, 6, 6, 80, 96, 0, 0, 6, 182, 6, 183, 0, 0, 0, 0, 6, 133, 100, 150, 0, 0, 0, 0, 184, 185, 100, 133, 101, 0, 0, 186, 100, 165, 0, 0, 6, 187, 0, 0, 188, 189, 0, 77, 77, 0, 74, 190, 6, 100, 100, 191, 26, 0, 0, 0, 6, 6, 128, 0, 6, 191, 6, 191, 6, 6, 190, 192, 6, 67, 24, 193, 6, 194, 24, 195, 6, 6, 196, 0, 197, 98, 0, 0, 198, 199, 6, 200, 33, 42, 201, 202, 0, 0, 0, 0, 0, 0, 0, 0, 6, 6, 203, 0, 0, 0, 0, 0, 6, 204, 205, 0, 6, 6, 206, 0, 6, 98, 96, 0, 207, 109, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 6, 208, 0, 0, 0, 0, 0, 0, 6, 209, 6, 6, 6, 6, 165, 0, 
0, 0, 6, 6, 6, 141, 6, 6, 6, 6, 6, 6, 183, 0, 0, 0, 0, 0, 6, 141, 0, 0, 0, 0, 0, 0, 6, 6, 190, 0, 0, 0, 0, 0, 6, 209, 101, 96, 0, 0, 24, 104, 6, 133, 210, 211, 88, 0, 0, 0, 6, 6, 212, 101, 213, 0, 0, 0, 214, 0, 0, 0, 0, 0, 0, 0, 6, 6, 6, 215, 216, 0, 0, 0, 0, 0, 0, 217, 218, 219, 0, 0, 0, 0, 220, 0, 0, 0, 0, 0, 6, 6, 194, 6, 221, 222, 223, 6, 224, 225, 226, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 227, 228, 82, 194, 194, 130, 130, 229, 229, 230, 6, 6, 231, 6, 232, 233, 234, 0, 0, 6, 6, 6, 6, 6, 6, 235, 0, 223, 236, 237, 238, 239, 240, 0, 0, 0, 24, 79, 79, 96, 0, 0, 0, 6, 6, 6, 6, 6, 6, 133, 0, 6, 30, 6, 6, 6, 6, 6, 6, 80, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 214, 0, 0, 80, 0, 0, 0, 0, 0, 0, 0, 6, 6, 6, 6, 6, 6, 6, 88, }; static RE_UINT8 re_word_stage_5[] = { 0, 0, 0, 0, 0, 0, 255, 3, 254, 255, 255, 135, 254, 255, 255, 7, 0, 4, 32, 4, 255, 255, 127, 255, 255, 255, 255, 255, 195, 255, 3, 0, 31, 80, 0, 0, 255, 255, 223, 188, 64, 215, 255, 255, 251, 255, 255, 255, 255, 255, 191, 255, 255, 255, 254, 255, 255, 255, 127, 2, 254, 255, 255, 255, 255, 0, 254, 255, 255, 255, 255, 191, 182, 0, 255, 255, 255, 7, 7, 0, 0, 0, 255, 7, 255, 195, 255, 255, 255, 255, 239, 159, 255, 253, 255, 159, 0, 0, 255, 255, 255, 231, 255, 255, 255, 255, 3, 0, 255, 255, 63, 4, 255, 63, 0, 0, 255, 255, 255, 15, 255, 255, 31, 0, 248, 255, 255, 255, 207, 255, 254, 255, 239, 159, 249, 255, 255, 253, 197, 243, 159, 121, 128, 176, 207, 255, 3, 0, 238, 135, 249, 255, 255, 253, 109, 211, 135, 57, 2, 94, 192, 255, 63, 0, 238, 191, 251, 255, 255, 253, 237, 243, 191, 59, 1, 0, 207, 255, 0, 2, 238, 159, 249, 255, 159, 57, 192, 176, 207, 255, 2, 0, 236, 199, 61, 214, 24, 199, 255, 195, 199, 61, 129, 0, 192, 255, 0, 0, 239, 223, 253, 255, 255, 253, 255, 227, 223, 61, 96, 7, 207, 255, 0, 0, 238, 223, 253, 255, 255, 253, 239, 243, 223, 61, 96, 64, 207, 255, 6, 0, 255, 255, 255, 231, 223, 125, 128, 128, 207, 255, 0, 252, 236, 255, 127, 252, 255, 255, 251, 47, 127, 132, 95, 255, 192, 255, 12, 0, 255, 255, 255, 7, 255, 127, 
255, 3, 150, 37, 240, 254, 174, 236, 255, 59, 95, 63, 255, 243, 1, 0, 0, 3, 255, 3, 160, 194, 255, 254, 255, 255, 255, 31, 254, 255, 223, 255, 255, 254, 255, 255, 255, 31, 64, 0, 0, 0, 255, 3, 255, 255, 255, 255, 255, 63, 191, 32, 255, 255, 255, 255, 255, 247, 255, 61, 127, 61, 255, 61, 255, 255, 255, 255, 61, 127, 61, 255, 127, 255, 255, 255, 61, 255, 255, 255, 0, 0, 255, 255, 63, 63, 255, 159, 255, 255, 255, 199, 255, 1, 255, 223, 31, 0, 255, 255, 15, 0, 255, 223, 13, 0, 255, 255, 143, 48, 255, 3, 0, 0, 0, 56, 255, 3, 255, 255, 255, 0, 255, 7, 255, 255, 255, 255, 63, 0, 255, 255, 255, 127, 255, 15, 255, 15, 192, 255, 255, 255, 255, 63, 31, 0, 255, 15, 255, 255, 255, 3, 255, 3, 255, 255, 255, 159, 128, 0, 255, 127, 255, 15, 255, 3, 0, 248, 15, 0, 255, 227, 255, 255, 0, 0, 247, 255, 255, 255, 127, 3, 255, 255, 63, 240, 63, 63, 255, 170, 255, 255, 223, 95, 220, 31, 207, 15, 255, 31, 220, 31, 0, 48, 0, 0, 0, 0, 0, 128, 1, 0, 16, 0, 0, 0, 2, 128, 0, 0, 255, 31, 255, 255, 1, 0, 132, 252, 47, 62, 80, 189, 255, 243, 224, 67, 0, 0, 255, 1, 0, 0, 0, 0, 192, 255, 255, 127, 255, 255, 31, 248, 15, 0, 255, 128, 0, 128, 255, 255, 127, 0, 127, 127, 127, 127, 0, 128, 0, 0, 224, 0, 0, 0, 254, 255, 62, 31, 255, 255, 127, 230, 224, 255, 255, 255, 255, 63, 254, 255, 255, 127, 0, 0, 255, 31, 0, 0, 255, 31, 255, 255, 255, 15, 0, 0, 255, 255, 247, 191, 0, 0, 128, 255, 252, 255, 255, 255, 255, 249, 255, 255, 255, 63, 255, 0, 255, 0, 0, 0, 31, 0, 255, 3, 255, 255, 255, 40, 255, 63, 255, 255, 1, 128, 255, 3, 255, 63, 255, 3, 255, 255, 127, 252, 7, 0, 0, 56, 255, 255, 124, 0, 126, 126, 126, 0, 127, 127, 255, 255, 63, 0, 255, 255, 255, 55, 255, 3, 15, 0, 255, 255, 127, 248, 255, 255, 255, 255, 255, 3, 127, 0, 248, 224, 255, 253, 127, 95, 219, 255, 255, 255, 0, 0, 248, 255, 255, 255, 252, 255, 0, 0, 255, 15, 255, 255, 24, 0, 0, 224, 0, 0, 0, 0, 223, 255, 252, 252, 252, 28, 255, 239, 255, 255, 127, 255, 255, 183, 255, 63, 255, 63, 0, 0, 0, 32, 1, 0, 0, 0, 15, 255, 62, 0, 255, 0, 255, 255, 15, 
0, 0, 0, 63, 253, 255, 255, 255, 255, 191, 145, 255, 255, 55, 0, 255, 255, 255, 192, 111, 240, 239, 254, 255, 255, 15, 135, 127, 0, 0, 0, 255, 255, 7, 0, 192, 255, 0, 128, 255, 1, 255, 3, 255, 255, 223, 255, 255, 255, 79, 0, 31, 28, 255, 23, 255, 255, 251, 255, 127, 189, 255, 191, 255, 1, 255, 255, 255, 7, 255, 3, 159, 57, 129, 224, 207, 31, 31, 0, 191, 0, 255, 3, 255, 255, 63, 255, 1, 0, 0, 63, 17, 0, 255, 3, 255, 255, 255, 227, 255, 3, 0, 128, 255, 255, 255, 1, 15, 0, 255, 3, 248, 255, 255, 224, 31, 0, 255, 255, 0, 128, 255, 255, 3, 0, 0, 0, 255, 7, 255, 31, 255, 1, 255, 99, 224, 227, 7, 248, 231, 15, 0, 0, 0, 60, 0, 0, 28, 0, 0, 0, 255, 255, 255, 223, 100, 222, 255, 235, 239, 255, 255, 255, 191, 231, 223, 223, 255, 255, 255, 123, 95, 252, 253, 255, 63, 255, 255, 255, 253, 255, 255, 247, 255, 253, 255, 255, 247, 207, 255, 255, 255, 255, 127, 248, 255, 31, 32, 0, 16, 0, 0, 248, 254, 255, 0, 0, 31, 0, 127, 0, 150, 254, 247, 10, 132, 234, 150, 170, 150, 247, 247, 94, 255, 251, 255, 15, 238, 251, 255, 15, }; /* Word: 2214 bytes. */ RE_UINT32 re_get_word(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 15; code = ch ^ (f << 15); pos = (RE_UINT32)re_word_stage_1[f] << 4; f = code >> 11; code ^= f << 11; pos = (RE_UINT32)re_word_stage_2[pos + f] << 3; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_word_stage_3[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_word_stage_4[pos + f] << 5; pos += code; value = (re_word_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* XDigit. 
*/ static RE_UINT8 re_xdigit_stage_1[] = { 0, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, }; static RE_UINT8 re_xdigit_stage_2[] = { 0, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 2, 2, 2, 2, 4, 5, 6, 2, 2, 2, 2, 7, 2, 2, 2, 2, 2, 2, 8, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, }; static RE_UINT8 re_xdigit_stage_3[] = { 0, 1, 1, 1, 1, 1, 2, 3, 1, 4, 4, 4, 4, 4, 5, 6, 7, 1, 1, 1, 1, 1, 1, 8, 9, 10, 11, 12, 13, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 6, 1, 14, 15, 16, 17, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 18, 1, 1, 1, 1, 19, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 20, 21, 17, 1, 14, 1, 22, 23, 8, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 24, 16, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 25, 1, 1, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_xdigit_stage_4[] = { 0, 1, 2, 2, 2, 2, 2, 2, 2, 3, 2, 0, 2, 2, 2, 4, 2, 5, 2, 5, 2, 6, 2, 6, 3, 2, 2, 2, 2, 4, 6, 2, 2, 2, 2, 3, 6, 2, 2, 2, 2, 7, 2, 6, 2, 2, 8, 2, 2, 6, 0, 2, 2, 8, 2, 2, 2, 2, 2, 6, 4, 2, 2, 9, 2, 6, 2, 2, 2, 2, 2, 0, 10, 11, 2, 2, 2, 2, 3, 2, 2, 5, 2, 0, 12, 2, 2, 6, 2, 6, 2, 4, 0, 2, 2, 2, 2, 3, 2, 2, 2, 2, 2, 13, }; static RE_UINT8 re_xdigit_stage_5[] = { 0, 0, 0, 0, 0, 0, 255, 3, 126, 0, 0, 0, 126, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 255, 3, 0, 0, 255, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 192, 255, 0, 0, 0, 0, 255, 3, 0, 0, 0, 0, 192, 255, 0, 0, 0, 0, 0, 0, 255, 3, 255, 3, 0, 0, 0, 0, 0, 0, 255, 3, 0, 0, 255, 3, 0, 0, 255, 3, 126, 0, 0, 0, 126, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 192, 255, 0, 192, 255, 255, 255, 255, 255, 255, }; /* XDigit: 425 bytes. 
*/

RE_UINT32 re_get_xdigit(RE_UINT32 ch) {
    RE_UINT32 code;
    RE_UINT32 f;
    RE_UINT32 pos;
    RE_UINT32 value;

    /* Multi-stage lookup: each stage consumes the top bits of what is left of
     * the codepoint and selects a block in the next stage.
     */
    f = ch >> 16;
    code = ch ^ (f << 16);
    pos = (RE_UINT32)re_xdigit_stage_1[f] << 4;
    f = code >> 12;
    code ^= f << 12;
    pos = (RE_UINT32)re_xdigit_stage_2[pos + f] << 4;
    f = code >> 8;
    code ^= f << 8;
    pos = (RE_UINT32)re_xdigit_stage_3[pos + f] << 2;
    f = code >> 6;
    code ^= f << 6;
    pos = (RE_UINT32)re_xdigit_stage_4[pos + f] << 6;
    pos += code;

    /* Stage 5 is a bitset; pos is now a bit index into it. */
    value = (re_xdigit_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1;

    return value;
}

/* Posix_Digit. */

static RE_UINT8 re_posix_digit_stage_1[] = {
    0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1,
};

static RE_UINT8 re_posix_digit_stage_2[] = {
    0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
};

static RE_UINT8 re_posix_digit_stage_3[] = {
    0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
};

static RE_UINT8 re_posix_digit_stage_4[] = {
    0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
};

static RE_UINT8 re_posix_digit_stage_5[] = {
    0, 0, 0, 0, 0, 0, 255, 3, 0, 0, 0, 0, 0, 0, 0, 0,
};

/* Posix_Digit: 97 bytes. */

RE_UINT32 re_get_posix_digit(RE_UINT32 ch) {
    RE_UINT32 code;
    RE_UINT32 f;
    RE_UINT32 pos;
    RE_UINT32 value;

    /* Same multi-stage lookup as above, with this property's own shifts. */
    f = ch >> 16;
    code = ch ^ (f << 16);
    pos = (RE_UINT32)re_posix_digit_stage_1[f] << 4;
    f = code >> 12;
    code ^= f << 12;
    pos = (RE_UINT32)re_posix_digit_stage_2[pos + f] << 3;
    f = code >> 9;
    code ^= f << 9;
    pos = (RE_UINT32)re_posix_digit_stage_3[pos + f] << 3;
    f = code >> 6;
    code ^= f << 6;
    pos = (RE_UINT32)re_posix_digit_stage_4[pos + f] << 6;
    pos += code;

    /* Stage 5 is a bitset; pos is now a bit index into it. */
    value = (re_posix_digit_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1;

    return value;
}

/* Posix_AlNum.
*/ static RE_UINT8 re_posix_alnum_stage_1[] = { 0, 1, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, }; static RE_UINT8 re_posix_alnum_stage_2[] = { 0, 1, 2, 3, 4, 5, 6, 7, 7, 8, 7, 7, 7, 7, 7, 7, 7, 7, 7, 9, 10, 11, 7, 7, 7, 7, 12, 13, 13, 13, 13, 14, 15, 16, 17, 18, 19, 13, 20, 13, 21, 13, 13, 13, 13, 22, 13, 13, 13, 13, 13, 13, 13, 13, 23, 24, 13, 13, 25, 13, 13, 26, 27, 13, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 28, 7, 29, 30, 7, 31, 13, 13, 13, 13, 13, 32, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, }; static RE_UINT8 re_posix_alnum_stage_3[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1, 17, 18, 19, 1, 20, 21, 22, 23, 24, 25, 26, 27, 1, 28, 29, 30, 31, 31, 32, 31, 31, 31, 31, 31, 31, 31, 33, 34, 35, 31, 36, 37, 31, 31, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 38, 1, 1, 1, 1, 1, 1, 1, 1, 1, 39, 1, 1, 1, 1, 40, 1, 41, 42, 43, 44, 45, 46, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 47, 31, 31, 31, 31, 31, 31, 31, 31, 31, 1, 48, 49, 1, 50, 51, 52, 53, 54, 55, 56, 57, 58, 1, 59, 60, 61, 62, 63, 64, 31, 31, 31, 65, 66, 67, 68, 69, 70, 71, 72, 73, 31, 74, 31, 31, 31, 31, 31, 1, 1, 1, 75, 76, 77, 31, 31, 1, 1, 1, 1, 78, 31, 31, 31, 31, 31, 31, 31, 1, 1, 79, 31, 1, 1, 80, 81, 31, 31, 31, 82, 83, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 84, 31, 31, 31, 31, 31, 31, 31, 85, 86, 87, 88, 89, 31, 31, 31, 31, 31, 90, 31, 31, 91, 31, 31, 31, 31, 31, 31, 1, 1, 1, 1, 1, 1, 92, 1, 1, 1, 1, 1, 1, 1, 1, 93, 94, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 95, 31, 1, 1, 96, 31, 31, 31, 31, 31, }; static RE_UINT8 re_posix_alnum_stage_4[] = { 0, 1, 2, 2, 0, 3, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 7, 0, 0, 8, 9, 10, 11, 5, 12, 5, 5, 5, 5, 13, 5, 5, 5, 5, 14, 15, 16, 17, 18, 19, 20, 21, 5, 22, 23, 5, 5, 24, 25, 26, 5, 27, 5, 5, 28, 29, 30, 31, 32, 33, 0, 0, 34, 0, 35, 5, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 48, 52, 53, 54, 55, 56, 
0, 57, 58, 59, 60, 61, 62, 63, 64, 61, 65, 66, 67, 68, 69, 70, 71, 16, 72, 73, 0, 74, 75, 76, 0, 77, 0, 78, 79, 80, 81, 0, 0, 5, 82, 26, 83, 84, 5, 85, 86, 5, 5, 87, 5, 88, 89, 90, 5, 91, 5, 92, 0, 93, 5, 5, 94, 16, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 95, 2, 5, 5, 96, 97, 98, 98, 99, 5, 100, 101, 0, 0, 5, 5, 102, 5, 103, 5, 104, 105, 106, 26, 107, 5, 108, 109, 0, 110, 5, 105, 111, 0, 112, 0, 0, 5, 113, 114, 0, 5, 115, 5, 116, 5, 104, 117, 118, 0, 0, 0, 119, 5, 5, 5, 5, 5, 5, 0, 120, 94, 5, 121, 118, 5, 122, 123, 124, 0, 0, 0, 125, 126, 0, 0, 0, 127, 128, 129, 5, 130, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 131, 5, 109, 5, 132, 105, 5, 5, 5, 5, 133, 5, 85, 5, 134, 135, 136, 136, 5, 0, 137, 0, 0, 0, 0, 0, 0, 138, 139, 16, 5, 140, 16, 5, 86, 141, 142, 5, 5, 143, 72, 0, 26, 5, 5, 5, 5, 5, 104, 0, 0, 5, 5, 5, 5, 5, 5, 104, 0, 5, 5, 5, 5, 32, 0, 26, 118, 144, 145, 5, 146, 5, 5, 5, 93, 147, 148, 5, 5, 149, 150, 0, 147, 151, 17, 5, 98, 5, 5, 60, 152, 29, 103, 153, 81, 5, 154, 137, 155, 5, 135, 156, 157, 5, 105, 158, 159, 160, 161, 86, 162, 5, 5, 5, 163, 5, 5, 5, 5, 5, 164, 165, 110, 5, 5, 5, 166, 5, 5, 167, 0, 168, 169, 170, 5, 5, 28, 171, 5, 5, 118, 26, 5, 172, 5, 17, 173, 0, 0, 0, 174, 5, 5, 5, 81, 0, 2, 2, 175, 5, 105, 176, 0, 177, 178, 179, 0, 5, 5, 5, 72, 0, 0, 5, 34, 0, 0, 0, 0, 0, 0, 0, 0, 81, 5, 180, 0, 5, 26, 103, 72, 118, 5, 181, 0, 5, 5, 5, 5, 118, 0, 0, 0, 5, 182, 5, 60, 0, 0, 0, 0, 5, 135, 104, 17, 0, 0, 0, 0, 183, 184, 104, 135, 105, 0, 0, 185, 104, 167, 0, 0, 5, 186, 0, 0, 187, 98, 0, 81, 81, 0, 78, 188, 5, 104, 104, 153, 28, 0, 0, 0, 5, 5, 130, 0, 5, 153, 5, 153, 5, 5, 189, 0, 148, 33, 26, 130, 5, 153, 26, 190, 5, 5, 191, 0, 192, 193, 0, 0, 194, 195, 5, 130, 39, 48, 196, 60, 0, 0, 0, 0, 0, 0, 0, 0, 5, 5, 197, 0, 0, 0, 0, 0, 5, 198, 199, 0, 5, 105, 200, 0, 5, 104, 0, 0, 201, 163, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5, 5, 202, 0, 0, 0, 0, 0, 0, 5, 33, 5, 5, 5, 5, 167, 0, 0, 0, 5, 5, 5, 143, 5, 5, 5, 5, 5, 5, 60, 0, 0, 0, 0, 0, 5, 143, 0, 0, 0, 0, 0, 0, 5, 5, 
203, 0, 0, 0, 0, 0, 5, 33, 105, 0, 0, 0, 26, 156, 5, 135, 60, 204, 93, 0, 0, 0, 5, 5, 205, 105, 171, 0, 0, 0, 206, 0, 0, 0, 0, 0, 0, 0, 5, 5, 5, 207, 208, 0, 0, 0, 5, 5, 209, 5, 210, 211, 212, 5, 213, 214, 215, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 216, 217, 86, 209, 209, 132, 132, 218, 218, 219, 0, 5, 5, 5, 5, 5, 5, 188, 0, 212, 220, 221, 222, 223, 224, 0, 0, 0, 26, 225, 225, 109, 0, 0, 0, 5, 5, 5, 5, 5, 5, 135, 0, 5, 34, 5, 5, 5, 5, 5, 5, 118, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 206, 0, 0, 118, 0, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_posix_alnum_stage_5[] = { 0, 0, 0, 0, 0, 0, 255, 3, 254, 255, 255, 7, 0, 4, 32, 4, 255, 255, 127, 255, 255, 255, 255, 255, 195, 255, 3, 0, 31, 80, 0, 0, 32, 0, 0, 0, 0, 0, 223, 188, 64, 215, 255, 255, 251, 255, 255, 255, 255, 255, 191, 255, 3, 252, 255, 255, 255, 255, 254, 255, 255, 255, 127, 2, 254, 255, 255, 255, 255, 0, 0, 0, 0, 0, 255, 191, 182, 0, 255, 255, 255, 7, 7, 0, 0, 0, 255, 7, 255, 255, 255, 254, 0, 192, 255, 255, 255, 255, 239, 31, 254, 225, 0, 156, 0, 0, 255, 255, 0, 224, 255, 255, 255, 255, 3, 0, 0, 252, 255, 255, 255, 7, 48, 4, 255, 255, 255, 252, 255, 31, 0, 0, 255, 255, 255, 1, 255, 255, 31, 0, 248, 3, 255, 255, 255, 255, 255, 239, 255, 223, 225, 255, 15, 0, 254, 255, 239, 159, 249, 255, 255, 253, 197, 227, 159, 89, 128, 176, 15, 0, 3, 0, 238, 135, 249, 255, 255, 253, 109, 195, 135, 25, 2, 94, 0, 0, 63, 0, 238, 191, 251, 255, 255, 253, 237, 227, 191, 27, 1, 0, 15, 0, 0, 2, 238, 159, 249, 255, 159, 25, 192, 176, 15, 0, 2, 0, 236, 199, 61, 214, 24, 199, 255, 195, 199, 29, 129, 0, 239, 223, 253, 255, 255, 253, 255, 227, 223, 29, 96, 7, 15, 0, 0, 0, 238, 223, 253, 255, 255, 253, 239, 227, 223, 29, 96, 64, 15, 0, 6, 0, 255, 255, 255, 231, 223, 93, 128, 128, 15, 0, 0, 252, 236, 255, 127, 252, 255, 255, 251, 47, 127, 128, 95, 255, 0, 0, 12, 0, 255, 255, 255, 7, 127, 32, 0, 0, 150, 37, 240, 254, 174, 236, 255, 59, 95, 32, 0, 240, 1, 0, 0, 0, 255, 254, 255, 255, 255, 31, 254, 255, 3, 255, 255, 254, 255, 255, 255, 31, 255, 255, 
127, 249, 231, 193, 255, 255, 127, 64, 0, 48, 191, 32, 255, 255, 255, 255, 255, 247, 255, 61, 127, 61, 255, 61, 255, 255, 255, 255, 61, 127, 61, 255, 127, 255, 255, 255, 61, 255, 255, 255, 255, 135, 255, 255, 0, 0, 255, 255, 63, 63, 255, 159, 255, 255, 255, 199, 255, 1, 255, 223, 15, 0, 255, 255, 15, 0, 255, 223, 13, 0, 255, 255, 207, 255, 255, 1, 128, 16, 255, 255, 255, 0, 255, 7, 255, 255, 255, 255, 63, 0, 255, 255, 255, 127, 255, 15, 255, 1, 255, 63, 31, 0, 255, 15, 255, 255, 255, 3, 0, 0, 255, 255, 255, 15, 254, 255, 31, 0, 128, 0, 0, 0, 255, 255, 239, 255, 239, 15, 0, 0, 255, 243, 0, 252, 191, 255, 3, 0, 0, 224, 0, 252, 255, 255, 255, 63, 0, 222, 111, 0, 128, 255, 31, 0, 63, 63, 255, 170, 255, 255, 223, 95, 220, 31, 207, 15, 255, 31, 220, 31, 0, 0, 2, 128, 0, 0, 255, 31, 132, 252, 47, 62, 80, 189, 255, 243, 224, 67, 0, 0, 255, 1, 0, 0, 0, 0, 192, 255, 255, 127, 255, 255, 31, 120, 12, 0, 255, 128, 0, 0, 255, 255, 127, 0, 127, 127, 127, 127, 0, 128, 0, 0, 224, 0, 0, 0, 254, 3, 62, 31, 255, 255, 127, 224, 224, 255, 255, 255, 255, 63, 254, 255, 255, 127, 0, 0, 255, 31, 255, 255, 0, 12, 0, 0, 255, 127, 240, 143, 0, 0, 128, 255, 252, 255, 255, 255, 255, 249, 255, 255, 255, 63, 255, 0, 187, 247, 255, 255, 0, 0, 252, 40, 255, 255, 7, 0, 255, 255, 247, 255, 223, 255, 0, 124, 255, 63, 0, 0, 255, 255, 127, 196, 5, 0, 0, 56, 255, 255, 60, 0, 126, 126, 126, 0, 127, 127, 255, 255, 63, 0, 255, 255, 255, 7, 0, 0, 15, 0, 255, 255, 127, 248, 255, 255, 255, 63, 255, 255, 255, 255, 255, 3, 127, 0, 248, 224, 255, 253, 127, 95, 219, 255, 255, 255, 0, 0, 248, 255, 255, 255, 252, 255, 0, 0, 255, 15, 0, 0, 223, 255, 192, 255, 255, 255, 252, 252, 252, 28, 255, 239, 255, 255, 127, 255, 255, 183, 255, 63, 255, 63, 255, 255, 1, 0, 15, 255, 62, 0, 255, 0, 255, 255, 63, 253, 255, 255, 255, 255, 191, 145, 255, 255, 55, 0, 255, 255, 255, 192, 111, 240, 239, 254, 31, 0, 0, 0, 63, 0, 0, 0, 255, 255, 71, 0, 30, 0, 0, 20, 255, 255, 251, 255, 255, 255, 159, 0, 127, 189, 255, 191, 255, 1, 255, 255, 
159, 25, 129, 224, 179, 0, 0, 0, 255, 255, 63, 127, 0, 0, 0, 63, 17, 0, 0, 0, 255, 255, 255, 227, 0, 0, 0, 128, 127, 0, 0, 0, 248, 255, 255, 224, 31, 0, 255, 255, 3, 0, 0, 0, 255, 7, 255, 31, 255, 1, 255, 67, 255, 255, 223, 255, 255, 255, 255, 223, 100, 222, 255, 235, 239, 255, 255, 255, 191, 231, 223, 223, 255, 255, 255, 123, 95, 252, 253, 255, 63, 255, 255, 255, 253, 255, 255, 247, 255, 253, 255, 255, 247, 15, 0, 0, 150, 254, 247, 10, 132, 234, 150, 170, 150, 247, 247, 94, 255, 251, 255, 15, 238, 251, 255, 15, 255, 3, 255, 255, }; /* Posix_AlNum: 2089 bytes. */ RE_UINT32 re_get_posix_alnum(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; f = ch >> 16; code = ch ^ (f << 16); pos = (RE_UINT32)re_posix_alnum_stage_1[f] << 5; f = code >> 11; code ^= f << 11; pos = (RE_UINT32)re_posix_alnum_stage_2[pos + f] << 3; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_posix_alnum_stage_3[pos + f] << 3; f = code >> 5; code ^= f << 5; pos = (RE_UINT32)re_posix_alnum_stage_4[pos + f] << 5; pos += code; value = (re_posix_alnum_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1; return value; } /* Posix_Punct. 
*/ static RE_UINT8 re_posix_punct_stage_1[] = { 0, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, }; static RE_UINT8 re_posix_punct_stage_2[] = { 0, 1, 2, 3, 4, 5, 6, 7, 7, 8, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 9, 10, 7, 7, 7, 7, 7, 7, 7, 7, 7, 11, 12, 13, 14, 7, 15, 7, 7, 7, 7, 7, 7, 7, 7, 16, 7, 7, 7, 7, 7, 7, 7, 7, 7, 17, 7, 7, 18, 19, 7, 20, 21, 22, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, }; static RE_UINT8 re_posix_punct_stage_3[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1, 1, 17, 18, 1, 19, 20, 21, 22, 23, 24, 25, 1, 1, 26, 27, 28, 29, 30, 31, 29, 29, 32, 29, 29, 29, 33, 34, 35, 36, 37, 38, 39, 40, 29, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 41, 1, 1, 1, 1, 1, 1, 42, 1, 43, 44, 45, 46, 47, 48, 1, 1, 1, 1, 1, 1, 1, 49, 1, 50, 51, 52, 1, 53, 1, 54, 1, 55, 1, 1, 56, 57, 58, 59, 1, 1, 1, 1, 60, 61, 62, 1, 63, 64, 65, 66, 1, 1, 1, 1, 67, 1, 1, 1, 1, 1, 68, 69, 1, 1, 1, 1, 1, 1, 1, 1, 70, 1, 1, 1, 71, 72, 73, 74, 1, 1, 75, 76, 29, 29, 77, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 10, 1, 78, 79, 80, 29, 29, 81, 82, 83, 84, 85, 1, 1, 1, 1, 1, 1, }; static RE_UINT8 re_posix_punct_stage_4[] = { 0, 1, 2, 3, 0, 4, 5, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 7, 0, 0, 0, 8, 9, 0, 0, 10, 0, 0, 0, 0, 11, 0, 0, 0, 0, 0, 12, 0, 13, 14, 15, 16, 17, 0, 0, 18, 0, 0, 19, 20, 21, 0, 0, 0, 0, 0, 0, 22, 0, 23, 14, 0, 0, 0, 0, 0, 0, 0, 0, 24, 0, 0, 0, 25, 0, 0, 0, 0, 0, 0, 0, 26, 0, 0, 0, 27, 0, 0, 0, 28, 0, 0, 0, 29, 0, 0, 0, 0, 0, 0, 0, 30, 0, 0, 0, 31, 0, 29, 32, 0, 0, 0, 0, 0, 33, 34, 0, 0, 35, 36, 37, 0, 0, 0, 38, 0, 36, 0, 0, 39, 0, 0, 0, 40, 41, 0, 0, 0, 42, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 43, 44, 0, 0, 45, 0, 46, 0, 0, 0, 0, 47, 0, 48, 0, 0, 0, 0, 0, 0, 0, 0, 0, 49, 0, 0, 0, 36, 50, 36, 0, 0, 0, 0, 51, 0, 0, 0, 0, 12, 52, 0, 0, 0, 53, 0, 54, 0, 36, 0, 0, 55, 0, 0, 0, 0, 0, 0, 56, 57, 58, 59, 60, 61, 62, 63, 61, 0, 0, 64, 65, 66, 0, 67, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 
50, 50, 68, 50, 69, 48, 0, 53, 70, 0, 0, 50, 50, 50, 70, 71, 50, 50, 50, 50, 50, 50, 72, 73, 74, 75, 76, 0, 0, 0, 0, 0, 0, 0, 77, 0, 0, 0, 27, 0, 0, 0, 0, 50, 78, 79, 0, 80, 50, 50, 81, 50, 50, 50, 50, 50, 50, 70, 82, 83, 84, 0, 0, 44, 42, 0, 39, 0, 0, 0, 0, 85, 0, 50, 86, 61, 87, 88, 50, 87, 89, 50, 61, 0, 0, 0, 0, 0, 0, 50, 50, 0, 0, 0, 0, 59, 50, 69, 36, 90, 0, 0, 91, 0, 0, 0, 92, 93, 94, 0, 0, 95, 0, 0, 0, 0, 96, 0, 97, 0, 0, 98, 99, 0, 98, 29, 0, 0, 0, 100, 0, 0, 0, 53, 101, 0, 0, 36, 26, 0, 0, 39, 0, 0, 0, 0, 102, 0, 103, 0, 0, 0, 104, 94, 0, 0, 36, 0, 0, 0, 0, 0, 105, 41, 59, 106, 107, 0, 0, 0, 0, 1, 2, 2, 108, 0, 0, 0, 109, 79, 110, 0, 111, 112, 42, 59, 113, 0, 0, 0, 0, 29, 0, 27, 0, 0, 0, 0, 114, 0, 0, 0, 0, 0, 0, 5, 115, 0, 0, 0, 0, 29, 29, 0, 0, 0, 0, 0, 0, 0, 0, 116, 29, 0, 0, 117, 118, 0, 111, 0, 0, 119, 0, 0, 0, 0, 0, 120, 0, 0, 121, 94, 0, 0, 0, 86, 122, 0, 0, 123, 0, 0, 124, 0, 0, 0, 103, 0, 0, 0, 0, 0, 0, 0, 0, 125, 0, 0, 0, 0, 0, 0, 0, 126, 0, 0, 0, 127, 0, 0, 0, 0, 0, 0, 53, 0, 0, 0, 0, 0, 0, 0, 0, 0, 128, 0, 0, 0, 0, 0, 0, 0, 98, 0, 0, 0, 129, 0, 110, 130, 0, 0, 0, 0, 0, 0, 0, 0, 0, 131, 0, 0, 0, 50, 50, 50, 50, 50, 50, 50, 70, 50, 132, 50, 133, 134, 135, 50, 40, 50, 50, 136, 0, 0, 0, 0, 0, 50, 50, 93, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 137, 39, 129, 129, 114, 114, 103, 103, 138, 0, 0, 139, 0, 140, 141, 0, 0, 0, 50, 142, 50, 50, 81, 143, 144, 70, 59, 145, 38, 146, 147, 0, 0, 148, 149, 68, 150, 0, 0, 0, 0, 0, 50, 50, 50, 80, 50, 151, 50, 50, 50, 50, 50, 50, 50, 50, 89, 152, 50, 50, 50, 81, 50, 50, 153, 0, 142, 50, 154, 50, 60, 21, 0, 0, 116, 0, 0, 0, 155, 0, 42, 0, }; static RE_UINT8 re_posix_punct_stage_5[] = { 0, 0, 0, 0, 254, 255, 0, 252, 1, 0, 0, 248, 1, 0, 0, 120, 254, 219, 211, 137, 0, 0, 128, 0, 60, 0, 252, 255, 224, 175, 255, 255, 0, 0, 32, 64, 176, 0, 0, 0, 0, 0, 64, 0, 4, 0, 0, 0, 0, 0, 0, 252, 0, 230, 0, 0, 0, 0, 0, 64, 73, 0, 0, 0, 0, 0, 24, 0, 192, 255, 0, 200, 0, 60, 0, 0, 0, 0, 16, 64, 0, 2, 0, 96, 255, 63, 0, 0, 0, 0, 192, 3, 0, 0, 
255, 127, 48, 0, 1, 0, 0, 0, 12, 12, 0, 0, 3, 0, 0, 0, 1, 0, 0, 0, 248, 7, 0, 0, 0, 128, 0, 0, 0, 2, 0, 0, 16, 0, 0, 128, 0, 12, 254, 255, 255, 252, 0, 0, 80, 61, 32, 0, 0, 0, 0, 0, 0, 192, 191, 223, 255, 7, 0, 252, 0, 0, 0, 0, 0, 8, 255, 1, 0, 0, 0, 0, 255, 3, 1, 0, 0, 0, 0, 96, 0, 0, 0, 0, 0, 24, 0, 56, 0, 0, 0, 0, 96, 0, 0, 0, 112, 15, 255, 7, 0, 0, 49, 0, 0, 0, 255, 255, 255, 255, 127, 63, 0, 0, 255, 7, 240, 31, 0, 0, 0, 240, 0, 0, 0, 248, 255, 0, 8, 0, 0, 0, 0, 160, 3, 224, 0, 224, 0, 224, 0, 96, 0, 0, 255, 255, 255, 0, 255, 255, 255, 255, 255, 127, 0, 0, 0, 124, 0, 124, 0, 0, 123, 3, 208, 193, 175, 66, 0, 12, 31, 188, 0, 0, 0, 12, 255, 255, 255, 255, 255, 7, 127, 0, 0, 0, 255, 255, 63, 0, 0, 0, 240, 255, 255, 255, 207, 255, 255, 255, 63, 255, 255, 255, 255, 227, 255, 253, 3, 0, 0, 240, 0, 0, 224, 7, 0, 222, 255, 127, 255, 255, 7, 0, 0, 0, 255, 255, 255, 251, 255, 255, 15, 0, 0, 0, 255, 15, 30, 255, 255, 255, 1, 0, 193, 224, 0, 0, 195, 255, 15, 0, 0, 0, 0, 252, 255, 255, 255, 0, 1, 0, 255, 255, 1, 0, 0, 224, 0, 0, 0, 0, 8, 64, 0, 0, 252, 0, 255, 255, 127, 0, 3, 0, 0, 0, 0, 6, 0, 0, 0, 15, 192, 3, 0, 0, 240, 0, 0, 192, 0, 0, 0, 0, 0, 23, 254, 63, 0, 192, 0, 0, 128, 3, 0, 8, 0, 0, 0, 2, 0, 0, 0, 0, 252, 255, 0, 0, 0, 48, 255, 255, 247, 255, 127, 15, 0, 0, 63, 0, 0, 0, 127, 127, 0, 48, 0, 0, 128, 255, 0, 0, 0, 254, 255, 19, 255, 15, 255, 255, 255, 31, 0, 128, 0, 0, 0, 0, 128, 1, 0, 0, 255, 1, 0, 1, 0, 0, 0, 0, 127, 0, 0, 0, 0, 30, 128, 63, 0, 0, 0, 0, 0, 216, 0, 0, 48, 0, 224, 35, 0, 232, 0, 0, 0, 63, 64, 0, 0, 0, 254, 255, 255, 0, 14, 0, 0, 0, 0, 0, 31, 0, 0, 0, 32, 0, 48, 0, 0, 0, 0, 0, 0, 144, 127, 254, 255, 255, 31, 28, 0, 0, 24, 240, 255, 255, 255, 195, 255, 255, 35, 0, 0, 0, 2, 0, 0, 8, 8, 0, 0, 0, 0, 0, 128, 7, 0, 224, 223, 255, 239, 15, 0, 0, 255, 15, 255, 255, 255, 127, 254, 255, 254, 255, 254, 255, 255, 127, 0, 0, 0, 12, 0, 0, 0, 252, 255, 7, 192, 255, 255, 255, 7, 0, 255, 255, 255, 1, 3, 0, 239, 255, 255, 255, 255, 31, 15, 0, 255, 255, 31, 0, 255, 0, 
    255,   3,  31,   0,   0,   0,
};

/* Posix_Punct: 1609 bytes. */

RE_UINT32 re_get_posix_punct(RE_UINT32 ch) {
    RE_UINT32 code;
    RE_UINT32 f;
    RE_UINT32 pos;
    RE_UINT32 value;

    /* Stage 1: index by the top bits (ch >> 16); each entry selects a
     * block of 32 stage-2 entries. */
    f = ch >> 16;
    code = ch ^ (f << 16);
    pos = (RE_UINT32)re_posix_punct_stage_1[f] << 5;
    /* Stage 2: next 5 bits; each entry selects a block of 8 stage-3 entries. */
    f = code >> 11;
    code ^= f << 11;
    pos = (RE_UINT32)re_posix_punct_stage_2[pos + f] << 3;
    /* Stage 3: next 3 bits; each entry selects a block of 8 stage-4 entries. */
    f = code >> 8;
    code ^= f << 8;
    pos = (RE_UINT32)re_posix_punct_stage_3[pos + f] << 3;
    /* Stage 4: next 3 bits; each entry selects a run of 32 bits in stage 5. */
    f = code >> 5;
    code ^= f << 5;
    pos = (RE_UINT32)re_posix_punct_stage_4[pos + f] << 5;
    /* Stage 5: the remaining 5 bits pick one bit out of the packed bit array. */
    pos += code;
    value = (re_posix_punct_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1;

    return value;
}

/* Posix_XDigit. */

static RE_UINT8 re_posix_xdigit_stage_1[] = {
    0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1,
};

static RE_UINT8 re_posix_xdigit_stage_2[] = {
    0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
};

static RE_UINT8 re_posix_xdigit_stage_3[] = {
    0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
};

static RE_UINT8 re_posix_xdigit_stage_4[] = {
    0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
};

static RE_UINT8 re_posix_xdigit_stage_5[] = {
      0,   0,   0,   0,   0,   0, 255,   3,
    126,   0,   0,   0, 126,   0,   0,   0,
      0,   0,   0,   0,   0,   0,   0,   0,
      0,   0,   0,   0,   0,   0,   0,   0,
};

/* Posix_XDigit: 97 bytes. */

RE_UINT32 re_get_posix_xdigit(RE_UINT32 ch) {
    RE_UINT32 code;
    RE_UINT32 f;
    RE_UINT32 pos;
    RE_UINT32 value;

    /* Same multi-stage lookup as above, but with a 3/3/3/7 bit split of
     * the low 16 bits, so each stage-4 entry selects a run of 128 bits. */
    f = ch >> 16;
    code = ch ^ (f << 16);
    pos = (RE_UINT32)re_posix_xdigit_stage_1[f] << 3;
    f = code >> 13;
    code ^= f << 13;
    pos = (RE_UINT32)re_posix_xdigit_stage_2[pos + f] << 3;
    f = code >> 10;
    code ^= f << 10;
    pos = (RE_UINT32)re_posix_xdigit_stage_3[pos + f] << 3;
    f = code >> 7;
    code ^= f << 7;
    pos = (RE_UINT32)re_posix_xdigit_stage_4[pos + f] << 7;
    /* The remaining 7 bits pick one bit out of the packed bit array. */
    pos += code;
    value = (re_posix_xdigit_stage_5[pos >> 3] >> (pos & 0x7)) & 0x1;

    return value;
}

/* All_Cases.
*/ static RE_UINT8 re_all_cases_stage_1[] = { 0, 1, 2, 2, 2, 3, 2, 4, 5, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, }; static RE_UINT8 re_all_cases_stage_2[] = { 0, 1, 2, 3, 4, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 6, 6, 8, 6, 6, 6, 6, 6, 6, 6, 6, 6, 9, 10, 11, 6, 12, 6, 6, 13, 6, 6, 6, 6, 6, 6, 6, 14, 15, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 16, 17, 6, 6, 6, 18, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 19, 6, 6, 6, 20, 6, 6, 6, 6, 21, 6, 6, 6, 6, 6, 6, 6, 22, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 23, 6, 6, 6, 6, 6, 6, 6, }; static RE_UINT8 re_all_cases_stage_3[] = { 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 0, 0, 0, 0, 0, 0, 9, 0, 10, 11, 12, 13, 14, 15, 16, 17, 18, 18, 18, 18, 18, 18, 19, 20, 21, 22, 18, 18, 18, 18, 18, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 21, 34, 18, 18, 35, 18, 18, 18, 18, 18, 36, 18, 37, 38, 39, 18, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 50, 0, 0, 0, 0, 0, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 18, 18, 18, 64, 65, 66, 66, 11, 11, 11, 11, 15, 15, 15, 15, 67, 67, 18, 18, 18, 18, 68, 69, 18, 18, 18, 18, 18, 18, 70, 71, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 72, 73, 73, 73, 74, 0, 75, 76, 76, 76, 77, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 78, 
78, 78, 78, 79, 80, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 81, 81, 81, 81, 81, 81, 81, 81, 81, 81, 82, 83, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 84, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 85, 18, 18, 18, 18, 18, 86, 87, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 88, 89, 82, 83, 88, 89, 88, 89, 82, 83, 90, 91, 88, 89, 92, 93, 88, 89, 88, 89, 88, 89, 94, 95, 96, 97, 98, 99, 100, 101, 96, 102, 0, 0, 0, 0, 103, 104, 105, 0, 0, 106, 0, 0, 107, 107, 108, 108, 109, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 110, 111, 111, 111, 112, 112, 112, 113, 0, 0, 73, 73, 73, 73, 73, 74, 76, 76, 76, 76, 76, 77, 114, 115, 116, 117, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 37, 118, 119, 0, 120, 120, 120, 120, 121, 122, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 18, 18, 18, 18, 18, 86, 0, 0, 18, 18, 18, 37, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 69, 18, 69, 18, 18, 18, 18, 18, 18, 18, 0, 123, 18, 124, 51, 18, 18, 125, 126, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 127, 0, 0, 0, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 0, 0, 0, 0, 0, 0, 0, 0, 129, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 11, 11, 4, 5, 15, 15, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 130, 130, 130, 130, 130, 131, 131, 131, 131, 131, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 132, 132, 132, 132, 132, 132, 133, 0, 134, 134, 134, 134, 134, 134, 135, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 11, 11, 11, 11, 15, 15, 15, 15, 0, 0, 0, 0, }; static RE_UINT8 re_all_cases_stage_4[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 3, 1, 1, 1, 1, 1, 1, 1, 4, 1, 1, 1, 1, 1, 
1, 1, 0, 0, 0, 0, 0, 0, 5, 5, 5, 5, 5, 5, 5, 5, 6, 5, 7, 5, 5, 5, 5, 5, 5, 5, 8, 5, 5, 5, 5, 5, 5, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 9, 0, 0, 1, 1, 1, 1, 1, 10, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 11, 5, 5, 5, 5, 5, 12, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 0, 5, 5, 5, 5, 5, 5, 5, 13, 14, 15, 14, 15, 14, 15, 14, 15, 16, 17, 14, 15, 14, 15, 14, 15, 0, 14, 15, 14, 15, 14, 15, 14, 15, 14, 15, 14, 15, 14, 15, 14, 15, 0, 14, 15, 14, 15, 14, 15, 18, 14, 15, 14, 15, 14, 15, 19, 20, 21, 14, 15, 14, 15, 22, 14, 15, 23, 23, 14, 15, 0, 24, 25, 26, 14, 15, 23, 27, 28, 29, 30, 14, 15, 31, 0, 29, 32, 33, 34, 14, 15, 14, 15, 14, 15, 35, 14, 15, 35, 0, 0, 14, 15, 35, 14, 15, 36, 36, 14, 15, 14, 15, 37, 14, 15, 0, 0, 14, 15, 0, 38, 0, 0, 0, 0, 39, 40, 41, 39, 40, 41, 39, 40, 41, 14, 15, 14, 15, 14, 15, 14, 15, 42, 14, 15, 0, 39, 40, 41, 14, 15, 43, 44, 45, 0, 14, 15, 14, 15, 14, 15, 14, 15, 14, 15, 0, 0, 0, 0, 0, 0, 46, 14, 15, 47, 48, 49, 49, 14, 15, 50, 51, 52, 14, 15, 53, 54, 55, 56, 57, 0, 58, 58, 0, 59, 0, 60, 61, 0, 0, 0, 58, 62, 0, 63, 0, 64, 65, 0, 66, 67, 0, 68, 69, 0, 0, 67, 0, 70, 71, 0, 0, 72, 0, 0, 0, 0, 0, 0, 0, 73, 0, 0, 74, 0, 0, 74, 0, 0, 0, 75, 74, 76, 77, 77, 78, 0, 0, 0, 0, 0, 79, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 80, 81, 0, 0, 0, 0, 0, 0, 82, 0, 0, 14, 15, 14, 15, 0, 0, 14, 15, 0, 0, 0, 33, 33, 33, 0, 83, 0, 0, 0, 0, 0, 0, 84, 0, 85, 85, 85, 0, 86, 0, 87, 87, 88, 1, 89, 1, 1, 90, 1, 1, 91, 92, 93, 1, 94, 1, 1, 1, 95, 96, 0, 97, 1, 1, 98, 1, 1, 99, 1, 1, 100, 101, 101, 101, 102, 5, 103, 5, 5, 104, 5, 5, 105, 106, 107, 5, 108, 5, 5, 5, 109, 110, 111, 112, 5, 5, 113, 5, 5, 114, 5, 5, 115, 116, 116, 117, 118, 119, 0, 0, 0, 120, 121, 122, 123, 124, 125, 126, 127, 128, 0, 14, 15, 129, 14, 15, 0, 45, 45, 45, 130, 130, 130, 130, 130, 130, 130, 130, 131, 131, 131, 131, 131, 131, 131, 131, 14, 15, 0, 0, 0, 0, 0, 0, 0, 0, 14, 15, 14, 15, 14, 15, 132, 14, 15, 14, 15, 14, 15, 14, 15, 14, 15, 14, 15, 14, 15, 133, 0, 134, 134, 
134, 134, 134, 134, 134, 134, 134, 134, 134, 134, 134, 134, 134, 134, 134, 134, 134, 134, 134, 134, 0, 0, 135, 135, 135, 135, 135, 135, 135, 135, 135, 135, 135, 135, 135, 135, 135, 135, 135, 135, 135, 135, 135, 135, 0, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 0, 136, 0, 0, 0, 0, 0, 136, 0, 0, 137, 137, 137, 137, 137, 137, 137, 137, 117, 117, 117, 117, 117, 117, 0, 0, 122, 122, 122, 122, 122, 122, 0, 0, 0, 138, 0, 0, 0, 139, 0, 0, 140, 141, 14, 15, 14, 15, 14, 15, 14, 15, 14, 15, 14, 15, 0, 0, 0, 0, 0, 142, 0, 0, 143, 0, 117, 117, 117, 117, 117, 117, 117, 117, 122, 122, 122, 122, 122, 122, 122, 122, 0, 117, 0, 117, 0, 117, 0, 117, 0, 122, 0, 122, 0, 122, 0, 122, 144, 144, 145, 145, 145, 145, 146, 146, 147, 147, 148, 148, 149, 149, 0, 0, 117, 117, 0, 150, 0, 0, 0, 0, 122, 122, 151, 151, 152, 0, 153, 0, 0, 0, 0, 150, 0, 0, 0, 0, 154, 154, 154, 154, 152, 0, 0, 0, 117, 117, 0, 155, 0, 0, 0, 0, 122, 122, 156, 156, 0, 0, 0, 0, 117, 117, 0, 157, 0, 125, 0, 0, 122, 122, 158, 158, 129, 0, 0, 0, 159, 159, 160, 160, 152, 0, 0, 0, 0, 0, 0, 0, 0, 0, 161, 0, 0, 0, 162, 163, 0, 0, 0, 0, 0, 0, 164, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 165, 0, 166, 166, 166, 166, 166, 166, 166, 166, 167, 167, 167, 167, 167, 167, 167, 167, 0, 0, 0, 14, 15, 0, 0, 0, 0, 0, 0, 0, 0, 0, 168, 168, 168, 168, 168, 168, 168, 168, 168, 168, 169, 169, 169, 169, 169, 169, 169, 169, 169, 169, 0, 0, 0, 0, 0, 0, 14, 15, 170, 171, 172, 173, 174, 14, 15, 14, 15, 14, 15, 175, 176, 177, 178, 0, 14, 15, 0, 14, 15, 0, 0, 0, 0, 0, 0, 0, 179, 179, 0, 0, 0, 14, 15, 14, 15, 0, 0, 0, 14, 15, 0, 0, 0, 0, 180, 180, 180, 180, 180, 180, 180, 180, 180, 180, 180, 180, 180, 180, 0, 180, 0, 0, 0, 0, 0, 180, 0, 0, 0, 14, 15, 14, 15, 181, 14, 15, 0, 0, 0, 14, 15, 182, 0, 0, 14, 15, 183, 184, 185, 186, 0, 0, 187, 188, 189, 190, 14, 15, 14, 15, 0, 0, 0, 191, 0, 0, 0, 0, 192, 192, 192, 192, 192, 192, 192, 192, 0, 0, 0, 0, 0, 14, 15, 0, 193, 193, 193, 193, 193, 193, 193, 193, 194, 194, 194, 194, 194, 194, 194, 
194, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 0, 0, 0, 0, 0, 115, 115, 115, 115, 115, 115, 115, 115, 115, 115, 115, 0, 0, 0, 0, 0, }; /* All_Cases: 2184 bytes. */ static RE_AllCases re_all_cases_table[] = { {{ 0, 0, 0}}, {{ 32, 0, 0}}, {{ 32, 232, 0}}, {{ 32, 8415, 0}}, {{ 32, 300, 0}}, {{ -32, 0, 0}}, {{ -32, 199, 0}}, {{ -32, 8383, 0}}, {{ -32, 268, 0}}, {{ 743, 775, 0}}, {{ 32, 8294, 0}}, {{ 7615, 0, 0}}, {{ -32, 8262, 0}}, {{ 121, 0, 0}}, {{ 1, 0, 0}}, {{ -1, 0, 0}}, {{ -199, 0, 0}}, {{ -232, 0, 0}}, {{ -121, 0, 0}}, {{ -300, -268, 0}}, {{ 195, 0, 0}}, {{ 210, 0, 0}}, {{ 206, 0, 0}}, {{ 205, 0, 0}}, {{ 79, 0, 0}}, {{ 202, 0, 0}}, {{ 203, 0, 0}}, {{ 207, 0, 0}}, {{ 97, 0, 0}}, {{ 211, 0, 0}}, {{ 209, 0, 0}}, {{ 163, 0, 0}}, {{ 213, 0, 0}}, {{ 130, 0, 0}}, {{ 214, 0, 0}}, {{ 218, 0, 0}}, {{ 217, 0, 0}}, {{ 219, 0, 0}}, {{ 56, 0, 0}}, {{ 1, 2, 0}}, {{ -1, 1, 0}}, {{ -2, -1, 0}}, {{ -79, 0, 0}}, {{ -97, 0, 0}}, {{ -56, 0, 0}}, {{ -130, 0, 0}}, {{ 10795, 0, 0}}, {{ -163, 0, 0}}, {{ 10792, 0, 0}}, {{ 10815, 0, 0}}, {{ -195, 0, 0}}, {{ 69, 0, 0}}, {{ 71, 0, 0}}, {{ 10783, 0, 0}}, {{ 10780, 0, 0}}, {{ 10782, 0, 0}}, {{ -210, 0, 0}}, {{ -206, 0, 0}}, {{ -205, 0, 0}}, {{ -202, 0, 0}}, {{ -203, 0, 0}}, {{ 42319, 0, 0}}, {{ 42315, 0, 0}}, {{ -207, 0, 0}}, {{ 42280, 0, 0}}, {{ 42308, 0, 0}}, {{ -209, 0, 0}}, {{ -211, 0, 0}}, {{ 10743, 0, 0}}, {{ 42305, 0, 0}}, {{ 10749, 0, 0}}, {{ -213, 0, 0}}, {{ -214, 0, 0}}, {{ 10727, 0, 0}}, {{ -218, 0, 0}}, {{ 42282, 0, 0}}, {{ -69, 0, 0}}, {{ -217, 0, 0}}, {{ -71, 0, 0}}, {{ -219, 0, 0}}, {{ 42261, 0, 0}}, {{ 42258, 0, 0}}, {{ 84, 116, 7289}}, {{ 116, 0, 0}}, {{ 38, 0, 0}}, {{ 37, 0, 0}}, {{ 64, 0, 0}}, {{ 63, 0, 0}}, {{ 7235, 0, 0}}, {{ 32, 62, 0}}, {{ 32, 96, 0}}, {{ 32, 57, 92}}, {{ -84, 32, 7205}}, {{ 32, 86, 0}}, {{ -743, 32, 0}}, {{ 32, 54, 0}}, {{ 32, 80, 0}}, {{ 31, 32, 0}}, {{ 32, 47, 0}}, {{ 32, 7549, 0}}, {{ -38, 0, 0}}, {{ -37, 0, 0}}, {{ 7219, 0, 0}}, {{ -32, 30, 0}}, {{ -32, 64, 0}}, {{ -32, 25, 60}}, {{ -116, -32, 
7173}}, {{ -32, 54, 0}}, {{ -775, -32, 0}}, {{ -32, 22, 0}}, {{ -32, 48, 0}}, {{ -31, 1, 0}}, {{ -32, -1, 0}}, {{ -32, 15, 0}}, {{ -32, 7517, 0}}, {{ -64, 0, 0}}, {{ -63, 0, 0}}, {{ 8, 0, 0}}, {{ -62, -30, 0}}, {{ -57, -25, 35}}, {{ -47, -15, 0}}, {{ -54, -22, 0}}, {{ -8, 0, 0}}, {{ -86, -54, 0}}, {{ -80, -48, 0}}, {{ 7, 0, 0}}, {{ -116, 0, 0}}, {{ -92, -60, -35}}, {{ -96, -64, 0}}, {{ -7, 0, 0}}, {{ 80, 0, 0}}, {{ -80, 0, 0}}, {{ 15, 0, 0}}, {{ -15, 0, 0}}, {{ 48, 0, 0}}, {{ -48, 0, 0}}, {{ 7264, 0, 0}}, {{ 38864, 0, 0}}, {{ 35332, 0, 0}}, {{ 3814, 0, 0}}, {{ 1, 59, 0}}, {{ -1, 58, 0}}, {{ -59, -58, 0}}, {{ -7615, 0, 0}}, {{ 74, 0, 0}}, {{ 86, 0, 0}}, {{ 100, 0, 0}}, {{ 128, 0, 0}}, {{ 112, 0, 0}}, {{ 126, 0, 0}}, {{ 9, 0, 0}}, {{ -74, 0, 0}}, {{ -9, 0, 0}}, {{ -7289, -7205, -7173}}, {{ -86, 0, 0}}, {{ -7235, 0, 0}}, {{ -100, 0, 0}}, {{ -7219, 0, 0}}, {{ -112, 0, 0}}, {{ -128, 0, 0}}, {{ -126, 0, 0}}, {{ -7549, -7517, 0}}, {{ -8415, -8383, 0}}, {{ -8294, -8262, 0}}, {{ 28, 0, 0}}, {{ -28, 0, 0}}, {{ 16, 0, 0}}, {{ -16, 0, 0}}, {{ 26, 0, 0}}, {{ -26, 0, 0}}, {{-10743, 0, 0}}, {{ -3814, 0, 0}}, {{-10727, 0, 0}}, {{-10795, 0, 0}}, {{-10792, 0, 0}}, {{-10780, 0, 0}}, {{-10749, 0, 0}}, {{-10783, 0, 0}}, {{-10782, 0, 0}}, {{-10815, 0, 0}}, {{ -7264, 0, 0}}, {{-35332, 0, 0}}, {{-42280, 0, 0}}, {{-42308, 0, 0}}, {{-42319, 0, 0}}, {{-42315, 0, 0}}, {{-42305, 0, 0}}, {{-42258, 0, 0}}, {{-42282, 0, 0}}, {{-42261, 0, 0}}, {{ 928, 0, 0}}, {{ -928, 0, 0}}, {{-38864, 0, 0}}, {{ 40, 0, 0}}, {{ -40, 0, 0}}, }; /* All_Cases: 2340 bytes. 
*/ int re_get_all_cases(RE_UINT32 ch, RE_UINT32* codepoints) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; RE_AllCases* all_cases; int count; f = ch >> 13; code = ch ^ (f << 13); pos = (RE_UINT32)re_all_cases_stage_1[f] << 5; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_all_cases_stage_2[pos + f] << 5; f = code >> 3; code ^= f << 3; pos = (RE_UINT32)re_all_cases_stage_3[pos + f] << 3; value = re_all_cases_stage_4[pos + code]; all_cases = &re_all_cases_table[value]; codepoints[0] = ch; count = 1; while (count < RE_MAX_CASES && all_cases->diffs[count - 1] != 0) { codepoints[count] = (RE_UINT32)((RE_INT32)ch + all_cases->diffs[count - 1]); ++count; } return count; } /* Simple_Case_Folding. */ static RE_UINT8 re_simple_case_folding_stage_1[] = { 0, 1, 2, 2, 2, 3, 2, 4, 5, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, }; static RE_UINT8 re_simple_case_folding_stage_2[] = { 0, 1, 2, 3, 4, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 6, 6, 8, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 9, 10, 6, 11, 6, 6, 12, 6, 6, 6, 6, 6, 6, 6, 13, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 14, 15, 6, 6, 6, 16, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 17, 6, 6, 6, 6, 18, 6, 6, 6, 6, 6, 6, 6, 19, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 20, 6, 6, 6, 6, 6, 6, 6, }; static RE_UINT8 re_simple_case_folding_stage_3[] = { 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 2, 2, 5, 5, 0, 0, 0, 0, 6, 6, 6, 6, 6, 6, 7, 
8, 8, 7, 6, 6, 6, 6, 6, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 8, 20, 6, 6, 21, 6, 6, 6, 6, 6, 22, 6, 23, 24, 25, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 26, 0, 0, 0, 0, 0, 27, 28, 29, 30, 1, 2, 31, 32, 0, 0, 33, 34, 35, 6, 6, 6, 36, 37, 38, 38, 2, 2, 2, 2, 0, 0, 0, 0, 0, 0, 6, 6, 6, 6, 39, 7, 6, 6, 6, 6, 6, 6, 40, 41, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 42, 43, 43, 43, 44, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 45, 45, 45, 45, 46, 47, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 48, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 49, 50, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 0, 51, 0, 48, 0, 51, 0, 51, 0, 48, 0, 52, 0, 51, 0, 0, 0, 51, 0, 51, 0, 51, 0, 53, 0, 54, 0, 55, 0, 56, 0, 57, 0, 0, 0, 0, 58, 59, 60, 0, 0, 0, 0, 0, 61, 61, 0, 0, 62, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 63, 64, 64, 64, 0, 0, 0, 0, 0, 0, 43, 43, 43, 43, 43, 44, 0, 0, 0, 0, 0, 0, 65, 66, 67, 68, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 23, 69, 33, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 6, 6, 6, 6, 49, 0, 0, 6, 6, 6, 23, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 6, 7, 6, 6, 6, 6, 6, 6, 6, 0, 70, 6, 71, 27, 6, 6, 72, 73, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 74, 74, 74, 74, 74, 74, 74, 74, 74, 74, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 75, 75, 75, 75, 75, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 76, 76, 76, 76, 76, 76, 77, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2, 2, 0, 0, 0, 
0, 0, 0, 0, 0, }; static RE_UINT8 re_simple_case_folding_stage_4[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 3, 0, 3, 0, 3, 0, 3, 0, 0, 0, 3, 0, 3, 0, 3, 0, 0, 3, 0, 3, 0, 3, 0, 3, 4, 3, 0, 3, 0, 3, 0, 5, 0, 6, 3, 0, 3, 0, 7, 3, 0, 8, 8, 3, 0, 0, 9, 10, 11, 3, 0, 8, 12, 0, 13, 14, 3, 0, 0, 0, 13, 15, 0, 16, 3, 0, 3, 0, 3, 0, 17, 3, 0, 17, 0, 0, 3, 0, 17, 3, 0, 18, 18, 3, 0, 3, 0, 19, 3, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 20, 3, 0, 20, 3, 0, 20, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 0, 3, 0, 0, 20, 3, 0, 3, 0, 21, 22, 23, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 0, 0, 0, 0, 0, 0, 24, 3, 0, 25, 26, 0, 0, 3, 0, 27, 28, 29, 3, 0, 0, 0, 0, 0, 0, 30, 0, 0, 3, 0, 3, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 30, 0, 0, 0, 0, 0, 0, 31, 0, 32, 32, 32, 0, 33, 0, 34, 34, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 35, 36, 37, 0, 0, 0, 38, 39, 0, 40, 41, 0, 0, 42, 43, 0, 3, 0, 44, 3, 0, 0, 23, 23, 23, 45, 45, 45, 45, 45, 45, 45, 45, 3, 0, 0, 0, 0, 0, 0, 0, 46, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 0, 0, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 0, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 0, 48, 0, 0, 0, 0, 0, 48, 0, 0, 49, 49, 49, 49, 49, 49, 0, 0, 3, 0, 3, 0, 3, 0, 0, 0, 0, 0, 0, 50, 0, 0, 51, 0, 49, 49, 49, 49, 49, 49, 49, 49, 0, 49, 0, 49, 0, 49, 0, 49, 49, 49, 52, 52, 53, 0, 54, 0, 55, 55, 55, 55, 53, 0, 0, 0, 49, 49, 56, 56, 0, 0, 0, 0, 49, 49, 57, 57, 44, 0, 0, 0, 58, 58, 59, 59, 53, 0, 0, 0, 0, 0, 0, 0, 0, 0, 60, 0, 0, 0, 61, 62, 0, 0, 0, 0, 0, 0, 63, 0, 0, 0, 0, 0, 64, 64, 64, 64, 64, 64, 64, 64, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 65, 65, 65, 65, 65, 65, 65, 65, 65, 65, 3, 0, 66, 67, 68, 0, 0, 3, 0, 3, 0, 3, 0, 69, 70, 71, 72, 0, 3, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 73, 73, 0, 0, 0, 3, 0, 3, 0, 0, 0, 3, 0, 3, 0, 74, 3, 0, 0, 0, 0, 3, 0, 75, 0, 0, 3, 0, 76, 77, 78, 79, 0, 
0, 80, 81, 82, 83, 3, 0, 3, 0, 84, 84, 84, 84, 84, 84, 84, 84, 85, 85, 85, 85, 85, 85, 85, 85, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 0, 0, 0, 0, 0, }; /* Simple_Case_Folding: 1624 bytes. */ static RE_INT32 re_simple_case_folding_table[] = { 0, 32, 775, 1, -121, -268, 210, 206, 205, 79, 202, 203, 207, 211, 209, 213, 214, 218, 217, 219, 2, -97, -56, -130, 10795, -163, 10792, -195, 69, 71, 116, 38, 37, 64, 63, 8, -30, -25, -15, -22, -54, -48, -60, -64, -7, 80, 15, 48, 7264, -8, -58, -7615, -74, -9, -7173, -86, -100, -112, -128, -126, -7517, -8383, -8262, 28, 16, 26, -10743, -3814, -10727, -10780, -10749, -10783, -10782, -10815, -35332, -42280, -42308, -42319, -42315, -42305, -42258, -42282, -42261, 928, -38864, 40, }; /* Simple_Case_Folding: 344 bytes. */ RE_UINT32 re_get_simple_case_folding(RE_UINT32 ch) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; RE_INT32 diff; f = ch >> 13; code = ch ^ (f << 13); pos = (RE_UINT32)re_simple_case_folding_stage_1[f] << 5; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_simple_case_folding_stage_2[pos + f] << 5; f = code >> 3; code ^= f << 3; pos = (RE_UINT32)re_simple_case_folding_stage_3[pos + f] << 3; value = re_simple_case_folding_stage_4[pos + code]; diff = re_simple_case_folding_table[value]; return (RE_UINT32)((RE_INT32)ch + diff); } /* Full_Case_Folding. 
*/ static RE_UINT8 re_full_case_folding_stage_1[] = { 0, 1, 2, 2, 2, 3, 2, 4, 5, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, }; static RE_UINT8 re_full_case_folding_stage_2[] = { 0, 1, 2, 3, 4, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 6, 6, 8, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 9, 10, 6, 11, 6, 6, 12, 6, 6, 6, 6, 6, 6, 6, 13, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 14, 15, 6, 6, 6, 16, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 17, 6, 6, 6, 18, 6, 6, 6, 6, 19, 6, 6, 6, 6, 6, 6, 6, 20, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 21, 6, 6, 6, 6, 6, 6, 6, }; static RE_UINT8 re_full_case_folding_stage_3[] = { 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 2, 2, 5, 6, 0, 0, 0, 0, 7, 7, 7, 7, 7, 7, 8, 9, 9, 10, 7, 7, 7, 7, 7, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 9, 22, 7, 7, 23, 7, 7, 7, 7, 7, 24, 7, 25, 26, 27, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 28, 0, 0, 0, 0, 0, 29, 30, 31, 32, 33, 2, 34, 35, 36, 0, 37, 38, 39, 7, 7, 7, 40, 41, 42, 42, 2, 2, 2, 2, 0, 0, 0, 0, 0, 0, 7, 7, 7, 7, 43, 44, 7, 7, 7, 7, 7, 7, 45, 46, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 47, 48, 48, 48, 49, 0, 0, 0, 0, 0, 50, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 51, 51, 51, 51, 52, 53, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 54, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 55, 56, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 0, 57, 0, 54, 0, 57, 0, 57, 0, 54, 58, 59, 0, 57, 0, 0, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 0, 0, 0, 0, 76, 77, 78, 0, 0, 0, 0, 0, 79, 79, 0, 0, 80, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 81, 82, 82, 82, 0, 0, 0, 0, 0, 0, 48, 48, 48, 48, 48, 49, 0, 0, 0, 0, 0, 0, 83, 84, 85, 86, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 25, 87, 37, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 7, 7, 7, 7, 88, 0, 0, 7, 7, 7, 25, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 44, 7, 44, 7, 7, 7, 7, 7, 7, 7, 0, 89, 7, 90, 29, 7, 7, 91, 92, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 93, 93, 93, 93, 93, 93, 93, 93, 93, 93, 0, 0, 0, 0, 0, 0, 0, 0, 94, 0, 95, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 96, 96, 96, 96, 96, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 97, 97, 97, 97, 97, 97, 98, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, }; static RE_UINT8 re_full_case_folding_stage_4[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 3, 4, 0, 4, 0, 4, 0, 4, 0, 5, 0, 4, 0, 4, 0, 4, 0, 0, 4, 0, 4, 0, 4, 0, 4, 0, 6, 4, 0, 4, 0, 4, 0, 7, 4, 0, 4, 0, 4, 0, 8, 0, 9, 4, 0, 4, 0, 10, 4, 0, 11, 11, 4, 0, 0, 12, 13, 14, 4, 0, 11, 15, 0, 16, 17, 4, 0, 0, 0, 16, 18, 0, 19, 4, 0, 4, 0, 4, 0, 20, 4, 0, 20, 0, 0, 4, 0, 20, 4, 0, 21, 21, 4, 0, 4, 0, 22, 4, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 23, 4, 0, 23, 4, 0, 23, 4, 0, 
4, 0, 4, 0, 4, 0, 4, 0, 0, 4, 0, 24, 23, 4, 0, 4, 0, 25, 26, 27, 0, 4, 0, 4, 0, 4, 0, 4, 0, 4, 0, 0, 0, 0, 0, 0, 0, 28, 4, 0, 29, 30, 0, 0, 4, 0, 31, 32, 33, 4, 0, 0, 0, 0, 0, 0, 34, 0, 0, 4, 0, 4, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 34, 0, 0, 0, 0, 0, 0, 35, 0, 36, 36, 36, 0, 37, 0, 38, 38, 39, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 40, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 41, 42, 43, 0, 0, 0, 44, 45, 0, 46, 47, 0, 0, 48, 49, 0, 4, 0, 50, 4, 0, 0, 27, 27, 27, 51, 51, 51, 51, 51, 51, 51, 51, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 4, 0, 4, 0, 52, 4, 0, 4, 0, 4, 0, 4, 0, 4, 0, 4, 0, 4, 0, 0, 0, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 0, 0, 0, 0, 0, 0, 0, 0, 54, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 0, 55, 0, 0, 0, 0, 0, 55, 0, 0, 56, 56, 56, 56, 56, 56, 0, 0, 4, 0, 4, 0, 4, 0, 57, 58, 59, 60, 61, 62, 0, 0, 63, 0, 56, 56, 56, 56, 56, 56, 56, 56, 64, 0, 65, 0, 66, 0, 67, 0, 0, 56, 0, 56, 0, 56, 0, 56, 68, 68, 68, 68, 68, 68, 68, 68, 69, 69, 69, 69, 69, 69, 69, 69, 70, 70, 70, 70, 70, 70, 70, 70, 71, 71, 71, 71, 71, 71, 71, 71, 72, 72, 72, 72, 72, 72, 72, 72, 73, 73, 73, 73, 73, 73, 73, 73, 0, 0, 74, 75, 76, 0, 77, 78, 56, 56, 79, 79, 80, 0, 81, 0, 0, 0, 82, 83, 84, 0, 85, 86, 87, 87, 87, 87, 88, 0, 0, 0, 0, 0, 89, 90, 0, 0, 91, 92, 56, 56, 93, 93, 0, 0, 0, 0, 0, 0, 94, 95, 96, 0, 97, 98, 56, 56, 99, 99, 50, 0, 0, 0, 0, 0, 100, 101, 102, 0, 103, 104, 105, 105, 106, 106, 107, 0, 0, 0, 0, 0, 0, 0, 0, 0, 108, 0, 0, 0, 109, 110, 0, 0, 0, 0, 0, 0, 111, 0, 0, 0, 0, 0, 112, 112, 112, 112, 112, 112, 112, 112, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 113, 113, 113, 113, 113, 113, 113, 113, 113, 113, 4, 0, 114, 115, 116, 0, 0, 4, 0, 4, 0, 4, 0, 117, 118, 119, 120, 0, 4, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 121, 121, 0, 0, 0, 4, 0, 4, 0, 0, 4, 0, 4, 0, 4, 0, 0, 0, 0, 4, 0, 4, 0, 122, 4, 0, 0, 0, 0, 4, 0, 123, 0, 0, 4, 0, 124, 125, 126, 127, 0, 0, 128, 129, 130, 
131, 4, 0, 4, 0, 132, 132, 132, 132, 132, 132, 132, 132, 133, 134, 135, 136, 137, 138, 139, 0, 0, 0, 0, 140, 141, 142, 143, 144, 145, 145, 145, 145, 145, 145, 145, 145, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 0, 0, 0, 0, 0, }; /* Full_Case_Folding: 1824 bytes. */ static RE_FullCaseFolding re_full_case_folding_table[] = { { 0, { 0, 0}}, { 32, { 0, 0}}, { 775, { 0, 0}}, { -108, { 115, 0}}, { 1, { 0, 0}}, { -199, { 775, 0}}, { 371, { 110, 0}}, { -121, { 0, 0}}, { -268, { 0, 0}}, { 210, { 0, 0}}, { 206, { 0, 0}}, { 205, { 0, 0}}, { 79, { 0, 0}}, { 202, { 0, 0}}, { 203, { 0, 0}}, { 207, { 0, 0}}, { 211, { 0, 0}}, { 209, { 0, 0}}, { 213, { 0, 0}}, { 214, { 0, 0}}, { 218, { 0, 0}}, { 217, { 0, 0}}, { 219, { 0, 0}}, { 2, { 0, 0}}, { -390, { 780, 0}}, { -97, { 0, 0}}, { -56, { 0, 0}}, { -130, { 0, 0}}, { 10795, { 0, 0}}, { -163, { 0, 0}}, { 10792, { 0, 0}}, { -195, { 0, 0}}, { 69, { 0, 0}}, { 71, { 0, 0}}, { 116, { 0, 0}}, { 38, { 0, 0}}, { 37, { 0, 0}}, { 64, { 0, 0}}, { 63, { 0, 0}}, { 41, { 776, 769}}, { 21, { 776, 769}}, { 8, { 0, 0}}, { -30, { 0, 0}}, { -25, { 0, 0}}, { -15, { 0, 0}}, { -22, { 0, 0}}, { -54, { 0, 0}}, { -48, { 0, 0}}, { -60, { 0, 0}}, { -64, { 0, 0}}, { -7, { 0, 0}}, { 80, { 0, 0}}, { 15, { 0, 0}}, { 48, { 0, 0}}, { -34, {1410, 0}}, { 7264, { 0, 0}}, { -8, { 0, 0}}, { -7726, { 817, 0}}, { -7715, { 776, 0}}, { -7713, { 778, 0}}, { -7712, { 778, 0}}, { -7737, { 702, 0}}, { -58, { 0, 0}}, { -7723, { 115, 0}}, { -7051, { 787, 0}}, { -7053, { 787, 768}}, { -7055, { 787, 769}}, { -7057, { 787, 834}}, { -128, { 953, 0}}, { -136, { 953, 0}}, { -112, { 953, 0}}, { -120, { 953, 0}}, { -64, { 953, 0}}, { -72, { 953, 0}}, { -66, { 953, 0}}, { -7170, { 953, 0}}, { -7176, { 953, 0}}, { -7173, { 834, 0}}, { -7174, { 834, 953}}, { -74, { 0, 0}}, { -7179, { 953, 0}}, { -7173, { 0, 0}}, { -78, { 953, 0}}, { -7180, { 953, 0}}, { -7190, { 953, 0}}, { -7183, { 834, 0}}, { -7184, { 834, 953}}, { -86, { 0, 0}}, { -7189, { 953, 0}}, { -7193, { 776, 768}}, { -7194, { 776, 
769}}, { -7197, { 834, 0}}, { -7198, { 776, 834}}, { -100, { 0, 0}}, { -7197, { 776, 768}}, { -7198, { 776, 769}}, { -7203, { 787, 0}}, { -7201, { 834, 0}}, { -7202, { 776, 834}}, { -112, { 0, 0}}, { -118, { 953, 0}}, { -7210, { 953, 0}}, { -7206, { 953, 0}}, { -7213, { 834, 0}}, { -7214, { 834, 953}}, { -128, { 0, 0}}, { -126, { 0, 0}}, { -7219, { 953, 0}}, { -7517, { 0, 0}}, { -8383, { 0, 0}}, { -8262, { 0, 0}}, { 28, { 0, 0}}, { 16, { 0, 0}}, { 26, { 0, 0}}, {-10743, { 0, 0}}, { -3814, { 0, 0}}, {-10727, { 0, 0}}, {-10780, { 0, 0}}, {-10749, { 0, 0}}, {-10783, { 0, 0}}, {-10782, { 0, 0}}, {-10815, { 0, 0}}, {-35332, { 0, 0}}, {-42280, { 0, 0}}, {-42308, { 0, 0}}, {-42319, { 0, 0}}, {-42315, { 0, 0}}, {-42305, { 0, 0}}, {-42258, { 0, 0}}, {-42282, { 0, 0}}, {-42261, { 0, 0}}, { 928, { 0, 0}}, {-38864, { 0, 0}}, {-64154, { 102, 0}}, {-64155, { 105, 0}}, {-64156, { 108, 0}}, {-64157, { 102, 105}}, {-64158, { 102, 108}}, {-64146, { 116, 0}}, {-64147, { 116, 0}}, {-62879, {1398, 0}}, {-62880, {1381, 0}}, {-62881, {1387, 0}}, {-62872, {1398, 0}}, {-62883, {1389, 0}}, { 40, { 0, 0}}, }; /* Full_Case_Folding: 1168 bytes. */ int re_get_full_case_folding(RE_UINT32 ch, RE_UINT32* codepoints) { RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; RE_FullCaseFolding* case_folding; int count; f = ch >> 13; code = ch ^ (f << 13); pos = (RE_UINT32)re_full_case_folding_stage_1[f] << 5; f = code >> 8; code ^= f << 8; pos = (RE_UINT32)re_full_case_folding_stage_2[pos + f] << 5; f = code >> 3; code ^= f << 3; pos = (RE_UINT32)re_full_case_folding_stage_3[pos + f] << 3; value = re_full_case_folding_stage_4[pos + code]; case_folding = &re_full_case_folding_table[value]; codepoints[0] = (RE_UINT32)((RE_INT32)ch + case_folding->diff); count = 1; while (count < RE_MAX_FOLDED && case_folding->codepoints[count - 1] != 0) { codepoints[count] = case_folding->codepoints[count - 1]; ++count; } return count; } /* Property function table. 
*/ RE_GetPropertyFunc re_get_property[] = { re_get_general_category, re_get_block, re_get_script, re_get_word_break, re_get_grapheme_cluster_break, re_get_sentence_break, re_get_math, re_get_alphabetic, re_get_lowercase, re_get_uppercase, re_get_cased, re_get_case_ignorable, re_get_changes_when_lowercased, re_get_changes_when_uppercased, re_get_changes_when_titlecased, re_get_changes_when_casefolded, re_get_changes_when_casemapped, re_get_id_start, re_get_id_continue, re_get_xid_start, re_get_xid_continue, re_get_default_ignorable_code_point, re_get_grapheme_extend, re_get_grapheme_base, re_get_grapheme_link, re_get_white_space, re_get_bidi_control, re_get_join_control, re_get_dash, re_get_hyphen, re_get_quotation_mark, re_get_terminal_punctuation, re_get_other_math, re_get_hex_digit, re_get_ascii_hex_digit, re_get_other_alphabetic, re_get_ideographic, re_get_diacritic, re_get_extender, re_get_other_lowercase, re_get_other_uppercase, re_get_noncharacter_code_point, re_get_other_grapheme_extend, re_get_ids_binary_operator, re_get_ids_trinary_operator, re_get_radical, re_get_unified_ideograph, re_get_other_default_ignorable_code_point, re_get_deprecated, re_get_soft_dotted, re_get_logical_order_exception, re_get_other_id_start, re_get_other_id_continue, re_get_sterm, re_get_variation_selector, re_get_pattern_white_space, re_get_pattern_syntax, re_get_hangul_syllable_type, re_get_bidi_class, re_get_canonical_combining_class, re_get_decomposition_type, re_get_east_asian_width, re_get_joining_group, re_get_joining_type, re_get_line_break, re_get_numeric_type, re_get_numeric_value, re_get_bidi_mirrored, re_get_indic_positional_category, re_get_indic_syllabic_category, re_get_alphanumeric, re_get_any, re_get_blank, re_get_graph, re_get_print, re_get_word, re_get_xdigit, re_get_posix_digit, re_get_posix_alnum, re_get_posix_punct, re_get_posix_xdigit, }; regex-2016.01.10/Python3/_regex_unicode.h0000666000000000000000000001640712540663552016062 0ustar 00000000000000typedef 
unsigned char RE_UINT8; typedef signed char RE_INT8; typedef unsigned short RE_UINT16; typedef signed short RE_INT16; typedef unsigned int RE_UINT32; typedef signed int RE_INT32; typedef unsigned char BOOL; enum {FALSE, TRUE}; #define RE_ASCII_MAX 0x7F #define RE_LOCALE_MAX 0xFF #define RE_UNICODE_MAX 0x10FFFF #define RE_MAX_CASES 4 #define RE_MAX_FOLDED 3 typedef struct RE_Property { RE_UINT16 name; RE_UINT8 id; RE_UINT8 value_set; } RE_Property; typedef struct RE_PropertyValue { RE_UINT16 name; RE_UINT8 value_set; RE_UINT16 id; } RE_PropertyValue; typedef RE_UINT32 (*RE_GetPropertyFunc)(RE_UINT32 ch); #define RE_PROP_GC 0x0 #define RE_PROP_CASED 0xA #define RE_PROP_UPPERCASE 0x9 #define RE_PROP_LOWERCASE 0x8 #define RE_PROP_C 30 #define RE_PROP_L 31 #define RE_PROP_M 32 #define RE_PROP_N 33 #define RE_PROP_P 34 #define RE_PROP_S 35 #define RE_PROP_Z 36 #define RE_PROP_ASSIGNED 38 #define RE_PROP_CASEDLETTER 37 #define RE_PROP_CN 0 #define RE_PROP_LU 1 #define RE_PROP_LL 2 #define RE_PROP_LT 3 #define RE_PROP_LM 4 #define RE_PROP_LO 5 #define RE_PROP_MN 6 #define RE_PROP_ME 7 #define RE_PROP_MC 8 #define RE_PROP_ND 9 #define RE_PROP_NL 10 #define RE_PROP_NO 11 #define RE_PROP_ZS 12 #define RE_PROP_ZL 13 #define RE_PROP_ZP 14 #define RE_PROP_CC 15 #define RE_PROP_CF 16 #define RE_PROP_CO 17 #define RE_PROP_CS 18 #define RE_PROP_PD 19 #define RE_PROP_PS 20 #define RE_PROP_PE 21 #define RE_PROP_PC 22 #define RE_PROP_PO 23 #define RE_PROP_SM 24 #define RE_PROP_SC 25 #define RE_PROP_SK 26 #define RE_PROP_SO 27 #define RE_PROP_PI 28 #define RE_PROP_PF 29 #define RE_PROP_C_MASK 0x00078001 #define RE_PROP_L_MASK 0x0000003E #define RE_PROP_M_MASK 0x000001C0 #define RE_PROP_N_MASK 0x00000E00 #define RE_PROP_P_MASK 0x30F80000 #define RE_PROP_S_MASK 0x0F000000 #define RE_PROP_Z_MASK 0x00007000 #define RE_PROP_ALNUM 0x460001 #define RE_PROP_ALPHA 0x070001 #define RE_PROP_ANY 0x470001 #define RE_PROP_ASCII 0x010001 #define RE_PROP_BLANK 0x480001 #define RE_PROP_CNTRL 0x00000F 
#define RE_PROP_DIGIT 0x000009 #define RE_PROP_GRAPH 0x490001 #define RE_PROP_LOWER 0x080001 #define RE_PROP_PRINT 0x4A0001 #define RE_PROP_SPACE 0x190001 #define RE_PROP_UPPER 0x090001 #define RE_PROP_WORD 0x4B0001 #define RE_PROP_XDIGIT 0x4C0001 #define RE_PROP_POSIX_ALNUM 0x4E0001 #define RE_PROP_POSIX_DIGIT 0x4D0001 #define RE_PROP_POSIX_PUNCT 0x4F0001 #define RE_PROP_POSIX_XDIGIT 0x500001 #define RE_BREAK_OTHER 0 #define RE_BREAK_DOUBLEQUOTE 1 #define RE_BREAK_SINGLEQUOTE 2 #define RE_BREAK_HEBREWLETTER 3 #define RE_BREAK_CR 4 #define RE_BREAK_LF 5 #define RE_BREAK_NEWLINE 6 #define RE_BREAK_EXTEND 7 #define RE_BREAK_REGIONALINDICATOR 8 #define RE_BREAK_FORMAT 9 #define RE_BREAK_KATAKANA 10 #define RE_BREAK_ALETTER 11 #define RE_BREAK_MIDLETTER 12 #define RE_BREAK_MIDNUM 13 #define RE_BREAK_MIDNUMLET 14 #define RE_BREAK_NUMERIC 15 #define RE_BREAK_EXTENDNUMLET 16 #define RE_GBREAK_OTHER 0 #define RE_GBREAK_CR 1 #define RE_GBREAK_LF 2 #define RE_GBREAK_CONTROL 3 #define RE_GBREAK_EXTEND 4 #define RE_GBREAK_REGIONALINDICATOR 5 #define RE_GBREAK_SPACINGMARK 6 #define RE_GBREAK_L 7 #define RE_GBREAK_V 8 #define RE_GBREAK_T 9 #define RE_GBREAK_LV 10 #define RE_GBREAK_LVT 11 #define RE_GBREAK_PREPEND 12 extern char* re_strings[1296]; extern RE_Property re_properties[147]; extern RE_PropertyValue re_property_values[1412]; extern RE_UINT16 re_expand_on_folding[104]; extern RE_GetPropertyFunc re_get_property[81]; RE_UINT32 re_get_general_category(RE_UINT32 ch); RE_UINT32 re_get_block(RE_UINT32 ch); RE_UINT32 re_get_script(RE_UINT32 ch); RE_UINT32 re_get_word_break(RE_UINT32 ch); RE_UINT32 re_get_grapheme_cluster_break(RE_UINT32 ch); RE_UINT32 re_get_sentence_break(RE_UINT32 ch); RE_UINT32 re_get_math(RE_UINT32 ch); RE_UINT32 re_get_alphabetic(RE_UINT32 ch); RE_UINT32 re_get_lowercase(RE_UINT32 ch); RE_UINT32 re_get_uppercase(RE_UINT32 ch); RE_UINT32 re_get_cased(RE_UINT32 ch); RE_UINT32 re_get_case_ignorable(RE_UINT32 ch); RE_UINT32 
re_get_changes_when_lowercased(RE_UINT32 ch); RE_UINT32 re_get_changes_when_uppercased(RE_UINT32 ch); RE_UINT32 re_get_changes_when_titlecased(RE_UINT32 ch); RE_UINT32 re_get_changes_when_casefolded(RE_UINT32 ch); RE_UINT32 re_get_changes_when_casemapped(RE_UINT32 ch); RE_UINT32 re_get_id_start(RE_UINT32 ch); RE_UINT32 re_get_id_continue(RE_UINT32 ch); RE_UINT32 re_get_xid_start(RE_UINT32 ch); RE_UINT32 re_get_xid_continue(RE_UINT32 ch); RE_UINT32 re_get_default_ignorable_code_point(RE_UINT32 ch); RE_UINT32 re_get_grapheme_extend(RE_UINT32 ch); RE_UINT32 re_get_grapheme_base(RE_UINT32 ch); RE_UINT32 re_get_grapheme_link(RE_UINT32 ch); RE_UINT32 re_get_white_space(RE_UINT32 ch); RE_UINT32 re_get_bidi_control(RE_UINT32 ch); RE_UINT32 re_get_join_control(RE_UINT32 ch); RE_UINT32 re_get_dash(RE_UINT32 ch); RE_UINT32 re_get_hyphen(RE_UINT32 ch); RE_UINT32 re_get_quotation_mark(RE_UINT32 ch); RE_UINT32 re_get_terminal_punctuation(RE_UINT32 ch); RE_UINT32 re_get_other_math(RE_UINT32 ch); RE_UINT32 re_get_hex_digit(RE_UINT32 ch); RE_UINT32 re_get_ascii_hex_digit(RE_UINT32 ch); RE_UINT32 re_get_other_alphabetic(RE_UINT32 ch); RE_UINT32 re_get_ideographic(RE_UINT32 ch); RE_UINT32 re_get_diacritic(RE_UINT32 ch); RE_UINT32 re_get_extender(RE_UINT32 ch); RE_UINT32 re_get_other_lowercase(RE_UINT32 ch); RE_UINT32 re_get_other_uppercase(RE_UINT32 ch); RE_UINT32 re_get_noncharacter_code_point(RE_UINT32 ch); RE_UINT32 re_get_other_grapheme_extend(RE_UINT32 ch); RE_UINT32 re_get_ids_binary_operator(RE_UINT32 ch); RE_UINT32 re_get_ids_trinary_operator(RE_UINT32 ch); RE_UINT32 re_get_radical(RE_UINT32 ch); RE_UINT32 re_get_unified_ideograph(RE_UINT32 ch); RE_UINT32 re_get_other_default_ignorable_code_point(RE_UINT32 ch); RE_UINT32 re_get_deprecated(RE_UINT32 ch); RE_UINT32 re_get_soft_dotted(RE_UINT32 ch); RE_UINT32 re_get_logical_order_exception(RE_UINT32 ch); RE_UINT32 re_get_other_id_start(RE_UINT32 ch); RE_UINT32 re_get_other_id_continue(RE_UINT32 ch); RE_UINT32 
re_get_sterm(RE_UINT32 ch); RE_UINT32 re_get_variation_selector(RE_UINT32 ch); RE_UINT32 re_get_pattern_white_space(RE_UINT32 ch); RE_UINT32 re_get_pattern_syntax(RE_UINT32 ch); RE_UINT32 re_get_hangul_syllable_type(RE_UINT32 ch); RE_UINT32 re_get_bidi_class(RE_UINT32 ch); RE_UINT32 re_get_canonical_combining_class(RE_UINT32 ch); RE_UINT32 re_get_decomposition_type(RE_UINT32 ch); RE_UINT32 re_get_east_asian_width(RE_UINT32 ch); RE_UINT32 re_get_joining_group(RE_UINT32 ch); RE_UINT32 re_get_joining_type(RE_UINT32 ch); RE_UINT32 re_get_line_break(RE_UINT32 ch); RE_UINT32 re_get_numeric_type(RE_UINT32 ch); RE_UINT32 re_get_numeric_value(RE_UINT32 ch); RE_UINT32 re_get_bidi_mirrored(RE_UINT32 ch); RE_UINT32 re_get_indic_positional_category(RE_UINT32 ch); RE_UINT32 re_get_indic_syllabic_category(RE_UINT32 ch); RE_UINT32 re_get_alphanumeric(RE_UINT32 ch); RE_UINT32 re_get_any(RE_UINT32 ch); RE_UINT32 re_get_blank(RE_UINT32 ch); RE_UINT32 re_get_graph(RE_UINT32 ch); RE_UINT32 re_get_print(RE_UINT32 ch); RE_UINT32 re_get_word(RE_UINT32 ch); RE_UINT32 re_get_xdigit(RE_UINT32 ch); RE_UINT32 re_get_posix_digit(RE_UINT32 ch); RE_UINT32 re_get_posix_alnum(RE_UINT32 ch); RE_UINT32 re_get_posix_punct(RE_UINT32 ch); RE_UINT32 re_get_posix_xdigit(RE_UINT32 ch); int re_get_all_cases(RE_UINT32 ch, RE_UINT32* codepoints); RE_UINT32 re_get_simple_case_folding(RE_UINT32 ch); int re_get_full_case_folding(RE_UINT32 ch, RE_UINT32* codepoints); regex-2016.01.10/tools/0000777000000000000000000000000012644552200012506 5ustar 00000000000000regex-2016.01.10/tools/build_regex_unicode.py0000666000000000000000000020061311721570164017065 0ustar 00000000000000# -*- coding: utf-8 -*- # This script builds the Unicode tables used by the regex module. # # It downloads the data from the Unicode website, saving it locally, and then # calculates the minimum size of the tables. # # Finally, it creates 2 code files, namely "_regex_unicode.h" and # "_regex_unicode.c". 
# # Various parameters are stored in a local "shelve" file in order to reduce # the amount of recalculation. # # This script is written in Python 3. import os import shelve import sys import shutil from collections import defaultdict from contextlib import closing from urllib.parse import urljoin, urlparse from urllib.request import urlretrieve this_folder = os.path.dirname(__file__) # The location of the Unicode data folder. unicode_folder = os.path.join(this_folder, "Unicode") # The location of the C sources for the regex engine. c_folder = os.path.join(this_folder, "regex") # The paths of the source files to be generated. h_path = os.path.join(c_folder, "_regex_unicode.h") c_path = os.path.join(c_folder, "_regex_unicode.c") properties_path = os.path.join(this_folder, "UnicodeProperties.txt") # The paths of the C source files. c_header_path = os.path.join(c_folder, "_regex_unicode.h") c_source_path = os.path.join(c_folder, "_regex_unicode.c") # The path of the shelve file. shelf_path = os.path.splitext(__file__)[0] + ".shf" # The number of columns in each table. COLUMNS = 16 # The maximum number of codepoints. NUM_CODEPOINTS = 0x110000 # The maximum depth of the multi-stage tables. MAX_STAGES = 5 # Whether to force an update of the Unicode data. # # Data is downloaded if needed, but if the Unicode data has been updated on # the website then you need to force an update. FORCE_UPDATE = False # Whether to force recalculation of the smallest table size. FORCE_RECALC = False # Whether to count the number of codepoints as a check. COUNT_CODEPOINTS = False # If we update then we must recalculate. if FORCE_UPDATE: FORCE_RECALC = True # Ensure that the Unicode data folder exists. try: os.mkdir(unicode_folder) except OSError: pass # If the maximum number of stages has changed, then force recalculation. 
with closing(shelve.open(shelf_path, writeback=True)) as shelf: if shelf.get("MAXSTAGES") != MAX_STAGES: shelf["MAXSTAGES"] = MAX_STAGES FORCE_RECALC = True if FORCE_RECALC: try: del shelf["CASEFOLDING"] except KeyError: pass # Redefine "print" so that it flushes. real_print = print def print(*args, **kwargs): real_print(*args, **kwargs) sys.stdout.flush() class UnicodeDataError(Exception): pass def determine_data_type(min_value, max_value): "Determines the smallest C data type which can store values in a range." # 1 byte, unsigned and signed. if 0 <= min_value <= max_value <= 0xFF: return "RE_UINT8", 1 if -0x80 <= min_value <= max_value <= 0x7F: return "RE_INT8", 1 # 2 bytes, unsigned and signed. if 0 <= min_value <= max_value <= 0xFFFF: return "RE_UINT16", 2 if -0x8000 <= min_value <= max_value <= 0x7FFF: return "RE_INT16", 2 # 4 bytes, unsigned and signed. if 0 <= min_value <= max_value <= 0xFFFFFFFF: return "RE_UINT32", 4 if -0x80000000 <= min_value <= max_value <= 0x7FFFFFFF: return "RE_INT32", 4 raise ValueError("value range too big for 32 bits") def smallest_data_type(min_value, max_value): """Determines the smallest integer data type required to store all of the values in a range. """ return determine_data_type(min_value, max_value)[0] def smallest_bytesize(min_value, max_value): """Determines the minimum number of bytes required to store all of the values in a range. """ return determine_data_type(min_value, max_value)[1] def product(numbers): """Calculates the product of a series of numbers.""" if not numbers: raise ValueError("product of empty sequence") result = 1 for n in numbers: result *= n return result def mul_to_shift(number): "Converts a multiplier into a shift." shift = number.bit_length() - 1 if shift < 0 or (1 << shift) != number: raise ValueError("can't convert multiplier into shift") return shift class MultistageTable: "A multi-stage table."
def __init__(self, block_sizes, stages, binary): self.block_sizes = block_sizes self.stages = stages self.binary = binary self.num_stages = len(self.block_sizes) + 1 # How many bytes of storage are needed for this table? self.bytesize = 0 for stage in self.stages[ : -1]: self.bytesize += (smallest_bytesize(min(stage), max(stage)) * len(stage)) if binary: self.bytesize += len(self.stages[-1]) // 8 else: self.bytesize += smallest_bytesize(min(self.stages[-1]), max(self.stages[-1])) * len(self.stages[-1]) # Calculate the block-size products for lookup. self._size_products = [] for stage in range(self.num_stages - 1): self._size_products.append(product(self.block_sizes[stage : ])) class PropertyValue: "A property value." def __init__(self, name, id): self.name = name self.id = id self.aliases = set() def use_pref_name(self): """Uses better names for the properties and values if the current one is poor. """ self.name, self.aliases = pick_pref_name(self.name, self.aliases) class Property: "A Unicode property." def __init__(self, name, entries, value_dict): self.name = name self.entries = entries self._value_list = [] self._value_dict = {} for name, value in sorted(value_dict.items(), key=lambda pair: pair[1]): val = PropertyValue(name, value) self._value_list.append(val) self._value_dict[name.upper()] = val self.binary = len(self._value_dict.values()) == 2 self.aliases = set() def add(self, val): "Adds a value." # Make it case-insensitive. upper_name = val.name.upper() if upper_name in self._value_dict: raise KeyError("duplicate value name: {}".format(val.name)) self._value_list.append(val) self._value_dict[upper_name] = val def use_pref_name(self): """Use a better name for a property or value if the current one is poor. """ self.name, self.aliases = pick_pref_name(self.name, self.aliases) def make_binary_property(self): "Makes this property a binary property." 
if self._value_list: raise UnicodeDataError("property '{}' already has values".format(self.name)) binary_values = [ ("No", 0, {"N", "False", "F"}), ("Yes", 1, {"Y", "True", "T"}) ] for name, v, aliases in binary_values: val = PropertyValue(name, v) val.aliases |= aliases self._value_list.append(val) self._value_dict[name.upper()] = val self.binary = True def generate_code(self, h_file, c_file, info): "Generates the code for a property." # Build the tables. self._build_tables() print("Generating code for {}".format(self.name)) table = self.table # Write the property tables. c_file.write(""" /* {name}. */ """.format(name=self.name)) self.generate_tables(c_file) # Write the lookup function. prototype = "RE_UINT32 re_get_{name}(RE_UINT32 ch)".format(name=self.name.lower()) h_file.write("{prototype};\n".format(prototype=prototype)) c_file.write(""" {prototype} {{ """.format(prototype=prototype)) self._generate_locals(c_file) c_file.write("\n") self._generate_lookup(c_file) c_file.write(""" return value; } """) def generate_tables(self, c_file): table = self.table for stage in range(table.num_stages): # The contents of this table. entries = table.stages[stage] # What data type should we use for the entries? if self.binary and stage == table.num_stages - 1: data_type = "RE_UINT8" entries = self._pack_to_bitflags(entries) else: data_type = smallest_data_type(min(entries), max(entries)) # The entries will be stored in an array. c_file.write(""" static {data_type} re_{name}_stage_{stage}[] = {{ """.format(data_type=data_type, name=self.name.lower(), stage=stage + 1)) # Write the entries, nicely aligned in columns. entries = ["{},".format(e) for e in entries] entry_width = max(len(e) for e in entries) entries = [e.rjust(entry_width) for e in entries] for start in range(0, len(entries), COLUMNS): c_file.write(" {}\n".format(" ".join(entries[start : start + COLUMNS]))) c_file.write("};\n") # Write how much storage will be used by all of the tables. 
c_file.write(""" /* {name}: {bytesize} bytes. */ """.format(name=self.name, bytesize=table.bytesize)) def _pack_to_bitflags(self, entries): entries = tuple(entries) new_entries = [] for start in range(0, len(entries), 8): new_entries.append(bitflag_dict[entries[start : start + 8]]) return new_entries def _generate_locals(self, c_file): c_file.write("""\ RE_UINT32 code; RE_UINT32 f; RE_UINT32 pos; RE_UINT32 value; """) def _generate_lookup(self, c_file): table = self.table name = self.name.lower() # Convert the block sizes into shift values. shifts = [mul_to_shift(size) for size in table.block_sizes] c_file.write("""\ f = ch >> {field_shift}; code = ch ^ (f << {field_shift}); pos = (RE_UINT32)re_{name}_stage_1[f] << {block_shift}; """.format(field_shift=sum(shifts), name=name, block_shift=shifts[0])) for stage in range(1, table.num_stages - 1): c_file.write("""\ f = code >> {field_shift}; code ^= f << {field_shift}; pos = (RE_UINT32)re_{name}_stage_{stage}[pos + f] << {block_shift}; """.format(field_shift=sum(shifts[stage : ]), name=name, stage=stage + 1, block_shift=shifts[stage])) # If it's a binary property, we're using bitflags. if self.binary: c_file.write("""\ pos += code; value = (re_{name}_stage_{stage}[pos >> 3] >> (pos & 0x7)) & 0x1; """.format(name=self.name.lower(), stage=table.num_stages)) else: c_file.write("""\ value = re_{name}_stage_{stage}[pos + code]; """.format(name=self.name.lower(), stage=table.num_stages)) def get(self, name, default=None): try: return self.__getitem__(name) except KeyError: return default def __len__(self): return len(self._value_list) def __getitem__(self, name): # Make it case-insensitive. upper_name = name.upper() val = self._value_dict.get(upper_name) if not val: # Can't find a value with that name, so collect the aliases and try # again. 
for val in self._value_list: for alias in {val.name} | val.aliases: self._value_dict[alias.upper()] = val val = self._value_dict.get(upper_name) if not val: raise KeyError(name) return val def __iter__(self): for val in self._value_list: yield val def _build_tables(self): "Builds the multi-stage tables." stored_name = reduce_name(self.name) # Do we already know the best block sizes? shelf = shelve.open(shelf_path, writeback=True) if FORCE_RECALC: # Force calculation of the block sizes and build the tables. table = self._build_smallest_table() else: try: # What are the best block sizes? block_sizes = shelf[stored_name]["block_sizes"] # Build the tables. table = self._build_multistage_table(block_sizes) except KeyError: # Something isn't known, so calculate the best block sizes and # build the tables. table = self._build_smallest_table() # Save the info. shelf[stored_name] = {} shelf[stored_name]["block_sizes"] = table.block_sizes shelf.close() self.table = table def _build_smallest_table(self): """Calculates the block sizes to give the smallest storage requirement and builds the multi-stage table. """ print("Determining smallest storage for {}".format(self.name)) # Initialise with a large value. best_block_sizes, smallest_bytesize = None, len(self.entries) * 4 # Try different numbers and sizes of blocks. for block_sizes, bytesize in self._table_sizes(self.entries, 1, self.binary): print("Block sizes are {}, bytesize is {}".format(block_sizes, bytesize)) if bytesize < smallest_bytesize: best_block_sizes, smallest_bytesize = block_sizes, bytesize print("Smallest for {} has block sizes {} and bytesize {}".format(self.name, best_block_sizes, smallest_bytesize)) return self._build_multistage_table(best_block_sizes) def _table_sizes(self, entries, num_stages, binary): """Yields different numbers and sizes of blocks, up to MAX_STAGES. 
All the sizes are powers of 2 and for a binary property the final block size is at least 8 because the final stage of the table will be using bitflags. """ # What if this is the top stage? if binary: bytesize = len(entries) // 8 else: bytesize = (smallest_bytesize(min(entries), max(entries)) * len(entries)) yield [], bytesize if num_stages >= MAX_STAGES: return entries = tuple(entries) # Initialise the block size and double it on each iteration. Usually an # index entry is 1 byte, so a data block should be at least 2 bytes. size = 16 if binary else 2 # There should be at least 2 blocks. while size * 2 <= len(entries) and len(entries) % size == 0: # Group the entries into blocks. indexes = [] block_dict = {} for start in range(0, len(entries), size): block = entries[start : start + size] indexes.append(block_dict.setdefault(block, len(block_dict))) # Collect all the blocks. blocks = [] for block in sorted(block_dict, key=lambda block: block_dict[block]): blocks.extend(block) # How much storage will the blocks stage need? if binary: block_bytesize = len(blocks) // 8 else: block_bytesize = (smallest_bytesize(min(blocks), max(blocks)) * len(blocks)) # Yield the higher stages for the indexes. for block_sizes, total_bytesize in self._table_sizes(indexes, num_stages + 1, False): yield block_sizes + [size], total_bytesize + block_bytesize # Next size up. size *= 2 def _build_multistage_table(self, block_sizes): "Builds a multi-stage table." if product(block_sizes) > len(self.entries): raise UnicodeDataError("product of block sizes greater than number of entries") # Build the stages from the bottom one up. entries = self.entries stages = [] for block_size in reversed(block_sizes): entries = tuple(entries) # Group the entries into blocks. block_dict = {} indexes = [] for start in range(0, len(entries), block_size): block = entries[start : start + block_size] indexes.append(block_dict.setdefault(block, len(block_dict))) # Collect all the blocks. 
blocks = [] for block in sorted(block_dict, key=lambda block: block_dict[block]): blocks.extend(block) # We have a new stage. stages.append(blocks) # Prepare for the next higher stage. entries = indexes # We have the top stage. stages.append(entries) # Put the stages into the correct order (top-down). stages.reverse() return MultistageTable(block_sizes, stages, self.binary) class AllCasesProperty(Property): "All Unicode cases." def __init__(self, name, entries, value_dict): self.name = name self.entries = entries self._value_list = [] self._value_dict = {} for name, value in sorted(value_dict.items(), key=lambda pair: pair[1]): val = PropertyValue(name, value) self._value_list.append(val) self._value_dict[name] = val self.binary = False self.aliases = set() # What data type should we use for the cases entries? rows = [list(val.name) for val in self._value_list] data = [e for r in rows for e in r] self.case_data_type = smallest_data_type(min(data), max(data)) def generate_code(self, h_file, c_file, info): "Generates the code for a property." print("Generating code for {}".format(self.name)) # Build the tables. self._build_tables() # Write the all-cases tables. c_file.write(""" /* {name}. */ """.format(name=self.name)) self.generate_tables(c_file) # What data type should we use for the cases entries? rows = [list(val.name) for val in self._value_list] data = [e for r in rows for e in r[1 : ]] data_type, data_size = determine_data_type(min(data), max(data)) self.case_data_type = data_type # Calculate the size of the struct. entry_size = data_size * (info["max_cases"] - 1) # Pad the cases entries to the same length. max_len = max(len(r) for r in rows) padding = [0] * (max_len - 1) rows = [(r + padding)[ : max_len] for r in rows] # Write the entries, nicely aligned in columns. 
rows = [[str(e) for e in r] for r in rows] entry_widths = [max(len(e) for e in c) for c in zip(*rows)] rows = [[e.rjust(w) for e, w in zip(r, entry_widths)] for r in rows] c_file.write(""" static RE_AllCases re_all_cases_table[] = { """) for r in rows: c_file.write(" {{{}}},\n".format(", ".join(r))) c_file.write("};\n") # Write how much storage will be used by the table. c_file.write(""" /* {name}: {bytesize} bytes. */ """.format(name=self.name, bytesize=entry_size * len(rows))) # Write the lookup function. prototype = "int re_get_{name}(RE_UINT32 ch, RE_UINT32* codepoints)".format(name=self.name.lower()) h_file.write("{prototype};\n".format(prototype=prototype)) c_file.write(""" {prototype} {{ """.format(name=self.name, prototype=prototype)) self._generate_locals(c_file) c_file.write("""\ RE_AllCases* all_cases; int count; """) self._generate_lookup(c_file) c_file.write(""" all_cases = &re_all_cases_table[value]; codepoints[0] = ch; count = 1; while (count < RE_MAX_CASES && all_cases->diffs[count - 1] != 0) { codepoints[count] = ch + all_cases->diffs[count - 1]; ++count; } return count; } """) class SimpleCaseFoldingProperty(Property): "Unicode simple case-folding." def __init__(self, name, entries, value_dict): self.name = name self.entries = entries self._value_list = [] self._value_dict = {} for name, value in sorted(value_dict.items(), key=lambda pair: pair[1]): val = PropertyValue(name, value) self._value_list.append(val) self._value_dict[name] = val self.binary = False self.aliases = set() def generate_code(self, h_file, c_file, info): "Generates the code for a property." print("Generating code for {}".format(self.name)) # Build the tables. self._build_tables() # Write the case-folding tables. c_file.write(""" /* {name}. */ """.format(name=self.name)) self.generate_tables(c_file) # What data type should we use for the case-folding entries? rows = [val.name for val in self._value_list] # Calculate the size of an entry, including alignment. 
entry_size = 4 # Write the entries, nicely aligned in columns. rows = [str(r) for r in rows] entry_width = max(len(r) for r in rows) rows = [r.rjust(entry_width) for r in rows] c_file.write(""" static RE_INT32 re_simple_case_folding_table[] = { """) for r in rows: c_file.write(" {},\n".format(r)) c_file.write("};\n") # Write how much storage will be used by the table. c_file.write(""" /* {name}: {bytesize} bytes. */ """.format(name=self.name, bytesize=entry_size * len(rows))) # Write the lookup function. prototype = "RE_UINT32 re_get_{name}(RE_UINT32 ch)".format(name=self.name.lower()) h_file.write("{prototype};\n".format(prototype=prototype)) c_file.write(""" {prototype} {{ """.format(name=self.name, prototype=prototype)) self._generate_locals(c_file) c_file.write("""\ RE_INT32 diff; """) self._generate_lookup(c_file) c_file.write(""" diff = re_simple_case_folding_table[value]; return ch + diff; } """) class FullCaseFoldingProperty(Property): "Unicode full case-folding." def __init__(self, name, entries, value_dict): self.name = name self.entries = entries self._value_list = [] self._value_dict = {} for name, value in sorted(value_dict.items(), key=lambda pair: pair[1]): val = PropertyValue(name, value) self._value_list.append(val) self._value_dict[name] = val self.binary = False self.aliases = set() def generate_code(self, h_file, c_file, info): "Generates the code for a property." print("Generating code for {}".format(self.name)) # Build the tables. self._build_tables() # Write the case-folding tables. c_file.write(""" /* {name}. */ """.format(name=self.name)) self.generate_tables(c_file) # What data type should we use for the case-folding entries? rows = [list(val.name) for val in self._value_list] # The diff entry needs to be signed 32-bit, the others should be OK # with unsigned 16-bit. data = [e for r in rows for e in r[1 : ]] # Verify that unsigned 16-bit is OK. 
data_type = smallest_data_type(min(data), max(data)) if data_type != "RE_UINT16": raise UnicodeDataError("full case-folding table entry too big") # Calculate the size of an entry, including alignment. entry_size = 4 + 2 * (info["max_folded"] - 1) excess = entry_size % 4 if excess > 0: entry_size += 4 - excess # Pad the case-folding entries to the same length and append the count. max_len = max(len(r) for r in rows) padding = [0] * (max_len - 1) rows = [(r + padding)[ : max_len] for r in rows] # Write the entries, nicely aligned in columns. rows = [[str(e) for e in r] for r in rows] entry_widths = [max(len(e) for e in c) for c in zip(*rows)] rows = [[e.rjust(w) for e, w in zip(r, entry_widths)] for r in rows] c_file.write(""" static RE_FullCaseFolding re_full_case_folding_table[] = { """) for r in rows: c_file.write(" {{{}}},\n".format(", ".join(r))) c_file.write("};\n") # Write how much storage will be used by the table. c_file.write(""" /* {name}: {bytesize} bytes. */ """.format(name=self.name, bytesize=entry_size * len(rows))) # Write the lookup function. prototype = "int re_get_{name}(RE_UINT32 ch, RE_UINT32* codepoints)".format(name=self.name.lower()) h_file.write("{prototype};\n".format(prototype=prototype)) c_file.write(""" {prototype} {{ """.format(name=self.name, prototype=prototype)) self._generate_locals(c_file) c_file.write("""\ RE_FullCaseFolding* case_folding; int count; """) self._generate_lookup(c_file) c_file.write(""" case_folding = &re_full_case_folding_table[value]; codepoints[0] = ch + case_folding->diff; count = 1; while (count < RE_MAX_FOLDED && case_folding->codepoints[count - 1] != 0) { codepoints[count] = case_folding->codepoints[count - 1]; ++count; } return count; } """) class CompoundProperty(Property): "A compound Unicode property." def __init__(self, name, function): Property.__init__(self, name, [], {}) self.function = function def generate_code(self, h_file, c_file, info): "Generates the code for a property." 
print("Generating code for {}".format(self.name)) # Write the lookup function. prototype = "RE_UINT32 re_get_{name}(RE_UINT32 ch)".format(name=self.name.lower()) h_file.write("{prototype};\n".format(prototype=prototype)) c_file.write(""" /* {name}. */ {prototype} {{ {function}}} """.format(name=self.name, prototype=prototype, function=self.function)) class PropertySet: "An ordered set of Unicode properties." def __init__(self): self._property_list = [] self._property_dict = {} def add(self, prop): "Adds a property." # Make it case-insensitive. upper_name = prop.name.upper() if upper_name in self._property_dict: raise KeyError("duplicate property name: {}".format(prop.name)) prop.id = len(self._property_list) self._property_list.append(prop) self._property_dict[upper_name] = prop def use_pref_name(self): """Use a better name for a property or value if the current one is poor. """ for prop in self._property_list: prop.use_pref_name() def get(self, name, default=None): try: return self.__getitem__(name) except KeyError: return default def __len__(self): return len(self._property_list) def __getitem__(self, name): # Make it case-insensitive. upper_name = name.upper() prop = self._property_dict.get(upper_name) if not prop: # Can't find a property with that name, so collect the aliases and # try again. for prop in self._property_list: for alias in {prop.name} | prop.aliases: self._property_dict[alias.upper()] = prop prop = self._property_dict.get(upper_name) if not prop: raise KeyError(name) return prop def __iter__(self): for prop in self._property_list: yield prop def download_unicode_file(url, unicode_folder): "Downloads a Unicode file." name = urlparse(url).path.rsplit("/")[-1] path = os.path.join(unicode_folder, name) # Do we need to download it? 
    if os.path.isfile(path) and not FORCE_UPDATE:
        return

    print("Downloading {} to {}".format(url, path))
    new_path = os.path.splitext(path)[0] + ".new"
    try:
        urlretrieve(url, new_path)
    except Exception:
        # Failed to download (bad URL, network error or truncated content),
        # so clean up any partial file and report it.
        try:
            os.remove(new_path)
        except OSError:
            pass

        raise

    # Replace the old copy, if any. Unlike os.remove() followed by
    # os.rename(), this also works when the file is downloaded for the first
    # time and there's no old copy to remove.
    os.replace(new_path, path)

    # Is this a new version of the file?
    with open(path, encoding="utf-8") as file:
        # Normally the first line of the file contains its versioned name.
        line = file.readline()
        if line.startswith("#") and line.endswith(".txt\n"):
            versioned_name = line.strip("# \n")
            versioned_path = os.path.join(unicode_folder, versioned_name)
            if not os.path.isfile(versioned_path):
                # We don't have this version, so copy it.
                shutil.copy2(path, versioned_path)
                print("Updated to {}".format(versioned_name))

def reduce_name(name):
    "Reduces a name to uppercase without punctuation, unless it's numeric."
    r = reduced_names.get(name)
    if r is None:
        if all(part.isdigit() for part in name.lstrip("-").split("/", 1)):
            # A number, possibly negative or a fraction, is kept as it is.
            r = name
        else:
            r = name.translate(reduce_trans).upper()

        reduced_names[name] = r

    return r

def std_name(name):
    "Standardises the form of a name to that of its first occurrence."
    r = reduce_name(name)
    s = standardised_names.get(r)
    if s is None:
        s = name.replace(" ", "_")
        standardised_names[r] = s

    return s

def parse_property_aliases(unicode_folder, filename):
    "Parses the PropertyAliases data."
    print("Parsing '{}'".format(filename))
    path = os.path.join(unicode_folder, filename)
    property_aliases = {}
    with open(path, encoding="utf-8") as file:
        for line in file:
            # Discard any comment.
            line = line.partition("#")[0].strip()
            if line:
                # Format is: abbrev., pref., other...
                fields = [std_name(f.strip()) for f in line.split(";")]
                pref_name = fields.pop(1)
                aliases = set(fields)
                for name in {pref_name} | aliases:
                    property_aliases[name] = (pref_name, aliases)

    return property_aliases

def parse_value_aliases(unicode_folder, filename):
    "Parses the PropertyValueAliases data."
print("Parsing '{}'".format(filename)) path = os.path.join(unicode_folder, filename) value_aliases = defaultdict(dict) for line in open(path): line = line.partition("#")[0].strip() if line: # Format is: property, abbrev., pref., other... # except for "ccc": property, numeric, abbrev., pref., other... fields = [std_name(f.strip()) for f in line.split(";")] prop_name = fields.pop(0) if prop_name == "ccc": pref_name = fields.pop(2) else: pref_name = fields.pop(1) aliases = set(fields) # Sometimes there's no abbreviated name, which is indicated by # "n/a". aliases.discard("n/a") prop = value_aliases[prop_name] for name in {pref_name} | aliases: prop[name] = (pref_name, aliases) return value_aliases def check_codepoint_count(entries, codepoint_counts): "Checks that the number of codepoints is correct." counts = defaultdict(int) for e in entries: counts[e] += 1 for name, value, expected in codepoint_counts: if counts[value] != expected: raise UnicodeDataError("codepoint count mismatch: expected {} with '{}' but saw {} [value is {}]".format(expected, name, counts[value], value)) def parse_data_file(filename, properties, numeric_values=False): "Parses a multi-value file." print("Parsing '{}'".format(filename)) path = os.path.join(unicode_folder, filename) # Initialise with the default value. entries = [0] * NUM_CODEPOINTS value_dict = {} aliases = {} prop_name = prop_alias = None default = default_alias = None val_alias = None listed_values = False value_field = 1 codepoint_counts = [] if numeric_values: prop_name = std_name("Numeric_Value") value_field = 3 # Parse the data file. # # There is a certain amount of variation in the file format, which is why # it takes so many lines of code to parse it. for line in open(path): if line.startswith("#"): if line.startswith("# Property:"): # The name of a property. 
                prop_name = std_name(line.rsplit(None, 1)[-1])
                prop_alias = None
                print(" Property '{}'".format(prop_name))
                listed_values = True
            elif line.startswith("# Derived Property:"):
                # It's a new property.
                if prop_name:
                    # Should we check the number of codepoints?
                    if COUNT_CODEPOINTS:
                        check_codepoint_count(entries, codepoint_counts)
                        codepoint_counts = []

                    # Save the current property.
                    if any(entries):
                        prop = Property(prop_name, entries, value_dict)
                        if prop_alias:
                            prop.aliases.add(prop_alias)

                        properties.add(prop)

                    # Reset for the new property.
                    entries = [0] * NUM_CODEPOINTS

                words = line.split()
                if words[-1].endswith(")"):
                    # It ends with something in parentheses, possibly more
                    # than one word.
                    while not words[-1].startswith("("):
                        words.pop()

                    prop_name, prop_alias = words[-2], words[-1].strip("()")
                    if prop_alias.lower() in {prop_name.lower(), "deprecated"}:
                        prop_alias = None
                else:
                    prop_name, prop_alias = words[-1], None

                prop_name = std_name(prop_name)
                if prop_alias:
                    prop_alias = std_name(prop_alias)

                if prop_alias:
                    print(" Property '{}' alias '{}'".format(prop_name, prop_alias))
                else:
                    print(" Property '{}'".format(prop_name))
            elif line.startswith("# All code points not explicitly listed for "):
                # The name of a property.
                new = std_name(line.rsplit(None, 1)[1])
                if prop_name:
                    if new != prop_name:
                        raise UnicodeDataError("property mismatch: saw '{}' and then '{}'".format(prop_name, new))
                else:
                    prop_name = new
                    prop_alias = None
                    print(" Property '{}'".format(prop_name))

                listed_values = True
            elif line.startswith("# have the value "):
                # The name of the default value.
                words = line.rsplit(None, 2)
                default, default_alias = words[-1].rstrip("."), None
                if default[ : 1] + default[-1 : ] == "()":
                    # The last word looks like an alias in parentheses.
default, default_alias = words[-2], default[1 : -1] if default_alias in {default, "deprecated"}: default_alias = None default = std_name(default) if default_alias: default_alias = std_name(default_alias) value_dict.setdefault(default, 0) if default_alias: print(" Default '{}' alias '{}'".format(default, default_alias)) else: print(" Default '{}'".format(default)) listed_values = True elif line.startswith("# @missing:"): # The name of the default value. new = std_name(line.rsplit(None, 1)[-1]) if default: if new != default: raise UnicodeDataError("default mismatch: saw '{}' and then '{}'".format(default, new)) else: default = new value_dict.setdefault(default, 0) print(" Default '{}' => 0".format(default)) listed_values = True elif line.startswith("# Total code points:"): # The number of codepoints with this value or property. expected = int(line.rsplit(None, 1)[1]) if not listed_values: value = 1 codepoint_counts.append((v, value, expected)) elif prop_name and line.startswith("# {}=".format(prop_name)): # The alias of the value. val_alias = std_name(line.rsplit("=")[-1].strip()) print(" Value '{}'".format(val_alias)) elif ";" in line: # Discard any comment and then split into fields. fields = line.split("#", 1)[0].split(";") code_range = [int(f, 16) for f in fields[0].split("..")] v = std_name(fields[value_field].strip()) if listed_values: # The values of a property. if v in {default, default_alias}: value = 0 else: if not default: if val_alias: default = val_alias print(" Default '{}'".format(default)) else: raise UnicodeDataError("unknown default") value = value_dict.get(v) if value is None: value = value_dict.setdefault(v, len(value_dict)) if val_alias and val_alias != v: aliases[val_alias] = v print(" Value '{}' alias '{}' => {}".format(val_alias, v, value)) val_alias = None else: print(" Value '{}' => {}".format(v, value)) else: # It's a binary property. if v != prop_name: if prop_name: # Should we check the number of codepoints? 
if COUNT_CODEPOINTS: check_codepoint_count(entries, codepoint_counts) codepoint_counts = [] # Save the current property. prop = Property(prop_name, entries, value_dict) if prop_alias: prop.aliases.add(prop_alias) properties.add(prop) # Reset for the new property. entries = [0] * NUM_CODEPOINTS prop_name = v print(" Property '{}'".format(prop_name)) value = 1 # Store the entries in the range. for code in range(code_range[0], code_range[-1] + 1): entries[code] = value if not prop_name: raise UnicodeDataError("unknown property name") # Should we check the number of codepoints? if COUNT_CODEPOINTS: check_codepoint_count(entries, codepoint_counts) codepoint_counts = [] if "Grapheme" in filename: # In Unicode 6.1, there are no entries in the # "GraphemeBreakProperty.txt" file with the value "Prepend", so we need # to add it here in order not to break the code. value_dict.setdefault(std_name("Prepend"), len(value_dict)) # Save the property. prop = Property(prop_name, entries, value_dict) if prop_alias: prop.aliases.add(prop_alias) if listed_values and default_alias: if default_alias in value_dict: default, default_alias = default_alias, default prop[default].aliases.add(default_alias) for name, alias in aliases.items(): prop[alias].aliases.add(name) properties.add(prop) def parse_NumericValues_file(filename, properties): "Parses the 'NumericValues' file." parse_data_file(filename, properties, numeric_values=True) def parse_CaseFolding(file_name): "Parses the Unicode CaseFolding file." path = os.path.join(unicode_folder, file_name) print("Parsing '{}'".format(file_name)) # Initialise with the default value. 
simple_folding_entries = [0] * NUM_CODEPOINTS simple_folding_value_dict = {0: 0} full_folding_entries = [0] * NUM_CODEPOINTS full_folding_value_dict = {(0, ): 0} equivalent_dict = defaultdict(set) expand_set = set() turkic_set = set() for line in open(path): if not line.startswith("#") and ";" in line: fields = line.split(";") code = int(fields[0], 16) fold_type = fields[1].strip() folded = [int(f, 16) for f in fields[2].split()] if fold_type in "CFS": # Determine the equivalences. equiv_set = set() for c in [(code, ), tuple(folded)]: equiv_set |= equivalent_dict.get(c, {c}) for c in equiv_set: equivalent_dict[c] = equiv_set entry = [folded[0] - code] + folded[1 : ] if fold_type in "CS": value = simple_folding_value_dict.setdefault(entry[0], len(simple_folding_value_dict)) simple_folding_entries[code] = value if fold_type in "CF": value = full_folding_value_dict.setdefault(tuple(entry), len(full_folding_value_dict)) full_folding_entries[code] = value if len(entry) > 1: expand_set.add(code) if fold_type == "T": # Turkic folded cases. turkic_set.add((code, tuple(folded))) # Is the Turkic set what we expected? if turkic_set != {(0x49, (0x131, )), (0x130, (0x69, ))}: raise UnicodeDataError("Turkic set has changed") # Add the Turkic set to the equivalences. Note that: # # dotted_capital == dotted_small # # and: # # dotted_small == dotless_capital # # but: # # dotted_capital != dotless_capital # for code, folded in turkic_set: char1, char2 = (code, ), folded equivalent_dict[char1] = equivalent_dict[char1] | {char2} equivalent_dict[char2] = equivalent_dict[char2] | {char1} # Sort the equivalent cases. other_cases = [] for code, equiv_set in equivalent_dict.items(): if len(code) == 1: diff_list = [] for e in equiv_set - {code}: if len(e) == 1: diff_list.append(e[0] - code[0]) other_cases.append((code[0], sorted(diff_list))) other_cases.sort() # How many other cases can there be? 
max_other_cases = max(len(diff_list) for code, diff_list in other_cases) # Initialise with the default value. default_value = [0] * max_other_cases others_entries = [0] * NUM_CODEPOINTS others_value_dict = {tuple(default_value): 0} for code, diff_list in other_cases: entry = tuple(diff_list + default_value)[ : max_other_cases] value = others_value_dict.setdefault(entry, len(others_value_dict)) others_entries[code] = value # Save the all-cases property. all_prop = AllCasesProperty(std_name("All_Cases"), others_entries, others_value_dict) # Save the simple case-folding property. simple_folding_prop = SimpleCaseFoldingProperty(std_name("Simple_Case_Folding"), simple_folding_entries, simple_folding_value_dict) # Save the full case-folding property. full_folding_prop = FullCaseFoldingProperty(std_name("Full_Case_Folding"), full_folding_entries, full_folding_value_dict) info = dict(all_cases=all_prop, simple_case_folding=simple_folding_prop, full_case_folding=full_folding_prop, expand_set=expand_set) return info def define_Alphanumeric_property(properties): "Defines the Alphanumeric property." prop_name = std_name("Alphanumeric") print("Defining '{}'".format(prop_name)) function = """\ RE_UINT32 v; v = re_get_alphabetic(ch); if (v == 1) return 1; v = re_get_general_category(ch); if (v == RE_PROP_ND) return 1; return 0; """ properties.add(CompoundProperty(prop_name, function)) def define_Any_property(properties): "Defines the Any property." prop_name = std_name("Any") print("Defining '{}'".format(prop_name)) function = """\ return 1; """ properties.add(CompoundProperty(prop_name, function)) def define_Assigned_property(properties): "Defines the Assigned property." prop_name = std_name("Assigned") print("Defining '{}'".format(prop_name)) function = """\ if (re_get_general_category(ch) != RE_PROP_CN) return 1; return 0; """ properties.add(CompoundProperty(prop_name, function)) def define_Blank_property(properties): "Defines the Blank property." 
prop_name = std_name("Blank") print("Defining '{}'".format(prop_name)) function = """\ RE_UINT32 v; if (0x0A <= ch && ch <= 0x0D || ch == 0x85) return 0; v = re_get_white_space(ch); if (v == 0) return 0; v = re_get_general_category(ch); if ((RE_BLANK_MASK & (1 << v)) != 0) return 0; return 1; """ properties.add(CompoundProperty(prop_name, function)) def define_Graph_property(properties): "Defines the Graph property." prop_name = std_name("Graph") print("Defining '{}'".format(prop_name)) function = """\ RE_UINT32 v; v = re_get_white_space(ch); if (v == 1) return 0; v = re_get_general_category(ch); if ((RE_GRAPH_MASK & (1 << v)) != 0) return 0; return 1; """ properties.add(CompoundProperty(prop_name, function)) def define_Print_property(properties): "Defines the Print property." prop_name = std_name("Print") print("Defining '{}'".format(prop_name)) function = """\ RE_UINT32 v; v = re_get_general_category(ch); if (v == RE_PROP_CC) return 0; v = re_get_graph(ch); if (v == 1) return 1; v = re_get_blank(ch); if (v == 1) return 1; return 0; """ properties.add(CompoundProperty(prop_name, function)) def define_Word_property(properties): "Defines the Word property." prop_name = std_name("Word") print("Defining '{}'".format(prop_name)) function = """\ RE_UINT32 v; v = re_get_alphabetic(ch); if (v == 1) return 1; v = re_get_general_category(ch); if ((RE_WORD_MASK & (1 << v)) != 0) return 1; return 0; """ properties.add(CompoundProperty(prop_name, function)) def first_true(iterable): "Returns the first item which is true." for i in iterable: if i: return i return None def pick_pref_name(name, aliases): "Picks a better name if the current one is poor." if name.isupper() or name.isdigit(): aliases = aliases | {name} better_name = max(aliases, key=lambda name: len(name)) name = better_name aliases.remove(better_name) return name, aliases def write_properties_description(properties, properties_path): "Writes a list of the properties which are supported by this module." 
with open(properties_path, "w", encoding="utf-8", newline="\n") as p_file: p_file.write("The following is a list of the {} properties which are supported by this module:\n".format(len(properties))) sorted_properties = sorted(properties, key=lambda prop: prop.name) for prop in sorted_properties: p_file.write("\n") name = prop.name aliases = sorted(prop.aliases) if aliases: p_file.write("{} [{}]\n".format(name, ", ".join(aliases))) else: p_file.write("{}\n".format(name)) sorted_values = sorted(prop, key=lambda val: val.name) for val in sorted_values: name = val.name aliases = sorted(val.aliases) if aliases: p_file.write(" {} [{}]\n".format(name, ", ".join(aliases))) else: p_file.write(" {}\n".format(name)) def tabulate(rows): "Creates a table with right-justified columns." # Convert all the entries to strings. rows = [[str(e) for e in row] for row in rows] # Determine the widths of the columns. widths = [max(len(e) for e in column) for column in zip(*rows)] # Pad all the entries. rows = [[e.rjust(w) for e, w in zip(row, widths)] for row in rows] return rows def parse_unicode_data(): "Parses the Unicode data." # Parse the aliases. property_aliases = parse_property_aliases(unicode_folder, "PropertyAliases.txt") value_aliases = parse_value_aliases(unicode_folder, "PropertyValueAliases.txt") # The set of properties. properties = PropertySet() # The parsers for the various file formats. parsers = {"": parse_data_file, "NumericValues": parse_NumericValues_file} # Parse the property data files. for line in unicode_info.splitlines(): if line and line[0] != "#": url, sep, file_format = line.partition(":") if file_format != "~": filename = url.rpartition("/")[-1] parsers[file_format](filename, properties) # Parse the case-folding data specially. info = parse_CaseFolding("CaseFolding.txt") max_cases = max(len(val.name) for val in info["all_cases"]) + 1 max_folded = max(len(val.name) for val in info["full_case_folding"]) # Define some additional properties. 
define_Alphanumeric_property(properties) define_Any_property(properties) define_Assigned_property(properties) define_Blank_property(properties) define_Graph_property(properties) define_Print_property(properties) define_Word_property(properties) # The additional General_Category properties. gc_prop = properties["General_Category"] gc_short = {} gc_masks = defaultdict(int) for val in gc_prop: short_name = [a.upper() for a in {val.name} | val.aliases if len(a) == 2][0] gc_short[short_name] = val.id gc_masks[short_name[0]] |= 1 << val.id last_id = max(val.id for val in gc_prop) for name in sorted(gc_masks): last_id += 1 val = PropertyValue(name, last_id) val.aliases.add(name + "&") gc_prop.add(val) # Add the value aliases for the binary properties. print("Checking binary properties") for prop in properties: if len(prop) == 0: prop.make_binary_property() # Add the property and value aliases. print("Adding aliases") for prop in properties: try: pref_name, aliases = property_aliases[prop.name] prop_aliases = {prop.name, pref_name} | aliases prop.aliases |= prop_aliases - {prop.name} val_aliases = first_true(value_aliases.get(a) for a in prop_aliases) if val_aliases: for i, val in enumerate(prop): try: pref_name, aliases = val_aliases[val.name] aliases = {val.name, pref_name} | aliases val.aliases |= aliases - {val.name} except KeyError: pass except KeyError: pass # Additional aliases. prop = properties["Alphanumeric"] prop.aliases.add(std_name("AlNum")) prop = properties["Hex_Digit"] prop.aliases.add(std_name("XDigit")) # Ensure that all the properties and values use the preferred name. properties.use_pref_name() info.update(dict(properties=properties, max_cases=max_cases, max_folded=max_folded, gc_short=gc_short, gc_masks=gc_masks)) return info def generate_code(strings): "Generates the C files." h_file = open(h_path, "w", encoding="utf-8", newline="\n") c_file = open(c_path, "w", encoding="utf-8", newline="\n") # Useful definitions. 
    h_file.write("""\
typedef unsigned char RE_UINT8;
typedef signed char RE_INT8;
typedef unsigned short RE_UINT16;
typedef signed short RE_INT16;
typedef unsigned int RE_UINT32;
typedef signed int RE_INT32;

typedef unsigned char BOOL;
enum {{FALSE, TRUE}};

#define RE_ASCII_MAX 0x7F
#define RE_LOCALE_MAX 0xFF
#define RE_UNICODE_MAX 0x10FFFF

#define RE_MAX_CASES {max_cases}
#define RE_MAX_FOLDED {max_folded}

typedef struct RE_Property {{
    RE_UINT16 name;
    RE_UINT8 id;
    RE_UINT8 value_set;
}} RE_Property;

typedef struct RE_PropertyValue {{
    RE_UINT16 name;
    RE_UINT8 value_set;
    RE_UINT8 id;
}} RE_PropertyValue;

typedef RE_UINT32 (*RE_GetPropertyFunc)(RE_UINT32 ch);

""".format(max_cases=info["max_cases"], max_folded=info["max_folded"]))

    for prop in ("GC", "Cased", "Uppercase", "Lowercase"):
        h_file.write("#define RE_PROP_{} 0x{:X}\n".format(prop.upper(), properties[prop].id))

    h_file.write("\n")

    # The size in bytes of an entry in each of the name tables.
    RE_Property_size = 4
    RE_PropertyValue_size = 4

    # Define the one-letter General_Category groups. Their ids follow on from
    # the ids of the two-letter General_Category values.
    last_val_id = max(info["gc_short"].values())
    for val_id, name in enumerate(sorted(info["gc_masks"]), start=last_val_id + 1):
        h_file.write("#define RE_PROP_{} {}\n".format(name, val_id))

    h_file.write("\n")

    # Write the General_Category properties.
    for name, val_id in sorted(info["gc_short"].items(), key=lambda pair: pair[1]):
        h_file.write("#define RE_PROP_{} {}\n".format(name, val_id))

    h_file.write("\n")

    # Define the property masks.
    for name, mask in sorted(info["gc_masks"].items()):
        h_file.write("#define RE_PROP_{}_MASK 0x{:08X}\n".format(name, mask))

    h_file.write("\n")

    # The common abbreviated properties.
common_props = """ AlNum Alpha Any Assigned Blank Cntrl Digit Graph Lower Print Punct Space Upper Word XDigit """.split() for name in common_props: prop = properties.get(name) if prop is not None: h_file.write("#define RE_PROP_{} 0x{:06X}\n".format(name.upper(), prop.id << 16 | 1)) else: prop = properties["GC"] val = prop.get(name) if val is not None: h_file.write("#define RE_PROP_{} 0x{:06X}\n".format(name.upper(), prop.id << 16 | val.id)) else: raise UnicodeDataError("unknown abbreviated property: '{}'".format(name)) prop = properties["Block"] h_file.write("#define RE_PROP_ASCII 0x{:06X}\n".format((prop.id << 16) | prop["ASCII"].id)) h_file.write("\n") # Define the word-break values. for val in properties["Word_Break"]: name = reduce_name(val.name) h_file.write("#define RE_BREAK_{} {}\n".format(name, val.id)) h_file.write("\n") # Define the grapheme cluster-break values. for val in properties["Grapheme_Cluster_Break"]: name = reduce_name(val.name) h_file.write("#define RE_GBREAK_{} {}\n".format(name, val.id)) c_file.write('#include "_regex_unicode.h"\n') # Write the standardised strings. c_file.write(""" #define RE_BLANK_MASK ((1 << RE_PROP_ZL) | (1 << RE_PROP_ZP)) #define RE_GRAPH_MASK ((1 << RE_PROP_CC) | (1 << RE_PROP_CS) | (1 << RE_PROP_CN)) #define RE_WORD_MASK (RE_PROP_M_MASK | (1 << RE_PROP_ND) | (1 << RE_PROP_PC)) typedef struct RE_AllCases {{ {data_type} diffs[RE_MAX_CASES - 1]; }} RE_AllCases; typedef struct RE_FullCaseFolding {{ RE_INT32 diff; RE_UINT16 codepoints[RE_MAX_FOLDED - 1]; }} RE_FullCaseFolding; /* strings. */ char* re_strings[] = {{ """.format(data_type=info["all_cases"].case_data_type)) # Calculate the number and size of the string constants. bytesize = 0 for s in strings: s = reduce_name(s) c_file.write(" \"{}\",\n".format(s)) bytesize += len(s) + 1 h_file.write("\nextern char* re_strings[{}];\n".format(len(strings))) c_file.write("""}}; /* strings: {bytesize} bytes. */ """.format(bytesize=bytesize)) # Write the property name tables. 
    #
    # Properties which are aliases have the same property id, and properties,
    # such as binary properties, which have the same set of values have the
    # same value set id.

    # The rows of the property and value tables.
    property_rows = []
    value_rows = []
    # The value sets.
    value_sets = {}

    # Give an id to each distinct property or value name.
    strings = {s: i for i, s in enumerate(strings)}

    for prop in properties:
        val_set = tuple(val.name for val in prop)
        new_val_set = val_set not in value_sets
        val_set_id = value_sets.setdefault(val_set, len(value_sets))

        # name of property, id of property, id of value set
        property_rows.append((strings[prop.name], prop.id, val_set_id))
        for alias in prop.aliases:
            property_rows.append((strings[alias], prop.id, val_set_id))

        # We don't want to duplicate value sets.
        if new_val_set:
            for val in prop:
                # name of value, id of value set, value
                value_rows.append((strings[val.name], val_set_id, val.id))
                for alias in val.aliases:
                    value_rows.append((strings[alias], val_set_id, val.id))

    # Fix the column widths of the tables.
    property_rows = tabulate(property_rows)
    value_rows = tabulate(value_rows)

    expand_set = info["expand_set"]
    expand_data_type, expand_data_size = determine_data_type(min(expand_set),
      max(expand_set))

    # Write the property tables and the corresponding lookup functions.
    c_file.write("""
/* properties. */

RE_Property re_properties[] = {
""")

    h_file.write("""\
extern RE_Property re_properties[{prop_rows}];
extern RE_PropertyValue re_property_values[{val_rows}];
extern {data_type} re_expand_on_folding[{expand_rows}];
extern RE_GetPropertyFunc re_get_property[{func_count}];
""".format(prop_rows=len(property_rows), val_rows=len(value_rows),
      data_type=expand_data_type, expand_rows=len(expand_set),
      func_count=len(properties)))

    for row in property_rows:
        c_file.write("    {{{}}},\n".format(", ".join(row)))

    c_file.write("""\
}};

/* properties: {bytesize} bytes. */

/* property values. */

RE_PropertyValue re_property_values[] = {{
""".format(bytesize=RE_Property_size * len(property_rows)))

    for row in value_rows:
        c_file.write("    {{{}}},\n".format(", ".join(row)))

    c_file.write("""\
}};

/* property values: {bytesize} bytes. */

/* Codepoints which expand on full case-folding. */

{data_type} re_expand_on_folding[] = {{
""".format(bytesize=RE_PropertyValue_size * len(value_rows),
      data_type=expand_data_type))

    items = ["{},".format(c) for c in sorted(expand_set)]
    width = max(len(i) for i in items)
    items = [i.rjust(width) for i in items]

    columns = 8
    for start in range(0, len(items), columns):
        c_file.write("    {}\n".format(" ".join(items[start : start + columns])))

    c_file.write("""}};

/* expand_on_folding: {bytesize} bytes. */
""".format(bytesize=len(items) * expand_data_size))

    # Build and write the property data tables.
    for prop in properties:
        prop.generate_code(h_file, c_file, info)

    info["all_cases"].generate_code(h_file, c_file, info)
    info["simple_case_folding"].generate_code(h_file, c_file, info)
    info["full_case_folding"].generate_code(h_file, c_file, info)

    # Write the property function array.
    c_file.write("""
/* Property function table. */

RE_GetPropertyFunc re_get_property[] = {
""")

    for prop in properties:
        c_file.write("    re_get_{},\n".format(prop.name.lower()))

    c_file.write("};\n")

    h_file.close()
    c_file.close()

# Build a dict for converting 8-tuples into bytes.
bitflag_dict = {}
for value in range(0x100):
    bits = []
    for pos in range(8):
        bits.append((value >> pos) & 0x1)

    bitflag_dict[tuple(bits)] = value

# Storage and support for reduced names.
#
# A reduced name is a name converted to uppercase and with its punctuation
# removed.
reduced_names = {}
reduce_trans = str.maketrans({" ": "", "_": "", "-": ""})

# The names, converted to a standardised form.
standardised_names = {}

# The Unicode data files.
unicode_data_base = "http://www.unicode.org/Public/UNIDATA/"

unicode_info = """
auxiliary/GraphemeBreakProperty.txt
auxiliary/SentenceBreakProperty.txt
auxiliary/WordBreakProperty.txt
Blocks.txt
CaseFolding.txt:~
DerivedCoreProperties.txt
extracted/DerivedBidiClass.txt
extracted/DerivedBinaryProperties.txt
extracted/DerivedCombiningClass.txt
extracted/DerivedDecompositionType.txt
extracted/DerivedEastAsianWidth.txt
extracted/DerivedGeneralCategory.txt
extracted/DerivedJoiningGroup.txt
extracted/DerivedJoiningType.txt
extracted/DerivedLineBreak.txt
extracted/DerivedNumericType.txt
extracted/DerivedNumericValues.txt:NumericValues
HangulSyllableType.txt
IndicMatraCategory.txt
IndicSyllabicCategory.txt
PropertyAliases.txt:~
PropertyValueAliases.txt:~
PropList.txt
Scripts.txt
#UnicodeData.txt
"""

# Download the Unicode data files.
for line in unicode_info.splitlines():
    if line and line[0] != "#":
        url = line.partition(":")[0]
        download_unicode_file(urljoin(unicode_data_base, url), unicode_folder)

# Parse the Unicode data.
info = parse_unicode_data()

properties = info["properties"]

write_properties_description(properties, properties_path)

if len(properties) > 0x100:
    raise UnicodeDataError("more than 256 properties")

for prop in properties:
    if len(prop) > 0x100:
        raise UnicodeDataError("more than 256 values: property '{}'".format(prop.name))

# Create the list of standardised strings.
strings = set()
for prop in properties:
    strings.add(prop.name)
    strings |= prop.aliases

    for val in prop:
        strings.add(val.name)
        strings |= val.aliases

strings = sorted(set(strings), key=reduce_name)

# Generate the code.
generate_code(strings)

print("\nThere are {} properties".format(len(properties)))

# Summarise the storage used by each table, as recorded in the
# "name: N bytes" comments of the generated C file.
import re

with open(c_path) as c_source:
    code = c_source.read()

sizes = defaultdict(int)
for n, s in re.findall(r"(\w+(?: \w+)*): (\d+) bytes", code):
    sizes[n] += int(s)

# Largest tables first.
sizes = sorted(sizes.items(), key=lambda pair: pair[1], reverse=True)

total_size = sum(s for n, s in sizes)
print("\nTotal: {} bytes\n".format(total_size))

prop_width = max(len(row[0]) for row in sizes)
prop_width = max(prop_width, 8)
storage_width = max(len(str(row[1])) for row in sizes)
storage_width = max(storage_width, 7)

print("{:{}} {:{}} {}".format("Property", prop_width, "Storage",
  storage_width, "Percentage"))
print("{:{}} {:{}} {}".format("--------", prop_width, "-------",
  storage_width, "----------"))

# Named row_format to avoid shadowing the built-in format().
row_format = "{{:<{}}} {{:>{}}} {{:>5.1%}}".format(prop_width, storage_width)

for n, s in sizes:
    print(row_format.format(n, s, s / total_size))

print("\nFinished!")