/**************************************************************
  JLex: A Lexical Analyzer Generator for Java(TM)
  Written by Elliot Berk. Copyright 1996.
  Maintained by C. Scott Ananian.
  See below for copyright notice, license, and disclaimer.
  New releases from http://www.cs.princeton.edu/~appel/modern/java/JLex/

  Version 1.2.6, 2/7/03, [C. Scott Ananian]
   Renamed 'assert' function 'ASSERT' to accommodate Java 1.4's new keyword.
   Fixed a bug which caused certain forms of comment in the JLex directives
    section (which are not allowed) to be incorrectly parsed as macro
    definitions.
  Version 1.2.5, 7/25/99-5/16/00, [C. Scott Ananian]
   Stomped on one more 8-bit character bug.  Should work now (really!).
   Added unicode support, including unicode escape sequences.
   Rewrote internal JavaLexBitSet class as SparseBitSet for efficient
    unicoding.
   Added an NFA character class simplification pass for unicode efficiency.
   Changed byte- and stream-oriented I/O routines to use characters and
    java.io.Reader and java.io.Writer instead --- which means we read in
    unicode specifications correctly and write out a proper unicode java
    source file.  As a happy side-effect, the output java file is written
    with your platform's preferred newline character(s).
   Rewrote CInput to fix bugs with line-counting in the specification file
    and "unusual behaviour" when the last line of the specification wasn't
    terminated with a newline.  Thanks to Matt Hanna for pointing out the
    bug.
   Fixed a bug that would cause JLex not to terminate given certain input
    specifications.  Thanks to Mark Greenstreet and Frank B. Brokken for
    reporting this.
   CUP parser integration improved according to suggestions made by
    David MacMahon.
    The %cup directive now tells JLex to generate a parser conforming
    to the java_cup.runtime.Scanner interface; see manual for more details.
   Fixed bug with null string literals ("") in regexps.  Reported by
    Charles Fischer.
   Rewrote start-of-line and end-of-line handling, closing active bug #5.
    Also fixed line-counting code, closing active bug #12.  All
    new-line handling is now platform-independent.
   Used unpackFromString more extensively to allow larger cmap, etc,
    tables.  This helps unicode support work reliably.  It's also
    prettier now if you happen to read the source to the generated
    lexer.
   Generated lexer now accepts unicode LS (U+2028) and PS (U+2029) as
    line separators for strict unicode compliance; see
    http://www.unicode.org/unicode/reports/tr18/
   Fixed bug with character constants in action strings.  Reported by
    Andrew Appel against 1.2.5b3.
   Fixed bug with illegal \^C-style escape sequences.  Reported by
    Toshiya Iwai against 1.2.5b4.
   Fixed "newline in quoted string" error when unpaired single- or
    double-quotes were present in comments in the action phrase.
    Reported by Stephen Ostermiller <1010JLex@ostermiller.com>
    against 1.2.5b4.  Reported by Eric Esposito against 1.2.4 and 1.2.5b2.
   Fixed "newline in quoted string" error when /* or // appeared in quoted
    strings in the action phrase.  Reported by David Eichmann against
    1.2.5b5.
   Fixed 'illegal constant' errors in case statements caused by Sun's JDK
    1.3 more closely adhering to the Java Language Specification.  Reported
    by a number of people, but Harold Grovesteen was the first to direct me
    to a Sun bug report (4119776) which quoted the relevant section of the
    JLS (15.27) to convince me that the JLex construction actually was
    illegal.  Reported against 1.2.5b6, but this bit of code has been
    present since the very first version of JLex (1.1.1).
  Version 1.2.4, 7/24/99, [C. Scott Ananian]
   Correct the parsing of '-' in character classes, closing active
    bug #1.
    Behaviour follows egrep: leading and trailing dashes in
    a character class lose their special meaning, so [-+] and [+-] do
    what you would expect them to.
   New %ignorecase directive for generating case-insensitive lexers by
    expanding matched character classes in a unicode-friendly way.
   Handle unmatched braces in quoted strings or comments within
    action code blocks.
   Fixed input lexer to allow whitespace in character classes, closing
    active bug #9.  Whitespace in quotes had been previously fixed.
   Made Yylex.YYEOF and %yyeof work like the manual says they should.
  Version 1.2.3, 6/26/97, [Raimondas Lencevicius]
   Fixed the yy_nxt[][] assignment that generated huge code
    exceeding the 64K method size limit.  Now the assignment
    is handled by unpacking a string encoding of the integer array.
    To achieve that, added
    "private int [][] unpackFromString(int size1, int size2, String st)"
    function and coded the yy_nxt[][] values into a string
    by printing integers into a string and representing
    integer sequences as "value:length" pairs.
   Improvement: generated .java file reduced 2 times, .class file
    reduced 6 times for sample grammar.  No 64K errors.
   Possible negatives: Some editors and OSs may not be able to handle
    the huge one-line generated string.  String unpacking may be slower
    than direct array initialization.
  Version 1.2.2, 10/24/97, [Martin Dirichs]
   Notes:
    Changed yy_instream to yy_reader of type BufferedReader.  This reflects
    the improvements in the JDK 1.1 concerning InputStreams.  As a
    consequence, changed yy_buffer from byte[] to char[].
    The lexer can now be initialized with either an InputStream
    or a Reader.  A third, private constructor is called by the other
    two to execute user specified constructor code.
  Version 1.2.1, 9/15/97 [A. Appel]
   Fixed bugs 6 (character codes > 127) and 10 (deprecated String
    constructor).
  Version 1.2, 5/5/97, [Elliot Berk]
   Notes:
    Simply changed the name from JavaLex to JLex.  No other changes.
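
  Illustration of the "value:length" encoding described under
  Version 1.2.3 above.  This is only a minimal sketch, not the code
  JLex emits: the class name UnpackSketch and the method name unpack
  are hypothetical, and the sketch assumes a well-formed input string
  in which each "value:length" token stands for value repeated length
  times, with cells filled in row-major order.

```java
public class UnpackSketch {
    // Decodes a comma-separated list of "value" or "value:count" tokens
    // into a rows-by-cols table, filling cells in row-major order.
    // Hypothetical helper for illustration; performs no error checking.
    public static int[][] unpack(int rows, int cols, String packed) {
        int[][] table = new int[rows][cols];
        int cell = 0; // linear index of the next cell to fill
        for (String token : packed.split(",")) {
            int colon = token.indexOf(':');
            // A bare token is one value; "v:n" is the value v repeated n times.
            int value = Integer.parseInt(colon < 0 ? token : token.substring(0, colon));
            int count = colon < 0 ? 1 : Integer.parseInt(token.substring(colon + 1));
            for (int k = 0; k < count; k++, cell++) {
                table[cell / cols][cell % cols] = value;
            }
        }
        return table;
    }

    public static void main(String[] args) {
        int[][] t = unpack(2, 3, "7:4,-1,2");
        // prints [[7, 7, 7], [7, -1, 2]]
        System.out.println(java.util.Arrays.deepToString(t));
    }
}
```

  Here the string "7:4,-1,2" expands to the six cells 7 7 7 7 -1 2,
  which is why long runs of identical table entries shrink the
  generated source so much.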
  Version 1.1.5, 2/25/97, [Elliot Berk]
   Notes:
    Simple optimization to the creation of the source files.
    Added a BufferedOutputStream in the creation
    of the DataOutputStream field m_outstream of the class CLexGen.
    This helps performance by doing some buffering, and was suggested
    by Max Hailperin,
    Associate Professor of Computer Science, Gustavus Adolphus College.

  Version 1.1.4, 12/12/96, [Elliot Berk]
   Notes:
    Added %public directive to make generated class public.

  Version 1.1.3, 12/11/96, [Elliot Berk]
   Notes:
    Converted assertion failure on invalid character class
    when a dash '-' is not preceded with a start-of-range character.
    Converted this into parse error E_DASH.

  Version 1.1.2, October 30, 1996 [Elliot Berk]
    Fixed BitSet bugs by installing a BitSet class of my own,
    called JavaLexBitSet.  Fixed support for '\r', non-UNIX
    sequences.  Added try/catch block around lexer generation
    in main routine to moderate error information presented
    to user.  Fixed macro expansion, so that macros following
    quotes are expanded correctly in regular expressions.
    Fixed dynamic reallocation of accept action buffers.

  Version 1.1.1, September 3, 1996 [Andrew Appel]
    Made the class "Main" instead of "JavaLex",
    improved the installation instructions to reflect this.

  Version 1.1, August 15, 1996 [Andrew Appel]
    Made yychar, yyline, yytext global to the lexer so that
    auxiliary functions can access them.
 **************************************************************/

/***************************************************************
       JLEX COPYRIGHT NOTICE, LICENSE, AND DISCLAIMER
  Copyright 1996-2000 by Elliot Joel Berk and C.
  Scott Ananian

  Permission to use, copy, modify, and distribute this software and its
  documentation for any purpose and without fee is hereby granted,
  provided that the above copyright notice appear in all copies and that
  both the copyright notice and this permission notice and warranty
  disclaimer appear in supporting documentation, and that the name of
  the authors or their employers not be used in advertising or publicity
  pertaining to distribution of the software without specific, written
  prior permission.

  The authors and their employers disclaim all warranties with regard to
  this software, including all implied warranties of merchantability and
  fitness.  In no event shall the authors or their employers be liable
  for any special, indirect or consequential damages or any damages
  whatsoever resulting from loss of use, data or profits, whether in an
  action of contract, negligence or other tortious action, arising out
  of or in connection with the use or performance of this software.
 **************************************************************/

/***************************************************************
  Package Declaration
 **************************************************************/
package JLex;

/***************************************************************
  Imported Packages
 **************************************************************/
import java.lang.System;
import java.lang.Integer;
import java.lang.Character;

import java.util.Enumeration;
import java.util.Stack;
import java.util.Hashtable;
import java.util.Vector;

/******************************
  Questions:
  2) How should I use the Java package system
  to make my tool more modularized and
  coherent?

  Unimplemented:
  !) Fix BitSet issues -- expand only when necessary.
  2) Repeated accept rules.
  6) Clean up the CAlloc class and use buffered
  allocation.
  9) Add to spec about extending character set.
  11) m_verbose -- what should be done with it?
  12) turn lexical analyzer into a coherent
  Java package
  13) turn lexical analyzer generator into a coherent
  Java package
  16) pretty up generated code
  17) make it possible to have white space in
  regular expressions
  18) clean up all of the class files the lexer
  generator produces when it is compiled,
  and reduce this number in some way.
  24) character format to and from file: writeup
  and implementation
  25) Debug by testing all arcane regular expression cases.
  26) Look for and fix all UNDONE comments below.
  27) Fix package system.
  28) Clean up unnecessary classes.
  *****************************/

/***************************************************************
  Class: CSpec
 **************************************************************/
class CSpec
{
  /***************************************************************
    Member Variables
   **************************************************************/

  /* Lexical States. */
  Hashtable m_states; /* Hashtable taking state name (String)
                         to state index (Integer). */

  /* Regular Expression Macros. */
  Hashtable m_macros; /* Hashtable taking macro name (String)
                         to corresponding char buffer that
                         holds macro definition. */

  /* NFA Machine. */
  CNfa m_nfa_start; /* Start state of NFA machine. */
  Vector m_nfa_states; /* Vector of states, with index
                          corresponding to label. */

  Vector m_state_rules[]; /* An array of Vectors of Integers.
                             The ith Vector represents the lexical state
                             with index i.  The contents of the ith
                             Vector are the indices of the NFA start
                             states that can be matched while in
                             the ith lexical state. */

  int m_state_dtrans[];

  /* DFA Machine. */
  Vector m_dfa_states; /* Vector of states, with index
                          corresponding to label. */
  Hashtable m_dfa_sets; /* Hashtable taking set of NFA states
                           to corresponding DFA state,
                           if the latter exists. */

  /* Accept States and Corresponding Anchors. */
  Vector m_accept_vector;
  int m_anchor_array[];

  /* Transition Table.
   */
  Vector m_dtrans_vector;
  int m_dtrans_ncols;
  int m_row_map[];
  int m_col_map[];

  /* Special pseudo-characters for beginning-of-line and end-of-file. */
  static final int NUM_PSEUDO=2;
  int BOL; // beginning-of-line
  int EOF; // end-of-file

  /** NFA character class minimization map. */
  int m_ccls_map[];

  /* Regular expression token variables. */
  int m_current_token;
  char m_lexeme;
  boolean m_in_quote;
  boolean m_in_ccl;

  /* Verbose execution flag. */
  boolean m_verbose;

  /* JLex directives flags. */
  boolean m_integer_type;
  boolean m_intwrap_type;
  boolean m_yyeof;
  boolean m_count_chars;
  boolean m_count_lines;
  boolean m_cup_compatible;
  boolean m_unix;
  boolean m_public;
  boolean m_ignorecase;

  char m_init_code[];
  int m_init_read;

  char m_init_throw_code[];
  int m_init_throw_read;

  char m_class_code[];
  int m_class_read;

  char m_eof_code[];
  int m_eof_read;

  char m_eof_value_code[];
  int m_eof_value_read;

  char m_eof_throw_code[];
  int m_eof_throw_read;

  char m_yylex_throw_code[];
  int m_yylex_throw_read;

  /* Class, function, type names. */
  char m_class_name[] = { 'Y', 'y', 'l', 'e', 'x' };
  char m_implements_name[] = {};
  char m_function_name[] = { 'y', 'y', 'l', 'e', 'x' };
  char m_type_name[] = { 'Y', 'y', 't', 'o', 'k', 'e', 'n' };

  /* Lexical Generator. */
  private CLexGen m_lexGen;

  /***************************************************************
    Constants
   ***********************************************************/
  static final int NONE = 0;
  static final int START = 1;
  static final int END = 2;

  /***************************************************************
    Function: CSpec
    Description: Constructor.
   **************************************************************/
  CSpec
    (
     CLexGen lexGen
     )
      {
        m_lexGen = lexGen;

        /* Initialize regular expression token variables. */
        m_current_token = m_lexGen.EOS;
        m_lexeme = '\0';
        m_in_quote = false;
        m_in_ccl = false;

        /* Initialize hashtable for lexer states.
*/ m_states = new Hashtable(); m_states.put(new String("YYINITIAL"),new Integer(m_states.size())); /* Initialize hashtable for lexical macros. */ m_macros = new Hashtable(); /* Initialize variables for lexer options. */ m_integer_type = false; m_intwrap_type = false; m_count_lines = false; m_count_chars = false; m_cup_compatible = false; m_unix = true; m_public = false; m_yyeof = false; m_ignorecase = false; /* Initialize variables for JLex runtime options. */ m_verbose = true; m_nfa_start = null; m_nfa_states = new Vector(); m_dfa_states = new Vector(); m_dfa_sets = new Hashtable(); m_dtrans_vector = new Vector(); m_dtrans_ncols = CUtility.MAX_SEVEN_BIT + 1; m_row_map = null; m_col_map = null; m_accept_vector = null; m_anchor_array = null; m_init_code = null; m_init_read = 0; m_init_throw_code = null; m_init_throw_read = 0; m_yylex_throw_code = null; m_yylex_throw_read = 0; m_class_code = null; m_class_read = 0; m_eof_code = null; m_eof_read = 0; m_eof_value_code = null; m_eof_value_read = 0; m_eof_throw_code = null; m_eof_throw_read = 0; m_state_dtrans = null; m_state_rules = null; } } /*************************************************************** Class: CEmit **************************************************************/ class CEmit { /*************************************************************** Member Variables **************************************************************/ private CSpec m_spec; private java.io.PrintWriter m_outstream; /*************************************************************** Constants: Anchor Types **************************************************************/ private final int START = 1; private final int END = 2; private final int NONE = 4; /*************************************************************** Constants **************************************************************/ private final boolean EDBG = true; private final boolean NOT_EDBG = false; /*************************************************************** Function: 
CEmit Description: Constructor. **************************************************************/ CEmit ( ) { reset(); } /*************************************************************** Function: reset Description: Clears member variables. **************************************************************/ private void reset ( ) { m_spec = null; m_outstream = null; } /*************************************************************** Function: set Description: Initializes member variables. **************************************************************/ private void set ( CSpec spec, java.io.PrintWriter outstream ) { if (CUtility.DEBUG) { CUtility.ASSERT(null != spec); CUtility.ASSERT(null != outstream); } m_spec = spec; m_outstream = outstream; } /*************************************************************** Function: emit_imports Description: Emits import packages at top of generated source file. **************************************************************/ /*void emit_imports ( CSpec spec, OutputStream outstream ) throws java.io.IOException { set(spec,outstream); if (CUtility.DEBUG) { CUtility.ASSERT(null != m_spec); CUtility.ASSERT(null != m_outstream); }*/ /*m_outstream.println("import java.lang.String;"); m_outstream.println("import java.lang.System;"); m_outstream.println("import java.io.BufferedReader;"); m_outstream.println("import java.io.InputStream;");*/ /* reset(); }*/ /*************************************************************** Function: print_details Description: Debugging output. 
**************************************************************/ private void print_details ( ) { int i; int j; int next; int state; CDTrans dtrans; CAccept accept; boolean tr; System.out.println("---------------------- Transition Table " + "----------------------"); for (i = 0; i < m_spec.m_row_map.length; ++i) { System.out.print("State " + i); accept = (CAccept) m_spec.m_accept_vector.elementAt(i); if (null == accept) { System.out.println(" [nonaccepting]"); } else { System.out.println(" [accepting, line " + accept.m_line_number + " <" + (new java.lang.String(accept.m_action,0, accept.m_action_read)) + ">]"); } dtrans = (CDTrans) m_spec.m_dtrans_vector.elementAt(m_spec.m_row_map[i]); tr = false; state = dtrans.m_dtrans[m_spec.m_col_map[0]]; if (CDTrans.F != state) { tr = true; System.out.print("\tgoto " + state + " on [" + ((char) 0)); } for (j = 1; j < m_spec.m_dtrans_ncols; ++j) { next = dtrans.m_dtrans[m_spec.m_col_map[j]]; if (state == next) { if (CDTrans.F != state) { System.out.print((char) j); } } else { state = next; if (tr) { System.out.println("]"); tr = false; } if (CDTrans.F != state) { tr = true; System.out.print("\tgoto " + state + " on [" + ((char) j)); } } } if (tr) { System.out.println("]"); } } System.out.println("---------------------- Transition Table " + "----------------------"); } /*************************************************************** Function: emit Description: High-level access function to module. 
**************************************************************/ void emit ( CSpec spec, java.io.PrintWriter outstream ) throws java.io.IOException { set(spec,outstream); if (CUtility.DEBUG) { CUtility.ASSERT(null != m_spec); CUtility.ASSERT(null != m_outstream); } if (CUtility.OLD_DEBUG) { print_details(); } emit_header(); emit_construct(); emit_helpers(); emit_driver(); emit_footer(); reset(); } /*************************************************************** Function: emit_construct Description: Emits constructor, member variables, and constants. **************************************************************/ private void emit_construct ( ) throws java.io.IOException { if (CUtility.DEBUG) { CUtility.ASSERT(null != m_spec); CUtility.ASSERT(null != m_outstream); } /* Constants */ m_outstream.println("\tprivate final int YY_BUFFER_SIZE = 512;"); m_outstream.println("\tprivate final int YY_F = -1;"); m_outstream.println("\tprivate final int YY_NO_STATE = -1;"); m_outstream.println("\tprivate final int YY_NOT_ACCEPT = 0;"); m_outstream.println("\tprivate final int YY_START = 1;"); m_outstream.println("\tprivate final int YY_END = 2;"); m_outstream.println("\tprivate final int YY_NO_ANCHOR = 4;"); // internal m_outstream.println("\tprivate final int YY_BOL = "+m_spec.BOL+";"); m_outstream.println("\tprivate final int YY_EOF = "+m_spec.EOF+";"); // external if (m_spec.m_integer_type || true == m_spec.m_yyeof) m_outstream.println("\tpublic final int YYEOF = -1;"); /* User specified class code. 
*/ if (null != m_spec.m_class_code) { m_outstream.print(new String(m_spec.m_class_code,0, m_spec.m_class_read)); } /* Member Variables */ m_outstream.println("\tprivate java.io.BufferedReader yy_reader;"); m_outstream.println("\tprivate int yy_buffer_index;"); m_outstream.println("\tprivate int yy_buffer_read;"); m_outstream.println("\tprivate int yy_buffer_start;"); m_outstream.println("\tprivate int yy_buffer_end;"); m_outstream.println("\tprivate char yy_buffer[];"); if (m_spec.m_count_chars) { m_outstream.println("\tprivate int yychar;"); } if (m_spec.m_count_lines) { m_outstream.println("\tprivate int yyline;"); } m_outstream.println("\tprivate boolean yy_at_bol;"); m_outstream.println("\tprivate int yy_lexical_state;"); /*if (m_spec.m_count_lines || true == m_spec.m_count_chars) { m_outstream.println("\tprivate int yy_buffer_prev_start;"); }*/ m_outstream.println(); /* Function: first constructor (Reader) */ m_outstream.print("\t"); if (true == m_spec.m_public) { m_outstream.print("public "); } m_outstream.print(new String(m_spec.m_class_name)); m_outstream.print(" (java.io.Reader reader)"); if (null != m_spec.m_init_throw_code) { m_outstream.println(); m_outstream.print("\t\tthrows "); m_outstream.print(new String(m_spec.m_init_throw_code,0, m_spec.m_init_throw_read)); m_outstream.println(); m_outstream.println("\t\t{"); } else { m_outstream.println(" {"); } m_outstream.println("\t\tthis ();"); m_outstream.println("\t\tif (null == reader) {"); m_outstream.println("\t\t\tthrow (new Error(\"Error: Bad input " + "stream initializer.\"));"); m_outstream.println("\t\t}"); m_outstream.println("\t\tyy_reader = new java.io.BufferedReader(reader);"); m_outstream.println("\t}"); m_outstream.println(); /* Function: second constructor (InputStream) */ m_outstream.print("\t"); if (true == m_spec.m_public) { m_outstream.print("public "); } m_outstream.print(new String(m_spec.m_class_name)); m_outstream.print(" (java.io.InputStream instream)"); if (null != 
m_spec.m_init_throw_code) { m_outstream.println(); m_outstream.print("\t\tthrows "); m_outstream.println(new String(m_spec.m_init_throw_code,0, m_spec.m_init_throw_read)); m_outstream.println("\t\t{"); } else { m_outstream.println(" {"); } m_outstream.println("\t\tthis ();"); m_outstream.println("\t\tif (null == instream) {"); m_outstream.println("\t\t\tthrow (new Error(\"Error: Bad input " + "stream initializer.\"));"); m_outstream.println("\t\t}"); m_outstream.println("\t\tyy_reader = new java.io.BufferedReader(new java.io.InputStreamReader(instream));"); m_outstream.println("\t}"); m_outstream.println(); /* Function: third, private constructor - only for internal use */ m_outstream.print("\tprivate "); m_outstream.print(new String(m_spec.m_class_name)); m_outstream.print(" ()"); if (null != m_spec.m_init_throw_code) { m_outstream.println(); m_outstream.print("\t\tthrows "); m_outstream.println(new String(m_spec.m_init_throw_code,0, m_spec.m_init_throw_read)); m_outstream.println("\t\t{"); } else { m_outstream.println(" {"); } m_outstream.println("\t\tyy_buffer = new char[YY_BUFFER_SIZE];"); m_outstream.println("\t\tyy_buffer_read = 0;"); m_outstream.println("\t\tyy_buffer_index = 0;"); m_outstream.println("\t\tyy_buffer_start = 0;"); m_outstream.println("\t\tyy_buffer_end = 0;"); if (m_spec.m_count_chars) { m_outstream.println("\t\tyychar = 0;"); } if (m_spec.m_count_lines) { m_outstream.println("\t\tyyline = 0;"); } m_outstream.println("\t\tyy_at_bol = true;"); m_outstream.println("\t\tyy_lexical_state = YYINITIAL;"); /*if (m_spec.m_count_lines || true == m_spec.m_count_chars) { m_outstream.println("\t\tyy_buffer_prev_start = 0;"); }*/ /* User specified constructor code. 
*/ if (null != m_spec.m_init_code) { m_outstream.print(new String(m_spec.m_init_code,0, m_spec.m_init_read)); } m_outstream.println("\t}"); m_outstream.println(); } /*************************************************************** Function: emit_states Description: Emits constants that serve as lexical states, including YYINITIAL. **************************************************************/ private void emit_states ( ) throws java.io.IOException { Enumeration states; String state; int index; states = m_spec.m_states.keys(); /*index = 0;*/ while (states.hasMoreElements()) { state = (String) states.nextElement(); if (CUtility.DEBUG) { CUtility.ASSERT(null != state); } m_outstream.println("\tprivate final int " + state + " = " + (m_spec.m_states.get(state)).toString() + ";"); /*++index;*/ } m_outstream.println("\tprivate final int yy_state_dtrans[] = {"); for (index = 0; index < m_spec.m_state_dtrans.length; ++index) { m_outstream.print("\t\t" + m_spec.m_state_dtrans[index]); if (index < m_spec.m_state_dtrans.length - 1) { m_outstream.println(","); } else { m_outstream.println(); } } m_outstream.println("\t};"); } /*************************************************************** Function: emit_helpers Description: Emits helper functions, particularly error handling and input buffering. 
**************************************************************/ private void emit_helpers ( ) throws java.io.IOException { if (CUtility.DEBUG) { CUtility.ASSERT(null != m_spec); CUtility.ASSERT(null != m_outstream); } /* Function: yy_do_eof */ m_outstream.println("\tprivate boolean yy_eof_done = false;"); if (null != m_spec.m_eof_code) { m_outstream.print("\tprivate void yy_do_eof ()"); if (null != m_spec.m_eof_throw_code) { m_outstream.println(); m_outstream.print("\t\tthrows "); m_outstream.println(new String(m_spec.m_eof_throw_code,0, m_spec.m_eof_throw_read)); m_outstream.println("\t\t{"); } else { m_outstream.println(" {"); } m_outstream.println("\t\tif (false == yy_eof_done) {"); m_outstream.print(new String(m_spec.m_eof_code,0, m_spec.m_eof_read)); m_outstream.println("\t\t}"); m_outstream.println("\t\tyy_eof_done = true;"); m_outstream.println("\t}"); } emit_states(); /* Function: yybegin */ m_outstream.println("\tprivate void yybegin (int state) {"); m_outstream.println("\t\tyy_lexical_state = state;"); m_outstream.println("\t}"); /* Function: yy_initial_dtrans */ /*m_outstream.println("\tprivate int yy_initial_dtrans (int state) {"); m_outstream.println("\t\treturn yy_state_dtrans[state];"); m_outstream.println("\t}");*/ /* Function: yy_advance */ m_outstream.println("\tprivate int yy_advance ()"); m_outstream.println("\t\tthrows java.io.IOException {"); /*m_outstream.println("\t\t{");*/ m_outstream.println("\t\tint next_read;"); m_outstream.println("\t\tint i;"); m_outstream.println("\t\tint j;"); m_outstream.println(); m_outstream.println("\t\tif (yy_buffer_index < yy_buffer_read) {"); m_outstream.println("\t\t\treturn yy_buffer[yy_buffer_index++];"); /*m_outstream.println("\t\t\t++yy_buffer_index;");*/ m_outstream.println("\t\t}"); m_outstream.println(); m_outstream.println("\t\tif (0 != yy_buffer_start) {"); m_outstream.println("\t\t\ti = yy_buffer_start;"); m_outstream.println("\t\t\tj = 0;"); m_outstream.println("\t\t\twhile (i < yy_buffer_read) 
{"); m_outstream.println("\t\t\t\tyy_buffer[j] = yy_buffer[i];"); m_outstream.println("\t\t\t\t++i;"); m_outstream.println("\t\t\t\t++j;"); m_outstream.println("\t\t\t}"); m_outstream.println("\t\t\tyy_buffer_end = yy_buffer_end - yy_buffer_start;"); m_outstream.println("\t\t\tyy_buffer_start = 0;"); m_outstream.println("\t\t\tyy_buffer_read = j;"); m_outstream.println("\t\t\tyy_buffer_index = j;"); m_outstream.println("\t\t\tnext_read = yy_reader.read(yy_buffer,"); m_outstream.println("\t\t\t\t\tyy_buffer_read,"); m_outstream.println("\t\t\t\t\tyy_buffer.length - yy_buffer_read);"); m_outstream.println("\t\t\tif (-1 == next_read) {"); m_outstream.println("\t\t\t\treturn YY_EOF;"); m_outstream.println("\t\t\t}"); m_outstream.println("\t\t\tyy_buffer_read = yy_buffer_read + next_read;"); m_outstream.println("\t\t}"); m_outstream.println(); m_outstream.println("\t\twhile (yy_buffer_index >= yy_buffer_read) {"); m_outstream.println("\t\t\tif (yy_buffer_index >= yy_buffer.length) {"); m_outstream.println("\t\t\t\tyy_buffer = yy_double(yy_buffer);"); m_outstream.println("\t\t\t}"); m_outstream.println("\t\t\tnext_read = yy_reader.read(yy_buffer,"); m_outstream.println("\t\t\t\t\tyy_buffer_read,"); m_outstream.println("\t\t\t\t\tyy_buffer.length - yy_buffer_read);"); m_outstream.println("\t\t\tif (-1 == next_read) {"); m_outstream.println("\t\t\t\treturn YY_EOF;"); m_outstream.println("\t\t\t}"); m_outstream.println("\t\t\tyy_buffer_read = yy_buffer_read + next_read;"); m_outstream.println("\t\t}"); m_outstream.println("\t\treturn yy_buffer[yy_buffer_index++];"); m_outstream.println("\t}"); /* Function: yy_move_end */ m_outstream.println("\tprivate void yy_move_end () {"); m_outstream.println("\t\tif (yy_buffer_end > yy_buffer_start &&"); m_outstream.println("\t\t '\\n' == yy_buffer[yy_buffer_end-1])"); m_outstream.println("\t\t\tyy_buffer_end--;"); m_outstream.println("\t\tif (yy_buffer_end > yy_buffer_start &&"); m_outstream.println("\t\t '\\r' == 
yy_buffer[yy_buffer_end-1])"); m_outstream.println("\t\t\tyy_buffer_end--;"); m_outstream.println("\t}"); /* Function: yy_mark_start */ m_outstream.println("\tprivate boolean yy_last_was_cr=false;"); m_outstream.println("\tprivate void yy_mark_start () {"); if (m_spec.m_count_lines || true == m_spec.m_count_chars) { if (m_spec.m_count_lines) { m_outstream.println("\t\tint i;"); m_outstream.println("\t\tfor (i = yy_buffer_start; " + "i < yy_buffer_index; ++i) {"); m_outstream.println("\t\t\tif ('\\n' == yy_buffer[i] && !yy_last_was_cr) {"); m_outstream.println("\t\t\t\t++yyline;"); m_outstream.println("\t\t\t}"); m_outstream.println("\t\t\tif ('\\r' == yy_buffer[i]) {"); m_outstream.println("\t\t\t\t++yyline;"); m_outstream.println("\t\t\t\tyy_last_was_cr=true;"); m_outstream.println("\t\t\t} else yy_last_was_cr=false;"); m_outstream.println("\t\t}"); } if (m_spec.m_count_chars) { m_outstream.println("\t\tyychar = yychar"); m_outstream.println("\t\t\t+ yy_buffer_index - yy_buffer_start;"); } } m_outstream.println("\t\tyy_buffer_start = yy_buffer_index;"); m_outstream.println("\t}"); /* Function: yy_mark_end */ m_outstream.println("\tprivate void yy_mark_end () {"); m_outstream.println("\t\tyy_buffer_end = yy_buffer_index;"); m_outstream.println("\t}"); /* Function: yy_to_mark */ m_outstream.println("\tprivate void yy_to_mark () {"); m_outstream.println("\t\tyy_buffer_index = yy_buffer_end;"); m_outstream.println("\t\tyy_at_bol = "+ "(yy_buffer_end > yy_buffer_start) &&"); m_outstream.println("\t\t "+ "('\\r' == yy_buffer[yy_buffer_end-1] ||"); m_outstream.println("\t\t "+ " '\\n' == yy_buffer[yy_buffer_end-1] ||"); m_outstream.println("\t\t "+ /* unicode LS */ " 2028/*LS*/ == yy_buffer[yy_buffer_end-1] ||"); m_outstream.println("\t\t "+ /* unicode PS */ " 2029/*PS*/ == yy_buffer[yy_buffer_end-1]);"); m_outstream.println("\t}"); /* Function: yytext */ m_outstream.println("\tprivate java.lang.String yytext () {"); m_outstream.println("\t\treturn (new 
java.lang.String(yy_buffer,"); m_outstream.println("\t\t\tyy_buffer_start,"); m_outstream.println("\t\t\tyy_buffer_end - yy_buffer_start));"); m_outstream.println("\t}"); /* Function: yylength */ m_outstream.println("\tprivate int yylength () {"); m_outstream.println("\t\treturn yy_buffer_end - yy_buffer_start;"); m_outstream.println("\t}"); /* Function: yy_double */ m_outstream.println("\tprivate char[] yy_double (char buf[]) {"); m_outstream.println("\t\tint i;"); m_outstream.println("\t\tchar newbuf[];"); m_outstream.println("\t\tnewbuf = new char[2*buf.length];"); m_outstream.println("\t\tfor (i = 0; i < buf.length; ++i) {"); m_outstream.println("\t\t\tnewbuf[i] = buf[i];"); m_outstream.println("\t\t}"); m_outstream.println("\t\treturn newbuf;"); m_outstream.println("\t}"); /* Function: yy_error */ m_outstream.println("\tprivate final int YY_E_INTERNAL = 0;"); m_outstream.println("\tprivate final int YY_E_MATCH = 1;"); m_outstream.println("\tprivate java.lang.String yy_error_string[] = {"); m_outstream.println("\t\t\"Error: Internal error.\\n\","); m_outstream.println("\t\t\"Error: Unmatched input.\\n\""); m_outstream.println("\t};"); m_outstream.println("\tprivate void yy_error (int code,boolean fatal) {"); m_outstream.println("\t\tjava.lang.System.out.print(yy_error_string[code]);"); m_outstream.println("\t\tjava.lang.System.out.flush();"); m_outstream.println("\t\tif (fatal) {"); m_outstream.println("\t\t\tthrow new Error(\"Fatal Error.\\n\");"); m_outstream.println("\t\t}"); m_outstream.println("\t}"); /* Function: yy_next */ /*m_outstream.println("\tprivate int yy_next (int current,char lookahead) {"); m_outstream.println("\t\treturn yy_nxt[yy_rmap[current]][yy_cmap[lookahead]];"); m_outstream.println("\t}");*/ /* Function: yy_accept */ /*m_outstream.println("\tprivate int yy_accept (int current) {"); m_outstream.println("\t\treturn yy_acpt[current];"); m_outstream.println("\t}");*/ // Function: private int [][] unpackFromString(int size1, int size2, 
String st) // Added 6/24/98 Raimondas Lencevicius // May be made more efficient by replacing String operations // Assumes correctly formed input String. Performs no error checking m_outstream.println("\tprivate int[][] unpackFromString"+ "(int size1, int size2, String st) {"); m_outstream.println("\t\tint colonIndex = -1;"); m_outstream.println("\t\tString lengthString;"); m_outstream.println("\t\tint sequenceLength = 0;"); m_outstream.println("\t\tint sequenceInteger = 0;"); m_outstream.println(); m_outstream.println("\t\tint commaIndex;"); m_outstream.println("\t\tString workString;"); m_outstream.println(); m_outstream.println("\t\tint res[][] = new int[size1][size2];"); m_outstream.println("\t\tfor (int i= 0; i < size1; i++) {"); m_outstream.println("\t\t\tfor (int j= 0; j < size2; j++) {"); m_outstream.println("\t\t\t\tif (sequenceLength != 0) {"); m_outstream.println("\t\t\t\t\tres[i][j] = sequenceInteger;"); m_outstream.println("\t\t\t\t\tsequenceLength--;"); m_outstream.println("\t\t\t\t\tcontinue;"); m_outstream.println("\t\t\t\t}"); m_outstream.println("\t\t\t\tcommaIndex = st.indexOf(',');"); m_outstream.println("\t\t\t\tworkString = (commaIndex==-1) ? 
st :"); m_outstream.println("\t\t\t\t\tst.substring(0, commaIndex);"); m_outstream.println("\t\t\t\tst = st.substring(commaIndex+1);"); m_outstream.println("\t\t\t\tcolonIndex = workString.indexOf(':');"); m_outstream.println("\t\t\t\tif (colonIndex == -1) {"); m_outstream.println("\t\t\t\t\tres[i][j]=Integer.parseInt(workString);"); m_outstream.println("\t\t\t\t\tcontinue;"); m_outstream.println("\t\t\t\t}"); m_outstream.println("\t\t\t\tlengthString ="); m_outstream.println("\t\t\t\t\tworkString.substring(colonIndex+1);"); m_outstream.println("\t\t\t\tsequenceLength="+ "Integer.parseInt(lengthString);"); m_outstream.println("\t\t\t\tworkString="+ "workString.substring(0,colonIndex);"); m_outstream.println("\t\t\t\tsequenceInteger="+ "Integer.parseInt(workString);"); m_outstream.println("\t\t\t\tres[i][j] = sequenceInteger;"); m_outstream.println("\t\t\t\tsequenceLength--;"); m_outstream.println("\t\t\t}"); m_outstream.println("\t\t}"); m_outstream.println("\t\treturn res;"); m_outstream.println("\t}"); } /*************************************************************** Function: emit_header Description: Emits class header. **************************************************************/ private void emit_header ( ) throws java.io.IOException { if (CUtility.DEBUG) { CUtility.ASSERT(null != m_spec); CUtility.ASSERT(null != m_outstream); } m_outstream.println(); m_outstream.println(); if (true == m_spec.m_public) { m_outstream.print("public "); } m_outstream.print("class "); m_outstream.print(new String(m_spec.m_class_name,0, m_spec.m_class_name.length)); if (m_spec.m_implements_name.length > 0) { m_outstream.print(" implements "); m_outstream.print(new String(m_spec.m_implements_name,0, m_spec.m_implements_name.length)); } m_outstream.println(" {"); } /*************************************************************** Function: emit_table Description: Emits transition table. 
**************************************************************/ private void emit_table ( ) throws java.io.IOException { int i; int elem; int size; CDTrans dtrans; boolean is_start; boolean is_end; CAccept accept; if (CUtility.DEBUG) { CUtility.ASSERT(null != m_spec); CUtility.ASSERT(null != m_outstream); } m_outstream.println("\tprivate int yy_acpt[] = {"); size = m_spec.m_accept_vector.size(); for (elem = 0; elem < size; ++elem) { accept = (CAccept) m_spec.m_accept_vector.elementAt(elem); m_outstream.print("\t\t/* "+elem+" */ "); if (null != accept) { is_start = (0 != (m_spec.m_anchor_array[elem] & CSpec.START)); is_end = (0 != (m_spec.m_anchor_array[elem] & CSpec.END)); if (is_start && true == is_end) { m_outstream.print("YY_START | YY_END"); } else if (is_start) { m_outstream.print("YY_START"); } else if (is_end) { m_outstream.print("YY_END"); } else { m_outstream.print("YY_NO_ANCHOR"); } } else { m_outstream.print("YY_NOT_ACCEPT"); } if (elem < size - 1) { m_outstream.print(","); } m_outstream.println(); } m_outstream.println("\t};"); // CSA: modified yy_cmap to use string packing 9-Aug-1999 int[] yy_cmap = new int[m_spec.m_ccls_map.length]; for (i = 0; i < m_spec.m_ccls_map.length; ++i) yy_cmap[i] = m_spec.m_col_map[m_spec.m_ccls_map[i]]; m_outstream.print("\tprivate int yy_cmap[] = unpackFromString("); emit_table_as_string(new int[][] { yy_cmap }); m_outstream.println(")[0];"); m_outstream.println(); // CSA: modified yy_rmap to use string packing 9-Aug-1999 m_outstream.print("\tprivate int yy_rmap[] = unpackFromString("); emit_table_as_string(new int[][] { m_spec.m_row_map }); m_outstream.println(")[0];"); m_outstream.println(); // 6/24/98 Raimondas Lencevicius // modified to use // int[][] unpackFromString(int size1, int size2, String st) size = m_spec.m_dtrans_vector.size(); int[][] yy_nxt = new int[size][]; for (elem=0; elem0?ia[0].length:0); m_outstream.println(","); StringBuffer outstr = new StringBuffer(); // RL - Output matrix for (int elem = 0; elem 
< ia.length; ++elem)
      {
        for (int i = 0; i < ia[elem].length; ++i)
          {
            int writeInt = ia[elem][i];
            if (writeInt == previousInt) // RL - sequence?
              {
                if (sequenceStarted)
                  {
                    sequenceLength++;
                  }
                else
                  {
                    outstr.append(writeInt);
                    outstr.append(":");
                    sequenceLength = 2;
                    sequenceStarted = true;
                  }
              }
            else // RL - no sequence or end sequence
              {
                if (sequenceStarted)
                  {
                    outstr.append(sequenceLength);
                    outstr.append(",");
                    sequenceLength = 0;
                    sequenceStarted = false;
                  }
                else
                  {
                    if (previousInt != -20)
                      {
                        outstr.append(previousInt);
                        outstr.append(",");
                      }
                  }
              }
            previousInt = writeInt;
            // CSA: output in 75 character chunks.
            if (outstr.length() > 75)
              {
                String s = outstr.toString();
                m_outstream.println("\""+s.substring(0,75)+"\" +");
                outstr = new StringBuffer(s.substring(75));
              }
          }
      }
    // Flush the value or run that is still pending after the last element.
    if (sequenceStarted)
      {
        outstr.append(sequenceLength);
      }
    else
      {
        outstr.append(previousInt);
      }
    // CSA: output in 75 character chunks.
    if (outstr.length() > 75)
      {
        String s = outstr.toString();
        m_outstream.println("\""+s.substring(0,75)+"\" +");
        outstr = new StringBuffer(s.substring(75));
      }
    m_outstream.print("\""+outstr+"\"");
  }

  /***************************************************************
    Function: emit_driver
    Description: Emits the transition tables, then the signature
    and body of the lexer driver method (the tokenizing loop).
    **************************************************************/
  private void emit_driver
    (
     )
      throws java.io.IOException
      {
        if (CUtility.DEBUG)
          {
            CUtility.ASSERT(null != m_spec);
            CUtility.ASSERT(null != m_outstream);
          }

        emit_table();

        if (m_spec.m_integer_type)
          {
            m_outstream.print("\tpublic int ");
            m_outstream.print(new String(m_spec.m_function_name));
            m_outstream.println(" ()");
          }
        else if (m_spec.m_intwrap_type)
          {
            m_outstream.print("\tpublic java.lang.Integer ");
            m_outstream.print(new String(m_spec.m_function_name));
            m_outstream.println(" ()");
          }
        else
          {
            m_outstream.print("\tpublic ");
            m_outstream.print(new String(m_spec.m_type_name));
            m_outstream.print(" ");
            m_outstream.print(new String(m_spec.m_function_name));
            m_outstream.println(" ()");
          }

        /*m_outstream.println("\t\tthrows java.io.IOException {");*/
m_outstream.print("\t\tthrows java.io.IOException"); if (null != m_spec.m_yylex_throw_code) { m_outstream.print(", "); m_outstream.print(new String(m_spec.m_yylex_throw_code,0, m_spec.m_yylex_throw_read)); m_outstream.println(); m_outstream.println("\t\t{"); } else { m_outstream.println(" {"); } m_outstream.println("\t\tint yy_lookahead;"); m_outstream.println("\t\tint yy_anchor = YY_NO_ANCHOR;"); /*m_outstream.println("\t\tint yy_state " + "= yy_initial_dtrans(yy_lexical_state);");*/ m_outstream.println("\t\tint yy_state " + "= yy_state_dtrans[yy_lexical_state];"); m_outstream.println("\t\tint yy_next_state = YY_NO_STATE;"); /*m_outstream.println("\t\tint yy_prev_stave = YY_NO_STATE;");*/ m_outstream.println("\t\tint yy_last_accept_state = YY_NO_STATE;"); m_outstream.println("\t\tboolean yy_initial = true;"); m_outstream.println("\t\tint yy_this_accept;"); m_outstream.println(); m_outstream.println("\t\tyy_mark_start();"); /*m_outstream.println("\t\tyy_this_accept = yy_accept(yy_state);");*/ m_outstream.println("\t\tyy_this_accept = yy_acpt[yy_state];"); m_outstream.println("\t\tif (YY_NOT_ACCEPT != yy_this_accept) {"); m_outstream.println("\t\t\tyy_last_accept_state = yy_state;"); m_outstream.println("\t\t\tyy_mark_end();"); m_outstream.println("\t\t}"); if (NOT_EDBG) { m_outstream.println("\t\tjava.lang.System.out.println(\"Begin\");"); } m_outstream.println("\t\twhile (true) {"); m_outstream.println("\t\t\tif (yy_initial && yy_at_bol) "+ "yy_lookahead = YY_BOL;"); m_outstream.println("\t\t\telse yy_lookahead = yy_advance();"); m_outstream.println("\t\t\tyy_next_state = YY_F;"); /*m_outstream.println("\t\t\t\tyy_next_state = " + "yy_next(yy_state,yy_lookahead);");*/ m_outstream.println("\t\t\tyy_next_state = " + "yy_nxt[yy_rmap[yy_state]][yy_cmap[yy_lookahead]];"); if (NOT_EDBG) { m_outstream.println("java.lang.System.out.println(\"Current state: \"" + " + yy_state"); m_outstream.println("+ \"\tCurrent input: \""); m_outstream.println(" + ((char) 
yy_lookahead));"); } if (NOT_EDBG) { m_outstream.println("\t\t\tjava.lang.System.out.println(\"State = \"" + "+ yy_state);"); m_outstream.println("\t\t\tjava.lang.System.out.println(\"Accepting status = \"" + "+ yy_this_accept);"); m_outstream.println("\t\t\tjava.lang.System.out.println(\"Last accepting state = \"" + "+ yy_last_accept_state);"); m_outstream.println("\t\t\tjava.lang.System.out.println(\"Next state = \"" + "+ yy_next_state);"); m_outstream.println("\t\t\tjava.lang.System.out.println(\"Lookahead input = \"" + "+ ((char) yy_lookahead));"); } // handle bare EOF. m_outstream.println("\t\t\tif (YY_EOF == yy_lookahead " + "&& true == yy_initial) {"); if (null != m_spec.m_eof_code) { m_outstream.println("\t\t\t\tyy_do_eof();"); } if (true == m_spec.m_integer_type) { m_outstream.println("\t\t\t\treturn YYEOF;"); } else if (null != m_spec.m_eof_value_code) { m_outstream.print(new String(m_spec.m_eof_value_code,0, m_spec.m_eof_value_read)); } else { m_outstream.println("\t\t\t\treturn null;"); } m_outstream.println("\t\t\t}"); m_outstream.println("\t\t\tif (YY_F != yy_next_state) {"); m_outstream.println("\t\t\t\tyy_state = yy_next_state;"); m_outstream.println("\t\t\t\tyy_initial = false;"); /*m_outstream.println("\t\t\t\tyy_this_accept = yy_accept(yy_state);");*/ m_outstream.println("\t\t\t\tyy_this_accept = yy_acpt[yy_state];"); m_outstream.println("\t\t\t\tif (YY_NOT_ACCEPT != yy_this_accept) {"); m_outstream.println("\t\t\t\t\tyy_last_accept_state = yy_state;"); m_outstream.println("\t\t\t\t\tyy_mark_end();"); m_outstream.println("\t\t\t\t}"); /*m_outstream.println("\t\t\t\tyy_prev_state = yy_state;");*/ /*m_outstream.println("\t\t\t\tyy_state = yy_next_state;");*/ m_outstream.println("\t\t\t}"); m_outstream.println("\t\t\telse {"); m_outstream.println("\t\t\t\tif (YY_NO_STATE == yy_last_accept_state) {"); /*m_outstream.println("\t\t\t\t\tyy_error(YY_E_MATCH,false);"); m_outstream.println("\t\t\t\t\tyy_initial = true;"); 
m_outstream.println("\t\t\t\t\tyy_state " + "= yy_state_dtrans[yy_lexical_state];"); m_outstream.println("\t\t\t\t\tyy_next_state = YY_NO_STATE;");*/ /*m_outstream.println("\t\t\t\t\tyy_prev_state = YY_NO_STATE;");*/ /*m_outstream.println("\t\t\t\t\tyy_last_accept_state = YY_NO_STATE;"); m_outstream.println("\t\t\t\t\tyy_mark_start();");*/ /*m_outstream.println("\t\t\t\t\tyy_this_accept = yy_accept(yy_state);");*/ /*m_outstream.println("\t\t\t\t\tyy_this_accept = yy_acpt[yy_state];"); m_outstream.println("\t\t\t\t\tif (YY_NOT_ACCEPT != yy_this_accept) {"); m_outstream.println("\t\t\t\t\t\tyy_last_accept_state = yy_state;"); m_outstream.println("\t\t\t\t\t}");*/ m_outstream.println("\t\t\t\t\tthrow (new Error(\"Lexical Error: Unmatched Input.\"));"); m_outstream.println("\t\t\t\t}"); m_outstream.println("\t\t\t\telse {"); m_outstream.println("\t\t\t\t\tyy_anchor = yy_acpt[yy_last_accept_state];"); /*m_outstream.println("\t\t\t\t\tyy_anchor " + "= yy_accept(yy_last_accept_state);");*/ m_outstream.println("\t\t\t\t\tif (0 != (YY_END & yy_anchor)) {"); m_outstream.println("\t\t\t\t\t\tyy_move_end();"); m_outstream.println("\t\t\t\t\t}"); m_outstream.println("\t\t\t\t\tyy_to_mark();"); m_outstream.println("\t\t\t\t\tswitch (yy_last_accept_state) {"); emit_actions("\t\t\t\t\t"); m_outstream.println("\t\t\t\t\tdefault:"); m_outstream.println("\t\t\t\t\t\tyy_error(YY_E_INTERNAL,false);"); /*m_outstream.println("\t\t\t\t\t\treturn null;");*/ m_outstream.println("\t\t\t\t\tcase -1:"); m_outstream.println("\t\t\t\t\t}"); m_outstream.println("\t\t\t\t\tyy_initial = true;"); m_outstream.println("\t\t\t\t\tyy_state " + "= yy_state_dtrans[yy_lexical_state];"); m_outstream.println("\t\t\t\t\tyy_next_state = YY_NO_STATE;"); /*m_outstream.println("\t\t\t\t\tyy_prev_state = YY_NO_STATE;");*/ m_outstream.println("\t\t\t\t\tyy_last_accept_state = YY_NO_STATE;"); m_outstream.println("\t\t\t\t\tyy_mark_start();"); /*m_outstream.println("\t\t\t\t\tyy_this_accept = 
yy_accept(yy_state);");*/
        m_outstream.println("\t\t\t\t\tyy_this_accept = yy_acpt[yy_state];");
        m_outstream.println("\t\t\t\t\tif (YY_NOT_ACCEPT != yy_this_accept) {");
        m_outstream.println("\t\t\t\t\t\tyy_last_accept_state = yy_state;");
        m_outstream.println("\t\t\t\t\t\tyy_mark_end();");
        m_outstream.println("\t\t\t\t\t}");
        m_outstream.println("\t\t\t\t}");
        m_outstream.println("\t\t\t}");
        m_outstream.println("\t\t}");
        m_outstream.println("\t}");

        /*m_outstream.println("\t\t\t\t");
        m_outstream.println("\t\t\t");
        m_outstream.println("\t\t\t");
        m_outstream.println("\t\t\t");
        m_outstream.println("\t\t\t");
        m_outstream.println("\t\t}");*/
      }

  /***************************************************************
    Function: emit_actions
    Description: Emits one switch case per accepting state, holding
    the user's action code. Each action is followed by a unique
    negative "bogus" case label that carries the break, which
    appears intended to keep the break reachable for the compiler
    even when the action itself ends in a return.
    **************************************************************/
  private void emit_actions
    (
     String tabs
     )
      throws java.io.IOException
      {
        int elem;
        int size;
        int bogus_index;
        CAccept accept;

        if (CUtility.DEBUG)
          {
            CUtility.ASSERT(m_spec.m_accept_vector.size()
                            == m_spec.m_anchor_array.length);
          }

        bogus_index = -2;
        size = m_spec.m_accept_vector.size();
        for (elem = 0; elem < size; ++elem)
          {
            accept = (CAccept) m_spec.m_accept_vector.elementAt(elem);
            if (null != accept)
              {
                m_outstream.println(tabs + "case " + elem + ":");
                m_outstream.print(tabs + "\t");
                m_outstream.print(new String(accept.m_action,0,
                                             accept.m_action_read));
                m_outstream.println();
                m_outstream.println(tabs + "case " + bogus_index + ":");
                m_outstream.println(tabs + "\tbreak;");
                --bogus_index;
              }
          }
      }

  /***************************************************************
    Function: emit_footer
    Description: Emits the closing brace of the generated lexer class.
    **************************************************************/
  private void emit_footer
    (
     )
      throws java.io.IOException
      {
        if (CUtility.DEBUG)
          {
            CUtility.ASSERT(null != m_spec);
            CUtility.ASSERT(null != m_outstream);
          }

        m_outstream.println("}");
      }
}

/***************************************************************
  Class: CBunch
  **************************************************************/
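/* Illustrative sketch, not part of JLex. The emitter above writes its tables
as comma-separated strings in which a run of equal values is stored as a
"value:length" pair, and the generated unpackFromString reads them back. The
class below is a minimal stand-alone model of that format. PackedTableSketch,
pack and appendRun are hypothetical names; pack is a simplified stand-in for
emit_table_as_string (it omits the size prefix and the 75-character chunking),
while unpack follows the logic of the generated unpackFromString.

```java
import java.util.Arrays;

final class PackedTableSketch {
    // Encode a table row by row; runs may span row boundaries,
    // as they do in the emitter.
    static String pack(int[][] table) {
        StringBuilder out = new StringBuilder();
        boolean havePrevious = false;
        int previous = 0;
        int run = 0;
        for (int[] row : table) {
            for (int value : row) {
                if (havePrevious && value == previous) {
                    run++;
                    continue;
                }
                if (havePrevious) {
                    appendRun(out, previous, run);
                }
                previous = value;
                run = 1;
                havePrevious = true;
            }
        }
        if (havePrevious) {
            appendRun(out, previous, run);
        }
        return out.toString();
    }

    // Write "value" for a single element and "value:length" for a run.
    private static void appendRun(StringBuilder out, int value, int run) {
        if (out.length() > 0) {
            out.append(',');
        }
        out.append(value);
        if (run > 1) {
            out.append(':').append(run);
        }
    }

    // Decode into a size1 by size2 table. Assumes well-formed input,
    // like the generated method, and performs no error checking.
    static int[][] unpack(int size1, int size2, String st) {
        int[][] res = new int[size1][size2];
        int sequenceLength = 0;
        int sequenceInteger = 0;
        for (int i = 0; i < size1; i++) {
            for (int j = 0; j < size2; j++) {
                if (sequenceLength != 0) {
                    // Still inside a run: repeat the stored value.
                    res[i][j] = sequenceInteger;
                    sequenceLength--;
                    continue;
                }
                int commaIndex = st.indexOf(',');
                String work = (commaIndex == -1) ? st : st.substring(0, commaIndex);
                st = st.substring(commaIndex + 1);
                int colonIndex = work.indexOf(':');
                if (colonIndex == -1) {
                    res[i][j] = Integer.parseInt(work);
                    continue;
                }
                sequenceLength = Integer.parseInt(work.substring(colonIndex + 1));
                sequenceInteger = Integer.parseInt(work.substring(0, colonIndex));
                res[i][j] = sequenceInteger;
                sequenceLength--;
            }
        }
        return res;
    }

    public static void main(String[] args) {
        int[][] table = { {1, 1, 1}, {2, 3, 3} };
        String packed = pack(table);
        System.out.println(packed); // prints 1:3,2,3:2
        System.out.println(Arrays.deepEquals(table, unpack(2, 3, packed))); // prints true
    }
}
```

Because a run may continue from one row into the next, the decoder keeps its
run counter across rows, which is why the table dimensions have to be passed
in separately from the packed string.
*/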
class CBunch { /*************************************************************** Member Variables **************************************************************/ Vector m_nfa_set; /* Vector of CNfa states in dfa state. */ SparseBitSet m_nfa_bit; /* BitSet representation of CNfa labels. */ CAccept m_accept; /* Accepting actions, or null if nonaccepting state. */ int m_anchor; /* Anchors on regular expression. */ int m_accept_index; /* CNfa index corresponding to accepting actions. */ /*************************************************************** Function: CBunch Description: Constructor. **************************************************************/ CBunch ( ) { m_nfa_set = null; m_nfa_bit = null; m_accept = null; m_anchor = CSpec.NONE; m_accept_index = -1; } } /*************************************************************** Class: CMakeNfa **************************************************************/ class CMakeNfa { /*************************************************************** Member Variables **************************************************************/ private CSpec m_spec; private CLexGen m_lexGen; private CInput m_input; /*************************************************************** Function: CMakeNfa Description: Constructor. **************************************************************/ CMakeNfa ( ) { reset(); } /*************************************************************** Function: reset Description: Resets CMakeNfa member variables. **************************************************************/ private void reset ( ) { m_input = null; m_lexGen = null; m_spec = null; } /*************************************************************** Function: set Description: Sets CMakeNfa member variables. 
**************************************************************/ private void set ( CLexGen lexGen, CSpec spec, CInput input ) { if (CUtility.DEBUG) { CUtility.ASSERT(null != input); CUtility.ASSERT(null != lexGen); CUtility.ASSERT(null != spec); } m_input = input; m_lexGen = lexGen; m_spec = spec; } /*************************************************************** Function: allocate_BOL_EOF Description: Expands character class to include special BOL and EOF characters. Puts numeric index of these characters in input CSpec. **************************************************************/ void allocate_BOL_EOF ( CSpec spec ) { CUtility.ASSERT(CSpec.NUM_PSEUDO==2); spec.BOL = spec.m_dtrans_ncols++; spec.EOF = spec.m_dtrans_ncols++; } /*************************************************************** Function: thompson Description: High level access function to module. Deposits result in input CSpec. **************************************************************/ void thompson ( CLexGen lexGen, CSpec spec, CInput input ) throws java.io.IOException { int i; CNfa elem; int size; /* Set member variables. */ reset(); set(lexGen,spec,input); size = m_spec.m_states.size(); m_spec.m_state_rules = new Vector[size]; for (i = 0; i < size; ++i) { m_spec.m_state_rules[i] = new Vector(); } /* Initialize current token variable and create nfa. */ /*m_spec.m_current_token = m_lexGen.EOS; m_lexGen.advance();*/ m_spec.m_nfa_start = machine(); /* Set labels in created nfa machine. */ size = m_spec.m_nfa_states.size(); for (i = 0; i < size; ++i) { elem = (CNfa) m_spec.m_nfa_states.elementAt(i); elem.m_label = i; } /* Debugging output. 
*/
        if (CUtility.DO_DEBUG)
          {
            m_lexGen.print_nfa();
          }

        if (m_spec.m_verbose)
          {
            System.out.println("NFA comprised of "
                               + (m_spec.m_nfa_states.size() + 1)
                               + " states.");
          }

        reset();
      }

  /***************************************************************
    Function: discardCNfa
    Description: Removes an NFA state from the state vector
    held in the CSpec.
    **************************************************************/
  private void discardCNfa
    (
     CNfa nfa
     )
      {
        m_spec.m_nfa_states.removeElement(nfa);
      }

  /***************************************************************
    Function: processStates
    Description: Adds the given NFA to the rule list of every
    lexical state whose bit is set in states.
    **************************************************************/
  private void processStates
    (
     SparseBitSet states,
     CNfa current
     )
      {
        int size;
        int i;

        size = m_spec.m_states.size();
        for (i = 0; i < size; ++i)
          {
            if (states.get(i))
              {
                m_spec.m_state_rules[i].addElement(current);
              }
          }
      }

  /***************************************************************
    Function: machine
    Description: Top level of the recursive descent regular
    expression parser. Parses one rule at a time, chains the
    rules together through m_next2, and registers each rule
    with the lexical states it applies to.
    **************************************************************/
  private CNfa machine
    (
     )
      throws java.io.IOException
      {
        CNfa start;
        CNfa p;
        SparseBitSet states;

        if (CUtility.DESCENT_DEBUG)
          {
            CUtility.enter("machine",m_spec.m_lexeme,m_spec.m_current_token);
          }

        start = CAlloc.newCNfa(m_spec);
        p = start;

        states = m_lexGen.getStates();

        /* Begin: Added for states. */
        m_spec.m_current_token = m_lexGen.EOS;
        m_lexGen.advance();
        /* End: Added for states. */

        if (m_lexGen.END_OF_INPUT != m_spec.m_current_token) // CSA fix.
          {
            p.m_next = rule();

            processStates(states,p.m_next);
          }

        while (m_lexGen.END_OF_INPUT != m_spec.m_current_token)
          {
            /* Make state changes HERE. */
            states = m_lexGen.getStates();

            /* Begin: Added for states. */
            m_lexGen.advance();
            if (m_lexGen.END_OF_INPUT == m_spec.m_current_token)
              {
                break;
              }
            /* End: Added for states. 
*/ p.m_next2 = CAlloc.newCNfa(m_spec); p = p.m_next2; p.m_next = rule(); processStates(states,p.m_next); } // CSA: add pseudo-rules for BOL and EOF SparseBitSet all_states = new SparseBitSet(); for (int i = 0; i < m_spec.m_states.size(); ++i) all_states.set(i); p.m_next2 = CAlloc.newCNfa(m_spec); p = p.m_next2; p.m_next = CAlloc.newCNfa(m_spec); p.m_next.m_edge = CNfa.CCL; p.m_next.m_next = CAlloc.newCNfa(m_spec); p.m_next.m_set = new CSet(); p.m_next.m_set.add(m_spec.BOL); p.m_next.m_set.add(m_spec.EOF); p.m_next.m_next.m_accept = // do-nothing accept rule new CAccept(new char[0], 0, m_input.m_line_number+1); processStates(all_states,p.m_next); // CSA: done. if (CUtility.DESCENT_DEBUG) { CUtility.leave("machine",m_spec.m_lexeme,m_spec.m_current_token); } return start; } /*************************************************************** Function: rule Description: Recursive descent regular expression parser. **************************************************************/ private CNfa rule ( ) throws java.io.IOException { CNfaPair pair; CNfa p; CNfa start = null; CNfa end = null; int anchor = CSpec.NONE; if (CUtility.DESCENT_DEBUG) { CUtility.enter("rule",m_spec.m_lexeme,m_spec.m_current_token); } pair = CAlloc.newCNfaPair(); if (m_lexGen.AT_BOL == m_spec.m_current_token) { anchor = anchor | CSpec.START; m_lexGen.advance(); expr(pair); // CSA: fixed beginning-of-line operator. 8-aug-1999 start = CAlloc.newCNfa(m_spec); start.m_edge = m_spec.BOL; start.m_next = pair.m_start; end = pair.m_end; } else { expr(pair); start = pair.m_start; end = pair.m_end; } if (m_lexGen.AT_EOL == m_spec.m_current_token) { m_lexGen.advance(); // CSA: fixed end-of-line operator. 
8-aug-1999 CNfaPair nlpair = CAlloc.newNLPair(m_spec); end.m_next = CAlloc.newCNfa(m_spec); end.m_next.m_next = nlpair.m_start; end.m_next.m_next2 = CAlloc.newCNfa(m_spec); end.m_next.m_next2.m_edge = m_spec.EOF; end.m_next.m_next2.m_next = nlpair.m_end; end = nlpair.m_end; anchor = anchor | CSpec.END; } /* Check for null rules. Charles Fischer found this bug. [CSA] */ if (end==null) CError.parse_error(CError.E_ZERO, m_input.m_line_number); /* Handle end of regular expression. See page 103. */ end.m_accept = m_lexGen.packAccept(); end.m_anchor = anchor; /* Begin: Removed for states. */ /*m_lexGen.advance();*/ /* End: Removed for states. */ if (CUtility.DESCENT_DEBUG) { CUtility.leave("rule",m_spec.m_lexeme,m_spec.m_current_token); } return start; } /*************************************************************** Function: expr Description: Recursive descent regular expression parser. **************************************************************/ private void expr ( CNfaPair pair ) throws java.io.IOException { CNfaPair e2_pair; CNfa p; if (CUtility.DESCENT_DEBUG) { CUtility.enter("expr",m_spec.m_lexeme,m_spec.m_current_token); } if (CUtility.DEBUG) { CUtility.ASSERT(null != pair); } e2_pair = CAlloc.newCNfaPair(); cat_expr(pair); while (m_lexGen.OR == m_spec.m_current_token) { m_lexGen.advance(); cat_expr(e2_pair); p = CAlloc.newCNfa(m_spec); p.m_next2 = e2_pair.m_start; p.m_next = pair.m_start; pair.m_start = p; p = CAlloc.newCNfa(m_spec); pair.m_end.m_next = p; e2_pair.m_end.m_next = p; pair.m_end = p; } if (CUtility.DESCENT_DEBUG) { CUtility.leave("expr",m_spec.m_lexeme,m_spec.m_current_token); } } /*************************************************************** Function: cat_expr Description: Recursive descent regular expression parser. 
**************************************************************/ private void cat_expr ( CNfaPair pair ) throws java.io.IOException { CNfaPair e2_pair; if (CUtility.DESCENT_DEBUG) { CUtility.enter("cat_expr",m_spec.m_lexeme,m_spec.m_current_token); } if (CUtility.DEBUG) { CUtility.ASSERT(null != pair); } e2_pair = CAlloc.newCNfaPair(); if (first_in_cat(m_spec.m_current_token)) { factor(pair); } while (first_in_cat(m_spec.m_current_token)) { factor(e2_pair); /* Destroy */ pair.m_end.mimic(e2_pair.m_start); discardCNfa(e2_pair.m_start); pair.m_end = e2_pair.m_end; } if (CUtility.DESCENT_DEBUG) { CUtility.leave("cat_expr",m_spec.m_lexeme,m_spec.m_current_token); } } /*************************************************************** Function: first_in_cat Description: Recursive descent regular expression parser. **************************************************************/ private boolean first_in_cat ( int token ) { switch (token) { case CLexGen.CLOSE_PAREN: case CLexGen.AT_EOL: case CLexGen.OR: case CLexGen.EOS: return false; case CLexGen.CLOSURE: case CLexGen.PLUS_CLOSE: case CLexGen.OPTIONAL: CError.parse_error(CError.E_CLOSE,m_input.m_line_number); return false; case CLexGen.CCL_END: CError.parse_error(CError.E_BRACKET,m_input.m_line_number); return false; case CLexGen.AT_BOL: CError.parse_error(CError.E_BOL,m_input.m_line_number); return false; default: break; } return true; } /*************************************************************** Function: factor Description: Recursive descent regular expression parser. 
**************************************************************/ private void factor ( CNfaPair pair ) throws java.io.IOException { CNfa start = null; CNfa end = null; if (CUtility.DESCENT_DEBUG) { CUtility.enter("factor",m_spec.m_lexeme,m_spec.m_current_token); } term(pair); if (m_lexGen.CLOSURE == m_spec.m_current_token || m_lexGen.PLUS_CLOSE == m_spec.m_current_token || m_lexGen.OPTIONAL == m_spec.m_current_token) { start = CAlloc.newCNfa(m_spec); end = CAlloc.newCNfa(m_spec); start.m_next = pair.m_start; pair.m_end.m_next = end; if (m_lexGen.CLOSURE == m_spec.m_current_token || m_lexGen.OPTIONAL == m_spec.m_current_token) { start.m_next2 = end; } if (m_lexGen.CLOSURE == m_spec.m_current_token || m_lexGen.PLUS_CLOSE == m_spec.m_current_token) { pair.m_end.m_next2 = pair.m_start; } pair.m_start = start; pair.m_end = end; m_lexGen.advance(); } if (CUtility.DESCENT_DEBUG) { CUtility.leave("factor",m_spec.m_lexeme,m_spec.m_current_token); } } /*************************************************************** Function: term Description: Recursive descent regular expression parser. 
**************************************************************/ private void term ( CNfaPair pair ) throws java.io.IOException { CNfa start; boolean isAlphaL; int c; if (CUtility.DESCENT_DEBUG) { CUtility.enter("term",m_spec.m_lexeme,m_spec.m_current_token); } if (m_lexGen.OPEN_PAREN == m_spec.m_current_token) { m_lexGen.advance(); expr(pair); if (m_lexGen.CLOSE_PAREN == m_spec.m_current_token) { m_lexGen.advance(); } else { CError.parse_error(CError.E_SYNTAX,m_input.m_line_number); } } else { start = CAlloc.newCNfa(m_spec); pair.m_start = start; start.m_next = CAlloc.newCNfa(m_spec); pair.m_end = start.m_next; if (m_lexGen.L == m_spec.m_current_token && Character.isLetter(m_spec.m_lexeme)) { isAlphaL = true; } else { isAlphaL = false; } if (false == (m_lexGen.ANY == m_spec.m_current_token || m_lexGen.CCL_START == m_spec.m_current_token || (m_spec.m_ignorecase && isAlphaL))) { start.m_edge = m_spec.m_lexeme; m_lexGen.advance(); } else { start.m_edge = CNfa.CCL; start.m_set = new CSet(); /* Match case-insensitive letters using character class. */ if (m_spec.m_ignorecase && isAlphaL) { start.m_set.addncase(m_spec.m_lexeme); } /* Match dot (.) using character class. 
*/ else if (m_lexGen.ANY == m_spec.m_current_token) { start.m_set.add('\n'); start.m_set.add('\r'); // CSA: exclude BOL and EOF from character classes start.m_set.add(m_spec.BOL); start.m_set.add(m_spec.EOF); start.m_set.complement(); } else { m_lexGen.advance(); if (m_lexGen.AT_BOL == m_spec.m_current_token) { m_lexGen.advance(); // CSA: exclude BOL and EOF from character classes start.m_set.add(m_spec.BOL); start.m_set.add(m_spec.EOF); start.m_set.complement(); } if (false == (m_lexGen.CCL_END == m_spec.m_current_token)) { dodash(start.m_set); } /*else { for (c = 0; c <= ' '; ++c) { start.m_set.add((byte) c); } }*/ } m_lexGen.advance(); } } if (CUtility.DESCENT_DEBUG) { CUtility.leave("term",m_spec.m_lexeme,m_spec.m_current_token); } } /*************************************************************** Function: dodash Description: Recursive descent regular expression parser. **************************************************************/ private void dodash ( CSet set ) throws java.io.IOException { int first = -1; if (CUtility.DESCENT_DEBUG) { CUtility.enter("dodash",m_spec.m_lexeme,m_spec.m_current_token); } while (m_lexGen.EOS != m_spec.m_current_token && m_lexGen.CCL_END != m_spec.m_current_token) { // DASH loses its special meaning if it is first in class. if (m_lexGen.DASH == m_spec.m_current_token && -1 != first) { m_lexGen.advance(); // DASH loses its special meaning if it is last in class. if (m_spec.m_current_token == m_lexGen.CCL_END) { // 'first' already in set. set.add('-'); break; } for ( ; first <= m_spec.m_lexeme; ++first) { if (m_spec.m_ignorecase) set.addncase((char)first); else set.add(first); } } else { first = m_spec.m_lexeme; if (m_spec.m_ignorecase) set.addncase(m_spec.m_lexeme); else set.add(m_spec.m_lexeme); } m_lexGen.advance(); } if (CUtility.DESCENT_DEBUG) { CUtility.leave("dodash",m_spec.m_lexeme,m_spec.m_current_token); } } } /** * Extract character classes from NFA and simplify. * @author C. 
Scott Ananian 25-Jul-1999 */ class CSimplifyNfa { private int[] ccls; // character class mapping. private int original_charset_size; // original charset size private int mapped_charset_size; // reduced charset size void simplify(CSpec m_spec) { computeClasses(m_spec); // initialize fields. // now rewrite the NFA using our character class mapping. for (Enumeration e=m_spec.m_nfa_states.elements(); e.hasMoreElements(); ) { CNfa nfa = (CNfa) e.nextElement(); if (nfa.m_edge==CNfa.EMPTY || nfa.m_edge==CNfa.EPSILON) continue; // no change. if (nfa.m_edge==CNfa.CCL) { CSet ncset = new CSet(); ncset.map(nfa.m_set, ccls); // map it. nfa.m_set = ncset; } else { // single character nfa.m_edge = ccls[nfa.m_edge]; // map it. } } // now update m_spec with the mapping. m_spec.m_ccls_map = ccls; m_spec.m_dtrans_ncols = mapped_charset_size; } /** Compute minimum set of character classes needed to disambiguate * edges. We optimistically assume that every character belongs to * a single character class, and then incrementally split classes * as we see edges that require discrimination between characters in * the class. [CSA, 25-Jul-1999] */ private void computeClasses(CSpec m_spec) { this.original_charset_size = m_spec.m_dtrans_ncols; this.ccls = new int[original_charset_size]; // initially all zero. int nextcls = 1; SparseBitSet clsA = new SparseBitSet(), clsB = new SparseBitSet(); Hashtable h = new Hashtable(); System.out.print("Working on character classes."); for (Enumeration e=m_spec.m_nfa_states.elements(); e.hasMoreElements(); ) { CNfa nfa = (CNfa) e.nextElement(); if (nfa.m_edge==CNfa.EMPTY || nfa.m_edge==CNfa.EPSILON) continue; // no discriminatory information. clsA.clearAll(); clsB.clearAll(); for (int i=0; i= m_spec.m_dtrans_ncols) { break; } if (CUtility.DEBUG) { CUtility.ASSERT(false == set.get(i)); CUtility.ASSERT(-1 == m_spec.m_col_map[i]); } set.set(i); m_spec.m_col_map[i] = reduced_ncols; /* UNDONE: Optimize by doing all comparisons in one batch. 
*/ for (j = i + 1; j < m_spec.m_dtrans_ncols; ++j) { if (-1 == m_spec.m_col_map[j] && true == col_equiv(i,j)) { m_spec.m_col_map[j] = reduced_ncols; } } } /* Reduce columns. */ k = 0; for (i = 0; i < m_spec.m_dtrans_ncols; ++i) { if (set.get(i)) { ++k; set.clear(i); j = m_spec.m_col_map[i]; if (CUtility.DEBUG) { CUtility.ASSERT(j <= i); } if (j == i) { continue; } col_copy(j,i); } } m_spec.m_dtrans_ncols = reduced_ncols; /* truncate m_dtrans at proper length (freeing extra) */ trunc_col(); if (CUtility.DEBUG) { CUtility.ASSERT(k == reduced_ncols); } /* Allocate row map. */ nrows = m_spec.m_dtrans_vector.size(); m_spec.m_row_map = new int[nrows]; for (i = 0; i < nrows; ++i) { m_spec.m_row_map[i] = -1; } /* Process rows to reduce. */ for (reduced_nrows = 0; ; ++reduced_nrows) { if (CUtility.DEBUG) { for (i = 0; i < reduced_nrows; ++i) { CUtility.ASSERT(-1 != m_spec.m_row_map[i]); } } for (i = reduced_nrows; i < nrows; ++i) { if (-1 == m_spec.m_row_map[i]) { break; } } if (i >= nrows) { break; } if (CUtility.DEBUG) { CUtility.ASSERT(false == set.get(i)); CUtility.ASSERT(-1 == m_spec.m_row_map[i]); } set.set(i); m_spec.m_row_map[i] = reduced_nrows; /* UNDONE: Optimize by doing all comparisons in one batch. */ for (j = i + 1; j < nrows; ++j) { if (-1 == m_spec.m_row_map[j] && true == row_equiv(i,j)) { m_spec.m_row_map[j] = reduced_nrows; } } } /* Reduce rows. */ k = 0; for (i = 0; i < nrows; ++i) { if (set.get(i)) { ++k; set.clear(i); j = m_spec.m_row_map[i]; if (CUtility.DEBUG) { CUtility.ASSERT(j <= i); } if (j == i) { continue; } row_copy(j,i); } } m_spec.m_dtrans_vector.setSize(reduced_nrows); if (CUtility.DEBUG) { /*System.out.println("k = " + k + "\nreduced_nrows = " + reduced_nrows + "");*/ CUtility.ASSERT(k == reduced_nrows); } } /*************************************************************** Function: fix_dtrans Description: Updates CDTrans table after minimization using groups, removing redundant transition table states. 
**************************************************************/ private void fix_dtrans ( ) { Vector new_vector; int i; int size; Vector dtrans_group; CDTrans first; int c; new_vector = new Vector(); size = m_spec.m_state_dtrans.length; for (i = 0; i < size; ++i) { if (CDTrans.F != m_spec.m_state_dtrans[i]) { m_spec.m_state_dtrans[i] = m_ingroup[m_spec.m_state_dtrans[i]]; } } size = m_group.size(); for (i = 0; i < size; ++i) { dtrans_group = (Vector) m_group.elementAt(i); first = (CDTrans) dtrans_group.elementAt(0); new_vector.addElement(first); for (c = 0; c < m_spec.m_dtrans_ncols; ++c) { if (CDTrans.F != first.m_dtrans[c]) { first.m_dtrans[c] = m_ingroup[first.m_dtrans[c]]; } } } m_group = null; m_spec.m_dtrans_vector = new_vector; } /*************************************************************** Function: minimize Description: Removes redundant transition table states. **************************************************************/ private void minimize ( ) { Vector dtrans_group; Vector new_group; int i; int j; int old_group_count; int group_count; CDTrans next; CDTrans first; int goto_first; int goto_next; int c; int group_size; boolean added; init_groups(); group_count = m_group.size(); old_group_count = group_count - 1; while (old_group_count != group_count) { old_group_count = group_count; if (CUtility.DEBUG) { CUtility.ASSERT(m_group.size() == group_count); } for (i = 0; i < group_count; ++i) { dtrans_group = (Vector) m_group.elementAt(i); group_size = dtrans_group.size(); if (group_size <= 1) { continue; } new_group = new Vector(); added = false; first = (CDTrans) dtrans_group.elementAt(0); for (j = 1; j < group_size; ++j) { next = (CDTrans) dtrans_group.elementAt(j); for (c = 0; c < m_spec.m_dtrans_ncols; ++c) { goto_first = first.m_dtrans[c]; goto_next = next.m_dtrans[c]; if (goto_first != goto_next && (goto_first == CDTrans.F || goto_next == CDTrans.F || m_ingroup[goto_next] != m_ingroup[goto_first])) { if (CUtility.DEBUG) { 
CUtility.ASSERT(dtrans_group.elementAt(j) == next); } dtrans_group.removeElementAt(j); --j; --group_size; new_group.addElement(next); if (false == added) { added = true; ++group_count; m_group.addElement(new_group); } m_ingroup[next.m_label] = m_group.size() - 1; if (CUtility.DEBUG) { CUtility.ASSERT(m_group.contains(new_group) == true); CUtility.ASSERT(m_group.contains(dtrans_group) == true); CUtility.ASSERT(dtrans_group.contains(first) == true); CUtility.ASSERT(dtrans_group.contains(next) == false); CUtility.ASSERT(new_group.contains(first) == false); CUtility.ASSERT(new_group.contains(next) == true); CUtility.ASSERT(dtrans_group.size() == group_size); CUtility.ASSERT(i == m_ingroup[first.m_label]); CUtility.ASSERT((m_group.size() - 1) == m_ingroup[next.m_label]); } break; } } } } } System.out.println(m_group.size() + " states after removal of redundant states."); if (m_spec.m_verbose && true == CUtility.OLD_DUMP_DEBUG) { System.out.println(); System.out.println("States grouped as follows after minimization"); pgroups(); } fix_dtrans(); } /*************************************************************** Function: init_groups Description: **************************************************************/ private void init_groups ( ) { int i; int j; int group_count; int size; CAccept accept; CDTrans dtrans; Vector dtrans_group; CDTrans first; boolean group_found; m_group = new Vector(); group_count = 0; size = m_spec.m_dtrans_vector.size(); m_ingroup = new int[size]; for (i = 0; i < size; ++i) { group_found = false; dtrans = (CDTrans) m_spec.m_dtrans_vector.elementAt(i); if (CUtility.DEBUG) { CUtility.ASSERT(i == dtrans.m_label); CUtility.ASSERT(false == group_found); CUtility.ASSERT(group_count == m_group.size()); } for (j = 0; j < group_count; ++j) { dtrans_group = (Vector) m_group.elementAt(j); if (CUtility.DEBUG) { CUtility.ASSERT(false == group_found); CUtility.ASSERT(0 < dtrans_group.size()); } first = (CDTrans) dtrans_group.elementAt(0); if (CUtility.SLOW_DEBUG) 
{ CDTrans check; int k; int s; s = dtrans_group.size(); CUtility.ASSERT(0 < s); for (k = 1; k < s; ++k) { check = (CDTrans) dtrans_group.elementAt(k); CUtility.ASSERT(check.m_accept == first.m_accept); } } if (first.m_accept == dtrans.m_accept) { dtrans_group.addElement(dtrans); m_ingroup[i] = j; group_found = true; if (CUtility.DEBUG) { CUtility.ASSERT(j == m_ingroup[dtrans.m_label]); } break; } } if (false == group_found) { dtrans_group = new Vector(); dtrans_group.addElement(dtrans); m_ingroup[i] = m_group.size(); m_group.addElement(dtrans_group); ++group_count; } } if (m_spec.m_verbose && true == CUtility.OLD_DUMP_DEBUG) { System.out.println("Initial grouping:"); pgroups(); System.out.println(); } } /*************************************************************** Function: pset **************************************************************/ private void pset ( Vector dtrans_group ) { int i; int size; CDTrans dtrans; size = dtrans_group.size(); for (i = 0; i < size; ++i) { dtrans = (CDTrans) dtrans_group.elementAt(i); System.out.print(dtrans.m_label + " "); } } /*************************************************************** Function: pgroups **************************************************************/ private void pgroups ( ) { int i; int dtrans_size; int group_size; group_size = m_group.size(); for (i = 0; i < group_size; ++i) { System.out.print("\tGroup " + i + " {"); pset((Vector) m_group.elementAt(i)); System.out.println("}"); System.out.println(); } System.out.println(); dtrans_size = m_spec.m_dtrans_vector.size(); for (i = 0; i < dtrans_size; ++i) { System.out.println("\tstate " + i + " is in group " + m_ingroup[i]); } } } /*************************************************************** Class: CNfa2Dfa **************************************************************/ class CNfa2Dfa { /*************************************************************** Member Variables **************************************************************/ private CSpec m_spec; 
private int m_unmarked_dfa; private CLexGen m_lexGen; /*************************************************************** Constants **************************************************************/ private static final int NOT_IN_DSTATES = -1; /*************************************************************** Function: CNfa2Dfa **************************************************************/ CNfa2Dfa ( ) { reset(); } /*************************************************************** Function: set Description: **************************************************************/ private void set ( CLexGen lexGen, CSpec spec ) { m_lexGen = lexGen; m_spec = spec; m_unmarked_dfa = 0; } /*************************************************************** Function: reset Description: **************************************************************/ private void reset ( ) { m_lexGen = null; m_spec = null; m_unmarked_dfa = 0; } /*************************************************************** Function: make_dfa Description: High-level access function to module. **************************************************************/ void make_dfa ( CLexGen lexGen, CSpec spec ) { int i; reset(); set(lexGen,spec); make_dtrans(); free_nfa_states(); if (m_spec.m_verbose && true == CUtility.OLD_DUMP_DEBUG) { System.out.println(m_spec.m_dfa_states.size() + " DFA states in original machine."); } free_dfa_states(); } /*************************************************************** Function: make_dtrans Description: Creates uncompressed CDTrans transition table. **************************************************************/ private void make_dtrans ( ) /* throws java.lang.CloneNotSupportedException*/ { CDfa next; CDfa dfa; CBunch bunch; int i; int nextstate; int size; CDTrans dtrans; CNfa nfa; int istate; int nstates; System.out.print("Working on DFA states."); /* Reference passing type and initializations. */ bunch = new CBunch(); m_unmarked_dfa = 0; /* Allocate mapping array. 
*/
    nstates = m_spec.m_state_rules.length;
    m_spec.m_state_dtrans = new int[nstates];

    for (istate = 0; nstates > istate; ++istate) {
      /* CSA bugfix: if we skip all zero size rules, then a
         specification with no rules produces an illegal
         lexer (0 states) instead of a lexer that rejects
         everything (1 nonaccepting state). [27-Jul-1999]
      if (0 == m_spec.m_state_rules[istate].size()) {
        m_spec.m_state_dtrans[istate] = CDTrans.F;
        continue;
      }
      */

      /* Create start state and initialize fields. */
      bunch.m_nfa_set = (Vector) m_spec.m_state_rules[istate].clone();
      sortStates(bunch.m_nfa_set);

      bunch.m_nfa_bit = new SparseBitSet();

      /* Initialize bit set. */
      size = bunch.m_nfa_set.size();
      for (i = 0; size > i; ++i) {
        nfa = (CNfa) bunch.m_nfa_set.elementAt(i);
        bunch.m_nfa_bit.set(nfa.m_label);
      }

      bunch.m_accept = null;
      bunch.m_anchor = CSpec.NONE;
      bunch.m_accept_index = CUtility.INT_MAX;

      e_closure(bunch);
      add_to_dstates(bunch);

      m_spec.m_state_dtrans[istate] = m_spec.m_dtrans_vector.size();

      /* Main loop of CDTrans creation. */
      while (null != (dfa = get_unmarked())) {
        System.out.print(".");
        System.out.flush();

        if (CUtility.DEBUG) {
          CUtility.ASSERT(false == dfa.m_mark);
        }

        /* Get first unmarked node, then mark it. */
        dfa.m_mark = true;

        /* Allocate new CDTrans, then initialize fields. */
        dtrans = new CDTrans(m_spec.m_dtrans_vector.size(),m_spec);
        dtrans.m_accept = dfa.m_accept;
        dtrans.m_anchor = dfa.m_anchor;

        /* Set CDTrans array for each character transition. */
        for (i = 0; i < m_spec.m_dtrans_ncols; ++i) {
          if (CUtility.DEBUG) {
            CUtility.ASSERT(0 <= i);
            CUtility.ASSERT(m_spec.m_dtrans_ncols > i);
          }

          /* Create new dfa set by attempting character transition. */
          move(dfa.m_nfa_set,dfa.m_nfa_bit,i,bunch);
          if (null != bunch.m_nfa_set) {
            e_closure(bunch);
          }

          if (CUtility.DEBUG) {
            CUtility.ASSERT((null == bunch.m_nfa_set
                             && null == bunch.m_nfa_bit)
                            || (null != bunch.m_nfa_set
                                && null != bunch.m_nfa_bit));
          }

          /* Create new state or set state to empty.
*/ if (null == bunch.m_nfa_set) { nextstate = CDTrans.F; } else { nextstate = in_dstates(bunch); if (NOT_IN_DSTATES == nextstate) { nextstate = add_to_dstates(bunch); } } if (CUtility.DEBUG) { CUtility.ASSERT(nextstate < m_spec.m_dfa_states.size()); } dtrans.m_dtrans[i] = nextstate; } if (CUtility.DEBUG) { CUtility.ASSERT(m_spec.m_dtrans_vector.size() == dfa.m_label); } m_spec.m_dtrans_vector.addElement(dtrans); } } System.out.println(); } /*************************************************************** Function: free_dfa_states **************************************************************/ private void free_dfa_states ( ) { m_spec.m_dfa_states = null; m_spec.m_dfa_sets = null; } /*************************************************************** Function: free_nfa_states **************************************************************/ private void free_nfa_states ( ) { /* UNDONE: Remove references to nfas from within dfas. */ /* UNDONE: Don't free CAccepts. */ m_spec.m_nfa_states = null; m_spec.m_nfa_start = null; m_spec.m_state_rules = null; } /*************************************************************** Function: e_closure Description: Alters and returns input set. **************************************************************/ private void e_closure ( CBunch bunch ) { Stack nfa_stack; int size; int i; CNfa state; /* Debug checks. */ if (CUtility.DEBUG) { CUtility.ASSERT(null != bunch); CUtility.ASSERT(null != bunch.m_nfa_set); CUtility.ASSERT(null != bunch.m_nfa_bit); } bunch.m_accept = null; bunch.m_anchor = CSpec.NONE; bunch.m_accept_index = CUtility.INT_MAX; /* Create initial stack. */ nfa_stack = new Stack(); size = bunch.m_nfa_set.size(); for (i = 0; i < size; ++i) { state = (CNfa) bunch.m_nfa_set.elementAt(i); if (CUtility.DEBUG) { CUtility.ASSERT(bunch.m_nfa_bit.get(state.m_label)); } nfa_stack.push(state); } /* Main loop. 
*/ while (false == nfa_stack.empty()) { state = (CNfa) nfa_stack.pop(); if (CUtility.OLD_DUMP_DEBUG) { if (null != state.m_accept) { System.out.println("Looking at accepting state " + state.m_label + " with <" + (new String(state.m_accept.m_action,0, state.m_accept.m_action_read)) + ">"); } } if (null != state.m_accept && state.m_label < bunch.m_accept_index) { bunch.m_accept_index = state.m_label; bunch.m_accept = state.m_accept; bunch.m_anchor = state.m_anchor; if (CUtility.OLD_DUMP_DEBUG) { System.out.println("Found accepting state " + state.m_label + " with <" + (new String(state.m_accept.m_action,0, state.m_accept.m_action_read)) + ">"); } if (CUtility.DEBUG) { CUtility.ASSERT(null != bunch.m_accept); CUtility.ASSERT(CSpec.NONE == bunch.m_anchor || 0 != (bunch.m_anchor & CSpec.END) || 0 != (bunch.m_anchor & CSpec.START)); } } if (CNfa.EPSILON == state.m_edge) { if (null != state.m_next) { if (false == bunch.m_nfa_set.contains(state.m_next)) { if (CUtility.DEBUG) { CUtility.ASSERT(false == bunch.m_nfa_bit.get(state.m_next.m_label)); } bunch.m_nfa_bit.set(state.m_next.m_label); bunch.m_nfa_set.addElement(state.m_next); nfa_stack.push(state.m_next); } } if (null != state.m_next2) { if (false == bunch.m_nfa_set.contains(state.m_next2)) { if (CUtility.DEBUG) { CUtility.ASSERT(false == bunch.m_nfa_bit.get(state.m_next2.m_label)); } bunch.m_nfa_bit.set(state.m_next2.m_label); bunch.m_nfa_set.addElement(state.m_next2); nfa_stack.push(state.m_next2); } } } } if (null != bunch.m_nfa_set) { sortStates(bunch.m_nfa_set); } return; } /*************************************************************** Function: move Description: Returns null if resulting NFA set is empty. 
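
Illustration only, not part of JLex: a minimal sketch of the move
step on a toy NFA in which each state has at most one labelled edge.
The class MoveSketch and its edge/next arrays are assumptions made
for this example. A TreeSet stands in for the Vector that JLex sorts
by label, and the result stays null when no state has an edge on the
input, matching the contract stated above. Character-class edges
(CNfa.CCL) are left out.

```java
final class MoveSketch {
    static final int EPSILON = -1; // stand-in for CNfa.EPSILON

    // edge[s] is the character on the single outgoing edge of state s
    // (or EPSILON); next[s] is the target of that edge.
    static java.util.TreeSet<Integer> move(java.util.TreeSet<Integer> states,
                                           int[] edge, int[] next, int c) {
        java.util.TreeSet<Integer> result = null; // null means "empty set"
        for (int s : states) {
            if (edge[s] == c) {
                if (result == null) {
                    result = new java.util.TreeSet<Integer>();
                }
                result.add(next[s]);
            }
        }
        return result;
    }
}
```

A caller would then take the epsilon closure of any non-null result,
as make_dtrans does before looking the set up among the DFA states.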
**************************************************************/ void move ( Vector nfa_set, SparseBitSet nfa_bit, int b, CBunch bunch ) { int size; int index; CNfa state; bunch.m_nfa_set = null; bunch.m_nfa_bit = null; size = nfa_set.size(); for (index = 0; index < size; ++index) { state = (CNfa) nfa_set.elementAt(index); if (b == state.m_edge || (CNfa.CCL == state.m_edge && true == state.m_set.contains(b))) { if (null == bunch.m_nfa_set) { if (CUtility.DEBUG) { CUtility.ASSERT(null == bunch.m_nfa_bit); } bunch.m_nfa_set = new Vector(); /*bunch.m_nfa_bit = new SparseBitSet(m_spec.m_nfa_states.size());*/ bunch.m_nfa_bit = new SparseBitSet(); } bunch.m_nfa_set.addElement(state.m_next); /*System.out.println("Size of bitset: " + bunch.m_nfa_bit.size()); System.out.println("Reference index: " + state.m_next.m_label); System.out.flush();*/ bunch.m_nfa_bit.set(state.m_next.m_label); } } if (null != bunch.m_nfa_set) { if (CUtility.DEBUG) { CUtility.ASSERT(null != bunch.m_nfa_bit); } sortStates(bunch.m_nfa_set); } return; } /*************************************************************** Function: sortStates **************************************************************/ private void sortStates ( Vector nfa_set ) { CNfa elem; int begin; int size; int index; int value; int smallest_index; int smallest_value; CNfa begin_elem; size = nfa_set.size(); for (begin = 0; begin < size; ++begin) { elem = (CNfa) nfa_set.elementAt(begin); smallest_value = elem.m_label; smallest_index = begin; for (index = begin + 1; index < size; ++index) { elem = (CNfa) nfa_set.elementAt(index); value = elem.m_label; if (value < smallest_value) { smallest_index = index; smallest_value = value; } } begin_elem = (CNfa) nfa_set.elementAt(begin); elem = (CNfa) nfa_set.elementAt(smallest_index); nfa_set.setElementAt(elem,begin); nfa_set.setElementAt(begin_elem,smallest_index); } if (CUtility.OLD_DEBUG) { System.out.print("NFA vector indices: "); for (index = 0; index < size; ++index) { elem = (CNfa) 
nfa_set.elementAt(index); System.out.print(elem.m_label + " "); } System.out.println(); } return; } /*************************************************************** Function: get_unmarked Description: Returns next unmarked DFA state. **************************************************************/ private CDfa get_unmarked ( ) { int size; CDfa dfa; size = m_spec.m_dfa_states.size(); while (m_unmarked_dfa < size) { dfa = (CDfa) m_spec.m_dfa_states.elementAt(m_unmarked_dfa); if (false == dfa.m_mark) { if (CUtility.OLD_DUMP_DEBUG) { System.out.print("*"); System.out.flush(); } if (m_spec.m_verbose && true == CUtility.OLD_DUMP_DEBUG) { System.out.println("---------------"); System.out.print("working on DFA state " + m_unmarked_dfa + " = NFA states: "); m_lexGen.print_set(dfa.m_nfa_set); System.out.println(); } return dfa; } ++m_unmarked_dfa; } return null; } /*************************************************************** function: add_to_dstates Description: Takes as input a CBunch with details of a dfa state that needs to be created. 1) Allocates a new dfa state and saves it in the appropriate CSpec vector. 2) Initializes the fields of the dfa state with the information in the CBunch. 3) Returns index of new dfa. **************************************************************/ private int add_to_dstates ( CBunch bunch ) { CDfa dfa; if (CUtility.DEBUG) { CUtility.ASSERT(null != bunch.m_nfa_set); CUtility.ASSERT(null != bunch.m_nfa_bit); CUtility.ASSERT(null != bunch.m_accept || CSpec.NONE == bunch.m_anchor); } /* Allocate, passing CSpec so dfa label can be set. */ dfa = CAlloc.newCDfa(m_spec); /* Initialize fields, including the mark field. */ dfa.m_nfa_set = (Vector) bunch.m_nfa_set.clone(); dfa.m_nfa_bit = (SparseBitSet) bunch.m_nfa_bit.clone(); dfa.m_accept = bunch.m_accept; dfa.m_anchor = bunch.m_anchor; dfa.m_mark = false; /* Register dfa state using BitSet in CSpec Hashtable. 
*/
    m_spec.m_dfa_sets.put(dfa.m_nfa_bit,dfa);
    /*registerCDfa(dfa);*/

    if (CUtility.OLD_DUMP_DEBUG) {
      System.out.print("Registering set : ");
      m_lexGen.print_set(dfa.m_nfa_set);
      System.out.println();
    }

    return dfa.m_label;
  }

  /***************************************************************
    Function: in_dstates
    **************************************************************/
  private int in_dstates ( CBunch bunch ) {
    CDfa dfa;

    if (CUtility.OLD_DEBUG) {
      System.out.print("Looking for set : ");
      m_lexGen.print_set(bunch.m_nfa_set);
    }

    dfa = (CDfa) m_spec.m_dfa_sets.get(bunch.m_nfa_bit);

    if (null != dfa) {
      if (CUtility.OLD_DUMP_DEBUG) {
        System.out.println(" FOUND!");
      }
      return dfa.m_label;
    }

    if (CUtility.OLD_DUMP_DEBUG) {
      System.out.println(" NOT FOUND!");
    }
    return NOT_IN_DSTATES;
  }
}

/***************************************************************
  Class: CAlloc
  **************************************************************/
class CAlloc
{
  /***************************************************************
    Function: newCDfa
    **************************************************************/
  static CDfa newCDfa ( CSpec spec ) {
    CDfa dfa;

    dfa = new CDfa(spec.m_dfa_states.size());
    spec.m_dfa_states.addElement(dfa);

    return dfa;
  }

  /***************************************************************
    Function: newCNfaPair
    Description:
    **************************************************************/
  static CNfaPair newCNfaPair ( ) {
    CNfaPair pair = new CNfaPair();

    return pair;
  }

  /***************************************************************
    Function: newNLPair
    Description: return a new CNfaPair that matches a new
                 line: (\r\n?|[\n\uu2028\uu2029])
                 Added by CSA 8-Aug-1999, updated 10-Aug-1999
    **************************************************************/
  static CNfaPair newNLPair(CSpec spec) {
    CNfaPair pair = newCNfaPair();
    pair.m_end=newCNfa(spec); // newline accepting state
    pair.m_start=newCNfa(spec); // new state with two epsilon edges
    pair.m_start.m_next = newCNfa(spec);
    pair.m_start.m_next.m_edge = CNfa.CCL;
    pair.m_start.m_next.m_set = new CSet();
    pair.m_start.m_next.m_set.add('\n');
    /* The code points are hexadecimal: U+2028 is 0x2028, not decimal 2028. */
    if (spec.m_dtrans_ncols-CSpec.NUM_PSEUDO > 0x2029) {
      pair.m_start.m_next.m_set.add(0x2028); /*U+2028 is LS, the line separator*/
      pair.m_start.m_next.m_set.add(0x2029); /*U+2029 is PS, the paragraph sep.*/
    }
    pair.m_start.m_next.m_next = pair.m_end; // accept '\n', U+2028, or U+2029
    pair.m_start.m_next2 = newCNfa(spec);
    pair.m_start.m_next2.m_edge = '\r';
    pair.m_start.m_next2.m_next = newCNfa(spec);
    pair.m_start.m_next2.m_next.m_next = pair.m_end; // accept '\r';
    pair.m_start.m_next2.m_next.m_next2 = newCNfa(spec);
    pair.m_start.m_next2.m_next.m_next2.m_edge = '\n';
    pair.m_start.m_next2.m_next.m_next2.m_next = pair.m_end; // accept '\r\n';
    return pair;
  }

  /***************************************************************
    Function: newCNfa
    Description:
    **************************************************************/
  static CNfa newCNfa ( CSpec spec ) {
    CNfa p;

    /* UNDONE: Buffer this? */

    p = new CNfa();

    /*p.m_label = spec.m_nfa_states.size();*/
    spec.m_nfa_states.addElement(p);
    p.m_edge = CNfa.EPSILON;

    return p;
  }
}

/***************************************************************
  Class: Main
  Description: Top-level lexical analyzer generator function.
  **************************************************************/
public class Main
{
  /***************************************************************
    Function: main
    **************************************************************/
  public static void main ( String arg[] )
    throws java.io.IOException {
    CLexGen lg;

    if (arg.length < 1) {
      System.out.println("Usage: JLex.Main <filename>");
      return;
    }

    /* Note: For debugging, it may be helpful to remove the try/catch
       block and permit the Exception to propagate to the top level.
       This gives more information.
*/ try { lg = new CLexGen(arg[0]); lg.generate(); } catch (Error e) { System.out.println(e.getMessage()); } } } /*************************************************************** Class: CDTrans **************************************************************/ class CDTrans { /************************************************************* Member Variables ***********************************************************/ int m_dtrans[]; CAccept m_accept; int m_anchor; int m_label; /************************************************************* Constants ***********************************************************/ static final int F = -1; /************************************************************* Function: CTrans ***********************************************************/ CDTrans ( int label, CSpec spec ) { m_dtrans = new int[spec.m_dtrans_ncols]; m_accept = null; m_anchor = CSpec.NONE; m_label = label; } } /*************************************************************** Class: CDfa **************************************************************/ class CDfa { /*************************************************************** Member Variables ***********************************************************/ int m_group; boolean m_mark; CAccept m_accept; int m_anchor; Vector m_nfa_set; SparseBitSet m_nfa_bit; int m_label; /*************************************************************** Function: CDfa **************************************************************/ CDfa ( int label ) { m_group = 0; m_mark = false; m_accept = null; m_anchor = CSpec.NONE; m_nfa_set = null; m_nfa_bit = null; m_label = label; } } /*************************************************************** Class: CAccept **************************************************************/ class CAccept { /*************************************************************** Member Variables **************************************************************/ char m_action[]; int m_action_read; int m_line_number; 
/*************************************************************** Function: CAccept **************************************************************/ CAccept ( char action[], int action_read, int line_number ) { int elem; m_action_read = action_read; m_action = new char[m_action_read]; for (elem = 0; elem < m_action_read; ++elem) { m_action[elem] = action[elem]; } m_line_number = line_number; } /*************************************************************** Function: CAccept **************************************************************/ CAccept ( CAccept accept ) { int elem; m_action_read = accept.m_action_read; m_action = new char[m_action_read]; for (elem = 0; elem < m_action_read; ++elem) { m_action[elem] = accept.m_action[elem]; } m_line_number = accept.m_line_number; } /*************************************************************** Function: mimic **************************************************************/ void mimic ( CAccept accept ) { int elem; m_action_read = accept.m_action_read; m_action = new char[m_action_read]; for (elem = 0; elem < m_action_read; ++elem) { m_action[elem] = accept.m_action[elem]; } } } /*************************************************************** Class: CAcceptAnchor **************************************************************/ class CAcceptAnchor { /*************************************************************** Member Variables **************************************************************/ CAccept m_accept; int m_anchor; /*************************************************************** Function: CAcceptAnchor **************************************************************/ CAcceptAnchor ( ) { m_accept = null; m_anchor = CSpec.NONE; } } /*************************************************************** Class: CNfaPair **************************************************************/ class CNfaPair { /*************************************************************** Member Variables 
**************************************************************/ CNfa m_start; CNfa m_end; /*************************************************************** Function: CNfaPair **************************************************************/ CNfaPair ( ) { m_start = null; m_end = null; } } /*************************************************************** Class: CInput Description: **************************************************************/ class CInput { /*************************************************************** Member Variables **************************************************************/ private java.io.BufferedReader m_input; /* JLex specification file. */ boolean m_eof_reached; /* Whether EOF has been encountered. */ boolean m_pushback_line; char m_line[]; /* Line buffer. */ int m_line_read; /* Number of bytes read into line buffer. */ int m_line_index; /* Current index into line buffer. */ int m_line_number; /* Current line number. */ /*************************************************************** Constants **************************************************************/ static final boolean EOF = true; static final boolean NOT_EOF = false; /*************************************************************** Function: CInput Description: **************************************************************/ CInput ( java.io.Reader input ) { if (CUtility.DEBUG) { CUtility.ASSERT(null != input); } /* Initialize input stream. */ m_input = new java.io.BufferedReader(input); /* Initialize buffers and index counters. */ m_line = null; m_line_read = 0; m_line_index = 0; /* Initialize state variables. */ m_eof_reached = false; m_line_number = 0; m_pushback_line = false; } /*************************************************************** Function: getLine Description: Returns true on EOF, false otherwise. Guarantees not to return a blank line, or a line of zero length. 
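
Illustration only, not part of JLex: a minimal sketch of the
"never return a blank line" guarantee described above. The class
NonBlankLineSketch and its method name are assumptions made for this
example. It returns null at end of input instead of the EOF flag,
and it uses String.trim(), which treats every character up to
U+0020 as blank and is therefore slightly broader than
CUtility.isspace. The pushback path is left out.

```java
final class NonBlankLineSketch {
    // Returns the next line that contains a non-blank character, with
    // "\n" appended, or null once the reader is exhausted.
    static String nextNonBlankLine(java.io.BufferedReader in)
        throws java.io.IOException {
        String line;
        while ((line = in.readLine()) != null) {
            if (!line.trim().isEmpty()) {
                return line + "\n";
            }
        }
        return null;
    }
}
```

Appending "\n" mirrors what getLine does after readLine() has
stripped the platform line terminator.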
**************************************************************/ boolean getLine ( ) throws java.io.IOException { String lineStr; int elem; /* Has EOF already been reached? */ if (m_eof_reached) { return EOF; } /* Pushback current line? */ if (m_pushback_line) { m_pushback_line = false; /* Check for empty line. */ for (elem = 0; elem < m_line_read; ++elem) { if (false == CUtility.isspace(m_line[elem])) { break; } } /* Nonempty? */ if (elem < m_line_read) { m_line_index = 0; return NOT_EOF; } } while (true) { if (null == (lineStr = m_input.readLine())) { m_eof_reached = true; m_line_index = 0; return EOF; } m_line = (lineStr + "\n").toCharArray(); m_line_read=m_line.length; ++m_line_number; /* Check for empty lines and discard them. */ elem = 0; while (CUtility.isspace(m_line[elem])) { ++elem; if (elem == m_line_read) { break; } } if (elem < m_line_read) { break; } } m_line_index = 0; return NOT_EOF; } } /******************************************************** Class: Utility *******************************************************/ class CUtility { /******************************************************** Constants *******************************************************/ static final boolean DEBUG = true; static final boolean SLOW_DEBUG = true; static final boolean DUMP_DEBUG = true; /*static final boolean DEBUG = false; static final boolean SLOW_DEBUG = false; static final boolean DUMP_DEBUG = false;*/ static final boolean DESCENT_DEBUG = false; static final boolean OLD_DEBUG = false; static final boolean OLD_DUMP_DEBUG = false; static final boolean FOODEBUG = false; static final boolean DO_DEBUG = false; /******************************************************** Constants: Integer Bounds *******************************************************/ static final int INT_MAX = 2147483647; static final int MAX_SEVEN_BIT = 127; static final int MAX_EIGHT_BIT = 255; static final int MAX_SIXTEEN_BIT=65535; /******************************************************** Function: 
enter
    Description: Debugging routine.
    *******************************************************/
  static void enter ( String descent, char lexeme, int token ) {
    System.out.println("Entering " + descent
                       + " [lexeme: " + lexeme
                       + "] [token: " + token + "]");
  }

  /********************************************************
    Function: leave
    Description: Debugging routine.
    *******************************************************/
  static void leave ( String descent, char lexeme, int token ) {
    System.out.println("Leaving " + descent
                       + " [lexeme:" + lexeme
                       + "] [token:" + token + "]");
  }

  /********************************************************
    Function: ASSERT
    Description: Debugging routine.
    *******************************************************/
  static void ASSERT ( boolean expr ) {
    if (DEBUG && false == expr) {
      System.out.println("Assertion Failed");
      throw new Error("Assertion Failed.");
    }
  }

  /***************************************************************
    Function: doubleSize
    **************************************************************/
  static char[] doubleSize ( char oldBuffer[] ) {
    char newBuffer[] = new char[2 * oldBuffer.length];
    int elem;

    for (elem = 0; elem < oldBuffer.length; ++elem) {
      newBuffer[elem] = oldBuffer[elem];
    }

    return newBuffer;
  }

  /***************************************************************
    Function: doubleSize
    **************************************************************/
  static byte[] doubleSize ( byte oldBuffer[] ) {
    byte newBuffer[] = new byte[2 * oldBuffer.length];
    int elem;

    for (elem = 0; elem < oldBuffer.length; ++elem) {
      newBuffer[elem] = oldBuffer[elem];
    }

    return newBuffer;
  }

  /********************************************************
    Function: hex2bin
    *******************************************************/
  static char hex2bin ( char c ) {
    if ('0' <= c && '9' >= c) {
      return (char) (c - '0');
    } else if ('a' <= c && 'f' >= c) {
      return (char) (c - 'a' + 10);
    } else if ('A' <= c && 'F' >= c) {
      return (char) (c - 'A' + 10);
    }

    CError.impos("Bad hexadecimal digit " + c);
    return 0;
  }

  /********************************************************
    Function: ishexdigit
    *******************************************************/
  static boolean ishexdigit ( char c ) {
    if (('0' <= c && '9' >= c)
        || ('a' <= c && 'f' >= c)
        || ('A' <= c && 'F' >= c)) {
      return true;
    }

    return false;
  }

  /********************************************************
    Function: oct2bin
    *******************************************************/
  static char oct2bin ( char c ) {
    if ('0' <= c && '7' >= c) {
      return (char) (c - '0');
    }

    CError.impos("Bad octal digit " + c);
    return 0;
  }

  /********************************************************
    Function: isoctdigit
    *******************************************************/
  static boolean isoctdigit ( char c ) {
    if ('0' <= c && '7' >= c) {
      return true;
    }

    return false;
  }

  /********************************************************
    Function: isspace
    *******************************************************/
  static boolean isspace ( char c ) {
    if ('\b' == c
        || '\t' == c
        || '\n' == c
        || '\f' == c
        || '\r' == c
        || ' ' == c) {
      return true;
    }

    return false;
  }

  /********************************************************
    Function: isnewline
    *******************************************************/
  static boolean isnewline ( char c ) {
    if ('\n' == c
        || '\r' == c) {
      return true;
    }

    return false;
  }

  /********************************************************
    Function: bytencmp
    Description: Compares up to n elements of
    byte array a[] against byte array b[].
    The first byte comparison is made between
    a[a_first] and b[b_first].  Comparisons continue
    until the null terminating byte '\0' is reached
    or until n bytes are compared.
    Return Value: Returns 0 if arrays are the
    same up to and including the null terminating byte
    or up to and including the first n bytes,
    whichever comes first.
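
Illustration only, not part of JLex: a minimal sketch of the same
scan, written for char arrays that start at index 0. The class
NCompareSketch and the method ncmp are assumptions made for this
example. It copies the sign convention of the body that follows,
which returns 1 when the element of a is smaller and -1 when it is
larger. That is the opposite of C's strncmp, so callers should test
the result against 0 rather than rely on its sign. Like the routine
it mirrors, it does no bounds checking.

```java
final class NCompareSketch {
    static int ncmp(char[] a, char[] b, int n) {
        for (int i = 0; i < n; ++i) {
            if (a[i] == '\0' && b[i] == '\0') {
                return 0;  // both terminated at the same place: equal
            }
            if (a[i] < b[i]) {
                return 1;  // positive when a sorts first
            }
            if (a[i] > b[i]) {
                return -1; // negative when a sorts last
            }
        }
        return 0; // the first n elements matched
    }
}
```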
*******************************************************/
  static int bytencmp(byte a[], int a_first, byte b[], int b_first, int n) {
    int elem;

    for (elem = 0; elem < n; ++elem) {
      /*System.out.print((char) a[a_first + elem]);
        System.out.print((char) b[b_first + elem]);*/

      if ('\0' == a[a_first + elem] && '\0' == b[b_first + elem]) {
        /*System.out.println("return 0");*/
        return 0;
      }
      if (a[a_first + elem] < b[b_first + elem]) {
        /*System.out.println("return 1");*/
        return 1;
      }
      else if (a[a_first + elem] > b[b_first + elem]) {
        /*System.out.println("return -1");*/
        return -1;
      }
    }

    /*System.out.println("return 0");*/
    return 0;
  }

  /********************************************************
    Function: charncmp
    Description: Character-array counterpart of bytencmp,
    with the same comparison rules and return values.
   *******************************************************/
  static int charncmp(char a[], int a_first, char b[], int b_first, int n) {
    int elem;

    for (elem = 0; elem < n; ++elem) {
      if ('\0' == a[a_first + elem] && '\0' == b[b_first + elem]) {
        return 0;
      }
      if (a[a_first + elem] < b[b_first + elem]) {
        return 1;
      }
      else if (a[a_first + elem] > b[b_first + elem]) {
        return -1;
      }
    }

    return 0;
  }
}

/********************************************************
  Class: CError
 *******************************************************/
class CError {
  /********************************************************
    Function: impos
    Description: Prints an internal error message.
    It only prints and does not throw, so the caller
    continues afterwards.
   *******************************************************/
  static void impos(String message) {
    System.out.println("JLex Error: " + message);
  }

  /********************************************************
    Constants
    Description: Error codes for parse_error().
*******************************************************/
  static final int E_BADEXPR = 0;
  static final int E_PAREN = 1;
  static final int E_LENGTH = 2;
  static final int E_BRACKET = 3;
  static final int E_BOL = 4;
  static final int E_CLOSE = 5;
  static final int E_NEWLINE = 6;
  static final int E_BADMAC = 7;
  static final int E_NOMAC = 8;
  static final int E_MACDEPTH = 9;
  static final int E_INIT = 10;
  static final int E_EOF = 11;
  static final int E_DIRECT = 12;
  static final int E_INTERNAL = 13;
  static final int E_STATE = 14;
  static final int E_MACDEF = 15;
  static final int E_SYNTAX = 16;
  static final int E_BRACE = 17;
  static final int E_DASH = 18;
  static final int E_ZERO = 19;
  static final int E_BADCTRL = 20;

  /********************************************************
    Constants
    Description: String messages for parse_error();
   *******************************************************/
  static final String errmsg[] = {
    "Malformed regular expression.",
    "Missing close parenthesis.",
    "Too many regular expressions or expression too long.",
    "Missing [ in character class.",
    "^ must be at start of expression or after [.",
    "+ ? or * must follow an expression or subexpression.",
    "Newline in quoted string.",
    "Missing } in macro expansion.",
    "Macro does not exist.",
    "Macro expansions nested too deeply.",
    "JLex has not been successfully initialized.",
    "Unexpected end-of-file found.",
    "Undefined or badly-formed JLex directive.",
    "Internal JLex error.",
    "Uninitialized state name.",
    "Badly formed macro definition.",
    "Syntax error.",
    "Missing brace at start of lexical action.",
    "Special character dash - in character class [...] 
must\n"
      + "\tbe preceded by start-of-range character.",
    "Zero-length regular expression.",
    "Illegal \\^C-style escape sequence (character following caret must\n"
      + "\tbe alphabetic).",
  };

  /********************************************************
    Function: parse_error
    Description: Prints the line number and the errmsg[]
    entry for error_code, then throws an Error.
   *******************************************************/
  static void parse_error(int error_code, int line_number) {
    System.out.println("Error: Parse error at line "
                       + line_number + ".");
    System.out.println("Description: " + errmsg[error_code]);
    throw new Error("Parse error.");
  }
}

/********************************************************
  Class: CSet
 *******************************************************/
class CSet {
  /********************************************************
    Member Variables
   *******************************************************/
  private SparseBitSet m_set;
  private boolean m_complement;

  /********************************************************
    Function: CSet
   *******************************************************/
  CSet() {
    m_set = new SparseBitSet();
    m_complement = false;
  }

  /********************************************************
    Function: complement
   *******************************************************/
  void complement() {
    m_complement = true;
  }

  /********************************************************
    Function: add
   *******************************************************/
  void add(int i) {
    m_set.set(i);
  }

  /********************************************************
    Function: addncase
   *******************************************************/
  void addncase(char c) { // add, ignoring case.
    /* Do this in a Unicode-friendly way. 
*/ /* (note that duplicate adds have no effect) */ add(c); add(Character.toLowerCase(c)); add(Character.toTitleCase(c)); add(Character.toUpperCase(c)); } /******************************************************** Function: contains *******************************************************/ boolean contains ( int i ) { boolean result; result = m_set.get(i); if (m_complement) { return (false == result); } return result; } /******************************************************** Function: mimic *******************************************************/ void mimic ( CSet set ) { m_complement = set.m_complement; m_set = (SparseBitSet) set.m_set.clone(); } /** Map set using character classes [CSA] */ void map(CSet set, int[] mapping) { m_complement = set.m_complement; m_set.clearAll(); for (Enumeration e=set.m_set.elements(); e.hasMoreElements(); ) { int old_value =((Integer)e.nextElement()).intValue(); if (old_value= m_input.m_line_read) { CError.parse_error(CError.E_DIRECT,0); } /* Determine length. */ elem = m_input.m_line_index; while (elem < m_input.m_line_read && false == CUtility.isnewline(m_input.m_line[elem])) { ++elem; } /* Allocate non-terminated buffer of exact length. */ buffer = new char[elem - m_input.m_line_index]; /* Copy. 
*/ elem = 0; while (m_input.m_line_index < m_input.m_line_read && false == CUtility.isnewline(m_input.m_line[m_input.m_line_index])) { buffer[elem] = m_input.m_line[m_input.m_line_index]; ++elem; ++m_input.m_line_index; } return buffer; } private final int CLASS_CODE = 0; private final int INIT_CODE = 1; private final int EOF_CODE = 2; private final int INIT_THROW_CODE = 3; private final int YYLEX_THROW_CODE = 4; private final int EOF_THROW_CODE = 5; private final int EOF_VALUE_CODE = 6; /*************************************************************** Function: packCode Description: **************************************************************/ private char[] packCode ( char start_dir[], char end_dir[], char prev_code[], int prev_read, int specified ) throws java.io.IOException { if (CUtility.DEBUG) { CUtility.ASSERT(INIT_CODE == specified || CLASS_CODE == specified || EOF_CODE == specified || EOF_VALUE_CODE == specified || INIT_THROW_CODE == specified || YYLEX_THROW_CODE == specified || EOF_THROW_CODE == specified); } if (0 != CUtility.charncmp(m_input.m_line, 0, start_dir, 0, start_dir.length - 1)) { CError.parse_error(CError.E_INTERNAL,0); } if (null == prev_code) { prev_code = new char[BUFFER_SIZE]; prev_read = 0; } if (prev_read >= prev_code.length) { prev_code = CUtility.doubleSize(prev_code); } m_input.m_line_index = start_dir.length - 1; while (true) { while (m_input.m_line_index >= m_input.m_line_read) { if (m_input.getLine()) { CError.parse_error(CError.E_EOF,m_input.m_line_number); } if (0 == CUtility.charncmp(m_input.m_line, 0, end_dir, 0, end_dir.length - 1)) { m_input.m_line_index = end_dir.length - 1; switch (specified) { case CLASS_CODE: m_spec.m_class_read = prev_read; break; case INIT_CODE: m_spec.m_init_read = prev_read; break; case EOF_CODE: m_spec.m_eof_read = prev_read; break; case EOF_VALUE_CODE: m_spec.m_eof_value_read = prev_read; break; case INIT_THROW_CODE: m_spec.m_init_throw_read = prev_read; break; case YYLEX_THROW_CODE: 
m_spec.m_yylex_throw_read = prev_read; break; case EOF_THROW_CODE: m_spec.m_eof_throw_read = prev_read; break; default: CError.parse_error(CError.E_INTERNAL,m_input.m_line_number); break; } return prev_code; } } while (m_input.m_line_index < m_input.m_line_read) { prev_code[prev_read] = m_input.m_line[m_input.m_line_index]; ++prev_read; ++m_input.m_line_index; if (prev_read >= prev_code.length) { prev_code = CUtility.doubleSize(prev_code); } } } } /*************************************************************** Member Variables: JLex directives. **************************************************************/ private char m_state_dir[] = { '%', 's', 't', 'a', 't', 'e', '\0' }; private char m_char_dir[] = { '%', 'c', 'h', 'a', 'r', '\0' }; private char m_line_dir[] = { '%', 'l', 'i', 'n', 'e', '\0' }; private char m_cup_dir[] = { '%', 'c', 'u', 'p', '\0' }; private char m_class_dir[] = { '%', 'c', 'l', 'a', 's', 's', '\0' }; private char m_implements_dir[] = { '%', 'i', 'm', 'p', 'l', 'e', 'm', 'e', 'n', 't', 's', '\0' }; private char m_function_dir[] = { '%', 'f', 'u', 'n', 'c', 't', 'i', 'o', 'n', '\0' }; private char m_type_dir[] = { '%', 't', 'y', 'p', 'e', '\0' }; private char m_integer_dir[] = { '%', 'i', 'n', 't', 'e', 'g', 'e', 'r', '\0' }; private char m_intwrap_dir[] = { '%', 'i', 'n', 't', 'w', 'r', 'a', 'p', '\0' }; private char m_full_dir[] = { '%', 'f', 'u', 'l', 'l', '\0' }; private char m_unicode_dir[] = { '%', 'u', 'n', 'i', 'c', 'o', 'd', 'e', '\0' }; private char m_ignorecase_dir[] = { '%', 'i', 'g', 'n', 'o', 'r', 'e', 'c', 'a', 's', 'e', '\0' }; private char m_notunix_dir[] = { '%', 'n', 'o', 't', 'u', 'n', 'i', 'x', '\0' }; private char m_init_code_dir[] = { '%', 'i', 'n', 'i', 't', '{', '\0' }; private char m_init_code_end_dir[] = { '%', 'i', 'n', 'i', 't', '}', '\0' }; private char m_init_throw_code_dir[] = { '%', 'i', 'n', 'i', 't', 't', 'h', 'r', 'o', 'w', '{', '\0' }; private char m_init_throw_code_end_dir[] = { '%', 'i', 'n', 'i', 't', 
't', 'h', 'r', 'o', 'w', '}', '\0' }; private char m_yylex_throw_code_dir[] = { '%', 'y', 'y', 'l', 'e', 'x', 't', 'h', 'r', 'o', 'w', '{', '\0' }; private char m_yylex_throw_code_end_dir[] = { '%', 'y', 'y', 'l', 'e', 'x', 't', 'h', 'r', 'o', 'w', '}', '\0' }; private char m_eof_code_dir[] = { '%', 'e', 'o', 'f', '{', '\0' }; private char m_eof_code_end_dir[] = { '%', 'e', 'o', 'f', '}', '\0' }; private char m_eof_value_code_dir[] = { '%', 'e', 'o', 'f', 'v', 'a', 'l', '{', '\0' }; private char m_eof_value_code_end_dir[] = { '%', 'e', 'o', 'f', 'v', 'a', 'l', '}', '\0' }; private char m_eof_throw_code_dir[] = { '%', 'e', 'o', 'f', 't', 'h', 'r', 'o', 'w', '{', '\0' }; private char m_eof_throw_code_end_dir[] = { '%', 'e', 'o', 'f', 't', 'h', 'r', 'o', 'w', '}', '\0' }; private char m_class_code_dir[] = { '%', '{', '\0' }; private char m_class_code_end_dir[] = { '%', '}', '\0' }; private char m_yyeof_dir[] = { '%', 'y', 'y', 'e', 'o', 'f', '\0' }; private char m_public_dir[] = { '%', 'p', 'u', 'b', 'l', 'i', 'c', '\0' }; /*************************************************************** Function: userDeclare Description: **************************************************************/ private void userDeclare ( ) throws java.io.IOException { int elem; if (CUtility.DEBUG) { CUtility.ASSERT(null != this); CUtility.ASSERT(null != m_outstream); CUtility.ASSERT(null != m_input); CUtility.ASSERT(null != m_tokens); CUtility.ASSERT(null != m_spec); } if (m_input.m_eof_reached) { /* End-of-file. */ CError.parse_error(CError.E_EOF, m_input.m_line_number); } while (false == m_input.getLine()) { /* Look for double percent. */ if (2 <= m_input.m_line_read && '%' == m_input.m_line[0] && '%' == m_input.m_line[1]) { /* Mess around with line. */ m_input.m_line_read -= 2; System.arraycopy(m_input.m_line, 2, m_input.m_line, 0, m_input.m_line_read); m_input.m_pushback_line = true; /* Check for and discard empty line. 
*/
        if (0 == m_input.m_line_read || '\n' == m_input.m_line[0]) {
          m_input.m_pushback_line = false;
        }

        return;
      }

      if (0 == m_input.m_line_read) {
        continue;
      }

      if ('%' == m_input.m_line[0]) {
        /* Special lex declarations. */
        if (1 >= m_input.m_line_read) {
          CError.parse_error(CError.E_DIRECT,
                             m_input.m_line_number);
          continue;
        }

        switch (m_input.m_line[1]) {
        case '{':
          if (0 == CUtility.charncmp(m_input.m_line, 0,
                                     m_class_code_dir, 0,
                                     m_class_code_dir.length - 1)) {
            m_spec.m_class_code = packCode(m_class_code_dir,
                                           m_class_code_end_dir,
                                           m_spec.m_class_code,
                                           m_spec.m_class_read,
                                           CLASS_CODE);
            break;
          }

          /* Bad directive. */
          CError.parse_error(CError.E_DIRECT,
                             m_input.m_line_number);
          break;

        case 'c':
          if (0 == CUtility.charncmp(m_input.m_line, 0,
                                     m_char_dir, 0,
                                     m_char_dir.length - 1)) {
            /* Set character counting to ON. */
            m_input.m_line_index = m_char_dir.length;
            m_spec.m_count_chars = true;
            break;
          }
          else if (0 == CUtility.charncmp(m_input.m_line, 0,
                                          m_class_dir, 0,
                                          m_class_dir.length - 1)) {
            m_input.m_line_index = m_class_dir.length;
            m_spec.m_class_name = getName();
            break;
          }
          else if (0 == CUtility.charncmp(m_input.m_line, 0,
                                          m_cup_dir, 0,
                                          m_cup_dir.length - 1)) {
            /* Set Java CUP compatibility to ON. */
            m_input.m_line_index = m_cup_dir.length;
            m_spec.m_cup_compatible = true;
            // this is what %cup does: [CSA, 27-Jul-1999]
            m_spec.m_implements_name =
              "java_cup.runtime.Scanner".toCharArray();
            m_spec.m_function_name =
              "next_token".toCharArray();
            m_spec.m_type_name =
              "java_cup.runtime.Symbol".toCharArray();
            break;
          }

          /* Bad directive. 
*/
          CError.parse_error(CError.E_DIRECT,
                             m_input.m_line_number);
          break;

        case 'e':
          if (0 == CUtility.charncmp(m_input.m_line, 0,
                                     m_eof_code_dir, 0,
                                     m_eof_code_dir.length - 1)) {
            m_spec.m_eof_code = packCode(m_eof_code_dir,
                                         m_eof_code_end_dir,
                                         m_spec.m_eof_code,
                                         m_spec.m_eof_read,
                                         EOF_CODE);
            break;
          }
          else if (0 == CUtility.charncmp(m_input.m_line, 0,
                                          m_eof_value_code_dir, 0,
                                          m_eof_value_code_dir.length - 1)) {
            m_spec.m_eof_value_code = packCode(m_eof_value_code_dir,
                                               m_eof_value_code_end_dir,
                                               m_spec.m_eof_value_code,
                                               m_spec.m_eof_value_read,
                                               EOF_VALUE_CODE);
            break;
          }
          else if (0 == CUtility.charncmp(m_input.m_line, 0,
                                          m_eof_throw_code_dir, 0,
                                          m_eof_throw_code_dir.length - 1)) {
            m_spec.m_eof_throw_code = packCode(m_eof_throw_code_dir,
                                               m_eof_throw_code_end_dir,
                                               m_spec.m_eof_throw_code,
                                               m_spec.m_eof_throw_read,
                                               EOF_THROW_CODE);
            break;
          }

          /* Bad directive. */
          CError.parse_error(CError.E_DIRECT,
                             m_input.m_line_number);
          break;

        case 'f':
          if (0 == CUtility.charncmp(m_input.m_line, 0,
                                     m_function_dir, 0,
                                     m_function_dir.length - 1)) {
            /* Set the name of the lexer function. */
            m_input.m_line_index = m_function_dir.length;
            m_spec.m_function_name = getName();
            break;
          }
          else if (0 == CUtility.charncmp(m_input.m_line, 0,
                                          m_full_dir, 0,
                                          m_full_dir.length - 1)) {
            m_input.m_line_index = m_full_dir.length;
            m_spec.m_dtrans_ncols = CUtility.MAX_EIGHT_BIT + 1;
            break;
          }

          /* Bad directive. */
          CError.parse_error(CError.E_DIRECT,
                             m_input.m_line_number);
          break;

        case 'i':
          if (0 == CUtility.charncmp(m_input.m_line, 0,
                                     m_integer_dir, 0,
                                     m_integer_dir.length - 1)) {
            /* Set integer return type to ON. */
            m_input.m_line_index = m_integer_dir.length;
            m_spec.m_integer_type = true;
            break;
          }
          else if (0 == CUtility.charncmp(m_input.m_line, 0,
                                          m_intwrap_dir, 0,
                                          m_intwrap_dir.length - 1)) {
            /* Set Integer-wrapper return type to ON. 
*/
            m_input.m_line_index = m_intwrap_dir.length;
            m_spec.m_intwrap_type = true;
            break;
          }
          else if (0 == CUtility.charncmp(m_input.m_line, 0,
                                          m_init_code_dir, 0,
                                          m_init_code_dir.length - 1)) {
            m_spec.m_init_code = packCode(m_init_code_dir,
                                          m_init_code_end_dir,
                                          m_spec.m_init_code,
                                          m_spec.m_init_read,
                                          INIT_CODE);
            break;
          }
          else if (0 == CUtility.charncmp(m_input.m_line, 0,
                                          m_init_throw_code_dir, 0,
                                          m_init_throw_code_dir.length - 1)) {
            m_spec.m_init_throw_code = packCode(m_init_throw_code_dir,
                                                m_init_throw_code_end_dir,
                                                m_spec.m_init_throw_code,
                                                m_spec.m_init_throw_read,
                                                INIT_THROW_CODE);
            break;
          }
          else if (0 == CUtility.charncmp(m_input.m_line, 0,
                                          m_implements_dir, 0,
                                          m_implements_dir.length - 1)) {
            m_input.m_line_index = m_implements_dir.length;
            m_spec.m_implements_name = getName();
            break;
          }
          else if (0 == CUtility.charncmp(m_input.m_line, 0,
                                          m_ignorecase_dir, 0,
                                          m_ignorecase_dir.length - 1)) {
            /* Set m_ignorecase to ON. */
            m_input.m_line_index = m_ignorecase_dir.length;
            m_spec.m_ignorecase = true;
            break;
          }

          /* Bad directive. */
          CError.parse_error(CError.E_DIRECT,
                             m_input.m_line_number);
          break;

        case 'l':
          if (0 == CUtility.charncmp(m_input.m_line, 0,
                                     m_line_dir, 0,
                                     m_line_dir.length - 1)) {
            /* Set line counting to ON. */
            m_input.m_line_index = m_line_dir.length;
            m_spec.m_count_lines = true;
            break;
          }

          /* Bad directive. */
          CError.parse_error(CError.E_DIRECT,
                             m_input.m_line_number);
          break;

        case 'n':
          if (0 == CUtility.charncmp(m_input.m_line, 0,
                                     m_notunix_dir, 0,
                                     m_notunix_dir.length - 1)) {
            /* Set Unix mode to OFF. */
            m_input.m_line_index = m_notunix_dir.length;
            m_spec.m_unix = false;
            break;
          }

          /* Bad directive. */
          CError.parse_error(CError.E_DIRECT,
                             m_input.m_line_number);
          break;

        case 'p':
          if (0 == CUtility.charncmp(m_input.m_line, 0,
                                     m_public_dir, 0,
                                     m_public_dir.length - 1)) {
            /* Set public flag. */
            m_input.m_line_index = m_public_dir.length;
            m_spec.m_public = true;
            break;
          }

          /* Bad directive. 
*/
          CError.parse_error(CError.E_DIRECT,
                             m_input.m_line_number);
          break;

        case 's':
          if (0 == CUtility.charncmp(m_input.m_line, 0,
                                     m_state_dir, 0,
                                     m_state_dir.length - 1)) {
            /* Recognize state list. */
            m_input.m_line_index = m_state_dir.length;
            saveStates();
            break;
          }

          /* Undefined directive. */
          CError.parse_error(CError.E_DIRECT,
                             m_input.m_line_number);
          break;

        case 't':
          if (0 == CUtility.charncmp(m_input.m_line, 0,
                                     m_type_dir, 0,
                                     m_type_dir.length - 1)) {
            /* Set the name of the token return type. */
            m_input.m_line_index = m_type_dir.length;
            m_spec.m_type_name = getName();
            break;
          }

          /* Undefined directive. */
          CError.parse_error(CError.E_DIRECT,
                             m_input.m_line_number);
          break;

        case 'u':
          if (0 == CUtility.charncmp(m_input.m_line, 0,
                                     m_unicode_dir, 0,
                                     m_unicode_dir.length - 1)) {
            m_input.m_line_index = m_unicode_dir.length;
            m_spec.m_dtrans_ncols = CUtility.MAX_SIXTEEN_BIT + 1;
            break;
          }

          /* Bad directive. */
          CError.parse_error(CError.E_DIRECT,
                             m_input.m_line_number);
          break;

        case 'y':
          if (0 == CUtility.charncmp(m_input.m_line, 0,
                                     m_yyeof_dir, 0,
                                     m_yyeof_dir.length - 1)) {
            m_input.m_line_index = m_yyeof_dir.length;
            m_spec.m_yyeof = true;
            break;
          }
          else if (0 == CUtility.charncmp(m_input.m_line, 0,
                                          m_yylex_throw_code_dir, 0,
                                          m_yylex_throw_code_dir.length - 1)) {
            m_spec.m_yylex_throw_code = packCode(m_yylex_throw_code_dir,
                                                 m_yylex_throw_code_end_dir,
                                                 m_spec.m_yylex_throw_code,
                                                 m_spec.m_yylex_throw_read,
                                                 YYLEX_THROW_CODE);
            break;
          }

          /* Bad directive. */
          CError.parse_error(CError.E_DIRECT,
                             m_input.m_line_number);
          break;

        default:
          /* Undefined directive. */
          CError.parse_error(CError.E_DIRECT,
                             m_input.m_line_number);
          break;
        }
      }
      else {
        /* Regular expression macro. 
*/ m_input.m_line_index = 0; saveMacro(); } if (CUtility.OLD_DEBUG) { System.out.println("Line number " + m_input.m_line_number + ":"); System.out.print(new String(m_input.m_line, 0,m_input.m_line_read)); } } } /*************************************************************** Function: userRules Description: Processes third section of JLex specification and creates minimized transition table. **************************************************************/ private void userRules ( ) throws java.io.IOException { int code; if (false == m_init_flag) { CError.parse_error(CError.E_INIT,0); } if (CUtility.DEBUG) { CUtility.ASSERT(null != this); CUtility.ASSERT(null != m_outstream); CUtility.ASSERT(null != m_input); CUtility.ASSERT(null != m_tokens); CUtility.ASSERT(null != m_spec); } /* UNDONE: Need to handle states preceding rules. */ if (m_spec.m_verbose) { System.out.println("Creating NFA machine representation."); } m_makeNfa.allocate_BOL_EOF(m_spec); m_makeNfa.thompson(this,m_spec,m_input); m_simplifyNfa.simplify(m_spec); /*print_nfa();*/ if (CUtility.DEBUG) { CUtility.ASSERT(END_OF_INPUT == m_spec.m_current_token); } if (m_spec.m_verbose) { System.out.println("Creating DFA transition table."); } m_nfa2dfa.make_dfa(this,m_spec); if (CUtility.FOODEBUG) { print_header(); } if (m_spec.m_verbose) { System.out.println("Minimizing DFA transition table."); } m_minimize.min_dfa(m_spec); } /*************************************************************** Function: printccl Description: Debugging routine that outputs readable form of character class. 
**************************************************************/ private void printccl ( CSet set ) { int i; System.out.print(" ["); for (i = 0; i < m_spec.m_dtrans_ncols; ++i) { if (set.contains(i)) { System.out.print(interp_int(i)); } } System.out.print(']'); } /*************************************************************** Function: plab Description: **************************************************************/ private String plab ( CNfa state ) { int index; if (null == state) { return (new String("--")); } index = m_spec.m_nfa_states.indexOf(state); return ((new Integer(index)).toString()); } /*************************************************************** Function: interp_int Description: **************************************************************/ private String interp_int ( int i ) { switch (i) { case (int) '\b': return (new String("\\b")); case (int) '\t': return (new String("\\t")); case (int) '\n': return (new String("\\n")); case (int) '\f': return (new String("\\f")); case (int) '\r': return (new String("\\r")); case (int) ' ': return (new String("\\ ")); default: return ((new Character((char) i)).toString()); } } /*************************************************************** Function: print_nfa Description: **************************************************************/ void print_nfa ( ) { int elem; CNfa nfa; int size; Enumeration states; Integer index; int i; int j; int vsize; String state; System.out.println("--------------------- NFA -----------------------"); size = m_spec.m_nfa_states.size(); for (elem = 0; elem < size; ++elem) { nfa = (CNfa) m_spec.m_nfa_states.elementAt(elem); System.out.print("Nfa state " + plab(nfa) + ": "); if (null == nfa.m_next) { System.out.print("(TERMINAL)"); } else { System.out.print("--> " + plab(nfa.m_next)); System.out.print("--> " + plab(nfa.m_next2)); switch (nfa.m_edge) { case CNfa.CCL: printccl(nfa.m_set); break; case CNfa.EPSILON: System.out.print(" EPSILON "); break; default: System.out.print(" " + 
interp_int(nfa.m_edge));
          break;
        }
      }

      if (0 == elem) {
        System.out.print(" (START STATE)");
      }

      if (null != nfa.m_accept) {
        System.out.print(" accepting "
                         + ((0 != (nfa.m_anchor & CSpec.START)) ? "^" : "")
                         + "<"
                         + (new String(nfa.m_accept.m_action, 0,
                                       nfa.m_accept.m_action_read))
                         + ">"
                         + ((0 != (nfa.m_anchor & CSpec.END)) ? "$" : ""));
      }

      System.out.println();
    }

    states = m_spec.m_states.keys();
    while (states.hasMoreElements()) {
      state = (String) states.nextElement();
      index = (Integer) m_spec.m_states.get(state);

      if (CUtility.DEBUG) {
        CUtility.ASSERT(null != state);
        CUtility.ASSERT(null != index);
      }

      System.out.println("State \"" + state
                         + "\" has identifying index "
                         + index.toString() + ".");
      System.out.print("\tStart states of matching rules: ");

      i = index.intValue();
      vsize = m_spec.m_state_rules[i].size();

      for (j = 0; j < vsize; ++j) {
        nfa = (CNfa) m_spec.m_state_rules[i].elementAt(j);
        System.out.print(m_spec.m_nfa_states.indexOf(nfa) + " ");
      }

      System.out.println();
    }

    System.out.println("-------------------- NFA ----------------------");
  }

  /***************************************************************
    Function: getStates
    Description: Parses the state area of a rule,
    from the beginning of a line.
    < state1, state2 ... > regular_expression { action }
    Returns null only on EOF.  Returns all_states,
    initialized properly to correspond to all states,
    if no states are found.
    Special Notes: This function treats commas as optional
    and permits states to be spread over multiple lines.
   **************************************************************/
  private SparseBitSet all_states = null;
  SparseBitSet getStates() throws java.io.IOException {
    int start_state;
    int count_state;
    SparseBitSet states;
    String name;
    Integer index;
    int i;
    int size;

    if (CUtility.DEBUG) {
      CUtility.ASSERT(null != this);
      CUtility.ASSERT(null != m_outstream);
      CUtility.ASSERT(null != m_input);
      CUtility.ASSERT(null != m_tokens);
      CUtility.ASSERT(null != m_spec);
    }

    states = null;

    /* Skip white space. 
*/ while (CUtility.isspace(m_input.m_line[m_input.m_line_index])) { ++m_input.m_line_index; while (m_input.m_line_index >= m_input.m_line_read) { /* Must just be an empty line. */ if (m_input.getLine()) { /* EOF found. */ return null; } } } /* Look for states. */ if ('<' == m_input.m_line[m_input.m_line_index]) { ++m_input.m_line_index; states = new SparseBitSet(); /* Parse states. */ while (true) { /* We may have reached the end of the line. */ while (m_input.m_line_index >= m_input.m_line_read) { if (m_input.getLine()) { /* EOF found. */ CError.parse_error(CError.E_EOF,m_input.m_line_number); return states; } } while (true) { /* Skip white space. */ while (CUtility.isspace(m_input.m_line[m_input.m_line_index])) { ++m_input.m_line_index; while (m_input.m_line_index >= m_input.m_line_read) { if (m_input.getLine()) { /* EOF found. */ CError.parse_error(CError.E_EOF,m_input.m_line_number); return states; } } } if (',' != m_input.m_line[m_input.m_line_index]) { break; } ++m_input.m_line_index; } if ('>' == m_input.m_line[m_input.m_line_index]) { ++m_input.m_line_index; if (m_input.m_line_index < m_input.m_line_read) { m_advance_stop = true; } return states; } /* Read in state name. */ start_state = m_input.m_line_index; while (false == CUtility.isspace(m_input.m_line[m_input.m_line_index]) && ',' != m_input.m_line[m_input.m_line_index] && '>' != m_input.m_line[m_input.m_line_index]) { ++m_input.m_line_index; if (m_input.m_line_index >= m_input.m_line_read) { /* End of line means end of state name. */ break; } } count_state = m_input.m_line_index - start_state; /* Save name after checking definition. */ name = new String(m_input.m_line, start_state, count_state); index = (Integer) m_spec.m_states.get(name); if (null == index) { /* Uninitialized state. 
*/ System.out.println("Uninitialized State Name: " + name); CError.parse_error(CError.E_STATE,m_input.m_line_number); } states.set(index.intValue()); } } if (null == all_states) { all_states = new SparseBitSet(); size = m_spec.m_states.size(); for (i = 0; i < size; ++i) { all_states.set(i); } } if (m_input.m_line_index < m_input.m_line_read) { m_advance_stop = true; } return all_states; } /******************************************************** Function: expandMacro Description: Returns false on error, true otherwise. *******************************************************/ private boolean expandMacro ( ) { int elem; int start_macro; int end_macro; int start_name; int count_name; String def; int def_elem; String name; char replace[]; int rep_elem; if (CUtility.DEBUG) { CUtility.ASSERT(null != this); CUtility.ASSERT(null != m_outstream); CUtility.ASSERT(null != m_input); CUtility.ASSERT(null != m_tokens); CUtility.ASSERT(null != m_spec); } /* Check for macro. */ if ('{' != m_input.m_line[m_input.m_line_index]) { CError.parse_error(CError.E_INTERNAL,m_input.m_line_number); return ERROR; } start_macro = m_input.m_line_index; elem = m_input.m_line_index + 1; if (elem >= m_input.m_line_read) { CError.impos("Unfinished macro name"); return ERROR; } /* Get macro name. */ start_name = elem; while ('}' != m_input.m_line[elem]) { ++elem; if (elem >= m_input.m_line_read) { CError.impos("Unfinished macro name at line " + m_input.m_line_number); return ERROR; } } count_name = elem - start_name; end_macro = elem; /* Check macro name. */ if (0 == count_name) { CError.impos("Nonexistent macro name"); return ERROR; } /* Debug checks. */ if (CUtility.DEBUG) { CUtility.ASSERT(0 < count_name); } /* Retrieve macro definition. 
*/ name = new String(m_input.m_line,start_name,count_name); def = (String) m_spec.m_macros.get(name); if (null == def) { /*CError.impos("Undefined macro \"" + name + "\".");*/ System.out.println("Error: Undefined macro \"" + name + "\"."); CError.parse_error(CError.E_NOMAC, m_input.m_line_number); return ERROR; } if (CUtility.OLD_DUMP_DEBUG) { System.out.println("expanded escape: " + def); } /* Replace macro in new buffer, beginning by copying first part of line buffer. */ replace = new char[m_input.m_line.length]; for (rep_elem = 0; rep_elem < start_macro; ++rep_elem) { replace[rep_elem] = m_input.m_line[rep_elem]; if (CUtility.DEBUG) { CUtility.ASSERT(rep_elem < replace.length); } } /* Copy macro definition. */ if (rep_elem >= replace.length) { replace = CUtility.doubleSize(replace); } for (def_elem = 0; def_elem < def.length(); ++def_elem) { replace[rep_elem] = def.charAt(def_elem); ++rep_elem; if (rep_elem >= replace.length) { replace = CUtility.doubleSize(replace); } } /* Copy last part of line. */ if (rep_elem >= replace.length) { replace = CUtility.doubleSize(replace); } for (elem = end_macro + 1; elem < m_input.m_line_read; ++elem) { replace[rep_elem] = m_input.m_line[elem]; ++rep_elem; if (rep_elem >= replace.length) { replace = CUtility.doubleSize(replace); } } /* Replace buffer. 
*/
    m_input.m_line = replace;
    m_input.m_line_read = rep_elem;

    if (CUtility.OLD_DEBUG) {
      System.out.println(new String(m_input.m_line, 0, m_input.m_line_read));
    }
    return NOT_ERROR;
  }

  /***************************************************************
    Function: saveMacro
    Description: Saves macro definition of form:
    macro_name = macro_definition
   **************************************************************/
  private void saveMacro() {
    int elem;
    int start_name;
    int count_name;
    int start_def;
    int count_def;
    boolean saw_escape;
    boolean in_quote;
    boolean in_ccl;

    if (CUtility.DEBUG) {
      CUtility.ASSERT(null != this);
      CUtility.ASSERT(null != m_outstream);
      CUtility.ASSERT(null != m_input);
      CUtility.ASSERT(null != m_tokens);
      CUtility.ASSERT(null != m_spec);
    }

    /* Macro declarations are of the following form:
       macro_name = macro_definition */

    elem = 0;

    /* Skip white space preceding macro name. */
    while (CUtility.isspace(m_input.m_line[elem])) {
      ++elem;
      if (elem >= m_input.m_line_read) {
        /* End of line has been reached,
           and line was found to be empty. */
        return;
      }
    }

    /* Read macro name. */
    start_name = elem;
    while (false == CUtility.isspace(m_input.m_line[elem])
           && '=' != m_input.m_line[elem]) {
      ++elem;
      if (elem >= m_input.m_line_read) {
        /* Macro name but no associated definition. */
        CError.parse_error(CError.E_MACDEF, m_input.m_line_number);
      }
    }
    count_name = elem - start_name;

    /* Check macro name. */
    if (0 == count_name) {
      /* Nonexistent macro name. */
      CError.parse_error(CError.E_MACDEF, m_input.m_line_number);
    }

    /* Skip white space between name and the = sign. */
    while (CUtility.isspace(m_input.m_line[elem])) {
      ++elem;
      if (elem >= m_input.m_line_read) {
        /* Macro name but no associated definition. */
        CError.parse_error(CError.E_MACDEF, m_input.m_line_number);
      }
    }

    if ('=' == m_input.m_line[elem]) {
      ++elem;
      if (elem >= m_input.m_line_read) {
        /* Macro name but no associated definition. 
*/
        CError.parse_error(CError.E_MACDEF, m_input.m_line_number);
      }
    }
    else /* macro definition without = */
      CError.parse_error(CError.E_MACDEF, m_input.m_line_number);

    /* Skip white space between the = sign and definition. */
    while (CUtility.isspace(m_input.m_line[elem])) {
      ++elem;
      if (elem >= m_input.m_line_read) {
        /* Macro name but no associated definition. */
        CError.parse_error(CError.E_MACDEF, m_input.m_line_number);
      }
    }

    /* Read macro definition. */
    start_def = elem;
    in_quote = false;
    in_ccl = false;
    saw_escape = false;
    while (false == CUtility.isspace(m_input.m_line[elem])
           || true == in_quote
           || true == in_ccl
           || true == saw_escape) {
      if ('\"' == m_input.m_line[elem] && false == saw_escape) {
        in_quote = !in_quote;
      }

      if ('\\' == m_input.m_line[elem] && false == saw_escape) {
        saw_escape = true;
      }
      else {
        saw_escape = false;
      }
      if (false == saw_escape && false == in_quote) { // CSA, 24-jul-99
        if ('[' == m_input.m_line[elem] && false == in_ccl)
          in_ccl = true;
        if (']' == m_input.m_line[elem] && true == in_ccl)
          in_ccl = false;
      }

      ++elem;
      if (elem >= m_input.m_line_read) {
        /* End of line. */
        break;
      }
    }
    count_def = elem - start_def;

    /* Check macro definition. */
    if (0 == count_def) {
      /* Nonexistent macro definition. */
      CError.parse_error(CError.E_MACDEF, m_input.m_line_number);
    }

    /* Debug checks. */
    if (CUtility.DEBUG) {
      CUtility.ASSERT(0 < count_def);
      CUtility.ASSERT(0 < count_name);
      CUtility.ASSERT(null != m_spec.m_macros);
    }

    if (CUtility.OLD_DEBUG) {
      System.out.println("macro name \""
                         + new String(m_input.m_line, start_name, count_name)
                         + "\".");
      System.out.println("macro definition \""
                         + new String(m_input.m_line, start_def, count_def)
                         + "\".");
    }

    /* Add macro name and definition to table. 
*/ m_spec.m_macros.put(new String(m_input.m_line,start_name,count_name), new String(m_input.m_line,start_def,count_def)); } /*************************************************************** Function: saveStates Description: Takes state declaration and makes entries for them in state hashtable in CSpec structure. State declaration should be of the form: %state name0[, name1, name2 ...] (But commas are actually optional as long as there is white space in between them.) **************************************************************/ private void saveStates ( ) { int start_state; int count_state; if (CUtility.DEBUG) { CUtility.ASSERT(null != this); CUtility.ASSERT(null != m_outstream); CUtility.ASSERT(null != m_input); CUtility.ASSERT(null != m_tokens); CUtility.ASSERT(null != m_spec); } /* EOF found? */ if (m_input.m_eof_reached) { return; } /* Debug checks. */ if (CUtility.DEBUG) { CUtility.ASSERT('%' == m_input.m_line[0]); CUtility.ASSERT('s' == m_input.m_line[1]); CUtility.ASSERT(m_input.m_line_index <= m_input.m_line_read); CUtility.ASSERT(0 <= m_input.m_line_index); CUtility.ASSERT(0 <= m_input.m_line_read); } /* Blank line? No states? */ if (m_input.m_line_index >= m_input.m_line_read) { return; } while (m_input.m_line_index < m_input.m_line_read) { if (CUtility.OLD_DEBUG) { System.out.println("line read " + m_input.m_line_read + "\tline index = " + m_input.m_line_index); } /* Skip white space. */ while (CUtility.isspace(m_input.m_line[m_input.m_line_index])) { ++m_input.m_line_index; if (m_input.m_line_index >= m_input.m_line_read) { /* No more states to be found. */ return; } } /* Look for state name. */ start_state = m_input.m_line_index; while (false == CUtility.isspace(m_input.m_line[m_input.m_line_index]) && ',' != m_input.m_line[m_input.m_line_index]) { ++m_input.m_line_index; if (m_input.m_line_index >= m_input.m_line_read) { /* End of line and end of state name. 
*/ break; } } count_state = m_input.m_line_index - start_state; if (CUtility.OLD_DEBUG) { System.out.println("State name \"" + new String(m_input.m_line,start_state,count_state) + "\"."); System.out.println("Integer index \"" + m_spec.m_states.size() + "\"."); } /* Enter new state name, along with unique index. */ m_spec.m_states.put(new String(m_input.m_line,start_state,count_state), new Integer(m_spec.m_states.size())); /* Skip comma. */ if (',' == m_input.m_line[m_input.m_line_index]) { ++m_input.m_line_index; if (m_input.m_line_index >= m_input.m_line_read) { /* End of line. */ return; } } } } /******************************************************** Function: expandEscape Description: Takes escape sequence and returns corresponding character code. *******************************************************/ private char expandEscape ( ) { char r; /* Debug checks. */ if (CUtility.DEBUG) { CUtility.ASSERT(m_input.m_line_index < m_input.m_line_read); CUtility.ASSERT(0 < m_input.m_line_read); CUtility.ASSERT(0 <= m_input.m_line_index); } if ('\\' != m_input.m_line[m_input.m_line_index]) { ++m_input.m_line_index; return m_input.m_line[m_input.m_line_index - 1]; } else { boolean unicode_escape = false; ++m_input.m_line_index; switch (m_input.m_line[m_input.m_line_index]) { case 'b': ++m_input.m_line_index; return '\b'; case 't': ++m_input.m_line_index; return '\t'; case 'n': ++m_input.m_line_index; return '\n'; case 'f': ++m_input.m_line_index; return '\f'; case 'r': ++m_input.m_line_index; return '\r'; case '^': ++m_input.m_line_index; r=Character.toUpperCase(m_input.m_line[m_input.m_line_index]); if (r<'@' || r>'Z') // non-fatal CError.parse_error(CError.E_BADCTRL,m_input.m_line_number); r = (char) (r - '@'); ++m_input.m_line_index; return r; case 'u': unicode_escape = true; case 'x': ++m_input.m_line_index; r = 0; for (int i=0; i<(unicode_escape?4:2); i++) if (CUtility.ishexdigit(m_input.m_line[m_input.m_line_index])) { r = (char) (r << 4); r = (char) (r | 
CUtility.hex2bin(m_input.m_line[m_input.m_line_index])); ++m_input.m_line_index; } else break; return r; default: if (false == CUtility.isoctdigit(m_input.m_line[m_input.m_line_index])) { r = m_input.m_line[m_input.m_line_index]; ++m_input.m_line_index; } else { r = 0; for (int i=0; i<3; i++) if (CUtility.isoctdigit(m_input.m_line[m_input.m_line_index])) { r = (char) (r << 3); r = (char) (r | CUtility.oct2bin(m_input.m_line[m_input.m_line_index])); ++m_input.m_line_index; } else break; } return r; } } } /******************************************************** Function: packAccept Description: Packages and returns CAccept for action next in input stream. *******************************************************/ CAccept packAccept ( ) throws java.io.IOException { CAccept accept; char action[]; int action_index; int brackets; boolean insinglequotes; boolean indoublequotes; boolean instarcomment; boolean inslashcomment; boolean escaped; boolean slashed; action = new char[BUFFER_SIZE]; action_index = 0; if (CUtility.DEBUG) { CUtility.ASSERT(null != this); CUtility.ASSERT(null != m_outstream); CUtility.ASSERT(null != m_input); CUtility.ASSERT(null != m_tokens); CUtility.ASSERT(null != m_spec); } /* Get a new line, if needed. */ while (m_input.m_line_index >= m_input.m_line_read) { if (m_input.getLine()) { CError.parse_error(CError.E_EOF,m_input.m_line_number); return null; } } /* Look for beginning of action. */ while (CUtility.isspace(m_input.m_line[m_input.m_line_index])) { ++m_input.m_line_index; /* Get a new line, if needed. */ while (m_input.m_line_index >= m_input.m_line_read) { if (m_input.getLine()) { CError.parse_error(CError.E_EOF,m_input.m_line_number); return null; } } } /* Look for brackets. */ if ('{' != m_input.m_line[m_input.m_line_index]) { CError.parse_error(CError.E_BRACE,m_input.m_line_number); } /* Copy new line into action buffer. 
*/ brackets = 0; insinglequotes = indoublequotes = inslashcomment = instarcomment = escaped = slashed = false; while (true) { action[action_index] = m_input.m_line[m_input.m_line_index]; /* Look for quotes. */ if ((insinglequotes || indoublequotes) && escaped) escaped=false; // only protects one char, but this is enough. else if ((insinglequotes || indoublequotes) && '\\' == m_input.m_line[m_input.m_line_index]) escaped=true; else if (!(insinglequotes || inslashcomment || instarcomment) && '\"' == m_input.m_line[m_input.m_line_index]) indoublequotes=!indoublequotes; // unescaped double quote. else if (!(indoublequotes || inslashcomment || instarcomment) && '\'' == m_input.m_line[m_input.m_line_index]) insinglequotes=!insinglequotes; // unescaped single quote. /* Look for comments. */ if (instarcomment) { // inside "/*" comment; look for "*/" if (slashed && '/' == m_input.m_line[m_input.m_line_index]) instarcomment = slashed = false; else // note that inside a star comment, slashed means starred slashed = ('*' == m_input.m_line[m_input.m_line_index]); } else if (!inslashcomment && !insinglequotes && !indoublequotes) { // not in comment, look for /* or // inslashcomment = (slashed && '/' == m_input.m_line[m_input.m_line_index]); instarcomment = (slashed && '*' == m_input.m_line[m_input.m_line_index]); slashed = ('/' == m_input.m_line[m_input.m_line_index]); } /* Look for brackets. */ if (!insinglequotes && !indoublequotes && !instarcomment && !inslashcomment) { if ('{' == m_input.m_line[m_input.m_line_index]) { ++brackets; } else if ('}' == m_input.m_line[m_input.m_line_index]) { --brackets; if (0 == brackets) { ++action_index; ++m_input.m_line_index; break; } } } ++action_index; /* Double the buffer size, if needed. */ if (action_index >= action.length) { action = CUtility.doubleSize(action); } ++m_input.m_line_index; /* Get a new line, if needed. 
*/ while (m_input.m_line_index >= m_input.m_line_read) { inslashcomment = slashed = false; if (insinglequotes || indoublequotes) { // non-fatal CError.parse_error(CError.E_NEWLINE,m_input.m_line_number); insinglequotes = indoublequotes = false; } if (m_input.getLine()) { CError.parse_error(CError.E_SYNTAX,m_input.m_line_number); return null; } } } accept = new CAccept(action,action_index,m_input.m_line_number); if (CUtility.DEBUG) { CUtility.ASSERT(null != accept); } if (CUtility.DESCENT_DEBUG) { System.out.print("Accepting action:"); System.out.println(new String(accept.m_action,0,accept.m_action_read)); } return accept; } /******************************************************** Function: advance Description: Returns code for next token. *******************************************************/ private boolean m_advance_stop = false; int advance ( ) throws java.io.IOException { boolean saw_escape = false; Integer code; /*if (m_input.m_line_index > m_input.m_line_read) { System.out.println("m_input.m_line_index = " + m_input.m_line_index); System.out.println("m_input.m_line_read = " + m_input.m_line_read); CUtility.ASSERT(m_input.m_line_index <= m_input.m_line_read); }*/ if (m_input.m_eof_reached) { /* EOF has already been reached, so return appropriate code. */ m_spec.m_current_token = END_OF_INPUT; m_spec.m_lexeme = '\0'; return m_spec.m_current_token; } /* End of previous regular expression? Refill line buffer? */ if (EOS == m_spec.m_current_token /* ADDED */ || m_input.m_line_index >= m_input.m_line_read) /* ADDED */ { if (m_spec.m_in_quote) { CError.parse_error(CError.E_SYNTAX,m_input.m_line_number); } while (true) { if (false == m_advance_stop || m_input.m_line_index >= m_input.m_line_read) { if (m_input.getLine()) { /* EOF has already been reached, so return appropriate code. 
*/ m_spec.m_current_token = END_OF_INPUT; m_spec.m_lexeme = '\0'; return m_spec.m_current_token; } m_input.m_line_index = 0; } else { m_advance_stop = false; } while (m_input.m_line_index < m_input.m_line_read && true == CUtility.isspace(m_input.m_line[m_input.m_line_index])) { ++m_input.m_line_index; } if (m_input.m_line_index < m_input.m_line_read) { break; } } } if (CUtility.DEBUG) { CUtility.ASSERT(m_input.m_line_index <= m_input.m_line_read); } while (true) { if (false == m_spec.m_in_quote && '{' == m_input.m_line[m_input.m_line_index]) { if (false == expandMacro()) { break; } if (m_input.m_line_index >= m_input.m_line_read) { m_spec.m_current_token = EOS; m_spec.m_lexeme = '\0'; return m_spec.m_current_token; } } else if ('\"' == m_input.m_line[m_input.m_line_index]) { m_spec.m_in_quote = !m_spec.m_in_quote; ++m_input.m_line_index; if (m_input.m_line_index >= m_input.m_line_read) { m_spec.m_current_token = EOS; m_spec.m_lexeme = '\0'; return m_spec.m_current_token; } } else { break; } } if (m_input.m_line_index > m_input.m_line_read) { System.out.println("m_input.m_line_index = " + m_input.m_line_index); System.out.println("m_input.m_line_read = " + m_input.m_line_read); CUtility.ASSERT(m_input.m_line_index <= m_input.m_line_read); } /* Look for backslash, and corresponding escape sequence. */ if ('\\' == m_input.m_line[m_input.m_line_index]) { saw_escape = true; } else { saw_escape = false; } if (false == m_spec.m_in_quote) { if (false == m_spec.m_in_ccl && CUtility.isspace(m_input.m_line[m_input.m_line_index])) { /* White space means the end of the current regular expression. */ m_spec.m_current_token = EOS; m_spec.m_lexeme = '\0'; return m_spec.m_current_token; } /* Process escape sequence, if needed. 
*/ if (saw_escape) { m_spec.m_lexeme = expandEscape(); } else { m_spec.m_lexeme = m_input.m_line[m_input.m_line_index]; ++m_input.m_line_index; } } else { if (saw_escape && (m_input.m_line_index + 1) < m_input.m_line_read && '\"' == m_input.m_line[m_input.m_line_index + 1]) { m_spec.m_lexeme = '\"'; m_input.m_line_index = m_input.m_line_index + 2; } else { m_spec.m_lexeme = m_input.m_line[m_input.m_line_index]; ++m_input.m_line_index; } } code = (Integer) m_tokens.get(new Character(m_spec.m_lexeme)); if (m_spec.m_in_quote || true == saw_escape) { m_spec.m_current_token = L; } else { if (null == code) { m_spec.m_current_token = L; } else { m_spec.m_current_token = code.intValue(); } } if (CCL_START == m_spec.m_current_token) m_spec.m_in_ccl = true; if (CCL_END == m_spec.m_current_token) m_spec.m_in_ccl = false; if (CUtility.FOODEBUG) { System.out.println("Lexeme: " + m_spec.m_lexeme + "\tToken: " + m_spec.m_current_token + "\tIndex: " + m_input.m_line_index); } return m_spec.m_current_token; } /*************************************************************** Function: details Description: High level debugging routine. 
**************************************************************/ private void details ( ) { Enumeration names; String name; String def; Enumeration states; String state; Integer index; int elem; int size; System.out.println(); System.out.println("\t** Macros **"); names = m_spec.m_macros.keys(); while (names.hasMoreElements()) { name = (String) names.nextElement(); def = (String) m_spec.m_macros.get(name); if (CUtility.DEBUG) { CUtility.ASSERT(null != name); CUtility.ASSERT(null != def); } System.out.println("Macro name \"" + name + "\" has definition \"" + def + "\"."); } System.out.println(); System.out.println("\t** States **"); states = m_spec.m_states.keys(); while (states.hasMoreElements()) { state = (String) states.nextElement(); index = (Integer) m_spec.m_states.get(state); if (CUtility.DEBUG) { CUtility.ASSERT(null != state); CUtility.ASSERT(null != index); } System.out.println("State \"" + state + "\" has identifying index " + index.toString() + "."); } System.out.println(); System.out.println("\t** Character Counting **"); if (false == m_spec.m_count_chars) { System.out.println("Character counting is off."); } else { if (CUtility.DEBUG) { CUtility.ASSERT(m_spec.m_count_chars); } System.out.println("Character counting is on."); } System.out.println(); System.out.println("\t** Line Counting **"); if (false == m_spec.m_count_lines) { System.out.println("Line counting is off."); } else { if (CUtility.DEBUG) { CUtility.ASSERT(m_spec.m_count_lines); } System.out.println("Line counting is on."); } System.out.println(); System.out.println("\t** Operating System Specificity **"); if (false == m_spec.m_unix) { System.out.println("Not generating UNIX-specific code."); System.out.println("(This means that \"\\r\\n\" is a " + "newline, rather than \"\\n\".)"); } else { System.out.println("Generating UNIX-specific code."); System.out.println("(This means that \"\\n\" is a " + "newline, rather than \"\\r\\n\".)"); } System.out.println(); System.out.println("\t** Java
CUP Compatibility **"); if (false == m_spec.m_cup_compatible) { System.out.println("Not generating CUP compatible code."); } else { System.out.println("Generating CUP compatible code."); System.out.println("(Scanner implements " + "java_cup.runtime.Scanner.)"); } if (CUtility.FOODEBUG) { if (null != m_spec.m_nfa_states && null != m_spec.m_nfa_start) { System.out.println(); System.out.println("\t** NFA machine **"); print_nfa(); } } if (null != m_spec.m_dtrans_vector) { System.out.println(); System.out.println("\t** DFA transition table **"); /*print_header();*/ } /*if (null != m_spec.m_accept_vector && null != m_spec.m_anchor_array) { System.out.println(); System.out.println("\t** Accept States and Anchor Vector **"); print_accept(); }*/ } /*************************************************************** Function: print_set **************************************************************/ void print_set ( Vector nfa_set ) { int size; int elem; CNfa nfa; size = nfa_set.size(); if (0 == size) { System.out.print("empty "); } for (elem = 0; elem < size; ++elem) { nfa = (CNfa) nfa_set.elementAt(elem); /*System.out.print(m_spec.m_nfa_states.indexOf(nfa) + " ");*/ System.out.print(nfa.m_label + " "); } } /*************************************************************** Function: print_header **************************************************************/ private void print_header ( ) { Enumeration states; int i; int j; int chars_printed=0; CDTrans dtrans; int last_transition; String str; CAccept accept; String state; Integer index; System.out.println("/*---------------------- DFA -----------------------"); states = m_spec.m_states.keys(); while (states.hasMoreElements()) { state = (String) states.nextElement(); index = (Integer) m_spec.m_states.get(state); if (CUtility.DEBUG) { CUtility.ASSERT(null != state); CUtility.ASSERT(null != index); } System.out.println("State \"" + state + "\" has identifying index " + index.toString() + "."); i = index.intValue(); if (CDTrans.F !=
m_spec.m_state_dtrans[i]) { System.out.println("\tStart index in transition table: " + m_spec.m_state_dtrans[i]); } else { System.out.println("\tNo associated transition states."); } } for (i = 0; i < m_spec.m_dtrans_vector.size(); ++i) { dtrans = (CDTrans) m_spec.m_dtrans_vector.elementAt(i); if (null == m_spec.m_accept_vector && null == m_spec.m_anchor_array) { if (null == dtrans.m_accept) { System.out.print(" * State " + i + " [nonaccepting]"); } else { System.out.print(" * State " + i + " [accepting, line " + dtrans.m_accept.m_line_number + " <" + (new String(dtrans.m_accept.m_action,0, dtrans.m_accept.m_action_read)) + ">]"); if (CSpec.NONE != dtrans.m_anchor) { System.out.print(" Anchor: " + ((0 != (dtrans.m_anchor & CSpec.START)) ? "start " : "") + ((0 != (dtrans.m_anchor & CSpec.END)) ? "end " : "")); } } } else { accept = (CAccept) m_spec.m_accept_vector.elementAt(i); if (null == accept) { System.out.print(" * State " + i + " [nonaccepting]"); } else { System.out.print(" * State " + i + " [accepting, line " + accept.m_line_number + " <" + (new String(accept.m_action,0, accept.m_action_read)) + ">]"); if (CSpec.NONE != m_spec.m_anchor_array[i]) { System.out.print(" Anchor: " + ((0 != (m_spec.m_anchor_array[i] & CSpec.START)) ? "start " : "") + ((0 != (m_spec.m_anchor_array[i] & CSpec.END)) ? "end " : "")); } } } last_transition = -1; for (j = 0; j < m_spec.m_dtrans_ncols; ++j) { if (CDTrans.F != dtrans.m_dtrans[j]) { if (last_transition != dtrans.m_dtrans[j]) { System.out.println(); System.out.print(" * goto " + dtrans.m_dtrans[j] + " on "); chars_printed = 0; } str = interp_int((int) j); System.out.print(str); chars_printed = chars_printed + str.length(); if (56 < chars_printed) { System.out.println(); System.out.print(" * "); chars_printed = 0; } last_transition = dtrans.m_dtrans[j]; } } System.out.println(); } System.out.println(" */"); System.out.println(); } } /* * SparseBitSet 25-Jul-1999. * C. 
Scott Ananian * * Re-implementation of the standard java.util.BitSet to support sparse * sets, which we need to efficiently support unicode character classes. */ /** * A set of bits. The set automatically grows as more bits are * needed. * * @version 1.00, 25 Jul 1999 * @author C. Scott Ananian */ final class SparseBitSet implements Cloneable { /** Sorted array of bit-block offsets. */ int offs[]; /** Array of bit-blocks; each holding BITS bits. */ long bits[]; /** Number of blocks currently in use. */ int size; /** log base 2 of BITS, for the identity: x/BITS == x >> LG_BITS */ static final private int LG_BITS = 6; /** Number of bits in a block. */ static final private int BITS = 1<offs[p]) l=p+1; else return p; } CUtility.ASSERT(l==r); return l; // index at which the bnum *should* be, if it's not. } /** * Sets a bit. * @param bit the bit to be set */ public void set(int bit) { int bnum = bit >> LG_BITS; int idx = bsearch(bnum); if (idx >= size || offs[idx]!=bnum) new_block(idx, bnum); bits[idx] |= (1L << (bit & BITS_M1) ); } /** * Clears a bit. * @param bit the bit to be cleared */ public void clear(int bit) { int bnum = bit >> LG_BITS; int idx = bsearch(bnum); if (idx >= size || offs[idx]!=bnum) new_block(idx, bnum); bits[idx] &= ~(1L << (bit & BITS_M1) ); } /** * Clears all bits. */ public void clearAll() { size = 0; } /** * Gets a bit. * @param bit the bit to be gotten */ public boolean get(int bit) { int bnum = bit >> LG_BITS; int idx = bsearch(bnum); if (idx >= size || offs[idx]!=bnum) return false; return 0 != ( bits[idx] & (1L << (bit & BITS_M1) ) ); } /** * Logically ANDs this bit set with the specified set of bits. * @param set the bit set to be ANDed with */ public void and(SparseBitSet set) { binop(this, set, AND); } /** * Logically ORs this bit set with the specified set of bits. * @param set the bit set to be ORed with */ public void or(SparseBitSet set) { binop(this, set, OR); } /** * Logically XORs this bit set with the specified set of bits. 
   * @param set the bit set to be XORed with
   */
  public void xor(SparseBitSet set) {
    binop(this, set, XOR);
  }

  // BINARY OPERATION MACHINERY
  private static interface BinOp {
    public long op(long a, long b);
  }
  private static final BinOp AND = new BinOp() {
    public final long op(long a, long b) { return a & b; }
  };
  private static final BinOp OR = new BinOp() {
    public final long op(long a, long b) { return a | b; }
  };
  private static final BinOp XOR = new BinOp() {
    public final long op(long a, long b) { return a ^ b; }
  };
  private static final void binop(SparseBitSet a, SparseBitSet b, BinOp op) {
    int nsize = a.size + b.size;
    long[] nbits;
    int [] noffs;
    int a_zero, a_size;
    // be very clever and avoid allocating more memory if we can.
    if (a.bits.length < nsize) { // oh well, have to make working space.
      nbits = new long[nsize];
      noffs = new int [nsize];
      a_zero = 0;
      a_size = a.size;
    } else { // reduce, reuse, recycle!
      nbits = a.bits;
      noffs = a.offs;
      a_zero = a.bits.length - a.size;
      a_size = a.bits.length;
      System.arraycopy(a.bits, 0, a.bits, a_zero, a.size);
      System.arraycopy(a.offs, 0, a.offs, a_zero, a.size);
    }
    // ok, crunch through and binop those sets!
    nsize = 0;
    for (int i=a_zero, j=0; i<a_size || j<b.size; ) {
      long nb; int no;
      if (i<a_size && (j>=b.size || a.offs[i] < b.offs[j])) {
        nb = op.op(a.bits[i], 0);
        no = a.offs[i];
        i++;
      } else if (j<b.size && (i>=a_size || a.offs[i] > b.offs[j])) {
        nb = op.op(0, b.bits[j]);
        no = b.offs[j];
        j++;
      } else { // equal keys; merge.
        nb = op.op(a.bits[i], b.bits[j]);
        no = a.offs[i];
        i++; j++;
      }
      if (nb!=0) {
        nbits[nsize] = nb;
        noffs[nsize] = no;
        nsize++;
      }
    }
    a.bits = nbits;
    a.offs = noffs;
    a.size = nsize;
  }

  /**
   * Gets the hashcode.
   */
  public int hashCode() {
    long h = 1234;
    for (int i=0; i<size; i++)
      h ^= bits[i] * offs[i];
    return (int)((h >> 32) ^ h);
  }

  /**
   * Calculates and returns the set's size
   */
  public int size() {
    return (size==0)?0:((1+offs[size-1]) << LG_BITS);
  }

  /**
   * Compares this object against the specified object.
   * @param obj the object to compare with
   * @return true if the objects are the same; false otherwise.
*/ public boolean equals(Object obj) { if ((obj != null) && (obj instanceof SparseBitSet)) return equals(this, (SparseBitSet)obj); return false; } /** * Compares two SparseBitSets for equality. * @return true if the objects are the same; false otherwise. */ public static boolean equals(SparseBitSet a, SparseBitSet b) { for (int i=0, j=0; i=b.size || a.offs[i] < b.offs[j])) { if (a.bits[i++]!=0) return false; } else if (j=a.size || a.offs[i] > b.offs[j])) { if (b.bits[j++]!=0) return false; } else { // equal keys if (a.bits[i++]!=b.bits[j++]) return false; } } return true; } /** * Clones the SparseBitSet. */ public Object clone() { try { SparseBitSet set = (SparseBitSet)super.clone(); set.bits = (long[]) bits.clone(); set.offs = (int []) offs.clone(); return set; } catch (CloneNotSupportedException e) { // this shouldn't happen, since we are Cloneable throw new InternalError(); } } /** * Return an Enumeration of Integers * which represent set bit indices in this SparseBitSet. */ public Enumeration elements() { return new Enumeration() { int idx=-1, bit=BITS; { advance(); } public boolean hasMoreElements() { return (idx 1) sb.append(", "); sb.append(e.nextElement()); } sb.append('}'); return sb.toString(); } /** Check validity. */ private boolean isValid() { if (bits.length!=offs.length) return false; if (size>bits.length) return false; if (size!=0 && 0<=offs[0]) return false; for (int i=1; i>>1) % RANGE) << 1; a.set(rr); v.addElement(new Integer(rr)); // check that all the numbers are there. CUtility.ASSERT(a.get(rr) && !a.get(rr+1) && !a.get(rr-1)); for (int i=0; i>>1) % v.size(); int m = ((Integer)v.elementAt(rr)).intValue(); b.clear(m); v.removeElementAt(rr); // check that numbers are removed properly. 
CUtility.ASSERT(!b.get(m)); } CUtility.ASSERT(!a.equals(b)); SparseBitSet c = (SparseBitSet) a.clone(); SparseBitSet d = (SparseBitSet) a.clone(); c.and(a); CUtility.ASSERT(c.equals(a) && a.equals(c)); c.xor(a); CUtility.ASSERT(!c.equals(a) && c.size()==0); d.or(b); CUtility.ASSERT(d.equals(a) && !b.equals(d)); d.and(b); CUtility.ASSERT(!d.equals(a) && b.equals(d)); d.xor(a); CUtility.ASSERT(!d.equals(a) && !b.equals(d)); c.or(d); c.or(b); CUtility.ASSERT(c.equals(a) && a.equals(c)); c = (SparseBitSet) d.clone(); c.and(b); CUtility.ASSERT(c.size()==0); System.out.println("Success."); } } /************************************************************************ JLEX COPYRIGHT NOTICE, LICENSE AND DISCLAIMER. Copyright 1996 by Elliot Joel Berk Permission to use, copy, modify, and distribute this software and its documentation for any purpose and without fee is hereby granted, provided that the above copyright notice appear in all copies and that both the copyright notice and this permission notice and warranty disclaimer appear in supporting documentation, and that the name of Elliot Joel Berk not be used in advertising or publicity pertaining to distribution of the software without specific, written prior permission. Elliot Joel Berk disclaims all warranties with regard to this software, including all implied warranties of merchantability and fitness. In no event shall Elliot Joel Berk be liable for any special, indirect or consequential damages or any damages whatsoever resulting from loss of use, data or profits, whether in an action of contract, negligence or other tortious action, arising out of or in connection with the use or performance of this software. 
***********************************************************************/ // set emacs indentation // Local Variables: // c-basic-offset:2 // End: jlex-1.2.6/sample.lex0000644000175000017500000001114507621017234016163 0ustar cjwatsoncjwatson00000000000000import java.lang.System; class Sample { public static void main(String argv[]) throws java.io.IOException { Yylex yy = new Yylex(System.in); Yytoken t; while ((t = yy.yylex()) != null) System.out.println(t); } } class Utility { public static void assert ( boolean expr ) { if (false == expr) { throw (new Error("Error: Assertion failed.")); } } private static final String errorMsg[] = { "Error: Unmatched end-of-comment punctuation.", "Error: Unmatched start-of-comment punctuation.", "Error: Unclosed string.", "Error: Illegal character." }; public static final int E_ENDCOMMENT = 0; public static final int E_STARTCOMMENT = 1; public static final int E_UNCLOSEDSTR = 2; public static final int E_UNMATCHED = 3; public static void error ( int code ) { System.out.println(errorMsg[code]); } } class Yytoken { Yytoken ( int index, String text, int line, int charBegin, int charEnd ) { m_index = index; m_text = new String(text); m_line = line; m_charBegin = charBegin; m_charEnd = charEnd; } public int m_index; public String m_text; public int m_line; public int m_charBegin; public int m_charEnd; public String toString() { return "Token #"+m_index+": "+m_text+" (line "+m_line+")"; } } %% %{ private int comment_count = 0; %} %line %char %state COMMENT ALPHA=[A-Za-z] DIGIT=[0-9] NONNEWLINE_WHITE_SPACE_CHAR=[\ \t\b\012] WHITE_SPACE_CHAR=[\n\ \t\b\012] STRING_TEXT=(\\\"|[^\n\"]|\\{WHITE_SPACE_CHAR}+\\)* COMMENT_TEXT=([^/*\n]|[^*\n]"/"[^*\n]|[^/\n]"*"[^/\n]|"*"[^/\n]|"/"[^*\n])* %% "," { return (new Yytoken(0,yytext(),yyline,yychar,yychar+1)); } ":" { return (new Yytoken(1,yytext(),yyline,yychar,yychar+1)); } ";" { return (new Yytoken(2,yytext(),yyline,yychar,yychar+1)); } "(" { return (new Yytoken(3,yytext(),yyline,yychar,yychar+1)); 
} ")" { return (new Yytoken(4,yytext(),yyline,yychar,yychar+1)); } "[" { return (new Yytoken(5,yytext(),yyline,yychar,yychar+1)); } "]" { return (new Yytoken(6,yytext(),yyline,yychar,yychar+1)); } "{" { return (new Yytoken(7,yytext(),yyline,yychar,yychar+1)); } "}" { return (new Yytoken(8,yytext(),yyline,yychar,yychar+1)); } "." { return (new Yytoken(9,yytext(),yyline,yychar,yychar+1)); } "+" { return (new Yytoken(10,yytext(),yyline,yychar,yychar+1)); } "-" { return (new Yytoken(11,yytext(),yyline,yychar,yychar+1)); } "*" { return (new Yytoken(12,yytext(),yyline,yychar,yychar+1)); } "/" { return (new Yytoken(13,yytext(),yyline,yychar,yychar+1)); } "=" { return (new Yytoken(14,yytext(),yyline,yychar,yychar+1)); } "<>" { return (new Yytoken(15,yytext(),yyline,yychar,yychar+2)); } "<" { return (new Yytoken(16,yytext(),yyline,yychar,yychar+1)); } "<=" { return (new Yytoken(17,yytext(),yyline,yychar,yychar+2)); } ">" { return (new Yytoken(18,yytext(),yyline,yychar,yychar+1)); } ">=" { return (new Yytoken(19,yytext(),yyline,yychar,yychar+2)); } "&" { return (new Yytoken(20,yytext(),yyline,yychar,yychar+1)); } "|" { return (new Yytoken(21,yytext(),yyline,yychar,yychar+1)); } ":=" { return (new Yytoken(22,yytext(),yyline,yychar,yychar+2)); } {NONNEWLINE_WHITE_SPACE_CHAR}+ { } \n { } "/*" { yybegin(COMMENT); comment_count = comment_count + 1; } "/*" { comment_count = comment_count + 1; } "*/" { comment_count = comment_count - 1; Utility.assert(comment_count >= 0); if (comment_count == 0) { yybegin(YYINITIAL); } } {COMMENT_TEXT} { } \"{STRING_TEXT}\" { String str = yytext().substring(1,yytext().length() - 1); Utility.assert(str.length() == yytext().length() - 2); return (new Yytoken(40,str,yyline,yychar,yychar + str.length())); } \"{STRING_TEXT} { String str = yytext().substring(1,yytext().length()); Utility.error(Utility.E_UNCLOSEDSTR); Utility.assert(str.length() == yytext().length() - 1); return (new Yytoken(41,str,yyline,yychar,yychar + str.length())); } {DIGIT}+ { 
return (new Yytoken(42,yytext(),yyline,yychar,yychar + yytext().length())); } {ALPHA}({ALPHA}|{DIGIT}|_)* { return (new Yytoken(43,yytext(),yyline,yychar,yychar + yytext().length())); } . { System.out.println("Illegal character: <" + yytext() + ">"); Utility.error(Utility.E_UNMATCHED); }
jlex-1.2.6/README0000644000175000017500000000570307621017234015053 0ustar cjwatsoncjwatson00000000000000
*************************************************************
*               JLex README Version 1.2                     *
*************************************************************

Written by Elliot Berk [edited by A. Appel] [revised by C. Scott Ananian].
Contact cananian@alumni.princeton.edu with any problems relating to JLex.

The following steps describe the compilation and usage of JLex.

(1) Choose some directory that is on your CLASSPATH, where you install
    Java utilities such as JLex. I will refer to this directory as "J",
    for example.

(2) Make a directory "J/JLex" and put the source file Main.java in J/JLex.

(3) Compile Main.java as you would any Java source file:

        javac Main.java

    This should produce a number of Java class files, including
    Main.class, in the "J/JLex" directory, where "J" is in your CLASSPATH.

(4) To run JLex with a JLex specification file, the usage is:

        java JLex.Main <filename>

    where <filename> is the name of the JLex specification file.
    If java complains that it can't find JLex.Main, then the directory "J"
    (which contains the subdirectory "JLex" which contains the class files)
    isn't in your CLASSPATH; go back and read steps 1-3 more carefully,
    please.

    JLex will produce diagnostic output to inform you of its progress and,
    upon completion, will produce a Java source file that contains the
    lexical analyzer. The name of the lexical analyzer file will be the
    name of the JLex specification file, with the string ".java" added to
    the end. (So if the JLex specification file is called foo.lex, the
    lexical analyzer source file that JLex produces will be called
    foo.lex.java.)
(5) The resulting lexical analyzer source file should be compiled with the
    Java compiler:

        javac <filename>

    where <filename> is the name of the lexical analyzer source file.
    This produces a lexical analyzer class file, which can then be used in
    your applications. If the default settings have not been changed, the
    lexical analyzer class will be called Yylex and the class files will
    be named Yylex.class and Yytoken.class.

(6) As an example, there is a sample lexical specification on the JLex
    web site:

        http://www.cs.princeton.edu/~appel/modern/java/JLex/

    named 'sample.lex'. Transfer this to your system and use the command:

        java JLex.Main sample.lex

    to generate a file named 'sample.lex.java'. Compile this with:

        javac -d J sample.lex.java

    where "J" is the above-mentioned path to a directory in your
    CLASSPATH. If '.' is in your CLASSPATH, you can use "-d .".
    Run the generated lexer with:

        java Sample

    which expects input on stdin. The lexer parses tokens that resemble
    those for a typical programming language; whitespace is generally
    ignored. Java buffers input from stdin a line at a time, so you won't
    see any output until you press Enter. Try inputting things like:

        an_identifier "a string" 123124 (1+2) { /* comment */ a := b & c; }

    Look at the sample.lex input file for more information on the
    operation of this example scanner.
jlex-1.2.6/bug1.txt0000644000175000017500000001155006236701353015572 0ustar cjwatsoncjwatson00000000000000
The file in the bug report below has some errors and tries to reference
some features that are not actually in JavaLex, but it seems to illustrate
a problem with regular expression parsing.

For instance, the following line has some problems.

ONECHAR [^\\"']|(\\.)|(\[0123]?{OCTNUMBER}{0,2})|{UNICODE_CHARACTER}

1) A '\' must precede the '"' in the first [ ... ].
2) A '\' must precede the '\' before the second [ ... ].
3) The {0,2} is not supported. It means the number of occurrences of a
   macro (?), a feature not supported by Java-Lex.
ONECHAR [^\\\"']|(\\.)|(\\[0123]?{OCTNUMBER})|{UNICODE_CHARACTER} But there still appear to be some problems parsing the (very complex) FLOAT macro. This bug does not result in incorrect lexers being generated but rather in Java-Lex incorrectly asserting a parse error during generation of the lexer source file. This makes the bug in a sense easy to recognize, since the processing and compilation of the lex file will not go to completion. -- Elliot ============================================================================ From glunz@zfe.siemens.deWed Sep 11 14:59:23 1996 Date: Fri, 26 Jul 1996 13:59:56 +0200 (MET DST) From: Wolfgang Glunz To: ejberk@Princeton.EDU Subject: Problem with JavaLex Hello, First of all thanks for the effort you put into the development of JavaLex. Unfortunately I have a problem where I don't know how to proceed. I tried the following .lex file: %% %{ /* this goes into the lexer class */ %} %line %notunix %state INCOMMENT IDENTIFIER [A-Za-z_$][A-Za-z_$0-9]* DIGIT [0-9] HEXDIGIT [A-Fa-f0-9] OCTDIGIT [0-7] DECNUMBER [1-9]{DIGIT}* HEXNUMBER 0[Xx]{HEXDIGIT}+ OCTNUMBER 0{OCTDIGIT}* DECLONG {DECNUMBER}[Ll] HEXLONG {HEXNUMBER}[Ll] OCTLONG {OCTNUMBER}[Ll] EXPONENT [Ee][+-]?{DIGIT}+ FLOATBASE ([+-])?((({DIGIT}+\.{DIGIT}*)|({DIGIT}*\.{DIGIT}+)){EXPONENT}?)|({DIGIT}+{EXPONENT}) DOUBLE ({FLOATBASE}[Dd]?)|({DIGIT}+[Dd]) FLOAT (({FLOATBASE})|({DIGIT}+))[Ff] UNICODE_CHARACTER \\u{HEXDIGIT}{4} ONECHAR [^\\"']|(\\.)|(\[0123]?{OCTNUMBER}{0,2})|{UNICODE_CHARACTER} CHARLITCHAR {ONECHAR}|\" CHARACTER "'"{CHARLITCHAR}"'" STRINGCHAR {ONECHAR}|"'" WASSTRING \"({STRINGCHAR})*\" STRING "SSSS" CHAR_OP ([-;{},;()[\].&|!~=+*/%<>^?:]) WHITESPACE [ \n\r\t]+ %% "/*" { } "*/" { BEGIN(INITIAL); } . 
{ } \n { } "/*" { BEGIN(INCOMMENT); } "//".* { } {WHITESPACE} { } {DECLONG} { } {HEXLONG} { } {OCTLONG} { } {DECNUMBER} { } {HEXNUMBER} { } {OCTNUMBER} { } {CHARACTER} { } {FLOAT} { } {DOUBLE} { } I switched debugging on in JavaLex and get the following output: (tail only) Entering dodash [lexeme: +] [token: 17] Lexeme: - Token: 10 Index: 52 Lexeme: ] Token: 5 Index: 53 Lexeme: ? Token: 15 Index: 54 expanded escape: [0-9] ((([+-])?((([0-9]+\.[0-9]*)|([0-9]*\.[0-9]+))[Ee][+-]?[0-9]+?)|({DIGIT}+{EXPONENT}))|({DIGIT}+))[Ff] { } Lexeme: [ Token: 6 Index: 55 Lexeme: 0 Token: 12 Index: 56 Lexeme: - Token: 10 Index: 57 Lexeme: 9 Token: 12 Index: 58 Lexeme: ] Token: 5 Index: 59 Leaving dodash [lexeme:]] [token:5] Lexeme: + Token: 17 Index: 60 Leaving term [lexeme:+] [token:17] Lexeme: ? Token: 15 Index: 61 Leaving factor [lexeme:?] [token:15] Error: Parse error at line 54. Error Description: + ? or * must follow an expression or subexpression. java.lang.Error: Parse error. at CError.parse_error(JavaLex.java:4227) at CMakeNfa.first_in_cat(JavaLex.java:1764) at CMakeNfa.cat_expr(JavaLex.java:1727) at CMakeNfa.expr(JavaLex.java:1674) at CMakeNfa.term(JavaLex.java:1856) at CMakeNfa.factor(JavaLex.java:1800) at CMakeNfa.cat_expr(JavaLex.java:1724) at CMakeNfa.expr(JavaLex.java:1674) at CMakeNfa.rule(JavaLex.java:1608) at CMakeNfa.machine(JavaLex.java:1560) at CMakeNfa.thompson(JavaLex.java:1453) at CLexGen.userRules(JavaLex.java:5441) at CLexGen.generate(JavaLex.java:4584) at JavaLex.main(JavaLex.java:3367) Two questions: 1. The error message says that a + ? or * must follow an expression. Is it not possible to write (expression)|(expression) ? 2. JavaLex seems to expand the macros only partially. Why so ? 
Any help is appreciated, Wolfgang -- Wolfgang Glunz email: Wolfgang.Glunz@zfe.siemens.de Siemens AG, ZFE T SE 2 WWW: (Siemens only) 81730 Muenchen / Germany Phone: +49 89 63649492 Otto Hahn Ring 6 Fax: +49 89 63640898 jlex-1.2.6/manual.html0000644000175000017500000012701507702561503016342 0ustar cjwatsoncjwatson00000000000000

JLex:
A lexical analyzer generator for Java(TM)

Elliot Berk
Department of Computer Science, Princeton University

Version 1.2, May 5, 1997

Manual revision October 29, 1997

Last updated September 6, 2000 for JLex 1.2.5

(latest version can be obtained from http://www.cs.princeton.edu/~appel/modern/java/JLex/ )




1. Introduction

A lexical analyzer breaks an input stream of characters into tokens. Writing lexical analyzers by hand can be a tedious process, so software tools have been developed to ease this task.

Perhaps the best known such utility is Lex. Lex is a lexical analyzer generator for the UNIX operating system, targeted to the C programming language. Lex takes a specially-formatted specification file containing the details of a lexical analyzer. This tool then creates a C source file for the associated table-driven lexer.

The JLex utility is based upon the Lex lexical analyzer generator model. JLex takes a specification file similar to that accepted by Lex, then creates a Java source file for the corresponding lexical analyzer.




2. JLex Specifications

A JLex input file is organized into three sections, separated by double-percent directives (``%%''). A proper JLex specification has the following format.
user code
%%
JLex directives
%%
regular expression rules
The ``%%'' directives distinguish sections of the input file and must be placed at the beginning of their line. The remainder of the line containing the ``%%'' directives may be discarded and should not be used to house additional declarations or code.

The user code section - the first section of the specification file - is copied directly into the resulting output file. This area of the specification provides space for the implementation of utility classes or return types.

The JLex directives section is the second part of the input file. Here, macro definitions are given and state names are declared.

The third section contains the rules of lexical analysis, each of which consists of three parts: an optional state list, a regular expression, and an action.




2.1 User Code

User code precedes the first double-percent directive (``%%''). This code is copied verbatim into the lexical analyzer source file that JLex outputs, at the top of the file. Therefore, if the lexer source file needs to begin with a package declaration or with the importation of an external class, the user code section should begin with the corresponding declaration. This declaration will then be copied onto the top of the generated source file.




2.2 JLex Directives

The JLex directive section begins after the first ``%%'' and continues until the second ``%%'' delimiter. Each JLex directive should be contained on a single line and should begin that line.




2.2.1 Internal Code to Lexical Analyzer Class

The %{...%} directive allows the user to write Java code to be copied into the lexical analyzer class. This directive is used as follows.
%{
<code>
%}
To be properly recognized, the %{ and %} should each be situated at the beginning of a line. The specified Java code in <code> will then be copied into the lexical analyzer class created by JLex.
class Yylex {
... <code> ...
}
This permits the declaration of variables and functions internal to the generated lexical analyzer class. Variable names beginning with yy should be avoided, as these are reserved for use by the generated lexical analyzer class.




2.2.2 Initialization Code for Lexical Analyzer Class

The %init{ ... %init} directive allows the user to write Java code to be copied into the constructor for the lexical analyzer class.
%init{
<code>
%init}
The %init{ and %init} directives should be situated at the beginning of a line. The specified Java code in <code> will then be copied into the lexical analyzer class constructor.
class Yylex {
Yylex () {
... <code> ...
}
}
This directive permits one-time initializations of the lexical analyzer class from inside its constructor. Variable names beginning with yy should be avoided, as these are reserved for use by the generated lexical analyzer class.

The code given in the %init{ ... %init} directive may potentially throw an exception, or propagate it from another function. To declare this exception, use the %initthrow{ ... %initthrow} directive.
%initthrow{
<exception[1]>[, <exception[2]>, ...]
%initthrow}
The Java code specified here will be copied into the declaration of the lexical analyzer constructor.
Yylex ()
throws <exception[1]>[, <exception[2]>, ...]
{
... <code> ...
}
If the Java code given in the %init{ ... %init} directive throws an exception that is not declared, the resulting lexical analyzer source file may not compile successfully.




2.2.3 End-of-File Code for Lexical Analyzer Class

The %eof{ ... %eof} directive allows the user to write Java code to be copied into the lexical analyzer class for execution after the end-of-file is reached.
%eof{
<code>
%eof}
The %eof{ and %eof} directives should be situated at the beginning of a line. The specified Java code in <code> will be executed at most once, and immediately after the end-of-file is reached for the input file the lexical analyzer class is processing.

The code given in the %eof{ ... %eof} directive may potentially throw an exception, or propagate it from another function. To declare this exception, use the %eofthrow{ ... %eofthrow} directive.
%eofthrow{
<exception[1]>[, <exception[2]>, ...]
%eofthrow}
The Java code specified here will be copied into the declaration of the lexical analyzer function called to clean up upon reaching end-of-file.
private void yy_do_eof ()
throws <exception[1]>[, <exception[2]>, ...]
{
... <code> ...
}
The Java code in <code> that makes up the body of this function will, in part, come from the code given in the %eof{ ... %eof} directive. If this code throws an exception that is not declared using the %eofthrow{ ... %eofthrow} directive, the resulting lexer may not compile successfully.




2.2.4 Macro Definitions

Macro definitions are given in the JLex directives section of the specification. Each macro definition is contained on a single line and consists of a macro name followed by an equal sign (=), then by its associated definition. The format can therefore be summarized as follows.
<name> = <definition>
Non-newline white space, e.g. blanks and tabs, is optional between the macro name and the equal sign and between the equal sign and the macro definition. Each macro definition should be contained on a single line.

Macro names should be valid identifiers, i.e. sequences of letters, digits, and underscores beginning with a letter or underscore.

Macro definitions should be valid regular expressions, the details of which are described in another section below.

Macro definitions can contain other macro expansions, in the standard
{<name>} format for macros within regular expressions. However, the user should note that these expressions are macros - not functions or nonterminals - so mutually recursive constructs using macros are illegal. Therefore, cycles in macro definitions will have unpredictable results.
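
To make the textual nature of macros concrete, the following is a minimal sketch of macro expansion with a check for cycles. The class MacroExpander and its methods are hypothetical and are not part of JLex; the sketch also ignores quoting and escape rules, and it reports a cycle with an exception, whereas JLex itself makes no such promise.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of JLex: expands {name} references textually.
class MacroExpander {
    private final Map<String, String> macros = new LinkedHashMap<>();

    void define(String name, String definition) {
        macros.put(name, definition);
    }

    String expand(String regex) {
        return expand(regex, new ArrayDeque<>());
    }

    private String expand(String regex, Deque<String> active) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < regex.length()) {
            char c = regex.charAt(i);
            int close = (c == '{') ? regex.indexOf('}', i) : -1;
            String name = (close > i) ? regex.substring(i + 1, close) : null;
            if (name != null && macros.containsKey(name)) {
                // A macro that is still being expanded refers back to itself.
                if (active.contains(name)) {
                    throw new IllegalStateException("Cyclic macro definition: " + name);
                }
                active.push(name);
                out.append(expand(macros.get(name), active));
                active.pop();
                i = close + 1;
            } else {
                // Not a known macro reference: copy the character unchanged.
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }
}
```

With DIGIT defined as [0-9] and ALPHA as [A-Za-z], expanding {ALPHA}({ALPHA}|{DIGIT}|_)* in this sketch yields [A-Za-z]([A-Za-z]|[0-9]|_)*.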




2.2.5 State Declarations

Lexical states are used to control when certain regular expressions are matched. These are declared in the JLex directives in the following way.
%state state[0][, state[1], state[2], ...]
Each declaration of a series of lexical states should be contained on a single line. Multiple declarations can be included in the same JLex specification, so the declaration of many states can be broken into many declarations over multiple lines.

State names should be valid identifiers, i.e. sequences of letters, digits, and underscores beginning with a letter or underscore.

A single lexical state is implicitly declared by JLex. This state is called YYINITIAL, and the generated lexer begins lexical analysis in this state.

Rules of lexical analysis begin with an optional state list. If a state list is given, the lexical rule is matched only when the lexical analyzer is in one of the specified states. If a state list is not given, the lexical rule is matched when the lexical analyzer is in any state.

If a JLex specification does not make use of states, by neither declaring states nor preceding lexical rules with state lists, the resulting lexer will remain in state YYINITIAL throughout execution. Since lexical rules are not prefaced by state lists, these rules are matched in all existing states, including the implicitly declared state YYINITIAL. Therefore, everything works as expected if states are not used at all.

States are declared as constant integers within the generated lexical analyzer class. The constant integer declared for a declared state has the same name as that state. The user should be careful to avoid name conflict between state names and variables declared in the action portion of rules or elsewhere within the lexical analyzer class. A convenient convention would be to declare state names in all capitals, as a reminder that these identifiers effectively become constants.
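
The relationship between state constants and state lists can be sketched as follows. This is an illustration only: the class StateSketch, the nested Rule type, the state name COMMENT, and the particular integer values are assumptions made for the example, not the code JLex generates.

```java
import java.util.Arrays;

// Hypothetical illustration; Rule and COMMENT are not names taken from JLex.
class StateSketch {
    static final int YYINITIAL = 0; // implicitly declared start state (value assumed)
    static final int COMMENT = 1;   // as if declared with "%state COMMENT"

    static final class Rule {
        final String pattern;
        final int[] states; // empty array: the rule has no state list

        Rule(String pattern, int... states) {
            this.pattern = pattern;
            this.states = states;
        }

        // A rule without a state list applies in every lexical state.
        boolean appliesIn(int currentState) {
            if (states.length == 0) {
                return true;
            }
            return Arrays.stream(states).anyMatch(s -> s == currentState);
        }
    }
}
```

Because the state names become ordinary integer constants, a local variable named COMMENT inside an action would shadow the constant, which is the name conflict the paragraph above warns about.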



2.2.6 Character Counting

Character counting is turned off by default, but can be activated with the %char directive.
%char
The zero-based character index of the first character in the matched region of text is then placed in the integer variable yychar.




2.2.7 Line Counting

Line counting is turned off by default, but can be activated with the %line directive.
%line
The zero-based line index at the beginning of the matched region of text is then placed in the integer variable yyline.
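
As an illustration of what zero-based means here, the following sketch computes the line index of a match from its character offset. The class PositionSketch is hypothetical, and the sketch counts only the ``\n'' character as a line terminator, which is a simplification.

```java
// Hypothetical helper, not generated by JLex.
class PositionSketch {
    // Zero-based line index of the character at 'offset', counting '\n' only.
    static int lineIndexAt(String input, int offset) {
        int line = 0;
        for (int i = 0; i < offset; i++) {
            if (input.charAt(i) == '\n') {
                line++;
            }
        }
        return line;
    }
}
```

The offset passed in plays the role of yychar and the result plays the role of yyline. A user who wants the one-based line numbers that most editors display would add one to the result.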




2.2.8 Java CUP Compatibility

Java CUP is a parser generator for Java originally written by Scott Hudson of Georgia Tech University, and maintained and extended by Frank Flannery, Dan Wang, and C. Scott Ananian. Details of this software tool are on the World Wide Web at
http://www.cs.princeton.edu/~appel/modern/java/CUP/.
Java CUP compatibility is turned off by default, but can be activated with the following JLex directive.
%cup
When given, this directive makes the generated scanner conform to the java_cup.runtime.Scanner interface. It has the same effect as the following three directives:
%implements java_cup.runtime.Scanner
%function next_token
%type java_cup.runtime.Symbol
See the next section for more details on these three directives, and the CUP manual for more details on using CUP and JLex together.




2.2.9 Lexical Analyzer Component Titles

The following directives can be used to change the name of the generated lexical analyzer class, the tokenizing function, and the token return type. To change the name of the lexical analyzer class from Yylex, use the %class directive.
%class <name>
To change the name of the tokenizing function from yylex, use the %function directive.
%function <name>
To change the return type of the tokenizing function from Yytoken, use the %type directive.
%type <name>
If the default names are not altered using these directives, the tokenizing function is invoked with a call to Yylex.yylex(), which returns the Yytoken type.

To avoid scoping conflicts, names beginning with yy are normally reserved for lexical analyzer internal functions and variables.




2.2.10 Default Token Type

To make the 32-bit primitive integer type int the return type for the tokenizing function (and therefore the token type), use the %integer directive.
%integer
Under default settings, Yytoken is the return type of the tokenizing function
Yylex.yylex(), as in the following code fragment.
class Yylex { ...
public Yytoken yylex () {
... }
The %integer directive replaces the previous code with a revised declaration, in which the token type has been changed to int.
class Yylex { ...
public int yylex () {
... }
This declaration allows lexical actions to return integer codes, as in the following code fragment from a hypothetical lexical action.
{ ...
return 7;
... }

The integer return type changes the behavior at end-of-file. Under default settings, objects - subclasses of the java.lang.Object class - are returned by Yylex.yylex(). During execution of the generated lexer Yylex, a special object value must be reserved for end-of-file. Therefore, when the end-of-file is reached for the processed input file (and from then onward), Yylex.yylex() returns null.

When int is the return type of Yylex.yylex(), null can no longer be returned. Instead, Yylex.yylex() returns the value -1, corresponding to constant integer
Yylex.YYEOF. The %integer directive implies %yyeof; see below.
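
A driver for an integer-returning lexer might therefore look like the following sketch. Since a generated Yylex class is not available here, the class IntLexerStub is a hand-written stand-in that merely replays a fixed list of token codes; only the yylex() name and the value -1 for YYEOF come from the description above.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written stand-in for a lexer generated with %integer (hypothetical).
class IntLexerStub {
    static final int YYEOF = -1; // end-of-file value described above
    private final int[] codes;
    private int next = 0;

    IntLexerStub(int... codes) {
        this.codes = codes;
    }

    // Returns the next token code, or YYEOF once the input is exhausted.
    int yylex() {
        return next < codes.length ? codes[next++] : YYEOF;
    }

    // Typical driver loop: read tokens until YYEOF is returned.
    static List<Integer> drain(IntLexerStub lexer) {
        List<Integer> seen = new ArrayList<>();
        int token;
        while ((token = lexer.yylex()) != YYEOF) {
            seen.add(token);
        }
        return seen;
    }
}
```

One consequence of this design is that -1 should not be used as an ordinary token code, since the driver could not distinguish it from end-of-file.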




2.2.11 Default Token Type II: Wrapped Integer

To make java.lang.Integer the return type for the tokenizing function (and therefore the token type), use the %intwrap directive.
%intwrap
Under default settings, Yytoken is the return type of the tokenizing function
Yylex.yylex(), as in the following code fragment.
class Yylex { ...
public Yytoken yylex () {
... }
The %intwrap directive replaces the previous code with a revised declaration, in which the token type has been changed to java.lang.Integer.
class Yylex { ...
public java.lang.Integer yylex () {
... }
This declaration allows lexical actions to return wrapped integer codes, as in the following code fragment from a hypothetical lexical action.
{ ...
return new java.lang.Integer(0);
... }

Notice that the effect of %intwrap directive can be equivalently accomplished using the %type directive, as follows.
%type java.lang.Integer
This manually changes the return type of Yylex.yylex() to
java.lang.Integer.




2.2.12 YYEOF on End-of-File

The %yyeof directive causes the constant integer Yylex.YYEOF to be declared. If the %integer directive is present, Yylex.YYEOF is returned upon end-of-file.
%yyeof
This directive causes Yylex.YYEOF to be declared as follows:
public final int YYEOF = -1;
The %integer directive implies %yyeof.




2.2.13 Newlines and Operating System Compatibility

In UNIX operating systems, the character code sequence representing a newline is the single character ``\n''. Conversely, in DOS-based operating systems, the newline is the two-character sequence ``\r\n'' consisting of the carriage return followed by the newline. The %notunix directive results in either the carriage return or the newline being recognized as a newline.
%notunix
This issue of recognizing the proper sequence of characters as a newline is important in ensuring Java platform independence.




2.2.14 Character Sets

The default settings support an alphabet of character codes between 0 and 127 inclusive. If the generated lexical analyzer receives an input character code that falls outside of these bounds, the lexer may fail.

The %full directive can be used to extend this alphabet to include all 8-bit values.
%full
If the %full directive is given, JLex will generate a lexical analyzer that supports an alphabet of character codes between 0 and 255 inclusive.

The %unicode directive can be used to extend the alphabet to include the full 16-bit Unicode alphabet.
%unicode
If the %unicode directive is given, JLex will generate a lexical analyzer that supports an alphabet of character codes between 0 and 2^16-1 inclusive.

The %ignorecase directive can be given to generate case-insensitive lexers.
%ignorecase
If the %ignorecase directive is given, JLex will expand all character classes in a unicode-friendly way to match upper-, lower-, and title-case letters.
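
One way to picture this expansion is the sketch below, which collects the case variants of a single character using the standard java.lang.Character conversions. The class CaseVariants is hypothetical; whether JLex performs the expansion in exactly this way is an assumption.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper illustrating a case-insensitive character class.
class CaseVariants {
    // Returns the character together with its lower, upper, and title-case forms.
    static Set<Character> variants(char c) {
        Set<Character> out = new TreeSet<>();
        out.add(c);
        out.add(Character.toLowerCase(c));
        out.add(Character.toUpperCase(c));
        out.add(Character.toTitleCase(c));
        return out;
    }
}
```

For most letters the title-case form coincides with the upper-case form, so the set has two members; characters without case, such as digits, yield a set containing only themselves.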




2.2.15 Character Format To and From File

Under the status quo, JLex and the lexical analyzer it generates read from and write to ASCII text files, with byte-sized characters. However, to support further extensions of the JLex tool, all internal processing of characters is done using the 16-bit Java character type, although the full range of 16-bit values is not supported.




2.2.16 Exceptions Generated by Lexical Actions

The code given in the action portion of the regular expression rules, in section three of the JLex specification, may potentially throw an exception, or propagate it from another function. To declare these exceptions, use the %yylexthrow{ ... %yylexthrow} directive.
%yylexthrow{
<exception[1]>[, <exception[2]>, ...]
%yylexthrow}
The Java code specified here will be copied into the declaration of the lexical analyzer tokenizing function Yylex.yylex(), as follows.
public Yytoken yylex ()
throws <exception[1]>[, <exception[2]>, ...]
{
...
}
If the code given in the action portion of the regular expression rules throws an exception that is not declared using the %yylexthrow{ ... %yylexthrow} directive, the resulting lexer may not compile successfully.




2.2.17 Specifying the Return Value on End-of-File

The %eofval{ ... %eofval} directive specifies the return value on end-of-file. This directive allows the user to write Java code to be copied into the lexical analyzer tokenizing function Yylex.yylex() for execution when the end-of-file is reached. This code must return a value compatible with the type of the tokenizing function Yylex.yylex().
%eofval{
<code>
%eofval}
The specified Java code in <code> determines the return value of Yylex.yylex() when the end-of-file is reached for the input file the lexical analyzer class is processing. This will also be the value returned by Yylex.yylex() each additional time this function is called after end-of-file is initially reached, so <code> may be executed more than once. Finally, the %eofval{ and %eofval} directives should be situated at the beginning of a line.

An example of usage is given below. Suppose the return value desired on end-of-file is (new token(sym.EOF)) rather than the default value null. The user adds the following declaration to the specification file.
%eofval{
return (new token(sym.EOF));
%eofval}
The code is then copied into Yylex.yylex() at the appropriate place.
public Yytoken yylex () { ...
return (new token(sym.EOF));
... }
The value returned by Yylex.yylex() upon end-of-file and from that point onward is now (new token(sym.EOF)).



2.2.18 Specifying an interface to implement

JLex allows the user to specify an interface which the Yylex class will implement. By adding the following declaration to the input file:
%implements <classname>
the user specifies that Yylex will implement classname. The generated lexer class declaration will look like:
class Yylex implements classname { ...




2.2.19 Making the Generated Class Public

The %public directive causes the lexical analyzer class generated by JLex to be a public class.
%public
The default behavior adds no access specifier to the generated class, resulting in the class being visible only from the current package.




2.3 Regular Expression Rules

The third part of the JLex specification consists of a series of rules for breaking the input stream into tokens. These rules specify regular expressions, then associate these expressions with actions consisting of Java source code.

The rules have three distinct parts: the optional state list, the regular expression, and the associated action. This format is represented as follows.
[<states>] <expression> { <action> }
Each part of the rule is discussed in a section below.

If more than one rule matches strings from its input, the generated lexer resolves conflicts between rules by greedily choosing the rule that matches the longest string. If more than one rule matches strings of the same length, the lexer will choose the rule that is given first in the JLex specification. Therefore, rules appearing earlier in the specification are given a higher priority by the generated lexer.
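
The conflict resolution policy can be sketched with java.util.regex, as below. The class LongestMatchSketch is hypothetical and is not how JLex works internally: JLex builds a table-driven automaton, and java.util.regex does not always find the longest match for alternations inside a single pattern. The sketch is only meant to show the two tie-breaking rules.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of "longest match, then earliest rule".
class LongestMatchSketch {
    // Returns the index of the winning rule at 'pos', or -1 if no rule matches.
    static int chooseRule(Pattern[] rules, CharSequence input, int pos) {
        int bestRule = -1;
        int bestLength = 0;
        for (int i = 0; i < rules.length; i++) {
            Matcher m = rules[i].matcher(input);
            m.region(pos, input.length());
            // Strictly greater: on a tie the earlier rule keeps its priority.
            if (m.lookingAt() && m.end() - pos > bestLength) {
                bestLength = m.end() - pos;
                bestRule = i;
            }
        }
        return bestRule;
    }
}
```

With the rules if and [a-z]+ given in that order, the input if selects the first rule because both match two characters, while the input iffy selects the second rule because it matches a longer string.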

The rules given in a JLex specification should match all possible input. If the generated lexical analyzer receives input that does not match any of its rules, an error will be raised.

Therefore, all input should be matched by at least one rule. This can be guaranteed by placing the following rule at the bottom of a JLex specification:
. { java.lang.System.out.println("Unmatched input: " + yytext()); }
The dot (.), as described below, will match any input except for the newline.




2.3.1 Lexical States

An optional lexical state list precedes each rule. This list should be in the following form:
<state[0][, state[1], state[2], ...]>
The outer set of brackets ([]) indicates that multiple states are optional. The less-than (<) and greater-than (>) symbols represent themselves and should surround the state list, preceding the regular expression. The state list specifies under which initial states the rule can be matched.

For instance, if yylex() is called with the lexer at state A, the lexer will attempt to match the input only against those rules that have A in their state list.

If no state list is specified for a given rule, the rule is matched in all lexical states.




2.3.2 Regular Expressions

Regular expressions should not contain any white space, as white space is interpreted as the end of the current regular expression. There is one exception; if (non-newline) white space characters appear within double quotes, these characters are taken to represent themselves. For instance, `` '' is interpreted as a blank space.

By default, the alphabet for JLex is the ASCII character set, meaning character codes between 0 and 127 inclusive; the %full and %unicode directives described in section 2.2.14 extend it.

The following characters are metacharacters, with special meanings in JLex regular expressions.

? * + | ( ) ^ $ . [ ] { } " \


Otherwise, individual characters stand for themselves.

ef Consecutive regular expressions represent their concatenation.

e|f The vertical bar (|) represents an option between the regular expressions that surround it, so it matches either expression e or f.

The following escape sequences are recognized and expanded:
\b Backspace
\n newline
\t Tab
\f Formfeed
\r Carriage return
\ddd The character code corresponding to the number formed by three octal digits ddd
\xdd The character code corresponding to the number formed by two hexadecimal digits dd
\udddd The Unicode character code corresponding to the number formed by four hexadecimal digits dddd.
\^C Control character
\c A backslash followed by any other character c matches itself
$ The dollar sign ($) denotes the end of a line. If the dollar sign ends a regular expression, the expression is matched only at the end of a line.

. The dot (.) matches any character except the newline, so this expression is equivalent to [^\n].

"..." Metacharacters lose their meaning within double quotes and represent themselves. The sequence \" (which represents the single character ") is the only exception.

{name} Curly braces denote a macro expansion, with name the declared name of the associated macro.

* The star (*) represents Kleene closure and matches zero or more repetitions of the preceding regular expression.

+ The plus (+) matches one or more repetitions of the preceding regular expression, so e+ is equivalent to ee*.

? The question mark (?) matches zero or one repetitions of the preceding regular expression.

(...) Parentheses are used for grouping within regular expressions.

[...] Square brackets denote a class of characters and match any one character enclosed in the brackets. If the first character following the left bracket ([) is the up arrow (^), the set is negated and the expression matches any character except those enclosed in the brackets. Different metacharacter rules hold inside the brackets, with the following expressions having special meanings:
{name} Macro expansion
a - b Range of character codes from a to b to be included in character set
"..." All metacharacters within double quotes lose their special meanings. The sequence \" (which represents the single character ") is the only exception.
\ Metacharacter following backslash(\) loses its special meaning

For example, [a-z] matches any lower-case letter, [^0-9] matches anything except a digit, and [0-9a-fA-F] matches any hexadecimal digit. Inside character class brackets, a metacharacter following a backslash loses its special meaning. Therefore, [\-\\] matches a dash or a backslash. Likewise ["A-Z"] matches one of the three characters A, dash, or Z. Leading and trailing dashes in a character class also lose their special meanings, so [+-] and [-+] do what you would expect them to (i.e., match only '+' and '-').
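
The escape sequences listed earlier in this section can be illustrated with a small decoder. The class EscapeSketch is hypothetical, it assumes a well-formed escape with exactly the number of digits the table specifies, and it leaves out the \^C control-character form.

```java
// Hypothetical decoder for the escape sequences tabulated in this section.
class EscapeSketch {
    // 's' must start with a backslash; returns the character the escape denotes.
    static char decode(String s) {
        char c = s.charAt(1);
        switch (c) {
            case 'b': return '\b';
            case 'n': return '\n';
            case 't': return '\t';
            case 'f': return '\f';
            case 'r': return '\r';
            case 'x': return (char) Integer.parseInt(s.substring(2, 4), 16); // \xdd
            case 'u': return (char) Integer.parseInt(s.substring(2, 6), 16); // \udddd
            default:
                if (c >= '0' && c <= '7') {
                    return (char) Integer.parseInt(s.substring(1, 4), 8);    // \ddd
                }
                return c; // any other character stands for itself
        }
    }
}
```

Under these assumptions, the three spellings \101, \x41, and \u0041 all denote the letter A, and \. denotes a literal dot.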


2.3.3 Associated Actions

The action associated with a lexical rule consists of Java code enclosed inside block-delimiting curly braces.
{ action }
The Java code action is copied, as given, into the state-driven lexical analyzer produced by JLex.

All curly braces contained in action that are not part of strings or comments should be balanced.




2.3.3.1 Actions and Recursion:

If an action does not return a value, the lexical analyzer will loop, searching for the next match from the input stream and returning the value associated with that match.

The lexical analyzer can be made to recur explicitly with a call to yylex(), as in the following code fragment.
{ ...
return yylex();
... }
This code fragment causes the lexical analyzer to recur, searching for the next match in the input and returning the value associated with that match. The same effect can be had, however, by simply not returning from a given action. This results in the lexer searching for the next match, without the additional overhead of recursion.

The preceding code fragment is an example of tail recursion, since the recursive call comes at the end of the calling function's execution. The following code fragment is an example of a recursive call that is not tail recursive.
{ ...
next = yylex();
... }
Recursive actions that are not tail-recursive work in the expected way, except that variables such as yyline and yychar may be changed during recursion.
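
The difference between looping and recursing can be sketched with a hand-written scanner that skips blanks. The class SkipSketch and its two methods are hypothetical stand-ins for a generated yylex(); each non-blank character is treated as a token for simplicity.

```java
// Hypothetical scanner contrasting the looping and recursive styles.
class SkipSketch {
    private final String input;
    private int pos = 0;

    SkipSketch(String input) {
        this.input = input;
    }

    // Loop form: an "action" that does not return lets the loop continue.
    String nextLooping() {
        while (pos < input.length()) {
            char c = input.charAt(pos++);
            if (c == ' ') {
                continue; // whitespace action: no return statement
            }
            return String.valueOf(c);
        }
        return null; // end of input
    }

    // Recursive form: the whitespace action returns the result of a nested call.
    String nextRecursive() {
        if (pos >= input.length()) {
            return null;
        }
        char c = input.charAt(pos++);
        if (c == ' ') {
            return nextRecursive(); // corresponds to "return yylex();"
        }
        return String.valueOf(c);
    }
}
```

Both methods return the same tokens, but the recursive form adds one stack frame per skipped blank, which is the overhead the text above refers to.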




2.3.3.2 State Transitions:

If lexical states are declared in the JLex directives section, transitions on these states can be declared within the regular expression actions. State transitions are made by the following function call.
yybegin(state);
The void function yybegin() is passed the state name state and effects a transition to this lexical state.

The state state must be declared within the JLex directives section, or this call will result in a compiler error in the generated source file. The one exception to this declaration requirement is state YYINITIAL, the lexical state implicitly declared by JLex. The generated lexer begins lexical analysis in state YYINITIAL and remains in this state until a transition is made.
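
The following sketch shows how transitions between YYINITIAL and one additional state could be used to skip comments. The class CommentStateSketch, the state COMMENT, and the integer values are assumptions for the example; only the names yybegin and YYINITIAL come from this manual.

```java
// Hypothetical hand-written scanner that mimics yybegin() transitions.
class CommentStateSketch {
    static final int YYINITIAL = 0;
    static final int COMMENT = 1; // as if declared with "%state COMMENT"

    private int state = YYINITIAL;

    void yybegin(int newState) {
        state = newState;
    }

    // Returns the input with /* ... */ comments removed.
    String strip(String input) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            if (state == YYINITIAL && input.startsWith("/*", i)) {
                yybegin(COMMENT);   // enter the comment state
                i += 2;
            } else if (state == COMMENT && input.startsWith("*/", i)) {
                yybegin(YYINITIAL); // leave the comment state
                i += 2;
            } else {
                if (state == YYINITIAL) {
                    out.append(input.charAt(i)); // ordinary text is kept
                }
                i++;
            }
        }
        return out.toString();
    }
}
```

In a JLex specification each branch of the if statement would correspond to a rule whose state list names the state tested in the condition.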




2.3.3.3 Available Lexical Values:

The following values, internal to the Yylex class, are available within the action portion of the lexical rules.
Variable or Method | Activation Directive | Description
java.lang.String yytext(); | Always active. | Matched portion of the character input stream.
int yychar; | %char | Zero-based character index of the first character in the matched portion of the input stream
int yyline; | %line | Zero-based line number of the start of the matched portion of the input stream




3. Generated Lexical Analyzers

JLex will take a properly-formed specification and transform it into a Java source file for the corresponding lexical analyzer.

The generated lexical analyzer resides in the class Yylex. There are two constructors to this class, both requiring a single argument: the input stream to be tokenized. The input stream may either be of type java.io.InputStream or java.io.Reader (such as StringReader). Note that the java.io.Reader constructor should be used if you are generating a lexer accepting unicode characters, as the JDK 1.0 java.io.InputStream class does not always read unicode correctly.
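
When the input arrives as bytes, one way to obtain a suitable java.io.Reader is to wrap the byte stream with an explicit character encoding, as in the sketch below. The class ReaderSketch and its methods are hypothetical, and the choice of UTF-8 is an assumption about the input; the resulting Reader is the kind of object that would be passed to the Reader constructor described above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper showing how to decode bytes into characters up front.
class ReaderSketch {
    // Wraps raw bytes in a Reader that decodes them as UTF-8.
    static Reader utf8Reader(byte[] bytes) {
        return new InputStreamReader(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8);
    }

    // Reads every character from the Reader, as a lexer would consume them.
    static String readAll(Reader in) {
        StringBuilder sb = new StringBuilder();
        try {
            int c;
            while ((c = in.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

Decoding through a Reader means the lexer sees one char per character of input rather than one value per byte.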

The access function to the lexer is Yylex.yylex(), which returns the next token from the input stream. The return type is Yytoken and the function is declared as follows.
class Yylex { ...
public Yytoken yylex () {
... }
The user must declare the type of Yytoken and can accomplish this conveniently in the first section of the JLex specification, the user code section. For instance, to make Yylex.yylex() return a wrapper around integers, the user would enter the following code somewhere preceding the first ``%%''.
class Yytoken { int field; Yytoken(int f) { field=f; } }
Then, in the lexical actions, wrapped integers would be returned, in something like this way.
{ ...
return new Yytoken(0);
... }
Likewise, in the user code section, a class could be defined declaring constants that correspond to each of the token types.
class TokenCodes { ...
public static final int STRING = 0;
public static final int INTEGER = 1;
... }
Then, in the lexical actions, these token codes could be returned.
{ ...
return new Yytoken(TokenCodes.STRING);
... }
These are simplified examples; in actual use, one would probably define a token class containing more information than an integer code.

These examples begin to illustrate the object-oriented techniques a user could employ to define an arbitrarily complex token type to be returned by Yylex.yylex(). In particular, inheritance permits the user to return more than one token type. If a distinct token type was needed for strings and integers, the user could make the following declarations.
class Yytoken { ... }
class IntegerToken extends Yytoken { ... }
class StringToken extends Yytoken { ... }
Then the user could return both IntegerToken and StringToken types from the lexical actions.
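A slightly fuller sketch of such a hierarchy is given here. The class names Yytoken, IntegerToken and StringToken come from the text above; the fields, the constructors and the TokenPrinter class are assumptions made purely for illustration.

```java
// Base token type; the line field is an illustrative assumption.
class Yytoken {
    final int line;
    Yytoken(int line) { this.line = line; }
}

// Token carrying an integer value.
class IntegerToken extends Yytoken {
    final int value;
    IntegerToken(int line, int value) { super(line); this.value = value; }
}

// Token carrying a string value.
class StringToken extends Yytoken {
    final String value;
    StringToken(int line, String value) { super(line); this.value = value; }
}

// Hypothetical consumer that distinguishes the subclasses at run time.
final class TokenPrinter {
    static String describe(Yytoken t) {
        if (t instanceof IntegerToken) return "integer " + ((IntegerToken) t).value;
        if (t instanceof StringToken) return "string " + ((StringToken) t).value;
        return "token";
    }
}
```

A lexical action could then contain, for example, return new IntegerToken(yyline, Integer.parseInt(yytext())); and the caller of Yylex.yylex() would inspect the run-time type of the result.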

The names of the lexical analyzer class, the tokenizing function, and its return type may each be altered using the JLex directives. See section 2.2.9 for more details.



4. Performance

A benchmark experiment was conducted, comparing the performance of a lexical analyzer generated by JLex to that of a hand-written lexical analyzer. The comparison was made for lexical analyzers of a simple ``toy'' programming language. The hand-written lexical analyzer, like the lexical analyzer generated by JLex, was written in Java.

The experiment consists of running each lexical analyzer on two source files written in the toy language, then measuring the time required to process these files. Each lexical analyzer was invoked by a dummy driver also written in Java.

The generated lexical analyzer proved to be quite quick, as the following results show.
Size of Source File    JLex-Generated Lexical Analyzer: Execution Time    Hand-Written Lexical Analyzer: Execution Time
177 lines              0.42 seconds                                       0.53 seconds
897 lines              0.98 seconds                                       1.28 seconds

The JLex lexical analyzer outperformed the hand-written lexer on both files, taking roughly 21% less time on the 177-line file and 23% less time on the 897-line file.

One of the biggest complaints about table-driven lexical analyzers generated by programs like JLex is that these lexical analyzers do not perform as well as hand-written ones. Therefore, this experiment is particularly important in demonstrating the relative speed of JLex lexical analyzers.
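The driver used in the experiment is not reproduced in this manual. A minimal sketch of what such a timing driver might look like is given here. LexerTimer and Result are hypothetical names, and the sketch assumes that the lexer signals end of input by returning null.

```java
import java.util.function.Supplier;

// Hypothetical timing driver; not the one used for the figures above.
final class LexerTimer {
    // Outcome of one timed run: number of tokens seen and elapsed nanoseconds.
    static final class Result {
        final long tokens;
        final long nanos;
        Result(long tokens, long nanos) { this.tokens = tokens; this.nanos = nanos; }
    }

    // Pulls tokens until the supplier returns null, timing the whole loop.
    static Result time(Supplier<?> nextToken) {
        long start = System.nanoTime();
        long count = 0;
        while (nextToken.get() != null) {
            count++;
        }
        return new Result(count, System.nanoTime() - start);
    }
}
```

To time a generated lexer, the supplier would wrap a call to yylex() and convert any java.io.IOException into an unchecked exception. A single run of this kind is only a rough measurement, since it ignores JVM warm-up.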




5. Implementation Issues




5.1 Unimplemented Features

The following is a (possibly incomplete) list of unimplemented features of JLex.

  1. The regular expression lookahead operator is unimplemented, and not included in the list of special regular expression metacharacters.
  2. The start-of-line operator (^) assumes the following nonstandard behavior. A match on a regular expression that uses this operator will cause the newline that precedes the match to be discarded.



5.2 Unicode vs ASCII

In contrast to the 8-bit character type (char) mandated by ANSI C, Java supports a 16-bit char and the Unicode character set. Java provides a built-in String class to manipulate these Unicode characters.

As of version 1.2.5, JLex uses the JDK 1.1 Reader and Writer classes to read in the JLex specification file and write out the lexical analyzer source file. This means that all Unicode characters are allowed in both of these files. In order for the generated scanner to work with Unicode characters, you must use the java.io.Reader constructor of the generated scanner, and the Reader you provide must properly handle the translation from OS-native format to Unicode. You must also specify the %unicode directive in the specification; see section 2.2.14.
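One way to obtain a Reader that performs this translation is to wrap the byte stream in an InputStreamReader with an explicitly named encoding. The sketch below assumes the input is UTF-8 and uses java.nio.charset.StandardCharsets, which is available in later JDKs than the ones discussed here. UnicodeInput and its methods are hypothetical names; readAll merely stands in for handing the Reader to a generated scanner.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for preparing scanner input; not part of JLex.
final class UnicodeInput {
    // Wraps a byte stream in a Reader that decodes UTF-8 explicitly
    // instead of relying on the platform default encoding.
    static Reader utf8Reader(InputStream in) {
        return new InputStreamReader(in, StandardCharsets.UTF_8);
    }

    // Drains a Reader into a String, one character at a time.
    static String readAll(Reader reader) {
        StringBuilder sb = new StringBuilder();
        try {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

The resulting Reader would then be passed to the generated scanner, for example new Yylex(UnicodeInput.utf8Reader(System.in)).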




5.3 Commas in State Lists

Commas between state names in declaration lists and lexical rules are optional. These lists will be correctly parsed with white space between state names and without comma separators.




5.4 Wish List of Unimplemented Features

The following minor features would be nice to have as part of JLex, but have not been implemented due to their scope or their negative impact upon performance.

  1. Detection of unbalanced braces within the comment portion of lexical actions.
  2. Detection of cycles in macro definitions.
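To illustrate what the second item would involve, the following sketch detects a cycle with a depth-first search. It is not taken from JLex, which performs no such check. The sketch assumes that the macro definitions have already been scanned into a map from each macro name to the names it references; MacroCycleCheck is a hypothetical name.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical cycle check over macro references; not part of JLex.
final class MacroCycleCheck {
    // Returns true if any macro refers to itself, directly or indirectly.
    static boolean hasCycle(Map<String, List<String>> refs) {
        Set<String> finished = new HashSet<>(); // macros fully explored, known cycle-free
        Set<String> onPath = new HashSet<>();   // macros on the current search path
        for (String name : refs.keySet()) {
            if (visit(name, refs, finished, onPath)) {
                return true;
            }
        }
        return false;
    }

    private static boolean visit(String name, Map<String, List<String>> refs,
                                 Set<String> finished, Set<String> onPath) {
        if (onPath.contains(name)) {
            return true;  // reached a macro that is already being expanded
        }
        if (finished.contains(name)) {
            return false; // explored earlier without finding a cycle
        }
        onPath.add(name);
        for (String next : refs.getOrDefault(name, Collections.<String>emptyList())) {
            if (visit(next, refs, finished, onPath)) {
                return true;
            }
        }
        onPath.remove(name);
        finished.add(name);
        return false;
    }
}
```

The search visits each macro and each reference at most once, so its cost is linear in the size of the definitions.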



6. Credits and Copyrights




6.1 Credits

The treatment of lexical analyzer generators given in Alan Holub's Compiler Design in C (Prentice-Hall, 1990) provided a starting point for my implementation.

Discussions with Professor Andrew Appel of the Princeton University Computer Science Department provided guidance in the design of JLex.

Java is a trademark of Sun Microsystems Incorporated.




6.2 Copyright

JLex COPYRIGHT NOTICE, LICENSE AND DISCLAIMER.

Copyright 1996 by Elliot Joel Berk.

Permission to use, copy, modify, and distribute this software and its documentation for any purpose and without fee is hereby granted, provided that the above copyright notice appear in all copies and that both the copyright notice and this permission notice and warranty disclaimer appear in supporting documentation, and that the name of Elliot Joel Berk not be used in advertising or publicity pertaining to distribution of the software without specific, written prior permission.

Elliot Joel Berk disclaims all warranties with regard to this software, including all implied warranties of merchantability and fitness. In no event shall Elliot Joel Berk be liable for any special, indirect or consequential damages or any damages whatsoever resulting from loss of use, data or profits, whether in an action of contract, negligence or other tortious action, arising out of or in connection with the use or performance of this software.


Frank Flannery
Wed Jul 24 00:27:39 EDT 1996
jlex-1.2.6/bugs.html

JLex Bugs

The following bug reports are provided to inform users of known problems with Java-Lex. These reports are in the form of email messages describing the problem encountered.

If you have an additional bug report, describing a problem not listed in one of these reports, please mail it to cananian@alumni.princeton.edu.

Please check your JVM before mailing in a bug report! There has been a spate of recent reports which have been traced to buggy virtual machines (e.g. kaffe) or compilers (e.g. pizza). Please take a second to check that your bug can be reproduced using "standard" tools before reporting it.

jlex-1.2.6/bug10.txt

Date: Mon, 28 Apr 1997 17:27:51 +0000
From: Per Velschow
To: ejberk@princeton.edu
Subject: Bug in JavaLex

Hello!

I have been using JavaLex and CUP for a while with JDK 1.0.2. But then I tried using it with the new JDK 1.1.1 with some problems. The biggest problems were in CUP (I will mail my problem to them), but there was also a little problem in JavaLex. It has to do with the new API in JDK 1.1.1. SUN has "deprecated" some constructors and methods so that you get a warning when you try to compile a program using them.

Specifically in the generated code from JavaLex it uses the following deprecated constructor:

    String(byte[], int, int, int)

The solution I have found is very easy (if it works): just let the lexer use the constructor

    String(byte[], int, int)

instead. It uses the platform's default character encoding. You can find out more about this here

    http://www.javasoft.com/products/jdk/1.1/docs/api/java.lang.String.html

I actually have made the change myself by replacing the line

    m_outstream.writeBytes("\t\treturn (new java.lang.String(yy_buffer, 0, \n");

with

    m_outstream.writeBytes("\t\treturn (new java.lang.String(yy_buffer,\n");

Yours,
Per Velschow

=====================================
Per Velschow
mailto:pervel@isa.dknet.dk
http://www.isa.dknet.dk/~pervel/

jlex-1.2.6/bug11.txt

Date: Mon, 24 Feb 97 14:25:55 -0500
Message-Id: <9702241925.AA02147@Princeton.EDU>
Received: from [206.229.41.51] by PACEVM.DAC.PACE.EDU (IBM VM SMTP V2R3) with TCP; Mon, 24 Feb 97 12:52:10 EST
From: "Joseph Bergin"
Reply-To: "Joseph Bergin"
To: appel@princeton.edu
Subject: Java compiler book and tools
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
X-Mailer: POPmail 2.3b7

I recently got copies of your Java and ML compiler books. They are very nice and I will probably adopt them in future in my compiler courses.

Regarding your tools, however. Some machines can't effectively use a command line for inputting names of files and other things. Therefore I have developed the following class that can be used to put an interface on the tools such as JavaLex and CUP. They should work with any system, but are necessary on something like the Macintosh. They may be freely distributed, but keep my authorship please.

---------cut here-------------
package AUX;

import java.awt.Frame;
import java.awt.FileDialog;

/**
 * BlowPipe can be used to put an interface on programs that normally accept
 * instructions from the command line. Therefore it can be used to make
 * standard unix tools more friendly to Macintosh and Windows systems.
 *
 * To use this, put calls to these functions into your main(String [] argv)
 * function at the beginning, before you start to decode argv. For example,
 * to decode one argument, that is to be taken as the input file use:
 *
 *  public static void main ( String argv[] ) throws java.io.IOException
 *   {	argv = new String[1];
 *	argv[0] = AUX.BlowPipe.getOldFileName(null);
 *	. . .
 *
 * as the beginning of your main function.
 *
 * Note that all functions in this class are static. You cannot create an
 * object of type BlowPipe. It is just a code encapsulator.
 *
 * @version 1.0
 * @author Joseph Bergin, Pace University
 */
public class BlowPipe
{
    /**
     * Return the fully qualified path and file reference for an input file.
     * @param parent The frame of your application object (or null).
     * @return file name string prepended with directory information.
     */
    public static String getOldFileName(Frame parent)
    {
        FileDialog fd = new FileDialog(parent, "Old File Name", FileDialog.LOAD);
        fd.setDirectory(".");
        fd.show();
        System.out.println(fd.getDirectory() + fd.getFile());
        return fd.getDirectory() + fd.getFile();
    }

    /**
     * Return the fully qualified path and file reference for an output file.
     * @param parent The frame of your application object (or null).
     * @return file name string prepended with directory information.
     */
    public static String getNewFileName(Frame parent)
    {
        FileDialog fd = new FileDialog(parent, "New File Name", FileDialog.SAVE);
        fd.setDirectory(".");
        fd.show();
        System.out.println(fd.getDirectory() + fd.getFile());
        return fd.getDirectory() + fd.getFile();
    }

    /**
     * Return argument strings such as switches.
     * Not implemented at this time. Returns null.
     */
    public static String getArgString()
    {
        return null; // not yet implemented
    }

    // Private constructor: this class cannot be instantiated.
    private BlowPipe() {}
}
---------cut here-------------

Joseph Bergin, Professor
Pace University, Computer Science, One Pace Plaza, NY NY 10038
EMAIL berginf@pace.edu
HOMEPAGE http://csis.pace.edu/~bergin/

jlex-1.2.6/bug12.txt

Date: Wed, 10 Sep 1997 21:47:06 -0400
To: appel@princeton.edu
From: William Uther
Subject: JLex Bug

Hi,

I'm using JLex 1.2 on a macintosh using the Metrowerks compiler and VM. I've been having problems with the EOLN char (\r and/or \n). Might I suggest you use java.lang.System.getProperty("line.separator") to get the line separation string? You could even include an EOLN macro containing the correct end of line regular expression.
(On old DOS systems, end of line is two characters, \r\n, so a single character for EOLN won't work if you want to support these). If you were really tricky you might be able to work the EOLN format back into the parser (not just the generator) so that it has the System.getProperty call in it and it will work on any system.

I've solved the problem by having my own EOLN=[\r\n] macro. (note, this is not \r\n that I seem to get with m_unix set to false).

later,

\x/ill :-}

William Uther                 "Most people would sooner die than think;
will@cs.cmu.edu                In fact, they do so."
Dept. of Computer Science,                        - Bertrand Russell
Carnegie Mellon University
http://www.cs.cmu.edu/~will/

jlex-1.2.6/bug13.txt

Date: Thu, 18 Jun 1998 11:22:01 -0700
From: Raimondas Lencevicius
Organization: Computer Science Dept., UCSB
Subject: JLex Bug: Yylex.class includes methods longer than 65K

Hi,

I was successfully using JLex in my research project for some time, but now I have a problem. I have to run some code under Java verifier, and it throws an error saying that method in Yylex is longer than 65K. Also, it appears that JDK 1.2Beta3 enforces the verification more strictly, so my program may become unusable. :-(((

Here is the warning produced by JDK 1.2Beta3 "javac qbd.lex.java":

    javac qbd.lex.java
    qbd.lex.java:2308: This code requires generating a method with more than 64K bytes. Virtual machines may refuse the resulting class file.
            { -1, -1, -1, -1, -1, -1, -1, -1,
            ^
    1 warning

---------------------------------------------------

Date: Thu, 25 Jun 1998 11:45:37 -0700
From: Raimondas Lencevicius
Subject: Re: Fixing JLex Bug: Methods longer than 65K

Dear Dr. Appel,

I have modified the JLex program in the following way. I have fixed the yy_nxt[][] assignment to take a result of a function that unpacks a string to an integer array.
To achieve that, I added "private int [][] unpackFromString(int size1, int size2, String st)" function and coded the yy_nxt[][] values into a string by printing integers into a string and representing sequences of the same integer as "value:length" pairs. This encoding was simpler to implement and, I believe, more compact than encoding of each integer as a character or Unicode escape sequence. If someone wants to apply more sophisticated compression scheme, it's possible to do that. However, the .java file size was reduced 2 times (104K to 52K) with current encoding, which, I think, is reasonable. The .class file size was reduced from 193K to 32K (6 times) for the same grammar.

The rewritten JLex compiles and runs under JDK 1.1.5 and JDK 1.2B3. I generated .java file and Yylex.class for my grammar and for the sample grammar available at JLex home page. I have encountered no errors including no 64K limit error.

There are a couple of possible negatives of the new version. Some editors and operating systems may not be able to handle the huge one-line generated string. This could be circumvented by cutting the string into more manageable parts while keeping in mind .class constant pool size. Also String unpacking may be slower than a direct array initialization.

I have attached the modified version of JLex to this message. My comments are added at the beginning of the file. You are welcome to integrate them into the "release" comments, if you decide to use this version as an official JLex release.

Sincerely,
Raimondas

jlex-1.2.6/bug14.txt

From Torsten.Hilbrich@bln.de Sun Jul 25 07:54:07 1999
Date: 11 Dec 1997 23:31:02 +0100
From: Torsten Hilbrich
To: appel@princeton.edu
Subject: JLex 1.2.2: Problems with 8bit characters still laying around

I previously had 1.1.1 installed and got strange errors with some HTML files that I parsed.
After looking at the home page I recognized that these were problems with 8bit characters (such as german umlauts) and are supposed to be fixed in 1.2.1.

However, I just installed 1.2.2 and the error is still there. If I compile the example file and run it using a simple token printer, I get the following message if entering a character > 127:

    java.lang.ArrayIndexOutOfBoundsException:
            at Yylex.yylex(html.lex.java:416)
            at TestLex.main(TestLex.java:6)

The code just around line 416 is the following:

    if (YYEOF != yy_lookahead) {
        yy_next_state = yy_nxt[yy_rmap[yy_state]][yy_cmap[yy_lookahead]];
    }

Thanks for your reply,

Torsten

--
I haven't lost my mind -- it's backed up on tape somewhere.
Fortune Cookie
PGP Public key available

jlex-1.2.6/bug2.txt

From cananian@phoenix.Princeton.EDU Wed Sep 11 14:59:42 1996
Date: Sun, 11 Aug 1996 23:48:42 -0400 (EDT)
From: "C. Scott Ananian"
To: "Elliot J. Berk"
Subject: JavaLex bug reports....

A couple minor cross-platform type JavaLex bugs (I'm developing Java code on a mac right now, so I notice these types of things).

1) Around line 1600, there's the following code fragment:

    if (m_lexGen.AT_BOL == m_spec.m_current_token)
      {
        start = CAlloc.newCNfa(m_spec);
        start.m_edge = '\n';
        anchor = anchor | CSpec.START;
        m_lexGen.advance();
        [etc]

I'm pretty sure that the \n should be expanded to include '\r' as well if m_unix is true. I'm working around this and the ^ bug by manually specifying line terminations in my regexps. I don't know your code quite well enough to offer you a prepackaged bug fix for this one.
2) line 3706:

    while ((byte) '\n' != m_buffer[m_buffer_index])

should probably be something like:

    while (((byte) '\n' != m_buffer[m_buffer_index] &&
            (byte) '\r' != m_buffer[m_buffer_index]) )

there may be other hacks you'd want to make, but I think because you're throwing away empty lines, "\r\n" should make it through all right (don't know for sure, Mac's just saying '\r')

Also, I replaced line 3770 (?)

    m_line[m_line_read] = (char) m_buffer[m_buffer_index];

with

    CUtility.assert(((char)m_buffer[m_buffer_index]=='\n')||
                    ((char)m_buffer[m_buffer_index]=='\r'));
    m_line[m_line_read] = '\n';

just to be safe.

3) Around line 5055,

    /* Check for and discard empty line. */
    if (0 == m_input.m_line_read
        || '\n' == m_input.m_line[0])

should probably be:

    /* Check for and discard empty line. */
    if (0 == m_input.m_line_read
        || '\n' == m_input.m_line[0]
        || '\r' == m_input.m_line[0])

I think that's all for now. If you happen to dust off the code and get a chance to work on some of your 'Unimplemented' items, I'd appreciate a fix for the start-of-line-discarding-newline bug. Also, I *think* there might be a bug in how quoted strings in regexps are handled. I didn't track this one down, but I was having trouble with a regexp that looked something like:

    {macro1}{macro2}*":"{macro3}*

The problem went away when I rewrote this as

    {macro1}{macro2}*[:]{macro3}*

not quite sure why that was.

Anyway, kudos again for JavaLex; it seems to be clearly the best-implemented lexer for Java available right now.

 --Scott
                                                         @ @
 =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-oOO-(_)-OOo-=-=-=-=-=
 C. Scott Ananian: cananian@princeton.edu/ Declare the Truth boldly and
 227 Henry Hall, Princeton University   / without hindrance.
 Princeton, NJ 08544        /META-PARRESIAS AKOLUTOS:Acts 28:31
 -.-. .-.. .. ..-. ..-. --- .-. -.. ... -.-. --- - - .- -. .- -. .. .- -.
 PGP key available via finger and from http://www.princeton.edu/~cananian

jlex-1.2.6/bug3.txt

From root@P-henryv.cs.arizona.edu Wed Sep 11 14:59:33 1996
Date: Tue, 3 Sep 1996 09:29:58 -0700 (MST)
From: root
To: ejberk@Princeton.EDU
Subject: JavaLex

Hi,

I compiled JavaLex.java in a directory that is in my CLASSPATH. I got this error:

-------------------------------------------------------------------
P-henryv:/java/JavaLex# javac JavaLex.java
JavaLex.java:3328: Warning: Public class JavaLex.Main must be defined in a file called "Main.java".
public class Main
             ^
1 error
P-henryv:/java/JavaLex#
--------------------------------------------------------------------

My OS is Linux. Is there a FAQ or a trouble shooting document that you could point me to?

Thanks,
Henry

___________________________
: henryv@lec.cs.arizona.edu
___________________________

jlex-1.2.6/bug4.txt

From nshaylor@tcp.co.uk Wed Sep 11 14:59:14 1996
Date: Wed, 04 Sep 1996 21:27:13 GMT
From: Nik Shaylor
To: ejberk@Princeton.EDU
Subject: Bug of feature?

Hi Elliot,

I have just come across what looks to me like small bug with your otherwise excellent program JavaLex. Macros appear not to be expanded if they come immediately after a quoted string e.g. this:

    w="world"
    msg="Hello"X{w}

produces:

    "Hello"X"world"

but this:

    w="world"
    msg="Hello"{w}

produces:

    "Hello"{w}

Hope this helps. Thanks for your good work,

Nik Shaylor.

jlex-1.2.6/bug5.txt

From nshaylor@tcp.co.uk Wed Sep 11 14:59:48 1996
Date: Tue, 10 Sep 1996 00:28:24 GMT
From: Nik Shaylor
To: ejberk@Princeton.EDU
Subject: Bug of feature?

Hi Elliot,

I have a few more things for you regarding JavaLex.

1. The "%notunix" directive does not do what I thought it would.
   I expected it to cause all '\r' characters to be ignored. It appears to cause '\r' characters to be counted as newlines like '\n' is.

2. Even with the "%full" directive the output scanner cannot handle characters with ASCII values over 127. This is because yy_lookahead, yy_advance(), and YYEOF are all defined as being of type byte. As bytes are signed in Java the construction:

       yy_next_state = yy_nxt[yy_rmap[yy_state]][yy_cmap[yy_lookahead]];

   fails with an array bound exception because yy_lookahead is negative. If you define the three above items as int and "& 0xFF" the characters output by yy_advance() the problem will be solved.

3. I have not looked very hard at the following, but I think there may be a problem with the 'start of line' symbol (^). It looks like it is being taken as a 'just passed end of line' symbol.

4. The yy_getchar() routine returns the number of characters from the start of the file. It would be far more useful (to me at any rate) for it to return the number of characters since the last '\n'.

5. There appears to be no way to put comments into the input file?

Hope this helps make your great program even better.

Nik Shaylor.

jlex-1.2.6/bug6.txt

From iw@return-online.de Wed Sep 11 15:00:08 1996
Date: Wed, 11 Sep 1996 18:25:47 +0200
From: Ingo Wechsung
To: ejberk@Princeton.EDU
Subject: JavaLex

Dear Elliot,

congratulations for writing such a tool as JavaLex! I downloaded and tried it today and it works great.

However, did you ever try to input a character >127 to the generated program? When I do this (by typing a german umlaut for example), I get an ArrayIndexOutOfBoundsException. Any idea?

As far as I could see you store the input in byte arrays which would lead to negative bytes for beyond-ASCII characters. I could imagine you use the byte values as an index somewhere ... Unfortunately I couldn't debug it yet, but will do tomorrow.
BTW, yes, I have the %full directive in my source file ...

--
Email: iw@return-online.de
Tel.: 07121/928624  Fax.: 07121/928686
CAS Nord GmbH, Geschaeftsstelle Reutlingen -- Return Online GmbH
Friedrich-Ebert-Str. 3, 72762 Reutlingen, Germany
Please have a look at our Web pages: http://www.return-online.de

jlex-1.2.6/bug7.txt

The following JavaLex input file is in error; an undefined macro is referenced in the rules section. However, JavaLex generates errors into infinity, which it shouldn't do.

-- Elliot

****************************************************************************

//
// (c) Copyright 1996 PrinceNet, Inc.
// All Rights Reserved.
//
// ------------------------------------------------------------------
//
import java.io.*;

%%

%type java_cup.runtime.Symbol
%char
%line

%{
int firstCharinLine;
int tabs = 0;
boolean MacroHeaderDetected = false;
private int stringstart;
private StringBuffer charBuf = null;

private java_cup.runtime.Symbol tok(int kind) {
    return new java_cup.runtime.Symbol(kind, yychar, yychar+yylength());
}

private java_cup.runtime.Symbol tok(int kind, Object o) {
    return new java_cup.runtime.Symbol(kind, yychar, yychar+yylength(), o);
}

private void error(int line, String text) {
    System.out.println("Invalid text on line " + line + " " + text);
}
%}

%eofval{
return tok(sym.EOF);
%eofval}

%%

'{' { return tok(sym.LBRACE); }
\} { return tok(sym.RBRACE); }
" " { }
\n { firstCharinLine = yychar+1; }
. { error(yyline+1, yytext()); }

jlex-1.2.6/bug8.txt

From mromeo@Adobe.COM Fri Nov 1 00:18:10 1996
Date: Wed, 30 Oct 1996 15:41:32 -0500
From: Maribeth Romeo
To: ejberk@Princeton.EDU
Subject: JavaLex

Do you see any harm in changing your BUFFER_SIZE in JavaLex from 1024 to 2056? I've hit a limit on this and can't continue my lex program without increasing this number.
Maribeth Romeo
mromeo@adobe.com

jlex-1.2.6/bug9.txt

SUMMARY: If you have a space character in your regular expression, even inside square brackets or quotes, you must put a backslash before it! This is not ideal; I consider the backslash to be a workaround. -- A. Appel

Date: Tue, 29 Jul 1997 15:24:30 -0400
From: bryant@airmics.gatech.edu (Dr. Barrett Bryant)
Message-Id: <199707291924.PAA10166@polk.airmics>
To: appel@CS.Princeton.EDU
Subject: Re: JLex
Content-Type: X-sun-attachment

----------
X-Sun-Data-Type: text
X-Sun-Data-Description: text
X-Sun-Data-Name: text
X-Sun-Charset: us-ascii
X-Sun-Content-Lines: 15

The state table is not being constructed properly. I have an example, which is essentially an adaptation of the lex example in Aho, Sethi and Ullman. The regular expression for white space doesn't cause any state to be produced so the first blank is flagged as an error. An abstraction of this example is attached (in the form of the JLex source, a driver main program, test input, and typescript), with the same problem occurring. (You'll notice that the action corresponding to white space doesn't appear anywhere in the lexical analyzer.) By the way, this same problem occurs with a declaration of a whitespace regular definition or an (" "|"\t"|"\n") expression.

Encountering this error, I didn't make it very far into my toy compiler. As I mentioned, I would like to use this in fall, so any corrected version that arises from the new project maintainer would be greatly appreciated. Also, please let me know if I am using the system in a totally wrong way. Thanks.

----------
X-Sun-Data-Type: default
X-Sun-Data-Description: default
X-Sun-Data-Name: Ws.jlex
X-Sun-Charset: us-ascii
X-Sun-Content-Lines: 18

%%
%{
  private void Echo ()
  { System . out . print (yytext ()); }
%}
%integer
%eofval{
  return -1;
%eofval}
digit	[0-9]
letter	[A-Za-z]
%%
{digit}+	{ System . out . print ("integer "); }
{letter}+	{ System . out . print ("id "); }
[ \t\n]	{ System . out . print ("white space '");
	  Echo ();
	  System . out . print ("'"); }
.	{ System . out . println ("illegal character");
	  System . exit (0); }

----------
X-Sun-Data-Type: default
X-Sun-Data-Description: default
X-Sun-Data-Name: Ws.java
X-Sun-Charset: us-ascii
X-Sun-Content-Lines: 10

class Ws
{
  public static void main (String [] args)
    throws java.io.IOException
  {
    Yylex lexer = new Yylex (System . in);
    while (lexer . yylex () >= 0);
  }
}

----------
X-Sun-Data-Type: default
X-Sun-Data-Description: default
X-Sun-Data-Name: test.txt
X-Sun-Charset: us-ascii
X-Sun-Content-Lines: 2

This is a test 0123456789 0123456789 OK

----------
X-Sun-Data-Type: default
X-Sun-Data-Description: default
X-Sun-Data-Name: typescript
X-Sun-Charset: us-ascii
X-Sun-Content-Lines: 8

Script started on Tue Jul 29 15:09:30 1997
]lbryant@polk:/usr/lincoln1/cisd/bryant/java/pl0\polk% java Ws < test.txt
java.lang.Error: Lexical Error: Unmatched Input.
        at Yylex.yylex(Ws.jlex.java:202)
        at Ws.main(Ws.java:7)
polk% exit
polk%
script done on Tue Jul 29 15:09:55 1997