latexml-0.8.1/.gitignore

# Ignore backup files
*~
*.bak
# Ignore perltidy files
*.tdy
# generated by running make
MYMETA.json
MYMETA.yml
Makefile
# And of course the whole compiled lib
pm_to_blib
blib/
# These are generated by creating manual.pdf
doc/manual/manual.aux
doc/manual/manual.idx
doc/manual/manual.ilg
doc/manual/manual.ind
doc/manual/manual.lof
doc/manual/manual.log
doc/manual/manual.out
doc/manual/manual.toc
doc/manual/manual.xml
doc/manual/pods/
# Generated files...
doc/manual/release.tex
doc/manual/schema.tex
# The compiled manual, which is copied to top-level
doc/manual/manual.pdf
# These are generated by building the site
doc/site/examples/tabular/tabular.aux
doc/site/examples/tabular/tabular.log
doc/site/examples/tabular/tabular.pdf
doc/site/examples/tabular/tabular.xml
doc/site/index.xml
doc/site/releases.tex

latexml-0.8.1/Changes

0.8.0 2014-05-05
  - Too many changes to enumerate...
  - Generates HTML5, ePub
  - new Color objects allowing better & more accurate, and extensible
    color models; binding for the xcolor package.
  - RDFa support; Thanks Christoph Lange
  - consistent error reporting, in both conversion & post-processing
    that supports automated build systems;
    Thanks Deyan Ginev, Heinrich Stammerjohanns
  - uses dvipng (IF available) for converting tex code to images;
    MUCH faster than going via eps. HOWEVER, it won't handle
    embedded postscript, so we only use it for converting math to images.
  - Reorganize various non-perl data files into a (hopefully)
    more manageable arrangement, and with more consistent naming:
    lib/LaTeXML/resources contains various resources for running
    the program including
    * DTD, RelaxNG for holding various schema
    * XSLT, CSS, javascript for holding various styling & script resources
  - Mechanism for bindings to request resources (css, javascript) that
    will be included in generated output if appropriate (eg. if html)
  - more consistent naming schemes
    * for classes typically start ltx_, ltx_font_, ltx_role_ etc
    * LaTeXML.css for core.css
    * package specific css files: ltx-.css
    * special purpose css files: LaTeXML-.css
  - XSLT uses exclusively so same modules cope with both xhtml & html5;
    More easily extended and overridden by author customizations.

0.7.9xx 2012-08-01
  - some slight efficiency improvements (reports Deyan Ginev, Joe Corneli)
  - fixed bug in Post reading from STDIN (thanks Josh Bialkowski)
  - various robustness
  - I/O reorganization, which includes
    * an extensible pseudo-protocol for sources of TeX and data,
      including file:, string:
    * Migration of control of I/O to functions defined in Package.pm
      InputFile, InputDefinitions, etc
    * Obsolete $GULLET->input, $GULLET->expandTokens
    * Tokens lists are now immutable (do NOT implement Mouth API)

0.7.9xx 2012-07-01
  - more consistent handling of math spacing;
    treat \quad and wider as punctuation
  - better support for adding ID to all elements
  - parallel math markup will establish node-to-node cross-referencing
    Particularly pmml+cmml uses id & xref to connect related
    pmml & cmml nodes.
  - generate MathML mrow instead of mfenced.
  - experimentally, do NOT define non-control sequences as mathactive
    (in order to assign XMTok attributes), but store such properties
    separately.
  - Have MathBox store the XMTok properties of simple symbols;
    create Primitives for simple symbols, rather than Constructors.
  - switched comparison files from dvi to pdf; more portable

0.7.9xx 2012-06-02
  - MathFork updates
  - use LaTeXML.catalog more consistently;
    define URN's for RelaxNG modules and XSLT modules

0.7.9xx 2011-06-01
  - various changes and small completions to LaTeX that help to
    process raw TeX style & definition files from the LaTeX distribution.
  - Support LaTeX's input encoding mechanism by finding, reading
    and implementing the encoding definitions
  - Support LaTeX's fontencoding mechanism and TeX's \char, et al.,
    by implementing FontMap's that map input codepoints to unicode.
    Also read in font encoding definitions from the LaTeX distribution
  - Support babel by implementing core support and reading
    individual language definitions from the LaTeX distribution.

0.7.x 2011-03-16
  - Bindings for floatfig, floatflt, JHEP,JHEP2,JHEP3
  - improved model, attributes and conversion for floating objects
  - Have \caption increment the appropriate counter, rather than
    figure, table, etc. This avoids many problems with subfigures,
    and so forth.

0.7.x 2011-02-28
  - Bindings for llncs.cls, rsfs.sty, multicol.sty, enumerate.sty,
    xspace.sty, caption.sty, subfigure.sty, upgreek.sty
  - Initial support for AmSTeX (AmSTeX.pool)
  - Schema changes: new ltx:inline-para, fixed ltx:classification,
    positionable ltx:p, ltx:bib-data holds original bibentry
  - improved parallel markup
  - support for Unicode Plane 1 as alternative to mathvariant
  - various improved code, implementing \fracwithdelims,
    \DeclareMathAlphabet, \obeylines, \hypersetup, \centering,
    \raggedright, \raggedleft, \let, \rotatebox, \reflectbox,
    \scalebox, \qopname, \Sb, \Sp, \afterassignment, \sidecaption,
    \@vec, \minCDarrowwidth, \beginsection, \proclaim,
    \AtBeginDocument, \AtEndDocument, numbering, rules within tables
  - new Box() function
  - \iflatexml in latexml.sty namespaced attributes
  - improved table header heuristics
  - better handling, and distinguishing bgroup, begingroup
  - better ID handling; many elements get ID's, equations always do.
  - better MathML conversion, even when parsing fails.
  - support for --icon provides a favicon
  - more test cases (but not enough)
  - More careful distinction between ToString vs Stringify vs UnTeX
    Stringify is for debugging, ToString should return string form,
    NOT necessarily for getting TeX
    many changes to definitions and usage of these.
  - Many constructors converted to Primitives to support turning accents
    or Unicode conversions directly to boxes, not requiring constructors;
    this allows them to be unambiguously converted to Strings
    (using ToString) and thus can be put in attributes.
  - avoid introducing doubled slashes in pathnames, so that (eventually)
    might use the TeX standard interpretation that treats them as
    recursive wildcard. (that is, expandable directories)
  - INCOMPATIBLE: Localization; remove assumed names & symbols from
    formatting titles, sections, etc. Let style files determine
    prefixes or formatting of reference numbers. Put these within
    the ltx:title (eg.) and wrapped in ltx:tag. This is incompatible
    as different attributes are used, and ltx:tag is used to contain
    the reference number.
  - more flexible TOC creation, support for list-of-figures and similar.
  - Fuller support for seealso, heuristics to try to connect the terms
    to actual entries.
  - Safer encode/decode of objects being stored in the ObjectDB.

0.7.x 2009-06-19 (rev 983)
  - fixed typos in aa_support & sv
  - added wrapfig.sty, llncs.cls, rsfs.sty, multicol.sty, enumerate.sty,
    xspace.sty, caption.sty, subfigure.sty
  - initial implementation of AMSTeX (amstex.tex & amsppt.sty)
  - enabled better CSS styling via XSLT. Thanks Lee Worden

0.7.0 2009-06-16 (rev 964)
  - Release 0.7.0

0.6.x 2009-06-15 (rev 959)
  - mostly complete listings.sty
  - new classes sv, svmult
  - better compatibility-mode support
  - made indices case sensitive and fixed sorting order correspondingly
  - many small fixes throughout
  - improved manual
  - Thanks to Deyan Ginev, Michael Kohlhase, Lee Worden
    for reports/patches.

0.6.x 2009-05-07 (rev 899)
  - added various macros to existing packages.
  - new packages/classes: elsart, iopart, iopams, mn, mn2e
  - added less obnoxious Info message for things not as severe as Warning.
  - changes to centering, flush commands; get rid of centering element,
    try to use class and css for same effect.
  - fix typos in latexmlpost (Thanks Jason Blevins)
  - reduced load time, primarily for latexmlmath

0.6.x 2009-03-15 (rev. 824)
  - efficiency improvements
  - documentation and error message improvements
  - various minor fixes, extra macros
  - heuristics to handle misused environments
  - support to "lock" macros from being inappropriately redefined
    within TeX documents
  - new packages: paralist, eurosym

0.6.x 2009-01-14 (rev. 740)
  Preparation for release 0.7.0
  - New packages:
    revtex and revtex4 classes and styles
    (Thanks Deyan Ginev and Catalin David)
    aas styles, amscd, lscape
  - latexml now processes BibTeX files
  - latexmlmath; new command for creating images or MathML for
    individual math expressions.
  - Improvements to MathGrammar for bra-ket notations, assignment,
    and other presentation markup like odd sub/superscripts
  - Improved option handling
  - More consistent math meaning "ontology"
  - Some documentation improvements
  - Rearranged Makefile & dependencies to port to Centos, MacOS
  - Many additional macros, robustness fixes
  - Clarified progress and error messages.

0.6.0 2008-04-09 (rev 485)
  Released.
  - Reorganized site and manual building.
    Unfortunately, the manual isn't as extensively rewritten as I'd like.
  - reorganized build & installation. Should be able to generate
    rpm's now.

0.5.x 2008-03-18
  - Added implementation for supertabular and longtable
    (the latter isn't perfect; it produces empty rows after
    header/footers)
  - random minor fixes

0.5.x 2008-03-03
  - Added support for class & package options with new exported
    functions in LaTeXML::Package (DeclareOption, PassOptions,
    ProcessOptions, ExecuteOptions).
  - Modular XSLT: individual XSLT files correspond to the Schema modules
    and are assembled into composite html and xhtml versions.
  - Option to latexmlpost to suppress section numbering.
  - new package ifpdf
  - more consistent date formatting
  - switch to using id in html instead of when it is absorbed,
    whether or not there's a \maketitle.
    => ...
    => ... inside of
    Similarly for
    This construct gives more flexibility for representing editors,
    translators, reviewers, etc, And also for various kinds of info
    about them (address, etc).

0.5.x 2006-07-24
  - Fixed a sneaky bug in \def parameters, and gullet->readMatch
    where TeX collapses multiple spaces! Thanks Ioan!

0.5.x 2006-07-23
  - Fixed the TeX font command to recognize more fonts.
  - Redefined sectioning commands to go via \@startsection,
    so more author customizations will still work.
  - added report & book classes.
  - Other random (minimal) packages: eulervm, yfonts, a4wide

0.5.x 2006-07-22
  - Dealing with some font issues, and adding minimal implementations:
    fixltx2e, textcomp, exscale mathptmx, mathpazo, charter, utopia,
    chancery, helvet, avant, courier, bookman, newcent, times, palatino,
    mathptm, mathpple, latexsym, beton, euler, ccfonts, concmath,
    cmbright, luximono, txfonts, pxfonts, fourier,
    And a start at pifont...

0.5.x 2006-07-22
  - Went through the TeX book and implemented bunches of
    Appendix B (plain) --- still not complete, but better.

0.5.x 2006-07-21
  - Added implementation of amsthm; thanks Ioan Sucan.
    along with tests, implementation of LaTeX's newtheorem,
    and a few other tweaks to make it cleaner.

0.5.x 2006-07-17
  - Revised handling of sub/superscripts:
    Abandoned the SUBSUPERSCRIPT combination of sub+super
    generated by the parser; now sub and superscripts (including
    prescripts) are nested in the (seemingly) given order;
    ams's sideset is done similarly.
    This imho gives a more sensible semantic structure.
    The stackscripts attribute was renamed to scriptpos and generalized
    to track the position (pre, mid, post) and the bracket nesting level
    where the script was created. This allows the presentation mathml
    module to determine sensible nesting and positioning so that it can
    determine under/over vs sub/sup as well as subsuper combinations,
    and mprescripts according to the given tree.
    A few other enhancements were made to mathml generation, as well.
  - Added class attribute to and defined new (generic) block element
    with class attribute. These can serve as fallback elements for
    testing purposes.

0.5.x 2006-06-24
  - With some trepidations... Converted math arrays to use
    XMArray/XMRow/XMCell so there's more sharing with tabular,
    and it can handle the lines, headings, etc.
    Instead of the more abstract XMApp/[role=ARRAY], etc.
  - Added a meaning attribute to math. The idea is:
      role   : grammar or presentation info.
      name   : a name for the token, probably from the cs,
               but not necessarily completely semantic.
      meaning: a hopefully semantic enough name for the token.
               This would be used for content conversions.
  - Recent runs show the Perl function bound as an XPath extension
    are very costly. Recoded the font match handling to avoid
    the perl function, using a set of contains calls. Vastly faster!

0.5.x 2006-06-16
  - Fixed typos in DTD parsing Thanks Ioan Sucan
  - Reworked pathname_find to be a little clearer about searching for
    files that come with the installation (but can be overridden by
    SEARCHPATHS). Added pathname_findall which finds all matching files,
    and used it so that all available catalogs are loaded,
    in particular any in the SEARCHPATHS.
    In the process noted a bug that if the environment variable
    XML_DEBUG_CATALOG is set, XML::LibXML bombs
    (seeking advice from mailing list).
  - Fix to \varintjlim (Ioan Sucan)

0.5.x 2006-06-01
  - Simplifications to PMML; use roles more consistently
    (which means there are some roles that never appear in Grammar,
    but which represent presentational structure).

0.5.x 2006-05-20
  - Sorted out (hopefully finally) the Unicode nonsense w/chars
    in 80--FF; Perl is dumm; you really need to use pack;
    exported handy UTF(hex) from LaTeXML::Package to help.
    I had been (over)using Unicode::Normalize::NFC to patch up after
    the fact, but this has screwy effects
    (translates \langle to something in chinese block !?!)

0.5.x 2006-04-28
  - A number of initializations, typos, missing \and fixed.
    Thanks Christopher B. Hamlin
  - Fix to eqnarray numbering bug. Thanks Eduardo Tabacman

0.5.1 2006-04-27
  - Release 0.5.1

0.5.x 2006-04-24
  - Refined the math grammar, added some test cases.

0.5.x 2006-04-09
  - Fixed up some math grammatical quirks, redefined default role
    for :, \mid
  - Corrected handling of \left.,\right.
  - Fixed up Presentation MathML handling of unsuccessfully parsed math.

0.5.x 2006-03-28
  - straightened out some namespace mismatches in DTD's
  - Updated documentation to reflect current commands & API's

0.5.0 2006-03-22
  - Release 0.5.0

0.4.x 2006-03-18
  - Defined \LXDeclMath for "Math Declarations" in latexml package.
    These declarations can be embedded in the TeX Source.
    Basically these define patterns to match to scoped portions
    of the generated document tree (using Rewrite rules), and add
    declarative attributes to support the math parsing.

0.4.x 2006-03-01
  - Modularized the DTD, along with lots of cleanup.

0.4.x 2006-02-09
  - Cleaned up Makefile.pm: made ImageMagick optional
    (tho' without any clear failure mode when used, yet);
    Safer XSL style file generation.
    Should be close to able to install on windows.
  - Wrote some Test::Builder support code, reorganized the test suite,
    and started adding new tests.

0.4.x 2006-01-27
  - Essentially backtracking on changes for 0.2.0, I'm concerned about
    the number of globals and exports, and formalizing extensible
    readers for control sequences. Thus, yet another incompatible change
    in the parameters to code blocks defining macros, primitives
    and constructors.
      macro($gullet,@args)
      primitive($stomach,@args)
         beforeDigest($stomach)
         afterDigest($stomach)
      constructor($document,@args, %properties)
         beforeDigest($stomach)
         properties($stomach,@args)
         afterDigest($stomach,$whatsit)
         beforeConstruct($document,$whatsit)
         afterConstruct($document,$whatsit,$node)
      For Tag code blocks:
         afterOpen($document,$node,$box);
         afterClose($document,$node,$box);

0.4.x 2006-01-22
  - Yet another rewrite of tabular processing. Now allows @-expressions.
  - New Tag option: whitspaceTrim; this trims leading & trailing
    whitespace from the direct text content of these tags.
  - Significant namespace cleanup.
    There are 2 prefix/namespace mappings.
    (1) the one used in code (eg.ltxml) for constructors,etc.
    (2) the one used to interpret the DocType (dtd).
    Constructors should always specify a namespace prefix for names,
    unless they are in the null namespace (NOT default namespace).
    In fact, there is no longer a notion of default namespace, as such,
    and RegisterNamespace no longer takes that 3rd argument.
    DocType takes as extra args prefix=>namespaceURI mappings to be
    used in interpreting the DTD, and the resulting document will be
    constructed using those same prefixes.

0.4.1 2006-01-09
  - Release 0.4.1

0.4.x 2006-01-09
  - Experimental tabular transformation. More faithful reproduction
    of latex tabular in html, via CSS. Heuristics for table headers.

0.4.x 2005-12-15
  - Fixed some namespace usages, so that constructors containing "..."
    will work, provided foo was registered (RegisterNamespace)
    Added a too-simple testcase.
  - If "-" is used on latexml|latexmlpost command line,
    they read the TeX|XML, respectively, from STDIN.

0.4.x 2005-09-27
  - Added missing test result file keyval.xml
  - Patch to postprocessor: only mung LaTeXML's DTDs.

0.4.0 2005-09-26
  - Release 0.4.0

0.3.x 2005-09-xx
  - Hopefully harmless simplifications in DTD regarding text.
    Combined the and with the element.
    Changed model for XMath to only allow rather than %Simple.class;
    and made auto-open so that all non-obviously math things
    will be wrapped in .
  - More DTD (and generation) modification to better support a
    logical versus physical paragraph structure.
    is a possibly numbered & labeled element generated by
    the \paragraph command.
    represents a logical paragraph; It contains block elements,
    in particular it can contain sequences of and that represent
    a logical paragraph. It can have a refnum and label, although
    it does NOT get the label assigned by \label.
    represents a physical paragraph --- a block of text.
  - implemented various missing plain macros.

0.3.x 2005-08-xx
  - Added support for LaTeX's picture environment and pstricks
    (along with pst-node). A Postprocessing module converts the
    resulting XML into SVG! Thanks very much to Ioan Sucan!!
  - Reverted the attribute xml:id to id on math nodes (XM*),
    since XMRef's idref attribute should only refer to XMath nodes.
    This also avoids conflict with other uses of xml:id that a
    developer might need to make.

0.3.x 2005-07-xx
  - INCOMPATIBLE changes.
    In order to make constructors more flexible, I'm incorporating
    the possibility to invoke arbitrary functions within constructors.
    So, something like:
    would set the attribute bar on the element foo to be the result
    of applying the function Func to the first argument,
    and the string 'a'. Note that even w/o args,parens are required
    (so maybe entities still work).
    The Incompatibility is due to absorbing previous ad-hoc
    functionality:
      ?IfMath is now ?#isMath
        (since isMath is an internal property of all Whatsits)
      Accessing bound values is VALUE('name') => &LookupValue('name')
    A new constructor pattern triggered by '%' is defined such that
    %value adds a _set_ of attributes to an element, where value
    would be something like #1, #foo, &KeyVals(#1)
    such that the value returns a hash reference.
  - The above also allows KeyVals to be better encapsulated and
    pulled out from the core of LaTeXML. The functionality of keyvals
    will now only be available if you
      \usepackage{keyvals} or RequirePackage('keyval')
    The Parameter specification for KeyVals is now of the form;
      RequiredKeyVals:name or OptionalKeyVals:name
    where name is the name of the keyval set. The first expects keyvals
    wrapped in the usual {} pair, whereas the second expects optional
    args wrapped in [], if present.
    Furthermore, the constructor patterns have been redefined in a
    more general framework:
      Accessing KeyVal data: #1{key} is now &KeyVal(#1,'key')
      Accessing all keys would now be instead of simply
  - Similar change to argument type semiverb.
    Instead of {semiverb} you should now write Semiverbatim
    this reads an {} delimited argument, but with most catcodes
    turned off.
  - Conditional patterns in constructors now properly balance the
    delimiting parentheses. Thus conditionals can now be nested,
    and function calls used within the patterns should work.
  - Revamped and regularized Parameter specs, making them more
    extensible.
      {KeyVals:foo} => RequiredKeyVals:foo
      [KeyVals:foo] => OptionalKeyVals:foo
      Flag:*        => OptionalKeyword:*

0.3.x 2005-06-xx
  - Fixed Subtle bug with conditionals and \else;
    Special case: \else doesn't get expanded while the conditional
    test is being expanded! (See TeX: The Program)
    (Thanks Kohlhase for pointing it out)
  - Fixed \underline, \overrightarrow, \overleftarrow to work
    in textmode.

0.3.2 2005-05-16
  - Cleanup of LaTeXML.dtd; to make sure all elements get
    appropriate attributes defined (should validate mostly).
  - More tweaks & tuning for more understandable error messages.
  - implemented \raggedright, \raisebox,\buildrel,\stackrel
  - New functionality in Constructor patterns:
    VALUE('keyword') can be used where a value is expected to lookup
    a value in the state. Also allows args & such,
    So \ref ends up defined as VALUE('LABEL@#1')
    This also means that the constructors \@VALUE and \REF are
    no longer needed, so they're removed.

0.3.1 2005-05-10
  - Improved mismatched environment reporting.
  - More faithful implementation of verbatim & comment environments
    with fixes to mouth's readRawLines.
  - Fix in Stringify for XML nodes;
    apparently a documentation bug in XML::LibXML::Namespace ?
    (it doesn't implement getValue)
  - incremental improvements in latexmlfind

0.3.0 2005-05-06
  - Release 0.3.0
  - Some speculative code on handling the picture environment,
    along with pstricks, but not yet settled.
  - More exports from Package for common operations there, and
    hopefully reduce the usage of global $STOMACH, etc.
  - Improved and updated documentation.
    Still need to document the new Rewrite facilities
    (but would like to make API more concise)

0.2.99x 2005-04-13
  - Allow * flag (ignored) on \newcommand, et al.
  - Fixed some problems with fake environments
    (ie. \begin{small}...\end{small})

0.2.99 2005-04-07
  - Released as 0.2.99 so the Bremen folks can get some work done.
    Documentation update is needed for 0.3.0 release.

0.2.xx 2005-03-17
  - Bigger changes, increment version.
  - Modified DocType; don't add namespace, use RegisterNamespace instead.
  - Intestine now creates XML::LibXML structures directly.
    Module LaTeXML::Node is removed. In fact, Intestine essentially
    represents the Document itself and thus is now renamed
    LaTeXML::Document.
  - Removed global exported Font() and MathFont()
  - Made more definitions scopable, cleaned up stash & scope
    implementation. Renamed: methods {de}activateStash
  - Implemented Rewrite rules that act on the constructed document.
    They also allow rules defined in terms of TeX strings (tokenized,
    digested, converted to document fragments and then XPath
    statements, as needed). These rules can be used to effectively
    declare variable or symbol's Grammatical roles.
    Math Parsing is now part of the latexml script and removed
    from latexmlpost.

0.2.3 2005-01-xx
  - Fixed a problem where misplaced egroups could inadvertently
    change the mode. Mode is no longer affected by the TeX stack;
    they must be explicitly started/finished (even though they also
    introduce grouping).
  - Fixed counting of `raw' lines read for "comment"ed environments.
    Line numbers for errors were getting skewed.
  - Moved sectional attribute declarations inside the
    %define.structure; block to ease defining extension DTD's

0.2.2 2005-01-11
  - Random minor bug fixes and improvements to error reporting.

0.2.1 2005-01-10
  - Bug fixes to stylesheet LaTeXML-xhtml, Thanks Yann Golanski
  - A few rearrangements and renamings to make a simple top-level
    'digest from string' alternative.
      [ $latexml->readAndDigestString($string) ]
    Also, renamed the slightly misnamed Stomach methods:
      readAndDigestChunk => digestNextChunk
      readAndDigestBody  => digestNextBody
  - A few typos in Stomach fixed
  - Almost complete implementation of the various AMS packages:
    amsbsy,amsfonts,amsmath,amsopn,amssymb,amstext,amsxtra
    Still need to complete and test the various alignment environments

0.2.0 2004-12-25
  Extensive changes, so incrementing minor version,
  but not robust enough for major version!
  - Added version info to latexml, latexmlpost help output.
    separated --debug and --trace options.
  - Removed mathConstructor option to various DefXXs
    Use new constructor conditional "?IfMath(..)(...)"
  - POS is an annoying acronym. Role is better and upon reflection,
    doesn't conflict with OpenMath's usage.
    Hence, partOfSpeech and POS have been replaced by role to
    describe the grammatical role (or `part of speech') of tokens
    to be interpreted during a math parse.
  - Reduced introduction of new `name' attributes for math tokens,
    especially when they add little value. Most greek & math characters
    are just replaced by their unicode equivalent; In most cases, a name
    is synthesized from the control sequence when needed.
    The intestines will now create an XMTok, if required.
    Also, it will automatically manage the font and assign a `cs'
    attribute to record the macro used to create the token.
  - DefSymbol is deprecated (removed in fact)
  - DefMath (new) covers what DefSymbol did, and more:
    handles the common form for functions taking arguments.
    When the macro takes args and the replacement presentation text
    involves #1, it generates an XMDual using the replacement as an
    expansion, but also creates the content form.
    To avoid duplicating the arguments, the XMArg's containing
    the arguments in the content branch are marked with an id;
    in the presentation branch is used. Corresponding code in
    postprocessor looks up the referred node when needed.
    XMRef can also be used on its own:
    see the macros \@XMArg and \@XMRef in TeX.ltxml.
  - Constructors take property arguments which supply properties
    to the whatsit (which can be CODE evaluated at digestion time).
    These properties can be used in the constructor pattern.
  - Refactoring of Intestine & DOM; most interaction with Model is
    done in Intestine. Renamed DOM to Node and renamed its subclasses.
  - "In for a dime, in for a dollar":
    Since I'd found it necessary to use global variables to access
    the stomach and intestine from strange places, then I might just
    USE the darn'ed things! Consequently, most places where a $stomach,
    $intestine (or $gullet and $model) were passed around as arguments,
    no longer do. Now, just use the globals, which the inlines
    STOMACH, INTESTINE and GULLET and MODEL return.
    As a side effect, the `0-th' argument to CODE implementing control
    sequences is generally the definition (for whatever use that
    might be), or the Whatsit for constructors.
  - Made DOM construction more forgiving by using SalvageError when
    constructing a tree that doesn't conform to DTD.
    The result may not be valid, but continues processing.
    This led to major rewriting & cleanup of error reporting,
    and storing a `locator' on all data objects that record where in
    the source file they were created.
    [Thanks to suggestions from Kevin Smith]
  - Cleanup of math parsing, presentation mathml generation.
  - Added postprocessing module for generating OpenMath.
    It is insufficient, but a starting point.
  - latexml.sty & latexml.ltxml
    A start at providing special purpose macros that make sense
    in LaTeX, and do even more interesting things in LaTeXML.
    Currently, define some silly macros like \XML, \LaTeXML, etc,
    and provide LaTeX bindings for things like DefMath!
  - Sadly, I gave up on "overload". Nice idea, but for a big package,
    it's tricky to get right. The magic creation of methods can lead
    to hard-to-find performance issues, if you try to do too much
    with Stringify. So, Object doesn't use overload.
    To stringify or compare, consider the (newly exported functions
    in Global): Stringify($ob), ToString($ob) and Equals($a,$b).
  - added latexml.sty which should get installed in the local branch
    of the standard texmf directories. Not yet documented, but it
    provides (or will) LaTeX bindings to interesting LaTeXML
    declarations, eg. defining math commands.

0.1.2 2004-09-02
  - Some experiments to reduce namespace redundancy.
    C14N is too severe, use of $node->addNewChild is non-portable
    and awkward.
    Kludge: leverage the namespace cleanup on _parsing_ !!
    (which means, write to string & reparse!!!)
    (acknowledged need on libxml2 end, but not done)
  - Portability fixes to LaTeXML::Util::Pathname
    Should work in Windows, thanks Ioan Sucan.
  - Modifications to Constructor patterns
    * Changed the `property' value pattern to '#name'
      (eg. #body instead of %body). (Gratuitous, but simplifies
      the grammar)
    * Values in patterns, #1 and #name can now be followed by {key},
      for KeyVal arguments, to access the value associated with
      a given key.
    * Conditional expressions now recognize general values: ?(...)
      and also accept an else clause ?(...)(...).
    * The NOT conditional, !(...), is removed; Use ?()(...) instead.
    * Prefixing the constructor pattern with '^' allows the generated
      XML to `float up' to a parent node that is allowed to contain it,
      according to the document type.
      The floats keyword for definitions is also removed.
    * The untex strings for constructors that shouldn't appear in the
      math TeX string (used for image generation) should now be
      empty, '', instead of using the floats keyword.

0.1.1 2004-06-15
  - Packages:
    * Made package loading more robust; doesn't re-load;
    * Crude access to options
    * Added several missing definitions to TeX & LaTeX
    * Implemented comment, acronym packages;
      initial (mostly empty) amssymb
  - General:
    * Catch filters that don't actually change the input.
    * Refactoring: New module Global.pm carries all exported
      constants and constructors to simplify coding.
    * Refactoring: name changes & code movement of methods
      confusingly called `digestFoo' and similar.
    * Refactoring: Moved macro parameter handling to new module
      Parameters.pm (and as side-effect had to rename parameters
      to Register (DefRegister, etc))
    * More careful Token equals method, so newline can convert to a
      T_SPACE w/ newline inside; this means the output nominally
      preserves lines!
      (but STOMACH->setValue('preserveNewLines',0); disables it).
    * New constructor \@VALUE fetches values from stomach during
      absorption in intestines. This (or similar) used to put
      reference numbers in \ref, like 2nd LaTeX pass.
  - Math:
    * Introduced new element which can contain the various
      alternative representations of math, such as XMath, m:math, ...
      Moved most of XMath's attributes to Math.
  - Error/Warning Messages
    * Added messages to show progress during processing (unless -quiet)
    * New SalvageError message for things that in principle are errors,
      but we're going to try to proceed; added some things to this
      category, like unknown macros, and such.
    * Added source locator to Whatsit to improve error messages.
  - PostProcessing:
    * Fixed xml catalog so it finds mathml dtd and entity files
    * Fixed latexmlpost and LaTeXML::Post to recognize html and xhtml
      output formats; refined the stylesheets (LaTeXML-html.xsl and
      LaTeXML-xhtml.xsl (both of which include LaTeXML-base.xsl))
  - Put LaTeXML tags in their own namespace:
      http://dlmf.nist.gov/LaTeXML
    And first pass at fixing postprocessors to recognize this
    [probably introduced bugs, and in any case, namespace
    normalization is pretty crummy]
  - NEED TO DO:
    * Implement alltt package
    * Question: Should XMath be duplicated before parsing?
      (ie w/different status=tokenized|parsed|partially-parsed|....)
      This would allow more inference and then re-run the parser.
    * Extend constructor syntax to work with KeyVals,
      apply random functions?
    * Refactor DOM? eg. use XML::LibXML, move more analysis to
      Intestine this needs (at least) resorting Font reduction.
  Thanks to Michael Kohlhase for comments & examples leading to
  many of these patches.

0.1.0 2004-05-10
  Initial (pre)release

latexml-0.8.1/INSTALL

INSTALL Notes for LaTeXML

   perl Makefile.PL
   make
   make test
   make install

REQUIREMENTS
   A sufficiently Unicode supporting Perl: 5.8, maybe 5.6
   XML::LibXML and XML::LibXSLT (See www.CPAN.org)
     (which require libxml2 and libxslt: See http://www.xmlsoft.org/)
   Parse::RecDescent
   Image::Magick
latexml-0.8.1/LICENSE

THIS SOFTWARE IS IN THE PUBLIC DOMAIN:

This software was developed at the National Institute of Standards and Technology by employees of the Federal Government in the course of their official duties. Pursuant to title 17 Section 105 of the United States Code, this software is not subject to copyright protection and is in the public domain.

LaTeXML is an experimental system. NIST assumes no responsibility whatsoever for its use by other parties, and makes no guarantees, expressed or implied, about its quality, reliability, or any other characteristic. We would appreciate acknowledgement if the software is used.

To the extent that any copyright protections may still be considered to be held by the authors of this software in some jurisdiction outside the United States, the authors hereby waive those copyright protections and dedicate the software to the public domain. Thus, this license may be considered equivalent to Creative Commons 0: "No Rights Reserved". See http://creativecommons.org/about/cc0.

CONTRIBUTOR NOTICE:

Contributions of software patches and enhancements to this project are welcome; such contributions are assumed to be under the same terms as the software itself.
Specifically, if you contribute code, documentation, text samples or any other material, you are asserting and acknowledging that: you are the copyright holder of the material or that it is in the public domain; it does not contain any patented material; and that you waive any copyright protections and dedicate the material to the public domain.

DISCLAIMER:

The software is expressly provided "AS IS." NIST MAKES NO WARRANTY OF ANY KIND, EXPRESS, IMPLIED, IN FACT OR ARISING BY OPERATION OF LAW, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, NON-INFRINGEMENT AND DATA ACCURACY. NIST NEITHER REPRESENTS NOR WARRANTS THAT THE OPERATION OF THE SOFTWARE WILL BE UNINTERRUPTED OR ERROR-FREE, OR THAT ANY DEFECTS WILL BE CORRECTED. NIST DOES NOT WARRANT OR MAKE ANY REPRESENTATIONS REGARDING THE USE OF THE SOFTWARE OR THE RESULTS THEREOF, INCLUDING BUT NOT LIMITED TO THE CORRECTNESS, ACCURACY, RELIABILITY, OR USEFULNESS OF THE SOFTWARE.

latexml-0.8.1/MANIFEST

#================================================== # Base #================================================== Changes Makefile.PL
MANIFEST This list of files MANIFEST.SKIP README INSTALL LICENSE manual.pdf #================================================== # Executables #================================================== bin/latexml bin/latexmlfind bin/latexmlpost bin/latexmlmath bin/latexmlc #================================================== # Master Modules #================================================== lib/LaTeXML.pm lib/LaTeXML/Version.in lib/LaTeXML/Global.pm #================================================== # Core Modules #================================================== lib/LaTeXML/Core.pm lib/LaTeXML/Core/State.pm lib/LaTeXML/Core/Mouth.pm lib/LaTeXML/Core/Mouth/file.pm lib/LaTeXML/Core/Mouth/http.pm lib/LaTeXML/Core/Mouth/https.pm lib/LaTeXML/Core/Mouth/Binding.pm lib/LaTeXML/Core/Gullet.pm lib/LaTeXML/Core/Stomach.pm lib/LaTeXML/Core/Document.pm lib/LaTeXML/Core/Rewrite.pm lib/LaTeXML/Core/Token.pm lib/LaTeXML/Core/Tokens.pm lib/LaTeXML/Core/Box.pm lib/LaTeXML/Core/Comment.pm lib/LaTeXML/Core/List.pm lib/LaTeXML/Core/Whatsit.pm lib/LaTeXML/Core/Definition.pm lib/LaTeXML/Core/Definition/Expandable.pm lib/LaTeXML/Core/Definition/Conditional.pm lib/LaTeXML/Core/Definition/Primitive.pm lib/LaTeXML/Core/Definition/Register.pm lib/LaTeXML/Core/Definition/CharDef.pm lib/LaTeXML/Core/Definition/Constructor.pm lib/LaTeXML/Core/Definition/Constructor/Compiler.pm lib/LaTeXML/Core/Parameter.pm lib/LaTeXML/Core/Parameters.pm lib/LaTeXML/Core/Alignment.pm lib/LaTeXML/Core/Alignment/Template.pm lib/LaTeXML/Core/Array.pm lib/LaTeXML/Core/KeyVals.pm lib/LaTeXML/Core/MuDimension.pm lib/LaTeXML/Core/MuGlue.pm lib/LaTeXML/Core/Pair.pm lib/LaTeXML/Core/PairList.pm # .... 
lib/LaTeXML/MathGrammar lib/LaTeXML/MathParser.pm #================================================== # Preprocessing Modules #================================================== lib/LaTeXML/Pre/BibTeX.pm lib/LaTeXML/Pre/BibTeX/Entry.pm #================================================== # Postprocessing Modules #================================================== lib/LaTeXML/Post.pm lib/LaTeXML/Post/Collector.pm lib/LaTeXML/Post/CrossRef.pm lib/LaTeXML/Post/Graphics.pm lib/LaTeXML/Post/LaTeXImages.pm lib/LaTeXML/Post/MakeBibliography.pm lib/LaTeXML/Post/MakeIndex.pm lib/LaTeXML/Post/Manifest.pm lib/LaTeXML/Post/Manifest/Epub.pm lib/LaTeXML/Post/MathImages.pm lib/LaTeXML/Post/MathML.pm lib/LaTeXML/Post/MathML/Linebreaker.pm lib/LaTeXML/Post/MathML/Presentation.pm lib/LaTeXML/Post/MathML/Content.pm lib/LaTeXML/Post/OpenMath.pm lib/LaTeXML/Post/PictureImages.pm lib/LaTeXML/Post/Scan.pm lib/LaTeXML/Post/Split.pm lib/LaTeXML/Post/SVG.pm lib/LaTeXML/Post/Writer.pm lib/LaTeXML/Post/XSLT.pm lib/LaTeXML/Post/XMath.pm lib/LaTeXML/LaTeXML.catalog #================================================== # Common Modules #================================================== lib/LaTeXML/Common/Object.pm lib/LaTeXML/Common/Config.pm lib/LaTeXML/Common/Color.pm lib/LaTeXML/Common/Color/rgb.pm lib/LaTeXML/Common/Color/cmy.pm lib/LaTeXML/Common/Color/cmyk.pm lib/LaTeXML/Common/Color/hsb.pm lib/LaTeXML/Common/Color/gray.pm lib/LaTeXML/Common/Color/Derived.pm lib/LaTeXML/Common/Error.pm lib/LaTeXML/Common/Font.pm lib/LaTeXML/Common/Number.pm lib/LaTeXML/Common/Float.pm lib/LaTeXML/Common/Dimension.pm lib/LaTeXML/Common/Glue.pm lib/LaTeXML/Common/Model.pm lib/LaTeXML/Common/Model/DTD.pm lib/LaTeXML/Common/Model/RelaxNG.pm lib/LaTeXML/Common/XML.pm lib/LaTeXML/Common/XML/Parser.pm lib/LaTeXML/Common/XML/XPath.pm lib/LaTeXML/Common/XML/XSLT.pm lib/LaTeXML/Common/XML/RelaxNG.pm #================================================== # Utility Modules 
#================================================== lib/LaTeXML/Util/Test.pm lib/LaTeXML/Util/ObjectDB.pm lib/LaTeXML/Util/ObjectDB/Entry.pm lib/LaTeXML/Util/Pathname.pm lib/LaTeXML/Util/Pack.pm lib/LaTeXML/Util/WWW.pm lib/LaTeXML/Util/Radix.pm lib/LaTeXML/Util/Image.pm lib/LaTeXML/Util/Geometry.pm lib/LaTeXML/Util/Transform.pm #================================================== # Document Model #================================================== lib/LaTeXML/resources/RelaxNG/LaTeXML.model lib/LaTeXML/resources/RelaxNG/LaTeXML.rnc lib/LaTeXML/resources/RelaxNG/LaTeXML.rng lib/LaTeXML/resources/RelaxNG/LaTeXML-bib.rnc lib/LaTeXML/resources/RelaxNG/LaTeXML-bib.rng lib/LaTeXML/resources/RelaxNG/LaTeXML-block.rnc lib/LaTeXML/resources/RelaxNG/LaTeXML-block.rng lib/LaTeXML/resources/RelaxNG/LaTeXML-common.rnc lib/LaTeXML/resources/RelaxNG/LaTeXML-common.rng lib/LaTeXML/resources/RelaxNG/LaTeXML-inline.rnc lib/LaTeXML/resources/RelaxNG/LaTeXML-inline.rng lib/LaTeXML/resources/RelaxNG/LaTeXML-math.rnc lib/LaTeXML/resources/RelaxNG/LaTeXML-math.rng lib/LaTeXML/resources/RelaxNG/LaTeXML-misc.rnc lib/LaTeXML/resources/RelaxNG/LaTeXML-misc.rng lib/LaTeXML/resources/RelaxNG/LaTeXML-meta.rnc lib/LaTeXML/resources/RelaxNG/LaTeXML-meta.rng lib/LaTeXML/resources/RelaxNG/LaTeXML-para.rnc lib/LaTeXML/resources/RelaxNG/LaTeXML-para.rng lib/LaTeXML/resources/RelaxNG/LaTeXML-picture.rnc lib/LaTeXML/resources/RelaxNG/LaTeXML-picture.rng lib/LaTeXML/resources/RelaxNG/LaTeXML-structure.rnc lib/LaTeXML/resources/RelaxNG/LaTeXML-structure.rng lib/LaTeXML/resources/RelaxNG/LaTeXML-tabular.rnc lib/LaTeXML/resources/RelaxNG/LaTeXML-tabular.rng lib/LaTeXML/resources/RelaxNG/svg/svg-animation.rnc lib/LaTeXML/resources/RelaxNG/svg/svg-animation.rng lib/LaTeXML/resources/RelaxNG/svg/svg-animevents-attrib.rnc lib/LaTeXML/resources/RelaxNG/svg/svg-animevents-attrib.rng lib/LaTeXML/resources/RelaxNG/svg/svg-basic-clip.rnc lib/LaTeXML/resources/RelaxNG/svg/svg-basic-clip.rng 
lib/LaTeXML/resources/RelaxNG/svg/svg-basic-filter.rnc lib/LaTeXML/resources/RelaxNG/svg/svg-basic-filter.rng lib/LaTeXML/resources/RelaxNG/svg/svg-basic-font.rnc lib/LaTeXML/resources/RelaxNG/svg/svg-basic-font.rng lib/LaTeXML/resources/RelaxNG/svg/svg-basic-graphics-attrib.rnc lib/LaTeXML/resources/RelaxNG/svg/svg-basic-graphics-attrib.rng lib/LaTeXML/resources/RelaxNG/svg/svg-basic-structure.rnc lib/LaTeXML/resources/RelaxNG/svg/svg-basic-structure.rng lib/LaTeXML/resources/RelaxNG/svg/svg-basic-text.rnc lib/LaTeXML/resources/RelaxNG/svg/svg-basic-text.rng lib/LaTeXML/resources/RelaxNG/svg/svg-clip.rnc lib/LaTeXML/resources/RelaxNG/svg/svg-clip.rng lib/LaTeXML/resources/RelaxNG/svg/svg-conditional.rnc lib/LaTeXML/resources/RelaxNG/svg/svg-conditional.rng lib/LaTeXML/resources/RelaxNG/svg/svg-container-attrib.rnc lib/LaTeXML/resources/RelaxNG/svg/svg-container-attrib.rng lib/LaTeXML/resources/RelaxNG/svg/svg-core-attrib.rnc lib/LaTeXML/resources/RelaxNG/svg/svg-core-attrib.rng lib/LaTeXML/resources/RelaxNG/svg/svg-cursor.rnc lib/LaTeXML/resources/RelaxNG/svg/svg-cursor.rng lib/LaTeXML/resources/RelaxNG/svg/svg-datatypes.rnc lib/LaTeXML/resources/RelaxNG/svg/svg-datatypes.rng lib/LaTeXML/resources/RelaxNG/svg/svg-docevents-attrib.rnc lib/LaTeXML/resources/RelaxNG/svg/svg-docevents-attrib.rng lib/LaTeXML/resources/RelaxNG/svg/svg-extensibility.rnc lib/LaTeXML/resources/RelaxNG/svg/svg-extensibility.rng lib/LaTeXML/resources/RelaxNG/svg/svg-extresources-attrib.rnc lib/LaTeXML/resources/RelaxNG/svg/svg-extresources-attrib.rng lib/LaTeXML/resources/RelaxNG/svg/svg-filter.rnc lib/LaTeXML/resources/RelaxNG/svg/svg-filter.rng lib/LaTeXML/resources/RelaxNG/svg/svg-font.rnc lib/LaTeXML/resources/RelaxNG/svg/svg-font.rng lib/LaTeXML/resources/RelaxNG/svg/svg-gradient.rnc lib/LaTeXML/resources/RelaxNG/svg/svg-gradient.rng lib/LaTeXML/resources/RelaxNG/svg/svg-graphevents-attrib.rnc lib/LaTeXML/resources/RelaxNG/svg/svg-graphevents-attrib.rng 
lib/LaTeXML/resources/RelaxNG/svg/svg-graphics-attrib.rnc lib/LaTeXML/resources/RelaxNG/svg/svg-graphics-attrib.rng lib/LaTeXML/resources/RelaxNG/svg/svg-hyperlink.rnc lib/LaTeXML/resources/RelaxNG/svg/svg-hyperlink.rng lib/LaTeXML/resources/RelaxNG/svg/svg-image.rnc lib/LaTeXML/resources/RelaxNG/svg/svg-image.rng lib/LaTeXML/resources/RelaxNG/svg/svg-marker.rnc lib/LaTeXML/resources/RelaxNG/svg/svg-marker.rng lib/LaTeXML/resources/RelaxNG/svg/svg-mask.rnc lib/LaTeXML/resources/RelaxNG/svg/svg-mask.rng lib/LaTeXML/resources/RelaxNG/svg/svg-opacity-attrib.rnc lib/LaTeXML/resources/RelaxNG/svg/svg-opacity-attrib.rng lib/LaTeXML/resources/RelaxNG/svg/svg-paint-attrib.rnc lib/LaTeXML/resources/RelaxNG/svg/svg-paint-attrib.rng lib/LaTeXML/resources/RelaxNG/svg/svg-pattern.rnc lib/LaTeXML/resources/RelaxNG/svg/svg-pattern.rng lib/LaTeXML/resources/RelaxNG/svg/svg-profile.rnc lib/LaTeXML/resources/RelaxNG/svg/svg-profile.rng lib/LaTeXML/resources/RelaxNG/svg/svg-qname.rng lib/LaTeXML/resources/RelaxNG/svg/svg-script.rnc lib/LaTeXML/resources/RelaxNG/svg/svg-script.rng lib/LaTeXML/resources/RelaxNG/svg/svg-shape.rnc lib/LaTeXML/resources/RelaxNG/svg/svg-shape.rng lib/LaTeXML/resources/RelaxNG/svg/svg-structure.rnc lib/LaTeXML/resources/RelaxNG/svg/svg-structure.rng lib/LaTeXML/resources/RelaxNG/svg/svg-style.rnc lib/LaTeXML/resources/RelaxNG/svg/svg-style.rng lib/LaTeXML/resources/RelaxNG/svg/svg-text.rnc lib/LaTeXML/resources/RelaxNG/svg/svg-text.rng lib/LaTeXML/resources/RelaxNG/svg/svg-view.rnc lib/LaTeXML/resources/RelaxNG/svg/svg-view.rng lib/LaTeXML/resources/RelaxNG/svg/svg-viewport-attrib.rnc lib/LaTeXML/resources/RelaxNG/svg/svg-viewport-attrib.rng lib/LaTeXML/resources/RelaxNG/svg/svg-xlink-attrib.rnc lib/LaTeXML/resources/RelaxNG/svg/svg-xlink-attrib.rng lib/LaTeXML/resources/RelaxNG/svg/svg11-basic.rng lib/LaTeXML/resources/RelaxNG/svg/svg11-tiny.rng lib/LaTeXML/resources/RelaxNG/svg/svg11.rnc lib/LaTeXML/resources/RelaxNG/svg/svg11.rng 
lib/LaTeXML/resources/DTD/LaTeXML.dtd #================================================== # XSLT & CSS Support #================================================== lib/LaTeXML/resources/XSLT/LaTeXML-html4.xsl lib/LaTeXML/resources/XSLT/LaTeXML-html5.xsl lib/LaTeXML/resources/XSLT/LaTeXML-xhtml.xsl lib/LaTeXML/resources/XSLT/LaTeXML-epub3.xsl lib/LaTeXML/resources/XSLT/LaTeXML-common.xsl lib/LaTeXML/resources/XSLT/LaTeXML-all-xhtml.xsl lib/LaTeXML/resources/XSLT/LaTeXML-bib-xhtml.xsl lib/LaTeXML/resources/XSLT/LaTeXML-block-xhtml.xsl lib/LaTeXML/resources/XSLT/LaTeXML-inline-xhtml.xsl lib/LaTeXML/resources/XSLT/LaTeXML-math-xhtml.xsl lib/LaTeXML/resources/XSLT/LaTeXML-meta-xhtml.xsl lib/LaTeXML/resources/XSLT/LaTeXML-misc-xhtml.xsl lib/LaTeXML/resources/XSLT/LaTeXML-para-xhtml.xsl lib/LaTeXML/resources/XSLT/LaTeXML-picture-xhtml.xsl lib/LaTeXML/resources/XSLT/LaTeXML-structure-xhtml.xsl lib/LaTeXML/resources/XSLT/LaTeXML-tabular-xhtml.xsl lib/LaTeXML/resources/XSLT/LaTeXML-webpage-xhtml.xsl lib/LaTeXML/resources/CSS/LaTeXML.css lib/LaTeXML/resources/CSS/LaTeXML-marginpar.css lib/LaTeXML/resources/CSS/LaTeXML-navbar-left.css lib/LaTeXML/resources/CSS/LaTeXML-navbar-right.css lib/LaTeXML/resources/CSS/LaTeXML-blue.css lib/LaTeXML/resources/CSS/ltx-article.css lib/LaTeXML/resources/CSS/ltx-report.css lib/LaTeXML/resources/CSS/ltx-book.css lib/LaTeXML/resources/CSS/ltx-amsart.css lib/LaTeXML/resources/CSS/ltx-apj.css lib/LaTeXML/resources/CSS/ltx-listings.css lib/LaTeXML/resources/CSS/ltx-svjour.css lib/LaTeXML/resources/CSS/ltx-ulem.css lib/LaTeXML/resources/javascript/LaTeXML-maybeMathjax.js lib/LaTeXML/resources/Profiles/fragment-html.opt lib/LaTeXML/resources/Profiles/fragment-xhtml.opt lib/LaTeXML/resources/Profiles/fragment.opt lib/LaTeXML/resources/Profiles/math.opt lib/LaTeXML/resources/Profiles/modern.opt lib/LaTeXML/resources/Profiles/standard.opt lib/LaTeXML/resources/Profiles/stex-migration.opt lib/LaTeXML/resources/Profiles/stex-module.opt 
lib/LaTeXML/resources/Profiles/stex.opt #================================================== # Supported Packages #================================================== lib/LaTeXML/Package.pm lib/LaTeXML/Package/a0poster.cls.ltxml lib/LaTeXML/Package/a0size.sty.ltxml lib/LaTeXML/Package/a4.sty.ltxml lib/LaTeXML/Package/a4wide.sty.ltxml lib/LaTeXML/Package/aa.cls.ltxml lib/LaTeXML/Package/aa_support.sty.ltxml lib/LaTeXML/Package/aastex.cls.ltxml lib/LaTeXML/Package/aastex.sty.ltxml lib/LaTeXML/Package/aasms.sty.ltxml lib/LaTeXML/Package/aaspp.sty.ltxml lib/LaTeXML/Package/aas_support.sty.ltxml lib/LaTeXML/Package/acronym.sty.ltxml lib/LaTeXML/Package/ae.sty.ltxml lib/LaTeXML/Package/afterpage.sty.ltxml lib/LaTeXML/Package/alltt.sty.ltxml lib/LaTeXML/Package/amsart.cls.ltxml lib/LaTeXML/Package/amsbook.cls.ltxml lib/LaTeXML/Package/amsbsy.sty.ltxml lib/LaTeXML/Package/amscd.sty.ltxml lib/LaTeXML/Package/ams_core.cls.ltxml lib/LaTeXML/Package/ams_support.sty.ltxml lib/LaTeXML/Package/amsfonts.sty.ltxml lib/LaTeXML/Package/amsmath.sty.ltxml lib/LaTeXML/Package/amsopn.sty.ltxml lib/LaTeXML/Package/amsppt.sty.ltxml lib/LaTeXML/Package/amsproc.cls.ltxml lib/LaTeXML/Package/amsrefs.sty.ltxml lib/LaTeXML/Package/amssymb.sty.ltxml lib/LaTeXML/Package/AmSTeX.pool.ltxml lib/LaTeXML/Package/amstex.sty.ltxml lib/LaTeXML/Package/amstex.tex.ltxml lib/LaTeXML/Package/amstext.sty.ltxml lib/LaTeXML/Package/amsthm.sty.ltxml lib/LaTeXML/Package/amsxtra.sty.ltxml lib/LaTeXML/Package/applemac.def.ltxml lib/LaTeXML/Package/array.sty.ltxml lib/LaTeXML/Package/article.cls.ltxml lib/LaTeXML/Package/avant.sty.ltxml lib/LaTeXML/Package/babel.sty.ltxml lib/LaTeXML/Package/babel.def.ltxml lib/LaTeXML/Package/beton.sty.ltxml lib/LaTeXML/Package/BibTeX.pool.ltxml lib/LaTeXML/Package/bbold.sty.ltxml lib/LaTeXML/Package/bbm.sty.ltxml lib/LaTeXML/Package/bm.sty.ltxml lib/LaTeXML/Package/book.cls.ltxml lib/LaTeXML/Package/bookman.sty.ltxml lib/LaTeXML/Package/booktabs.sty.ltxml 
lib/LaTeXML/Package/braket.sty.ltxml lib/LaTeXML/Package/calc.sty.ltxml lib/LaTeXML/Package/caption.sty.ltxml lib/LaTeXML/Package/cancel.sty.ltxml lib/LaTeXML/Package/ccfonts.sty.ltxml lib/LaTeXML/Package/chancery.sty.ltxml lib/LaTeXML/Package/charter.sty.ltxml lib/LaTeXML/Package/circuitikz.sty.ltxml lib/LaTeXML/Package/cite.sty.ltxml lib/LaTeXML/Package/citesort.sty.ltxml lib/LaTeXML/Package/cmbright.sty.ltxml lib/LaTeXML/Package/color.sty.ltxml lib/LaTeXML/Package/colortbl.sty.ltxml lib/LaTeXML/Package/comment.sty.ltxml lib/LaTeXML/Package/concmath.sty.ltxml lib/LaTeXML/Package/courier.sty.ltxml lib/LaTeXML/Package/cp852.def.ltxml lib/LaTeXML/Package/crop.sty.ltxml lib/LaTeXML/Package/cropmark.sty.ltxml lib/LaTeXML/Package/dcolumn.sty.ltxml lib/LaTeXML/Package/doublespace.sty.ltxml lib/LaTeXML/Package/dsfont.sty.ltxml lib/LaTeXML/Package/ellipsis.sty.ltxml lib/LaTeXML/Package/elsart.cls.ltxml lib/LaTeXML/Package/elsarticle.cls.ltxml lib/LaTeXML/Package/elsart.sty.ltxml lib/LaTeXML/Package/elsart_support.sty.ltxml lib/LaTeXML/Package/emulateapj.cls.ltxml lib/LaTeXML/Package/emulateapj.sty.ltxml lib/LaTeXML/Package/enumerate.sty.ltxml lib/LaTeXML/Package/epigraph.sty.ltxml lib/LaTeXML/Package/epsfig.sty.ltxml lib/LaTeXML/Package/epsf.sty.ltxml lib/LaTeXML/Package/epsf.tex.ltxml lib/LaTeXML/Package/epstopdf.sty.ltxml lib/LaTeXML/Package/eTeX.pool.ltxml lib/LaTeXML/Package/eucal.sty.ltxml lib/LaTeXML/Package/eufrak.sty.ltxml lib/LaTeXML/Package/euler.sty.ltxml lib/LaTeXML/Package/eulervm.sty.ltxml lib/LaTeXML/Package/eurosym.sty.ltxml lib/LaTeXML/Package/euscript.sty.ltxml lib/LaTeXML/Package/exscale.sty.ltxml lib/LaTeXML/Package/fancyhdr.sty.ltxml lib/LaTeXML/Package/fixltx2e.sty.ltxml lib/LaTeXML/Package/fleqn.sty.ltxml lib/LaTeXML/Package/float.sty.ltxml lib/LaTeXML/Package/floatfig.sty.ltxml lib/LaTeXML/Package/floatflt.sty.ltxml lib/LaTeXML/Package/floatpag.sty.ltxml lib/LaTeXML/Package/fontenc.sty.ltxml lib/LaTeXML/Package/fontspec.sty.ltxml 
lib/LaTeXML/Package/footmisc.sty.ltxml lib/LaTeXML/Package/fourier.sty.ltxml lib/LaTeXML/Package/framed.sty.ltxml lib/LaTeXML/Package/frenchb.ldf.ltxml lib/LaTeXML/Package/fullpage.sty.ltxml lib/LaTeXML/Package/gen-j-l.cls.ltxml lib/LaTeXML/Package/gen-m-l.cls.ltxml lib/LaTeXML/Package/gen-p-l.cls.ltxml lib/LaTeXML/Package/geometry.sty.ltxml lib/LaTeXML/Package/german.sty.ltxml lib/LaTeXML/Package/graphics.sty.ltxml lib/LaTeXML/Package/graphicx.sty.ltxml lib/LaTeXML/Package/here.sty.ltxml lib/LaTeXML/Package/helvet.sty.ltxml lib/LaTeXML/Package/hhline.sty.ltxml lib/LaTeXML/Package/html.sty.ltxml lib/LaTeXML/Package/hyperref.sty.ltxml lib/LaTeXML/Package/hyperxmp.sty.ltxml lib/LaTeXML/Package/ifpdf.sty.ltxml lib/LaTeXML/Package/ifthen.sty.ltxml lib/LaTeXML/Package/ifvtex.sty.ltxml lib/LaTeXML/Package/ifxetex.sty.ltxml lib/LaTeXML/Package/indentfirst.sty.ltxml lib/LaTeXML/Package/inputenc.sty.ltxml lib/LaTeXML/Package/inst_support.sty.ltxml lib/LaTeXML/Package/iopams.sty.ltxml lib/LaTeXML/Package/iopart.cls.ltxml lib/LaTeXML/Package/iopart_support.sty.ltxml lib/LaTeXML/Package/JHEP.cls.ltxml lib/LaTeXML/Package/JHEP2.cls.ltxml lib/LaTeXML/Package/JHEP3.cls.ltxml lib/LaTeXML/Package/keyval.sty.ltxml lib/LaTeXML/Package/latexml.sty.ltxml lib/LaTeXML/Package/LaTeX.pool.ltxml lib/LaTeXML/Package/latexsym.sty.ltxml lib/LaTeXML/Package/latin10.def.ltxml lib/LaTeXML/Package/lineno.sty.ltxml lib/LaTeXML/Package/listings.sty.ltxml lib/LaTeXML/Package/listingsutf8.sty.ltxml lib/LaTeXML/Package/llncs.cls.ltxml lib/LaTeXML/Package/longtable.sty.ltxml lib/LaTeXML/Package/lscape.sty.ltxml lib/LaTeXML/Package/luximono.sty.ltxml lib/LaTeXML/Package/lxRDFa.sty.ltxml lib/LaTeXML/Package/makeidx.sty.ltxml lib/LaTeXML/Package/mathpazo.sty.ltxml lib/LaTeXML/Package/mathpple.sty.ltxml lib/LaTeXML/Package/mathptm.sty.ltxml lib/LaTeXML/Package/mathptmx.sty.ltxml lib/LaTeXML/Package/mathrsfs.sty.ltxml lib/LaTeXML/Package/mn.cls.ltxml lib/LaTeXML/Package/mn2e.cls.ltxml 
lib/LaTeXML/Package/mn2e_support.sty.ltxml lib/LaTeXML/Package/multicol.sty.ltxml lib/LaTeXML/Package/multido.sty.ltxml lib/LaTeXML/Package/multirow.sty.ltxml lib/LaTeXML/Package/natbib.sty.ltxml lib/LaTeXML/Package/newcent.sty.ltxml lib/LaTeXML/Package/nicefrac.sty.ltxml lib/LaTeXML/Package/ngerman.sty.ltxml lib/LaTeXML/Package/ntheorem.sty.ltxml lib/LaTeXML/Package/numprint.sty.ltxml lib/LaTeXML/Package/OmniBus.cls.ltxml lib/LaTeXML/Package/palatino.sty.ltxml lib/LaTeXML/Package/paralist.sty.ltxml lib/LaTeXML/Package/pdfTeX.pool.ltxml lib/LaTeXML/Package/pdflscape.sty.ltxml lib/LaTeXML/Package/pgf.sty.ltxml lib/LaTeXML/Package/pgfkeys.code.tex.ltxml lib/LaTeXML/Package/pgfplots.sty.ltxml lib/LaTeXML/Package/pgfsys-latexml.def.ltxml lib/LaTeXML/Package/pifont.sty.ltxml lib/LaTeXML/Package/placeins.sty.ltxml lib/LaTeXML/Package/preview.sty.ltxml lib/LaTeXML/Package/psfig.sty.ltxml lib/LaTeXML/Package/pspicture.sty.ltxml lib/LaTeXML/Package/pst-grad.sty.ltxml lib/LaTeXML/Package/pst-node.sty.ltxml lib/LaTeXML/Package/pstricks.sty.ltxml lib/LaTeXML/Package/pxfonts.sty.ltxml lib/LaTeXML/Package/relsize.sty.ltxml lib/LaTeXML/Package/report.cls.ltxml lib/LaTeXML/Package/revsymb.sty.ltxml lib/LaTeXML/Package/revtex3_support.sty.ltxml lib/LaTeXML/Package/revtex4_support.sty.ltxml lib/LaTeXML/Package/revtex4.cls.ltxml lib/LaTeXML/Package/revtex4.sty.ltxml lib/LaTeXML/Package/revtex.cls.ltxml lib/LaTeXML/Package/revtex.sty.ltxml lib/LaTeXML/Package/rotating.sty.ltxml lib/LaTeXML/Package/rsfs.sty.ltxml lib/LaTeXML/Package/scalefnt.sty.ltxml lib/LaTeXML/Package/setspace.sty.ltxml lib/LaTeXML/Package/slides.cls.ltxml lib/LaTeXML/Package/subfigure.sty.ltxml lib/LaTeXML/Package/supertabular.sty.ltxml lib/LaTeXML/Package/svjour.cls.ltxml lib/LaTeXML/Package/svmult.cls.ltxml lib/LaTeXML/Package/sv_support.sty.ltxml lib/LaTeXML/Package/tabularx.sty.ltxml lib/LaTeXML/Package/TeX.pool.ltxml lib/LaTeXML/Package/textcomp.sty.ltxml lib/LaTeXML/Package/texvc.sty.ltxml 
lib/LaTeXML/Package/theorem.sty.ltxml lib/LaTeXML/Package/tikz.sty.ltxml lib/LaTeXML/Package/tikz-3dplot.sty.ltxml lib/LaTeXML/Package/times.sty.ltxml lib/LaTeXML/Package/tocbibind.sty.ltxml lib/LaTeXML/Package/txfonts.sty.ltxml lib/LaTeXML/Package/t1enc.def.ltxml lib/LaTeXML/Package/units.sty.ltxml lib/LaTeXML/Package/upgreek.sty.ltxml lib/LaTeXML/Package/upref.sty.ltxml lib/LaTeXML/Package/url.sty.ltxml lib/LaTeXML/Package/utopia.sty.ltxml lib/LaTeXML/Package/verbatim.sty.ltxml lib/LaTeXML/Package/wrapfig.sty.ltxml lib/LaTeXML/Package/xargs.sty.ltxml lib/LaTeXML/Package/xcolor.sty.ltxml lib/LaTeXML/Package/xspace.sty.ltxml lib/LaTeXML/Package/yfonts.sty.ltxml lib/LaTeXML/Package/t1.fontmap.ltxml lib/LaTeXML/Package/ts1.fontmap.ltxml lib/LaTeXML/Package/t2a.fontmap.ltxml lib/LaTeXML/Package/t2b.fontmap.ltxml lib/LaTeXML/Package/t2c.fontmap.ltxml lib/LaTeXML/Package/type1cm.sty.ltxml lib/LaTeXML/Package/lgr.fontmap.ltxml lib/LaTeXML/Package/ly1.fontmap.ltxml lib/LaTeXML/Package/amsa.fontmap.ltxml lib/LaTeXML/Package/amsb.fontmap.ltxml lib/LaTeXML/Package/ot4.fontmap.ltxml lib/LaTeXML/Package/ulem.sty.ltxml lib/LaTeXML/Package/utf8.def.ltxml lib/LaTeXML/Package/xunicode.sty.ltxml #================================================== # TeX packages #================================================== lib/LaTeXML/texmf/latexml.sty lib/LaTeXML/texmf/lxRDFa.sty #================================================== # Test Suite. 
#================================================== t/00_tokenize.t t/10_expansion.t t/12_grouping.t t/20_digestion.t t/22_fonts.t t/30_encoding.t t/40_math.t t/50_structure.t t/52_namespace.t t/53_alignment.t t/55_theorem.t t/56_ams.t t/65_graphics.t t/70_parse.t t/80_complex.t t/81_babel.t t/90_latexmlpost.t t/91_latexmlc_api.t t/92_profiles.t t/93_formats.t t/94_runtimes.t t/95_complex_config.t t/alignment/array.pdf t/alignment/array.tex t/alignment/array.xml t/alignment/foo.png t/alignment/badeqnarray.pdf t/alignment/badeqnarray.tex t/alignment/badeqnarray.xml t/alignment/colortbls.pdf t/alignment/colortbls.tex t/alignment/colortbls.xml t/alignment/eqnarray.pdf t/alignment/eqnarray.tex t/alignment/eqnarray.xml t/alignment/halign.pdf t/alignment/halign.tex t/alignment/halign.xml t/alignment/halignatt.pdf t/alignment/halignatt.tex t/alignment/halignatt.xml t/alignment/mathmix.pdf t/alignment/mathmix.tex t/alignment/mathmix.xml t/alignment/listing.pdf t/alignment/listing.tex t/alignment/listing.xml t/alignment/any.sty.ltxml t/alignment/longtable.pdf t/alignment/longtable.tex t/alignment/longtable.xml t/alignment/morse.pdf t/alignment/morse.tex t/alignment/morse.xml t/alignment/plainmath.pdf t/alignment/plainmath.tex t/alignment/plainmath.xml t/alignment/supertabular.pdf t/alignment/supertabular.tex t/alignment/supertabular.xml t/alignment/tabtab.pdf t/alignment/tabtab.tex t/alignment/tabtab.xml t/alignment/tabular.pdf t/alignment/tabular.tex t/alignment/tabular.xml t/alignment/tabularstar.pdf t/alignment/tabularstar.tex t/alignment/tabularstar.xml t/alignment/vmode.pdf t/alignment/vmode.tex t/alignment/vmode.xml t/ams/amsdisplay.pdf t/ams/amsdisplay.tex t/ams/amsdisplay.xml t/ams/matrix.pdf t/ams/matrix.tex t/ams/matrix.xml t/ams/cd.pdf t/ams/cd.tex t/ams/cd.xml t/ams/genfracs.pdf t/ams/genfracs.tex t/ams/genfracs.xml t/ams/sideset.pdf t/ams/sideset.tex t/ams/sideset.xml t/babel/french.pdf t/babel/french.tex t/babel/french.xml t/babel/german.pdf t/babel/german.tex 
t/babel/german.xml t/babel/greek.pdf t/babel/greek.tex t/babel/greek.xml t/babel/page545.pdf t/babel/page545.tex t/babel/page545.xml t/babel/numprints.pdf t/babel/numprints.tex t/babel/numprints.xml t/complex/aliceblog.pdf t/complex/aliceblog.tex t/complex/aliceblog.xml t/complex/sunset.jpg t/complex/xii.dtd t/complex/xii.pdf t/complex/xii.latexml t/complex/xii.tex t/complex/xii.xml t/complex/hypertest.pdf t/complex/hypertest.tex t/complex/hypertest.xml t/digestion/box.pdf t/digestion/box.tex t/digestion/box.xml t/digestion/def.pdf t/digestion/def.tex t/digestion/def.xml t/digestion/chardefs.pdf t/digestion/chardefs.tex t/digestion/chardefs.xml t/digestion/defaultunits.pdf t/digestion/defaultunits.tex t/digestion/defaultunits.xml t/digestion/io.pdf t/digestion/io.tex t/digestion/io.xml t/digestion/exists.data t/digestion/primes.pdf t/digestion/primes.tex t/digestion/primes.xml t/digestion/testctr.pdf t/digestion/testctr.tex t/digestion/testctr.xml t/digestion/xargs.pdf t/digestion/xargs.tex t/digestion/xargs.xml t/encoding/ansinew.pdf t/encoding/ansinew.tex t/encoding/ansinew.xml t/encoding/applemac.pdf t/encoding/applemac.tex t/encoding/applemac.xml t/encoding/cp437de.pdf t/encoding/cp437de.tex t/encoding/cp437de.xml t/encoding/cp437.pdf t/encoding/cp437.tex t/encoding/cp437.xml t/encoding/cp850.pdf t/encoding/cp850.tex t/encoding/cp850.xml t/encoding/cp852.pdf t/encoding/cp852.tex t/encoding/cp852.xml t/encoding/cp858.pdf t/encoding/cp858.tex t/encoding/cp858.xml t/encoding/cp865.pdf t/encoding/cp865.tex t/encoding/cp865.xml t/encoding/cp1250.pdf t/encoding/cp1250.tex t/encoding/cp1250.xml t/encoding/cp1252.pdf t/encoding/cp1252.tex t/encoding/cp1252.xml t/encoding/decmulti.pdf t/encoding/decmulti.tex t/encoding/decmulti.xml t/encoding/latin1.pdf t/encoding/latin1.tex t/encoding/latin1.xml t/encoding/latin2.pdf t/encoding/latin2.tex t/encoding/latin2.xml t/encoding/latin3.pdf t/encoding/latin3.tex t/encoding/latin3.xml t/encoding/latin4.pdf t/encoding/latin4.tex 
t/encoding/latin4.xml t/encoding/latin5.pdf t/encoding/latin5.tex t/encoding/latin5.xml t/encoding/latin9.pdf t/encoding/latin9.tex t/encoding/latin9.xml t/encoding/latin10.pdf t/encoding/latin10.tex t/encoding/latin10.xml t/encoding/ot1.pdf t/encoding/ot1.tex t/encoding/ot1.xml t/encoding/t1.pdf t/encoding/t1.tex t/encoding/t1.xml t/encoding/t2a.pdf t/encoding/t2a.tex t/encoding/t2a.xml t/encoding/t2b.pdf t/encoding/t2b.tex t/encoding/t2b.xml t/encoding/t2c.pdf t/encoding/t2c.tex t/encoding/t2c.xml t/encoding/ts1.pdf t/encoding/ts1.tex t/encoding/ts1.xml t/encoding/ly1.pdf t/encoding/ly1.tex t/encoding/ly1.xml t/expansion/aftergroup.pdf t/expansion/aftergroup.tex t/expansion/aftergroup.xml t/expansion/definedness.pdf t/expansion/definedness.tex t/expansion/definedness.xml t/expansion/env.pdf t/expansion/env.tex t/expansion/env.xml t/expansion/environments.pdf t/expansion/environments.tex t/expansion/environments.xml t/expansion/etex.pdf t/expansion/etex.tex t/expansion/etex.xml t/expansion/for.pdf t/expansion/for.tex t/expansion/for.xml t/expansion/ifthen.pdf t/expansion/ifthen.tex t/expansion/ifthen.xml t/expansion/keywords.pdf t/expansion/keywords.tex t/expansion/keywords.xml t/expansion/lettercase.pdf t/expansion/lettercase.tex t/expansion/lettercase.xml t/expansion/meaning.pdf t/expansion/meaning.tex t/expansion/meaning.xml t/expansion/noexpand.pdf t/expansion/noexpand.tex t/expansion/noexpand.xml t/expansion/testchar.pdf t/expansion/testchar.tex t/expansion/testchar.xml t/expansion/testexpand.pdf t/expansion/testexpand.tex t/expansion/testexpand.xml t/expansion/testif.pdf t/expansion/testif.tex t/expansion/testif.xml t/expansion/testinput.pdf t/expansion/testinput.tex t/expansion/testinput.xml t/expansion/testinput.foo t/expansion/fragment1.tex t/expansion/fragment2.tex t/expansion/testmultido.pdf t/expansion/testmultido.tex t/expansion/testmultido.xml t/expansion/toks.pdf t/expansion/toks.tex t/expansion/toks.xml t/expansion/whichinput.pdf 
t/expansion/whichinput.tex t/expansion/whichinput.xml t/expansion/whichinput.latexml t/expansion/whichcache.pdf t/expansion/whichcache.tex t/expansion/whichcache.xml t/expansion/whichcache.latexml t/expansion/whichfrag1 t/expansion/whichfrag1.tex t/expansion/whichfrag1.tex.tex t/expansion/whichfrag2 t/expansion/whichfrag2.tex t/expansion/whichfrag3 t/expansion/whichpkga t/expansion/whichpkga.sty t/expansion/whichpkgb t/expansion/whichpkgb.sty t/expansion/whichpkgb.sty.sty t/expansion/whichpkgc.sty t/expansion/subdir/whichpkga.sty t/fonts/accents.pdf t/fonts/accents.xml t/fonts/accents.tex t/fonts/bbold.pdf t/fonts/bbold.xml t/fonts/bbold.tex t/fonts/cancels.tex t/fonts/cancels.xml t/fonts/cancels.pdf t/fonts/fonts.pdf t/fonts/fonts.tex t/fonts/fonts.xml t/fonts/mathaccents.pdf t/fonts/mathaccents.xml t/fonts/mathaccents.tex t/fonts/omencodings.pdf t/fonts/omencodings.tex t/fonts/omencodings.xml t/fonts/mixed.pdf t/fonts/mixed.tex t/fonts/mixed.xml t/fonts/mathcolor.pdf t/fonts/mathcolor.tex t/fonts/mathcolor.xml t/fonts/textcomp.pdf t/fonts/textcomp.tex t/fonts/textcomp.xml t/fonts/textsymbols.pdf t/fonts/textsymbols.tex t/fonts/textsymbols.xml t/fonts/ulem.pdf t/fonts/ulem.tex t/fonts/ulem.xml t/graphics/none.png t/graphics/colors.pdf t/graphics/colors.tex t/graphics/colors.xml t/graphics/calc.pdf t/graphics/calc.tex t/graphics/calc.xml t/graphics/framed.pdf t/graphics/framed.tex t/graphics/framed.xml t/graphics/graphrot.pdf t/graphics/graphrot.tex t/graphics/graphrot.xml t/graphics/keyval.pdf t/graphics/keyval.tex t/graphics/keyval.xml t/graphics/mykeyval.sty.ltxml t/graphics/mykeyval.sty t/graphics/picture.pdf t/graphics/picture.tex t/graphics/picture.xml t/graphics/simplekv.pdf t/graphics/simplekv.tex t/graphics/simplekv.xml t/graphics/xcolors.pdf t/graphics/xcolors.tex t/graphics/xcolors.xml t/grouping/mathgroup.pdf t/grouping/mathgroup.tex t/grouping/mathgroup.xml t/grouping/scopemacro.pdf t/grouping/scopemacro.latexml t/grouping/scopemacro.tex 
t/grouping/scopemacro.xml t/math/array.pdf t/math/array.tex t/math/array.xml t/math/arrows.pdf t/math/arrows.tex t/math/arrows.xml t/math/choose.pdf t/math/choose.tex t/math/choose.xml t/math/niceunits.pdf t/math/niceunits.tex t/math/niceunits.xml t/math/not.pdf t/math/not.tex t/math/not.xml t/math/simplemath.pdf t/math/simplemath.latexml t/math/simplemath.tex t/math/simplemath.xml t/math/testover.pdf t/math/testover.tex t/math/testover.xml t/math/testscripts.pdf t/math/testscripts.tex t/math/testscripts.xml t/namespace/ns1.dtd t/namespace/ns1.pdf t/namespace/ns1.latexml t/namespace/ns1.tex t/namespace/ns1.xml t/namespace/ns2.dtd t/namespace/ns2.pdf t/namespace/ns2.latexml t/namespace/ns2.tex t/namespace/ns2.xml t/namespace/ns3.dtd t/namespace/ns3.pdf t/namespace/ns3.latexml t/namespace/ns3.tex t/namespace/ns3.xml t/namespace/ns4.dtd t/namespace/ns4.pdf t/namespace/ns4.latexml t/namespace/ns4.tex t/namespace/ns4.xml t/namespace/ns5.dtd t/namespace/ns5.pdf t/namespace/ns5.latexml t/namespace/ns5.tex t/namespace/ns5.xml t/parse/compose.pdf t/parse/compose.tex t/parse/compose.xml t/parse/functions.pdf t/parse/functions.tex t/parse/functions.xml t/parse/kludge.pdf t/parse/kludge.tex t/parse/kludge.xml t/parse/mixedfrac.pdf t/parse/mixedfrac.tex t/parse/mixedfrac.xml t/parse/operators.pdf t/parse/operators.tex t/parse/operators.xml t/parse/parens.pdf t/parse/parens.tex t/parse/parens.xml t/parse/scripts.pdf t/parse/scripts.tex t/parse/scripts.xml t/parse/qm.pdf t/parse/qm.tex t/parse/qm.xml t/parse/relations.pdf t/parse/relations.tex t/parse/relations.xml t/parse/sets.pdf t/parse/sets.tex t/parse/sets.xml t/parse/spacing.pdf t/parse/spacing.tex t/parse/spacing.xml t/parse/terms.pdf t/parse/terms.tex t/parse/terms.xml t/post/simplemath-post.xml t/post/simplemath.xml t/structure/abstract.pdf t/structure/abstract.tex t/structure/abstract.xml t/structure/amsarticle.pdf t/structure/amsarticle.tex t/structure/amsarticle.xml t/structure/article.pdf t/structure/article.tex 
t/structure/article.xml t/structure/authors.pdf t/structure/authors.tex t/structure/authors.xml t/structure/badabstract.pdf t/structure/badabstract.tex t/structure/badabstract.xml t/structure/beforeafter.pdf t/structure/beforeafter.tex t/structure/beforeafter.xml t/structure/book.pdf t/structure/book.tex t/structure/book.xml t/structure/epitest.pdf t/structure/epitest.tex t/structure/epitest.xml t/structure/fancyhdr.pdf t/structure/fancyhdr.tex t/structure/fancyhdr.xml t/structure/figures.pdf t/structure/figures.tex t/structure/figures.xml t/structure/itemize.pdf t/structure/itemize.tex t/structure/itemize.xml t/structure/options.pdf t/structure/options.tex t/structure/options.xml t/structure/myclass.cls t/structure/myclass.cls.ltxml t/structure/apackage.sty t/structure/apackage.sty.ltxml t/structure/para.pdf t/structure/para.tex t/structure/para.xml t/structure/paralists.pdf t/structure/paralists.tex t/structure/paralists.xml t/structure/report.pdf t/structure/report.tex t/structure/report.xml t/structure/sec.pdf t/structure/sec.tex t/structure/sec.xml t/structure/svabstract.pdf t/structure/svabstract.tex t/structure/svabstract.xml t/theorem/amstheorem.pdf t/theorem/amstheorem.tex t/theorem/amstheorem.xml t/theorem/latextheorem.pdf t/theorem/latextheorem.tex t/theorem/latextheorem.xml t/theorem/theorem.pdf t/theorem/theorem.tex t/theorem/theorem.xml t/theorem/ntheorem.pdf t/theorem/ntheorem.tex t/theorem/ntheorem.xml t/tokenize/alltt.pdf t/tokenize/alltt.tex t/tokenize/alltt.xml t/tokenize/comment.pdf t/tokenize/comment.tex t/tokenize/comment.xml t/tokenize/ligatures.pdf t/tokenize/ligatures.tex t/tokenize/ligatures.xml t/tokenize/mathtokens.pdf t/tokenize/mathtokens.tex t/tokenize/mathtokens.xml t/tokenize/percent.pdf t/tokenize/percent.tex t/tokenize/percent.xml t/tokenize/url.pdf t/tokenize/url.tex t/tokenize/url.xml t/tokenize/verb.pdf t/tokenize/verb.tex t/tokenize/verb.xml t/tokenize/verbata.pdf t/tokenize/verbata.tex t/tokenize/verbata.xml 
t/daemon/citations.tex t/daemon/document.tex t/daemon/fragment.tex t/daemon/post.tex t/daemon/pre.tex t/daemon/tiny.bib t/daemon/api/port.spec t/daemon/api/port.status t/daemon/api/port.xml t/daemon/complex/exhaustive.spec t/daemon/complex/exhaustive.status t/daemon/complex/exhaustive.xml t/daemon/formats/citation.spec t/daemon/formats/citation.status t/daemon/formats/citation.xml t/daemon/formats/citationraw.spec t/daemon/formats/citationraw.status t/daemon/formats/citationraw.xml t/daemon/formats/makebib.spec t/daemon/formats/makebib.status t/daemon/formats/makebib.xml t/daemon/formats/noparse.spec t/daemon/formats/noparse.status t/daemon/formats/noparse.xml t/daemon/formats/parallel-math-cmml.spec t/daemon/formats/parallel-math-cmml.status t/daemon/formats/parallel-math-cmml.xml t/daemon/formats/parallel-math-om.spec t/daemon/formats/parallel-math-om.status t/daemon/formats/parallel-math-om.xml t/daemon/formats/parallel-math-pmml.spec t/daemon/formats/parallel-math-pmml.status t/daemon/formats/parallel-math-pmml.xml t/daemon/formats/parallel-math-xmath.spec t/daemon/formats/parallel-math-xmath.status t/daemon/formats/parallel-math-xmath.xml t/daemon/profiles/default.spec t/daemon/profiles/default.status t/daemon/profiles/default.xml t/daemon/profiles/fragment.spec t/daemon/profiles/fragment.status t/daemon/profiles/fragment.xml t/daemon/profiles/math.spec t/daemon/profiles/math.status t/daemon/profiles/math.xml t/daemon/profiles/standard.spec t/daemon/profiles/standard.status t/daemon/profiles/standard.xml t/daemon/profiles/stex/stex.spec t/daemon/profiles/stex/stex.status t/daemon/profiles/stex/stex.tex t/daemon/profiles/stex/stex.xml t/daemon/runtimes/timeout.spec t/daemon/runtimes/timeout.status t/daemon/runtimes/timeout.xml 
latexml-0.8.1/MANIFEST.SKIP0000644000175000017500000000015712507513572015006 0ustar norbertnorbert
~$ ^\# ^\. ^Makefile$ ^Makefile.old$ \.bak$ ($|/)TAGS$ (^|/)\.svn/ ^doc/ ^blib/ ^release/ ^tools/ ^pm_to_blib$
latexml-0.8.1/Makefile.PL0000644000175000017500000002613512507513572015066 0ustar norbertnorbert
# -*- CPERL -*- #====================================================================== # Makefile Maker for LaTeXML # Bruce.Miller@NIST.gov #====================================================================== use ExtUtils::MakeMaker; use strict; use warnings; use FindBin; #====================================================================== # Use "perl
Makefile.PL " # Build options are: # OLD_LIBXML : if you only have access to an old version of XML::LibXML (ie. before 1.61). # This is necessary because we will have an additional dependency # (XML::LibXML::XPathContext), and it would be too late to add that # dependence when we discover which version of XML::LibXML we end up with. # "Enterprise" Linuxes, like Centos and RedHat Enterprise are likely # to be stuck with such older versions (till now). # TEXMF= : Installs the tex style files to this texmf tree, # rather than where kpsewhich says TEXMFLOCAL is (useful for spec files?) #====================================================================== our $OLD_LIBXML = grep { /OLD_LIBXML/ } @ARGV; our $TEXMF; my ($texmfspec) = grep { /^TEXMF/ } @ARGV; if ($texmfspec && $texmfspec =~ /^TEXMF\s*=(.*)$/) { $TEXMF = $1; @ARGV = grep { $_ ne $texmfspec } @ARGV; # Remove so MakeMaker doesn't fret. } our @EXCLUSIONS = (); our $MORE_MACROS = {}; our $MORE_MAKERULES = ''; record_revision(); compile_MathGrammar(); install_TeXStyles(); extra_Tests(); WriteMakefile(NAME => 'LaTeXML', AUTHOR => 'Bruce Miller ', ABSTRACT => "transforms TeX and LaTeX into XML/HTML/MathML", VERSION_FROM => 'lib/LaTeXML.pm', MIN_PERL_VERSION => 5.008001, # A very restricted set of licenses are allowed here. No Creative Commons, eg.! # The tag open_source should be an Open Source Initiative approved license; # public domain is sorta included. See http://opensource.org/faq#public-domain LICENSE => 'open_source', CONFIGURE_REQUIRES => { 'version' => 0.77, }, PREREQ_PM => { 'Archive::Zip' => 0, 'DB_File' => 0, 'File::Which' => 0, 'Getopt::Long' => 2.37, 'Image::Size' => 0, 'IO::String' => 0, 'JSON::XS' => 0, 'LWP' => 0, 'Parse::RecDescent' => 0, 'Pod::Parser' => 0, # for Pod::Find 'Test::More' => 0, 'Time::HiRes' => 0, 'URI' => 0, 'version' => 0, # If we have an "old" version of XML::LibXML, # we also need XPathContext. 
# But we can't determine that additional dependence # after we've already started resolving dependences! ($OLD_LIBXML ? ('XML::LibXML' => 1.58, 'XML::LibXML::XPathContext' => 0) : ('XML::LibXML' => 1.61)), # But > 1.62 is better 'XML::LibXSLT' => 1.58, }, EXE_FILES => ['bin/latexml', 'bin/latexmlpost', 'bin/latexmlfind', 'bin/latexmlmath', 'bin/latexmlc'], macro => $MORE_MACROS, # link to github location to newer MakeMaker (eval { ExtUtils::MakeMaker->VERSION(6.46) } ? (META_MERGE => { 'meta-spec' => { version => 2 }, resources => { repository => { type => 'git', url => 'https://github.com/brucemiller/LaTeXML.git', web => 'https://github.com/brucemiller/LaTeXML'}}}) : () ), ); print STDERR ('=' x 55), "\n", "| If you plan on developing code, please consider using\n", "| the git pre-commit hook to assure style compliant code.\n", "| To install:\n", "| ln -s ../../tools/pre-commit .git/hooks\n", ('=' x 55), "\n" unless -x '.git/hooks/pre-commit'; #********************************************************************** # Overriding ExtUtils::MM methods #********************************************************************** # Exclude the sources used to generate others from the build (See below). sub MY::libscan { my ($self, $path) = @_; if (($path =~ /~$/) || grep { $path eq $_ } @EXCLUSIONS) { return ""; } return $self->MY::SUPER::libscan($path); } # Append any additional Makefile rules added by the following. sub MY::postamble { my ($self, @rules) = @_; return $self->MY::SUPER::postamble(@rules) . 
$MORE_MAKERULES; } #********************************************************************** # Special Cases #********************************************************************** #====================================================================== # Record the current (svn) repository revision number sub record_revision { # Don't copy the Version template to the installation; it's not needed push(@EXCLUSIONS, 'blib/lib/LaTeXML/Version.in'); # This should be the top-level directory, so its revision should represent the whole project $$MORE_MACROS{REVISION_BASE} = $FindBin::RealBin; # This is where the REVISION gets stored (along with VERSION, etc) $$MORE_MACROS{REVISION_FILE} = '$(INST_LIBDIR)/LaTeXML/Version.pm'; # Get the current revision # This should be done SAFELY; and work even if svnversion isn't available (esp, windows, mac...) # (When it isn't there's an error, but REVISION ends up "" ... exactly right, I think?) ## This command is appropriate for svn ## $$MORE_MACROS{REVISION} = '$(shell svnversion $(REVISION_BASE))'; ## This command is appropriate for git (I think) if ((-d '.git') && (system("git --version") == 0) ) { # If a git checkout & can run git? $$MORE_MACROS{REVISION} = '$(shell git log --max-count=1 --abbrev-commit --pretty="%h")'; } # Extract the previously recorded revision from the revision file (awkward) $$MORE_MACROS{OLD_REVISION} = '`$(PERLRUN) -ne \'chomp;if(s/.*?REVISION\\s*=\\s*\\"// && s/\\".*//){print;}\' < $(REVISION_FILE)`'; # Substitute the revision into the revision template $$MORE_MACROS{RECORD_REVISION} = '$(PERLRUN) -pe "s@__REVISION__@$(REVISION)@" '; # Have concerns about the $(noecho), but otherwise, it's annoying! # it prints _every_ time you make, even if it doesn't update!
$MORE_MAKERULES .= <<'RecordRevision'; # Record the svn revision in the Version module, for more informative diagnostics pure_all :: $(REVISION_FILE) update_revision # Always set version if version module template is newer $(REVISION_FILE): lib/LaTeXML/Version.in $(NOECHO) $(MKPATH) $(INST_LIBDIR)/LaTeXML $(RECORD_REVISION) lib/LaTeXML/Version.in > $(REVISION_FILE) # update version if stored revision if not current update_revision: $(NOECHO) $(MKPATH) $(INST_LIBDIR)/LaTeXML - $(NOECHO) test $(REVISION) = $(OLD_REVISION) \ || $(RECORD_REVISION) lib/LaTeXML/Version.in > $(REVISION_FILE) RecordRevision return; } #====================================================================== # We'll compile the RecDescent grammar during make; don't need to install grammar. sub compile_MathGrammar { push(@EXCLUSIONS, 'blib/lib/LaTeXML/MathGrammar'); $MORE_MAKERULES .= <<'MakeGrammar'; # Precompile the (Recursive Descent) MathGrammar pure_all :: $(INST_LIBDIR)/LaTeXML/MathGrammar.pm $(INST_LIBDIR)/LaTeXML/MathGrammar.pm: lib/LaTeXML/MathGrammar $(PERLRUN) -MParse::RecDescent - lib/LaTeXML/MathGrammar LaTeXML::MathGrammar $(NOECHO) $(MKPATH) $(INST_LIBDIR)/LaTeXML $(MV) MathGrammar.pm blib/lib/LaTeXML/MathGrammar.pm MakeGrammar return; } #====================================================================== # If there appears to be a TeX installation, install our included TeX style # file(s) into the standard TEXMFLOCAL, so that tex/latex can find & use them. # # Note the following complications: # * MakeMaker doesn't natively know how to install TeX styles, # so we have to add explicit rules to the Makefile. # * "staged builds", such as when building & installing rpms # install files to a temporary root directory $(DESTDIR). # (DESTDIR is generally empty for manual make) # * We'll need to run mktexlsr once the files are installed in # their _final_ location, so that they are indexed for tex. 
# * We need to be careful constructing pathnames to avoid fouling # Windows installations where the pathnames may have spaces. # Not to mention working around dmake's limitations. # # Strategy: # * During "perl Makefile.PL", tentatively use kpsewhich to find # if kpsewhich exists, and if so, where the style files should go. # This directory is stored in the Makefile (hopefully doesn't change later?) # * During "make pure_install", including staged builds, # if we've been supplied with a texmf directory, create the appropriate # subdirectories and install the style files there (but under $(DESTDIR)) # * During "make pure_install", but NOT during staged builds, run mktexlsr. # We test this simply by checking if texmf is writable. # * Add "post install" operations to staged build specfiles # to run mktexlsr. # * Wrap each entire TeX-related pathname in ONE set of double quotes to protect # embedded spaces. sub install_TeXStyles { if (!$TEXMF) { if (system("kpsewhich --expand-var='\$TEXMFLOCAL'") == 0) { # can run kpsewhich? $TEXMF = `kpsewhich --expand-var='\$TEXMFLOCAL'`; # Strip the quotes (they appear on Windows, when there are spaces in pathnames(?)) # These quotes inhibit pasting pathnames together, # but we DO need to wrap quotes around all completed paths!!
chomp($TEXMF); $TEXMF =~ s/^'//; $TEXMF =~ s/'$//; } } if (!$TEXMF) { warn "Warning: no TeX installation found.\n", " TeX is NOT required, but LaTeXML will have limited functionality.\n"; return; } $$MORE_MACROS{INST_TEXMFDIR} = '$(INST_LIB)/LaTeXML/texmf'; $$MORE_MACROS{INSTALLTEXMFDIR} = "$TEXMF/tex/latex/latexml"; $$MORE_MACROS{DESTINSTALLTEXMFDIR} = '$(DESTDIR)$(INSTALLTEXMFDIR)'; $$MORE_MACROS{INSTALLTEXMFBASEDIR} = "$TEXMF"; $$MORE_MACROS{DESTINSTALLTEXMFBASEDIR} = '$(DESTDIR)$(INSTALLTEXMFBASEDIR)'; $MORE_MAKERULES .= <<'InstallTeXStyles'; pure_install :: $(NOECHO) (($(PERLRUN) -e "exit(1) unless shift;" -- "$(INSTALLTEXMFBASEDIR)") && \ $(MKPATH) "$(DESTINSTALLTEXMFDIR)" && \ $(MOD_INSTALL) \ read "$(INSTALLTEXMFDIR)/.packlist" \ write "$(DESTINSTALLTEXMFDIR)/.packlist" \ "$(INST_TEXMFDIR)" "$(DESTINSTALLTEXMFDIR)" ) \ || echo "No TeX installation, skipping installing LaTeXML TeX packages" $(NOECHO) ($(PERLRUN) -e "exit(1) if -w shift;" -- "$(INSTALLTEXMFBASEDIR)" || mktexlsr) \ || echo "No write permission for $(INSTALLTEXMFBASEDIR), skipping mktexlsr" uninstall :: $(NOECHO) (($(PERLRUN) -e "exit(1) unless -w shift;" -- "$(DESTINSTALLTEXMFBASEDIR)") && \ $(UNINSTALL) "$(INSTALLTEXMFDIR)/.packlist" && \ ($(PERLRUN) -e "exit(1) if -w shift;" -- "$(INSTALLTEXMFDIR)" || mktexlsr)) \ || echo "No write permission for $(INSTALLTEXMFBASEDIR), skipping uninstalling LaTeXML TeX packages" InstallTeXStyles return; } #====================================================================== # Extra tests for Tikz; too slow for everyday tests. 
sub extra_Tests { $MORE_MAKERULES .= <<'ExtraTests'; EXTRA_TEST_FILES = t/*.tt fulltest : test extratest extratest :: PERL_DL_NONLAZY=1 $(FULLPERLRUN) "-MExtUtils::Command::MM" "-e" "test_harness($(TEST_VERBOSE), '$(INST_LIB)', '$(INST_ARCHLIB)')" $(EXTRA_TEST_FILES) ExtraTests return; } #======================================================================
latexml-0.8.1/README0000644000175000017500000000132612507513572013767 0ustar norbertnorbert
LaTeXML is a TeX and LaTeX to XML/HTML/MathML converter. From XML it can generate various flavors of HTML, MathML and ePub, with other formats under development. See the included manual.pdf for documentation. The home page is at http://dlmf.nist.gov/LaTeXML/. LaTeXML development is currently hosted at https://github.com/brucemiller/LaTeXML, where you can retrieve and browse the current source, along with an Issue tracker https://github.com/brucemiller/LaTeXML/issues and Wiki https://github.com/brucemiller/LaTeXML/wiki. A mailing list for discussion is hosted at http://lists.jacobs-university.de/mailman/listinfo/project-latexml See the LICENSE file for copyright and licensing information.
bruce.miller@nist.gov
latexml-0.8.1/bin/0000755000175000017500000000000012507513572013655 5ustar norbertnorbert
latexml-0.8.1/bin/latexml0000755000175000017500000002630312507513572015255 0ustar norbertnorbert
#!/usr/bin/perl -w # /=====================================================================\ # # | latexml | # # | main conversion program | # # |=====================================================================| # # | Part of LaTeXML: | # # | Public domain software, produced as part of work done by the | # # | United States Government & not subject to copyright in the US.
| # # |---------------------------------------------------------------------| # # | Bruce Miller #_# | # # | http://dlmf.nist.gov/LaTeXML/ (o o) | # # \=========================================================ooo==U==ooo=/ # use strict; use warnings; use FindBin; use lib "$FindBin::RealBin/../lib"; use Getopt::Long qw(:config no_ignore_case); use Pod::Usage; use LaTeXML::Core; use LaTeXML; # Currently, just for version information. use LaTeXML::Util::Pathname; #********************************************************************** # Parse command line my ($verbosity, $strict, $comments, $noparse, $includestyles) = (0, 0, 1, 0, 0); my ($format, $destination, $help, $showversion) = ('xml', ''); my ($preamble, $postamble) = (undef, undef); my ($documentid); my $inputencoding; my $mode = undef; my @paths = (); my (@preload); GetOptions("destination=s" => \$destination, "output=s" => \$destination, "preload=s" => \@preload, "path=s" => \@paths, "preamble=s" => \$preamble, "postamble=s" => \$postamble, "quiet" => sub { $verbosity--; }, "verbose" => sub { $verbosity++; }, "strict" => \$strict, "xml" => sub { $format = 'xml'; }, "tex" => sub { $format = 'tex'; }, "box" => sub { $format = 'box'; }, "bibtex" => sub { $mode = 'BibTeX'; }, "noparse" => \$noparse, "includestyles" => \$includestyles, "inputencoding=s" => \$inputencoding, "comments!" => \$comments, "VERSION" => \$showversion, "debug=s" => sub { no strict 'refs'; ${ 'LaTeXML::' . $_[1] . 
'::DEBUG' } = 1; }, "documentid=s" => \$documentid, "help" => \$help, ) or pod2usage(-message => $LaTeXML::IDENTITY, -exitval => 1, -verbose => 0, -output => \*STDERR); pod2usage(-message => $LaTeXML::IDENTITY, -exitval => 1, -verbose => 2, -output => \*STDOUT) if $help; if ($showversion) { print STDERR "$LaTeXML::IDENTITY\n"; exit(1); } pod2usage(-message => "$LaTeXML::IDENTITY\nMissing input TeX file", -exitval => 1, -verbose => 0, -output => \*STDERR) unless @ARGV; my $source = $ARGV[0]; #********************************************************************** # Set up the processing. print STDERR "$LaTeXML::IDENTITY\n" if $verbosity > -1; print STDERR "processing started " . localtime() . "\n" if $verbosity > -1; @paths = map { pathname_canonical($_) } @paths; if (my @baddirs = grep { !-d $_ } @paths) { warn "These path directories do not exist: " . join(', ', @baddirs) . "\n"; } my $latexml = LaTeXML::Core->new( preload => [@preload], searchpaths => [grep { -d $_ } reverse(@paths)], graphicspaths => ['.'], verbosity => $verbosity, strict => $strict, includeComments => $comments, inputencoding => $inputencoding, includeStyles => $includestyles, documentid => $documentid, nomathparse => $noparse); # Check that destination is valid before wasting any time... if ($destination) { $destination = pathname_canonical($destination); if (my $dir = pathname_directory($destination)) { pathname_mkdir($dir) or die "Couldn't create destination directory $dir: $!"; } } binmode(STDERR, ":encoding(UTF-8)"); $mode = 'BibTeX' if !defined $mode && ($source =~ /\.bib$/); $mode = 'TeX' unless defined $mode; # ======================================== # First read and digest whatever we're given. my $digested; if ($source eq '-') { { local $/ = undef; $source = "literal:" . <>; } } $digested = $latexml->digestFile($source, mode => $mode, preamble => $preamble, postamble => $postamble); # ======================================== # Now, convert to DOM and output, if desired. 
my $serialized; if ($digested) { $latexml->withState(sub { if ($format eq 'tex') { $serialized = LaTeXML::Core::Token::UnTeX($digested); } elsif ($format eq 'box') { $serialized = ($verbosity > 0 ? $digested->stringify : $digested->toString); } else { my $dom = $latexml->convertDocument($digested); $serialized = $dom->toString(1); } }); } $latexml->showProfile(); # Show profile (if any) print STDERR "\nConversion complete: " . $latexml->getStatusMessage . ".\n"; print STDERR "processing finished " . localtime() . "\n" if $verbosity > -1; if (!$serialized) { } elsif ($destination) { my $OUTPUT; open($OUTPUT, ">", $destination) or die "Couldn't open output file $destination: $!"; print $OUTPUT $serialized; close($OUTPUT); } else { print $serialized; } # ======================================== # Now, unbind stuff, so we can clear memory $latexml = undef; $digested = undef; $serialized = undef; #********************************************************************** __END__ =head1 NAME C - transforms a TeX/LaTeX file into XML. =head1 SYNOPSIS latexml [options] I Options: --destination=file sets destination file (default stdout). --output=file [obsolete synonym for --destination] --preload=module requests loading of an optional module; can be repeated --preamble=file sets a preamble file which will effectively be prepended to the main file. --postamble=file sets a postamble file which will effectively be appended to the main file. --includestyles allows latexml to load raw *.sty file; by default it avoids this. --path=dir adds to the paths searched for files, modules, etc; --documentid=id assign an id to the document root. --quiet suppress messages (can repeat) --verbose more informative output (can repeat) --strict makes latexml less forgiving of errors --bibtex processes as a BibTeX bibliography. --xml requests xml output (default). --tex requests TeX output after expansion. --box requests box output after expansion and digestion. 
--noparse suppresses parsing math --nocomments omit comments from the output --inputencoding=enc specify the input encoding. --VERSION show version number. --debug=package enables debugging output for the named package --help shows this help message. If I is '-', latexml reads the TeX source from standard input. If I has an explicit extension of C<.bib>, it is processed as a BibTeX bibliography. =head1 OPTIONS AND ARGUMENTS =over 4 =item C<--destination>=I Specifies the destination file; by default the XML is written to stdout. =item C<--preload>=I Requests the loading of an optional module or package. This may be useful if the TeX code does not specifically require the module (eg. through input or usepackage). For example, use C<--preload=LaTeX.pool> to force LaTeX mode. =item C<--preamble>=I, C<--postamble>=I Specifies a file whose contents will effectively be prepended or appended to the main document file's content. This can be useful when processing TeX fragments, in which case the preamble would contain documentclass and begindocument control sequences. This option is not used when processing BibTeX files. =item C<--includestyles> This option allows processing of style files (files with extensions C, C, C, C). By default, these files are ignored unless a latexml implementation of them is found (with an extension of C). These style files generally fall into two classes: Those that merely affect document style are ignorable in the XML. Others define new markup and document structure, often using deeper LaTeX macros to achieve their ends. Although the omission will lead to other errors (missing macro definitions), it is unlikely that processing the TeX code in the style file will lead to a correct document. =item C<--path>=I Add I to the search paths used when searching for files, modules, style files, etc; somewhat like TEXINPUTS. This option can be repeated. =item C<--documentid>=I Assigns an ID to the root element of the XML document.
This ID is generally inherited as the prefix of IDs on all other elements within the document. This is useful when constructing a site of multiple documents so that all nodes have unique IDs. =item C<--quiet> Reduces the verbosity of output during processing; used twice, it is pretty silent. =item C<--verbose> Increases the verbosity of output during processing; used twice, it is pretty chatty. Can be useful for getting more details when errors occur. =item C<--strict> Specifies a strict processing mode. By default, undefined control sequences and invalid document constructs (that violate the DTD) give warning messages, but attempt to continue processing. Using --strict makes them generate fatal errors. =item C<--bibtex> Forces latexml to treat the file as a BibTeX bibliography. Note that the timing is slightly different than the usual case with BibTeX and LaTeX. In the latter case, BibTeX simply selects and formats a subset of the bibliographic entries; the actual TeX expansion is carried out when the result is included in a LaTeX document. In contrast, latexml processes and expands the entire bibliography; the selection of entries is done during postprocessing. This also means that any packages that define macros used in the bibliography must be specified using the C<--preload> option. =item C<--xml> Requests XML output; this is the default. =item C<--tex> Requests TeX output for debugging purposes; processing is only carried out through expansion and digestion. This may not be quite valid TeX, since Unicode may be introduced. =item C<--box> Requests Box output for debugging purposes; processing is carried out through expansion and digestion, and the result is printed. =item C<--nocomments> Normally latexml preserves comments from the source file, and adds a comment every 25 lines as an aid in tracking the source. The option --nocomments discards such comments. =item C<--inputencoding=>I Specify the input encoding, eg. C<--inputencoding=iso-8859-1>.
The encoding must be one known to Perl's Encode package. Note that this only enables the translation of the input bytes to UTF-8 used internally by LaTeXML, but does not affect catcodes. It is usually better to use LaTeX's inputenc package. Note that this does not affect the output encoding, which is always UTF-8. =item C<--VERSION> Shows the version number of the LaTeXML package. =item C<--debug>=I Enables debugging output for the named package. The package is given without the leading LaTeXML::. =item C<--help> Shows this help message. =back =head1 SEE ALSO L, L, L =cut
latexml-0.8.1/bin/latexmlc0000755000175000017500000002126612507513572015423 0ustar norbertnorbert
#!/usr/bin/perl -w # /=====================================================================\ # # | latexmlc | # # | client/server conversion program | # # |=====================================================================| # # | Part of LaTeXML: | # # | Public domain software, produced as part of work done by the | # # | United States Government & not subject to copyright in the US.
# |---------------------------------------------------------------------| #
# | Bruce Miller                                                #_#     | #
# | http://dlmf.nist.gov/LaTeXML/                              (o o)    | #
# \=========================================================ooo==U==ooo=/ #
use strict;
use warnings;
use Cwd qw(cwd abs_path);
use IO::Socket;
my $RealBin_safe;
use FindBin;
use File::Spec::Functions;
use File::Which;

BEGIN {
  if ($FindBin::RealBin =~ /^([^\0]+)\z/) {    # Valid Unix path TODO: Windows, revisit regexp
    $RealBin_safe = $1;
  }
  die 'Fatal:IO:tainted RealBin was tainted! Failing...'
    unless ($RealBin_safe && (-e catfile($RealBin_safe, 'latexmlc')));
}
# TODO: We probably want file cat for things like /../lib instead of spelling out a Unix path
use lib catdir($RealBin_safe, "..", "lib");
#TODO: Do we ever care about ENV PATH that much? Do we miss on some feature like that?
#$ENV{PATH} = "$RealBin_safe:/usr/bin:/usr/local/bin:";
use LaTeXML;
use LaTeXML::Common::Config;
use URI::Escape;
use HTTP::Response;
use HTTP::Request;
use JSON::XS qw(decode_json);
binmode(STDERR, ":encoding(UTF-8)");
binmode(STDOUT, ":encoding(UTF-8)");

# Determine if a socket server is installed locally and obtain its pathname:
my $latexmls;
$latexmls = catfile($RealBin_safe, 'latexmls') if (-e catfile($RealBin_safe, 'latexmls'));
$latexmls = which('latexmls') unless defined $latexmls;

# Some defaults:
my $opts = LaTeXML::Common::Config->new(input_limit => 100);
# Parse and load command-line options
$opts->read(\@ARGV);
my $keyvals = $opts->scan_to_keyvals(\@ARGV);
# Client-side options:
my ($port, $address, $expire, $local) = map { $opts->get($_) } qw(port address expire);
$address = '127.0.0.1' if !$address || ($address eq 'localhost');
$address =~ s/^(\w+)\:\/\///;    # strip away any protocol
my $route = '/';
if ($address =~ s/\/(.+)$//) {    # strip away route
  $route = '/' .
    $1;
}
# Local if peerhost is localhost:
$local = ($expire && ($expire == -1)) || ($address eq '127.0.0.1');
$expire = -1 unless ((defined $expire) && $latexmls);
$port = ($local ? 3334 : 80) unless $port;    #Fall back if all fails...
#***************************************************************************
#Add some base, so that relative paths work
my $cdir = abs_path(cwd());
$cdir =~ s/ /\\ /g;
if (!$opts->get('base')) {
  $opts->set('base', $cdir);
  push @$keyvals, ['base', $cdir];
}

# Record if destination exists, for summary
my $deststat;
$deststat = (stat($opts->get('destination')))[9] if $opts->get('destination');
$deststat = 0 unless defined $deststat;

push @$keyvals, ['path', $cdir];    #add current path, to ensure never empty
push @{ $opts->get('paths') }, $cdir;

# Get the full source of interest
my $source = $opts->get('source');
$opts->delete('source');
if (!$source) {
  print STDERR "Input was empty.\n";
  exit 1;
}
if ($source eq '-') {
  {
    # Slurp the whole of standard input as the literal source
    local $/ = undef;
    $source = "literal:" . <STDIN>;
    # Set the source in the keyvals to be sent over the wire:
    @$keyvals = grep { $_->[0] !~ /source|tex/ } @$keyvals;
    push @$keyvals, $source;
  }
}
#***************************************************************************
# Prepare output variables:
my ($result, $status, $log);

# TODO: Talk to the web service via HTTP
#Setup client and communicate
my $sock = $latexmls && IO::Socket::INET->new(
  PeerAddr => $address,
  PeerPort => $port,
  Proto    => 'tcp',
);    #Attempt connecting to a service
if ((!$sock) && $local && ($expire == -1)) {
  # Don't boot a server, single job requested:
  require LaTeXML;
  $opts->set('local', 1);
  $opts->push('debug', 'LaTeXML') unless $opts->get('log');    # If we don't request log, print to STDERR
  my $converter = LaTeXML->get_converter($opts);
  $converter->prepare_session($opts);
  my $response = $converter->convert($source);
  ($result, $status, $log) = map { $$response{$_} } qw(result status log) if defined $response;
}
else {
  my $message = q{};
  foreach my $entry
    (@$keyvals) {
    my ($key, $value) = ($$entry[0], $$entry[1]);
    $message .= uri_escape($key) . ($value ? '=' . uri_escape($value) : '') . '&';
  }
  chop $message;
  #Startup daemon and feed in args, if needed
  system($latexmls, "--port=$port", "--expire=$expire", "--autoflush=" . $opts->get('input_limit')) if !$sock && $local;
  my $http_response = ($local ? process_local($address, $port, $route, $message, $sock)
    : process_remote($address, $port, $route, $message, $sock));
  if ($http_response->is_success) {
    my $response = decode_json($http_response->content);
    ($result, $status, $log) = map { $$response{$_} } qw(result status log) if defined $response;
  }
  else {
    print STDERR "Fatal:HTTP:" . $http_response->code() . " " . $http_response->message() . "\n";
    exit 1;
  }
}

#***************************************************************************
### Common treatment of output:

# Special features for latexmls:
if ($log) {
  if ($opts->get('log')) {
    my $clog = $opts->get('log');
    my $log_handle;
    if (!open($log_handle, ">", $clog)) {
      print STDERR "Fatal:IO:forbidden Couldn't open log file $clog : $!\n";
      exit 1;
    }
    print $log_handle $log;
    close $log_handle;
  }
  else { print STDERR $log, "\n"; }    #STDERR log otherwise
}

if ($result) {
  if ($opts->get('destination')) {
    my $output_handle;
    if (!open($output_handle, ">", $opts->get('destination'))) {
      print STDERR "Fatal:IO:forbidden Couldn't open output file " . $opts->get('destination') . ": $!";
      exit 1;
    }
    print $output_handle $result;
    close $output_handle;
  }
  else { print STDOUT $result, "\n"; }    #Output to STDOUT
}

# Print summary, if requested, to STDERR
if ($opts->get('destination')) {
  print STDERR $status;
  print STDERR summary($opts->get('destination'), $deststat);
}

# == Helpers ==
sub summary {
  my ($destination, $prior_stat) = @_;
  my $new_stat = (stat($destination))[9] || 0;
  return ($new_stat && ($prior_stat != $new_stat)) ? "\nWrote $destination\n" : "\nError!
Did not write file $destination\n";
}

sub process_local {
  my ($req_address, $req_port, $req_route, $req_message, $req_sock) = @_;
  #daemon is running, reconnect and feed in request
  $req_sock = IO::Socket::INET->new(
    PeerAddr => $req_address,
    PeerPort => $req_port,
    Proto    => 'tcp',
  ) unless $req_sock;
  if (!$req_sock) {
    print STDERR "Fatal:perl:socket-create Could not create socket: $!\n";
    exit 1;
  }
  my $req_message_length = length($req_message);
  $req_route = "$req_address:$req_port" unless $req_route;
  my $payload = <<"PAYLOADEND";
POST $route HTTP/1.0
Host: $address:$req_port
User-Agent: latexmlc
Content-Type: application/x-www-form-urlencoded
Content-Length: $req_message_length

$req_message
PAYLOADEND
  $req_sock->send($payload);
  my $response_string = q{};
  {
    local $/ = undef;
    $response_string = <$req_sock>;
  }
  close($req_sock);
  return ($response_string ?
      HTTP::Response->parse($response_string) :
      HTTP::Response->new(500, 'Internal Server Error'));
}

sub process_remote {
  my ($req_address, $req_port, $req_route, $req_message, $req_sock) = @_;
  $req_sock->close if $req_sock;    # No need of the socket here, using LWP instead
  my $payload = HTTP::Request->new(POST => "http://$req_address:$req_port$req_route");
  $payload->header('User-Agent',   'latexmlc');
  $payload->header('Content-Type', 'application/x-www-form-urlencoded');
  $payload->content($req_message);
  require LWP::UserAgent;
  my $ua = LWP::UserAgent->new;
  $ua->timeout(10);
  return $ua->request($payload);
}

#**********************************************************************
__END__

=head1 NAME

C - An omni-executable for LaTeXML, capable of stand-alone, socket-server and (soon) web-service conversion.

=head1 SYNOPSIS

See the OPTIONS section in L for usage information. Also consult latexmlc --help

=head1 DESCRIPTION

L provides a client which automatically sets up a LaTeXML local server
if necessary (via L).
If such a server already exists, the client proceeds to communicate normally.
A stand-alone conversion (the default) can also be requested via --timeout=-1

=head1 SEE ALSO

L, L, L

=cut

latexml-0.8.1/bin/latexmlfind

#!/usr/bin/perl -w
# /=====================================================================\ #
# | latexmlfind                                                         | #
# | xml search utility program                                          | #
# |=====================================================================| #
# | Part of LaTeXML:                                                    | #
# | Public domain software, produced as part of work done by the        | #
# | United States Government & not subject to copyright in the US.      | #
# |---------------------------------------------------------------------| #
# | Bruce Miller                                                #_#     | #
# | http://dlmf.nist.gov/LaTeXML/                              (o o)    | #
# \=========================================================ooo==U==ooo=/ #
use strict;
use warnings;
use FindBin;
use lib "$FindBin::RealBin/../lib";
use Getopt::Long qw(:config no_ignore_case);
use Pod::Usage;
use LaTeXML::Common::XML;
use Text::Wrap;
use LaTeXML;    # Currently, just for version information.
#**********************************************************************
# Parse command line
my ($verbosity) = (0);
my ($help, $showversion, $SKELETON) = (0, 0, 0);
my (@symbols, @unknowns, @posfuncs, @labels, @refnums);
GetOptions("symbol=s" => \@symbols,,
  "unknown=s"          => \@unknowns,
  "possiblefunction=s" => \@posfuncs,
  "label=s"            => \@labels,
  "refnum=s"           => \@refnums,
  "skeleton"           => \$SKELETON,
  "quiet"              => sub { $verbosity--; },
  "verbose"            => sub { $verbosity++; },
  "VERSION"            => \$showversion,
  "help"               => \$help,
) or pod2usage(-message => $LaTeXML::IDENTITY, -exitval => 1, -verbose => 0, -output => \*STDERR);
pod2usage(-message => $LaTeXML::IDENTITY, -exitval => 1, -verbose => 2, -output => \*STDOUT) if $help;
if ($showversion) { print STDERR "$LaTeXML::IDENTITY\n"; exit(1); }
pod2usage(-message => "$LaTeXML::IDENTITY\nMissing input TeX file",
  -exitval => 1, -verbose => 0, -output => \*STDERR) unless @ARGV;
my $source = $ARGV[0];

#**********************************************************************
# Do the processing.
print STDERR "$LaTeXML::IDENTITY\n" if $verbosity > -1;
binmode(STDOUT, ":encoding(UTF-8)");    # Make sure output can handle UTF8

my $DOC   = LaTeXML::Common::XML::Parser->new()->parseFile($source);
my $XPATH = LaTeXML::Common::XML::XPath->new(ltx => "http://dlmf.nist.gov/LaTeXML");
$XPATH->registerFunction('match-font', \&LaTeXML::Common::Font::match_font);

# Objects being labelled sections, equations, etc.
our %OBJECTS = ();
our $ROOT_OBJECT = { type => 'Document', subobjects => [], items => [] };

foreach my $symbol (@symbols) {
  collect_matches("Symbols \"$symbol\"",
    "//ltx:Math[descendant::ltx:XMTok[\@name='$symbol' or text()='$symbol']"
      # BUT which isn't in presentation branch of an XMDual!!
      # ie. no ancestor w/preceding sibling has parent = XMDual
      . "[not(ancestor-or-self::*[preceding-sibling::*][parent::ltx:XMDual])]"
      . "]");
}
foreach my $spec (@unknowns) {
  my $symbol = $spec;
  my $font;
  $font = $1 if $symbol =~ s/\{([\w\s]*)\}$//;
  collect_matches("Unknown \"$spec\"",
    # Find Math containing an XMTok, with role=UNKNOWN
    "//ltx:Math[descendant::ltx:XMTok[\@role='UNKNOWN']"
      # whose name or content is the requested symbol
      . "[\@name='$symbol' or text()='$symbol']"
      . ($font ? "[\@font='$font']" : '')
      # BUT which isn't in presentation branch of an XMDual!!
      # ie. no ancestor w/preceding sibling has parent = XMDual
      . "[not(ancestor-or-self::*[preceding-sibling::*][parent::ltx:XMDual])]"
      . "]");
}
foreach my $symbol (@posfuncs) {
  collect_matches("Possible function \"$symbol\"",
    "//ltx:Math[descendant::ltx:XMTok[\@possibleFunction='yes']"
      . "[\@name='$symbol' or text()='$symbol']"
      . "]");
}

show_matches($ROOT_OBJECT);

#**********************************************************************
# This matches fonts when both are converted to strings (toString),
# such as when they are set as attributes.
sub match_font {
  my ($font1, $font2) = @_;
  #print STDERR "Match font \"".($font1 || 'none')."\" to \"".($font2||'none')."\"\n";
  return 0 unless $font1 && $font2;
  my @comp1;
  my @comp2;
  if ($font1 =~ /^Font\[(.*)\]$/) { @comp1 = split(',', $1); }
  if ($font2 =~ /^Font\[(.*)\]$/) { @comp2 = split(',', $1); }
  while (@comp1) {
    my $c1 = shift @comp1;
    my $c2 = shift @comp2;
    return 0 if ($c1 ne '*') && ($c2 ne '*') && ($c1 ne $c2);
  }
  return 1;
}

#**********************************************************************
sub collect_matches {
  my ($description, $xpath) = @_;
  my @nodes = $XPATH->findnodes($xpath, $DOC);
  print "Query $description appears in " . scalar(@nodes) . " places\n";
  print " [XPath = \"$xpath\"]\n" if $verbosity > 0;
  foreach my $node (@nodes) {
    my $object = id_object($node);
    push(@{ $$object{items} }, $node);
  }
  return;
}

sub id_object {
  my ($node) = @_;
  my $id;
  while (1) {
    $node = $node->parentNode;
    return $ROOT_OBJECT if $node->nodeType != XML_ELEMENT_NODE;
    last if $id = $node->getAttribute('xml:id');
  }
  if (my $object = $OBJECTS{$id}) {
    return $object;
  }
  else {
    my $parent_object = id_object($node);
    my $type          = $node->localname;
    my $labels        = $node->getAttribute('labels');
    my $refnum        = $node->getAttribute('refnum');
    my ($title) = $XPATH->findnodes("child::ltx:toctitle | child::ltx:title", $node);
    my $desc = ($refnum
      ? ($title ? "$refnum. " . $title->textContent : $refnum)
      : ($title ? $title->textContent : ''));
    $desc =~ s/\s+/ /g;
    $OBJECTS{$id} = $object = { id => $id, labels => $labels, type => $type, description => $desc,
      children => [], items => [] };
    push(@{ $$parent_object{children} }, $object);
    return $object;
  }
}

sub show_matches {
  my ($object, $indent) = @_;
  $indent = '' unless defined $indent;
  print $indent. "$$object{type}:"
    . ($$object{id}          ? " ID=$$object{id}"             : '')
    . ($$object{labels}      ? " Labels=$$object{labels}"     : '')
    . ($$object{description} ? " \"$$object{description}\"" : "")
    . "\n";
  if (!$SKELETON) {
    foreach my $item (@{ $$object{items} }) {
      show_node($item, $indent . ' | ');
    }
  }
  foreach my $child (@{ $$object{children} }) {
    show_matches($child, $indent . ' | ');
  }
  return;
}

sub show_node {
  my ($node, $indent) = @_;
  if ($node->localname eq 'Math') {
    my $ptex = $node->getAttribute('tex');
    my $ctex = $node->getAttribute('content-tex');
    if ($verbosity > 1) {
      print $indent. $node->toString . "\n";
    }
    else {
      print wrap($indent, $indent, $ctex || $ptex) . "\n";
    }
  }
  else {
    print $indent. $node->toString . "\n";
  }
  return;
}

#**********************************************************************
__END__

=head1 NAME

C finds interesting things in LaTeXML generated XML.

=head1 SYNOPSIS

latexmlfind [options] xmlfile

 Options:
   --symbol=symbol            finds equations where the symbol appears.
   --unknown=symbol           finds equations where the unknown symbol
                              appears (ie role=UNKNOWN).
   --possiblefunction=symbol  finds equations where symbol is possibly
                              used as a function.
   --label=symbol             finds objects with the given label.
   --refnum=symbol            finds objects with the given refnum
                              (reference number).
   --quiet                    suppress messages (can repeat)
   --verbose                  more informative output (can repeat)
   --VERSION                  show version number.
   --help                     shows help message.

=head1 OPTIONS AND ARGUMENTS

latexmlfind is useful for finding objects within an XML file generated by LaTeXML.

=over 4

=item B<--output=>I

Specifies the output file; by default the XML is written to stdout.

=item B<--unknown=>I

Finds equations where the unknown symbol appears.

=item B<--possiblefunction=>I

Finds equations where symbol is possibly used as a function.

=item B<--label=>I

[Figure (SVG): LaTeXML processing pipeline. Programs: latexml, latexmlpost, latex. Stages: Mouth, Gullet, Stomach, Intestines, Rewriter, Serializer, Postprocessor, Output. Data: Characters, Tokens, ExpandedTokens, Boxes, DOM. Documents: Document, DVIDocument, XMLDocument, OtherXMLDocument, HTML+ImagesDocument, XHTML+MathMLDocument.]

latexml-0.8.1/doc/manual/genpods

#!/usr/bin/perl -w
use strict;
use warnings;
use FindBin;
use lib "$FindBin::RealBin/../blib/lib";
use Getopt::Long qw(:config no_ignore_case);
use Pod::Usage;
use Pod::LaTeX;
#%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
# Prepare LaTeX from various executable's and module's PODs
# These go into appendices of the manual
#%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
my $WORKDIR = $FindBin::RealBin;
my $SRCDIR  = $WORKDIR . "/../..";
my $GENDIR  = "$WORKDIR/pods";

my $identity = "genpods (part of LaTeXML)";
my ($force, $help) = (0, 0);
GetOptions("force!"
    => \$force,
  "help" => \$help,
) or pod2usage(-message => $identity, -exitval => 1, -verbose => 0, -output => \*STDERR);
pod2usage(-message => $identity, -exitval => 1, -verbose => 2, -output => \*STDOUT) if $help;

#======================================================================
my @exes = (qw(latexml latexmlpost latexmlmath));
# would be nice to automatically discover documentable modules
# Actually, we Can, by looking in ../blib/man1 and ../blib/man3
# However, we'd still need to manually add them (in desired order) to the appropriate Appendix!
my @modules = (
  # Core modules
  qw(LaTeXML LaTeXML::Global
    LaTeXML::Common::Config
    LaTeXML::Common::Object
    LaTeXML::Common::Color LaTeXML::Common::Color::rgb LaTeXML::Common::Color::hsb
    LaTeXML::Common::Color::cmy LaTeXML::Common::Color::cmyk LaTeXML::Common::Color::gray
    LaTeXML::Common::Color::Derived
    LaTeXML::Common::Number LaTeXML::Common::Float LaTeXML::Common::Dimension
    LaTeXML::Common::Glue
    LaTeXML::Common::Font
    LaTeXML::Common::Model LaTeXML::Common::Model::DTD LaTeXML::Common::Model::RelaxNG
    LaTeXML::Common::XML LaTeXML::Common::Error
    LaTeXML::Package
    LaTeXML::Core::State LaTeXML::Core::Mouth LaTeXML::Core::Gullet LaTeXML::Core::Stomach
    LaTeXML::Core::Document LaTeXML::Core::Rewrite
    LaTeXML::Core::Token LaTeXML::Core::Tokens
    LaTeXML::Core::Box LaTeXML::Core::List LaTeXML::Core::Comment LaTeXML::Core::Whatsit
    LaTeXML::Core::Alignment LaTeXML::Core::KeyVals
    LaTeXML::Core::MuDimension LaTeXML::Core::MuGlue
    LaTeXML::Core::Pair LaTeXML::Core::PairList
    LaTeXML::Core::Rewrite
    LaTeXML::Core::Definition LaTeXML::Core::Definition::Expandable
    LaTeXML::Core::Definition::Conditional LaTeXML::Core::Definition::Primitive
    LaTeXML::Core::Definition::Register LaTeXML::Core::Definition::CharDef
    LaTeXML::Core::Definition::Constructor
    LaTeXML::Core::Parameter LaTeXML::Core::Parameters
    LaTeXML::MathParser LaTeXML::Pre::BibTeX
    ),
  # Utility
  qw(LaTeXML::Util::Pathname LaTeXML::Util::WWW LaTeXML::Util::ObjectDB LaTeXML::Util::Pack
    ),
  #
  # Postprocessing
  qw(LaTeXML::Post LaTeXML::Post::MathML LaTeXML::Post::OpenMath
    ),
);

if (!-d $GENDIR) {
  mkdir($GENDIR) or die "Couldn't create directory for pods: $!";
}

foreach my $name (@exes) {
  my $src  = "$SRCDIR/bin/$name";
  my $dest = "$GENDIR/$name.tex";
  if ($force || (!-f $dest) || (-M $src < -M $dest)) {
    print "Converting POD for $name to LaTeX\n";
    local $::PODDOC = $name;
    # $::PODDOC =~ s/::/_/g;
    my $podconverter = MyPodConverter->new();
    $podconverter->parse_from_file($src, $dest);
  }
}

foreach my $name (@modules) {
  my $src = $name;
  $src =~ s|::|/|g;
  $src = "$SRCDIR/lib/$src.pm";
  my $dest = $name;
  $dest =~ s|::|_|g;
  $dest = "$GENDIR/$dest.tex";
  if ($force || (!-f $dest) || (-M $src < -M $dest)) {
    print "Converting POD for $name to LaTeX\n";
    local $::PODDOC = $name;
    # $::PODDOC =~ s/::/_/g;
    my $podconverter = MyPodConverter->new();
    $podconverter->parse_from_file($src, $dest);
  }
}

#======================================================================
package MyPodConverter;
use base qw(Pod::LaTeX);

sub new {
  my ($class, @args) = @_;
  my $self = $class->SUPER::new(@args);
  $self->Head1Level(1);
  $self->LevelNoNum(2);
  # $self->LevelNoNum(1);
  $self->ReplaceNAMEwithSection(1);
  $self->AddPreamble(0);
  $self->AddPostamble(0);
  $self->select('!AUTHOR|COPYRIGHT');
  return $self;
}

our %titles;
our %ignore;

BEGIN {
  %titles = ("SYNOPSIS" => "Synopsis",
    "OPTIONS AND ARGUMENTS" => "Options \\& Arguments",
    "DESCRIPTION"           => "Description",
    "SEE ALSO"              => "See also",
    "METHODS"               => "Methods",
  );
}

# Redefined to beautify POD headings
sub head {
  my ($self, $level, $title, $parobj) = @_;
  my $newtitle = $titles{$title} || $title;
  return $self->SUPER::head($level, $newtitle, $parobj);
}

# Redefined to translate links to our PODs
sub interior_sequence {
  my ($self, $seq_command, $seq_argument, $pod_seq) = @_;
  if (($seq_command eq 'L') && ($seq_argument =~ /^(?:LaTeXML|latexml|\/)/)) {
    # A reference to somewhere within our own docs.
    my $text;
    my $label = $seq_argument;
    if ($seq_argument =~ /^(.*)\|(.*)$/) {    # Separate the text to use, if given
      $text  = $1;
      $label = $2;
    }
    if ($label =~ /^\//) {    # reference to section within THIS pod?
      $label = $::PODDOC . $label;
    }
    return ($text ? "\\pod[$text]{$label}" : "\\pod{$label}");
  }
  elsif ($seq_command eq 'X') {
    return "\\index{$seq_argument\@{\\ttfamily $seq_argument}}";
  }
  else {
    return $self->SUPER::interior_sequence($seq_command, $seq_argument, $pod_seq);
  }
}

# Redefine to better sort indices.
# ALL packages start with LaTeXML:: !!!
sub _create_index {
  my $string  = $_[0]->SUPER::_create_index($_[1]);
  my $reftype = "command";
  my ($name, @rest) = split('!', $string);
  if ($name =~ /^LaTeXML/) {
    $reftype = "module";
    my @comp = split('::', $name);
    $name = $comp[-1] . (@comp > 1 ? '(' . join('::', @comp[0 .. $#comp - 1]) . '::)' : '');
  }
  return join('!', "$name\@{\\ttfamily $name}", $reftype, @rest);
}

# Redefined to avoid unnecessary math.
sub _replace_special_chars_late {
  my ($self, $paragraph) = @_;
  # First the forms followed by whitespace, then the bare characters
  $paragraph =~ s/<\s+/\\textless\\ /g;
  $paragraph =~ s/>\s+/\\textgreater\\ /g;
  $paragraph =~ s/\|\s+/\\textbar\\ /g;
  $paragraph =~ s/</\\textless /g;
  $paragraph =~ s/>/\\textgreater /g;
  $paragraph =~ s/\|/\\textbar /g;
  return $paragraph;
}

#======================================================================
__END__

=head1 NAME

C - convert LaTeXML POD documentation for manual.

=head1 SYNOPSIS

genpods [options]

 Options:
   --force   Force regeneration of LaTeX from POD documentation
             (default: only if needed)
   --help    Shows this help.

=cut

latexml-0.8.1/doc/manual/genschema

#!/usr/bin/perl -w
use strict;
use warnings;
use FindBin;
use Carp;
use Getopt::Long qw(:config no_ignore_case);
use Pod::Usage;
#%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
my $WORKDIR   = $FindBin::RealBin;
my $SRCDIR    = $WORKDIR . "/../..";
my $SCHEMADIR = "$SRCDIR/lib/LaTeXML/resources/RelaxNG";

my $identity = "genschema (part of LaTeXML)";
my ($force, $help) = (0, 0);
GetOptions("force!" => \$force,
  "help" => \$help,
) or pod2usage(-message => $identity, -exitval => 1, -verbose => 0, -output => \*STDERR);
pod2usage(-message => $identity, -exitval => 1, -verbose => 2, -output => \*STDOUT) if $help;

my ($SCHEMA, $SCHEMADOC) = @ARGV;
$SCHEMA    = "LaTeXML.rng"         unless $SCHEMA;
$SCHEMADOC = "$WORKDIR/schema.tex" unless $SCHEMADOC;
#======================================================================
# Prepare LaTeX describing the Document Schema
# This goes into an appendix of the manual
#======================================================================
#my $schemadoc = "$WORKDIR/schema.tex";
if ($force || (!-f $SCHEMADOC) || (-M $SCHEMADIR < -M $SCHEMADOC)) {
  print "Converting Schema in $SCHEMADIR to LaTeX\n";
  my $SCHEMAOUT;
  open($SCHEMAOUT, '>', $SCHEMADOC) or die "Couldn't open $SCHEMADOC for output:$!";
  print $SCHEMAOUT RelaxNGDocumenter::documentSchema($SCHEMA);
  close($SCHEMAOUT);
}

#======================================================================
package RelaxNGDocumenter;
# Leverage the extracted Schema structure to create documentation.
use strict;
use lib "$FindBin::RealBin/../../blib/lib";
use LaTeXML::Global;
use LaTeXML::Core;
use LaTeXML::Common::Model;
use LaTeXML::Common::Model::RelaxNG;

sub documentSchema {
  my ($name) = @_;
  my $latexml = LaTeXML::Core->new(searchpaths => ['.'], verbosity => 1);
  return $latexml->withState(sub {
      my $model = $STATE->getModel();
      $model->registerNamespace(ltx   => "http://dlmf.nist.gov/LaTeXML");
      $model->registerNamespace(ltx   => "http://dlmf.nist.gov/LaTeXML");
      $model->registerNamespace(svg   => "http://www.w3.org/2000/svg");
      $model->registerNamespace(xev   => "http://www.w3.org/2001/xml-events");
      $model->registerNamespace(xlink => "http://www.w3.org/1999/xlink");
      my $relaxng = $$model{schema} = LaTeXML::Common::Model::RelaxNG->new($model, $name);
      ## my @schema = $relaxng->scanExternal($name);
      ## @schema = map { $relaxng->simplify($_) } @schema;
      $relaxng->loadSchema;
      return $relaxng->documentModules;
  });
}

#======================================================================
__END__

=head1 NAME

C - convert LaTeXML's Schema definitions to LaTeX documentation.

=head1 SYNOPSIS

genschema [options]

 Options:
   --force   Force regeneration of LaTeX from Schema definition.
             (default: only if needed)
   --help    Shows this help.

=cut

latexml-0.8.1/doc/manual/manual.tex

\documentclass{book}
\tolerance 500%
\emergencystretch 1em\relax
\hfuzz .2pt\relax
\usepackage{latexml}
\usepackage{../sty/latexmlman}
\usepackage{times}
\usepackage{makeidx}
\usepackage{listings}
\usepackage{tocbibind}
\usepackage{wrapfig}
% Should the additional keywords be indexed?
\lstdefinestyle{shell}{language=bash,escapechar=@,basicstyle=\ttfamily\iflatexml\else\small\fi,%
   morekeywords={latexml,latexmlpost,latexmlmath},
   moredelim=[is][\itshape]{\%}{\%}}
\lstdefinestyle{latexml}{basicstyle=\ttfamily\iflatexml\else\small\fi,language=Perl,%
   morekeywords={CC_ESCAPE,CC_BEGIN,CC_END,CC_MATH,CC_ALIGN,CC_EOL,CC_PARAM,CC_SUPER,%
     CC_SUB,CC_IGNORE,CC_SPACE,CC_LETTER,CC_OTHER,CC_ACTIVE,CC_COMMENT,CC_INVALID,CC_CS,CC_NOTEXPANDED,%
     T_BEGIN,T_END,T_MATH,T_ALIGN,T_PARAM,T_SUB,T_SUPER,T_SPACE,T_LETTER,T_OTHER,T_ACTIVE,T_COMMENT,T_CS,T_CR,%
     Token,Tokens,Tokenize,TokenizeInternal,Explode,UnTeX,StartSemiverbatim,EndSemiverbatim,%
     Number,Float,Dimension,MuDimension,Glue,MuGlue,Pair,PairList,%
     NoteProgress,NoteBegin,NoteEnd,Fatal,Error,Warn,Info,%
     Stringify,ToString,Equals,%
     DefExpandable,DefMacro,DefMacroI,DefPrimitive,DefPrimitiveI,DefRegister,DefRegisterI,%
     DefConstructor,DefConstructorI,dualize_arglist,
     DefMath,DefMathI,DefEnvironment,DefEnvironmentI,convertLaTeXArgs,%
     RequirePackage,LoadClass,FindFile,DeclareOption,PassOptions,ProcessOptions,ExecuteOptions,AddToMacro,%
     NewCounter,CounterValue,StepCounter,RefStepCounter,RefStepID,ResetCounter,GenerateID,%
     Tag,DocType,RelaxNGSchema,RegisterNamespace,%
     DefRewrite,DefMathRewrite,DefLigature,DefMathLigature,%
     Expand,Invocation,Digest,RawTeX,Let,%
     ReadParameters,DefParameterType,DefColumnType,%
     LookupValue,AssignValue,PushValue,PopValue,UnshiftValue,ShiftValue,LookupCatcode,AssignCatcode,%
     LookupMeaning,LookupDefinition,InstallDefinition,%
     CleanLabel,CleanIndexKey,CleanBibKey,CleanURL,UTF,roman,Roman,%
     MergeFont,%
     CheckOptions},
}
\lstdefinestyle{xml}{basicstyle=\small\sffamily,language=xml,escapechar=@}%
\newcommand{\shellcode}{\lstinline[style=shell]}
\newcommand{\ltxcode}{\lstinline[style=latexml]}
\newcommand{\perlcode}{\lstinline[style=Perl]}
\newcommand{\xmlcode}{\lstinline[style=xml]}
\input{release.tex}
\makeindex
\title{\LaTeXML\ \emph{The Manual}}
\subtitle{A \LaTeX\ to \XML/\HTML/\MathML\
Converter;\\ Version \emph{\CurrentVersion}}
\author{Bruce R.~Miller}
\lxKeywords{LaTeXML, LaTeX to XML, LaTeX to HTML, LaTeX to MathML, LaTeX to ePub, converter}
%============================================================
\begin{lxNavbar}
\lxRef{top}{\includegraphics{../graphics/latexml}}\\
\includegraphics{../graphics/mascot}\\
\lxContextTOC
\end{lxNavbar}
%============================================================
\begin{document}\label{top}
\frontmatter
\iflatexml
\maketitle
\else
\begin{titlepage}
\parindent=0em
\makeatletter
\null\vfil
\hbox to \textwidth{%
  \hfill
  \hbox to 0.5in { \includegraphics{../graphics/mascot.png} }
% \kern -2in
% \hskip -1.5in
  \vbox{
    \null
    \centering
    {\LARGE \@title \par}%
    \vskip 1.0em%
    {\large \@subtitle \par}%
    \vskip 3em%
    {\large \@author \par}
    \vskip 1.5em%
    {\large \@date \par}%       % Set date in \large size.
    \vskip 3em%
    \null
  }
  \hfill
}
\par\vfil\null
\makeatother
\end{titlepage}
\fi
\tableofcontents
\listoffigures
\mainmatter
%%%======================================================================
% \part{Basics}
%%%======================================================================
\chapter{Introduction}\label{intro}
For many, \LaTeX\ is the preferred format for document authoring,
particularly for documents involving significant mathematical content
and where quality typesetting is desired.
On the other hand, content-oriented \XML\ is an extremely useful representation for documents,
allowing them to be used, and reused, for a variety of purposes,
not least, presentation on the Web.
Yet, the style and intent of \LaTeX\ markup, as compared to \XML\ markup,
not to mention its programmability,
presents difficulties in converting documents from the former format to the latter.
Perhaps ironically, these difficulties can be particularly large for mathematical material,
where there is a tendency for the markup to focus on appearance rather than meaning.
The choice of \LaTeX\ for authoring, and \XML\ for delivery were natural and uncontroversial choices
for the \URL[Digital Library of Mathematical Functions]{http://dlmf.nist.gov}.
Faced with the need to perform this conversion and the lack of suitable tools to perform it,
the DLMF project proceeded to develop their own tool, \LaTeXML, for this purpose.
%This document describes a \emph{preview} release of \LaTeXML.

\paragraph{Design Goals}
The idealistic goals of \LaTeXML\ are:
\begin{itemize}
\item Faithful emulation of \TeX's behaviour;
\item Easily extensible;
\item Lossless, preserving both semantic and presentation cues;
\item Use an abstract \LaTeX-like, extensible, document type;
\item Infer the semantics of mathematical content\\
  (\emph{Good} Presentation \MathML, eventually Content \MathML\ and \OpenMath).
\end{itemize}
As these goals are not entirely practical, even somewhat contradictory,
they are implicitly modified by \emph{as much as possible}.
Completely mimicking \TeX's, and \LaTeX's, behaviour would seem to require
the sneakiest modifications to \TeX, itself;
redefining \LaTeX's internals does not really guarantee compatibility.
``Ease of use'' is, of course, in the eye of the beholder;
this manual is an attempt to make it easier!
More significantly, few documents are likely to have completely unambiguous mathematics markup;
human understanding of both the topic and the surrounding text
is needed to properly interpret any particular fragment.
Thus, while we'll try to provide a ``turn-key'' solution that does the `Right Thing' automatically,
we expect that applications requiring high semantic content
will require document-specific declarations and tuning to achieve the desired result.
Towards this end, we provide a variety of means to customize the processing
and declare the author's intent.
At the same time, especially for new documents, we encourage a more logical,
content-oriented markup style, over a purely presentation-oriented style.
\paragraph[Overview]{Overview of this Manual}
Chapter \ref{usage} describes the usage of \LaTeXML, along with common use cases and techniques.
Chapter \ref{architecture} describes the system architecture in some detail.
Strategies for customization and implementation of new packages are described
in Chapter \ref{customization}.
The special considerations for mathematics, including details of representation
and how to improve the conversion, are covered in Chapter \ref{math}.
Several specialized topics are covered in the remaining chapters.
An overview of outstanding issues and planned future improvements is given in Chapter \ref{todo}.

Finally, the Appendices give detailed documentation of the system components:
Appendix \ref{commands} describes the command-line programs provided by the system;
Appendix \ref{included.bindings} lists the \LaTeX\ style packages
for which we've provided \LaTeXML-specific bindings.
Appendices \ref{modules}, \ref{commonmodules}, \ref{coremodules}, \ref{utilitymodules},
\ref{premodules} and \ref{postmodules}
describe the various Perl modules, in groups, that comprise the system.
Appendix \ref{schema} describes the \XML\ schema used by \LaTeXML.
Appendix \ref{errorcodes} gives an overview of the warning and error messages
that \LaTeXML\ may generate.
Appendix \ref{cssclasses} describes the strategy and naming conventions
used for CSS styling of the resulting \HTML.

Using \LaTeXML, and programming for it, can be somewhat confusing as one is dealing with
several languages not normally combined, often within the same file:
Perl, \TeX\ and \XML\ (along with \XSLT, \HTML, \CSS), plus the occasional shell programming.
To help visually distinguish different contexts in this manual we will put
`programming' oriented material (Perl, \TeX) in a typewriter font, \texttt{like this};
\XML\ material will be put in a sans-serif face \textsf{like this}.
\vskip 1cm\relax
If you encounter difficulties, there is a support mailing list at
\URL[\texttt{latexml-project}]{http://lists.jacobs-university.de/mailman/listinfo/project-latexml}.
Bugs and enhancement requests can be reported at
\URL[GitHub]{https://github.com/brucemiller/LaTeXML}.
If all else fails, please consult the source code, or the author.

\begin{advanced}
Danger! When you see this sign, be warned that the material presented
is somewhat advanced and may not make much sense until you
have dabbled quite a bit in \LaTeXML's internals.
Such advanced or `dangerous' material will be presented like this
paragraph to make it easier to skip over.
\end{advanced}

%%%======================================================================
\chapter{Using \LaTeXML}\label{usage}
The main commands provided by the \LaTeXML\ system are
\begin{description}
\item[\ltxcmd{latexml}] for converting \TeX\ and \BibTeX\ sources to \XML.
\item[\ltxcmd{latexmlpost}] for various postprocessing tasks including
   conversion to \HTML, processing images, conversion to \MathML\ and so on.
\end{description}
\noindent The usage of these commands can be as simple as
\begin{lstlisting}[style=shell]
latexml doc.tex | latexmlpost --dest=doc.html
\end{lstlisting}
\noindent to convert a single document into an \textsc{HTML5} document, or as complicated as
\begin{lstlisting}[style=shell]
latexml --dest=1.xml ch1
latexml --dest=2.xml ch2
@ \hbox{}\hspace{1in}$\vdots$ @
latexml --dest=b.xml b
latexml --dest=B.bib.xml B.bib
latexmlpost --prescan --db=my.db --dest=1.html 1
latexmlpost --prescan --db=my.db --dest=2.html 2
@ \hbox{}\hspace{1in}$\vdots$ @
latexmlpost --prescan --db=my.db --dest=b.html b
latexmlpost --noscan --db=my.db --dest=1.html 1
latexmlpost --noscan --db=my.db --dest=2.html 2
@ \hbox{}\hspace{1in}$\vdots$ @
latexmlpost --noscan --db=my.db --dest=b.html b
\end{lstlisting}
\noindent to convert a whole set of documents, including a bibliography,
into a complete interconnected site.
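
The longer sequence follows a fixed pattern: every document is prescanned
into the shared database before any document is cross-referenced against it.
As a sketch only, that pattern can be generated with a shell loop.
Here the commands are printed rather than run, so that they can be inspected first;
the function name \texttt{site\_commands}, the document names \texttt{1}, \texttt{2}, \texttt{b}
and the database name \texttt{my.db} are placeholders, not part of \LaTeXML.
\begin{lstlisting}[style=shell]
# Print (not run) the two-pass latexmlpost commands for a set of documents.
# The names 1, 2, b and my.db are placeholders for your own files.
site_commands () {
  for pass in --prescan --noscan; do     # all prescans come first
    for doc in 1 2 b; do
      echo latexmlpost $pass --db=my.db --dest=$doc.html $doc
    done
  done
}
site_commands    # inspect the list; pipe it to sh to execute it
\end{lstlisting}
\noindent Printing the commands keeps the ordering visible:
the three \shellcode{--prescan} lines appear before the three \shellcode{--noscan} lines.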
How best to use the commands depends, of course, on what you are trying to achieve.
In the next section, we'll describe the use of \ltxcmd{latexml},
which performs the conversion to \XML.
The following sections consider a sequence of successively more complicated
postprocessing situations, using \ltxcmd{latexmlpost},
by which one or more \TeX\ sources can be converted into
one or more web documents or a complete site.

Additionally, there is a convenience command \ltxcmd{latexmlmath}
for converting individual formulae into various formats.

%%%----------------------------------------------------------------------
\section[Conversion]{Basic \XML\ Conversion}\label{usage.conversion}\index{latexml!basic usage}
The command
\begin{lstlisting}[style=shell]
latexml {options} --destination=%doc%.xml %doc%
\end{lstlisting}
converts the \TeX\ document \texttt{doc.tex},
or standard input if \texttt{-} is used in place of the filename, to \XML.
It loads any required definition bindings (see below),
reads, tokenizes, expands and digests the document creating an \XML\ structure.
It then performs some document rewriting, parses the mathematical content
and writes the result, in this case, to \texttt{doc.xml};
if no \shellcode|--destination| is supplied, it writes the result to standard output.
For details on the processing, see Chapter \ref{architecture},
and Chapter \ref{math} for more information about math parsing.

\paragraph{\BibTeX\ processing}\label{usage.conversion.bibtex}
If the source file has an explicit extension of \texttt{.bib},
or if the \cmd{--bibtex} option is used,
the source will be treated as a \BibTeX\ database.
See \ref{usage.post.bibtex} for how \BibTeX\ files are included in the final output.

\begin{advanced}
Note that the timing is different than with \BibTeX\ and \LaTeX.
Normally, \BibTeX\ simply selects and formats a subset of the bibliographic entries according to the \texttt{.aux} file; all \TeX\ expansion and processing is carried out only when the result is included in the main \LaTeX\ document. In contrast, \cmd{latexml} processes and expands the entire bibliography, including any \TeX\ markup within it, when it is converted to XML; the selection of entries is done during postprocessing. One implication is that latexml does not know about packages included in the main document; if the bibliography uses macros defined in such packages, the packages must be explicitly specified using the \cmd{--preload} option. \end{advanced} \paragraph{Useful Options}\label{usage.conversion.options} The number and detail of progress and debugging messages printed during processing can be controlled using \begin{lstlisting}[style=shell] --verbose %or% --quiet \end{lstlisting} They can be repeated to get even more or fewer details. Directories to search (in addition to the working directory) for various files can be specified using \begin{lstlisting}[style=shell] --path={directory} \end{lstlisting} This option can be repeated. Whenever multiple sources are being used (including multiple bibliographies), the option \begin{lstlisting}[style=shell] --documentid=%id% \end{lstlisting} should be used to provide a unique ID for the document root element. This ID is used as the base for id's of the child-elements within the document, so that they are unique, as well. See the documentation for the command \ltxcmd{latexml} for less common options. \paragraph{Loading Bindings}\label{usage.conversion.loading} Although \LaTeXML\ is reasonably adept at processing \TeX\ macros, it generally benefits from having its own implementation of the macros, primitives, environments and other control sequences appearing in a document because these are what define the mapping into \XML. 
The \LaTeXML-analogue of a style or class file we call a \LaTeXML-binding file, or \emph{binding} for short; these files have an additional extension \texttt{.ltxml}. In fact, since style files often bypass structurally or semantically meaningful macros by directly invoking macros internal to \LaTeX, \LaTeXML\ actually avoids processing style files when a binding is unavailable. The option \begin{lstlisting}[style=shell] --includestyles \end{lstlisting} can be used to override this behaviour and allow \LaTeXML\ to (attempt to) process raw style files. [A more selective, per-file, option may be developed in the future, if there is sufficient demand --- please provide use cases.] \LaTeXML\ always starts with the \texttt{TeX.pool} binding loaded, and if \LaTeX-specific commands are recognized, \texttt{LaTeX.pool} as well. Any input directives within the source loads the appropriate binding. For example, \verb|\documentclass{article}| or \verb|\usepackage{graphicx}| will load the bindings \texttt{article.cls.ltxml} or \texttt{graphicx.sty.ltxml}, respectively; the obsolete directive \verb|\documentstyle| is also recognized. An \verb|\input| directive will search for files with both \texttt{.tex} and \texttt{.sty} extensions; it will prefer a binding file if one is found, but will load and digest a \texttt{.tex} if no binding is found. An \verb|\include| directive (and related ones) search only for a \texttt{.tex} file, which is processed and digested as usual. There are two mechanisms for customization: a document-specific binding file \shellcode|%doc%.latexml| will be loaded, if present; the option \begin{lstlisting}[style=shell] --preload=%binding% \end{lstlisting} will load the binding file \shellcode|%binding%.ltxml|. The \shellcode|--preload| option can be repeated; both kinds of preload are loaded before document processing, and are processed in order. 
See Chapter \ref{customization} for details about what can go in these bindings;
and Appendix \ref{included.bindings} for a list of bindings currently included in the distribution.

%%%----------------------------------------------------------------------
\section[Postprocessing]{Basic Postprocessing}\label{usage.post}\index{latexmlpost!basic usage}
In the simplest situation, you have a single \TeX\ source document from which
you want to generate a single output document.
The command
\begin{lstlisting}[style=shell]
latexmlpost %options% --destination=%doc%.html %doc%
\end{lstlisting}
or similarly with \shellcode|--destination=%doc%.html4|, \shellcode|--destination=%doc%.xhtml|,
will carry out a set of appropriate transformations in sequence:
\begin{itemize}
\item scanning of labels and ids;
\item filling in the index and bibliography (if needed);
\item cross-referencing;
\item conversion of math;
\item conversion of graphics and picture environments to web format (png);
\item applying an \XSLT\ stylesheet.
\end{itemize}

The output format affects the defaults for each step,
and particularly, the \XSLT\ stylesheet that is used,
and is determined by the file extension of \shellcode{--destination},
or by the option
\begin{lstlisting}[style=shell]
--format=(html|html5|html4|xhtml|xml)
\end{lstlisting}
which overrides the extension used in the destination.
The recognized formats are:
\begin{description}
\item[html or html5] math is converted to Presentation \MathML,
  some `vector' style graphics are converted to SVG,
  other graphics are converted to images;
  \code{LaTeXML-html5.xslt} is used.
  The file extension \texttt{html} generates html5.
\item[html4] both math and graphics are converted to png images;
  \code{LaTeXML-html4.xslt} is used.
\item[xhtml] math is converted to Presentation \MathML,
  other graphics are converted to images;
  \code{LaTeXML-xhtml.xslt} is used.
\item[xml] no math, graphics or \XSLT\ conversion is carried out.
\end{description}
Of course, all of these conversions can be controlled or overridden
by explicit options described below.
For more details about less common options, see the command documentation \ltxcmd{latexmlpost},
as well as Appendix \ref{postmodules}.

\paragraph{Scanning}\label{usage.post.scanning}
The scanning step collects information about all labels, ids, indexing commands,
cross-references and so on, to be used in the following postprocessing stages.

\paragraph{Indexing}\label{usage.post.indexing}
An index is built from \verb|\index| markup,
if \shellcode{makeidx}'s \verb|\printindex| command has been used,
but this can be disabled by
\begin{lstlisting}[style=shell]
--noindex
\end{lstlisting}
The index entries can be permuted with the option
\begin{lstlisting}[style=shell]
--permutedindex
\end{lstlisting}
Thus \verb|\index{term a!term b}| also shows up as \verb|\index{term b!term a}|.
This leads to a more complete, but possibly rather silly, index,
depending on how the terms have been written.

\paragraph{Bibliography}\label{usage.post.bibtex}
When a document contains a request for bibliographies,
typically due to the \verb|\bibliography{..}| command,
the postprocessor will look for the named bibliographies.
It first looks for preconverted bibliographies with the extension \verb|.bib.xml|,
otherwise it will look for \verb|.bib| and convert it internally
(the latter is a somewhat experimental feature).

If you want to override that search, for example using a bibliography with a different name,
you can supply that filename using the option
\begin{lstlisting}[style=shell]
--bibliography=%bibfile%.bib.xml
\end{lstlisting}
Note that the internal bibliography list will then be ignored.
The bibliography would have typically been produced by running
\begin{lstlisting}[style=shell]
latexml --dest=bibfile.bib.xml bibfile.bib
\end{lstlisting}
Note that the \XML\ file, bibfile, is not used to directly produce
an \HTML-formatted bibliography; rather, it is used to fill in
the \verb|\bibliography{..}| within a \TeX\ document.

\paragraph{Cross-Referencing}\label{usage.post.crossref}
In this stage, the scanned information is used to fill in the text and links
of cross-references within the document.
The option
\begin{lstlisting}[style=shell]
--urlstyle=(server|negotiated|file)
\end{lstlisting}
can control the format of urls within the document.
\begin{description}
\item[server] formats urls appropriate for use from a web server.
  In particular, trailing \code{index.html} are omitted. (default)
\item[negotiated] formats urls appropriate for use by a server that
  implements content negotiation. File extensions for
  \code{html} and \code{xhtml} are omitted. This enables you to
  set up a server that serves the appropriate format depending on the browser being used.
\item[file] formats urls explicitly, with full filename and extension.
  This allows the files to be browsed from the local filesystem.
\end{description}

\paragraph{Math Conversion}\label{usage.post.math}
Specific conversions of the mathematics can be requested using the options
\begin{lstlisting}[style=shell]
--mathimages                      # converts math to png images,
--presentationmathml %or% --pmml  # creates Presentation @\MathML@
--contentmathml %or% --cmml       # creates Content @\MathML@
--openmath %or% --om              # creates @\OpenMath@
--keepXMath                       # preserves @\LaTeXML@'s XMath
\end{lstlisting}
(Each of these options can also be negated if needed, eg.~\shellcode{--nomathimages})
It must be pointed out that the Content \MathML\ and \OpenMath\ conversions
are currently rather experimental.
If more than one of these conversions are requested, parallel math markup will be generated with the first format being the primary one, and the additional ones added as secondary formats. The secondary format is incorporated using whatever means the primary format uses; eg. \MathML\ combines formats using \texttt{m:semantics} and \texttt{m:annotation-xml}. Given the state of current browsers, when generating MathML it may be useful to conditionally include the `polyfill' library \URL[MathJax]{http://mathjax.org/}, for rendering MathML in browsers that don't support it natively. The following option includes a short Javascript that will load MathJax into such browsers: \begin{lstlisting}[style=shell] --javascript=LaTeXML-maybeMathJax.js \end{lstlisting} (See \ref{usage.post.xslt} for more information about including Javascript.) \paragraph[Graphics]{Graphics processing}\label{usage.post.graphics} Conversion of graphics (eg.~from the \code{graphic(s|x)} packages' \verb|\includegraphics|) can be enabled or disabled using \begin{lstlisting}[style=shell] --graphicsimages %or% --nographicsimages \end{lstlisting} Similarly, the conversion of \code{picture} environments can be controlled with \begin{lstlisting}[style=shell] --pictureimages %or% --nopictureimages \end{lstlisting} An experimental capability for converting the latter to \textsc{SVG} can be controlled by \begin{lstlisting}[style=shell] --svg %or% --nosvg \end{lstlisting} \paragraph{Stylesheets and Javascript}\label{usage.post.xslt} If you wish to provide your own \XSLT\, \CSS\ stylesheets or javascript programs, the options \begin{lstlisting}[style=shell] --stylesheet=%stylesheet%.xsl --css=%stylesheet%.css --nodefaultcss --javascript=%program%.js \end{lstlisting} can be used. The \shellcode{--css} and \shellcode{--javascript} options provide \CSS\ stylesheets and javascript programs respectively; they can be repeated to include multiple files. 
In both cases, if a local file is referenced, it will be copied to the destination directory,
but otherwise urls are accepted.

The core \CSS\ stylesheet, \code{LaTeXML.css}, helps match the basic styling
of \LaTeX\ to \HTML; certain bindings, such as \texttt{amsart}, automatically
include additional stylesheets to better match the desired style.
You can also request the inclusion of your own stylesheets
from the commandline using the \shellcode{--css} option.
Some sample \CSS\ enhancements are included with the distribution:
\begin{description}
\item[\code{LaTeXML-navbar-left.css}] Places a navigation bar on the left.
\item[\code{LaTeXML-navbar-right.css}] Places a navigation bar on the right.
\item[\code{LaTeXML-blue.css}] Colors various features in a soft blue.
\end{description}
In cases where you wish to completely manage the \CSS,
the option \shellcode{--nodefaultcss} causes only explicitly requested css files to be included.

Javascript files are included in the generated \HTML\ by using
the \shellcode{--javascript} option.
The distribution includes a sample \code{LaTeXML-maybeMathJax.js}
which is useful for supporting MathML: it invokes MathJax\footnote{http://mathjax.org}
to render the mathematics in browsers without native support for MathML.
Alternatively, you can invoke MathJax unconditionally, from the `cloud' by using:
\begin{lstlisting}[style=shell]
latexmlpost --format=html5 \
   --javascript='http://cdn.mathjax.org/mathjax/latest/MathJax.js' \
   --destination=%somewhere/doc%.html %doc%
\end{lstlisting}

See \ref{customization.latexmlpost.css} for more information on developing your own stylesheets.
To develop \CSS\ and \XSLT\ stylesheets, a knowledge of the \LaTeXML\ document type
is also necessary; see Appendix \ref{schema}.
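
As a sketch of how the options of this section combine,
the following command would request \textsc{HTML5} output with the sample
left-hand navigation bar and the conditional MathJax loader.
It assumes that the sample files shipped with the distribution can be named without a path,
in the same way as the \shellcode{--javascript} example given earlier;
\textit{doc} stands for your own document.
\begin{lstlisting}[style=shell]
latexmlpost --format=html5 \
   --css=LaTeXML-navbar-left.css \
   --javascript=LaTeXML-maybeMathJax.js \
   --destination=%doc%.html %doc%
\end{lstlisting}
\noindent Since \shellcode{--css} and \shellcode{--javascript} can be repeated,
further stylesheets or scripts of your own can be appended to the same command.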
%%%----------------------------------------------------------------------
\section[Splitting]{Splitting the Output}\label{usage.splitting}\index{latexmlpost!basic usage!split pages}
For larger documents, it is often desirable to break the result into several interlinked pages.
This split, carried out before scanning, is requested by
\begin{lstlisting}[style=shell]
--splitat=%level%
\end{lstlisting}
where \textit{level} is one of \texttt{chapter}, \texttt{section},
\texttt{subsection}, or \texttt{subsubsection}.
For example, \texttt{section} would split the document into chapters (if any) and sections,
along with separate bibliography, index and any appendices.
(See also \shellcode{--splitxpath} in \ltxcmd{latexml}.)
The removed document nodes are replaced by a Table of Contents.

The extra files are named using either the id or label of the root node of each new page document
according to
\begin{lstlisting}[style=shell]
--splitnaming=(id|idrelative|label|labelrelative)
\end{lstlisting}
The relative forms create shorter names in subdirectories for each level of splitting.
(See also \shellcode{--urlstyle} and \shellcode{--documentid} in \ltxcmd{latexml}.)

Additionally, the index and bibliography can be split into separate pages
according to the initial letter of entries by using the options
\begin{lstlisting}[style=shell]
--splitindex %and% --splitbibliography
\end{lstlisting}

%%%----------------------------------------------------------------------
\section[Sites]{Site processing}\label{usage.site}\index{latexmlpost!basic usage!site building}
A more complicated situation combines several \TeX\ sources into a single interlinked site
consisting of multiple pages and a composite index and bibliography.
\begin{description}
\item[Conversion] First, all \TeX\ sources must be converted to \XML, using \ltxcmd{latexml}.
  Since every target-able element in all files to be combined must have a unique identifier,
  it is useful to prefix each identifier with a unique value for each file.
  The \ltxcmd{latexml} option \shellcode{--documentid=%id%} provides this.
\item[Scanning] Secondly, all \XML\ files must be split and scanned using the command
\begin{lstlisting}[style=shell]
latexmlpost --prescan --dbfile=%DB% --dest=%i%.xhtml %i%
\end{lstlisting}
  where \varfile{DB} names a file in which to store the scanned data.
  Other conversions, including writing the output file, are skipped in this prescanning step.
\item[Pagination] Finally, all \XML\ files are cross-referenced and converted into
  the final format using the command
\begin{lstlisting}[style=shell]
latexmlpost --noscan --dbfile=%DB% --dest=%i%.xhtml %i%
\end{lstlisting}
  which skips the unnecessary scanning step.
\end{description}

%%%----------------------------------------------------------------------
\section{Individual Formula}\label{usage.latexmlmath}\index{latexmlmath!basic usage}
For cases where you'd just like to convert a single formula to, say, \MathML,
and don't mind the overhead, we've combined the pre- and post-processing
into a single, handy, command \ltxcmd{latexmlmath}.
For example,
\begin{lstlisting}[style=shell]
latexmlmath --pmml=- \\frac{b\\pm\\sqrt{b^2-4ac}}{2a}
\end{lstlisting}
will print the \MathML\ to standard output.
To convert the formula to a \texttt{png} image, say \texttt{quad.png},
use the option \shellcode{--mathimage=quad.png}.

Note that this involves putting \TeX\ code on the command line.
You've got to `slashify' your code in whatever way is necessary
so that, after your shell is finished with it, the string that
\ltxcmd{latexmlmath} sees is normal \TeX.
In the example above, in most unix-like shells, we only
needed to double-up the backslashes.
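
An alternative to doubling the backslashes, in POSIX-style shells,
is to enclose the formula in single quotes, which the shell passes through unchanged.
The following sketch uses the standard \texttt{printf} utility (not part of \LaTeXML)
only to display what the shell would hand to a command in each case;
the variable names \texttt{doubled} and \texttt{quoted} are illustrative.
\begin{lstlisting}[language=bash]
# printf is used here only to show the text after the shell has processed it.
doubled=\\frac{b\\pm\\sqrt{b^2-4ac}}{2a}     # backslashes doubled, no quotes
quoted='\frac{b\pm\sqrt{b^2-4ac}}{2a}'       # single quotes, nothing doubled
printf '%s\n' "$doubled" "$quoted"           # prints the same line twice
\end{lstlisting}
\noindent Either spelling can therefore be given to \ltxcmd{latexmlmath};
the single-quoted form is usually easier to read and to paste from a document.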
%%%======================================================================
\chapter{Architecture}\label{architecture}
As has been said, \LaTeXML\ consists of two main programs:
\ltxcmd{latexml} responsible for converting the \TeX\ source into \XML;
and \ltxcmd{latexmlpost} responsible for converting to target formats.
See Figure \ref{fig:dataflow} for illustration.

The casual user needs only a superficial understanding of the architecture.
The programmer who wants to extend or customize \LaTeXML\ will, however,
need a fairly good understanding of the process and the distinctions between
text, Tokens, Boxes, Whatsits and \XML, on the one hand,
and Macros, Primitives and Constructors, on the other.
In a way, the implementer of a \LaTeXML\ binding for a \LaTeX\ package
may need a \emph{better} understanding than when implementing for \LaTeX\
since they have to understand not only the \TeX-view,
primarily just the macros and the intended appearance,
but also the \LaTeXML-view, with \XML\ and representation questions, as well.

\begin{figure}[tb]
\begin{center}
\includegraphics[width=\textwidth]{figures/digestion}
\end{center}
\caption{Flow of data through \LaTeXML's digestive tract.\label{fig:dataflow}}
\end{figure}

The intention is that all semantics of the original document are preserved
by \ltxcmd{latexml}, or even inferred by parsing;
\ltxcmd{latexmlpost} is for formatting and conversion.
Depending on your needs, the \LaTeXML\ document resulting from \ltxcmd{latexml}
may be sufficient.
Alternatively, you may want to enhance the document by applying third party programs
before postprocessing.

%%%----------------------------------------------------------------------
\section{latexml architecture}\label{latexmlarchitecture}
\index{LaTeXML!architecture}%
Like \TeX, \ltxcmd{latexml} is data-driven:
the text and executable control sequences (ie.~macros and primitives)
in the source file (and any packages loaded) direct the processing.
For \LaTeXML, the user exerts control over the conversion, and customizes it, by providing alternative bindings of the control sequences and packages, by declaring properties of the desired document structure, and by defining rewrite rules to be applied to the constructed document tree. The top-level class, \pod{LaTeXML}, manages the processing, providing several methods for converting a \TeX\ document or string into an \XML\ document, with varying degrees of postprocessing and writing the document to file. It binds a \ltxpod{Core::State} object (to \ltxcode|$STATE|)%$ to maintain the current state of bindings for control sequence definitions and emulates \TeX's scoping rules. The processing is broken into the following stages \begin{description} \item[Digestion] the \TeX-like digestion phase which converts the input into boxes. \item[Construction] converts the resulting boxes into an \XML\ DOM. \item[Rewriting] applies rewrite rules to modify the DOM. \item[Math Parsing] parses the tokenized mathematics. \item[Serialization] converts the \XML\ DOM to a string, or writes to file. 
\end{description} %%%---------------------------------------------------------------------- \subsection{Digestion}\label{architecture.digestion} \index{Mouth(LaTeXML::)@{\ttfamily Mouth(LaTeXML::)}!architecture}% \index{Gullet(LaTeXML::)@{\ttfamily Gullet(LaTeXML::)}!architecture}% \index{Stomach(LaTeXML::)@{\ttfamily Stomach(LaTeXML::)}!architecture}% \index{Token(LaTeXML::)@{\ttfamily Token(LaTeXML::)}!architecture}% \index{Tokens(LaTeXML::)@{\ttfamily Tokens(LaTeXML::)}!architecture}% \index{Definition(LaTeXML::)@{\ttfamily Definition(LaTeXML::)}!architecture}% \index{Expandable(LaTeXML::)@{\ttfamily Expandable(LaTeXML::)}!architecture}% \index{Primitive(LaTeXML::)@{\ttfamily Primitive(LaTeXML::)}!architecture}% \index{Box(LaTeXML::)@{\ttfamily Box(LaTeXML::)}!architecture}% \index{List(LaTeXML::)@{\ttfamily List(LaTeXML::)}!architecture}% \index{Whatsit(LaTeXML::)@{\ttfamily Whatsit(LaTeXML::)}!architecture}% Digestion is carried out primarily in a \emph{pull} mode: The \ltxpod{Core::Stomach} pulls expanded \ltxpod{Core::Token}s from the \ltxpod{Core::Gullet}, which itself pulls \ltxpod{Core::Token}s from the \ltxpod{Core::Mouth}. The \ltxpod{Core::Mouth} converts characters from the plain text input into \ltxpod{Core::Token}s according to the current \emph{catcodes} (category codes) assigned to them (as bound in the \ltxpod{Core::State}). The \ltxpod{Core::Gullet} is responsible for expanding Macros, that is, control sequences currently bound to \ltxpod{Core::Definition::Expandable}s and for parsing sequences of tokens into common core datatypes (\ltxpod{Common::Number}, \ltxpod{Common::Dimension}, etc.). See \ref{customization.latexml.expansion} for how to define macros and affect expansion. 
The \ltxpod{Core::Stomach} then digests these tokens by executing \ltxpod{Core::Definition::Primitive} control sequences, usually for side effect, but often for converting material into \ltxpod{Core::List}s of \ltxpod{Core::Box}es and \ltxpod{Core::Whatsit}s (A Macro should never digest). Normally, textual tokens are converted to \ltxpod{Core::Box}es in the current font. The main (intentional) deviation of \LaTeXML's digestion from that of \TeX\ is the introduction of a new type of definition, a \ltxpod{Core::Definition::Constructor}, responsible for constructing \XML\ fragments. A control sequence bound to \ltxpod{Core::Definition::Constructor} is digested by reading and processing its arguments and wrapping these up in a \ltxpod{Core::Whatsit}. Before- and after-daemons, essentially anonymous primitives, associated with the \ltxpod{Core::Definition::Constructor} are executed before and after digesting the \ltxpod{Core::Definition::Constructor} arguments' markup, which can affect the context of that digestion, as well as augmenting the \ltxpod{Core::Whatsit} with additional properties. See \ref{customization.latexml.digestion} for how to define primitives and affect digestion. %%%---------------------------------------------------------------------- \subsection{Construction}\label{architecture.construction} \index{Constructor (LaTeXML::)!architecture}% \index{Document (LaTeXML::)!architecture}% \index{Model (LaTeXML::)!architecture}% Given the \ltxpod{Core::List} of \ltxpod{Core::Box}es and \ltxpod{Core::Whatsit}s, we proceed to constructing an \XML\ document. This consists of creating an \ltxpod{Core::Document} object, containing a libxml2 document, \pod{XML::LibXML::Document}, and having it absorb the digested material. Absorbing a \ltxpod{Core::Box} converts it to text content, with provision made to track and set the current font. 
A \ltxpod{Core::Whatsit} is absorbed by invoking the associated \ltxpod{Core::Definition::Constructor} to insert an appropriate \XML\ fragment, including elements and attributes, and recursively processing their arguments as necessary. See \ref{customization.latexml.construction} for how to define constructors.

A \ltxpod{Common::Model} is maintained throughout the digestion phase which accumulates any document model declarations, in particular the document type (RelaxNG is preferred, but DTD is also supported). As \LaTeX\ markup is more like \SGML\ than \XML, additional declarations may be used (see \code{Tag} in \ltxpod{Package}) to indicate which elements may be automatically opened or closed when needed to build a document tree that matches the document type. As an example, a \xmlcode{<subsection>} will automatically be closed when a \xmlcode{<section>} is begun. Additionally, extra bits of code can be executed whenever particular elements are opened or closed (also specified by \code{Tag}). See \ref{customization.latexml.schema} for how to affect the schema.

%%%----------------------------------------------------------------------
\subsection{Rewriting}\label{architecture.rewriting}
\index{Rewrite (LaTeXML::)!architecture}%
Once the basic document is constructed, \ltxpod{Core::Rewrite} rules are applied which can perform various functions. Ligatures, and the combining of mathematical digits and letters (in certain fonts) into composite math tokens, are handled this way. Additionally, declarations of the type or grammatical role of math tokens can be applied here. See \ref{customization.latexml.rewriting} for how to define rewrite rules.

%%%----------------------------------------------------------------------
\subsection{MathParsing}\label{architecture.mathparsing}
\index{MathParser (LaTeXML::)!architecture}%
After rewriting, a grammar-based parser is applied to the mathematical nodes in order to infer, at least, the structure of the expressions, if not the meaning. Mathematics parsing, and how to control it, is covered in detail in Chapter \ref{math}.

%%%----------------------------------------------------------------------
\subsection{Serialization}\label{architecture.serialization}
Here, we simply convert the DOM into string form, and output it.

%%%----------------------------------------------------------------------
\section{latexmlpost architecture}\label{latexmlpostarchitecture}
\index{Post (LaTeXML::)!architecture}%
\LaTeXML's postprocessor is primarily for format conversion. It operates by applying a sequence of filters responsible for transforming or splitting documents, or their parts, from one format to another. Exactly which postprocessing filter modules are applied depends on the commandline options to \ltxcmd{latexmlpost}.
Postprocessing filter modules are generally applied in the following order:
\begin{description}
\item[Split] splits the document into several `page' documents, according to \cmd{--split} or \cmd{--splitxpath} options.
\item[Scan] scans the document for all IDs, labels and cross-references. This data may be stored in an external database, depending on the \cmd{--db} option.
\item[MakeIndex] fills in the \elementref{index} element (due to a \verb|\printindex|) with material generated by \verb|index|.
\item[MakeBibliography] fills in the \elementref{bibliography} element (from \verb|\bibliography|) with material extracted from the file specified by the \cmd{--bibliography} option, for all \verb|\cite|'d items.
\item[CrossRef] establishes all cross-references between documents and parts thereof, filling in the references with appropriate text for the hyperlink.
\item[MathImages, MathML, OpenMath] performs various conversions of the internal Math representation.
\item[PictureImages, Graphics, SVG] performs various graphics conversions.
\item[XSLT] applies an XSLT transformation to each document.
\item[Writer] writes the document to a file in the appropriate location.
\end{description}
See \ref{customization.latexmlpost} for how to customize the postprocessing.

%%%======================================================================
% \part{Intermediate}
%%%======================================================================
\chapter{Customization}\label{customization}
The processing of the \LaTeX\ document, its conversion into \XML\ and ultimately to \XHTML\ or other formats can be customized in various ways, at different stages of processing and at different levels of complexity.
Depending on what you are trying to achieve, some approaches may be easier than others; recall Larry Wall's adage ``There's more than one way to do it.''

By far, the easiest way to customize the style of the output is by modifying the CSS, see \ref{customization.latexmlpost.css}, so that is the recommended way when it applies.

The basic conversion from \TeX\ markup to \XML\ is done by \ltxcmd{latexml}, and is obviously affected by the mapping between the \TeX\ markup and the \XML\ markup. This mapping is defined by macros, primitives and, of course, constructors; the mapping that is in force at any time is determined by the \LaTeXML-specific implementations of the \TeX\ packages involved, what we call `bindings'. Consequently, you can customize the conversion by modifying the bindings used by \ltxcmd{latexml}.

Likewise, you can extend \ltxcmd{latexml} by creating bindings for \TeX\ styles that have not yet been covered, or by defining your own \TeX\ style file along with its \LaTeXML\ binding.

In all these cases, you'll need the same skills: understanding and using text, tokens, boxes and whatsits, as well as macros and macro expansion, primitives and digestion, and finally whatsits and constructors. Understanding \TeX\ helps; reading the \LaTeXML\ bindings in the distribution will give an idea of how we use it.
%%%
%% Understand TeX;
%% to implement: Understand the style file!
%% Realize that people will misuse/abuse/stretch,
%% often getting below the level of the advertised interface.
%%%

To teach \LaTeXML\ about new macros, to implement bindings for a package not yet covered, or to modify the way \TeX\ control sequences are converted to \XML, you will want to look at \ref{customization.latexml}. To modify the way that \XML\ is converted to other formats such as \HTML, see \ref{customization.latexmlpost}.
A particularly powerful strategy when you have control over the source documents is to develop a semantically oriented \LaTeX\ style file, say \texttt{smacros.sty}, and then provide a \LaTeXML\ binding as \texttt{smacros.sty.ltxml}. In the \LaTeX\ version, you may style the terms as you like; in the \LaTeXML\ version, you could control the conversion so as to preserve the semantics in the \XML. If \LaTeXML's schema is insufficient, then you would need to extend it with your own representation; although that is beyond the scope of the current manual, see the discussion below in \ref{customization.latexml.schema}. In such a case, you would also need to extend the \XSLT\ stylesheets, as discussed in \ref{customization.latexmlpost.xslt}.

%%%----------------------------------------------------------------------
\section{latexml Customization}\label{customization.latexml}
This layer of customization deals with modifying the way a \LaTeX\ document is transformed into \LaTeXML's \XML, primarily through defining the way that control sequences are handled. In \ref{usage.conversion.loading} the loading of various bindings was described. The facilities described in the following subsections apply in all such cases, whether used to customize the processing of a particular document or to implement a new \LaTeX\ package. We make no attempt to be comprehensive here; please consult the documentation for \ltxpod{Global} and \ltxpod{Package}, as well as the binding files included with the system for more guidance.

A \LaTeXML\ binding is actually a Perl module, and as such, a familiarity with Perl is helpful. A binding file will look something like:
\begin{lstlisting}[style=latexml]
use LaTeXML::Package;
use strict;
use warnings;

# Your code here!

1;
\end{lstlisting}
The final `1' is required; it tells Perl that the module has loaded successfully. In between, comes any Perl code you wish, along with the definitions and declarations as described here.
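
As a minimal sketch of the semantic style-file strategy mentioned above, a binding \texttt{smacros.sty.ltxml} might look like the following. The control sequences \verb|\term| and \verb|\defn| and the class value \texttt{term} are hypothetical, invented only for illustration; \ltxcode|DefConstructor| and \ltxcode|DefMacro| are described in the subsections below.
\begin{lstlisting}[style=latexml]
# smacros.sty.ltxml: hypothetical binding for a hypothetical smacros.sty
use LaTeXML::Package;
use strict;
use warnings;

# \term{...}: preserve the semantic distinction as a class on ltx:text
DefConstructor('\term{}',
   "<ltx:text class='term'>#1</ltx:text>",
   mode=>'text');

# \defn{...}: defined purely by expansion, reusing \term
DefMacro('\defn{}', '\emph{\term{#1}}');

1;
\end{lstlisting}
The corresponding \texttt{smacros.sty} would define the same two control sequences for ordinary \LaTeX\ processing, styled however you like.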
Actually, familiarity with Perl is more than merely helpful, as is familiarity with \TeX\ and \XML! When writing a binding, you will be programming with all three languages. Of course, you need to know the \TeX\ corresponding to the macros that you intend to implement, but sometimes it is most convenient to implement them completely, or in part, in \TeX\ itself (eg. using \ltxcode|DefMacro|), rather than in Perl. At the other end, constructors (eg. using \ltxcode|DefConstructor|) are usually defined by patterns of \XML.

\subsection[Expansion]{Expansion \& Macros}\label{customization.latexml.expansion}
Macros are defined using \texttt{DefMacro}, such as the pointless:
\begin{lstlisting}[style=latexml]
DefMacro('\mybold{}','\textbf{#1}');
\end{lstlisting}
The two arguments to \texttt{DefMacro} we call the \emph{prototype} and the \emph{replacement}. In the prototype, the \verb|{}| specifies a single normal \TeX\ parameter. The replacement is here a string which will be tokenized and the \verb|#1| will be replaced by the tokens of the argument. Presumably the entire result will eventually be further expanded and/or processed.

Whereas \TeX\ normally uses \verb|#1|, and \LaTeX\ has developed a complex scheme where it is often necessary to peek ahead token by token to recognize optional arguments, we have attempted to develop a suggestive, and easier to use, notation for parameters. Thus a prototype \verb|\foo{}| specifies a single normal argument, whereas \verb|\foo[]{}| would take an optional argument followed by a required one. More complex argument prototypes can be found in \ltxpod{Package}. As in \TeX, the macro's arguments are neither expanded nor digested until the expansion itself is further expanded or digested.

The macro's replacement can also be Perl code, typically an anonymous \texttt{sub}, which gets the current \ltxpod{Core::Gullet} followed by the macro's arguments as its arguments.
It must return a list of \ltxpod{Core::Token}s which will be used as the expansion of the macro. The following two examples show alternative ways of writing the above macro:
\begin{lstlisting}[style=latexml]
DefMacro('\mybold{}', sub {
   my($gullet,$arg)=@_;
   (T_CS('\textbf'),T_BEGIN,$arg,T_END); });
\end{lstlisting}
or alternatively
\begin{lstlisting}[style=latexml]
DefMacro('\mybold{}', sub {
   Invocation(T_CS('\textbf'),$_[1]); });
\end{lstlisting}

Generally, the body of the macro should \emph{not} involve side-effects, assignments or other changes to state other than reading \ltxpod{Core::Token}s from the \ltxpod{Core::Gullet}; of course, the macro may expand into control sequences which do have side-effects.

Functions that are useful for dealing with \ltxpod{Core::Token}s and writing macros include the following:
\begin{itemize}
\item Constants for the corresponding \TeX\ catcodes:
\begin{lstlisting}[style=latexml]
CC_ESCAPE, CC_BEGIN, CC_END, CC_MATH,
CC_ALIGN, CC_EOL, CC_PARAM, CC_SUPER,
CC_SUB, CC_IGNORE, CC_SPACE, CC_LETTER,
CC_OTHER, CC_ACTIVE, CC_COMMENT, CC_INVALID
\end{lstlisting}
% \ltxcode|CC_CS|, \ltxcode|CC_NOTEXPANDED|
\item Constants for tokens with the appropriate content and catcode:
\begin{lstlisting}[style=latexml]
T_BEGIN, T_END, T_MATH, T_ALIGN, T_PARAM,
T_SUB, T_SUPER, T_SPACE, T_CR
\end{lstlisting}
\item \ltxcode|T_LETTER($char)|, \ltxcode|T_OTHER($char)|, \ltxcode|T_ACTIVE($char)|, create tokens of the appropriate catcode with the given text content.
% \ltxcode|T_COMMENT($string)|,
\item \ltxcode|T_CS($cs)| creates a control sequence token; the string \ltxcode|$cs| should typically begin with the backslash.
\item \ltxcode|Token($string,$catcode)| creates a token with the given content and catcode.
\item \ltxcode|Tokens($token,...)| creates a \ltxpod{Core::Tokens} object containing the list of \ltxpod{Core::Token}s.
\item \ltxcode|Tokenize($string)| converts the string to a \ltxpod{Core::Tokens}, using \TeX's standard catcode assignments.
\item \ltxcode|TokenizeInternal($string)| like \texttt{Tokenize}, but treating \ltxcode|@| as a letter.
\item \ltxcode|Explode($string)| converts the string to a \ltxpod{Core::Tokens} where letter characters are given catcode \ltxcode|CC_OTHER|.
\item \ltxcode|Expand($tokens)| expands \ltxcode|$tokens| (a \ltxpod{Core::Tokens}), returning a \ltxpod{Core::Tokens}; there should be no expandable tokens in the result.
\item \ltxcode|Invocation($cstoken,$arg,...)| returns a \ltxpod{Core::Tokens} representing the sequence needed to invoke \ltxcode|$cstoken| on the given arguments (each is a \ltxpod{Core::Tokens}, or undef for an unsupplied optional argument).
\end{itemize}
% Are any of the keyword options worth bringing up?
% Probably I should at least mention that they are there...

\subsection[Digestion]{Digestion \& Primitives}\label{customization.latexml.digestion}
Primitives are processed during the digestion phase in the \ltxpod{Core::Stomach}, after macro expansion (in the \ltxpod{Core::Gullet}), and before document construction (in the \ltxpod{Core::Document}). Our primitives generalize \TeX's notion of primitive; they are used to implement \TeX's primitives, invoke other side effects and to convert Tokens into Boxes, in particular, Unicode strings in a particular font.

Here are a few primitives from \texttt{TeX.pool}:
\begin{lstlisting}[style=latexml]
DefPrimitive('\begingroup',sub {
  $_[0]->begingroup; });
DefPrimitive('\endgroup', sub {
  $_[0]->endgroup; });
DefPrimitiveI('\batchmode', undef,undef);
DefPrimitiveI('\OE', undef, "\x{0152}");
DefPrimitiveI('\tiny', undef, undef,
  font=>{size=>5});
\end{lstlisting}

Other than for implementing \TeX's own primitives, \texttt{DefPrimitive} is needed less often than \texttt{DefMacro} or \texttt{DefConstructor}. The main thing to keep in mind is that primitives are processed after macro expansion, by the \ltxpod{Core::Stomach}. They are most useful for side-effects, changing the \ltxpod{Core::State}.
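
To illustrate a primitive used purely for its side effect, the following sketch records a value in the \ltxpod{Core::State}. The control sequence \verb|\setdraftlabel| and the key \texttt{draftlabel} are hypothetical, chosen only for illustration; \ltxcode|AssignValue| and \ltxcode|ToString| are described later in this section.
\begin{lstlisting}[style=latexml]
# Hypothetical: \setdraftlabel{text} stores its argument for later use
DefPrimitive('\setdraftlabel{}', sub {
  my($stomach,$label)=@_;
  # The argument arrives as undigested Tokens; store its string form
  AssignValue(draftlabel=>ToString($label), 'global');
  return; });   # return nothing: no Boxes are produced
\end{lstlisting}
A daemon or constructor processed later could then retrieve the stored string with \ltxcode|LookupValue('draftlabel')|.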
\par\noindent\ltxcode|DefPrimitive($prototype,$replacement,%options)| \par The replacement is either a string which will be used to create a Box in the current font, or can be code taking the \ltxpod{Core::Stomach} and the control sequence arguments as argument; like macros, these arguments are not expanded or digested by default, they must be explicitly digested if necessary. The replacement code must either return nothing (eg. ending with \ltxcode|return;|) or should return a list (ie. a Perl list \ltxcode|(...)|) of digested \ltxpod{Core::Box}es or \ltxpod{Core::Whatsit}s. Options to DefPrimitive are: \begin{itemize} \item \ltxcode.mode=>('math'|'text'). switches to math or text mode, if needed; \item \ltxcode|requireMath=>1|, \ltxcode|forbidMath=>1| requires, or forbids, this primitive to appear in math mode; \item \ltxcode|bounded=>1| specifies that all digestion (of arguments and daemons) will take place within an implicit \TeX\ group, so that any side-effects are localized, rather than affecting the global state; \item \ltxcode|font=>{%hash}| switches the font used for any created text; recognized font keys are \texttt{family}, \texttt{series}, \texttt{shape}, \texttt{size}, \texttt{color}; Note that if the font change should only affect the material digested within this command itself, then \ltxcode|bounded=>1| should be used; otherwise, the font change will remain in effect after the command is processed. \item \ltxcode|beforeDigest=>CODE($stomach)|,\\ \ltxcode|afterDigest=>CODE($stomach)| provides code to be digested before and after processing the main part of the primitive. \end{itemize} % DefRegister ? Other functions useful for dealing with digestion and state are important for writing before \& after daemons in constructors, as well as in Primitives; we give an overview here: \begin{itemize} \item \ltxcode|Digest($tokens)| digests \ltxcode|$tokens| (a \ltxpod{Core::Tokens}), returning a list of \ltxpod{Core::Box}es and \ltxpod{Core::Whatsit}s. 
\item \ltxcode|Let($token1,$token2)| gives \ltxcode|$token1| the same meaning as \ltxcode|$token2|, like \verb|\let|.
\end{itemize}

\paragraph{Bindings}
The following functions are useful for accessing and storing information in the current \ltxpod{Core::State}. It maintains a stack-like structure that mimics \TeX's approach to binding; braces \verb|{| and \verb|}| open and close stack frames. (The \ltxpod{Core::Stomach} methods \texttt{bgroup} and \texttt{egroup} can be used when explicitly needed.)
\begin{itemize}
\item \ltxcode|LookupValue($symbol)|, \ltxcode|AssignValue($symbol,$value,$scope)| maintain arbitrary values in the current \ltxpod{Core::State}, looking up or assigning the current value bound to \ltxcode|$symbol| (a string). For assignments, the \ltxcode|$scope| can be \texttt{'local'} (the default, if \ltxcode|$scope| is omitted), which changes the binding in the current stack frame. If \ltxcode|$scope| is \texttt{'global'}, it assigns the value globally by undoing all bindings. The \ltxcode|$scope| can also be another string, which indicates a named scope, but that is a more advanced topic.
\item \ltxcode|PushValue($symbol,$value,...)|, \ltxcode|PopValue($symbol)|,\hfil\\
\ltxcode|UnshiftValue($symbol,$value,...)|, \ltxcode|ShiftValue($symbol)| These maintain the value of \ltxcode|$symbol| as a list, with the operations having the same sense as in Perl; modifications are always global.
\item \ltxcode|LookupCatcode($char)|, \ltxcode|AssignCatcode($char,$catcode,$scope)| maintain the catcodes associated with characters.
\item \ltxcode|LookupMeaning($token)|, \ltxcode|LookupDefinition($token)| look up the current meaning of the token, being any executable definition bound for it. If there is no such definition, \texttt{LookupMeaning} returns the token itself, while \texttt{LookupDefinition} returns \texttt{undef}.
% \ltxcode|InstallDefinition()|
\end{itemize}

\paragraph{Counters}
% Should this go under Constructors?
The following functions maintain \LaTeX-like counters, and generally also associate an \texttt{ID} with them. A counter's print form (ie. \verb|\theequation| for equations) often ends up on the \attr{refnum} attribute of elements; the associated \texttt{ID} is used for the \attr{xml:id} attribute.
\begin{itemize}
\item \ltxcode|NewCounter($name,$within,%options)| creates a \LaTeX-style counter. When \ltxcode|$within| is used, the given counter will be reset whenever the counter \ltxcode|$within| is incremented. This also causes the associated \texttt{ID} to be prefixed with \ltxcode|$within|'s \texttt{ID}. The option \ltxcode|idprefix=>$string| causes the \texttt{ID} to be prefixed with that string. For example,
\begin{lstlisting}[style=latexml]
NewCounter('section', 'document', idprefix=>'S');
NewCounter('equation','document', idprefix=>'E',
           idwithin=>'section');
\end{lstlisting}
would cause the third equation in the second section to have \xmlcode{ID='S2.E3'}.
\item \ltxcode|CounterValue($name)| returns the \ltxpod{Common::Number} representing the current value.
\item \ltxcode|ResetCounter($name)| resets the counter to 0.
\item \ltxcode|StepCounter($name)| steps the counter (and resets any others `within' it), and returns the expansion of \verb|\the$name|.
\item \ltxcode|RefStepCounter($name)| steps the counter and any IDs associated with it. It returns a hash containing \texttt{refnum} (expansion of \verb|\the$name|) and \texttt{id} (expansion of \verb|\the$name@ID|).
\item \ltxcode|RefStepID($name)| steps the ID associated with the counter, without actually stepping the counter; this is useful for unnumbered units that normally would have both a refnum and ID.
%\ltxcode|GenerateID()|
\end{itemize}

\subsection[Construction]{Construction \& Constructors}\label{customization.latexml.construction}
Constructors are where things get interesting, but also complex; they are responsible for defining how the \XML\ is built.
There are basic constructors corresponding to normal control sequences, as well as environments. Mathematics generally comes down to constructors, as well, but is covered in Chapter \ref{math}.

Here are a couple of trivial examples of constructors:
\begin{lstlisting}[style=latexml]
DefConstructor('\emph{}',
   "<ltx:emph>#1</ltx:emph>", mode=>'text');
DefConstructor('\item[]',
   "<ltx:item>?#1(<ltx:tag>#1</ltx:tag>)");
DefEnvironment('{quote}',
   '<ltx:quote>#body</ltx:quote>',
   beforeDigest=>sub{ Let('\\\\','\@block@cr');});
DefConstructor('\footnote[]{}',
   "<ltx:note role='footnote' mark='#refnum'>#2</ltx:note>",
   mode=>'text',
   properties=> sub {
     ($_[1] ? (refnum=>$_[1]) : RefStepCounter('footnote')) });
\end{lstlisting}

\par\noindent\ltxcode|DefConstructor($prototype,$replacement,%options)|
\par
The \ltxcode|$replacement| for a constructor describes the \XML\ to %$
be generated during the construction phase. It can either be a string representing the \XML\ as a pattern (described below), or a subroutine \ltxcode|CODE($document,$arg1,...%props)| receiving the arguments and properties from the \ltxpod{Core::Whatsit}; it would invoke the methods of \ltxpod{Core::Document} to construct the desired \XML.

The pattern, as illustrated above, simply represents a serialization of the desired \XML, with extensions to allow substituting arguments or other data into the pattern. In addition to literal replacement, the following may appear:
\begin{itemize}
\item \ltxcode|#1,#2,...#name| inserts the construction of the argument or property in the \XML;
\item \ltxcode|&function($a,$b,...)| invokes the named function on the given arguments and inserts its value in place;
\item \ltxcode|?COND(pattern)| or \ltxcode|?COND(ifpattern)(elsepattern)| conditionally inserts the patterns depending on the result of the conditional.
\texttt{COND} would typically be testing the presence of an argument, \ltxcode|#1|, or property \ltxcode|#name| or invoking a function;
\item \ltxcode|^| if this appears at the beginning of the pattern, the replacement is allowed to \emph{float} up the current tree to wherever it might be allowed.
\end{itemize}

A subroutine used as the \ltxcode|$replacement|, %$
allows programmatic insertion of \XML\ into, or modification of, the document being constructed. Although one could use LibXML's DOM API to manipulate the document tree, it is \emph{strongly} recommended to use \ltxpod{Core::Document}'s API wherever possible as it maintains consistency and manages namespace prefixes. This is particularly true for insertion of new content, setting attributes and finding existing nodes in the tree using XPath.

Options:
\begin{itemize}
\item \ltxcode.mode=>('math'|'text'). switches to math or text mode, if needed;
\item \ltxcode|requireMath=>1|, \ltxcode|forbidMath=>1| requires, or forbids, this constructor to appear in math mode;
\item \ltxcode|bounded=>1| specifies that all digestion (of arguments and daemons) will take place within an implicit \TeX\ group, so that any side-effects are localized, rather than affecting the global state;
\item \ltxcode|font=>{%hash}| switches the font used for any created text; recognized font keys are \texttt{family}, \texttt{series}, \texttt{shape}, \texttt{size}, \texttt{color};
\item \ltxcode.properties=> {%hash} | CODE($stomach,$arg1,...). provides a set of properties to store in the \ltxpod{Core::Whatsit} for eventual use in the constructor \ltxcode|$replacement|.
If a subroutine is used, it also should return a hash of properties;
\item \ltxcode|beforeDigest=>CODE($stomach)|,\\
\ltxcode|afterDigest=>CODE($stomach,$whatsit)| provides code to be digested before and after digesting the arguments of the constructor, typically to alter the context of the digestion (before), or to augment the properties of the \ltxpod{Core::Whatsit} (after);
\item \ltxcode|beforeConstruct=>CODE($document,$whatsit)|,\\
\ltxcode|afterConstruct=>CODE($document,$whatsit)| provides code to be run before and after the main \ltxcode|$replacement| is effected; occasionally it is convenient to use the pattern form for the main \ltxcode|$replacement|, but one still wants to execute a bit of Perl code, as well;
%\item \ltxcode|nargs| HUH?
\item \ltxcode.captureBody=>(1 | $token). specifies that an additional argument (like an environment body) will be read until the current \TeX\ grouping ends, or until the specified \ltxcode|$token| is encountered. This argument is available to \ltxcode|$replacement| as \ltxcode|$body|;
\item \ltxcode.scope=>('global'|'local'|$name). specifies whether this definition is made globally, or in the current stack frame (default), (or in a named scope);
\item \ltxcode&reversion=>$string|CODE(...)&, \ltxcode|alias=>$cs| can be used when the \ltxpod{Core::Whatsit} needs to be reverted into \TeX\ code, and the default of simply reassembling based on the prototype is not desired. See the code for examples.
\end{itemize}

Some additional functions useful when writing constructors:
\begin{itemize}
\item \ltxcode|ToString($stuff)| converts \ltxcode|$stuff| to a string, hopefully without \TeX\ markup, suitable for use as document content and attribute values. Note that if \ltxcode|$stuff| contains Whatsits generated by Constructors, it may not be possible to avoid \TeX\ code. Contrast \ltxcode|ToString| to the following two functions.
\item \ltxcode|UnTeX($stuff)| returns a string containing the \TeX\ code that would generate \ltxcode|$stuff| (this might not be the original \TeX). The function \ltxcode|Revert($stuff)| returns the same information as a Tokens list.
\item \ltxcode|Stringify($stuff)| returns a string more intended for debugging purposes; it reveals more of the structure and type information of the object and its parts.
\item \ltxcode|CleanLabel($arg)|, \ltxcode|CleanIndexKey($arg)|, \ltxcode|CleanBibKey($arg)|,\hfil\\
\ltxcode|CleanURL($arg)| cleans up arguments (converting to string, handling invalid characters, etc) to make the argument appropriate for use as an attribute representing a label, index ID, etc.
\item \ltxcode|UTF($hex)| returns the Unicode character for the given codepoint; this is useful for characters below \texttt{0x100} where Perl becomes confused about the encoding.
\end{itemize}

\par\noindent
\ltxcode|DefEnvironment($prototype,$replacement,%options)|
\par
Environments are largely a special case of constructors, but the prototype starts with \verb|{envname}|, rather than \verb|\cmd|; the replacement will also typically involve \verb|#body| representing the contents of the environment.

\texttt{DefEnvironment} takes the same options as \texttt{DefConstructor}, with the addition of
\begin{itemize}
\item \ltxcode|afterDigestBegin=>CODE($stomach,$whatsit)| provides code to digest after the \verb|\begin{env}| is digested;
\item \ltxcode|beforeDigestEnd=>CODE($stomach)| provides code to digest before the \verb|\end{env}| is digested.
\end{itemize}
For those cases where you do not want an environment to correspond to a constructor, you may still (as in \LaTeX) define the two control sequences \verb|\envname| and \verb|\endenvname| as you like.

\subsection{Document Model}\label{customization.latexml.schema}
The following declarations are typically only needed when customizing the schema used by \LaTeXML.
\begin{itemize}
\item \ltxcode|RelaxNGSchema($schema,%namespaces)| declares that the created \XML\ document should fit the RelaxNG schema in \ltxcode|$schema|; a file \ltxcode|$schema.rng| should be findable in the current search paths. (Note that currently, \LaTeXML\ is unable to directly parse compact notation).
\item \ltxcode|RegisterNamespace($prefix,$url)| associates the prefix with the given namespace url. This allows you to use \ltxcode|$prefix| as a namespace prefix when writing \ltxpod{Core::Definition::Constructor} patterns or XPath expressions.
\item \ltxcode|Tag($tag,%properties)| specifies properties for the given \XML\ \ltxcode|$tag|. Recognized properties include: \ltxcode|autoOpen=>1| indicates that the tag can automatically be opened if needed to create a valid document; \ltxcode|autoClose=>1| indicates that the tag can automatically be closed if needed to create a valid document; \ltxcode|afterOpen=>$code| specifies code to be executed after opening the tag; the code is passed the \ltxpod{Core::Document} being constructed as well as the \ltxpod{Core::Box} (or \ltxpod{Core::Whatsit}) responsible for its creation; \ltxcode|afterClose=>$code| similar to \texttt{afterOpen}, but executed after closing the element.
% DocType
\end{itemize}

\subsection{Rewriting}\label{customization.latexml.rewriting}
The following functions are a bit tricky to use (and describe), but can be quite useful in some circumstances.
\begin{itemize}
\item \ltxcode|DefLigature($regexp,%options)| applies a regular expression to substitute textnodes after they are closed; the only option is \ltxcode|fontTest=>$code| which restricts the ligature to text nodes where the current font passes \ltxcode|&$code($font)|.
\item \ltxcode|DefMathLigature($code)| allows replacement of sequences of math nodes.
It applies \ltxcode|$code| to the current \ltxpod{Core::Document} and each sequence of math nodes encountered in the document; if a replacement should occur, \ltxcode|$code| should return a list of the form \ltxcode|($n,$string,%attributes)| in which case, the text content of the first node is replaced by \ltxcode|$string|, the given attributes are added, and the following \ltxcode|$n-1| nodes are removed.
\item \ltxcode|DefRewrite(%spec)|, \ltxcode|DefMathRewrite(%spec)| defines document rewrite rules. These specifications describe what document nodes match:
  \begin{itemize}
  \item \ltxcode|label=>$label| restricts to nodes contained within an element whose \attr{labels} includes \ltxcode|$label|;
  \item \ltxcode|scope=>$scope| generalizes \texttt{label}; the most useful form is a string like \texttt{'section:1.3.2'} where it matches the \elementref{section} element whose \attr{refnum} is \texttt{1.3.2};
  \item \ltxcode|xpath=>$xpath| selects nodes matching the given XPath;
  \item \ltxcode|match=>$tex| selects nodes that look like what processing the \TeX\ string \ltxcode|$tex| would produce;
  \item \ltxcode|regexp=>$regexp| selects text nodes that match the given regular expression.
  \end{itemize}
  The following specifications describe what to do with the matched nodes:
  \begin{itemize}
  \item \ltxcode|attributes=>{%attr}| adds the given attributes to the matching nodes;
  \item \ltxcode|replace=>$tex| replaces the matching nodes with the result of processing the \TeX\ string \ltxcode|$tex|.
  \end{itemize}
\end{itemize}

\subsection{Packages and Options}\label{customization.latexml.packages}
The following declarations are useful for defining \LaTeXML\ bindings, including option handling. As when defining \LaTeX\ packages, the following, if needed at all, need to appear in the order shown.
\begin{itemize}
%===== Declarations of options
\item \ltxcode|DeclareOption($option,$handler)| specifies the handler for \ltxcode|$option| when it is passed to the current package or class.
If \ltxcode|$option| is \texttt{undef}, it defines the default handler, for options that are otherwise unrecognized. \ltxcode|$handler| can be either a string to be expanded, or a sub which is executed like a primitive.
\item \ltxcode|PassOptions($name,$type,@options)| specifies that the given options should be passed to the package (if \ltxcode|$type| is \texttt{sty}) or class (if \ltxcode|$type| is \texttt{cls}) \ltxcode|$name|, if it is ever loaded.
%===== Execution of options
\item \ltxcode|ProcessOptions(%keys)| processes any options that have been passed to the current package or class. If \ltxcode|inorder=>1| is specified, the options will be processed in the order passed to the package (\verb|\ProcessOptions*|); otherwise they will be processed in the declared order (\verb|\ProcessOptions|).
\item \ltxcode|ExecuteOptions(@options)| executes the handlers for the specific set of options \ltxcode|@options|.
%===== Package loading
\item \ltxcode|RequirePackage($pkgname,%keys)| loads the specified package. The keyword options have the following effect: \ltxcode|options=>$options| can provide an explicit array of strings specifying the options to pass to the package; \ltxcode|withoptions=>1| means that the options passed to the currently loading class or package should be passed to the requested package; \ltxcode|type=>$ext| specifies the type of the package file (default is \texttt{sty}); \ltxcode|raw=>1| specifies that reading the raw style file (eg. \texttt{pkg.sty}) is permissible if there is no specific \LaTeXML\ binding (eg. \texttt{pkg.sty.ltxml}); \ltxcode|after=>$after| specifies a string or \ltxpod{Core::Tokens} to be expanded after the package has finished loading.
\item \ltxcode|LoadClass($classname,%keys)| Similar to \texttt{RequirePackage}, but loads a class file (\ltxcode|type=>'cls'|).
%==== Special commands
\item \ltxcode|AddToMacro($cstoken,$tokens)| is a little-used utility to add material to the expansion of \ltxcode|$cstoken|, like an \verb|\edef|; it is typically used to add code to a class or package hook.
\end{itemize}

\subsection{Miscellaneous}\label{customization.latexml.misc}
Other useful stuff:
\begin{itemize}
\item \ltxcode|RawTeX($texstring)| expands and processes the \ltxcode|$texstring|; this is typically useful to include definitions copied from a \TeX\ stylefile, when they are appropriate for \LaTeXML, as is. Single-quoting the \ltxcode|$texstring| is useful, since it isn't interpolated by Perl, and avoids having to double all the backslashes!
\end{itemize}
%\ltxcode|MergeFont()|

%%%----------------------------------------------------------------------
\section{latexmlpost Customization}\label{customization.latexmlpost}
The current postprocessing framework works by passing the document through a sequence of postprocessing filter modules. Each module is responsible for carrying out a specific transformation, augmentation or conversion on the document. In principle, this architecture has the flexibility to employ new filters to perform new or customized conversions. However, the driver, \ltxcmd{latexmlpost}, currently provides no convenient means to instantiate and incorporate outside filters, short of developing your own specialized version. Consequently, we will consider custom postprocessing filters outside the scope of this manual (but of course, you are welcome to explore the code, or contact us with suggestions). The two areas where customization is most practical are altering the XSLT transforms used and extending the \CSS\ stylesheets.

\subsection{XSLT}\label{customization.latexmlpost.xslt}
\LaTeXML\ provides stylesheets for transforming its \XML\ format to \XHTML\ and \HTML. These stylesheets are modular with components corresponding to the schema modules.
Probably the best strategy for customizing the transform involves making a copy of the standard base stylesheets, \texttt{LaTeXML-xhtml.xsl}, \texttt{LaTeXML-html.xsl} and \texttt{LaTeXML-html5.xsl}, found at \textit{installationdir}\texttt{/LaTeXML/style/}. They are short, consisting mainly of an \texttt{xsl:include} and settings for the appropriate parameters and output method; thus modifying the parameters and adding your own rules, or including your own modules, should be relatively easy. Naturally, this requires a familiarity with \LaTeXML's schema (see \ref{schema}), as well as \XSLT\ and \XHTML. See the other stylesheet modules in the same directory as the base stylesheet for guidance. Generally the strategy is to use various parameters to switch between common behaviors and to use templates with \texttt{mode}s that can be overridden in the less common cases.

Conversion to formats other than \XHTML\ is, of course, possible as well, but is neither supplied nor covered here. How complex the transformation will be depends on the extent to which the \LaTeXML\ schema can be mapped to the desired one, and to what extent \LaTeXML\ has lost or hidden information represented in the original document. Again, familiarity with the schema is needed, and the provided \XHTML\ stylesheets may suggest an approach.

NOTE: I'm trying to make stylesheets \emph{easily} customizable. However, this is getting tricky.
\begin{itemize}
\item You can import stylesheets, which allows the templates to be overridden.
\item You can call the overridden stylesheet using \texttt{apply-imports}.
\item You can \emph{not} call \texttt{apply-imports} to call an overridden \emph{named} template! (although you seemingly can override them?)
\item You can refer to xslt modules using URNs, provided you have loaded the \texttt{LaTeXML.catalog}:
{\small
\begin{lstlisting}[style=xml]
\end{lstlisting}
}
\end{itemize}

\subsection{CSS}\label{customization.latexmlpost.css}
\CSS\ stylesheets can be supplied to \ltxcmd{latexmlpost} to be included in the generated documents in addition to, or as a replacement for, the standard stylesheet \texttt{LaTeXML.css}. See the directory \textit{installationdir}\texttt{/LaTeXML/style/} for samples.

To best take advantage of this capability so as to design \CSS\ rules with the correct specificity, the following points are helpful:
\begin{itemize}
\item \LaTeXML\ converts the \TeX\ to its own schema, with structural elements (like \elementref{equation}) getting their own tag; others are transformed to something more generic, such as \elementref{note}. In the latter case, a class attribute is often used to distinguish them. For example, a \verb|\footnote| generates
\begin{lstlisting}[style=xml]
...
\end{lstlisting}
whereas an \verb|\endnote| generates
\begin{lstlisting}[style=xml]
...
\end{lstlisting}
\item The provided \XSLT\ stylesheets transform \LaTeXML's schema to \XHTML, generating a combined class attribute consisting of any class attributes already present as well as the \LaTeXML\ tag name. However, there are some variations on the theme. For example, \LaTeX's \verb|\section| yields a \LaTeXML\ element \elementref{section}, with a \elementref{title} element underneath. When transformed to \XHTML, the former becomes a \xmlcode{
}, while the latter becomes \xmlcode{

} (for example, the h-level may vary with the document structure).
\end{itemize}

\paragraph{Mode \texttt{begin} and \texttt{end}}
For most elements, once the main html element has been opened and the primary attributes have been added but before any content has been added, a template with mode \texttt{begin} is called; thus it can add either attributes or content. Just before closing the main html element, a template with mode \texttt{end} is called.

\paragraph{Computing class and style}
Templates with mode \texttt{classes} and \texttt{styling}.

%%%======================================================================
\chapter{Mathematics}\label{math}
%\ltxcode|DefMath($prototype,$replacement,%options)|
There are several issues that have to be dealt with in treating the mathematics. On the one hand, the \TeX\ markup gives a pretty good indication of what the author wants the math to look like, and so we would seem to have a good handle on the conversion to presentation forms. On the other hand, content formats are desirable as well; there are a few, but too few, clues about what the intent of the mathematics is. And in fact, the generation of even Presentation MathML of high quality requires recognizing the mathematical structure, if not the actual semantics. The mathematics processing must therefore preserve the presentational information provided by the author, while inferring, likely with some help, the mathematical content.

From a parsing point of view, the \TeX-like processing serves as the lexer, tokenizing the input which \LaTeXML\ will then parse [perhaps eventually a type-analysis phase will be added]. Of course, there are a few twists. For one, the tokens, represented by \elementref{XMTok}, can carry extra attributes such as font and style, but also the name, meaning and grammatical role, with defaults that can be overridden by the author (more on those in a moment).
Another twist is that, although \LaTeX's math markup is not nearly as semantic as we might like, there is considerable semantics and structure in the markup that we can exploit. For example, given a \verb|\frac|, we've already established the numerator and denominator which can be parsed individually, but the fraction as a whole can be directly represented as an application, using \elementref{XMApp}, of a fraction operator; the resulting structure can be treated as atomic within its containing expression. This \emph{structure preserving} character greatly simplifies the parsing task and helps reduce misinterpretation.

The parser, invoked by the postprocessor, works only with the top-level lists of lexical tokens, or with those sublists contained in an \elementref{XMArg}. The grammar works primarily through the name and grammatical role. The name is given by an attribute, or the content if it is the same. The role (things like ID, FUNCTION, OPERATOR, OPEN, \ldots) is also given by an attribute, or, if not present, the name is looked up in a document-specific dictionary (\varfile[dict]{jobname}), or in a default dictionary.

Additional exceptions that need fuller explanation are:
\begin{itemize}
\item \ltxpod{Core::Definition::Constructor}s may wish to create a dual object (\elementref{XMDual}) whose children are the semantic and presentational forms.
\item Spacing and similar markup generates \elementref{XMHint} elements, which are currently ignored during parsing, but probably shouldn't be.
\end{itemize}

%%%----------------------------------------------------------------------
\section{Math Details}\label{math.details}
\LaTeXML\ processes mathematical material by proceeding through several stages:
\begin{itemize}
\item Basic processing of macros, primitives and constructors resulting in an XML document; the math is primarily represented by a sequence of tokens (\elementref{XMTok}) or structured items (\elementref{XMApp}, \elementref{XMDual}) and hints (\elementref{XMHint}, which are ignored).
\item Document tree rewriting, where rules are applied to modify the document tree. User supplied rules can be used here to clarify the intent of markup used in the document.
\item Math Parsing; a grammar based parser is applied, depth first, to each level of the math. In particular, it is applied at the top level of each math expression, as well as to each subexpression within structured items (these will have been contained in an \elementref{XMArg} or \elementref{XMWrap} element). This results in an expression tree that will hopefully be an accurate representation of the expression's structure, but may be ambiguous in specifics (eg. what the meaning of a superscript is). The parsing is driven almost entirely by the grammatical \attr{role} assigned to each item.
\item \emph{Not yet implemented}: a following stage must be developed to resolve the semantic ambiguities by analyzing and augmenting the expression tree.
\item Target conversion: from the internal \texttt{XM*} representation to \MathML\ or \OpenMath.
\end{itemize}

The \elementref{Math} element is a top-level container for any math mode material, serving as the container for various representations of the math including images (through attributes \attr{mathimage}, \attr{width} and \attr{height}), textual (through attributes \attr{tex}, \attr{content-tex} and \attr{text}), \MathML\ and the internal representation itself. The \attr{mode} attribute specifies whether the math should be in display or inline mode.
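
As a small illustration of the stages above, consider the following input. This is only a sketch: the comments paraphrase the description given in this chapter and do not show actual \LaTeXML\ output.
\begin{verbatim}
% Inline math: the resulting Math element would be in inline mode.
The sum $a + b$ is positive.

% Display math: the resulting Math element would be in display mode.
% \frac is structured markup, so its numerator and denominator are
% parsed individually, and the fraction as a whole is treated as a
% single item within the enclosing expression.
\[ \frac{a+b}{c} = f(x) \]
\end{verbatim}
Here the letters and operators start out as individual tokens; whether \verb|f| is ultimately treated as a function applied to \verb|(x)| depends on the \attr{role} assigned to it, which a rewrite rule could supply.
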
\subsection{Internal Math Representation}\label{math.details.representation}
The \elementref{XMath} element is the container for the internal representation.

The following attributes can appear on all \texttt{XM*} elements:
\begin{description}
\item[\attr{role}] the grammatical role that this element plays
\item[\attr{open}, \attr{close}] parentheses or delimiters that were used to wrap the expression represented by this element.
\item[\attr{argopen}, \attr{argclose}, \attr{separators}] delimiters on a function or operator (the first element of an \elementref{XMApp}) that were used to delimit the arguments of the function. The \attr{separators} value is a string of the punctuation characters used to separate arguments.
\item[\attr{xml:id}] a unique identifier to allow reference (\elementref{XMRef}) to this element.
\end{description}

\paragraph{Math Tags}
The following tags are used for the intermediate math representation:
\begin{description}
\item[\elementref{XMTok}] represents a math token. It may contain text for presentation. Additional attributes are:
  \begin{description}
  \item[\attr{name}] the name that represents the \emph{meaning} of the token; this overrides the content for identifying the token.
  \item[\attr{omcd}] the \OpenMath\ content dictionary that the name belongs to.
  \item[\attr{font}] the font to be used for presenting the content.
  \item[\attr{style}] ?
  \item[\attr{size}] ?
  \item[\attr{stackscripts}] whether scripts should be stacked above/below the item, instead of the usual script position.
  \end{description}
\item[\elementref{XMApp}] represents the generalized application of some function or operator to arguments. The first child element is the operator, the remaining elements are the arguments. Additional attributes:
  \begin{description}
  \item[\attr{name}] the name that represents the meaning of the construct as a whole.
  \item[\attr{stackscripts}] ?
  \end{description}
\item[\elementref{XMDual}] combines representations of the content (the first child) and presentation (the second child), useful when the two structures are not easily related.
\item[\elementref{XMHint}] represents spacing or other apparently purely presentational material.
  \begin{description}
  \item[\attr{name}] names the effect that the hint was intended to achieve.
  \item[\attr{style}] ?
  \end{description}
\item[\elementref{XMWrap}] serves to assert the expected type or role of a subexpression that may otherwise be difficult to interpret; the parser is more forgiving about these.
  \begin{description}
  \item[\attr{name}] ?
  \item[\attr{style}] ?
  \end{description}
\item[\elementref{XMArg}] serves to wrap individual arguments or subexpressions, created by structured markup, such as \verb|\frac|. These subexpressions can be parsed individually.
  \begin{description}
  \item[\attr{rule}] the grammar rule that this subexpression should match.
  \end{description}
\item[\elementref{XMRef}] refers to another subexpression. This is used to avoid duplicating arguments when constructing an \elementref{XMDual} to represent a function application, for example. The arguments will be placed in the content branch (wrapped in an \elementref{XMArg}) while \elementref{XMRef}'s will be placed in the presentation branch.
  \begin{description}
  \item[\attr{idref}] the identifier of the referenced math subexpression.
  \end{description}
\end{description}

\subsection{Grammatical Roles}\label{math.details.roles}
As mentioned above, the grammar takes advantage of the structure (however minimal) of the markup. Thus, the grammar is applied in layers, to sequences of tokens or \emph{atomic} subexpressions (like fractions or arrays). It is the \attr{role} attribute that indicates the syntactic and/or presentational nature of each item.
On the one hand, this drives the parsing: the grammar rules are keyed on the \attr{role} (say, \code{ADDOP}), rather than content (say + or -), of the nodes [In some cases, the content is used to distinguish special synthesized roles]. On the other hand, the \attr{role} is also used to drive the conversion to presentation markup (say, as an infix operator), especially Presentation \MathML. Some values of \attr{role} are used only in the grammar, some are only used in presentation; most are used both ways.

The following grammatical roles are recognized by the math parser. These values can be specified in the \attr{role} attribute during the initial document construction or by rewrite rules. Although the precedence of operators is loosely described in the following, since the grammar contains various special case productions, no rigidly ordered precedence is given. Also note that in the current design, an expression has only a single role, although that role may be involved in grammatical rules with distinct syntax and semantics (some roles directly reflect this ambiguity).
\begin{description}
\item[\code{ATOM}] a general atomic subexpression (atomic at the level of the expression; it may have internal structure);
\item[\code{ID}] a variable-like token, whether scalar or otherwise, but not a function;
\item[\code{NUMBER}] a number;
\item[\code{ARRAY}] a structure with internal components and alignments; typically has a particular syntactic relationship to \code{OPEN} and \code{CLOSE} tokens.
\item[\code{UNKNOWN}] an unknown expression. This is the default for token elements. Such tokens are treated essentially as \code{ID}, but generate a warning if they seem to be used as functions.
\item[\code{OPEN},\code{CLOSE}] opening and closing delimiters, which group expressions or enclose arguments, among other structures;
\item[\code{MIDDLE}] a middle operator used to group items between an \code{OPEN}, \code{CLOSE} pair;
\item[\code{PUNCT},\code{PERIOD}] punctuation; a period `ends' a formula (note that numbers, including floating point, are recognized earlier in processing);
\item[\code{VERTBAR}] a vertical bar (single or doubled) which serves a confusing variety of notations: absolute values, ``at'', divides;
\item[\code{RELOP}] a relational operator, loosely binding;
\item[\code{ARROW}] an arrow operator (with little semantic significance), but generally treated equivalently to \code{RELOP};
\item[\code{METARELOP}] an operator used for relations between relations, with lower precedence;
\item[\code{MODIFIER}] an atomic expression following an object that `modifies' it in some way, such as a restriction $(<0)$ or modulus expression;
\item[\code{MODIFIEROP}] an operator (such as mod) between two expressions such that the latter modifies the former;
\item[\code{ADDOP}] an addition operator, between \code{RELOP} and \code{MULOP} operators in precedence;
\item[\code{MULOP}] a multiplicative operator, higher precedence than \code{ADDOP};
\item[\code{BINOP}] a generic infix operator, can act as either an \code{ADDOP} or \code{MULOP}, typically used for cases wrapped in \verb|\mathbin|;
\item[\code{SUPOP}] An operator appearing \emph{in} a superscript, such as a collection of primes, or perhaps a T for transpose.
This is distinct from an expression in a superscript with an implied power or index operator;
\item[\code{PREFIX}] for a prefix operator;
\item[\code{POSTFIX}] for a postfix operator;
\item[\code{FUNCTION}] a function which (may) apply to following arguments with higher precedence than addition and multiplication, or to parenthesized arguments (enclosed between \code{OPEN},\code{CLOSE});
\item[\code{OPFUNCTION}] a variant of \code{FUNCTION} which doesn't require fenced arguments;
\item[\code{TRIGFUNCTION}] a variant of \code{OPFUNCTION} with special rules for recognizing which following tokens are arguments and which are not;
\item[\code{APPLYOP}] an explicit infix application operator (high precedence);
\item[\code{COMPOSEOP}] an infix operator that composes two \code{FUNCTION}s (resulting in another \code{FUNCTION});
\item[\code{OPERATOR}] a general operator; higher precedence than function application. For example, for an operator $A$, and function $F$, $A F x$ would be interpreted as $(A(F))(x)$;
\item[\code{SUMOP},\code{INTOP}, \code{LIMITOP},\code{DIFFOP},\code{BIGOP}] a summation/union, integral, limiting, differential or general purpose operator. These are treated equivalently by the grammar, but are distinguished to facilitate (\emph{eventually}) analyzing the argument structure (eg bound variables and differentials within an integral). \textbf{Note} are \code{SUMOP} and \code{LIMITOP} significantly different in this sense?
\item[\code{POSTSUBSCRIPT},\code{POSTSUPERSCRIPT}] intermediate form of sub- and superscript, roughly as \TeX\ processes them. The script is (essentially) treated as an argument but the base will be determined by parsing.
\item[\code{FLOATINGSUBSCRIPT},\code{FLOATINGSUPERSCRIPT}] A special case for a sub- and superscript on an empty base, ie. \verb|{}^{x}|. It is often used to place a pre-superscript or for non-math uses (eg.
\verb|10${}^{th}|);
\end{description}

The following roles are not used in the grammar, but are used to capture the presentation style; they are typically used directly in macros that construct structured objects, or used in representing the results of parsing an expression.
\begin{description}
\item[\code{STACKED}] corresponds to stacked structures, such as \verb|\atop|, and the presentation of binomial coefficients.
\item[\code{SUPERSCRIPTOP},\code{SUBSCRIPTOP}] after parsing, the operator involved in various sub/superscript constructs above will be converted to these;
\item[\code{OVERACCENT},\code{UNDERACCENT}] these are special cases of the above that indicate the 2nd operand acts as an accent (typically smaller); expressions using these roles are usually directly constructed for accenting macros;
\item[\code{FENCED}] this operator is used to represent containers enclosed by \code{OPEN} and \code{CLOSE}, possibly with punctuation, particularly when no semantic is known for the construct, such as an arbitrary list.
\end{description}

The content of a token is actually used in a few special cases to distinguish distinct syntactic constructs, but these roles are \emph{not} assigned to the \attr{role} attribute of expressions:
\begin{description}
\item[\code{LANGLE},\code{RANGLE}] recognizes use of $<$ and $>$ in the bra-ket notation used in quantum mechanics;
\item[\code{LBRACE},\code{RBRACE}] recognizes use of \{ and \} on either side of stacked or array constructions representing various kinds of cases or choices;
\item[\code{SCRIPTOPEN}] recognizes the use of \{ in opening specialized set notations.
\end{description}

%%%======================================================================
% \part{Advanced Topics}
%%%======================================================================
\chapter{Localization}\label{localization}
In this chapter, a few issues relating to various national or cultural styles, languages or text encodings, which we'll refer to collectively as `localization', are briefly discussed.

\section{Numbering}\label{localization.numbering}
Generally, when titles and captions are formatted or when equations are numbered and when they are referred to in a cross reference or table of contents, text consisting of some combination of the raw title or caption text, a reference number and a type name (eg.~`Chapter') or symbol (eg.~\S) is composed and used. The exact composition that is used at each level can depend on language, culture, the subject matter as well as both journal and individual style preferences. \LaTeX\ has evolved to accommodate many of these styles and \LaTeXML\ attempts to follow that lead, while preserving its options (the demands of extensively hyper-linked online material sometimes seem to demand more options and flexibility than traditional print formatting).

For example, the various macros \cs{chaptername}, \cs{partname}, \cs{refname}, etc. are respected and used. Likewise, the various counters and formatters such as \cs{theequation} are supported.

\LaTeX's mechanism for formatting caption tags (\cs{fnum@figure} and \cs{fnum@table}) is extended to cover more cases. If you define \cs{fnum@\textit{type}} (where \textit{type} is \texttt{chapter}, \texttt{section}, \texttt{subsection}, etc.) it will be used to format the reference number and/or type name for instances of that \textit{type}. The macro \cs{fnum@toc@\textit{type}} is used when formatting numbers for tables of contents.
Alternatively, you can define a macro \cs{format@title@\textit{type}} that will be used to format the whole title including reference number and type as desired; it takes a single argument, the title text. The macro \cs{format@toctitle@\textit{type}} is used for formatting the (typically) short form used in tables of contents.

\section{Input Encodings}\label{localization.inputencodings}
\LaTeXML\ supports the standard \LaTeX\ mechanism for handling non-ASCII encodings of the input \TeX\ sources: using the \code{inputenc} package. The \LaTeXML\ binding of \code{inputenc} loads the encoding definition (generally with extension \code{def}) directly from the \LaTeX\ distribution (which are generally well-enough behaved to be easily processed). These encoding definitions make the upper 128 code points (of 8 bit) active and define \TeX\ macros to handle them.

Using the commandline option \shellcode{--inputencoding=utf8} to \ltxcmd{latexml} allows processing of sources encoded as utf8, without any special packages loaded. [future work will make \LaTeXML\ compatible with xetex]

\section{Output Encodings}\label{localization.outputencodings}
At some level, as far as \TeX\ is concerned, what you type ends up pointing into a font that causes some blob of ink to be printed. This mechanism is used to print a unique mathematical operator, say `subset of and not equals'. It is also used to print Greek when you seem to have been typing ASCII!

So, we must accommodate that mechanism, as well. At the stage when character tokens are digested to create boxes in the current font, a font encoding table (a FontMap) is consulted to map the token's text (viewed as an index into the table) to Unicode. The declaration \code{DeclareFontMap} is used to associate a FontMap with an encoding name, or font.

Note that this mapping is only used for text originating from the source document; the text within Constructor's \XML\ pattern is used \emph{without} any such font conversion.
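
To make the numbering and input encoding mechanisms of this chapter concrete, a document along the following lines could be used. This is only a sketch: the choice of \code{latin1} and the particular formats given to \cs{fnum@section} and \cs{format@title@chapter} are illustrative assumptions, not defaults of \LaTeXML.
\begin{verbatim}
\documentclass{book}
% Read the source as Latin-1 through the standard inputenc mechanism.
\usepackage[latin1]{inputenc}

\makeatletter
% Assumed format: show section numbers as a section sign and number.
\def\fnum@section{\S\thesection}
% Assumed format: whole chapter titles as "Chapter N: Title";
% the single argument is the title text.
\def\format@title@chapter#1{\chaptername~\thechapter: #1}
\makeatother

\begin{document}
\chapter{Introduction}
\section{Overview}
\end{document}
\end{verbatim}
Using \verb|\def| rather than \verb|\newcommand| avoids an error should either macro already have a definition.
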
\section{Babel}\label{localization.babel}
The \code{babel} package supports multiple languages by redefining various internal bits of text to replace, eg. ``Chapter'' by ``Kapitel'', and by defining various shorthand mechanisms to make it easy to type the extra non-latin characters and glyphs used by those languages. Each supported language or dialect has a module which is loaded to provide the needed definitions.

To the extent: that \LaTeXML's input and output encoding handling is sufficient; that its processing of raw \TeX\ is good enough; and that it proceeds through the appropriate \LaTeX\ internals, \LaTeXML\ should be able to support \code{babel} and arbitrary languages by reading in the raw \TeX\ implementation of the language module from the \TeX\ distribution itself. At least, that is the strategy that we use.

%%%======================================================================
\chapter{Alignments}\label{alignments}
There are several situations where \TeX\ stacks or aligns a number of objects into one or two dimensional grids. In most cases, these are built upon low-level primitives, like \verb|\halign|, and so share characteristics: using \& to separate alignment columns; either \verb|\\| or \verb|\cr| to separate rows. Yet, there are many different markup patterns and environments used for quite different purposes from tabular text to math arrays to composing symbols and so it is worth recognizing the intended semantics in each case, while still processing them as \TeX\ would.

In this chapter, we will describe some of the special complications presented by alignments and the strategies used to infer and represent the appropriate semantic structures, particularly for math.

\section{\TeX\ Alignments}\label{texalignments}
\textbf{NOTE} This section needs to be written.

Many utilities for setting up and processing alignments are defined in \code{TeX.pool} with support from the module \ltxpod{Core::Alignment}.
Typically, one binds a set of control sequences specially for the alignment environment or structure encountered, particularly for \& and \verb|\\|. An alignment object is created which records information about each row and cell that was processed, such as width, alignment, span, etc. Then the alignment is converted to XML by specifying what tag wraps the entire alignment, each row and each cell.

The content of alignments is expanded before the column and row markers are recognized; this allows more flexibility in defining markup since row and column markers can be hidden in macros, but it also means that simple means of parsing the structure, such as delimited parameter lists, won't work.

\section{Tabular Header Heuristics}\label{tabular}
\emph{To be written}

\section{Math Forks}\label{mathfork}
There are several constructs for aligning mathematics in \LaTeX, and common packages. Here we are concerned with the large scale alignments where one or more equations are displayed in a grid, such as \code{eqnarray}, in standard \LaTeX, and a suite of constructs of the amsmath packages. The arrangements are worth preserving as they often convey important information to the reader by the grouping, or by drawing attention to similarities or differences in the formulae. At the same time, the individual fragments within the grid cells often have little `meaning' on their own: it is subsequences of these fragments that represent the logical mathematical objects or formulae. Thus, we would also like to recognize those sequences and synthesize complete formulae for use in content-oriented services. We therefore have to devise an \XML\ structure to represent this duality, as well as developing strategies for inferring and rearranging the mathematics as it was authored into the desired form.
The needed structure shares some characteristics with \elementref{XMDual}, \emph{which needs to be described}, but needs to reside at the document level, containing several, possibly numbered, equations each of which provides two views. Additional objects, such as textual insertions (such as amsmath's \verb|\intertext|), must also be accommodated. The following \XML\ is used to represent these structures:
\begin{lstlisting}[style=xml]
@\textit{logical math here}@
@\textit{cell math}@@\ldots@
@\emph{or}@
@\ldots@
@\textit{inter-text}@
@\ldots \textit{more text or equations}@
\end{lstlisting}
Typically, the contents of the \elementref{MathBranch} will be a sequence of \elementref{td}, each containing a \elementref{Math}, or of \elementref{tr}, each containing a sequence of such \elementref{td}. This structure can thus represent both \code{eqnarray}, where a logical equation consists of one or more complete rows, as well as AMS' \code{aligned}, where equations consist of pairs of columns. The \XSLT\ transformation that converts to end formats recognizes which case applies and lays out appropriately.

In most cases, the material that will yield a \elementref{MathFork} is given as a set of partial math expressions representing rows and/or columns; these must be concatenated (and parsed) to form the composite logical expression. Any ID's within the expressions (and references to them) must be modified to avoid duplicate ids. Moreover, a useful application associates the displayed tokens from the aligned presentation of the \elementref{MathBranch} with the presumably semantic tokens in the logical content of the main branch of the \elementref{MathFork}. Thus, we desire the IDs in the two branches to have a known relationship; in particular, those in the branch should have \code{.fork1} appended.
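
As an illustration of the kind of input this section is concerned with, the following sketch shows the two layouts mentioned above. The formulae themselves are arbitrary examples, and the comments describe the intended grouping rather than actual \LaTeXML\ output; the second environment assumes that the amsmath package is loaded.
\begin{verbatim}
% Rows 1-2 form one logical equation: row 2 leaves the first column
% empty and so continues row 1.  Row 3 is a second logical equation.
\begin{eqnarray}
  (a+b)^2 & = & (a+b)(a+b) \nonumber \\
          & = & a^2 + 2ab + b^2 \\
  (a-b)^2 & = & a^2 - 2ab + b^2
\end{eqnarray}

% One row holding two logical equations, one per pair of columns.
\begin{align}
  x &= r\cos\theta  &  y &= r\sin\theta
\end{align}
\end{verbatim}
In the first case a logical equation has to be assembled from the cells of several rows; in the second, from a pair of adjacent cells within a single row.
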
\section{eqnarray}\label{eqnarray}
The \code{eqnarray} environment seems intended to represent one or more equations, but each equation can be continued with additional right-hand-sides (by omitting the 1st column), or the RHS itself can be continued on multiple lines by omitting the 1st two columns on a row. With our goal of constructing well-structured mathematics, this gives us a fun little puzzle to sort out. However, being essentially the only structure for aligning mathematical stuff in standard \LaTeX, \code{eqnarray} tended to be stretched into various other use cases; aligning numbered equations with bits of text on the side, for example. We therefore have some work to do to guess what the intent is.

The strategy used for \code{eqnarray} is to process the material as an alignment in math mode and convert initially to the following \XML\ structure:
\begin{lstlisting}[style=xml]
@\textit{column math here}@
@\ldots@
@\ldots@
\end{lstlisting}
The results are then studied to recognize the patterns of empty columns so that the rows can be regrouped into logical equations. \elementref{MathFork} structures are used to contain those logical equations while preserving the layout in the \elementref{MathBranch}.

\textbf{NOTE} We need to deal better with the cases that have more rows numbered than we would like.

\section{AMS Alignments}\label{amsalign}
The AMS math packages define a number of useful math alignment structures. These have been well thought out and designed with particular logical structures in mind, as well as the layout. Thus these environments are less often abused than is \code{eqnarray}.

In this section, we list the environments, their expected use case and describe the strategy used for converting them.

\emph{To be done} Describe alternates for \code{equation} and things inside equations; Describe single vs multiple logical equations.
(and starred variants)

This list outlines the \emph{intended} use of the AMS alignment environments.

The following constructs are intended as top-level environments, used like \code{equation}. Several of the constructs are used in place of a top-level \code{equation} and represent one or more logical equations. The following describes the intended usage, as a guide to understanding the implementation code (or its limitations!)
\begin{itemize}
\item \code{align},\code{flalign},\code{alignat},\code{xalignat}: Each row may be numbered; there is an even number of columns; each pair of columns, aligned right then left, represents a logical equation. Note that the documentation suggests that annotative text can be added by putting \verb|\text{}| in a column followed by an empty column.
\item \code{gather}: Each row is a single centered column representing an equation.
\item \code{multline}: This environment represents a single equation broken to multiple lines; the lines are aligned left, center (repeated) and finally, right. \emph{alignment not yet implemented}
\end{itemize}

The following environments are used \emph{within} an equation (or similar) environment and thus do not generate \elementref{MathFork} structures. Moreover, except for \code{aligned}, their semantic intent is less clear. The preservation of the alignment has not yet been implemented; they presumably would yield an \elementref{XMDual}.
\begin{itemize}
\item \code{split}
\item \code{gathered}
\item \code{aligned},\code{alignedat}
\end{itemize}
Note that the case of a single equation containing a single \code{aligned} is transformed into and treated equivalently to a top-level \code{align}.

%% \section{DLMF Alignments}
%% I'll just mention \code{equationmix},\code{equationgroup}.
%%%======================================================================
\chapter{Metadata}\label{metadata}
\section{RDFa}\label{RDFa}
\LaTeXML\ has support for representing and generating RDFa metadata in \LaTeXML\ documents.
The core attributes \attr{property}, \attr{rel}, \attr{rev}, \attr{about}, \attr{resource}, \attr{typeof} and \attr{content} are included. Provision is also made for \attr{about} and \attr{resource} to be specified using \LaTeX-style labels, or plain \XML\ IDs. The default set of vocabularies is specified in \URL[HTML Role Vocabulary]{http://www.w3.org/1999/xhtml/vocab/#XHTMLRoleVocabulary}, and the associated set of prefixes are predefined.

It is intended that the support will be extended to automatically generate RDFa data from the implied semantics of \LaTeX\ markup; the idea would be not to inadvertently override any explicitly provided metadata supplied by one of the following packages.

\paragraph{The hyperref package}
The hyperref and hyperxmp packages provide a means to specify metadata which will be embedded in the generated PDF file; \LaTeXML\ converts that data to RDFa in its generated XML.

\paragraph{The lxRDFa package}
There is also a \LaTeXML-specific package, lxRDFa, which provides several commands for annotating the generated XML. The most powerful of these is \verb|\lxRDFa|, which allows you to specify any set or subset of RDFa attributes on the current XML element and thus take advantage of the arbitrary shorthands, chaining and partial triples that RDFa allows. Correspondingly, you must beware of clashes or unintended changes to the set of triples generated by explicit and hidden RDFa data.

%%%======================================================================
\chapter{ToDo}\label{todo}
Lots\ldots!
\begin{itemize}
\item Many useful \LaTeX\ packages have not been implemented, and those that are aren't necessarily complete. Contributed bindings are, of course, welcome!
\item Low-level \TeX\ capabilities, such as text modes (eg. vertical, horizontal), box details like width and depth, as well as fonts, aren't mimicked faithfully, although it isn't clear how much can be done at the `semantic' level.
\item a richer math grammar, or more flexible parsing engine, better inferencing of math structure, better inferencing of math \emph{meaning}\ldots and thus better Content MathML and OpenMath support!
\item Could be faster.
\item Easier customization of the document schema, XSLT stylesheets.
\item \ldots um, \ldots \emph{documentation}!
\end{itemize}

%%%======================================================================
\chapter*{Acknowledgements}\label{acknowledgements}
Thanks to the DLMF project and its Editors --- Frank Olver, Dan Lozier, Ron Boisvert, and Charles Clark --- for providing the motivation and opportunity to pursue this.

Thanks to the arXMLiv project, in particular Michael Kohlhase and Heinrich Stamerjohanns, for providing a rich testbed and testing framework to exercise the system. Additionally, thanks to Ioan Sucan, Catalin David and Silviu Oprea for testing help and for implementing additional packages.

Particular thanks go to Deyan Ginev as an enthusiastic supporter and developer.

%%%======================================================================
\appendix
\chapter[Commands]{Command Documentation}\label{commands}
% \input the 1st, to avoid quasi-blank page; include the rest
% Actually, \input them all, since there's too much blank space...
% % Make verbatim blocks smaller \makeatletter\def\verbatim@font{\small\normalfont\ttfamily}\makeatother \input{pods/latexml} \input{pods/latexmlpost} \input{pods/latexmlmath} %%%====================================================================== \chapter[Bindings]{Implemented Bindings}\label{included.bindings} Bindings for the following classes and packages are supplied with the distribution: \begin{description} \item[classes:] \CurrentClasses \item[packages:] \CurrentPackages \end{description} %%%====================================================================== \chapter[Modules]{Top-level Module Documentation}\label{modules} \input{pods/LaTeXML} \input{pods/LaTeXML_Global} \input{pods/LaTeXML_Package} \input{pods/LaTeXML_MathParser} \chapter[Common Modules]{Common Module Documentation}\label{commonmodules} \input{pods/LaTeXML_Common_Config} \input{pods/LaTeXML_Common_Object} \input{pods/LaTeXML_Common_Color} \input{pods/LaTeXML_Common_Color_rgb} \input{pods/LaTeXML_Common_Color_hsb} \input{pods/LaTeXML_Common_Color_cmy} \input{pods/LaTeXML_Common_Color_cmyk} \input{pods/LaTeXML_Common_Color_gray} \input{pods/LaTeXML_Common_Color_Derived} \input{pods/LaTeXML_Common_Number} \input{pods/LaTeXML_Common_Float} \input{pods/LaTeXML_Common_Dimension} \input{pods/LaTeXML_Common_Glue} \input{pods/LaTeXML_Common_Font} \input{pods/LaTeXML_Common_Model} \input{pods/LaTeXML_Common_Model_DTD} \input{pods/LaTeXML_Common_Model_RelaxNG} \input{pods/LaTeXML_Common_XML} \input{pods/LaTeXML_Common_Error} \chapter[Core Modules]{Core Module Documentation}\label{coremodules} \input{pods/LaTeXML_Core_State} % Core digestion \input{pods/LaTeXML_Core_Mouth} \input{pods/LaTeXML_Core_Gullet} \input{pods/LaTeXML_Core_Stomach} \input{pods/LaTeXML_Core_Document} \input{pods/LaTeXML_Core_Rewrite} % Core Objects \input{pods/LaTeXML_Core_Token} \input{pods/LaTeXML_Core_Tokens} \input{pods/LaTeXML_Core_Box} \input{pods/LaTeXML_Core_List} \input{pods/LaTeXML_Core_Comment} 
\input{pods/LaTeXML_Core_Whatsit} % Various objects \input{pods/LaTeXML_Core_Alignment} \input{pods/LaTeXML_Core_KeyVals} \input{pods/LaTeXML_Core_MuDimension} \input{pods/LaTeXML_Core_MuGlue} \input{pods/LaTeXML_Core_Pair} \input{pods/LaTeXML_Core_PairList} % Definitions \input{pods/LaTeXML_Core_Definition} \input{pods/LaTeXML_Core_Definition_CharDef} \input{pods/LaTeXML_Core_Definition_Conditional} \input{pods/LaTeXML_Core_Definition_Constructor} \input{pods/LaTeXML_Core_Definition_Expandable} \input{pods/LaTeXML_Core_Definition_Primitive} \input{pods/LaTeXML_Core_Definition_Register} \input{pods/LaTeXML_Core_Parameter} \input{pods/LaTeXML_Core_Parameters} %%%====================================================================== \chapter[Utility Modules]{Utility Module Documentation}\label{utilitymodules} \input{pods/LaTeXML_Util_Pathname} \input{pods/LaTeXML_Util_WWW} \input{pods/LaTeXML_Util_ObjectDB} \input{pods/LaTeXML_Util_Pack} %%%====================================================================== \chapter[Preprocessing Modules]{Preprocessing Module Documentation}\label{premodules} \input{pods/LaTeXML_Pre_BibTeX} %%%====================================================================== \chapter[Postprocessing Modules]{Postprocessing Module Documentation}\label{postmodules} \input{pods/LaTeXML_Post} \input{pods/LaTeXML_Post_MathML} \input{pods/LaTeXML_Post_OpenMath} % Restore verbatim font after PODs \makeatletter\def\verbatim@font{\normalfont\ttfamily}\makeatother %%%====================================================================== \chapter[Schema]{\LaTeXML\ Schema}\label{schema} The document type used by \LaTeXML\ is modular in the sense that it is composed of several modules that define different sets of elements related to, eg., inline content, block content, math and high-level document structure. This allows the possibility of mixing models or extension by predefining certain parameter entities. 
\input{schema}
%%%======================================================================
\chapter{Error Codes}\label{errorcodes}
Warning and Error messages are printed to STDERR during the execution of \ltxcmd{latexml} and \ltxcmd{latexmlpost}. As with \TeX, it is not always possible to indicate where the real underlying mistake originated; sometimes it is only realized later on that some problem has occurred, such as a missing brace. Moreover, whereas error messages from \TeX\ may be safely assumed to indicate errors with the source document, with \LaTeXML\ they may also indicate \LaTeXML's inability to figure out what you wanted, or simply bugs in \LaTeXML\ or the libraries it uses.
\begin{description}
\item[Warnings] generally indicate that the generated result may not be as good as it could be, but is most likely properly formed. A typical warning is that the math parser failed to recognize an expression.
\item[Errors] generally indicate a more serious problem that is likely to lead to a malformed result. A typical error would be an undefined control sequence. Generally, processing continues so that you can (hopefully) solve all errors at once.
\item[Fatals] are errors so serious as to make it unlikely that processing can continue; the system is likely to be out-of-sync, for example not knowing from which point in the input to continue reading. A fatal error is also generated when too many (typically 100) regular errors have been encountered.
\end{description}

Warning and Error messages are slightly structured to allow unattended processing of documents to classify the degree of success in processing. A typical message satisfies the following regular expression:
\begin{lstlisting}[escapechar=@,basicstyle=\ttfamily\small]
@severity@:@category@:@object@ @summary@
	@source locator@
	@description@
	@\ldots@
	@stack trace@
\end{lstlisting}
The second and following lines are indented using a tab.
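
Because the first line of every message begins with its severity and category, a saved log can be tallied with standard Unix tools. The following is only a sketch, not part of \LaTeXML: the file names \texttt{paper.tex}, \texttt{paper.xml} and \texttt{paper.log} are placeholders, and it assumes a Bourne-style shell with \texttt{grep}, \texttt{cut}, \texttt{sort} and \texttt{uniq} available.
{\small
\begin{verbatim}
# Messages are printed to STDERR; save them to a (hypothetical) log file.
latexml --destination=paper.xml paper.tex 2> paper.log

# Count messages per severity:category.  The indented detail lines
# start with a tab, so they do not match the pattern.
grep -E '^(Info|Warn|Error|Fatal):' paper.log | cut -d: -f1,2 | sort | uniq -c
\end{verbatim}
}
The individual fields of a message are as follows.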
\begin{description}
\item[\textit{severity}] One of \texttt{Info}, \texttt{Warn}, \texttt{Error} or \texttt{Fatal}, indicating the severity of the problem;
\item[\textit{category}] classifies the error or warning into an open-ended set of categories indicating whether something was \texttt{expected}, or \texttt{undefined};
\item[\textit{object}] indicates the offending object; what filename was missing, or which token was undefined;
\item[\textit{summary}] gives a brief readable summary of the condition;
\item[\textit{source locator}] indicates where in the source document the error occurred;
\item[\textit{description}] gives one or more lines of more detailed information;
\item[\textit{stack trace}] optionally gives a brief or long trace of the current execution stack.
\end{description}

The type is followed by one or more keywords separated by colons, then a space, and a human-readable error message. Generally, this line is followed by one or more lines describing where in the source document the error occurred (or was detected). For example:
{\small
\begin{verbatim}
Error:undefined:\foo The control sequence \foo is undefined.
\end{verbatim}
}
Some of the more common keywords following the message type are listed below, where we assume that \textit{arg} is the second keyword (if any).

The following errors are generally due to malformed \TeX\ input, incomplete \LaTeXML\ bindings, or bindings that do not properly account for the way \TeX, or the macros, are actually used.
\begin{description}
\item[\texttt{undefined}]: The operation indicated by \textit{arg}, typically a control sequence or other operation, is undefined.
\item[\texttt{ignore}]: Indicates that \textit{arg} is being ignored; typically it is a duplicated definition, or a definition of something that cannot be redefined.
\item[\texttt{expected}]: A particular token, or other type of data object, indicated by \textit{arg}, was expected in the input but was missing.
\item[\texttt{unexpected}]: \textit{arg} was not expected to appear in the input.
\item[\texttt{not\_parsed}]: A mathematical formula could not be successfully parsed.
\item[\texttt{missing\_file}]: the file \textit{arg} could not be found.
\item[\texttt{latex}]: An error or message generated from \LaTeX\ code; the corresponding \LaTeXML\ code should be updated.
\item[\texttt{too\_many\_errors}]: Too many non-fatal errors were encountered, causing a Fatal error and program termination.
\end{description}

The following errors are more likely to be due to programming errors in the \LaTeXML\ core, or in binding files, or in the document model.
\begin{description}
\item[\texttt{misdefined}]: The operation indicated by \textit{arg}, typically a control sequence or other operation, has not been defined properly.
\item[\texttt{deprecated}]: Indicates that \textit{arg} is a deprecated usage.
\item[\texttt{malformed}]: The document is malformed, or will be made so by inserting \textit{arg} into it.
\item[\texttt{I/O}]: some problem with input/output of the file \textit{arg}, such as it not being readable. The exact error is reported in the additional details.
\item[\texttt{perl}]: A perl-level error or warning, not specifically recognized by \LaTeXML, was encountered. \textit{arg} will typically be \texttt{die}, \texttt{interrupt} or \texttt{warn}.
\item[\texttt{internal}]: Something unexpected happened; most likely an internal coding error within \LaTeXML.
\end{description}

%%%======================================================================
\chapter{CSS Classes}\label{cssclasses}
When the target format is in the HTML family (XHTML, HTML or HTML5), \LaTeXML\ adds various classes to the generated html elements. This provides a trail back to the originating markup, and leverage to apply CSS styling to the results. Recall that the class attribute is a space-separated list of class names. This appendix describes the class names used.
The basic strategy is the following:
\begin{description}
\item[\texttt{ltx\_}\textit{element}] with \textit{element} being the \LaTeXML\ element name that generated the html element. These elements reflect the original \TeX/\LaTeX\ markup, but are not identical. See Appendix \ref{schema} for details.
\item[\texttt{ltx\_font\_}\textit{font}] where \textit{font} can indicate any of the font characteristics:
  \begin{description}
  \item[\textit{family}]: \texttt{serif}, \texttt{sansserif}, \texttt{typewriter}, \texttt{caligraphic}, \texttt{fraktur}, \texttt{script};
  \item[\textit{series}]: \texttt{bold}, \texttt{medium};
  \item[\textit{shape}]: \texttt{upright}, \texttt{italic}, \texttt{slanted}, \texttt{smallcaps};
% \item[\textit{size}]: \texttt{TINY}, \texttt{Tiny}, \texttt{tiny}, \texttt{script},
% \texttt{footnote}, \texttt{small}, \texttt{normal}, \texttt{large},
% \texttt{Large}, \texttt{LARGE}, \texttt{huge}, \texttt{Huge}, \texttt{HUGE}.
  \end{description}
  These sets are open-ended.
\item[\texttt{ltx\_align\_}\textit{alignment}] where \textit{alignment} indicates the alignment of the contents within the element.
  \begin{description}
  \item[\textit{horizontally}]: \texttt{left}, \texttt{right}, \texttt{center}, \texttt{justify};
  \item[\textit{vertically}]: \texttt{top}, \texttt{bottom}, \texttt{baseline}, \texttt{middle}.
  \end{description}
\item[\texttt{ltx\_border\_}\textit{edges}] indicates single or double borders on an element with \textit{edges} being: \texttt{t}, \texttt{r}, \texttt{b}, \texttt{l}, \texttt{tt}, \texttt{rr}, \texttt{bb}, \texttt{ll}; these are typically used for table cells.
\item[\texttt{ltx\_role\_}\textit{role}] reflects the distinct uses that a particular \LaTeXML\ element serves, as indicated by the \attr{role} attribute. Examples include \elementref{creator}, for `document creators', where the \attr{role} may be \texttt{author}, \texttt{editor}, \texttt{translator} or others.
Thus, depending on your purposes and the expected markup, you might choose to write CSS rules for \texttt{ltx\_creator} or \texttt{ltx\_role\_author}. Similarly, \elementref{quote} is stretched to accommodate \texttt{translation} or \texttt{verse}.
\item[\texttt{ltx\_title\_}\textit{section}] marks the titles of various sectional units. For example, a chapter's title will have two classes: \texttt{ltx\_title} and \texttt{ltx\_title\_chapter}.
\item[\texttt{ltx\_theorem\_}\textit{type}] marks various types of `theorem-like' objects, where the \textit{type} is whatever was used in \verb|\newtheorem|.
\item[\texttt{ltx\_float\_}\textit{type}] marks various types of floating objects, such as might be defined using the \texttt{float} package using \verb|\newfloat|.
\item[\texttt{ltx\_lst\_}\textit{role}] reflects the various roles of items within listings, such as those created using the \texttt{listings} package (whose containing element would have class \texttt{ltx\_lstlisting}). Such classes include: \texttt{ltx\_lst\_language\_}\textit{lang}, \texttt{ltx\_lst\_}\textit{keywordclass}, \texttt{ltx\_lst\_line}, \texttt{ltx\_lst\_linenum}.
\item[\texttt{ltx\_bib\_}\textit{item}] indicates various items in bibliographies, typically generated via \BibTeX; the items include \texttt{key}, \texttt{number}, \texttt{type}, \texttt{author}, \texttt{editor}, \texttt{year}, \texttt{title}, \texttt{author-year}, \texttt{edition}, \texttt{series}, \texttt{part}, \texttt{journal}, \texttt{volume}, \texttt{number}, \texttt{status}, \texttt{pages}, \texttt{language}, \texttt{publisher}, \texttt{place}, \texttt{status}, \texttt{crossref}, \texttt{external}, \texttt{cited} and \emph{others}.
\item[\texttt{ltx\_toc\_}\textit{item}] reflects the levels of Table of Contents lists: they carry the \texttt{ltx\_toclist} class, from the element used to represent them, and also \texttt{ltx\_toc\_}\textit{section} naming the sectional unit for which this list applies to assist in styling.
A nested TOC for a chapter might thus have \texttt{ul}'s carrying \texttt{ltx\_toc\_chapter} and \texttt{ltx\_toc\_section}. Additionally, \texttt{ltx\_toc\_compact} and \texttt{ltx\_toc\_verycompact} can be added to style compact and very compact styles (eg single line). Note that the generated \texttt{li} items will have class \texttt{ltx\_tocentry}.
\item[\texttt{ltx\_ref\_}\textit{item}] hypertext links, whether within or across documents, whether created from \verb|\ref| or \verb|\href|, will get \texttt{ltx\_ref} and, sometimes, extra classes applied. For example, a reference that ends up pointing to the current page is marked with \texttt{ltx\_ref\_self}. Cross-referencing material used to fill in the contents of the reference is marked: a reference number gets \texttt{ltx\_ref\_tag}; a title \texttt{ltx\_ref\_title}.
\item[\texttt{ltx\_note\_}\textit{part}] reflects the separate parts of notes. Note that the kind of note is generally reflected in the \attr{role} attribute, such as \texttt{footnote}, \texttt{endnote}, etc. The parts are separated to facilitate formatting, hover effects, etc: \texttt{outer} contains the whole; \texttt{mark} for the mark, if any; \texttt{content} the actual contents of the note. \texttt{type} is for an extra span indicating the type of note if it is unusual.
\item[\texttt{ltx\_page\_}\textit{item}] reflects page layout components created during the XSLT; \textit{item}s include: \texttt{main}, \texttt{content}, \texttt{header}, \texttt{footer}, \texttt{navbar}, \texttt{logo}, \texttt{columns}, \texttt{column1}, \texttt{column2}.
\item[\texttt{ltx\_eqn\_}\textit{item}] reflects different parts related to equation formatting: \texttt{pad} reflects padding to align equations on the page; \texttt{eqnarray} and \texttt{lefteqn} arise from \LaTeX's \texttt{eqnarray} environment; \texttt{gather} and \texttt{align} arise from AMS environments; \texttt{intertext} arises from text injected between aligned equations.
\end{description} Any other explicit use of the \verb|addClass(class)| function or of the \verb|\lxAddClass{class}| macro from the \texttt{latexml} package will add the given class as is, without any additional \texttt{ltx\_} prefix. Two oddball items that may get refactored away are: \texttt{ltx\_phantom} and \texttt{ltx\_centering}. The latter seems slightly distinct from \texttt{ltx\_align\_center}. %%%====================================================================== \backmatter \printindex \end{document} latexml-0.8.1/doc/site/0000755000175000017500000000000012507513572014616 5ustar norbertnorbertlatexml-0.8.1/doc/site/examples/0000755000175000017500000000000012507513572016434 5ustar norbertnorbertlatexml-0.8.1/doc/site/examples/pushing/0000755000175000017500000000000012507513572020111 5ustar norbertnorbertlatexml-0.8.1/doc/site/examples/pushing/pushing.tex0000644000175000017500000000172312507513572022313 0ustar norbertnorbert\documentclass{article} \title{Pushing it} \begin{document} \section{Hbox and vbox} \par\noindent An hbox to 10 em: \hbox to 10em{word} done. \par\noindent An hbox spread 10 em: \hbox spread 10em{word} done. \par\noindent A vbox \vbox{a\\b\\c} done. \par\noindent A vtop \vbox{a\\b\\c} done. \section{Moving boxes} %\par\noindent %Moveleft 2em \vbox{\moveleft 2em \hbox{word}} done. %\par\noindent %Moveright 2em \vbox{\moveright 2em \hbox{word}} done. \par\noindent Lower 2ex \lower 2ex \hbox{word} done. \par\noindent Raise 2ex \raise 2ex \hbox{word} done. \par\noindent Raisebox 2ex \raisebox{2ex}{word} done. \par\noindent Textsuperscript \textsuperscript{word} done. \section{Framing} \par\noindent An underlined box \underline{word} done. \par\noindent An fbox \fbox{word} done. \par\noindent A framed box to 10em \framebox[10em]{word} done. \par\noindent A rule \rule[1ex]{1em}{1ex} done. \par\noindent Circled stuff \textcircled{word} done. 
\end{document} latexml-0.8.1/doc/site/examples/tabular/0000755000175000017500000000000012507513572020066 5ustar norbertnorbertlatexml-0.8.1/doc/site/examples/tabular/tabular.tex0000644000175000017500000000105012507513572022236 0ustar norbertnorbert\documentclass{article} % Tabular example from LaTeX manual, p.205 \begin{document} \begin{tabular}{|r||r@{--}l|p{1.25in}|} \hline \multicolumn{4}{|c|}{GG\&A Hoofed Stock} \\ \hline\hline &\multicolumn{2}{c|}{Price}& \\ \cline{2-3} \multicolumn{1}{|c||}{Year} & \multicolumn{1}{r@{\,\vline\,}}{low} & high & \multicolumn{1}{c|}{Comments} \\ \hline 1971 & 97 & 245 & Bad year.\\ \hline 72 & 245 & 245 & Light trading due to a heavy winter. \\ \hline 73 & 245 & 2001 & No gnus was very good gnus this year. \\ \hline \end{tabular} \end{document} latexml-0.8.1/doc/site/index.tex0000644000175000017500000006455712507513572016470 0ustar norbertnorbert\documentclass{article} \usepackage{latexml} \usepackage{hyperref} \usepackage{../sty/latexmldoc} \usepackage{listings} % Should the additional keywords be indexed? \lstdefinestyle{shell}{language=bash,escapechar=@,basicstyle=\ttfamily\small,% morekeywords={latexml,latexmlpost,latexmlmath}, moredelim=[is][\itshape]{\%}{\%}} \input{releases.tex} \newcommand{\PDFIcon}{\includegraphics{pdf}} \title{\LaTeXML\ \emph{A \LaTeX\ to XML/HTML/MathML Converter}} \lxKeywords{LaTeXML, LaTeX to XML, LaTeX to HTML, LaTeX to MathML, LaTeX to ePub, converter} %============================================================ \begin{lxNavbar} \lxRef{top}{\includegraphics{../graphics/latexml}}\\ \includegraphics{../graphics/mascot}\\ \lxContextTOC\\ % Direct link to manual from navbar \vspace{1cm} \URL[\hspace{4em}\LARGE The\ Manual]{./manual/} \end{lxNavbar} %============================================================ \begin{document} \label{top} \maketitle %============================================================ \emph{Now available}: \htmlref{\LaTeXML\ \CurrentVersion}{get}! 
In the process of developing the \href{http://dlmf.nist.gov/}{Digital Library of Mathematical Functions}, we needed a means of transforming the \LaTeX\ sources of our material into XML which would be used for further manipulations, rearrangements and construction of the web site. In particular, a true `Digital Library' should focus on the \emph{semantics} of the material, and so we should convert the mathematical material into both content and presentation MathML. At the time, we found no software suitable to our needs, so we began development of \LaTeXML\ in-house.

In brief, \texttt{latexml} is a program, written in Perl, that attempts to faithfully mimic \TeX's behavior, but produces XML instead of dvi. The document model of the target XML makes explicit the model implied by \LaTeX. The processing and model are both extensible; you can define the mapping between \TeX\ constructs and the XML fragments to be created. A postprocessor, \texttt{latexmlpost}, converts this XML into other formats such as HTML or XHTML, with options to convert the math into MathML (currently only presentation) or images.

\emph{Caveats}: It isn't finished; there are gaps in the coverage, particularly in missing implementations of the many useful \LaTeX\ packages. But it is beginning to stabilize and interested parties are invited to try it out, give feedback and even to help out.

%============================================================
\section{Examples}\label{examples}\index{examples}
At the moment, the best example of \LaTeXML's output is the \href{http://dlmf.nist.gov/}{DLMF} itself. There is, of course, a fair amount of insider, special case, code, but it shows what can be done.

Some highlights:
\begin{description}
\item[\href{examples/tabular/tabular.html}{LaTeX tabular}] from the \LaTeX\ manual, p.205.
(\href{examples/tabular/tabular.tex}{\TeX}, \href{examples/tabular/tabular.pdf}{\PDFIcon})
\item[\url{http://latexml.mathweb.org/editor}] an online editor/showcase of things that \LaTeXML\ can do.
\item[\url{http://arxmliv.kwarc.info}] An experiment processing the entire \url{http://arXiv.org}.
\end{description}
And, of course
\begin{description}
\item[\href{http://dlmf.nist.gov/}{DLMF}] The Digital Library of Mathematical Functions was the primary instigator for this project.
\item[\href{manual/}{\LaTeXML\ Manual}] The \LaTeXML\ User's manual (\href{manual.pdf}{\PDFIcon}).
\item[And these pages] were produced using \LaTeXML, as well.
\end{description}

%============================================================
\section{Get \LaTeXML}\label{get}\index{get}
\paragraph{Current Release:}\label{download.current}
The current release is \CurrentVersion\ (see the \href{Changes}{Change Log}).

There are several ways to install \LaTeXML, depending on your OS platform and on whether you want a bleeding-edge version from GitHub. Since \LaTeXML\ depends on, or can use if available, several other Perl Modules and external programs, consult \ref{prerequisites} before deciding.
\begin{itemize}
\item \emph{Normally}, it is preferable to install a platform-specific, prebuilt release, if available, which will install all prerequisites, usually including the optional ones.
\item You may install from \htmlref{CPAN}{install.cpan} which will install most prerequisites, although you may prefer to pre-install prerequisites using the platform-specific approach and should pre-install any optional prerequisites that you want.
\item You may download and install the \htmlref{tarball}{install.tarball}, but you will need to pre-install prerequisites (including any optional ones) using \htmlref{CPAN}{install.cpan.prereq} or a platform-specific approach.
\item If you want to use the development version with the latest patches and improvements, you may fetch the source from \htmlref{GitHub}{install.github}. Again you will need to pre-install prerequisites (including optional ones) using \htmlref{CPAN}{install.cpan.prereq} or a platform-specific approach.
\end{itemize}

\par\noindent
\begin{tabular}{l|l|l}
Platform & To install prebuilt & To install prerequisites \\\hline
RPM-based Linux & see \htmlref{RPM prebuilt}{install.rpm} & see \htmlref{RPM prerequisites}{install.rpm.prereq} \\
Debian-based Linux & see \htmlref{Debian prebuilt}{install.deb} & see \htmlref{Debian prerequisites}{install.deb.prereq}\\
Macintosh OS w/\href{http://www.macports.org}{MacPorts} & see \htmlref{MacPorts prebuilt}{install.mac} & see \htmlref{MacPorts prerequisites}{install.mac.prereq}\\
Windows w/\href{http://strawberryperl.com}{Strawberry Perl} & see \htmlref{Windows CPAN}{install.windows} & see \htmlref{Windows prerequisites}{install.windows.prereq} \\
Other & see \htmlref{CPAN}{install.cpan} or \htmlref{GitHub}{install.github} & see \htmlref{CPAN prerequisites}{install.cpan.prereq} \\
\end{tabular}

Note that there is \emph{no} implied endorsement of any of these systems.

\paragraph{Prerequisites}\label{prerequisites}
\LaTeXML\ requires several Perl modules to do its job. Most are automatically installed by the platform-specific installation or CPAN. However, CPAN will \emph{not} install the required C libraries needed for \texttt{XML::LibXML} and \texttt{XML::LibXSLT}. If \texttt{libxml2} and \texttt{libxslt} are not already installed, follow the instructions at \href{http://www.xmlsoft.org}{XMLSoft} to download and install the most recent versions of \texttt{libxml2} and \texttt{libxslt}. Note that Strawberry Perl, on Windows, already includes these libraries.
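
As a quick sanity check before installing \LaTeXML\ itself, you can ask Perl whether it is able to load the two XML modules. This is only a sketch, assuming a Bourne-style shell with \texttt{perl} on your path; each command prints the version of the module that Perl finds, or a Perl error if the module (or the C library underneath it) is missing.
\begin{lstlisting}[style=shell]
perl -MXML::LibXML  -e 'print "XML::LibXML  $XML::LibXML::VERSION\n"'
perl -MXML::LibXSLT -e 'print "XML::LibXSLT $XML::LibXSLT::VERSION\n"'
\end{lstlisting}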
\paragraph{Optional Prerequisites}
The following packages are optional because they are sometimes difficult to find or install, or in order to allow for minimal installs in unusual circumstances. Most users should consider them as required and install them if at all possible.
\begin{description}
\item[\TeX] Virtually all users of \LaTeXML\ will want to install \TeX. \LaTeXML\ \emph{should} find whatever \TeX-installation you have, and will use \TeX's style files directly in some cases, providing broader coverage, particularly for the more complex styles like \texttt{babel} and \texttt{tikz}. Moreover, if \TeX\ is present when \LaTeXML\ is being installed, \LaTeXML\ will install a couple of its own style files that can be used with regular \TeX\ or \LaTeX\ runs; so if you are going to install \TeX, install it first!
\item[Image::Magick] provides a handy library of image manipulation routines. When they are present, \LaTeXML\ is able to carry out more image processing, such as transformations by the \texttt{graphicx} package, and conversion of math to images; otherwise, some such operations will not be supported.

Please do \emph{not} try to install \texttt{Image::Magick} from CPAN, however: the module there seldom matches the underlying \texttt{ImageMagick} library. It is recommended to install the Perl binding for \texttt{Image::Magick} from the same source as the library was obtained, either from your system's repository, or from the \href{http://www.imagemagick.org/}{ImageMagick} site itself. In the latter case, follow the instructions at \href{http://www.imagemagick.org/}{ImageMagick} to download and install the latest version of ImageMagick, being sure to enable and build the Perl binding along with it.
\item[Graphics::Magick] is an \emph{alternative} to \texttt{Image::Magick} that \LaTeXML\ will use if it is found on the system; it may (or may not) be easier to install, although it is less widely available.
\item[UUID::Tiny] generates unique identifiers that can be used to make better ePub documents
  (it can be installed using \htmlref{CPAN}{install.cpan.prereq}).
\end{description}

\emph{Note to packagers:} If you are preparing a compiled installation package
(such as rpm or deb) for \LaTeXML, and the above packages are easily installable
in your distribution, you probably should include them as dependencies of \LaTeXML.

%\subsection[OS-Specific Notes]{Operating System Specific Notes}\label{install.osnotes}
%With \emph{no} implied endorsement of any of these systems.

\subsection[RPM-based systems]{RPM-based systems}\label{install.rpm}
\index{rpm}\index{Fedora}\index{Redhat}\index{Centos}
For Fedora and Red Hat Enterprise Linux based distributions
(Red Hat Enterprise Linux 6, CentOS 6,\ldots) and similar,
most software is obtained and installed via the yum repository.

\paragraph{Installing prebuilt}\\
The following commands will install \LaTeXML, including its required and optional prerequisites:
\begin{itemize}
\item Download \CurrentFedora\ or \CurrentRedHat\ (but enable the EPEL repository), as appropriate.
\item Install \LaTeXML, and its prerequisites, using the command:
\begin{lstlisting}[style=shell]
yum --nogpgcheck localinstall LaTeXML-@\CurrentVersion@-*.noarch.rpm
\end{lstlisting}
\end{itemize}

\paragraph{Installing prerequisites}\label{install.rpm.prereq}
The prerequisites (including the optional ones) can be installed by running this command as root:
\begin{lstlisting}[style=shell]
yum install \
  perl-Archive-Zip perl-DB_File perl-File-Which \
  perl-Getopt-Long perl-Image-Size perl-IO-String perl-JSON-XS \
  perl-libwww-perl perl-Parse-RecDescent perl-Pod-Parser \
  perl-Time-HiRes perl-URI perl-XML-LibXML perl-XML-LibXSLT \
  perl-UUID-Tiny texlive ImageMagick ImageMagick-perl
\end{lstlisting}
Continue by installing \LaTeXML\ from \htmlref{tarball}{install.tarball},
\htmlref{CPAN}{install.cpan} or \htmlref{GitHub}{install.github}, as desired.

\subsection{Debian-based systems}\label{install.deb}\index{Debian}
For Debian-based systems (including Ubuntu), the deb repositories
are generally used for software installation.
Thanks to Atsuhito Kohda, \LaTeXML\ is available from Debian's unstable repositories.

\paragraph{Installing prebuilt}
The following command will install \LaTeXML, including its required and optional prerequisites:
\begin{lstlisting}[style=shell]
sudo apt-get install latexml
\end{lstlisting}

\paragraph{Installing prerequisites}\label{install.deb.prereq}
The prerequisites (including optional ones) can be installed by running this command:
\begin{lstlisting}[style=shell]
sudo apt-get install \
  libarchive-zip-perl libfile-which-perl libimage-size-perl \
  libio-string-perl libjson-xs-perl libparse-recdescent-perl \
  liburi-perl libuuid-tiny-perl libwww-perl \
  libxml2 libxml-libxml-perl libxslt1.1 libxml-libxslt-perl \
  texlive-latex-base imagemagick perlmagick
\end{lstlisting}
[\emph{NOTE} Recheck this dependency list when the port is done!]
Continue by installing \LaTeXML\ from \htmlref{tarball}{install.tarball},
\htmlref{CPAN}{install.cpan} or \htmlref{GitHub}{install.github}, as desired.

\subsection{MacOS}\label{install.mac}\index{Apple Macintosh}
For Apple Macintosh systems, the \href{http://www.macports.org}{MacPorts} repository
is the most convenient way to install \LaTeXML; thanks to Sean and Andrew Fernandes.
Download and install MacPorts from that site.

\paragraph{Installing prebuilt}
Since some users prefer \href{http://tug.org/mactex/}{MacTeX} or another \TeX\ system
over MacPorts' \texttt{texlive}, and since \texttt{texlive} is quite large,
the \emph{default} \LaTeXML\ installation does \emph{not} include
a dependency on a \TeX-installation nor does it install \LaTeXML's own style files!
However, we provide two `variants' that do include or assume such a dependency:
\subparagraph{MacPorts' TeXLive:}
To install \LaTeXML, including MacPorts' \texttt{texlive} along with
\LaTeXML's other prerequisites and style files, use the command:
\begin{lstlisting}[style=shell]
sudo port install LaTeXML +texlive
\end{lstlisting}
\subparagraph{MacTeX:}
To install \LaTeXML, along with its prerequisites and style files,
using your (preinstalled) MacTeX, use the command:
\begin{lstlisting}[style=shell]
sudo port install LaTeXML +mactex
\end{lstlisting}
\subparagraph{Other:}
To install \LaTeXML\ without assuming any installed \TeX, use the command:
\begin{lstlisting}[style=shell]
sudo port install LaTeXML
\end{lstlisting}
Note that the latter will \emph{not} install \LaTeXML's style files into any texmf directory,
although \LaTeXML\ will still use any \TeX\ system it finds at runtime.

\paragraph{Installing prerequisites}\label{install.mac.prereq}
The prerequisites (including optional ones except \TeX)
can be installed by running this command as root:
\begin{lstlisting}[style=shell]
sudo port install \
  p5-archive-zip p5-file-which p5-getopt-long p5-image-size \
  p5-io-string p5-json-xs p5-libwww-perl p5-parse-recdescent \
  p5-time-hires p5-uri p5-xml-libxml p5-xml-libxslt \
  p5-perlmagick
\end{lstlisting}
Additionally, either add \texttt{texlive} to the above list,
or install \href{http://tug.org/mactex/}{MacTeX}.
Continue by installing \LaTeXML\ from \htmlref{tarball}{install.tarball},
\htmlref{CPAN}{install.cpan} or \htmlref{GitHub}{install.github}, as desired.

% \emph{Note:} There have been issues reported regarding \verb|DB_File|
% not being installed; Apparently you must install
% the db `variant' of perl, rather than the gdbm variant;
% that is, you must run \verb|sudo port install perl +db|
% (possibly after uninstalling perl first?).
% \emph{Note:} There have been issues reported with recent
% installations of Perl with MacPorts: it will install \LaTeXML's
% executables in a directory specific to the current version of Perl
% (eg. \texttt{/opt/local/libexec/perl5.12/sitebin},
% instead of \texttt{/opt/local/bin} which would be in your \texttt{PATH} environment variable).
% Apparently this is a feature, not a bug; it only happens when installing from source or git;
% not when installing the MacPorts port. There are three workarounds, each with disadvantages:
% \begin{itemize}
% \item Watch for where the scripts get installed and add that directory to your \texttt{PATH}
%   environment variable;
% \item Set up symbolic links from a directory in your path, such as \texttt{/opt/local/bin},
%   to the actual installed locations;
% \item Use the makefile options to choose an installation directory (see \ref{install.options}):
% \begin{lstlisting}[style=shell]
% perl Makefile.PL INSTALLSITEBIN=/opt/local/bin INSTALLSITESCRIPT=/opt/local/bin
% \end{lstlisting}
% \end{itemize}

\subsection{Windows}\index{windows}
These installation instructions assume you will use
\href{http://strawberryperl.com}{Strawberry Perl},
which comes with \emph{many} of our prerequisites pre-installed,
and provides other needed commands (\texttt{perl}, \texttt{cpan}, \texttt{dmake}).

\paragraph{Installing prebuilt}\label{install.windows}
There is currently no prebuilt \LaTeXML\ for Windows, but it should install cleanly
under \href{http://strawberryperl.com}{Strawberry Perl}'s CPAN.
Install the \TeX-system of your choice (if desired),
ImageMagick (see \ref{install.windows.imagemagick})
and then install LaTeXML using:
\begin{lstlisting}[style=shell]
cpan LaTeXML
\end{lstlisting}
Installing the optional package \texttt{Image::Magick} on Windows seems to be problematic,
so we have omitted it from these instructions.
You may want to try \href{http://www.imagemagick.org/}{ImageMagick}, but you're on your own, there!
You may have better luck with \texttt{Graphics::Magick}.

\paragraph{Installing prerequisites}\label{install.windows.prereq}\\
Install \href{http://strawberryperl.com}{Strawberry Perl},
and the \TeX-system of your choice (if desired),
ImageMagick (if you want to try; see \ref{install.windows.imagemagick}),
and then install the additional prerequisites as
\begin{lstlisting}[style=shell]
cpan Image::Size Parse::RecDescent UUID::Tiny
\end{lstlisting}
Continue by installing \LaTeXML\ from \htmlref{tarball}{install.tarball},
\htmlref{CPAN}{install.cpan} or \htmlref{GitHub}{install.github}, as desired,
but note that the \texttt{dmake} command should be used in place of \texttt{make}.

\paragraph{Installing ImageMagick under Windows}\label{install.windows.imagemagick}
In principle, you should be able to install the binary from ImageMagick
and then install the Perl binding using CPAN;
unfortunately, it seems that the CPAN version seldom matches the binary
or fails for other reasons.
What should work is the following:
Download and install the main ImageMagick binary appropriate for your Windows system from
\href{http://imagemagick.org/script/binary-releases.php#windows}{ImageMagick}.
Then fetch the \texttt{PerlMagick} tarball \emph{with the same version} from
\href{http://imagemagick.com/download/perl/}{ImageMagick/perl}.
Use the following commands to compile and install PerlMagick,
with X.XX being the version you downloaded:
\begin{lstlisting}[style=shell]
tar -zxvf PerlMagick-X.XX.tar.gz
cd PerlMagick-X.XX
perl Makefile.PL
dmake
dmake test
dmake install
\end{lstlisting}

\subsection{CPAN installation}\label{install.cpan}
The following command will install \LaTeXML\ and its Perl prerequisites,
but you may need to pre-install \texttt{libxml2} and \texttt{libxslt} (See \ref{prerequisites}).
Pre-install \TeX\ and any optional Perl modules, if desired.
\begin{lstlisting}[style=shell]
cpan LaTeXML
\end{lstlisting}

\paragraph{Installing prerequisites}\label{install.cpan.prereq}
The following command will install the Perl prerequisites (including optional) for \LaTeXML,
but you may need to pre-install \texttt{libxml2} and \texttt{libxslt} (See \ref{prerequisites}).
\begin{lstlisting}[style=shell]
cpan Archive::Zip DB_File File::Which Image::Size \
  IO::String JSON::XS LWP Parse::RecDescent \
  URI XML::LibXML XML::LibXSLT UUID::Tiny
\end{lstlisting}
You may still want to install \TeX\ and \texttt{Image::Magick} using other means.

\subsection{Installing tarball}\label{install.tarball}
\paragraph{Source tarball}\label{source.tarball}
To download the tar file containing the source:
\begin{itemize}
\item Download \CurrentTarball
\item Unpack using
\begin{lstlisting}[style=shell]
tar zxvf LaTeXML-@\CurrentVersion@.tar.gz
\end{lstlisting}
\end{itemize}

\paragraph{Building}\label{build.source}
Whether you've downloaded using the tar file or from git,
you build the system using the standard Perl procedure
(On Windows systems, use \texttt{dmake} instead of \texttt{make}):
\begin{lstlisting}[style=shell]
cd LaTeXML [@or@ LaTeXML-@\CurrentVersion@ @or@ LaTeXML-master]
perl Makefile.PL
make
make test
\end{lstlisting}
The last step runs the system through some basic tests,
which should complete without error
(although some tests may be `skipped' under certain circumstances).

\emph{Note:} You can specify a nonstandard place to install files
(possibly avoiding the need to install as root!)
by modifying the Makefile-creating command above to
\begin{lstlisting}[style=shell]
perl Makefile.PL PREFIX=%perldir% TEXMF=%texdir%
\end{lstlisting}
where \emph{perldir} is where you want the perl related files to go and
\emph{texdir} is where you want the \TeX\ style files to go.
(See \texttt{perldoc perlmodinstall} for more details and options.)
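
As an illustration of the note above, a build and install under your home directory
might be sketched as follows.
The two directory names and the function name \verb|install_latexml_locally|
are hypothetical choices, not anything \LaTeXML\ prescribes;
only the \texttt{PREFIX} and \texttt{TEXMF} settings come from the instructions above.
\begin{lstlisting}[style=shell]
# Sketch: build and install under the home directory, without root.
# Both directories are hypothetical; any user-writable locations will do.
install_latexml_locally() {
  perldir="$HOME/latexml"
  texdir="$HOME/texmf"
  # refuse to run anywhere but the unpacked source directory
  if [ ! -f Makefile.PL ]; then
    echo "Makefile.PL not found: run this from the LaTeXML source directory" >&2
    return 1
  fi
  # each step runs only if the previous one succeeded
  perl Makefile.PL PREFIX="$perldir" TEXMF="$texdir" &&
    make && make test && make install
}
\end{lstlisting}
Run \verb|install_latexml_locally| from within the unpacked source directory.
Depending on your Perl configuration, you may also need to add the chosen directories
to \texttt{PATH} and \texttt{PERL5LIB} so that the installed commands and modules are found.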

\paragraph{Installing}\label{install.source}
After building, you will finally want to install \LaTeXML\ where the OS can find the files.
You'll typically need to be root:
\begin{lstlisting}[style=shell]
su
make install
\end{lstlisting}
or perhaps
\begin{lstlisting}[style=shell]
sudo make install
\end{lstlisting}
(On Windows, use \texttt{dmake}).

\subsection{Installing from GitHub}\label{install.github}
The development version of \LaTeXML\ is available on \href{https://github.com}{GitHub}.
This will include the latest patches and enhancements between official releases,
but may of course introduce new bugs or incompatibilities with little notice
(you may wish to subscribe to the \LaTeXML\ \htmlref{mailing list}{contact.list}).
You can also browse the current code at GitHub,
as well as file bug reports and enhancement requests.

Fetch the development version using the following command:
\begin{lstlisting}[style=shell]
git clone https://github.com/brucemiller/LaTeXML.git
\end{lstlisting}
Continue with building and installing as described in \ref{build.source}.
Keep up-to-date by occasionally running the command:
\begin{lstlisting}[style=shell]
git pull
\end{lstlisting}
in the source directory, and then repeating the building and install commands.

You can avoid directly using \texttt{git}, but still get the development sources by downloading
\href{https://github.com/brucemiller/LaTeXML/archive/master.zip}{LaTeXML-master.zip}
and running
\begin{lstlisting}[style=shell]
unzip LaTeXML-master.zip
\end{lstlisting}

\subsection{Archived Releases:}\label{archive}
\AllReleases.
%============================================================
\section{Documentation}\label{docs}
If you're lucky, all that should be needed to convert
a \TeX\ file, \textit{mydoc}\texttt{.tex}, to XML, and then to HTML would be:
\begin{lstlisting}[style=shell]
latexml --dest=%mydoc%.xml %mydoc%
latexmlpost --dest=%somewhere/mydoc%.html %mydoc%.xml
\end{lstlisting}
This will carry out a default transformation into HTML5,
which represents mathematics using MathML.
Using the extension \texttt{xhtml} for the destination would generate XHTML including MathML.
Adding the option \verb|--format=html4| will generate HTML (version 4),
using images to represent the math.

If you're not so lucky, or want to get fancy, well \ldots dig deeper:
\begin{description}
\item[\href{manual/}{LaTeXML Manual}] (\href{manual.pdf}{\PDFIcon}).
\item[\href{manual/commands/latexml.html}{\texttt{latexml}}]
  describes the \texttt{latexml} command.
\item[\href{manual/commands/latexmlpost.html}{\texttt{latexmlpost} command}]
  describes the \texttt{latexmlpost} command for postprocessing.
\end{description}

% Possibly, eventually, want to expose:
% http://www.mathweb.org/wiki/????
% But, it doesn't have anything in it yet.

%============================================================
\section{Contacts \& Support}\label{contact}

\paragraph{Mailing List}\label{contact.list}
There is a low-volume mailing list for questions, support and comments.
See \href{http://lists.jacobs-university.de/mailman/listinfo/project-latexml}{\texttt{latexml-project}}
for subscription information.

\paragraph{Bug-Tracker}\label{contact.git}
We are using the git repository hosted at
\href{https://github.com/brucemiller/LaTeXML/}{https://github.com/brucemiller/LaTeXML/}.
You can browse the code, the latest changes, and check-out the current code from there
(see \ref{get}).
There is also an Issues database where you can file bug reports or feature requests.
% There is a Trac bug-tracking system for reporting bugs, or checking the
% status of previously reported bugs at
% \href{https://trac.mathweb.org/LaTeXML/}{Bug-Tracker}.
% To report bugs, please:
% \begin{itemize}
% \item \href{http://trac.mathweb.org/register/register}{Register} a Trac account
%   (preferably give an email so that you'll get notifications about activity regarding the bug).
% \item \href{http://trac.mathweb.org/LaTeXML/newticket}{Create a ticket}
% \end{itemize}

\paragraph{Thanks} to our friends at
the \href{http://kwarc.info}{KWARC Research Group}
for hosting the mailing list, the original Trac system and svn repository,
as well as general moral support.
%%Thanks also to \href{http://www.nist.gov/el/msid/sima/}{Systems Integration for Manufacturing Applications}
%%for funding portions of the research and development.

\paragraph{Author} \href{mailto:bruce.miller@nist.gov}{Bruce Miller}.

%============================================================
\section{License \& Notices}\label{notices}

\paragraph{License}
The research software provided on this web site (``software'')
is provided by NIST as a public service.
You may use, copy and distribute copies of the software in any medium,
provided that you keep intact this entire notice.
You may improve, modify and create derivative works of the software
or any portion of the software, and you may copy and distribute
such modifications or works.
Modified works should carry a notice stating that you changed the software
and should note the date and nature of any such change.
Please explicitly acknowledge the National Institute of Standards and Technology
as the source of the software.

The software was developed by NIST employees.
NIST employee contributions are not subject to copyright protection within the United States.
The software is thus released into the Public Domain.
Note that according to \href{http://www.gnu.org/licences/license-list.html#PublicDomain}{Gnu.org} public domain is compatible with GPL. \paragraph{Disclaimer} The software is expressly provided ``AS IS.'' NIST MAKES NO WARRANTY OF ANY KIND, EXPRESS, IMPLIED, IN FACT OR ARISING BY OPERATION OF LAW, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, NON-INFRINGEMENT AND DATA ACCURACY. NIST NEITHER REPRESENTS NOR WARRANTS THAT THE OPERATION OF THE SOFTWARE WILL BE UNINTERRUPTED OR ERROR-FREE, OR THAT ANY DEFECTS WILL BE CORRECTED. NIST DOES NOT WARRANT OR MAKE ANY REPRESENTATIONS REGARDING THE USE OF THE SOFTWARE OR THE RESULTS THEREOF, INCLUDING BUT NOT LIMITED TO THE CORRECTNESS, ACCURACY, RELIABILITY, OR USEFULNESS OF THE SOFTWARE. You are solely responsible for determining the appropriateness of using and distributing the software and you assume all risks associated with its use, including but not limited to the risks and costs of program errors, compliance with applicable laws, damage to or loss of data, programs or equipment, and the unavailability or interruption of operation. This software is not intended to be used in any situation where a failure could cause risk of injury or damage to property. \paragraph{Privacy Notice} We adhere to \href{http://www.nist.gov/public_affairs/privacy.cfm}{NIST's Privacy, Security and Accessibility Policy}. %============================================================ \end{document} latexml-0.8.1/doc/site/pdf.eps0000644000175000017500000001412212507513572016100 0ustar norbertnorbert%!PS-Adobe-3.0 EPSF-3.0 %%Creator: (ImageMagick) %%Title: (site/pdf.eps) %%CreationDate: (Wed Apr 9 15:10:11 2008) %%BoundingBox: 0 0 15 15 %%HiResBoundingBox: 0 0 15 15 %%DocumentData: Clean7Bit %%LanguageLevel: 1 %%Pages: 1 %%EndComments %%BeginDefaults %%EndDefaults %%BeginProlog % % Display a color image. 
The image is displayed in color on % Postscript viewers or printers that support color, otherwise % it is displayed as grayscale. % /DirectClassPacket { % % Get a DirectClass packet. % % Parameters: % red. % green. % blue. % length: number of pixels minus one of this color (optional). % currentfile color_packet readhexstring pop pop compression 0 eq { /number_pixels 3 def } { currentfile byte readhexstring pop 0 get /number_pixels exch 1 add 3 mul def } ifelse 0 3 number_pixels 1 sub { pixels exch color_packet putinterval } for pixels 0 number_pixels getinterval } bind def /DirectClassImage { % % Display a DirectClass image. % systemdict /colorimage known { columns rows 8 [ columns 0 0 rows neg 0 rows ] { DirectClassPacket } false 3 colorimage } { % % No colorimage operator; convert to grayscale. % columns rows 8 [ columns 0 0 rows neg 0 rows ] { GrayDirectClassPacket } image } ifelse } bind def /GrayDirectClassPacket { % % Get a DirectClass packet; convert to grayscale. % % Parameters: % red % green % blue % length: number of pixels minus one of this color (optional). % currentfile color_packet readhexstring pop pop color_packet 0 get 0.299 mul color_packet 1 get 0.587 mul add color_packet 2 get 0.114 mul add cvi /gray_packet exch def compression 0 eq { /number_pixels 1 def } { currentfile byte readhexstring pop 0 get /number_pixels exch 1 add def } ifelse 0 1 number_pixels 1 sub { pixels exch gray_packet put } for pixels 0 number_pixels getinterval } bind def /GrayPseudoClassPacket { % % Get a PseudoClass packet; convert to grayscale. % % Parameters: % index: index into the colormap. % length: number of pixels minus one of this color (optional). 
% currentfile byte readhexstring pop 0 get /offset exch 3 mul def /color_packet colormap offset 3 getinterval def color_packet 0 get 0.299 mul color_packet 1 get 0.587 mul add color_packet 2 get 0.114 mul add cvi /gray_packet exch def compression 0 eq { /number_pixels 1 def } { currentfile byte readhexstring pop 0 get /number_pixels exch 1 add def } ifelse 0 1 number_pixels 1 sub { pixels exch gray_packet put } for pixels 0 number_pixels getinterval } bind def /PseudoClassPacket { % % Get a PseudoClass packet. % % Parameters: % index: index into the colormap. % length: number of pixels minus one of this color (optional). % currentfile byte readhexstring pop 0 get /offset exch 3 mul def /color_packet colormap offset 3 getinterval def compression 0 eq { /number_pixels 3 def } { currentfile byte readhexstring pop 0 get /number_pixels exch 1 add 3 mul def } ifelse 0 3 number_pixels 1 sub { pixels exch color_packet putinterval } for pixels 0 number_pixels getinterval } bind def /PseudoClassImage { % % Display a PseudoClass image. % % Parameters: % class: 0-PseudoClass or 1-Grayscale. % currentfile buffer readline pop token pop /class exch def pop class 0 gt { currentfile buffer readline pop token pop /depth exch def pop /grays columns 8 add depth sub depth mul 8 idiv string def columns rows depth [ columns 0 0 rows neg 0 rows ] { currentfile grays readhexstring pop } image } { % % Parameters: % colors: number of colors in the colormap. % colormap: red, green, blue color packets. % currentfile buffer readline pop token pop /colors exch def pop /colors colors 3 mul def /colormap colors string def currentfile colormap readhexstring pop pop systemdict /colorimage known { columns rows 8 [ columns 0 0 rows neg 0 rows ] { PseudoClassPacket } false 3 colorimage } { % % No colorimage operator; convert to grayscale. 
% columns rows 8 [ columns 0 0 rows neg 0 rows ] { GrayPseudoClassPacket } image } ifelse } ifelse } bind def /DisplayImage { % % Display a DirectClass or PseudoClass image. % % Parameters: % x & y translation. % x & y scale. % label pointsize. % image label. % image columns & rows. % class: 0-DirectClass or 1-PseudoClass. % compression: 0-none or 1-RunlengthEncoded. % hex color packets. % gsave /buffer 512 string def /byte 1 string def /color_packet 3 string def /pixels 768 string def currentfile buffer readline pop token pop /x exch def token pop /y exch def pop x y translate currentfile buffer readline pop token pop /x exch def token pop /y exch def pop currentfile buffer readline pop token pop /pointsize exch def pop /Times-Roman findfont pointsize scalefont setfont x y scale currentfile buffer readline pop token pop /columns exch def token pop /rows exch def pop currentfile buffer readline pop token pop /class exch def pop currentfile buffer readline pop token pop /compression exch def pop class 0 gt { PseudoClassImage } { DirectClassImage } ifelse grestore } bind def %%EndProlog %%Page: 1 1 %%PageBoundingBox: 0 0 15 15 userdict begin DisplayImageend %%PageTrailer %%Trailer %%EOF latexml-0.8.1/doc/site/pdf.png0000644000175000017500000000030612507513572016074 0ustar norbertnorbert latexml-0.8.1/doc/sty/0000755000175000017500000000000012507513572014471 5ustar norbertnorbertlatexml-0.8.1/doc/sty/latexmldoc.sty0000644000175000017500000000313212507513572017365 0ustar norbertnorbert%====================================================================== % Collected separately so I can override for LaTeXML %====================================================================== \usepackage{latexml} \usepackage{color} %====================================================================== \def\toctitle#1{}
%====================================================================== \let\@@LaTeXML\LaTeXML \def\LaTeXML{\ifHy@pdfstring LaTeXML\else\@@LaTeXML\fi} \def\BibTeX{{\rm B}{\sc ib}\TeX} \let\@title\@empty \def\subtitle#1{\gdef\@subtitle{#1}} \def\@maketitle{% \newpage \null \vskip 2em% \begin{center}% \let \footnote \thanks {\LARGE \@title \par}% \ifx\@subtitle\@empty\else \vskip 1.0em% {\large \@subtitle \par}% \fi \vskip 1.5em% {\large \lineskip .5em% \begin{tabular}[t]{c}% \@author \end{tabular}\par}% \vskip 1em% {\large \@date}% \end{center}% \par \vskip 1.5em} \renewcommand\maketitle{\begin{titlepage}% \let\footnotesize\small \let\footnoterule\relax \let \footnote \thanks \null\vfil \vskip 60\p@ \begin{center}% {\LARGE \@title \par}% \ifx\@subtitle\@empty\else \vskip 1.0em% {\large \@subtitle \par}% \fi \vskip 3em% {\large \lineskip .75em% \begin{tabular}[t]{c}% \@author \end{tabular}\par}% \vskip 1.5em% {\large \@date \par}% % Set date in \large size. \end{center}\par \@thanks \vfil\null \end{titlepage}% } \usepackage{graphicx} %====================================================================== latexml-0.8.1/doc/sty/latexmldoc.sty.ltxml0000644000175000017500000000132312507513572020524 0ustar norbertnorbert# -*- CPERL -*- #====================================================================== # Collected separately so I can override for LaTeXML #====================================================================== package LaTeXML::Package::Pool; use strict; use warnings; use LaTeXML::Package; RequirePackage('latexml'); RequirePackage('graphicx'); RequirePackage('hyperref'); DefMacro('\subtitle{}', '\@add@frontmatter{ltx:subtitle}{#1}'); DefMacro('\toctitle{}', '\@add@frontmatter{ltx:toctitle}{#1}'); DefMacro('\BibTeX', 'BibTeX'); DefMacro('\thesection', ''); DefMacro('\thesubsection', ''); DefMacro('\thesubsubsection', ''); #====================================================================== 1; 
latexml-0.8.1/doc/sty/latexmlman.sty0000644000175000017500000001724312507513572017403 0ustar norbertnorbert%====================================================================== % Collected separately so I can override for LaTeXML %====================================================================== \usepackage{latexml} \usepackage{color} %====================================================================== \def\toctitle#1{} %====================================================================== \let\@@LaTeXML\LaTeXML \def\LaTeXML{\ifHy@pdfstring LaTeXML\else\@@LaTeXML\fi} \def\BibTeX{{\rm B}{\sc ib}\TeX} \let\@title\@empty \def\subtitle#1{\gdef\@subtitle{#1}} \def\@maketitle{% \newpage \null \vskip 2em% \begin{center}% \let \footnote \thanks {\LARGE \@title \par}% \ifx\@subtitle\@empty\else \vskip 1.0em% {\large \@subtitle \par}% \fi \vskip 1.5em% {\large \lineskip .5em% \begin{tabular}[t]{c}% \@author \end{tabular}\par}% \vskip 1em% {\large \@date}% \end{center}% \par \vskip 1.5em} \renewcommand\maketitle{\begin{titlepage}% \let\footnotesize\small \let\footnoterule\relax \let \footnote \thanks \null\vfil \vskip 60\p@ \begin{center}% {\LARGE \@title \par}% \ifx\@subtitle\@empty\else \vskip 1.0em% {\large \@subtitle \par}% \fi \vskip 3em% {\large \lineskip .75em% \begin{tabular}[t]{c}% \@author \end{tabular}\par}% \vskip 1.5em% {\large \@date \par}% % Set date in \large size. \end{center}\par \@thanks \vfil\null \end{titlepage}% } \usepackage{graphicx} %====================================================================== % Containers for advanced material, examples, ... % Sort of a "double bend" thing? 
%\newenvironment{advanced}{\vskip 2ex\textcolor{red}{\LARGE !!}}{\vskip 1ex} \newenvironment{advanced}{% %\vskip 2ex \medbreak \noindent\hangindent=1cm\hangafter=-3\relax \hbox to 0pt{\hskip-1.1cm\vbox to 12pt{\noindent\includegraphics{../graphics/scratch.png}}\hfill}% }{\medbreak} %====================================================================== % Phrase-level markup semantic or otherwise \def\perlfont{\ttfamily} \def\shellfont{\ttfamily} \def\latexfont{\ttfamily} \def\schemafont{\sffamily} \def\patternfont{\sffamily\slshape} \def\@@ltxprefix{LaTeXML} \def\@@pkgprefix#1::#2\end{#1} \def\@@latexmlcmd{latexml} \def\@@latexmlpostcmd{latexmlpost} \def\@@latexmlmathcmd{latexmlmath} \newcommand{\pod}[1]{% {\edef\@prefix{\@@pkgprefix#1::\end} \edef\@podcmd{#1} \ifx\@@ltxprefix\@prefix \htmlref{{\perlfont #1}}{#1}% \else\ifx\@podcmd\@@latexmlcmd \htmlref{{\perlfont #1}}{#1}% \else\ifx\@podcmd\@@latexmlpostcmd \htmlref{{\perlfont #1}}{#1}% \else\ifx\@podcmd\@@latexmlmathcmd \htmlref{{\perlfont #1}}{#1}% \else \href{http://search.cpan.org/search?query=#1&mode=module}{{\perlfont #1}}% \fi\fi\fi\fi}} \newcommand{\ltxcmd}[1]{\htmlref{{\latexfont #1}}{#1}} % The idea here is that since the class names are so long, LaTeXML::Core::Definition::Expandable; % Show the long form first, but then abbreviate to just the last name. % Of course, there are eventually, potential ambiguities; We COULD adapt to that, % abbreviating only to the most recent usage? 
\newcommand{\ltxpod}[1]{% \@ifundefined{podseen@#1}{% \ltxpod@long{#1}% \expandafter\def\csname podseen@#1\endcsname{seen}% }{% \ltxpod@short{#1}% }} \def\ltxpod@long#1{% \ltxpod@ref{(\ltxpod@@pre LaTeXML::#1::\end)\allowbreak\ltxpod@@short#1::\end}{#1}} \def\ltxpod@short#1{% \ltxpod@ref{\ltxpod@@short#1::\end}{#1}} \def\ltxpod@@short#1::#2\end{\ifx.#2.#1\else\ltxpod@@short#2\end\fi} \def\ltxpod@@pre#1::#2\end{\ifx.#2.\else#1::\allowbreak\ltxpod@@pre#2\end\fi} \def\ltxpod@ref#1#2{\htmlref{\perlfont #1}{LaTeXML::#2}} \newcommand{\cmd}[1]{{\shellfont #1}} \newcommand{\code}[1]{{\perlfont #1}} \newcommand{\method}[1]{{\perlfont ->#1}} \newcommand{\attr}[1]{{\schemafont #1}} \newcommand{\attrval}[1]{{\perlfont #1}} \newcommand{\varfile}[2][]{{\shellfont \textit{#2}\if.#1.\else.#1\fi}} \newcommand{\cs}[1]{{\latexfont $\backslash$#1}} %====================================================================== % For generated documentation of Schema. \newenvironment{moduledescription}{\begin{description}}{\end{description}}% \newenvironment{elementdescription}{\begin{description}}{\end{description}}% \newenvironment{patterndescription}{\begin{description}}{\end{description}}% %\newcommand{\typename}[1]{{\perlfont #1}} \newcommand{\typename}[1]{\textit{#1}} \newenvironment{schemamodule}[1]{% \section{Module {\perlfont #1}}\label{schema.module.#1} \raggedright \begin{moduledescription}}{\end{moduledescription}} %\def\cleanhypername#1{{\def\_{\string_}\@cleanhypername#1:\end}} %\def\cleanhypername#1{\@cleanhypername#1:\end} % \def\cleanhypername#1{\expandafter\@@cleanhypername\@cleanhypername#1:\end\_\end} % \def\@cleanhypername#1:#2\end{\ifx.#2.#1\else#1..\@cleanhypername#2\end\fi} % %%%\def\@@cleanhypername#1\_#2\end{\ifx.#2.#1\else#1\string_\@@cleanhypername#2\end\fi} % \def\@@cleanhypername#1\_#2\end{\ifx.#2.#1\else#1-\@@cleanhypername#2\end\fi} \def\cleanhypername#1{\@cleanhypername#1:\end} \def\@cleanhypername#1:#2\end{% \ifx.#2.\@@cleanhypername#1\_\end 
\else\@@cleanhypername#1\_\end..\@cleanhypername#2\end\fi} \def\@@cleanhypername#1\_#2\end{\ifx.#2.#1\else#1\string_\@@cleanhypername#2\end\fi} % \elementdef{name}{doc}{body} \newcommand{\elementdef}[3]{ \item[\textit{Element }% \hypertarget{\cleanhypername{schema.element.#1}}{{\bfseries\schemafont #1}}% \index{#1@{\schemafont #1}!element}% ] \hspace{1em} #2 \ifx.#3.\else\begin{elementdescription}#3\end{elementdescription}\fi } % \attrdef{name}{doc}{content} \newcommand{\attrdef}[3]{ \item[\textit{Attribute }% {\bfseries\schemafont #1}% \index{#1@{\schemafont #1}!attribute}% ] =\ #3\par\noindent #2 } \newif\if@elpattern\@elpatternfalse \def\test@elpattern#1{% \@elpatternfalse \test@model@pattern#1\_model\end% \test@attributes@pattern#1\_attributes\end} \def\test@model@pattern#1\_model#2\end{\if.#2.\else\@elpatterntrue\fi} \def\test@attributes@pattern#1\_attributes#2\end{\if.#2.\else\@elpatterntrue\fi} % \patterndef{name}{doc}{body} \newcommand{\patterndef}[3]{ \item[\textit{Pattern }% \hypertarget{\cleanhypername{schema.pattern.#1}}{{\bfseries\patternfont #1}}% \test@elpattern{#1}\if@elpattern\else \index{#1@{\patternfont #1}!schema pattern}\fi ] \hspace{1em} #2 \ifx.#3.\else\begin{patterndescription}#3\end{patterndescription}\fi } % \patternadd{name}{doc}{body} \newcommand{\patternadd}[3]{ \item[\textit{Add to }{\bfseries\patternfont #1}] \hspace{1em} #2 %% \item[\textit{Add to }% %% \@ifundefined{defined.schema.pattern.#1}{\hypertarget{schema.pattern.#1}{{\bfseries\patternfont #1}}}{{\bfseries\patternfont #1}} %% ] \hspace{1em} #2 \ifx.#3.\else\begin{patterndescription}#3\end{patterndescription}\fi } \newcommand{\patterndefadd}[3]{ \item[\textit{Add to }% \hypertarget{\cleanhypername{schema.pattern.#1}}{{\bfseries\patternfont #1}}% ] \hspace{1em} #2 \ifx.#3.\else\begin{patterndescription}#3\end{patterndescription}\fi } \newcommand{\moduleref}[1]{\htmlref{{\perlfont #1}}{\cleanhypername{schema.module.#1}}} 
\newcommand{\patternref}[1]{\hyperlink{\cleanhypername{schema.pattern.#1}}{{\patternfont #1}}} \newcommand{\elementref}[1]{\hyperlink{\cleanhypername{schema.element.#1}}{{\schemafont #1}}} %====================================================================== \definecolor{linkblue}{cmyk}{1.0,0.2,0.0,0.50}% \usepackage[colorlinks=true, linkcolor=linkblue, citecolor=linkblue, urlcolor=blue, plainpages=false, pdfpagelabels]{hyperref} %====================================================================== latexml-0.8.1/doc/sty/latexmlman.sty.ltxml0000644000175000017500000000711612507513572020540 0ustar norbertnorbert# -*- CPERL -*- #====================================================================== # Collected separately so I can override for LaTeXML #====================================================================== package LaTeXML::Package::Pool; use strict; use warnings; use LaTeXML::Package; Let('\orig@maketitle', '\maketitle'); # Save!!! UnshiftValue(SEARCHPATHS => 'sty'); # Load the raw style file, to get most of the definitions, as is. InputDefinitions('latexmlman', type => 'sty', noltxml => 1); Let('\maketitle', '\orig@maketitle'); # Restore!!! # Redefine title page & front matter stuff DefMacro('\subtitle{}', '\@add@frontmatter{ltx:subtitle}{#1}'); DefMacro('\toctitle{}', '\@add@frontmatter{ltx:toctitle}{#1}'); DefMacro('\BibTeX', 'BibTeX'); DefConstructor('\ltxcmd{}', "#1", properties => sub { (label => CleanLabel($_[1])); }); #DefConstructor('\ltxpod{}', "#1", # afterDigest => sub { # my $name = $_[1]->getArg(1)->toString; # $_[1]->setProperty(ref => CleanLabel("LaTeXML::" . $name)); }); DefConstructor('\ltxpod@ref{}{}', "#1", afterDigest => sub { my $name = $_[1]->getArg(2)->toString; $_[1]->setProperty(ref => CleanLabel('LaTeXML::' . 
$name)); }); DefConstructor('\pod[] Semiverbatim', "#text", afterDigest => sub { my $text = $_[1]->getArg(1); my $name = $_[1]->getArg(2)->toString; if (($name =~ /^LaTeXML/) || ($name =~ /^latexml/)) { my $label = $name; # reformat the label if ($label =~ /^(.*?)\/(.*)$/) { # a reference to a POD/SECTION $label = $1; my $sec = $2; if ($sec =~ /^\"(.*?)\"$/) { # Strip quotes $sec = $1; } $text = $sec unless $text; $sec =~ s/\s+/_/g; # convert spaces $label .= '_' . $sec; } $_[1]->setProperty(ref => CleanLabel($label)); $_[1]->setProperty(text => $text || $name); } else { $name =~ s/::/%3A%3A/g; my $url = "http://search.cpan.org/search?query=$name&mode=module"; $_[1]->setProperty(href => $url); } return; }); #====================================================================== DefEnvironment('{advanced}', "#body"); #====================================================================== DefEnvironment('{moduledescription}', "#body", properties => sub { beginItemize('description'); }); DefEnvironment('{elementdescription}', "#body", properties => sub { beginItemize('description'); }); DefEnvironment('{patterndescription}', "#body", properties => sub { beginItemize('description'); }); #====================================================================== # Section heading styles... DefMacroI('\chaptername', undef, 'Chapter'); DefMacroI('\sectionname', undef, '\S'); DefMacroI('\subsectionname', undef, '\S'); DefMacroI('\subsubsectionname', undef, '\S'); DefMacroI('\paragraphname', undef, '\P'); DefMacroI('\subparagraphname', undef, '\P'); DefMacroI('\appendixname', undef, 'Appendix'); DefMacroI('\appendix', undef, '\@appendix' . '\def\sectionname{}' . '\def\subsectionname{}' . 
'\def\subsubsectionname{}'); #====================================================================== 1; latexml-0.8.1/lib/0000755000175000017500000000000012507513572013653 5ustar norbertnorbertlatexml-0.8.1/lib/LaTeXML.pm0000644000175000017500000010230312507513572015416 0ustar norbertnorbert# /=====================================================================\ # # | LaTeXML | # # | Overall LaTeXML Converter | # # |=====================================================================| # # | Part of LaTeXML: | # # | Public domain software, produced as part of work done by the | # # | United States Government & not subject to copyright in the US. | # # |---------------------------------------------------------------------| # # | Deyan Ginev #_# | # # | http://dlmf.nist.gov/LaTeXML/ (o o) | # # \=========================================================ooo==U==ooo=/ # package LaTeXML; use strict; use warnings; use Carp; use Encode; use Data::Dumper; use File::Temp; File::Temp->safe_level(File::Temp::HIGH); use File::Path qw(rmtree); use File::Spec; use LaTeXML::Common::Config; use LaTeXML::Core; use LaTeXML::Util::Pack; use LaTeXML::Util::Pathname; use LaTeXML::Util::WWW; use LaTeXML::Util::ObjectDB; use LaTeXML::Post::Scan; use vars qw($VERSION); # This is the main version of LaTeXML being claimed. use version; our $VERSION = version->declare("0.8.1"); use LaTeXML::Version; # Derived, more informative version numbers our $FULLVERSION = "LaTeXML version $LaTeXML::VERSION" . ($LaTeXML::Version::REVISION ? "; revision $LaTeXML::Version::REVISION" : ''); # Handy identifier string for any executable. 
our $IDENTITY = "$FindBin::Script ($LaTeXML::FULLVERSION)"; our $LOG_STACK = 0; #********************************************************************** #our @IGNORABLE = qw(timeout profile port preamble postamble port destination log removed_math_formats whatsin whatsout math_formats input_limit input_counter dographics mathimagemag ); # Switching to white-listing options that are important for new_latexml: our @COMPARABLE = qw(preload paths verbosity strict comments inputencoding includestyles documentid mathparse); our %DAEMON_DB = (); sub new { my ($class, $config) = @_; $config = LaTeXML::Common::Config->new() unless (defined $config); # The daemon should be setting the identity: my $self = bless { opts => $config->options, ready => 0, log => q{}, runtime => {}, latexml => undef }, $class; # Special check if the debug directive is on, just to neutralize the bind_log my $debug_directives = $$self{opts}->{debug}; $LaTeXML::DEBUG = 1 if (ref $debug_directives eq 'ARRAY') && (grep { /latexml/i } @$debug_directives); $self->bind_log; my $rv = eval { $config->check; }; $$self{log} .= $self->flush_log; return $self; } sub prepare_session { my ($self, $config) = @_; # TODO: The defaults feature was never used, do we really want it?? #0. Ensure all default keys are present: # (always, as users can specify partial options that build on the defaults) #foreach (keys %{$$self{defaults}}) { # $config->{$_} = $$self{defaults}->{$_} unless exists $config->{$_}; #} # 1. Ensure option "sanity" $self->bind_log; my $rv = eval { $config->check; }; $$self{log} .= $self->flush_log; my $opts = $config->options; my $opts_comparable = { map { $_ => $$opts{$_} } @COMPARABLE }; my $self_opts_comparable = { map { $_ => $$self{opts}{$_} } @COMPARABLE }; #TODO: Some options like paths and includes are additive, we need special treatment there #2.2. 
Compare old and new $opts hash my $something_to_do; $something_to_do = LaTeXML::Util::ObjectDB::compare($opts_comparable, $self_opts_comparable) ? 0 : 1; #2.3. Set new options in converter: $$self{opts} = $opts; #3. If there is something to do, initialize a session: $self->initialize_session if ($something_to_do || (!$$self{ready})); return; } sub initialize_session { my ($self) = @_; $$self{runtime} = {}; $self->bind_log; # Empty the package namespace foreach my $subname (keys %LaTeXML::Package::Pool::) { delete $LaTeXML::Package::Pool::{$subname}; } my $latexml; my $init_eval_return = eval { # Prepare LaTeXML object $latexml = new_latexml($$self{opts}); 1; }; local $@ = 'Fatal:conversion:unknown Session initialization failed! (Unknown reason)' if ((!$init_eval_return) && (!$@)); if ($@) { #Fatal occurred! print STDERR "$@\n"; print STDERR "\nInitialization complete: " . $latexml->getStatusMessage . ". Aborting.\n" if defined $latexml; # Close and restore STDERR to original condition. $$self{log} .= $self->flush_log; $$self{ready} = 0; return; } else { # Demand errorless initialization my $init_status = $latexml->getStatusMessage; if ($init_status =~ /error/i) { print STDERR "\nInitialization complete: " . $init_status . ".
Aborting.\n"; $$self{log} .= $self->flush_log; $$self{ready} = 0; return; } } # Save latexml in object: $$self{log} .= $self->flush_log; $$self{latexml} = $latexml; $$self{ready} = 1; return; } sub convert { my ($self, $source) = @_; # 1 Prepare for conversion # 1.1 Initialize session if needed: $$self{runtime} = {}; $self->initialize_session unless $$self{ready}; if (!$$self{ready}) { # We can't initialize, return error: return { result => undef, log => $$self{log}, status => "Initialization failed.", status_code => 3 }; } $self->bind_log; # 1.2 Inform of identity, increase conversion counter my $opts = $$self{opts}; my $runtime = $$self{runtime}; ($$runtime{status}, $$runtime{status_code}) = (undef, undef); if ($$opts{verbosity} >= 0) { print STDERR "$LaTeXML::IDENTITY\n"; print STDERR "invoked as [$0 " . join(' ', @ARGV) . "]\n" if $$opts{verbosity} >= 1; print STDERR "processing started " . localtime() . "\n"; } # 1.3 Prepare for What's IN: # We use a new temporary variable to avoid confusion with daemon caching my ($current_preamble, $current_postamble); # 1.3.1 Math needs to magically trigger math mode if needed if ($$opts{whatsin} eq "math") { $current_preamble = 'literal:\begin{document}\ensuremathfollows'; $current_postamble = 'literal:\ensuremathpreceeds\end{document}'; } # 1.3.2 Fragments need to have a default pre- and postamble, if none provided elsif ($$opts{whatsin} eq 'fragment') { $current_preamble = $$opts{preamble} || 'standard_preamble.tex'; $current_postamble = $$opts{postamble} || 'standard_postamble.tex'; } # 1.3.3 Archives need to get unpacked in a sandbox (with sufficient bookkeeping) elsif ($$opts{whatsin} =~ /^archive/) { # Sandbox the input $$opts{archive_sourcedirectory} = $$opts{sourcedirectory}; my $sandbox_directory = File::Temp->newdir(TMPDIR => 1); $$opts{sourcedirectory} = $sandbox_directory; # Extract the archive in the sandbox $source = unpack_source($source, $sandbox_directory); if (!defined $source) { # Unpacking failed to 
find a source $$opts{sourcedirectory} = $$opts{archive_sourcedirectory}; my $log = $self->flush_log; return { result => undef, log => $log, status => "Fatal:IO:Archive Can't detect a source TeX file!", status_code => 3 }; } # Destination magic: If we expect an archive on output, we need to invent the appropriate destination ourselves when not given. # Since the LaTeXML API never writes the final archive file to disk, we just use a pretend sourcename.zip: if (($$opts{whatsout} =~ /^archive/) && (!$$opts{destination})) { $$opts{placeholder_destination} = 1; $$opts{destination} = pathname_name($source) . ".zip"; } } # 1.4 Prepare for What's OUT (if we need a sandbox) if ($$opts{whatsout} =~ /^archive/) { $$opts{archive_sitedirectory} = $$opts{sitedirectory}; $$opts{archive_destination} = $$opts{destination}; my $destination_name = $$opts{destination} ? pathname_name($$opts{destination}) : 'document'; my $sandbox_directory = File::Temp->newdir(TMPDIR => 1); my $extension = $$opts{format}; $extension =~ s/\d+$//; $extension =~ s/^epub|mobi$/xhtml/; my $sandbox_destination = "$destination_name.$extension"; $$opts{sitedirectory} = $sandbox_directory; if ($$opts{format} eq 'epub') { $$opts{resource_directory} = File::Spec->catdir($sandbox_directory, 'OPS'); $$opts{destination} = pathname_concat(File::Spec->catdir($sandbox_directory, 'OPS'), $sandbox_destination); } else { $$opts{destination} = pathname_concat($sandbox_directory, $sandbox_destination); } } # 1.5 Prepare a daemon frame my $latexml = $$self{latexml}; $latexml->withState(sub { my ($state) = @_; # Sandbox state $state->pushDaemonFrame; $state->assignValue('_authlist', $$opts{authlist}, 'global'); $state->assignValue('REMOTE_REQUEST', (!$$opts{local}), 'global'); }); # 2 Beginning Core conversion - digest the source: my ($digested, $dom, $serialized) = (undef, undef, undef); my $convert_eval_return = eval { local $SIG{'ALRM'} = sub { die "Fatal:conversion:timeout Conversion timed out after " . $$opts{timeout} . 
" seconds!\n"; }; alarm($$opts{timeout}); my $mode = ($$opts{type} eq 'auto') ? 'TeX' : $$opts{type}; $digested = $latexml->digestFile($source, preamble => $current_preamble, postamble => $current_postamble, mode => $mode, noinitialize => 1); # 2.1 Now, convert to DOM and output, if desired. if ($digested) { $latexml->withState(sub { if ($$opts{format} eq 'tex') { $serialized = LaTeXML::Core::Token::UnTeX($digested); } elsif ($$opts{format} eq 'box') { $serialized = ($$opts{verbosity} > 0 ? $digested->stringify : $digested->toString); } else { # Default is XML $dom = $latexml->convertDocument($digested); } }); } alarm(0); 1; }; # 2.2 Bookkeeping in case fatal errors occurred local $@ = 'Fatal:conversion:unknown TeX to XML conversion failed! (Unknown Reason)' if ((!$convert_eval_return) && (!$@)); my $eval_report = $@; $$runtime{status} = $latexml->getStatusMessage; $$runtime{status_code} = $latexml->getStatusCode; $$runtime{status_data}->{$_} = $$latexml{state}->{status}->{$_} foreach (qw(warning error fatal)); # End daemon run, by popping frame: $latexml->withState(sub { my ($state) = @_; # Remove current state frame $$opts{searchpaths} = $state->lookupValue('SEARCHPATHS'); # save the searchpaths for post-processing $state->popDaemonFrame; $$state{status} = {}; }); if ($eval_report || ($$runtime{status_code} == 3)) { # Terminate immediately on Fatal errors $$runtime{status_code} = 3; print STDERR $eval_report . "\n" if $eval_report; print STDERR "\nConversion complete: " . $$runtime{status} . ".\n"; print STDERR "Status:conversion:" . ($$runtime{status_code} || '0') . "\n"; # If we just processed an archive, clean up sandbox directory. if ($$opts{whatsin} =~ /^archive/) { rmtree($$opts{sourcedirectory}); $$opts{sourcedirectory} = $$opts{archive_sourcedirectory}; } # Close and restore STDERR to original condition. 
my $log = $self->flush_log; $serialized = $dom if ($$opts{format} eq 'dom'); $serialized = $dom->toString if ($dom && (!defined $serialized)); $self->sanitize($log); return { result => $serialized, log => $log, status => $$runtime{status}, status_code => $$runtime{status_code} }; } else { # Standard report, if we're not in a Fatal case print STDERR "\nConversion complete: " . $$runtime{status} . ".\n"; } # 2.3 Clean up and exit if we only wanted the serialization of the core conversion if ($serialized) { # If serialized has been set, we are done with the job # If we just processed an archive, clean up sandbox directory. if ($$opts{whatsin} =~ /^archive/) { rmtree($$opts{sourcedirectory}); $$opts{sourcedirectory} = $$opts{archive_sourcedirectory}; } my $log = $self->flush_log; return { result => $serialized, log => $log, status => $$runtime{status}, status_code => $$runtime{status_code} }; } # 3 If desired, post-process my $result = $dom; if ($$opts{post} && $dom && $dom->documentElement) { my $post_eval_return = eval { local $SIG{'ALRM'} = sub { die "alarm\n" }; alarm($$opts{timeout}); $result = $self->convert_post($dom); alarm(0); 1; }; # 3.1 Bookkeeping if a post-processing Fatal error occurred local $@ = 'Fatal:conversion:unknown Post-processing failed! (Unknown Reason)' if ((!$post_eval_return) && (!$@)); if ($@) { #Fatal occurred! $$runtime{status_code} = 3; if ($@ =~ "Fatal:perl:die alarm") { #Alarm handler: (treat timeouts as fatals) print STDERR "Fatal:post:timeout Postprocessing timeout after " . $$opts{timeout} . " seconds!\n"; } else { print STDERR "Fatal:post:generic Post-processor crashed! $@\n"; } #Since this is postprocessing, we don't need to do anything # just avoid crashing...
$result = undef; } } # 4 Clean-up: undo everything we sandboxed if ($$opts{whatsin} =~ /^archive/) { rmtree($$opts{sourcedirectory}); $$opts{sourcedirectory} = $$opts{archive_sourcedirectory}; } if ($$opts{whatsout} =~ /^archive/) { rmtree($$opts{sitedirectory}); $$opts{sitedirectory} = $$opts{archive_sitedirectory}; $$opts{destination} = $$opts{archive_destination}; if (delete $$opts{placeholder_destination}) { delete $$opts{destination}; } } # 5 Output # 5.1 Serialize the XML/HTML result (or just return the Perl object, if requested) undef $serialized; if ((defined $result) && ref($result) && (ref($result) =~ /^(:?LaTe)?XML/)) { if ($$opts{format} =~ 'x(ht)?ml') { $serialized = $result->toString(1); } elsif ($$opts{format} =~ /^html/) { if (ref($result) =~ '^LaTeXML::(Post::)?Document$') { # Special for documents $serialized = $result->getDocument->toStringHTML; } else { # Regular for fragments do { local $XML::LibXML::setTagCompression = 1; $serialized = $result->toString(1); } } } elsif ($$opts{format} eq 'dom') { $serialized = $result; } } else { $serialized = $result; } # Compressed case # 5.2 Finalize logging and return a response containing the document result, log and status print STDERR "Status:conversion:" . ($$runtime{status_code} || '0') . 
" \n"; my $log = $self->flush_log; $self->sanitize($log) if ($$runtime{status_code} == 3); return { result => $serialized, log => $log, status => $$runtime{status}, 'status_code' => $$runtime{status_code} }; } ########################################### #### Converter Management ##### ########################################### sub get_converter { my ($self, $config) = @_; # TODO: Make this more flexible via an admin interface later my $key = $config->get('cache_key') || $config->get('profile') || 'custom'; my $d = $DAEMON_DB{$key}; if (!defined $d) { $d = LaTeXML->new($config->clone); $DAEMON_DB{$key} = $d; } return $d; } ########################################### #### Helper routines ##### ########################################### sub convert_post { my ($self, $dom) = @_; my $opts = $$self{opts}; my $runtime = $$self{runtime}; my ($xslt, $parallel, $math_formats, $format, $verbosity, $defaultresources, $embed) = map { $$opts{$_} } qw(stylesheet parallelmath math_formats format verbosity defaultresources embed); $verbosity = $verbosity || 0; my %PostOPS = (verbosity => $verbosity, validate => $$opts{validate}, sourceDirectory => $$opts{sourcedirectory}, siteDirectory => $$opts{sitedirectory}, resource_directory => $$opts{resource_directory}, searchpaths => $$opts{searchpaths}, nocache => 1, destination => $$opts{destination}, is_html => $$opts{is_html}); #Postprocess $parallel = $parallel || 0; my $DOCUMENT = LaTeXML::Post::Document->new($dom, %PostOPS); my @procs = (); #TODO: Add support for the following: my $dbfile = $$opts{dbfile}; if (defined $dbfile && !-f $dbfile) { if (my $dbdir = pathname_directory($dbfile)) { pathname_mkdir($dbdir); } } my $DB = LaTeXML::Util::ObjectDB->new(dbfile => $dbfile, %PostOPS); ### Advanced Processors: if ($$opts{split}) { require LaTeXML::Post::Split; push(@procs, LaTeXML::Post::Split->new(split_xpath => $$opts{splitpath}, splitnaming => $$opts{splitnaming}, db => $DB, %PostOPS)); } my $scanner = ($$opts{scan} || $DB) && 
(LaTeXML::Post::Scan->new(db => $DB, %PostOPS)); push(@procs, $scanner) if $$opts{scan}; if (!($$opts{prescan})) { if ($$opts{index}) { require LaTeXML::Post::MakeIndex; push(@procs, LaTeXML::Post::MakeIndex->new(db => $DB, permuted => $$opts{permutedindex}, split => $$opts{splitindex}, scanner => $scanner, %PostOPS)); } require LaTeXML::Post::MakeBibliography; push(@procs, LaTeXML::Post::MakeBibliography->new(db => $DB, bibliographies => $$opts{bibliographies}, split => $$opts{splitbibliography}, scanner => $scanner, %PostOPS)); if ($$opts{crossref}) { require LaTeXML::Post::CrossRef; push(@procs, LaTeXML::Post::CrossRef->new( db => $DB, urlstyle => $$opts{urlstyle}, extension => $$opts{extension}, ($$opts{numbersections} ? (number_sections => 1) : ()), ($$opts{navtoc} ? (navigation_toc => $$opts{navtoc}) : ()), %PostOPS)); } if ($$opts{picimages}) { require LaTeXML::Post::PictureImages; push(@procs, LaTeXML::Post::PictureImages->new(%PostOPS)); } if ($$opts{dographics}) { # TODO: Rethink full-fledged graphics support require LaTeXML::Post::Graphics; my @g_options = (); if ($$opts{graphicsmaps} && scalar(@{ $$opts{graphicsmaps} })) { my @maps = map { [split(/\./, $_)] } @{ $$opts{graphicsmaps} }; push(@g_options, (graphics_types => [map { $$_[0] } @maps], type_properties => { map { ($$_[0] => { destination_type => ($$_[1] || $$_[0]) }) } @maps })); } push(@procs, LaTeXML::Post::Graphics->new(@g_options, %PostOPS)); } if ($$opts{svg}) { require LaTeXML::Post::SVG; push(@procs, LaTeXML::Post::SVG->new(%PostOPS)); } if (@$math_formats) { my @mprocs = (); ### # If XMath is not first, it must be at END! Or... ??? foreach my $fmt (@$math_formats) { if ($fmt eq 'xmath') { require LaTeXML::Post::XMath; push(@mprocs, LaTeXML::Post::XMath->new(%PostOPS)); } elsif ($fmt eq 'pmml') { require LaTeXML::Post::MathML::Presentation; push(@mprocs, LaTeXML::Post::MathML::Presentation->new( linelength => $$opts{linelength}, (defined $$opts{plane1} ? 
(plane1 => $$opts{plane1}) : (plane1 => 1)), ($$opts{hackplane1} ? (hackplane1 => 1) : ()), %PostOPS)); } elsif ($fmt eq 'cmml') { require LaTeXML::Post::MathML::Content; push(@mprocs, LaTeXML::Post::MathML::Content->new( (defined $$opts{plane1} ? (plane1 => $$opts{plane1}) : (plane1 => 1)), ($$opts{hackplane1} ? (hackplane1 => 1) : ()), %PostOPS)); } elsif ($fmt eq 'om') { require LaTeXML::Post::OpenMath; push(@mprocs, LaTeXML::Post::OpenMath->new( (defined $$opts{plane1} ? (plane1 => $$opts{plane1}) : (plane1 => 1)), ($$opts{hackplane1} ? (hackplane1 => 1) : ()), %PostOPS)); } elsif ($fmt eq 'images') { require LaTeXML::Post::MathImages; push(@mprocs, LaTeXML::Post::MathImages->new(magnification => $$opts{mathimagemag}, %PostOPS)); } elsif ($fmt eq 'svg') { require LaTeXML::Post::MathImages; push(@mprocs, LaTeXML::Post::MathImages->new(magnification => $$opts{mathimagemag}, imagetype => 'svg', %PostOPS)); } } ### $keepXMath = 0 unless defined $keepXMath; ### OR is $parallelmath ALWAYS on whenever there's more than one math processor? if ($parallel) { my $main = shift(@mprocs); $main->setParallel(@mprocs); push(@procs, $main); } else { push(@procs, @mprocs); } } if ($xslt) { require LaTeXML::Post::XSLT; my $parameters = { LATEXML_VERSION => "'$LaTeXML::VERSION'" }; my @searchpaths = ('.', $DOCUMENT->getSearchPaths); foreach my $css (@{ $$opts{css} }) { if (pathname_is_url($css)) { # external url ? no need to copy print STDERR "Using CSS=$css\n" if $verbosity > 0; push(@{ $$parameters{CSS} }, $css); } elsif (my $csssource = pathname_find($css, types => ['css'], paths => [@searchpaths], installation_subdir => 'style')) { print STDERR "Using CSS=$csssource\n" if $verbosity > 0; my $cssdest = pathname_absolute($css, pathname_directory($$opts{destination})); $cssdest .= '.css' unless $cssdest =~ /\.css$/; warn "CSS source $csssource is same as destination!" 
if $csssource eq $cssdest; pathname_copy($csssource, $cssdest) if ($$opts{local} || ($$opts{whatsout} =~ /^archive/)); # TODO: Look into local copying carefully push(@{ $$parameters{CSS} }, $cssdest); } else { warn "Couldn't find CSS file $css in paths " . join(',', @searchpaths) . "\n"; push(@{ $$parameters{CSS} }, $css); } } # but still put the link in! foreach my $js (@{ $$opts{javascript} }) { if (pathname_is_url($js)) { # external url ? no need to copy print STDERR "Using JAVASCRIPT=$js\n" if $verbosity > 0; push(@{ $$parameters{JAVASCRIPT} }, $js); } elsif (my $jssource = pathname_find($js, types => ['js'], paths => [@searchpaths], installation_subdir => 'style')) { print STDERR "Using JAVASCRIPT=$jssource\n" if $verbosity > 0; my $jsdest = pathname_absolute($js, pathname_directory($$opts{destination})); $jsdest .= '.js' unless $jsdest =~ /\.js$/; warn "Javascript source $jssource is same as destination!" if $jssource eq $jsdest; pathname_copy($jssource, $jsdest) if ($$opts{local} || ($$opts{whatsout} =~ /^archive/)); #TODO: Local handling push(@{ $$parameters{JAVASCRIPT} }, $jsdest); } else { warn "Couldn't find Javascript file $js in paths " . join(',', @searchpaths) . "\n"; push(@{ $$parameters{JAVASCRIPT} }, $js); } } # but still put the link in! if ($$opts{icon}) { if (my $iconsrc = pathname_find($$opts{icon}, paths => [$DOCUMENT->getSearchPaths])) { print STDERR "Using icon=$iconsrc\n" if $verbosity > 0; my $icondest = pathname_absolute($$opts{icon}, pathname_directory($$opts{destination})); pathname_copy($iconsrc, $icondest) if ($$opts{local} || ($$opts{whatsout} =~ /^archive/)); $$parameters{ICON} = $icondest; } else { warn "Couldn't find ICON " . $$opts{icon} . " in paths " . join(',', @searchpaths) . "\n"; $$parameters{ICON} = $$opts{icon}; } } if (!defined $$opts{timestamp}) { $$opts{timestamp} = localtime(); } if ($$opts{timestamp}) { $$parameters{TIMESTAMP} = "'" . $$opts{timestamp} . 
"'"; } # Now add in the explicitly given XSLT parameters foreach my $parm (@{ $$opts{xsltparameters} }) { if ($parm =~ /^\s*(\w+)\s*:\s*(.*)$/) { $$parameters{$1} = "'" . $2 . "'"; } else { warn "xsltparameter not in recognized format: 'name:value' got: '$parm'\n"; } } push(@procs, LaTeXML::Post::XSLT->new(stylesheet => $xslt, parameters => $parameters, searchpaths => [@searchpaths], noresources => (defined $$opts{defaultresources}) && !$$opts{defaultresources}, %PostOPS)); } } # If we are doing a local conversion OR # we are going to package into an archive # write all the files to disk during post-processing if ($$opts{destination} && (($$opts{local} && ($$opts{whatsout} eq 'document')) || ($$opts{whatsout} =~ /^archive/))) { require LaTeXML::Post::Writer; push(@procs, LaTeXML::Post::Writer->new( format => $format, omit_doctype => $$opts{omit_doctype}, %PostOPS)); } # Do the actual post-processing: my @postdocs; my $latexmlpost = LaTeXML::Post->new(verbosity => $verbosity || 0); @postdocs = $latexmlpost->ProcessChain($DOCUMENT, @procs); # Finalize by arranging any manifests and packaging the output. # If our format requires a manifest, create one if (($$opts{whatsout} =~ /^archive/) && ($format !~ /^x?html|xml/)) { require LaTeXML::Post::Manifest; my $manifest_maker = LaTeXML::Post::Manifest->new(db => $DB, format => $format, %PostOPS); $manifest_maker->process(@postdocs); } # Archives: when a relative --log is requested, write to sandbox prior packing if ($$opts{log} && ($$opts{whatsout} =~ /^archive/) && (!pathname_is_absolute($$opts{log}))) { my $destination_directory = $postdocs[0]->getDestinationDirectory(); my $log_file = pathname_absolute($$opts{log}, $destination_directory); if (pathname_is_contained($log_file, $destination_directory)) { print STDERR "\nPost-processing complete: " . $latexmlpost->getStatusMessage . "\n"; print STDERR "processing finished " . localtime() . "\n" if $verbosity >= 0; print STDERR "Status:conversion:" . 
($$self{runtime}->{status_code} || '0') . " \n"; open my $log_fh, '>', $log_file; print $log_fh $self->flush_log; close $log_fh; $self->bind_log; } else { print STDERR "Error:IO:log The target log file isn't contained in the destination directory!\n"; } } # Handle the output packaging my ($postdoc) = pack_collection(collection => [@postdocs], whatsout => $$opts{whatsout}, format => $format, %PostOPS); $DB->finish; # TODO: Refactor once we know how to merge the core and post State objects # Merge postprocessing and main processing reports foreach my $message_type (qw(warning error fatal)) { my $count = $$latexmlpost{status}->{$message_type} || 0; $$runtime{status_data}->{$message_type} += $count; } $$runtime{status} = getStatusMessage($$runtime{status_data}); $$runtime{status_code} = getStatusCode($$runtime{status_data}); print STDERR "\nPost-processing complete: " . $latexmlpost->getStatusMessage . "\n"; print STDERR "processing finished " . localtime() . "\n" if $verbosity >= 0; # Avoid writing the main file twice (non-archive documents): if ($$opts{destination} && $$opts{local} && ($$opts{whatsout} eq 'document')) { undef $postdoc; } return $postdoc; } sub new_latexml { my ($opts) = @_; # TODO: Do this in a GOOD way to support filepath/URL/string snippets # If we are given string preloads, load them and remove them from the preload list: my $preloads = $$opts{preload}; my (@pre, @str_pre); foreach my $pre (@$preloads) { if (pathname_is_literaldata($pre)) { push @str_pre, $pre; } else { push @pre, $pre; } } require LaTeXML; my $latexml = LaTeXML::Core->new(preload => [@pre], searchpaths => [@{ $$opts{paths} }], graphicspaths => ['.'], verbosity => $$opts{verbosity}, strict => $$opts{strict}, includeComments => $$opts{comments}, inputencoding => $$opts{inputencoding}, includeStyles => $$opts{includestyles}, documentid => $$opts{documentid}, nomathparse => $$opts{nomathparse}, # Backwards compatibility mathparse => $$opts{mathparse}); if (my @baddirs = grep { !-d $_ 
} @{ $$opts{paths} }) { warn "\n$LaTeXML::IDENTITY : these path directories do not exist: " . join(', ', @baddirs) . "\n"; } $latexml->withState(sub { my ($state) = @_; $latexml->initializeState('TeX.pool', @{ $$latexml{preload} || [] }); }); # TODO: Do again, need to do this in a GOOD way as well: $latexml->digestFile($_, noinitialize => 1) foreach (@str_pre); print STDERR "\n\n"; # Flush a pair of newlines to delimit the initialization return $latexml; } sub bind_log { my ($self) = @_; # HACK HACK HACK !!! Refactor with properly scoped logging !!! $LaTeXML::LOG_STACK++; # May the modern Perl community forgive me for this hack... return if $LaTeXML::LOG_STACK > 1; # TODO: Move away from global file handles, they will inevitably end up causing problems.. if (!$LaTeXML::DEBUG) { # Debug will use STDERR for logs # Tie STDERR to log: my $log_handle; open($log_handle, ">>", \$$self{log}) or croak "Can't redirect STDERR to log! Dying..."; *STDERR_SAVED = *STDERR; *STDERR = *$log_handle; binmode(STDERR, ':encoding(UTF-8)'); $$self{log_handle} = $log_handle; } return; } sub flush_log { my ($self) = @_; # HACK HACK HACK !!! Refactor with properly scoped logging !!! $LaTeXML::LOG_STACK--; # May the modern Perl community forgive me for this hack... return '' if $LaTeXML::LOG_STACK > 0; # Close and restore STDERR to original condition. if (!$LaTeXML::DEBUG) { close $$self{log_handle}; delete $$self{log_handle}; *STDERR = *STDERR_SAVED; } my $log = $$self{log}; $$self{log} = q{}; return $log; } sub sanitize { my ($self, $log) = @_; if ($log =~ m/^Fatal:internal/m) { # TODO : Anything else? Clean up the whole stomach etc? $$self{latexml}->withState(sub { my ($state) = @_; # Remove current state frame my $stomach = $state->getStomach; undef $stomach; }); $$self{ready} = 0; } return; } sub getStatusMessage { my ($status) = @_; my @report = (); push(@report, "$$status{warning} warning" . ($$status{warning} > 1 ?
's' : '')) if $$status{warning}; push(@report, "$$status{error} error" . ($$status{error} > 1 ? 's' : '')) if $$status{error}; push(@report, "$$status{fatal} fatal error" . ($$status{fatal} > 1 ? 's' : '')) if $$status{fatal}; return join('; ', @report) || 'No obvious problems'; } sub getStatusCode { my ($status) = @_; my $code; if ($$status{fatal} && $$status{fatal} > 0) { $code = 3; } elsif ($$status{error} && $$status{error} > 0) { $code = 2; } elsif ($$status{warning} && $$status{warning} > 0) { $code = 1; } else { $code = 0; } return $code; } 1; __END__ =pod =head1 NAME C<LaTeXML> - A converter that transforms TeX and LaTeX into XML/HTML/MathML =head1 SYNOPSIS use LaTeXML; my $converter = LaTeXML->get_converter($config); my $converter = LaTeXML->new($config); $converter->prepare_session($opts); $converter->initialize_session; # SHOULD BE INTERNAL $hashref = $converter->convert($tex); my ($result,$log,$status) = map {$hashref->{$_}} qw(result log status); =head1 DESCRIPTION LaTeXML is a converter that transforms TeX and LaTeX into XML/HTML/MathML and other formats. A LaTeXML object represents a converter instance and can convert files on demand, until dismissed. =head2 METHODS =over 4 =item C<< my $converter = LaTeXML->new($config); >> Creates a new converter object for a given LaTeXML::Common::Config object, $config. =item C<< my $converter = LaTeXML->get_converter($config); >> Either creates a converter or looks up a cached one for the $config configuration object. =item C<< $converter->prepare_session($opts); >> Top-level preparation routine that prepares both a correct options object and an initialized LaTeXML object, using the "initialize_options" and "initialize_session" routines, when needed. Contains optimization checks that skip initializations unless necessary. Also adds support for partial option specifications during daemon runtime, falling back on the option defaults given when the converter object was created.
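The "skip initializations unless necessary" check can be pictured with a small standalone sketch. It assumes only the @COMPARABLE whitelist from this file; the helpers needs_new_session and _flatten are hypothetical names invented for illustration, and the string comparison merely stands in for LaTeXML::Util::ObjectDB::compare, whose exact behaviour is not reproduced here.

```perl
use strict;
use warnings;

# Whitelist of options that matter for session reuse (copied from @COMPARABLE above).
my @COMPARABLE = qw(preload paths verbosity strict comments inputencoding
  includestyles documentid mathparse);

# Hypothetical helper, not part of LaTeXML: turn a scalar or array reference
# into a comparable string. Other reference types are simply stringified.
sub _flatten {
  my ($value) = @_;
  return '<undef>' unless defined $value;
  return '[' . join(',', map { _flatten($_) } @$value) . ']' if ref $value eq 'ARRAY';
  return "$value"; }

# Hypothetical helper: true when any whitelisted option differs,
# i.e. when a fresh session would have to be initialized.
sub needs_new_session {
  my ($old, $new) = @_;
  foreach my $key (@COMPARABLE) {
    return 1 if _flatten($$old{$key}) ne _flatten($$new{$key}); }
  return 0; }

my %current = (paths => ['.'], verbosity => 0, destination => 'a.xml');
my %same    = (paths => ['.'], verbosity => 0, destination => 'b.xml');
my %changed = (paths => ['.', 'sty'], verbosity => 0);

# 'destination' is not whitelisted, so changing it alone keeps the session.
print needs_new_session(\%current, \%same)    ? "reinitialize\n" : "reuse\n";
print needs_new_session(\%current, \%changed) ? "reinitialize\n" : "reuse\n";
```

Restricting the comparison to a whitelist means per-request options such as the destination never force a costly re-initialization; only options that influence how the core is set up do.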
=item C<< my $hashref = $converter->convert($tex); >> Converts a TeX input string $tex and returns a hash reference, as shown in the SYNOPSIS. Its C<result> field holds the converted LaTeXML::Core::Document object, its C<log> field holds the detailed conversion log, and its C<status> field holds a brief conversion status summary. =back =head2 INTERNAL ROUTINES =over 4 =item C<< $converter->initialize_session($opts); >> Given an options hash reference $opts, initializes a session by creating a new LaTeXML object with initialized state and loading a daemonized preamble (if any). Sets the "ready" flag to true, making a subsequent "convert" call immediately possible. =item C<< my $latexml = new_latexml($opts); >> Creates a new LaTeXML object and initializes its state. =item C<< my $postdoc = $converter->convert_post($dom); >> Post-processes a LaTeXML::Core::Document object $dom into a final format, based on the preferences specified in $$self{opts}. Typically used only internally by C<convert>. =item C<< $converter->bind_log; >> Binds STDERR to a "log" field in the $converter object. =item C<< my $log = $converter->flush_log; >> Flushes the accumulated conversion log into $log, resetting STDERR to its usual stream. =back =head1 AUTHOR Bruce Miller Deyan Ginev =head1 COPYRIGHT Public domain software, produced as part of work done by the United States Government & not subject to copyright in the US. You may consider this as released under the CC0 License. =cut latexml-0.8.1/lib/LaTeXML/0000755000175000017500000000000012507513572015061 5ustar norbertnorbertlatexml-0.8.1/lib/LaTeXML/Common/0000755000175000017500000000000012507513572016311 5ustar norbertnorbertlatexml-0.8.1/lib/LaTeXML/Common/Color.pm0000644000175000017500000002052112507513572017725 0ustar norbertnorbert# /=====================================================================\ # # | LaTeXML::Common::Color ,... 
| # # | Representation of colors in various color models | # # |=====================================================================| # # | Part of LaTeXML: | # # | Public domain software, produced as part of work done by the | # # | United States Government & not subject to copyright in the US. | # # |---------------------------------------------------------------------| # # | Bruce Miller #_# | # # | http://dlmf.nist.gov/LaTeXML/ (o o) | # # \=========================================================ooo==U==ooo=/ # package LaTeXML::Common::Color; use strict; use warnings; use LaTeXML::Global; use LaTeXML::Common::Object; use LaTeXML::Common::Error; use base qw(LaTeXML::Common::Object); use base qw(Exporter); our @EXPORT = ( # Global STATE; This gets bound by LaTeXML.pm qw( &Color &Black &White), ); #====================================================================== # Exported constructors sub Color { my ($model, @components) = @_; return LaTeXML::Common::Color->new(ToString($model), map { ToString($_) } @components); } use constant Black => bless ['rgb', 0, 0, 0], 'LaTeXML::Common::Color::rgb'; use constant White => bless ['rgb', 1, 1, 1], 'LaTeXML::Common::Color::rgb'; #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # Color objects; objects representing color in "arbitrary" color models # We'd like to provide a set of "core" color models (rgb,cmy,cmyk,hsb) # and allow derived color models (with scaled ranges, or whatever; see xcolor). # There is some awkwardness in that we'd like to support the core models # directly with built-in code, but support derived models that possibly # are defined in terms of macros defined as part of a style file. #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # NOTE: This class is in Common since it could conceivably be useful # in Postprocessing --- But the API, includes, etc haven't been tuned for that! # They only use $STATE to get derived color information, Error, min & max. 
#%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # Color Objects our %core_color_models = map { ($_ => 1) } qw(rgb cmy cmyk hsb gray); # [CONSTANT] # slightly contrived to avoid 'use'ing all the models in here # (which causes compiler redefined issues, and preloads them all) sub new { my ($class, @components) = @_; if (ref $class) { # from $self->new(...) return bless [$$class[0], @components], ref $class; } else { # Else, $model is the 1st element of @components; my $model = shift(@components); my $type = ($core_color_models{$model} ? $model : 'Derived'); my $class = 'LaTeXML::Common::Color::' . $type; if (($type eq 'Derived') && !$STATE->lookupValue('derived_color_model_' . $model)) { Error('unexpected', $model, undef, "Unrecognized color model '$model'"); } my $module = $class . '.pm'; $module =~ s|::|/|g; require $module unless exists $INC{$module}; # Load if not already loaded return bless [$model, @components], $class; } } sub model { my ($self) = @_; return $$self[0]; } sub components { my ($self) = @_; my ($m, @comp) = @$self; return @comp; } # Convert a color to another model sub convert { my ($self, $tomodel) = @_; if ($self->model eq $tomodel) { # Already the correct model return $self; } elsif ($core_color_models{$tomodel}) { # target must be core model return $self->toCore->$tomodel; } elsif (my $data = $STATE->lookupValue('derived_color_model_' . $tomodel)) { # Ah, target is a derived color my $coremodel = $$data[0]; my $convertfrom = $$data[2]; return &{$convertfrom}($self->$coremodel); } else { Error('unexpected', $tomodel, undef, "Unrecognized color model '$tomodel'"); return $self; } } sub toString { my ($self) = @_; my ($model, @comp) = @$self; return $model . "(" . join(',', @comp) . ")"; } sub toHex { my ($self) = @_; return $self->rgb->toHex; } sub toAttribute { my ($self) = @_; return $self->rgb->toHex; } # Convert the color to a core model; Assume it already is! # Color::Derived MUST override this... 
sub toCore { my ($self) = @_; return $self; } #====================================================================== # By default, just complement components (works for rgb, cmy, gray) sub complement { my ($self) = @_; return $self->new(map { 1 - $_ } $self->components); } # Mix $self*$fraction + $color*(1-$fraction) sub mix { my ($self, $color, $fraction) = @_; $color = $color->convert($self->model) unless $self->model eq $color->model; my @a = $self->components; my @b = $color->components; return $self->new(map { $fraction * $a[$_] + (1 - $fraction) * $b[$_] } 0 .. $#a); } sub add { my ($self, $color) = @_; $color = $color->convert($self->model) unless $self->model eq $color->model; my @a = $self->components; my @b = $color->components; return $self->new(map { $a[$_] + $b[$_] } 0 .. $#a); } # The next 2 methods multiply the components of a color by some value(s) # This assumes that such a thing makes sense in the given model, for some purpose. # It may be that the components should be truncated to 1 (or some other max?) # Multiply all components by a constant sub scale { my ($self, $m) = @_; return $self->new(map { $m * $_ } $self->components); } # Multiply by a vector (must have same number of components) # This may or may not make sense for any given color model or purpose. sub multiply { my ($self, @m) = @_; my @c = $self->components; if (scalar(@m) != scalar(@c)) { Error('misdefined', 'multiply', undef, "Multiplying color components by wrong number of parts", "The color is " . ToString($self) . " while the multipliers are " . join(',', @m)); return $self; } else { return $self->new(map { $c[$_] * $m[$_] } 0 .. $#c); } } #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 1; __END__ =pod =head1 NAME C<LaTeXML::Common::Color> - abstract class representing colors using various color models; extends L<LaTeXML::Common::Object>. =head2 Exported functions =over 4 =item C<< $color = Color($model,@components); >> Creates a Color object using the given color model, and with the given components. 
The core color models are C<rgb>, C<cmy>, C<cmyk>, C<hsb> and C<gray>. The components of colors using core color models are between 0 and 1 (inclusive). =item C<< Black >>, C<< White >> Constant color objects representing black and white, respectively. =back =head2 Methods =over 4 =item C<< $model = $color->model; >> Returns the name of the color model. =item C<< @components = $color->components; >> Returns the components of the color. =item C<< $other = $color->convert($tomodel); >> Converts the color to another color model. =item C<< $string = $color->toString; >> Returns a printed representation of the color. =item C<< $hex = $color->toHex; >> Returns a string representing the color as RGB in hexadecimal (6 digits). =item C<< $other = $color->toCore(); >> Converts the color to one of the core colors. =item C<< $complement = $color->complement(); >> Returns the complement color (works for colors in C<rgb>, C<cmy> and C<gray> color models). =item C<< $new = $color->mix($other,$fraction); >> Returns a new color which results from mixing a C<$fraction> of C<$color> with C<(1-$fraction)> of color C<$other>. =item C<< $new = $color->add($other); >> Returns a new color made by adding the components of the two colors. =item C<< $new = $color->scale($m); >> Returns a new color made by multiplying the components by C<$m>. =item C<< $new = $color->multiply(@m); >> Returns a new color made by multiplying the components by the corresponding component from C<@m>. =back =head1 SEE ALSO Supported color models: L<LaTeXML::Common::Color::rgb>, L<LaTeXML::Common::Color::cmy>, L<LaTeXML::Common::Color::cmyk>, L<LaTeXML::Common::Color::hsb>, L<LaTeXML::Common::Color::gray> and L<LaTeXML::Common::Color::Derived>. =head1 AUTHOR Bruce Miller =head1 COPYRIGHT Public domain software, produced as part of work done by the United States Government & not subject to copyright in the US. 
=cut latexml-0.8.1/lib/LaTeXML/Common/Color/0000755000175000017500000000000012507513572017367 5ustar norbertnorbertlatexml-0.8.1/lib/LaTeXML/Common/Color/Derived.pm0000644000175000017500000000527112507513572021314 0ustar norbertnorbert# /=====================================================================\ # # | LaTeXML::Common::Color::Derived | # # | A representation of colors in color models derived from core | # # |=====================================================================| # # | Part of LaTeXML: | # # | Public domain software, produced as part of work done by the | # # | United States Government & not subject to copyright in the US. | # # |---------------------------------------------------------------------| # # | Bruce Miller #_# | # # | http://dlmf.nist.gov/LaTeXML/ (o o) | # # \=========================================================ooo==U==ooo=/ # package LaTeXML::Common::Color::Derived; use strict; use warnings; use LaTeXML::Common::Color; use base qw(LaTeXML::Common::Color); use LaTeXML::Global; use LaTeXML::Common::Error; # Convert this derived color to one of the Core colors # Subclasses of this color need to set the variable: # derived_color_model_<model> => [$coremodel, &convertto($self), &convertfrom($core) ] # $coremodel is the core model class name associated with this color model # &convertto($self) converts an instance of the derived model to the core color # &convertfrom($core) converts an instance of the core model to the derived color sub toCore { my ($self) = @_; my $model = $$self[0]; if (my $data = $STATE->lookupValue('derived_color_model_' . 
$model)) { my $convertto = $$data[1]; return &{$convertto}($self); } else { Error('unexpected', $self->model, undef, "Color is not in valid model '$model'"); return Black; } } sub rgb { my ($self) = @_; return $self->convert('rgb'); } sub cmy { my ($self) = @_; return $self->convert('cmy'); } sub cmyk { my ($self) = @_; return $self->convert('cmyk'); } sub hsb { my ($self) = @_; return $self->convert('hsb'); } sub gray { my ($self) = @_; return $self->convert('gray'); } #====================================================================== 1; __END__ =head1 NAME C<LaTeXML::Common::Color::Derived> - represents colors in derived color models =head1 SYNOPSIS C<LaTeXML::Common::Color::Derived> represents colors in derived color models. These are used to support various color models defined and definable via the C<xcolor> package, such as colors where the components are in different ranges. It extends L<LaTeXML::Common::Color>. =head1 AUTHOR Bruce Miller =head1 COPYRIGHT Public domain software, produced as part of work done by the United States Government & not subject to copyright in the US. =cut latexml-0.8.1/lib/LaTeXML/Common/Color/cmy.pm0000644000175000017500000000431112507513572020514 0ustar norbertnorbert# /=====================================================================\ # # | LaTeXML::Common::Color::cmy | # # | A representation of colors in the cmy color model | # # |=====================================================================| # # | Part of LaTeXML: | # # | Public domain software, produced as part of work done by the | # # | United States Government & not subject to copyright in the US. 
| # # |---------------------------------------------------------------------| # # | Bruce Miller #_# | # # | http://dlmf.nist.gov/LaTeXML/ (o o) | # # \=========================================================ooo==U==ooo=/ # package LaTeXML::Common::Color::cmy; use strict; use warnings; use base qw(LaTeXML::Common::Color); use LaTeXML::Global; use List::Util qw(min max); sub cmy { my ($self) = @_; return $self; } sub rgb { my ($self) = @_; return LaTeXML::Common::Color->new('rgb', 1 - $$self[1], 1 - $$self[2], 1 - $$self[3]); } sub hsb { my ($self) = @_; return $self->rgb->hsb; } sub gray { my ($self) = @_; return LaTeXML::Common::Color->new('gray', 1 - (0.3 * $$self[1] + 0.59 * $$self[2] + 0.11 * $$self[3])); } sub cmyk { my ($self) = @_; my ($model, $c, $m, $y) = @$self; # These beta parameters are linear coefficients for "undercolor-removal", and "black-generation" # In xcolor, they could come from \adjustUCRBG my ($bc, $bm, $by, $bk) = (1, 1, 1, 1); my $k = min($c, min($m, $y)); return LaTeXML::Common::Color->new('cmyk', min(1, max(0, $c - $bc * $k)), min(1, max(0, $m - $bm * $k)), min(1, max(0, $y - $by * $k)), $bk * $k); } #====================================================================== 1; __END__ =head1 NAME C<LaTeXML::Common::Color::cmy> - represents colors in the cmy color model: cyan, magenta and yellow in [0..1]; extends L<LaTeXML::Common::Color>. =head1 AUTHOR Bruce Miller =head1 COPYRIGHT Public domain software, produced as part of work done by the United States Government & not subject to copyright in the US. 
=cut latexml-0.8.1/lib/LaTeXML/Common/Color/cmyk.pm0000644000175000017500000000374712507513572020703 0ustar norbertnorbert# /=====================================================================\ # # | LaTeXML::Common::Color::cmyk | # # | A representation of colors in the cmyk color model | # # |=====================================================================| # # | Part of LaTeXML: | # # | Public domain software, produced as part of work done by the | # # | United States Government & not subject to copyright in the US. | # # |---------------------------------------------------------------------| # # | Bruce Miller #_# | # # | http://dlmf.nist.gov/LaTeXML/ (o o) | # # \=========================================================ooo==U==ooo=/ # package LaTeXML::Common::Color::cmyk; use strict; use warnings; use base qw(LaTeXML::Common::Color); use LaTeXML::Global; use List::Util qw(min max); sub cmyk { my ($self) = @_; return $self; } sub cmy { my ($self) = @_; my ($model, $c, $m, $y, $k) = @$self; return LaTeXML::Common::Color->new('cmy', min(1, $c + $k), min(1, $m + $k), min(1, $y + $k)); } sub rgb { my ($self) = @_; return $self->cmy->rgb; } sub hsb { my ($self) = @_; return $self->cmy->hsb; } sub gray { my ($self) = @_; my ($model, $c, $m, $y, $k) = @$self; return LaTeXML::Common::Color->new('gray', 1 - min(1, 0.3 * $c + 0.59 * $m + 0.11 * $y + $k)); } sub complement { my ($self) = @_; return $self->cmy->complement->cmyk; } #====================================================================== 1; __END__ =head1 NAME C<LaTeXML::Common::Color::cmyk> - represents colors in the cmyk color model: cyan, magenta, yellow and black in [0..1]; extends L<LaTeXML::Common::Color>. =head1 AUTHOR Bruce Miller =head1 COPYRIGHT Public domain software, produced as part of work done by the United States Government & not subject to copyright in the US. 
=cut latexml-0.8.1/lib/LaTeXML/Common/Color/gray.pm0000644000175000017500000000351612507513572020674 0ustar norbertnorbert# /=====================================================================\ # # | LaTeXML::Common::Color::gray | # # | A representation of colors in the gray color model | # # |=====================================================================| # # | Part of LaTeXML: | # # | Public domain software, produced as part of work done by the | # # | United States Government & not subject to copyright in the US. | # # |---------------------------------------------------------------------| # # | Bruce Miller #_# | # # | http://dlmf.nist.gov/LaTeXML/ (o o) | # # \=========================================================ooo==U==ooo=/ # package LaTeXML::Common::Color::gray; use strict; use warnings; use base qw(LaTeXML::Common::Color); use LaTeXML::Global; sub gray { my ($self) = @_; return $self; } sub rgb { my ($self) = @_; return LaTeXML::Common::Color->new('rgb', $$self[1], $$self[1], $$self[1]); } sub cmy { my ($self) = @_; return LaTeXML::Common::Color->new('cmy', 1 - $$self[1], 1 - $$self[1], 1 - $$self[1]); } sub cmyk { my ($self) = @_; return LaTeXML::Common::Color->new('cmyk', 0, 0, 0, 1 - $$self[1]); } sub hsb { my ($self) = @_; return LaTeXML::Common::Color->new('hsb', 0, 0, $$self[1]); } #====================================================================== 1; __END__ =head1 NAME C<LaTeXML::Common::Color::gray> - represents colors in the gray color model: gray value in [0..1]; extends L<LaTeXML::Common::Color>. =head1 AUTHOR Bruce Miller =head1 COPYRIGHT Public domain software, produced as part of work done by the United States Government & not subject to copyright in the US. 
=cut latexml-0.8.1/lib/LaTeXML/Common/Color/hsb.pm0000644000175000017500000000542112507513572020503 0ustar norbertnorbert# /=====================================================================\ # # | LaTeXML::Common::Color::hsb | # # | A representation of colors in the hsb color model | # # |=====================================================================| # # | Part of LaTeXML: | # # | Public domain software, produced as part of work done by the | # # | United States Government & not subject to copyright in the US. | # # |---------------------------------------------------------------------| # # | Bruce Miller #_# | # # | http://dlmf.nist.gov/LaTeXML/ (o o) | # # \=========================================================ooo==U==ooo=/ # package LaTeXML::Common::Color::hsb; use strict; use warnings; use base qw(LaTeXML::Common::Color); use LaTeXML::Global; sub hsb { my ($self) = @_; return $self; } sub rgb { my ($self) = @_; my ($model, $h, $s, $b) = @$self; my $i = int(6 * $h); my $f = 6 * $h - $i; my $u = $b * (1 - $s * (1 - $f)); my $v = $b * (1 - $s * $f); my $w = $b * (1 - $s); if ($i == 0) { return LaTeXML::Common::Color->new('rgb', $b, $u, $w); } elsif ($i == 1) { return LaTeXML::Common::Color->new('rgb', $v, $b, $w); } elsif ($i == 2) { return LaTeXML::Common::Color->new('rgb', $w, $b, $u); } elsif ($i == 3) { return LaTeXML::Common::Color->new('rgb', $w, $v, $b); } elsif ($i == 4) { return LaTeXML::Common::Color->new('rgb', $u, $w, $b); } elsif ($i == 5) { return LaTeXML::Common::Color->new('rgb', $b, $w, $v); } elsif ($i == 6) { return LaTeXML::Common::Color->new('rgb', $b, $w, $w); } } sub cmy { my ($self) = @_; return $self->rgb->cmy; } sub cmyk { my ($self) = @_; return $self->rgb->cmyk; } sub gray { my ($self) = @_; return $self->rgb->gray; } sub complement { my ($self) = @_; my ($h, $s, $b) = $self->components; my $hp = ($h < 0.5 ? $h + 0.5 : $h - 0.5); my $bp = 1 - $b * (1 - $s); my $sp = ($bp == 0 ? 
0 : $b * $s / $bp); return $self->new($hp, $sp, $bp); } sub mix { my ($self, $color, $fraction) = @_; # I don't quite follow what Kern's saying, on a quick read, # so we'll punt by doing the conversion in rgb space, then converting back. return $self->rgb->mix($color, $fraction)->hsb; } #====================================================================== 1; __END__ =head1 NAME C<LaTeXML::Common::Color::hsb> - represents colors in the hsb color model: hue, saturation, brightness in [0..1]; extends L<LaTeXML::Common::Color>. =head1 AUTHOR Bruce Miller =head1 COPYRIGHT Public domain software, produced as part of work done by the United States Government & not subject to copyright in the US. =cut latexml-0.8.1/lib/LaTeXML/Common/Color/rgb.pm0000644000175000017500000000544412507513572020506 0ustar norbertnorbert# /=====================================================================\ # # | LaTeXML::Common::Color::rgb | # # | A representation of colors in the rgb color model | # # |=====================================================================| # # | Part of LaTeXML: | # # | Public domain software, produced as part of work done by the | # # | United States Government & not subject to copyright in the US. 
| # # |---------------------------------------------------------------------| # # | Bruce Miller #_# | # # | http://dlmf.nist.gov/LaTeXML/ (o o) | # # \=========================================================ooo==U==ooo=/ # package LaTeXML::Common::Color::rgb; use strict; use warnings; use base qw(LaTeXML::Common::Color); use LaTeXML::Global; sub rgb { my ($self) = @_; return $self; } # Convert to cmy, cmyk, sub cmy { my ($self) = @_; return LaTeXML::Common::Color->new('cmy', 1 - $$self[1], 1 - $$self[2], 1 - $$self[3]); } sub cmyk { my ($self) = @_; return $self->cmy->cmyk; } sub gray { my ($self) = @_; return LaTeXML::Common::Color->new('gray', 0.3 * $$self[1] + 0.59 * $$self[2] + 0.11 * $$self[3]); } # See Section 6.3.1 in xcolor documentation Dr.Uwe Kern; xcolor.pdf sub Phi { my ($x, $y, $z, $u, $v) = @_; return LaTeXML::Common::Color->new('hsb', ($u * ($x - $z) + $v * ($x - $y)) / (6 * ($x - $z)), ($x - $z) / $x, $x); } sub hsb { my ($self) = @_; my ($m, $r, $g, $b) = @$self; my $i = 4 * ($r >= $g) + 2 * ($g >= $b) + ($b >= $r); if ($i == 1) { return Phi($b, $g, $r, 3, 1); } elsif ($i == 2) { return Phi($g, $r, $b, 1, 1); } elsif ($i == 3) { return Phi($g, $b, $r, 3, -1); } elsif ($i == 4) { return Phi($r, $b, $g, 5, 1); } elsif ($i == 5) { return Phi($b, $r, $g, 5, -1); } elsif ($i == 6) { return Phi($r, $g, $b, 1, -1); } elsif ($i == 7) { return LaTeXML::Common::Color->new('hsb', 0, 0, $b); } } my @hex = qw(0 1 2 3 4 5 6 7 8 9 A B C D E F); # [CONSTANT] sub hex2 { my ($n) = @_; my $nn = LaTeXML::Common::Number::roundto($n * 255, 0); return $hex[int($nn / 16)] . $hex[$nn % 16]; } sub toHex { my ($self) = @_; my ($model, $r, $g, $b) = @$self; return '#' . hex2($r) . hex2($g) . hex2($b); } #====================================================================== 1; __END__ =head1 NAME C<LaTeXML::Common::Color::rgb> - represents colors in the rgb color model: red, green and blue in [0..1]; extends L<LaTeXML::Common::Color>. 
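=head1 EXAMPLE

A minimal standalone sketch (not code from this module) of the arithmetic behind C<toHex> and C<gray>: each component in [0..1] is scaled to 0..255 and written as two hexadecimal digits, and the gray level is a weighted sum of the components. The helper names C<component_to_hex>, C<rgb_to_hex> and C<rgb_to_gray> are hypothetical, and rounding half up is an assumption; the module itself delegates rounding to C<LaTeXML::Common::Number::roundto>.

    use strict;
    use warnings;

    # Hypothetical helper (not part of LaTeXML): scale a component in [0..1]
    # to 0..255, rounding half up (an assumption), and format it as two
    # uppercase hexadecimal digits.
    sub component_to_hex {
      my ($value) = @_;
      return sprintf('%02X', int($value * 255 + 0.5)); }

    # Hypothetical helper: '#RRGGBB' string for red, green, blue in [0..1].
    sub rgb_to_hex {
      my ($red, $green, $blue) = @_;
      return '#' . join('', map { component_to_hex($_) } $red, $green, $blue); }

    # Hypothetical helper: gray level using the 0.3/0.59/0.11 weights
    # that the gray method of this module applies.
    sub rgb_to_gray {
      my ($red, $green, $blue) = @_;
      return 0.3 * $red + 0.59 * $green + 0.11 * $blue; }

    print rgb_to_hex(1, 0, 0), "\n";    # prints "#FF0000"
    print rgb_to_gray(1, 0, 0), "\n";   # gray level of pure red

With the real classes, the corresponding calls would presumably be C<< $color->toHex >> and C<< $color->gray >> on a color in the rgb model.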
=head1 AUTHOR Bruce Miller =head1 COPYRIGHT Public domain software, produced as part of work done by the United States Government & not subject to copyright in the US. =cut latexml-0.8.1/lib/LaTeXML/Common/Config.pm0000644000175000017500000013306612507513572020065 0ustar norbertnorbert# /=====================================================================\ # # | LaTeXML::Common::Config | # # | Configuration logic for LaTeXML | # # |=====================================================================| # # | Part of LaTeXML: | # # | Public domain software, produced as part of work done by the | # # | United States Government & not subject to copyright in the US. | # # |---------------------------------------------------------------------| # # | Bruce Miller | # # | Deyan Ginev #_# | # # | http://dlmf.nist.gov/LaTeXML/ (o o) | # # \=========================================================ooo==U==ooo=/ # package LaTeXML::Common::Config; use strict; use warnings; use Carp; use Getopt::Long qw(:config no_ignore_case); use Pod::Usage; use Pod::Find qw(pod_where); use LaTeXML::Util::Pathname; use LaTeXML::Global; use LaTeXML::Common::Error; use Data::Dumper; our $PROFILES_DB = {}; # Class-wide, caches all profiles that get used while the server is alive our $is_bibtex = qr/(^literal\:\s*\@)|(\.bib$)/; our $is_archive = qr/(^literal\:PK)|(\.zip$)/; sub new { my ($class, %opts) = @_; #TODO: How about defaults in the daemon server use case? Should we support those here? # or are defaults always bad/confusing to allow? 
%opts = () unless %opts; return bless { dirty => 1, opts => \%opts }, $class; } ########################################### #### Command-line reader ##### ########################################### sub getopt_specification { my (%options) = @_; my $opts = $options{options} || {}; my $spec = { # Basics and Paths "output=s" => \$$opts{destination}, "destination=s" => \$$opts{destination}, "log=s" => \$$opts{log}, "preload=s" => \@{ $$opts{preload} }, "preamble=s" => \$$opts{preamble}, "postamble=s" => \$$opts{postamble}, "base=s" => \$$opts{base}, "path=s" => \@{ $$opts{paths} }, "quiet" => sub { $$opts{verbosity}--; }, "verbose" => sub { $$opts{verbosity}++; }, "strict" => \$$opts{strict}, "includestyles" => \$$opts{includestyles}, "inputencoding=s" => \$$opts{inputencoding}, # Formats "xml" => sub { $$opts{format} = 'xml'; }, "tex" => sub { $$opts{format} = 'tex'; }, "box" => sub { $$opts{format} = 'box'; }, "bibtex" => sub { $$opts{type} = 'BibTeX'; }, "noparse" => sub { $$opts{mathparse} = 'no'; }, "format=s" => \$$opts{format}, "parse=s" => \$$opts{mathparse}, # Profiles "profile=s" => \$$opts{profile}, "cache_key=s" => \$$opts{cache_key}, "mode=s" => \$$opts{profile}, "source=s" => \$$opts{source}, # Output framing "embed" => sub { $$opts{whatsout} = 'fragment'; }, "whatsin=s" => \$$opts{whatsin}, "whatsout=s" => \$$opts{whatsout}, # Daemon options "autoflush=i" => \$$opts{input_limit}, "timeout=i" => \$$opts{timeout}, "expire=i" => \$$opts{expire}, "address=s" => \$$opts{address}, "port=i" => \$$opts{port}, # Post-processing "post!" => \$$opts{post}, "validate!" => \$$opts{validate}, "omitdoctype!" => \$$opts{omitdoctype}, "numbersections!" => \$$opts{numbersections}, "timestamp=s" => \$$opts{timestamp}, # Various choices for math processing. # Note: Could want OM embedded in mml annotation, too. # In general, could(?) want multiple math reps within # OR, multiple math reps combined with # or, in fact, _other_ parallel means? (om?, omdoc? ...) 
# So, need to separate multiple transformations from the combination. # However, IF combining, then will need to support a id/ref mechanism. "mathimagemagnification=f" => \$$opts{mathimagemag}, "linelength=i" => \$$opts{linelength}, "plane1!" => \$$opts{plane1}, "hackplane1!" => \$$opts{hackplane1}, "mathimages" => sub { _addMathFormat($opts, 'images'); }, "nomathimages" => sub { _removeMathFormat($opts, 'images'); }, "mathsvg" => sub { _addMathFormat($opts, 'svg'); }, "nomathsvg" => sub { _removeMathFormat($opts, 'svg'); }, "presentationmathml|pmml" => sub { _addMathFormat($opts, 'pmml'); }, "contentmathml|cmml" => sub { _addMathFormat($opts, 'cmml'); }, "openmath|om" => sub { _addMathFormat($opts, 'om'); }, "keepXMath|xmath" => sub { _addMathFormat($opts, 'xmath'); }, "nopresentationmathml|nopmml" => sub { _removeMathFormat($opts, 'pmml'); }, "nocontentmathml|nocmml" => sub { _removeMathFormat($opts, 'cmml'); }, "noopenmath|noom" => sub { _removeMathFormat($opts, 'om'); }, "nokeepXMath|noxmath" => sub { _removeMathFormat($opts, 'xmath'); }, "parallelmath" => \$$opts{parallelmath}, # Some general XSLT/CSS/JavaScript options. "stylesheet=s" => \$$opts{stylesheet}, "xsltparameter=s" => \@{ $$opts{xsltparameters} }, "css=s" => \@{ $$opts{css} }, "defaultresources!" => \$$opts{defaultresources}, "javascript=s" => \@{ $$opts{javascript} }, "icon=s" => \$$opts{icon}, # Options for broader document set processing "split!" => \$$opts{split}, "splitat=s" => sub { $$opts{splitat} = $_[1]; $$opts{split} = 1 unless defined $$opts{split}; }, "splitpath=s" => sub { $$opts{splitpath} = $_[1]; $$opts{split} = 1 unless defined $$opts{split}; }, "splitnaming=s" => sub { $$opts{splitnaming} = $_[1]; $$opts{split} = 1 unless defined $$opts{split}; }, "scan!" => \$$opts{scan}, "crossref!" => \$$opts{crossref}, "urlstyle=s" => \$$opts{urlstyle}, "navigationtoc=s" => \$$opts{navtoc}, "navtoc=s" => \$$opts{navtoc}, # Generating indices "index!" => \$$opts{index}, "permutedindex!" 
=> \$$opts{permutedindex}, "splitindex!" => \$$opts{splitindex}, # Generating Bibliographies "bibliography=s" => \@{ $$opts{bibliographies} }, # TODO: Document "splitbibliography!" => \$$opts{splitbibliography}, # Options for two phase processing "prescan" => \$$opts{prescan}, "dbfile=s" => \$$opts{dbfile}, "sitedirectory=s" => \$$opts{sitedirectory}, "sourcedirectory=s" => \$$opts{sourcedirectory}, # For graphics: vaguely similar issues, but more limited. # includegraphics images (eg. ps) can be converted to webimages (eg.png) # picture/pstricks images can be converted to png or possibly svg. "graphicimages!" => \$$opts{dographics}, "graphicsmap=s" => \@{ $$opts{graphicsmaps} }, "svg!" => \$$opts{svg}, "pictureimages!" => \$$opts{picimages}, # HELP "comments!" => \$$opts{comments}, "VERSION!" => \$$opts{showversion}, "debug=s" => \@{ $$opts{debug} }, "documentid=s" => \$$opts{documentid}, "help" => \$$opts{help} }; return ($spec, $opts) unless ($options{type} && ($options{type} eq 'keyvals')); # Representation use case: my $keyvals = $options{keyvals} || []; my $rep_spec = {}; # Representation specification foreach my $key (keys %$spec) { if ($key =~ /^(.+)=\w$/) { my $name = $1; $$rep_spec{$key} = sub { CORE::push @$keyvals, [$name, $_[1]] }; } else { $$rep_spec{$key} = sub { my $ctl = $_[0]->{ctl}; my $used = ($$ctl[0] ? 'no' : '') . $$ctl[1]; CORE::push @$keyvals, [$used, undef] }; } } return ($rep_spec, $keyvals); } # TODO: Separate the keyvals scan from getopt_specification() # into its own sub, using @GETOPT_KEYS entirely. 
our @GETOPT_KEYS = keys %{ (getopt_specification())[0] }; sub read { my ($self, $argref, %read_options) = @_; my $opts = $$self{opts}; local @ARGV = @$argref; my ($spec) = getopt_specification(options => $opts); my $silent = %read_options && $read_options{silent}; my $getOptions_success = GetOptions(%{$spec}); if (!$getOptions_success && !$silent) { pod2usage(-message => $LaTeXML::IDENTITY, -exitval => 1, -verbose => 99, -input => pod_where({ -inc => 1 }, __PACKAGE__), -sections => 'OPTIONS/SYNOPSIS', -output => \*STDERR); } if (!$silent && $$opts{help}) { pod2usage(-message => $LaTeXML::IDENTITY, -exitval => 1, -verbose => 99, -input => pod_where({ -inc => 1 }, __PACKAGE__), -sections => 'OPTIONS/SYNOPSIS', output => \*STDOUT); } # Check that options for system I/O (destination and log) are valid before wasting any time... foreach my $IO_option(qw(destination log)) { if ($$opts{$IO_option}) { $$opts{$IO_option} = pathname_canonical($$opts{$IO_option}); if (my $dir = pathname_directory($$opts{$IO_option})) { pathname_mkdir($dir) or croak "Couldn't create $IO_option directory $dir: $!"; } } } # Removed math formats are irrelevant for conversion: delete $$opts{removed_math_formats}; if ($$opts{showversion}) { print STDERR "$LaTeXML::IDENTITY\n"; exit(1); } $$opts{source} = $ARGV[0] unless $$opts{source}; # Special source-based guessing needs to happen here, # as we won't have access to the source file/literal/resource later on: if (!$$opts{type} || ($$opts{type} eq 'auto')) { $$opts{type} = 'BibTeX' if ($$opts{source} && ($$opts{source} =~ /$is_bibtex/)); } if (!$$opts{whatsin}) { $$opts{whatsin} = 'archive' if ($$opts{source} && ($$opts{source} =~ /$is_archive/)); } return $getOptions_success; } sub read_keyvals { my ($self, $conversion_options, %read_options) = @_; my $cmdopts = []; while (my ($key, $value) = splice(@$conversion_options, 0, 2)) { # TODO: Is skipping over empty values ever harmful? Do we have non-empty defaults anywhere? 
    next if (!length($value)) && (grep { /^$key\=/ } @GETOPT_KEYS);
    $key = "--$key" unless $key =~ /^\-\-/;
    $value = length($value) ? "=$value" : '';
    CORE::push @$cmdopts, "$key$value";
  }
  # Read into a Config object:
  return $self->read($cmdopts, %read_options);
}

sub scan_to_keyvals {
  my ($self, $argref, %read_options) = @_;
  local @ARGV = @$argref;
  my ($spec, $keyvals) = getopt_specification(type => 'keyvals');
  my $silent = %read_options && $read_options{silent};
  my $getOptions_success = GetOptions(%$spec);
  if (!$getOptions_success && !$silent) {
    pod2usage(-message => $LaTeXML::IDENTITY, -exitval => 1, -verbose => 99,
      -input    => pod_where({ -inc => 1 }, __PACKAGE__),
      -sections => 'OPTIONS/SYNOPSIS', -output => \*STDERR);
  }
  CORE::push @$keyvals, ['source', $ARGV[0]] if $ARGV[0];
  return $getOptions_success && $keyvals;
}

###########################################
#### Options Object Hashlike API      #####
###########################################
sub get {
  my ($self, $key, $value) = @_;
  return $$self{opts}{$key};
}

sub set {
  my ($self, $key, $value) = @_;
  $$self{dirty} = 1;
  $$self{opts}{$key} = $value;
  return;
}

sub push {
  my ($self, $key, $value) = @_;
  $$self{dirty} = 1;
  $$self{opts}{$key} = [] unless ref $$self{opts}{$key};
  CORE::push @{ $$self{opts}{$key} }, $value;
  return;
}

sub delete {
  my ($self, $key) = @_;
  $$self{dirty} = 1;
  delete $$self{opts}{$key};
  return;
}

sub exists {
  my ($self, $key) = @_;
  return exists $$self{opts}{$key};
}

sub keys {
  my ($self) = @_;
  return keys %{ $$self{opts} };
}

sub options {
  my ($self) = @_;
  return $$self{opts};
}

sub clone {
  my ($self) = @_;
  my $clone = LaTeXML::Common::Config->new(%{ $self->options });
  $$clone{dirty} = $$self{dirty};
  return $clone;
}

###########################################
#### Option Sanity Checking           #####
###########################################

# Perform all option sanity checks
sub check {
  my ($self) = @_;
  return unless $$self{dirty};
  # 1. Resolve profile
  $self->_obey_profile;
  # 2. Place sane defaults where needed
  return $self->_prepare_options;
}

sub _obey_profile {
  my ($self) = @_;
  $$self{dirty} = 1;
  my $opts = $$self{opts};
  my $profile = lc($$opts{profile} || 'custom');
  $profile =~ s/\.opt$//;
  # Look at the PROFILES_DB or find a profiles file (otherwise fallback to custom)
  my $profile_opts = {};
  if ($profile ne 'custom') {
    if (defined $$PROFILES_DB{$profile}) {
      %$profile_opts = %{ $$PROFILES_DB{$profile} }
    }
    elsif (my $file = pathname_find($profile . '.opt', paths => [],
        types => [], installation_subdir => 'resources/Profiles')) {
      my $conf_tmp = LaTeXML::Common::Config->new;
      $conf_tmp->read(_read_options_file($file));
      $profile_opts = $conf_tmp->options;
    }
    else {
      # Throw an error, fallback to custom
      carp("Warning:unexpected:$profile Profile $profile was not recognized, reverting to 'custom'\n");
      $$opts{profile} = 'custom';
      $profile = 'custom';
    }
  }
  # Erase the profile, save it as cache key
  delete $$opts{profile};
  $$opts{cache_key} = $profile unless defined $$opts{cache_key};
  if (%$profile_opts) {
    # Merge the new options with the profile defaults:
    for my $key (grep { defined $$opts{$_} } (CORE::keys %$opts)) {
      if ($key =~ /^p(ath|reload)/) {    # Paths and preloads get merged in
        $$profile_opts{$key} = [] unless defined $$profile_opts{$key};
        foreach my $entry (@{ $$opts{$key} }) {
          my $new = 1;
          foreach (@{ $$profile_opts{$key} }) {
            if ($entry eq $_) { $new = 0; last; }
          }
          # If new to the array, push:
          CORE::push(@{ $$profile_opts{$key} }, $entry) if ($new);
        }
      }
      else {    # The other options get overwritten
        $$profile_opts{$key} = $$opts{$key};
      }
    }
    %$opts = %$profile_opts;    # Move back into the user options
  }
  return;
}

# TODO: Best way to throw errors when options don't work out?
#       How about in the case of Extras::ReadOptions?
#       Error() and Warn() would be neat, but we have to make sure STDERR is caught beforehand.
#       Also, there is no eval() here, so we might need a softer handling of Error()s.
sub _prepare_options { my ($self) = @_; my $opts = $$self{opts}; #====================================================================== # I. Sanity check and Completion of Core options. #====================================================================== # "safe" and semi-perlcrtic acceptable way to set DEBUG inside arbitrary modules. # Note: 'LaTeXML' refers to the top-level class { no strict 'refs'; foreach my $ltx_class (@{ $$opts{debug} || [] }) { if ($ltx_class eq 'LaTeXML') { ${'LaTeXML::DEBUG'} = 1; } else { ${ 'LaTeXML::' . $ltx_class . '::DEBUG' } = 1; } } } $$opts{timeout} = 600 if ((!defined $$opts{timeout}) || ($$opts{timeout} !~ /\d+/)); # 10 minute timeout default $$opts{expire} = 600 if ((!defined $$opts{expire}) || ($$opts{expire} !~ /\d+/)); # 10 minute timeout default $$opts{mathparse} = 'RecDescent' unless defined $$opts{mathparse}; if ($$opts{mathparse} eq 'no') { $$opts{mathparse} = 0; $$opts{nomathparse} = 1; } #Backwards compatible $$opts{verbosity} = 0 unless defined $$opts{verbosity}; $$opts{preload} = [] unless defined $$opts{preload}; $$opts{paths} = ['.'] unless defined $$opts{paths}; @{ $$opts{paths} } = map { pathname_canonical($_) } @{ $$opts{paths} }; foreach (('destination', 'dbfile', 'sourcedirectory', 'sitedirectory')) { $$opts{$_} = pathname_canonical($$opts{$_}) if defined $$opts{$_}; } if (!defined $$opts{whatsin}) { if ($$opts{preamble} || $$opts{postamble}) { # Preamble or postamble imply a fragment whatsin $$opts{whatsin} = 'fragment'; } else { # Default input chunk is a document $$opts{whatsin} = 'document'; } } $$opts{whatsout} = 'document' unless defined $$opts{whatsout}; $$opts{type} = 'auto' unless defined $$opts{type}; unshift(@{ $$opts{preload} }, ('TeX.pool', 'LaTeX.pool', 'BibTeX.pool')) if ($$opts{type} eq 'BibTeX'); # Destination extension might indicate the format: if ((!defined $$opts{extension}) && (defined $$opts{destination})) { if ($$opts{destination} =~ /\.([^.]+)$/) { $$opts{extension} = $1; } } if 
((!defined $$opts{format}) && (defined $$opts{destination})) { if ($$opts{destination} =~ /\.([^.]+)$/) { $$opts{format} = $1; } } if ((!defined $$opts{extension}) && (defined $$opts{format})) { if ($$opts{format} =~ /^html/) { $$opts{extension} = 'html'; } elsif ($$opts{format} =~ /^xhtml/) { $$opts{extension} = 'xhtml'; } else { $$opts{extension} = 'xml'; } } if ($$opts{format}) { # Lower-case for sanity's sake $$opts{format} = lc($$opts{format}); $$opts{format} = 'html5' if $$opts{format} eq 'html'; # Default if ($$opts{format} eq 'zip') { # Not encouraged! But try to produce something sensible anyway... $$opts{format} = 'html5'; $$opts{whatsout} = 'archive'; } $$opts{is_html} = ($$opts{format} =~ /^html/); $$opts{is_xhtml} = ($$opts{format} =~ /^(xhtml5?|epub|mobi)$/); $$opts{whatsout} = 'archive' if (($$opts{format} eq 'epub') || ($$opts{format} eq 'mobi')); } #====================================================================== # II. Sanity check and Completion of Post options. #====================================================================== # Any post switch implies post (TODO: whew, lots of those, add them all!): $$opts{math_formats} = [] unless defined $$opts{math_formats}; $$opts{post} = 1 if ((!defined $$opts{post}) && (scalar(@{ $$opts{math_formats} })) || ($$opts{stylesheet}) || $$opts{is_html} || $$opts{is_xhtml} || ($$opts{whatsout} && ($$opts{whatsout} ne 'document')) ); # || ... || ... || ... # $$opts{post}=0 if (defined $$opts{mathparse} && (! $$opts{mathparse})); # No-parse overrides post-processing if ($$opts{post}) { # No need to bother if we're not post-processing # Default: scan and crossref on, other advanced off $$opts{prescan} = undef unless defined $$opts{prescan}; $$opts{dbfile} = undef unless defined $$opts{dbfile}; $$opts{scan} = 1 unless defined $$opts{scan}; $$opts{index} = 1 unless defined $$opts{index}; $$opts{crossref} = 1 unless defined $$opts{crossref}; $$opts{sitedirectory} = defined $$opts{sitedirectory} ? 
$$opts{sitedirectory} : (defined $$opts{destination} ? pathname_directory($$opts{destination}) : (defined $$opts{dbfile} ? pathname_directory($$opts{dbfile}) : ".")); $$opts{sourcedirectory} = undef unless defined $$opts{sourcedirectory}; $$opts{numbersections} = 1 unless defined $$opts{numbersections}; $$opts{navtoc} = undef unless defined $$opts{numbersections}; $$opts{navtocstyles} = { context => 1, normal => 1, none => 1 } unless defined $$opts{navtocstyles}; $$opts{navtoc} = lc($$opts{navtoc}) if defined $$opts{navtoc}; delete $$opts{navtoc} if ($$opts{navtoc} && ($$opts{navtoc} eq 'none')); if ($$opts{navtoc}) { if (!$$opts{navtocstyles}->{ $$opts{navtoc} }) { croak($$opts{navtoc} . " is not a recognized style of navigation TOC"); } if (!$$opts{crossref}) { croak("Cannot use option \"navigationtoc\" (" . $$opts{navtoc} . ") without \"crossref\""); } } $$opts{urlstyle} = 'server' unless defined $$opts{urlstyle}; $$opts{bibliographies} = [] unless defined $$opts{bibliographies}; # Validation: $$opts{validate} = 1 unless defined $$opts{validate}; # Graphics: $$opts{mathimagemag} = 1.75 unless defined $$opts{mathimagemag}; if ((defined $$opts{destination}) || ($$opts{whatsout} =~ /^archive/)) { # We want the graphics enabled by default, but only when we have a destination $$opts{dographics} = 1 unless defined $$opts{dographics}; $$opts{picimages} = 1 if ($$opts{format} eq "html4") && !defined $$opts{picimages}; } # Split sanity: if ($$opts{split}) { $$opts{splitat} = 'section' unless defined $$opts{splitat}; $$opts{splitnaming} = 'id' unless defined $$opts{splitnaming}; $$opts{splitback} = "//ltx:bibliography | //ltx:appendix | //ltx:index" unless defined $$opts{splitback}; $$opts{splitpaths} = { part => "//ltx:part | " . $$opts{splitback}, chapter => "//ltx:part | //ltx:chapter | " . $$opts{splitback}, section => "//ltx:part | //ltx:chapter | //ltx:section | " . $$opts{splitback}, subsection => "//ltx:part | //ltx:chapter | //ltx:section | //ltx:subsection | " . 
$$opts{splitback}, subsubsection => "//ltx:part | //ltx:chapter | //ltx:section | //ltx:subsection | //ltx:subsubsection | " . $$opts{splitback} } unless defined $$opts{splitpaths}; $$opts{splitnaming} = _checkOptionValue('--splitnaming', $$opts{splitnaming}, qw(id idrelative label labelrelative)); $$opts{splitat} = _checkOptionValue('--splitat', $$opts{splitat}, CORE::keys %{ $$opts{splitpaths} }); $$opts{splitpath} = $$opts{splitpaths}->{ $$opts{splitat} } unless defined $$opts{splitpath}; } # Check for appropriate combination of split, scan, prescan, dbfile, crossref if ($$opts{split} && (!defined $$opts{destination}) && ($$opts{whatsout} !~ /^archive/)) { croak("Must supply --destination when using --split"); } if ($$opts{prescan} && !$$opts{scan}) { croak("Makes no sense to --prescan with scanning disabled (--noscan)"); } if ($$opts{prescan} && (!defined $$opts{dbfile})) { croak("Cannot prescan documents (--prescan) without specifying --dbfile"); } if (!$$opts{prescan} && $$opts{crossref} && !($$opts{scan} || (defined $$opts{dbfile}))) { croak("Cannot cross-reference (--crossref) without --scan or --dbfile "); } if ($$opts{crossref}) { $$opts{urlstyle} = _checkOptionValue('--urlstyle', $$opts{urlstyle}, qw(server negotiated file)); } if (($$opts{permutedindex} || $$opts{splitindex}) && (!defined $$opts{index})) { $$opts{index} = 1; } if (!$$opts{prescan} && $$opts{index} && !($$opts{scan} || defined $$opts{crossref})) { croak("Cannot generate index (--index) without --scan or --dbfile"); } if (!$$opts{prescan} && @{ $$opts{bibliographies} } && !($$opts{scan} || defined $$opts{crossref})) { croak("Cannot generate bibliography (--bibliography) without --scan or --dbfile"); } if ((!defined $$opts{destination}) && ($$opts{whatsout} !~ /^archive/) && (_checkMathFormat($opts, 'images') || _checkMathFormat($opts, 'svg') || $$opts{dographics} || $$opts{picimages})) { croak("Must supply --destination unless all auxilliary file writing is disabled" . 
"(--nomathimages --nomathsvg --nographicimages --nopictureimages --nodefaultcss)"); } # Format: #Default is XHTML, XML otherwise (TODO: Expand) $$opts{format} = "xml" if ($$opts{stylesheet}) && (!defined $$opts{format}); $$opts{format} = "xhtml" unless defined $$opts{format}; if (!$$opts{stylesheet}) { if ($$opts{format} eq 'xhtml') { $$opts{stylesheet} = "LaTeXML-xhtml.xsl"; } elsif ($$opts{format} eq "html4") { $$opts{stylesheet} = "LaTeXML-html4.xsl"; } elsif ($$opts{format} =~ /^epub|mobi$/) { $$opts{stylesheet} = "LaTeXML-epub3.xsl"; } elsif ($$opts{format} eq "html5") { $$opts{stylesheet} = "LaTeXML-html5.xsl"; } elsif ($$opts{format} eq "xml") { delete $$opts{stylesheet}; } else { croak("Unrecognized target format: " . $$opts{format}); } } # Check format and complete math and image options if ($$opts{format} eq 'html4') { $$opts{svg} = 0 unless defined $$opts{svg}; # No SVG by default in HTML. croak("Default html4 stylesheet only supports math images, not " . join(', ', @{ $$opts{math_formats} })) if (!defined $$opts{stylesheet}) && scalar(grep { $_ ne 'images' } @{ $$opts{math_formats} }); croak("Default html stylesheet does not support svg") if $$opts{svg}; $$opts{math_formats} = []; _maybeAddMathFormat($opts, 'images'); } $$opts{svg} = 1 unless defined $$opts{svg}; # If we're not making HTML, SVG is on by default # PMML default if we're HTMLy and all else fails and no mathimages: if (((!defined $$opts{math_formats}) || (!scalar(@{ $$opts{math_formats} }))) && ($$opts{is_html} || $$opts{is_xhtml})) { CORE::push @{ $$opts{math_formats} }, 'pmml'; } # use parallel markup if there are multiple formats requested. 
    $$opts{parallelmath} = 1 if ($$opts{math_formats} && (@{ $$opts{math_formats} } > 1));
  }
  # If really nothing hints to define format, then default it to XML
  $$opts{format} = 'xml' unless defined $$opts{format};
  $$self{dirty} = 0;
  return;
}

## Utilities:

sub _addMathFormat {
  my ($opts, $fmt) = @_;
  $$opts{math_formats} = [] unless defined $$opts{math_formats};
  CORE::push(@{ $$opts{math_formats} }, $fmt)
    unless (grep { $_ eq $fmt } @{ $$opts{math_formats} }) || $$opts{removed_math_formats}->{$fmt};
  return;
}

sub _removeMathFormat {
  my ($opts, $fmt) = @_;
  @{ $$opts{math_formats} } = grep { $_ ne $fmt } @{ $$opts{math_formats} };
  $$opts{removed_math_formats}->{$fmt} = 1;
  return;
}

sub _maybeAddMathFormat {
  my ($opts, $fmt) = @_;
  unshift(@{ $$opts{math_formats} }, $fmt)
    unless (grep { $_ eq $fmt } @{ $$opts{math_formats} }) || $$opts{removed_math_formats}{$fmt};
  return;
}

sub _checkMathFormat {
  my ($opts, $fmt) = @_;
  return grep { $_ eq $fmt } @{ $$opts{math_formats} };
}

# Accepts $value when it is a prefix of one of @choices, and returns the full choice.
sub _checkOptionValue {
  my ($option, $value, @choices) = @_;
  if ($value) {
    foreach my $choice (@choices) {
      return $choice if substr($choice, 0, length($value)) eq $value;
    }
  }
  croak("Value for $option, $value, doesn't match " . join(', ', @choices));
}

### This is from t/lib/TestDaemon.pm and ideally belongs in Util::Pathname
sub _read_options_file {
  my ($file) = @_;
  my $opts = [];
  my $OPT;
  print STDERR "(Loading profile $file...";
  unless (open($OPT, "<", $file)) {
    Error('expected', $file, "Could not open options file '$file'");
    return;
  }
  while (my $line = <$OPT>) {
    # Cleanup comments, padding on the input line.
    $line =~ s/(? - Configuration logic for LaTeXML

=head1 SYNOPSIS

    use LaTeXML::Common::Config;
    my $config = LaTeXML::Common::Config->new(
              profile=>'name',
              timeout=>number
              ... );
    $config->read(\@ARGV);
    $config->check;

    my $value = $config->get($name);
    $config->set($name,$value);
    $config->delete($name);
    my $bool = $config->exists($name);
    my @keys = $config->keys;
    my $options_hashref = $config->options;
    my $config_clone = $config->clone;

=head1 DESCRIPTION

Configuration management class for LaTeXML options.

    * Responsible for defining the options interface
      and parsing the usual Perl command-line options syntax
    * Provides the intuitive getters, setters, as well as
      hash methods for manipulating the option values.
    * Also supports cloning into new configuration objects.

=head2 METHODS

=over 4

=item C<< my $config = LaTeXML::Common::Config->new(%options); >>

Creates a new configuration object. Note that you should try
not to provide your own %options hash but rather create an empty
configuration and use $config->read to read in the options.

=item C<< $config->read(\@ARGV); >>

This is the main method for parsing in LaTeXML options.
The input array should either be @ARGV, e.g. when the options were
provided from the command line using the classic Getopt::Long syntax,
or any other array reference that conforms to that setup.

=item C<< $config->check; >>

Ensures that the configuration obeys the given profile and
performs a set of assignments of meaningful defaults
(when needed) and normalizations (for relative paths, etc).

=item C<< my $value = $config->get($name); >>

Classic getter for the $value of an option $name.

=item C<< $config->set($name,$value); >>

Classic setter for the $value of an option $name.

=item C<< $config->delete($name); >>

Deletes option $name from the configuration.

=item C<< my $bool = $config->exists($name); >>

Checks whether the key $name exists in the options hash of the configuration.
Similarly to Perl's "exists" for hashes, it returns true even when
the option's value is undefined.

=item C<< my @keys = $config->keys; >>

Similar to "keys %hash" in Perl. Returns an array of all option names.

=item C<< my $options_hashref = $config->options; >>

Returns the actual hash reference that holds all options within the configuration object.

=item C<< my $config_clone = $config->clone; >>

Clones $config into a new LaTeXML::Common::Config object, $config_clone.

=back

=head1 OPTIONS

=head2 SYNOPSIS

latexmls/latexmlc [options]

 Options:
 --destination=file specifies destination file.
 --output=file      [obsolete synonym for --destination]
 --preload=module   requests loading of an optional module;
                    can be repeated
 --preamble=file    loads a tex file containing document
                    frontmatter.
                    MUST include \begin{document} or equivalent
 --postamble=file   loads a tex file containing document
                    backmatter.
                    MUST include \end{document} or equivalent
 --includestyles    allows latexml to load raw *.sty file;
                    by default it avoids this.
 --base=dir         sets the base directory that the server
                    operates in. Useful when converting
                    documents that employ relative paths.
 --path=dir         adds dir to the paths searched for files,
                    modules, etc;
 --log=file         specifies log file (default: STDERR)
 --autoflush=count  Automatically restart the daemon after
                    "count" inputs. Good practice for vast
                    batch jobs. (default: 100)
 --timeout=secs     Timecap for conversions (default 600)
 --expire=secs      Timecap for server inactivity (default 600)
 --address=URL      Specify server address (default: localhost)
 --port=number      Specify server port (default: 3354)
 --documentid=id    assign an id to the document root.
 --quiet            suppress messages (can repeat)
 --verbose          more informative output (can repeat)
 --strict           makes latexml less forgiving of errors
 --bibtex           processes a BibTeX bibliography.
 --xml              requests xml output (default).
 --tex              requests TeX output after expansion.
 --box              requests box output after expansion
                    and digestion.
 --format=name      requests "name" as the output format.
                    Supported: tex,box,xml,html4,html5,xhtml
                    html implies html5
 --noparse          suppresses parsing math (default: off)
 --parse=name       enables parsing math (default: on)
                    and selects parser framework "name".
                    Supported: Marpa, RecDescent
 --profile=name     specify profile as defined in
                    LaTeXML::Common::Config
                    Supported: standard|math|fragment|...
                    (default: standard)
 --mode=name        Alias for profile
 --whatsin=chunk    Defines the provided input chunk,
                    choose from document (default), fragment
                    and formula
 --whatsout=chunk   Defines the expected output chunk,
                    choose from document (default), fragment
                    and formula
 --post             requests a followup post-processing
 --embed            requests an embeddable XHTML snippet
                    (requires: --post,--profile=fragment)
                    DEPRECATED: Use --whatsout=fragment
                    TODO: Remove completely
 --stylesheet       specifies a stylesheet,
                    to be used by the post-processor.
 --css=cssfile      adds a css stylesheet to html/xhtml
                    (can be repeated)
 --nodefaultresources    disables processing built-in resources
 --javascript=jsfile     adds a link to a javascript file into
                    html/html5/xhtml (can be repeated)
 --xsltparameter=name:value passes parameters to the XSLT.
 --sitedirectory=dir     sets the base directory of the site
 --sourcedirectory=dir   sets the base directory of the
                    original TeX source
 --mathimages       converts math to images
                    (default for html4 format)
 --nomathimages     disables the above
 --mathimagemagnification=mag specifies magnification factor
 --plane1           use plane-1 unicode for symbols
                    (default, if needed)
 --noplane1         do not use plane-1 unicode
 --pmml             converts math to Presentation MathML
                    (default for xhtml and html5 formats)
 --cmml             converts math to Content MathML
 --openmath         converts math to OpenMath
 --keepXMath        keeps the XMath of a formula as a MathML
                    annotation-xml element
 --nocomments       omit comments from the output
 --inputencoding=enc specify the input encoding.
 --VERSION          show version number.
 --debug=package    enables debugging output for the named
                    package
 --help             shows this help message.

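As a rough illustration of how an option list like the one above is consumed, the following sketch parses a small subset of these options with Getopt::Long. It is only a sketch under stated assumptions: the helper name C<parse_options> and its hash keys are hypothetical, the real specification is built by C<getopt_specification> and covers far more options, and C<GetOptionsFromArray> is used here in place of the C<local @ARGV> plus C<GetOptions> approach taken by C<read>.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper (not part of LaTeXML): parses a few of the
# options listed above and returns them as a hash reference.
sub parse_options {
  my (@args) = @_;
  # Defaults mirror the documented ones (timeout 600); keys are illustrative.
  my %opts = (timeout => 600, verbosity => 0, preload => []);
  GetOptionsFromArray(\@args,
    'destination=s' => \$opts{destination},
    'preload=s'     => $opts{preload},         # array ref: option may be repeated
    'timeout=i'     => \$opts{timeout},
    'verbose+'      => \$opts{verbosity},      # counter: option may be repeated
  ) or die "Could not parse options\n";
  # Whatever is left after option parsing is taken as the source.
  $opts{source} = $args[0] if @args;
  return \%opts;
}

my $opts = parse_options(qw(--destination=paper.xml --preload=amsmath.sty
    --preload=LaTeX.pool --verbose paper.tex));
print "source:      $$opts{source}\n";
print "destination: $$opts{destination}\n";
print "preload:     @{ $$opts{preload} }\n";
print "verbosity:   $$opts{verbosity}\n";
print "timeout:     $$opts{timeout}\n";
```

Passing an array reference as the destination for C<preload=s> lets Getopt::Long collect repeated values, which matches the "can be repeated" note on C<--preload> above.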
Note that the profiles come with a variety of preset options. To customize your
own conversion setup, use --whatsin=math|fragment|document instead, respectively,
as well as --whatsout=math|fragment|document.

If you want to provide a TeX snippet directly on input, rather than supply a filename,
use the C protocol to prefix your snippet.

For reliable communication and a stable conversion experience, invoke latexmls
only through the latexmlc client (you need to set --expire to a positive value,
in order to request auto-spawning of a dedicated conversion server).

=head2 DETAILS

=over 4

=item C<--destination>=I<file>

Specifies the destination file; by default the XML is written to STDOUT.

=item C<--preload>=I<module>

Requests the loading of an optional module or package. This may be useful if the TeX code
does not specifically require the module (eg. through input or usepackage).
For example, use C<--preload=LaTeX.pool> to force LaTeX mode.

=item C<--preamble>=I<file>

Requests the loading of a tex file with document frontmatter, to be read in before the converted document,
but after all --preload entries.

Note that the given file MUST contain \begin{document} or an equivalent environment start,
when processing LaTeX documents.

If the file does not contain content to appear in the final document, but only macro definitions
and setting of internal counters, it is more appropriate to use --preload instead.

=item C<--postamble>=I<file>

Requests the loading of a tex file with document backmatter, to be read in after the converted document.

Note that the given file MUST contain \end{document} or an equivalent environment end,
when processing LaTeX documents.

=item C<--includestyles>

This option allows processing of style files (files with extensions C, C, C, C).
By default, these files are ignored unless a latexml implementation of them is found
(with an extension of C).

These style files generally fall into two classes: Those that merely affect
document style are ignorable in the XML. Others define new markup and document
structure, often using deeper LaTeX macros to achieve their ends.
Although the omission will lead to other errors (missing macro definitions),
it is unlikely that processing the TeX code in the style file will lead to a correct
document.

=item C<--path>=I<dir>

Add I<dir> to the search paths used when searching for files, modules, style files, etc;
somewhat like TEXINPUTS. This option can be repeated.

=item C<--log>=I<file>

Specifies the log file; by default any conversion messages are printed to STDERR.

=item C<--autoflush>=I<count>

Automatically restart the daemon after converting "count" inputs.
Good practice for vast batch jobs. (default: 100)

=item C<--expire>=I<secs>

Set an inactivity timeout value in seconds.
If the daemon is not given any input for the timeout period
it will automatically self-destruct.
The default value is 600 seconds, set to 0 to never expire,
-1 to entirely opt out of using a server.

=item C<--timeout>=I<secs>

Set time cap for conversion jobs, in seconds.
Any job failing to convert in the time range would return with a Fatal error of timing out.
Default value is 600, set to 0 to disable.

=item C<--address>=I<URL>

Specify server address (default: localhost)

=item C<--port>=I<number>

Specify server port (default: 3334 for math, 3344 for fragment and 3354 for standard)

=item C<--documentid>=I<id>

Assigns an ID to the root element of the XML document. This ID is generally
inherited as the prefix of IDs on all other elements within the document.
This is useful when constructing a site of multiple documents so that
all nodes have unique IDs.

=item C<--quiet>

Reduces the verbosity of output during processing, used twice is pretty silent.

=item C<--verbose>

Increases the verbosity of output during processing, used twice is pretty chatty.
Can be useful for getting more details when errors occur.

=item C<--strict>

Specifies a strict processing mode.
By default, undefined control sequences and invalid document constructs (that violate the DTD) give warning messages, but attempt to continue processing. Using C<--strict> makes them generate fatal errors. =item C<--bibtex> Forces latexml to treat the file as a BibTeX bibliography. Note that the timing is slightly different than the usual case with BibTeX and LaTeX. In the latter case, BibTeX simply selects and formats a subset of the bibliographic entries; the actual TeX expansion is carried out when the result is included in a LaTeX document. In contrast, latexml processes and expands the entire bibliography; the selection of entries is done during post-processing. This also means that any packages that define macros used in the bibliography must be specified using the C<--preload> option. =item C<--xml> Requests XML output; this is the default. =item C<--tex> Requests TeX output for debugging purposes; processing is only carried out through expansion and digestion. This may not be quite valid TeX, since Unicode may be introduced. =item C<--box> Requests Box output for debugging purposes; processing is carried out through expansion and digestions, and the result is printed. =item C<--format=name> Requests an explicitly provided "name" as the output format of the conversion. Currently supported: tex, box, xml, html4, html5, xhtml Tip: If you wish to apply your own custom XSLT stylesheet, select "xml" as the desired format. =item C<--noparse> Suppresses parsing math (default: parsing is on) =item C<--parse=name> Enables parsing math (default: parsing is on) and selects parser framework "name". Supported: Marpa, RecDescent, no Tip: --parse=no is equivalent to --noparse =item C<--profile> Variety of shorthand profiles, described at C. Example: C =item C<--post> Request post-processing. Enabled by default is processing graphics and cross-referencing. 

=item C<--embed>

TODO: Deprecated, use --whatsout=fragment

Requests an embeddable XHTML div (requires: --post --format=xhtml),
respectively the top division of the document's body.
Caveat: This experimental mode is enabled only for fragment profile and post-processed
documents (to XHTML).

=item C<--mathimages>, C<--nomathimages>

Requests or disables the conversion of math to images.
Conversion is the default for html4 format.

=item C<--mathsvg>, C<--nomathsvg>

Requests or disables the conversion of math to svg images.

=item C<--mathimagemagnification=>I<mag>

Specifies the magnification used for math images, if they are made.
Default is 1.75.

=item C<--pmml>

Requests conversion of math to Presentation MathML.
Presentation MathML is the default math processor for the XHTML/HTML5 formats.
Will enable C<--post>.

=item C<--cmml>

Requests or disables conversion of math to Content MathML.
Conversion is disabled by default.
B<Note> that this conversion is only partially implemented.
Will enable C<--post>.

=item C<--openmath>

Requests or disables conversion of math to OpenMath.
Conversion is disabled by default.
B<Note> that this conversion is not yet supported in C.
Will enable C<--post>.

=item C<--xmath> and C<--keepXMath>

By default, when any of the MathML or OpenMath conversions
are used, the intermediate math representation will be removed;
Explicitly specifying --xmath|keepXMath preserves this format.
Will enable C<--post>.

=item C<--stylesheet>=I

Sets a stylesheet of choice to be used by the postprocessor.
Will enable C<--post>.

=item C<--css>=I<cssfile>

Adds I<cssfile> as a css stylesheet to be used in the transformed html/xhtml.
Multiple stylesheets can be used; they are included in the html in the
order given, following the default C (but see C<--nodefaultresources>).
Some stylesheets included in the distribution are

    --css=navbar-left   Puts a navigation bar on the left.
                        (default omits navbar)
    --css=navbar-right  Puts a navigation bar on the right.
    --css=theme-blue    A blue coloring theme for headings.
--css=amsart A style suitable for journal articles. =item C<--javascript>=I Includes a link to the javascript file I, to be used in the transformed html/html5/xhtml. Multiple javascript files can be included; they are linked in the html in the order given. The javascript file is copied to the destination directory, unless it is an absolute url. =item C<--nodefaultresources> Disables the copying and inclusion of resources added by the binding files; This includes CSS, javascript or other files. This does not affect resources explicitly requested by the C<--css> or C<--javascript> options. =item C<--timestamp>=I Provides a timestamp (typically a time and date) to be embedded in the comments by the stock XSLT stylesheets. If you don't supply a timestamp, the current time and date will be used. (You can use C<--timestamp=0> to omit the timestamp). =item C<--xsltparameter>=I:I Passes parameters to the XSLT stylesheet. See the manual or the stylesheet itself for available parameters. =item C<--nocomments> Normally latexml preserves comments from the source file, and adds a comment every 25 lines as an aid in tracking the source. The option --nocomments discards such comments. =item C<--sitedirectory=>I Specifies the base directory of the overall web site. Pathnames in the database are stored in a form relative to this directory to make it more portable. =item C<--sourcedirectory>=I Specifies the directory where the original latex source is located. Unless LaTeXML is run from that directory, or it can be determined from the xml filename, it may be necessary to specify this option in order to find graphics and style files. =item C<--inputencoding=>I Specify the input encoding, eg. C<--inputencoding=iso-8859-1>. The encoding must be one known to Perl's Encode package. Note that this only enables the translation of the input bytes to UTF-8 used internally by LaTeXML, but does not affect catcodes. In such cases, you should be using the inputenc package. 
Note also that this does not affect the output encoding, which is always UTF-8.

=item C<--VERSION>

Shows the version number of the LaTeXML package.

=item C<--debug>=I<package>

Enables debugging output for the named package. The package is given without the leading LaTeXML::.

=item C<--help>

Shows this help message.

=back

=head1 AUTHOR

Bruce Miller, Deyan Ginev

=head1 COPYRIGHT

Public domain software, produced as part of work done by the
United States Government & not subject to copyright in the US.

=cut

latexml-0.8.1/lib/LaTeXML/Common/Dimension.pm0000644000175000017500000000612212507513572020575 0ustar norbertnorbert
# /=====================================================================\ #
# | LaTeXML::Common::Dimension                                          | #
# | Representation of Dimensions                                        | #
# |=====================================================================| #
# | Part of LaTeXML:                                                    | #
# |  Public domain software, produced as part of work done by the       | #
# |  United States Government & not subject to copyright in the US.     | #
# |---------------------------------------------------------------------| #
# | Bruce Miller                                                #_#     | #
# | http://dlmf.nist.gov/LaTeXML/                              (o o)    | #
# \=========================================================ooo==U==ooo=/ #
package LaTeXML::Common::Dimension;
use strict;
use warnings;
use LaTeXML::Global;
use LaTeXML::Common::Object;
use base qw(LaTeXML::Common::Number);
use base qw(Exporter);
our @EXPORT = (qw(&Dimension));

#======================================================================
# Exported constructor.

sub Dimension {
  my ($scaledpoints) = @_;
  return LaTeXML::Common::Dimension->new($scaledpoints);
}

#======================================================================

sub new {
  my ($class, $sp) = @_;
  $sp = "0" unless $sp;
  if ($sp =~ /^(-?\d*\.?\d*)([a-zA-Z][a-zA-Z])$/) {    # Dimensions given.
    $sp = $1 * $STATE->convertUnit($2);
  }
  return bless [$sp || "0"], $class;
}

sub toString {
  my ($self) = @_;
  return pointformat($$self[0]);
}

sub toAttribute {
  my ($self) = @_;
  return attributeformat($$self[0]);
}

sub stringify {
  my ($self) = @_;
  return "Dimension[" . $$self[0] . "]";
}

# Utility for formatting scaled points sanely.
sub pointformat {
  my ($sp) = @_;
  # As much as I'd like to make this more friendly & readable
  # there's TeX code that depends on getting enough precision
  # If you use %.5f, tikz (for example) will sometimes hang trying to do arithmetic!
  # But see toAttribute for friendlier forms....
  # [do we need the juggling in attributeFormat to be reproducible?]
  my $s = sprintf("%.6f", ($sp / 65536));
  $s =~ s/0+$// if $s =~ /\./;
  #  $s =~ s/\.$//;
  $s =~ s/\.$/.0/;    # Seems TeX prints .0 which in odd corner cases, people use?
  return $s . 'pt';
}

sub attributeformat {
  my ($sp) = @_;
  return sprintf('%.1fpt', LaTeXML::Common::Number::roundto($sp / 65536, 1));
}

#======================================================================
1;

__END__

=pod

=head1 NAME

C<LaTeXML::Common::Dimension> - representation of dimensions;
extends L<LaTeXML::Common::Number>.
=head2 Exported functions =over 4 =item C<< $dimension = Dimension($dim); >> Creates a Dimension object. C<$dim> can be a string with the number and units (with any of the usual TeX-recognized units), or just a number standing for scaled points (sp). =back =head1 AUTHOR Bruce Miller =head1 COPYRIGHT Public domain software, produced as part of work done by the United States Government & not subject to copyright in the US. =cut
latexml-0.8.1/lib/LaTeXML/Common/Error.pm
# /=====================================================================\ # # | LaTeXML::Common::Error | # # | Error handler | # # |=====================================================================| # # | Part of LaTeXML: | # # | Public domain software, produced as part of work done by the | # # | United States Government & not subject to copyright in the US.
| # # |---------------------------------------------------------------------| # # | Bruce Miller #_# | # # | http://dlmf.nist.gov/LaTeXML/ (o o) | # # \=========================================================ooo==U==ooo=/ # package LaTeXML::Common::Error; use strict; use warnings; use LaTeXML::Global; ##use LaTeXML::Common::Object; use Time::HiRes; use base qw(Exporter); our @EXPORT = ( # Error Reporting qw(&Fatal &Error &Warn &Info), # Progress reporting qw( &NoteProgress &NoteProgressDetailed &NoteBegin &NoteEnd), ); #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # Note: The exported symbols should ultimately be exported as part # of LaTeXML::Common, or something like that, to be used BOTH in # Digestion & Post-Processing. # ====================================================================== # We want LaTeXML::Global to import this package, # but we also want to use some of its low-level functions. sub ToString { my ($item, @more) = @_; return ($LaTeXML::BAILOUT ? "$item" : LaTeXML::Common::Object::ToString($item, @more)); } sub Stringify { my ($item, @more) = @_; return ($LaTeXML::BAILOUT ? "$item" : LaTeXML::Common::Object::Stringify($item, @more)); } #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # Error reporting # Public API sub Fatal { my ($category, $object, $where, $message, @details) = @_; my $state = $STATE; my $verbosity = $state && $state->lookupValue('VERBOSITY') || 0; ## if (!$LaTeXML::Common::Error::InHandler && defined($^S)) { if (!$LaTeXML::Common::Error::InHandler) { local $LaTeXML::BAILOUT = $LaTeXML::BAILOUT; if (checkRecursiveError()) { $LaTeXML::BAILOUT = 1; push(@details, "Recursive Error!"); } $state && $state->noteStatus('fatal'); $message = generateMessage("Fatal:" . $category . ":" . ToString($object), $where, $message, 1, # ?!?!?!?!?! # or just verbosity code >>>1 ??? @details, ($verbosity > 0 ?
("Stack Trace:", stacktrace()) : ())); # We're about to DIE, which will bypass the usual status message, so add it here. $message .= $state->getStatusMessage if $state; } else { # If we ARE in a recursive call, the actual message is $details[0] $message = $details[0] if $details[0]; } local $LaTeXML::Common::Error::InHandler = 1; local $SIG{__DIE__} = undef; die $message; } sub checkRecursiveError { my @caller; for (my $frame = 2 ; @caller = caller($frame) ; $frame++) { if ($caller[3] =~ /^LaTeXML::(Global::ToString|Global::Stringify)$/) { # print STDERR "RECURSED ON $caller[3]\n"; return 1; } } return; } # Note that "100" is hardwired into TeX, The Program!!! my $MAXERRORS = 100; # [CONSTANT] # Should be fatal if strict is set, else warn. sub Error { my ($category, $object, $where, $message, @details) = @_; my $state = $STATE; my $verbosity = $state && $state->lookupValue('VERBOSITY') || 0; if ($state && $state->lookupValue('STRICT')) { Fatal($category, $object, $where, $message, @details); } else { $state && $state->noteStatus('error'); print STDERR generateMessage("Error:" . $category . ":" . ToString($object), $where, $message, 1, @details) if $verbosity >= -2; } if (!$state || ($state->getStatus('error') || 0) > $MAXERRORS) { Fatal('too_many_errors', $MAXERRORS, $where, "Too many errors (> $MAXERRORS)!"); } return; } # Warning message; results may be OK, but somewhat unlikely sub Warn { my ($category, $object, $where, $message, @details) = @_; my $state = $STATE; my $verbosity = $state && $state->lookupValue('VERBOSITY') || 0; $state && $state->noteStatus('warning'); print STDERR generateMessage("Warning:" . $category . ":" .
ToString($object), $where, $message, 0, @details) if $verbosity >= -1; return; } # Informational message; results likely unaffected # but the message may give clues about subsequent warnings or errors sub Info { my ($category, $object, $where, $message, @details) = @_; my $state = $STATE; my $verbosity = $state && $state->lookupValue('VERBOSITY') || 0; $state && $state->noteStatus('info'); print STDERR generateMessage("Info:" . $category . ":" . ToString($object), $where, $message, -1, @details) if $verbosity >= 0; return; } #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # Progress Reporting #********************************************************************** # Progress reporting. sub NoteProgress { my (@stuff) = @_; my $state = $STATE; my $verbosity = $state && $state->lookupValue('VERBOSITY') || 0; print STDERR @stuff if $verbosity >= 0; return; } sub NoteProgressDetailed { my (@stuff) = @_; my $state = $STATE; my $verbosity = $state && $state->lookupValue('VERBOSITY') || 0; print STDERR @stuff if $verbosity >= 1; return; } sub NoteBegin { my ($stage) = @_; my $state = $STATE; my $verbosity = $state && $state->lookupValue('VERBOSITY') || 0; if ($state && ($verbosity >= 0)) { $state->assignMapping('NOTE_TIMERS', $stage, [Time::HiRes::gettimeofday]); print STDERR "\n($stage..."; } return; } sub NoteEnd { my ($stage) = @_; my $state = $STATE; my $verbosity = $state && $state->lookupValue('VERBOSITY') || 0; if (my $start = $state && $state->lookupMapping('NOTE_TIMERS', $stage)) { $state->assignMapping('NOTE_TIMERS', $stage, undef); if ($verbosity >= 0) { my $elapsed = Time::HiRes::tv_interval($start, [Time::HiRes::gettimeofday]); print STDERR sprintf(" %.2f sec)", $elapsed); } } return; } #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # Handlers for perl's die & warn # We'll try to decode some common errors to make them more usable # for build systems. 
my $quoted_re = qr/\"([^\"]*)\"/; # [CONSTANT] my $cantcall_re = qr/Can't call method/; # [CONSTANT] my $cantlocate_re = qr/Can't locate object method/; # [CONSTANT] my $noself_re = qr/on an undefined value|without a package or object reference/; # [CONSTANT] my $via_re = qr/via package/; # [CONSTANT] my $at_re = qr/at (.*)/; # [CONSTANT] sub perl_die_handler { my (@line) = @_; # We try to find a meaningful name for where the error occurred; # That's the thing that is "misdefined", after all. # Not completely sure we're looking in the right place up the stack, though. if ($line[0] =~ /^$cantcall_re\s+$quoted_re\s+($noself_re)\s+$at_re$/) { my ($method, $kind, $where) = ($1, $2, $3); Fatal('misdefined', callerName(2), $where, @line); } elsif ($line[0] =~ /^$cantlocate_re\s+$quoted_re\s+$via_re\s+$quoted_re\s+$at_re$/) { my ($method, $class, $where) = ($1, $2, $3); Fatal('misdefined', callerName(2), $where, @line); } elsif ($line[0] =~ /^Not an? (\w*) reference at (.*)$/) { my ($type, $where) = ($1, $2); Fatal('misdefined', callerName(2), $where, @line); } elsif ($line[0] =~ /^File (.*?) had an error:/) { my ($file) = ($1); Fatal('misdefined', $file, undef, @line); } else { Fatal('perl', 'die', undef, "Perl died", @line); } return; } sub perl_warn_handler { my (@line) = @_; Warn('perl', 'warn', undef, "Perl warning", @line); return; } #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # Internals # Synthesize an error message describing what happened, and where. # $detail specifies the level of detail # $detail == -1 : no context or stack # $detail == 0 : context, no stack # $detail == +1 : context & stack # including a level requesting full stack trace? sub generateMessage { my ($errorcode, $where, $message, $detail, @extra) = @_; #---------------------------------------- # Generate location information; basic and for stack trace. # If we've been given an object $where, where the error occurred, use it.
my $docloc = getLocation($where); # $message and each of @extra should be single lines ($message, @extra) = grep { $_ ne '' } map { split("\n", $_) } grep { defined $_ } $message, @extra; # The initial portion of the message will consist of: $message = '' unless defined $message; my @lines = ( # Start with the error code & primary error message $errorcode . ' ' . $message, # Followed by single line location of where the message occurred (if we know) ($docloc ? ($docloc) : ()), # and then any additional message lines supplied @extra); #---------------------------------------- # Now add some additional context # NOTE: Should skip this for INFO # NOTE: Need to pass more of this onto the objects themselves.... # What should it be called? # showErrorContext() ????? $detail = 0 unless defined $detail; # Increment $detail if $verbosity > 0, unless $detail = -1, my $verbosity = ($STATE && $STATE->lookupValue('VERBOSITY')) || 0; if (($detail > -1) && ($verbosity > 0)) { $detail = 0 if defined $verbosity && $verbosity < -1; $detail++ if defined $verbosity && $verbosity > +1; } # FIRST line of stack trace information ought to look at the $where my $wheretype = ref $where; if ($detail <= 0) { } # No extra context elsif ($wheretype =~ /^XML::LibXML/) { push(@lines, "Node is " . Stringify($where)); } ## Hmm... if we're being verbose or level is high, we might do this: ### "Currently in ".$doc->getInsertionContext); } elsif ($wheretype =~ 'LaTeXML::Core::Gullet') { push(@lines, $where->showUnexpected); } # Or better? elsif ($wheretype =~ 'LaTeXML::Core::Stomach') { push(@lines, "Recently digested: " . join(' ', map { Stringify($_) } @LaTeXML::LIST)) if $verbosity > 1; } #---------------------------------------- # Add Stack Trace, if that seems worthwhile. if ($detail > -1) { my $nstack = ($detail > 1 ? undef : ($detail > 0 ? 4 : 1)); if (my @objects = objectStack($nstack)) { my $top = shift(@objects); push(@lines, "In " . trim(ToString($top)) . ' ' . 
ToString(Locator($top))); push(@objects, '...') if @objects && defined $nstack; push(@lines, join('', (map { ' <= ' . trim(ToString($_)) } @objects))) if @objects; } } # finally, join the result into a block of lines, indenting all but the 1st line. return "\n" . join("\n\t", @lines) . "\n"; } sub Locator { my ($object) = @_; return ($object && $object->can('getLocator') ? $object->getLocator : "???"); } sub getLocation { my ($where) = @_; my $wheretype = ref $where; if ($wheretype && ($wheretype =~ /^XML::LibXML/)) { my $box = $LaTeXML::DOCUMENT->getNodeBox($where); return Locator($box) if $box; } elsif ($wheretype && $where->can('getLocator')) { return $where->getLocator; } elsif (defined $where) { return $where; } # Otherwise, try to guess where the error came from! elsif ($LaTeXML::DOCUMENT) { # During construction? my $node = $LaTeXML::DOCUMENT->getNode; my $box = $LaTeXML::DOCUMENT->getNodeBox($node); return Locator($box) if $box; } if ($LaTeXML::BOX) { # In constructor? return Locator($LaTeXML::BOX); } if ($STATE && $STATE->getStomach) { my $gullet = $STATE->getStomach->getGullet; # NOTE: Problems here. # (1) With obsoleting Tokens as a Mouth, we can get pointless "Anonymous String" locators! # (2) If gullet is the source, we probably want to include next token, etc or return $gullet->getLocator(); } return; } sub callerName { my ($frame) = @_; my %info = caller_info(($frame || 0) + 2); return $info{sub}; } sub callerInfo { my ($frame) = @_; my %info = caller_info(($frame || 0) + 2); return "$info{call} @ $info{file} line $info{line}"; } #====================================================================== # This portion adapted from Carp; simplified (but hopefully still correct), # allow stringify overload, handle methods, make more concise! #====================================================================== my $MAXARGS = 8; # [CONSTANT] my $MAXLEN = 40; # Or more? 
[CONSTANT] sub trim { my ($string) = @_; return $string unless defined $string; $string = substr($string, 0, $MAXLEN - 3) . "..." if (length($string) > $MAXLEN); $string =~ s/\n/\x{240D}/gs; # symbol for CR return $string; } sub caller_info { my ($i) = @_; my (%info, @args); { package DB; @info{qw(package file line sub has_args wantarray evaltext is_require)} = caller($i); @args = @DB::args; } return () unless defined $info{package}; # Work out the effective sub name, or eval, or method ... my $call = ''; if (defined $info{evaltext}) { my $eval = $info{evaltext}; if ($info{is_require}) { $call = "require $eval"; } else { $eval =~ s/([\\\'])/\\$1/g; $call = "eval '" . trim($eval) . "'"; } } elsif ($info{sub} eq '(eval)') { $call = "eval {...}"; } else { $call = $info{sub}; my $method = $call; $method =~ s/^.*:://; # If $arg[0] is blessed, and `can' do $method, then we'll guess it's a method call? if ($info{has_args} && @args && ref $args[0] && ((ref $args[0]) !~ /^(?:SCALAR|ARRAY|HASH|CODE|REF|GLOB|LVALUE)$/) && $args[0]->can($method)) { $call = format_arg(shift(@args)) . "->" . $method; } } # Append arguments, if any. if ($info{has_args}) { @args = map { format_arg($_) } @args; if (@args > $MAXARGS) { $#args = $MAXARGS; push(@args, '...'); } $call .= '(' . join(',', @args) . ')'; } $info{call} = $call; return %info; } sub format_arg { my ($arg) = @_; if (not defined $arg) { $arg = 'undef'; } elsif (ref $arg) { $arg = Stringify($arg); } # Allow overloaded stringify! elsif ($arg =~ /^-?[\d.]+\z/) { } # Leave numbers alone. 
else { # Otherwise, string, so quote $arg =~ s/'/\\'/g; # Slashify ' $arg =~ s/([[:cntrl:]])/ "\\".chr(ord($1)+ord('A'))/ge; $arg = "'$arg'"; } return trim($arg); } # Semi-traditional (but reformatted) stack trace sub stacktrace { my $frame = 0; my $trace = ""; while (my %info = caller_info($frame++)) { next if $info{sub} =~ /^LaTeXML::Common::Error/; ## $info{call} = '' if $info{sub} =~ /^LaTeXML::Common::Error::(?:Fatal|Error|Warn|Info)/; $trace .= "\t$info{call} @ $info{file} line $info{line}\n"; } return $trace; } # Extract blessed `interesting' objects on stack. # Get a maximum of $maxdepth objects (if $maxdepth is defined). sub objectStack { my ($maxdepth) = @_; my $frame = 0; my @objects = (); while (1) { my (%info, @args); { package DB; @info{qw(package file line sub has_args wantarray evaltext is_require)} = caller($frame++); @args = @DB::args; } last unless defined $info{package}; next if ($info{sub} eq '(eval)') || !$info{has_args} || !@args; my $self = $args[0]; # If $arg[0] is blessed, and `can' do $method, then we'll guess it's a method call? # We'll collect such objects provided they can ->getLocator if ((ref $self) && ((ref $self) !~ /^(?:SCALAR|ARRAY|HASH|CODE|REF|GLOB|LVALUE)$/)) { my $method = $info{sub}; $method =~ s/^.*:://; if ($self->can($method)) { next if @objects && ($self eq $objects[-1]); next unless $self->can('getLocator'); push(@objects, $self); last if $maxdepth && (scalar(@objects) >= $maxdepth); } } } return @objects; } #********************************************************************** 1; __END__ =pod =head1 NAME C<LaTeXML::Common::Error> - Error and Progress Reporting and Logging support. =head1 DESCRIPTION C<LaTeXML::Common::Error> does some simple stack analysis to generate more informative, readable, error messages for LaTeXML. Its routines are used by the error reporting methods from L<LaTeXML::Global>, namely C<Warn>, C<Error> and C<Fatal>.
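The basic layout that C<generateMessage> gives a report can be sketched on its own. This is an illustration under stated assumptions, not the module's code: C<format_message> is a hypothetical helper, the sample category, object and location strings are made up, and the location guessing, extra context and object stack that the real routine appends are left out.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of LaTeXML): lay out a report the way
# generateMessage does, minus location guessing and stack inspection.
sub format_message {
  my ($severity, $category, $object, $where, $message, @details) = @_;
  # Split the message and details into single, non-empty lines.
  my ($first, @extra) = grep { $_ ne '' }
    map  { split("\n", $_) }
    grep { defined $_ } $message, @details;
  $first = '' unless defined $first;
  my @lines = (
    "$severity:$category:$object $first",    # error code and primary message
    (defined $where ? ($where) : ()),        # location line, if known
    @extra);                                 # remaining detail lines
  # Join into a block, indenting all but the first line with a tab.
  return "\n" . join("\n\t", @lines) . "\n";
}

print format_message('Error', 'undefined', '\foo', 'at paper.tex; line 12',
  'The token is not defined.', "First detail\nSecond detail");
```

Keeping the severity, category and object on the first line, in a fixed C<Severity:category:object> shape, is what lets a build system index reports without parsing the free-form text that follows.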
=head2 Error Reporting The Error reporting functions all take a similar set of arguments; the differences are in the implied severity of the situation, and in the amount of detail that will be reported. The C<$category> is a string naming a broad category of errors, such as "undefined". The set is open-ended, but see the manual for a list of recognized categories. C<$object> is the object whose presence or lack caused the problem. C<$where> indicates where the problem occurred; pass in the C<$gullet> or C<$stomach> if the problem occurred during expansion or digestion; pass in a document node if it occurred there. A string will be used as is; if an undefined value is used, the error handler will try to guess. The C<$message> should be a somewhat concise, but readable, explanation of the problem, but ought to not refer to the document or any "incident specific" information, so as to support indexing in build systems. C<@details> provides additional lines of information that may be incident specific. =over 4 =item C<< Fatal($category,$object,$where,$message,@details); >> Signals a fatal error, printing C<$message> along with some context. In verbose mode a stack trace is printed. =item C<< Error($category,$object,$where,$message,@details); >> Signals an error, printing C<$message> along with some context. If in strict mode, this is the same as Fatal(). Otherwise, it attempts to continue processing. =item C<< Warn($category,$object,$where,$message,@details); >> Prints a warning message along with a short indicator of the input context, unless verbosity is quiet. =item C<< Info($category,$object,$where,$message,@details); >> Prints an informational message along with a short indicator of the input context, unless verbosity is quiet. =item C<< NoteProgress($message); >> Prints C<$message> unless the verbosity level is below 0. Typically just a short mark to indicate motion, but can be longer; provide your own newlines, if needed.
=item C<< NoteProgressDetailed($message); >> Like C<NoteProgress>, but for noisier progress; only prints when verbosity >= 1. =back =head2 Internal Functions No user serviceable parts inside. These symbols are not exported. =over 4 =item C<< $string = LaTeXML::Common::Error::generateMessage($typ,$msg,$lng,@more); >> Constructs an error or warning message based on the current stack and the current location in the document. C<$typ> is a short string characterizing the type of message, such as "Error". C<$msg> is the error message itself. If C<$lng> is true, will generate a more verbose message; this also uses the VERBOSITY set in the C<$STATE>. Longer messages will show a trace of the objects invoked on the stack. C<@more> are additional strings to include in the message. =item C<< $string = LaTeXML::Common::Error::stacktrace; >> Return a formatted string showing a trace of the stackframes up until this function was invoked. =item C<< @objects = LaTeXML::Common::Error::objectStack; >> Return a list of objects invoked on the stack. This procedure only considers those stackframes which involve methods, and the objects are those (unique) objects that the method was called on. =back =head1 AUTHOR Bruce Miller =head1 COPYRIGHT Public domain software, produced as part of work done by the United States Government & not subject to copyright in the US.
=cut
latexml-0.8.1/lib/LaTeXML/Common/Float.pm
# /=====================================================================\ # # | LaTeXML::Common::Float | # # | Representation of floating point objects | # # |=====================================================================| # # | Part of LaTeXML: | # # | Public domain software, produced as part of work done by the | # # | United States Government & not subject to copyright in the US. | # # |---------------------------------------------------------------------| # # | Bruce Miller #_# | # # | http://dlmf.nist.gov/LaTeXML/ (o o) | # # \=========================================================ooo==U==ooo=/ # package LaTeXML::Common::Float; use LaTeXML::Global; use strict; use warnings; use base qw(LaTeXML::Common::Number); use base qw(Exporter); our @EXPORT = (qw(&Float)); #====================================================================== # Exported constructor. sub Float { my ($number) = @_; return LaTeXML::Common::Float->new($number); } #====================================================================== # Strictly speaking, Float isn't part of TeX, but it's handy.
sub toString { my ($self) = @_; return floatformat($$self[0]); } sub multiply { my ($self, $other) = @_; return (ref $self)->new($self->valueOf * (ref $other ? $other->valueOf : $other)); } sub stringify { my ($self) = @_; return "Float[" . $$self[0] . "]"; } # Utility for formatting sane numbers. sub floatformat { my ($n) = @_; my $s = sprintf("%.5f", $n); $s =~ s/0+$// if $s =~ /\./; # $s =~ s/\.$//; $s =~ s/\.$/.0/; # Seems TeX prints .0 which in odd corner cases, people use? return $s; } #====================================================================== 1; __END__ =pod =head1 NAME C<LaTeXML::Common::Float> - representation of floating point numbers; extends L<LaTeXML::Common::Number>. =head2 Exported functions =over 4 =item C<< $number = Float($num); >> Creates a floating point object representing C<$num>; this is not part of TeX, but useful. =back =head1 AUTHOR Bruce Miller =head1 COPYRIGHT Public domain software, produced as part of work done by the United States Government & not subject to copyright in the US. =cut
latexml-0.8.1/lib/LaTeXML/Common/Font.pm
# /=====================================================================\ # # | LaTeXML::Common::Font | # # | Representation of Fonts | # # |=====================================================================| # # | Part of LaTeXML: | # # | Public domain software, produced as part of work done by the | # # | United States Government & not subject to copyright in the
US. | # # |---------------------------------------------------------------------| # # | Bruce Miller #_# | # # | http://dlmf.nist.gov/LaTeXML/ (o o) | # # \=========================================================ooo==U==ooo=/ # package LaTeXML::Common::Font; use strict; use warnings; use LaTeXML::Global; use LaTeXML::Core::Token; use LaTeXML::Common::Error; use LaTeXML::Common::Object; use LaTeXML::Common::Dimension; use List::Util qw(min max sum); use base qw(LaTeXML::Common::Object); # Note that this has evolved way beyond just "font", # but covers text properties (or even display properties) in general # including basic font information, color & background color # as well as encoding and language information. # NOTE: This is now in Common so that it may evolve to be useful in Post processing... my $DEFFAMILY = 'serif'; # [CONSTANT] my $DEFSERIES = 'medium'; # [CONSTANT] my $DEFSHAPE = 'upright'; # [CONSTANT] my $DEFCOLOR = 'black'; # [CONSTANT] my $DEFBACKGROUND = 'white'; # [CONSTANT] my $DEFOPACITY = '1'; # [CONSTANT] my $DEFENCODING = 'OT1'; # [CONSTANT] sub DEFSIZE { return $STATE->lookupValue('NOMINAL_FONT_SIZE') || 10; } #====================================================================== # Mappings from various forms of names or component names in TeX # Given a font, we'd like to map it to the "logical" names derived from LaTeX, # (w/ loss of fine grained control). # I'd like to use Karl Berry's font naming scheme # (See http://www.tug.org/fontname/html/) # but it seems to be a one-way mapping, and moreover, doesn't even fit CM fonts! # We'll assume a sloppier version: # family + series + variant + size # NOTE: This probably doesn't really belong in here...
my %font_family = ( cmr => { family => 'serif' }, cmss => { family => 'sansserif' }, cmtt => { family => 'typewriter' }, cmvtt => { family => 'typewriter' }, cmti => { family => 'typewriter', shape => 'italic' }, cmfib => { family => 'serif' }, cmfr => { family => 'serif' }, cmdh => { family => 'serif' }, cm => { family => 'serif' }, ptm => { family => 'serif' }, ppl => { family => 'serif' }, pnc => { family => 'serif' }, pbk => { family => 'serif' }, phv => { family => 'sansserif' }, pag => { family => 'serif' }, pcr => { family => 'typewriter' }, pzc => { family => 'script' }, put => { family => 'serif' }, bch => { family => 'serif' }, psy => { family => 'symbol' }, pzd => { family => 'dingbats' }, ccr => { family => 'serif' }, ccy => { family => 'symbol' }, cmbr => { family => 'sansserif' }, cmtl => { family => 'typewriter' }, cmbrs => { family => 'symbol' }, ul9 => { family => 'typewriter' }, txr => { family => 'serif' }, txss => { family => 'sansserif' }, txtt => { family => 'typewriter' }, txms => { family => 'symbol' }, txsya => { family => 'symbol' }, txsyb => { family => 'symbol' }, pxr => { family => 'serif' }, pxms => { family => 'symbol' }, pxsya => { family => 'symbol' }, pxsyb => { family => 'symbol' }, futs => { family => 'serif' }, uaq => { family => 'serif' }, ugq => { family => 'sansserif' }, eur => { family => 'serif' }, eus => { family => 'script' }, euf => { family => 'fraktur' }, euex => { family => 'symbol' }, # The following are actually math fonts. ms => { family => 'symbol' }, ccm => { family => 'serif', shape => 'italic' }, cmm => { family => 'italic', encoding => 'OML' }, cmex => { family => 'symbol', encoding => 'OMX' }, # Not really symbol, but... 
cmsy => { family => 'symbol', encoding => 'OMS' }, ccitt => { family => 'typewriter', shape => 'italic' }, cmbrm => { family => 'sansserif', shape => 'italic' }, futm => { family => 'serif', shape => 'italic' }, futmi => { family => 'serif', shape => 'italic' }, txmi => { family => 'serif', shape => 'italic' }, pxmi => { family => 'serif', shape => 'italic' }, bbm => { family => 'blackboard' }, bbold => { family => 'blackboard' }, bbmss => { family => 'blackboard' }, # some ams fonts cmmib => { family => 'italic', series => 'bold' }, cmbsy => { family => 'symbol', series => 'bold' }, msa => { family => 'symbol', encoding => 'AMSA' }, msb => { family => 'symbol', encoding => 'AMSB' }, # Are these really the same? msx => { family => 'symbol', encoding => 'AMSA' }, msy => { family => 'symbol', encoding => 'AMSB' }, ); # Maps the "series code" to an abstract font series name my %font_series = ( '' => { series => 'medium' }, m => { series => 'medium' }, mc => { series => 'medium' }, b => { series => 'bold' }, bc => { series => 'bold' }, bx => { series => 'bold' }, sb => { series => 'bold' }, sbc => { series => 'bold' }, bm => { series => 'bold' }); # Maps the "shape code" to an abstract font shape name. my %font_shape = ('' => { shape => 'upright' }, n => { shape => 'upright' }, i => { shape => 'italic' }, it => { shape => 'italic' }, sl => { shape => 'slanted' }, sc => { shape => 'smallcaps' }, csc => { shape => 'smallcaps' }); # These could be exported... sub lookupFontFamily { my ($familycode) = @_; return $font_family{ ToString($familycode) }; } sub lookupFontSeries { my ($seriescode) = @_; return $font_series{ ToString($seriescode) }; } sub lookupFontShape { my ($shapecode) = @_; return $font_shape{ ToString($shapecode) }; } # Symbolic font sizes, relative to the NOMINAL_FONT_SIZE (often 10) # extended logical font sizes, based on nominal document size of 10pts # Possibly should simply use absolute font point sizes, as declared in class... 
my %font_size = ( tiny => 0.5, SMALL => 0.7, Small => 0.8, small => 0.9, normal => 1.0, large => 1.2, Large => 1.44, LARGE => 1.728, huge => 2.074, Huge => 2.488, big => 1.2, Big => 1.6, bigg => 2.1, Bigg => 2.6, ); sub rationalizeFontSize { my ($size) = @_; return unless defined $size; if (my $symbolic = $font_size{$size}) { return $symbolic * DEFSIZE(); } return $size; } # convert to percent sub relativeFontSize { my ($newsize, $oldsize) = @_; return int(100 * $newsize / $oldsize) . '%'; } my $FONTREGEXP = '(' . join('|', sort { -($a cmp $b) } keys %font_family) . ')' . '(' . join('|', sort { -($a cmp $b) } keys %font_series) . ')' . '(' . join('|', sort { -($a cmp $b) } keys %font_shape) . ')' . '(\d*)'; sub decodeFontname { my ($name, $at, $scaled) = @_; if ($name =~ /^$FONTREGEXP$/o) { my %props; my ($fam, $ser, $shp, $size) = ($1, $2, $3, $4); if (my $ffam = lookupFontFamily($fam)) { map { $props{$_} = $$ffam{$_} } keys %$ffam; } if (my $fser = lookupFontSeries($ser)) { map { $props{$_} = $$fser{$_} } keys %$fser; } if (my $fsh = lookupFontShape($shp)) { map { $props{$_} = $$fsh{$_} } keys %$fsh; } $size = 1 unless defined $size; $size = $at if defined $at; $size *= $scaled if defined $scaled; $props{size} = $size; # Experimental Hack !?!?!? $props{encoding} = 'OT1' unless defined $props{encoding}; return %props; } else { return; } } sub lookupTeXFont { my ($fontname, $seriescode, $shapecode) = @_; my %props; if (my $ffam = lookupFontFamily($fontname)) { map { $props{$_} = $$ffam{$_} } keys %$ffam; } if (my $fser = lookupFontSeries($seriescode)) { map { $props{$_} = $$fser{$_} } keys %$fser; } if (my $fsh = lookupFontShape($shapecode)) { map { $props{$_} = $$fsh{$_} } keys %$fsh; } return %props; } #====================================================================== # NOTE: Would it make sense to allow components to be `inherit' ??
# Note: forcebold, forceshape are only useful for fonts in math sub new { my ($class, %options) = @_; my $family = $options{family}; my $series = $options{series}; my $shape = $options{shape}; my $size = $options{size}; my $color = $options{color}; my $bg = $options{background}; my $opacity = $options{opacity}; my $encoding = $options{encoding}; my $language = $options{language}; my $forcebold = $options{forcebold}; my $forceshape = $options{forceshape}; return $class->new_internal( $family, $series, $shape, rationalizeFontSize($size), $color, $bg, $opacity, $encoding, $language, $forcebold, $forceshape); } sub new_internal { my ($class, @components) = @_; return bless [@components], $class; } sub textDefault { my ($self) = @_; return $self->new_internal($DEFFAMILY, $DEFSERIES, $DEFSHAPE, DEFSIZE(), $DEFCOLOR, $DEFBACKGROUND, $DEFOPACITY, $DEFENCODING, undef, undef, undef); } sub mathDefault { my ($self) = @_; return $self->new_internal('math', $DEFSERIES, 'italic', DEFSIZE(), $DEFCOLOR, $DEFBACKGROUND, $DEFOPACITY, undef, undef, undef, undef); } # Accessors sub getFamily { my ($self) = @_; return $$self[0]; } sub getSeries { my ($self) = @_; return $$self[1]; } sub getShape { my ($self) = @_; return $$self[2]; } sub getSize { my ($self) = @_; return $$self[3]; } sub getColor { my ($self) = @_; return $$self[4]; } sub getBackground { my ($self) = @_; return $$self[5]; } sub getOpacity { my ($self) = @_; return $$self[6]; } sub getEncoding { my ($self) = @_; return $$self[7]; } sub getLanguage { my ($self) = @_; return $$self[8]; } sub toString { my ($self) = @_; return "Font[" . join(',', map { (defined $_ ? $_ : '*') } @{$self}) . "]"; } # Perhaps it is more useful to list only the non-default components? sub stringify { my ($self) = @_; my ($fam, $ser, $shp, $siz, $col, $bkg, $opa, $enc, $lang) = @$self; $fam = 'serif' if $fam && ($fam eq 'math'); return 'Font[' . join(',', grep { $_ } (isDiff($fam, $DEFFAMILY) ? ($fam) : ()), (isDiff($ser, $DEFSERIES) ? 
($ser) : ()), (isDiff($shp, $DEFSHAPE) ? ($shp) : ()), (isDiff($siz, DEFSIZE()) ? ($siz) : ()), (isDiff($col, $DEFCOLOR) ? ($col) : ()), (isDiff($bkg, $DEFBACKGROUND) ? ($bkg) : ()), (isDiff($opa, $DEFOPACITY) ? ($opa) : ())) . ']'; } sub equals { my ($self, $other) = @_; return (defined $other) && ((ref $self) eq (ref $other)) && (join('|', map { (defined $_ ? $_ : '*') } @$self) eq join('|', map { (defined $_ ? $_ : '*') } @$other)); } sub match { my ($self, $other) = @_; return 1 unless defined $other; return 0 unless (ref $self) eq (ref $other); my @comp = @$self; my @ocomp = @$other; # If any components are defined in both fonts, they must be equal. while (@comp) { my $c = shift @comp; my $oc = shift @ocomp; return 0 if (defined $c) && (defined $oc) && ($c ne $oc); } return 1; } sub makeConcrete { my ($self, $concrete) = @_; my ($family, $series, $shape, $size, $color, $bg, $opacity, $encoding, $lang) = @$self; my ($ofamily, $oseries, $oshape, $osize, $ocolor, $obg, $oopacity, $oencoding, $olang) = @$concrete; return (ref $self)->new_internal( $family || $ofamily, $series || $oseries, $shape || $oshape, $size || $osize, $color || $ocolor, $bg || $obg, (defined $opacity ? $opacity : $oopacity), $encoding || $oencoding, $lang || $olang); } sub isDiff { my ($x, $y) = @_; return (defined $x) && (!(defined $y) || ($x ne $y)); } # This method compares 2 fonts, returning the differences between them. 
# Noting that the font-related attributes in the schema distill the # font properties into fewer attributes (font,fontsize,color,background,opacity), # the return value encodes both the attribute changes that would be needed to effect # the font change, along with the font properties that differed # Namely, the result is a hash keyed on the attribute name and whose value is a hash # value => "new_attribute_value" # properties => { %fontproperties } sub relativeTo { my ($self, $other) = @_; my ($fam, $ser, $shp, $siz, $col, $bkg, $opa, $enc, $lang) = @$self; my ($ofam, $oser, $oshp, $osiz, $ocol, $obkg, $oopa, $oenc, $olang) = @$other; $fam = 'serif' if $fam && ($fam eq 'math'); $ofam = 'serif' if $ofam && ($ofam eq 'math'); my @diffs = ( (isDiff($fam, $ofam) ? ($fam) : ()), (isDiff($ser, $oser) ? ($ser) : ()), (isDiff($shp, $oshp) ? ($shp) : ())); return ( (@diffs ? (font => { value => join(' ', @diffs), properties => { (isDiff($fam, $ofam) ? (family => $fam) : ()), (isDiff($ser, $oser) ? (series => $ser) : ()), (isDiff($shp, $oshp) ? (shape => $shp) : ()) } }) : ()), (isDiff($siz, $osiz) ### ? (fontsize => { value => $siz, properties => { size => $siz } }) ? (fontsize => { value => relativeFontSize($siz, $osiz), properties => { size => $siz } }) : ()), (isDiff($col, $ocol) ? (color => { value => $col, properties => { color => $col } }) : ()), (isDiff($bkg, $obkg) ? (backgroundcolor => { value => $bkg, properties => { background => $bkg } }) : ()), (isDiff($opa, $oopa) ? (opacity => { value => $opa, properties => { opacity => $opa } }) : ()), (isDiff($lang, $olang) ? ('xml:lang' => { value => $lang, properties => { language => $lang } }) : ()), ); } sub distance { my ($self, $other) = @_; my ($fam, $ser, $shp, $siz, $col, $bkg, $opa, $enc, $lang) = @$self; my ($ofam, $oser, $oshp, $osiz, $ocol, $obkg, $oopa, $oenc, $olang) = @$other; $fam = 'serif' if $fam && ($fam eq 'math'); $ofam = 'serif' if $ofam && ($ofam eq 'math'); return (isDiff($fam, $ofam) ? 
1 : 0) + (isDiff($ser, $oser) ? 1 : 0) + (isDiff($shp, $oshp) ? 1 : 0) + (isDiff($siz, $osiz) ? 1 : 0) + (isDiff($col, $ocol) ? 1 : 0) + (isDiff($bkg, $obkg) ? 1 : 0) + (isDiff($opa, $oopa) ? 1 : 0) ## + (isDiff($enc,$oenc) ? 1 : 0) + (isDiff($lang, $olang) ? 1 : 0) ; } # This matches fonts when both are converted to strings (toString), # such as when they are set as attributes. # This accumulates regular expressions used by match_font # (which, in turn, is used in various XPath searches!) # It is NOT really Daemon safe.... # Need to work out how to do this and/or cache it in STATE???? our %FONT_REGEXP_CACHE = (); sub match_font { my ($font1, $font2) = @_; my $regexp = $FONT_REGEXP_CACHE{$font1}; if (!$regexp) { if ($font1 =~ /^Font\[(.*)\]$/) { my @comp = split(',', $1); my $re = '^Font\[' . join(',', map { ($_ eq '*' ? "[^,]+" : "\Q$_\E") } @comp) . '\]$'; print STDERR "\nCreating re for \"$font1\" => $re\n"; $regexp = $FONT_REGEXP_CACHE{$font1} = qr/$re/; } } return $font2 =~ /$regexp/; } sub font_match_xpaths { my ($font) = @_; if ($font =~ /^Font\[(.*)\]$/) { my @comps = split(',', $1); my ($frag, @frags) = (); for (my $i = 0 ; $i <= $#comps ; $i++) { my $comp = $comps[$i]; if ($comp eq '*') { push(@frags, $frag) if $frag; $frag = undef; } else { my $post = ($i == $#comps ? ']' : ','); if ($frag) { $frag .= $comp . $post; } else { $frag = ($i == 0 ? 'Font[' : ',') . $comp . $post; } } } push(@frags, $frag) if $frag; return join(' and ', '@_font', map { "contains(\@_font,'$_')" } @frags); } } # # Presumably a text font is "sticky", if used in math? # sub isSticky { return 1; } #====================================================================== sub computeStringSize { my ($self, $string) = @_; my $size = $self->getSize; my $u = (defined $string ? (($self->getSize || DEFSIZE()) || 10) * 65535 * length($string) : 0); return (Dimension(0.75 * $u), Dimension(0.7 * $u), Dimension(0.2 * $u)); } # Get nominal width, height base ? 
sub getNominalSize {
  my ($self) = @_;
  my $size = $self->getSize;
  my $u = (($self->getSize || DEFSIZE()) || 10) * 65535;
  return (Dimension(0.75 * $u), Dimension(0.7 * $u), Dimension(0.2 * $u)); }

# Here's where I avoid trying to emulate Knuth's line-breaking...
# Mostly for List & Whatsit: compute the size of a list of boxes.
# Options _SHOULD_ include:
#   width: if given, pretend to simulate line breaking to that width
#   height,depth : ?
#   vattach : top, bottom, center, baseline (...?) affects how the height & depth are
#      allocated when there are multiple lines.
#   layout : horizontal or vertical !!!
# Boxes that aren't a Core Box, List, Whatsit or a string are IGNORED
#
# The big problem with width is to have it propagate down from where
# it may have been specified to the actual nested box that will get wrapped!
# Try to mask this (temporarily) by unlisting, and (pretending to) breaking up too wide items
#
# Another issue; SVG needs (sometimes) real sizes, even if the programmer
# set some dimensions to 0 (eg.)  We may need to distinguish & store
# requested vs real sizes?
sub computeBoxesSize {
  my ($self, $boxes, %options) = @_;
  my $font = (ref $self ? $self : $STATE->lookupValue('font'));
  my $fillwidth = $options{width};
  if ((!defined $fillwidth) && ($fillwidth = $STATE->lookupDefinition(T_CS('\textwidth')))) {
    $fillwidth = $fillwidth->valueOf; }    # get register
  my $maxwidth = $fillwidth && $fillwidth->valueOf;
  my @lines = ();
  my ($wd, $ht, $dp) = (0, 0, 0);
  my $vattach = $options{vattach} || 'baseline';
  foreach my $box (@$boxes) {
    next unless defined $box;
    next if ref $box && !$box->can('getSize');    # Care!! Since we're asking ALL args/components
    my ($w, $h, $d) = (ref $box ? $box->getSize(%options) : $font->computeStringSize($box));
    if (ref $w) {
      $wd += $w->valueOf; }
    else {
      Warn('expected', 'Dimension', undef,
        "Width of " . Stringify($box) . " yielded a non-dimension: " .
          Stringify($w)); }
    if (ref $h) {
      $ht = max($ht, $h->valueOf); }
    else {
      Warn('expected', 'Dimension', undef,
        "Height of " . Stringify($box) . " yielded a non-dimension: " . Stringify($h)); }
    if (ref $d) {
      $dp = max($dp, $d->valueOf); }
    else {
      Warn('expected', 'Dimension', undef,
        "Depth of " . Stringify($box) . " yielded a non-dimension: " . Stringify($d)); }
    if ((($options{layout} || '') eq 'vertical')    # EVERY box is a row?
      #      || $box is a (or similar)!!!!
      ) {
      push(@lines, [$wd, $ht, $dp]); $wd = $ht = $dp = 0; }
    elsif ((defined $maxwidth) && ($wd >= $maxwidth)) {    # or we've reached the requested width
          # Compounding errors with wild abandon.
          # If an underlying box is too wide, we'll split it up into multiple rows
          # [Rather than correctly break it?]
          # BUT how do we know if it should break at all?
      ##      while ($wd >= $maxwidth) {
      ##        push(@lines, [$maxwidth, $ht, $dp]); $wd = $wd - $maxwidth; }
      ##      $ht = $h->valueOf; $dp = $d->valueOf;    # continue with the leftover
      push(@lines, [$wd, $ht, $dp]); $wd = $ht = $dp = 0;
    } }
  if ($wd) {    # be sure to get last line
    push(@lines, [$wd, $ht, $dp]); }
  # Deal with multiple lines
  my $nlines = scalar(@lines);
  if ($nlines == 0) {
    $wd = $ht = $dp = 0; }
  else {
    $wd = max(map { $$_[0] } @lines);
    $ht = sum(map { $$_[1] } @lines);
    $dp = sum(map { $$_[2] } @lines);
    if ($vattach eq 'top') {    # Top of box is aligned with top(?) of current text
      my ($w, $h, $d) = $font->getNominalSize;
      $h  = $h->valueOf;
      $dp = $ht + $dp - $h; $ht = $h; }
    elsif ($vattach eq 'bottom') {    # Bottom of box is aligned with bottom (?) of current text
      $ht = $ht + $dp; $dp = 0; }
    elsif ($vattach eq 'middle') {
      my ($w, $h, $d) = $font->getNominalSize;
      $h = $h->valueOf;
      my $c = ($ht + $dp) / 2;
      $ht = $c + $h / 2; $dp = $c - $h / 2; }
    else {                            # default is baseline (of the 1st line)
      my $h = $lines[0][1];
      $dp = $ht + $dp - $h; $ht = $h; } }
  #print "BOXES SIZE ".($wd/65536)." x ".($ht/65536)." + ".($dp/65536)."
for "
  #  .join(' ',grep {$_} map { Stringify($_) } @$boxes)."\n";
  return (Dimension($wd), Dimension($ht), Dimension($dp)); }

sub isSticky {
  my ($self) = @_;
  return $$self[0] && ($$self[0] =~ /^(?:serif|sansserif|typewriter)$/); }

# NOTE: In math, NORMALLY, setting any one of
#    family, series or shape
# will, usually, automatically reset the others to their defaults!
# You must arrange this in the calls....
sub merge {
  my ($self, %options) = @_;
  my $family     = $options{family};
  my $series     = $options{series};
  my $shape      = $options{shape};
  my $size       = rationalizeFontSize($options{size});
  my $color      = $options{color};
  my $bg         = $options{background};
  my $opacity    = $options{opacity};
  my $encoding   = $options{encoding};
  my $language   = $options{language};
  my $forcebold  = $options{forcebold};
  my $forceshape = $options{forceshape};

  # Fall back to the components of the current font for anything not given:
  $family     = $$self[0]  unless defined $family;
  $series     = $$self[1]  unless defined $series;
  $shape      = $$self[2]  unless defined $shape;
  $size       = $$self[3]  unless defined $size;
  $color      = $$self[4]  unless defined $color;
  $bg         = $$self[5]  unless defined $bg;
  $opacity    = $$self[6]  unless defined $opacity;
  $encoding   = $$self[7]  unless defined $encoding;
  $language   = $$self[8]  unless defined $language;
  $forcebold  = $$self[9]  unless defined $forcebold;
  $forceshape = $$self[10] unless defined $forceshape;

  if (my $scale = $options{scale}) {
    $size = $scale * $size; }

  return (ref $self)->new_internal($family, $series, $shape, $size,
    $color, $bg, $opacity,
    $encoding, $language, $forcebold, $forceshape); }

# Instantiate the font for a particular class of symbols.
# NOTE: This works in `normal' latex, but probably needs some tunability.
# Depending on the fonts being used, the allowable combinations may be different.
# Getting the font right is important, since the author probably
# thinks of the identity of the symbols according to what they SEE in the printed
# document.  Even though the markup might seem to indicate something else...
# Use Unicode properties to determine font merging. sub specialize { my ($self, $string) = @_; return $self unless defined $string; my ($family, $series, $shape, $size, $color, $bg, $opacity, $encoding, $language, $forcebold, $forceshape) = @$self; $series = 'bold' if $forcebold; if (($string =~ /^\p{Latin}$/) && ($string =~ /^\p{L}$/)) { # Latin Letter $shape = 'italic' if !$shape && !$family; } elsif ($string =~ /^\p{Greek}$/) { # Single Greek character? if ($string =~ /^\p{Lu}$/) { # Uppercase if (!$family || ($family eq 'math')) { $family = $DEFFAMILY; $shape = $DEFSHAPE if $shape && ($shape ne $DEFSHAPE); } } # if ANY shape, must be default else { # Lowercase $family = $DEFFAMILY if !$family || ($family ne $DEFFAMILY); $shape = 'italic' if !$shape || !$forceshape; # always ? if ($forcebold) { $series = 'bold'; } elsif ($series && ($series ne $DEFSERIES)) { $series = $DEFSERIES; } } } elsif ($string =~ /^\p{N}$/) { # Digit if (!$family || ($family eq 'math')) { $family = $DEFFAMILY; $shape = $DEFSHAPE; } } # defaults, always. else { # Other Symbol $family = $DEFFAMILY; $shape = $DEFSHAPE; # defaults, always. if ($forcebold) { $series = 'bold'; } elsif ($series && ($series ne $DEFSERIES)) { $series = $DEFSERIES; } } return (ref $self)->new_internal($family, $series, $shape, $size, $color, $bg, $opacity, $encoding, $language, $forcebold, $forceshape); } #********************************************************************** 1; __END__ =pod =head1 NAME C - representation of fonts =head1 DESCRIPTION C represent fonts in LaTeXML. It extends L. This module defines Font objects. I'm not completely happy with the arrangement, or maybe just the use of it, so I'm not going to document extensively at this point. 
The attributes are

 family : serif, sansserif, typewriter, caligraphic, fraktur, script
 series : medium, bold
 shape  : upright, italic, slanted, smallcaps
 size   : TINY, Tiny, tiny, SMALL, Small, small, normal, Normal, large, Large, LARGE,
          huge, Huge, HUGE, gigantic, Gigantic, GIGANTIC
 color  : any named color, default is black

They are usually merged against the current font, attempting to mimic the
sometimes counter-intuitive way that TeX does it, particularly for math.

=head1 Methods

=over 4

=item C<< $font->specialize($string); >>

In math mode, this method computes a font reflecting how the specific C<$string>
would be printed when C<$font> is active. It attempts to handle the curious
ways that lowercase Greek often doesn't get a different font.
In particular, it recognizes the following classes of strings: single Latin letter,
single uppercase Greek character, single lowercase Greek character, digits, and others.

=back

=head1 AUTHOR

Bruce Miller

=head1 COPYRIGHT

Public domain software, produced as part of work done by the
United States Government & not subject to copyright in the US.
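
=head1 EXAMPLE

As noted in the description, font components are usually merged against the
current font. The following is a minimal sketch of that merging, not an
authoritative recipe. It assumes that LaTeXML 0.8.1 is installed and that
C<LaTeXML::Common::Font> can be loaded on its own; it builds the base font with
the internal constructor C<new_internal> purely for illustration, and the
component values are arbitrary.

    use strict;
    use warnings;
    use LaTeXML::Common::Font;

    # A base font with family, series, shape and size given positionally
    # (values chosen arbitrarily for this illustration).
    my $base = LaTeXML::Common::Font->new_internal('serif', 'medium', 'upright', 10);

    # merge keeps every component that is not explicitly overridden.
    my $bold = $base->merge(series => 'bold');
    print $bold->toString, "\n";    # Font[serif,bold,upright,10,*,*,*,*,*,*,*]

    # match only compares components that are defined in both fonts.
    my $partial = LaTeXML::Common::Font->new_internal('serif');
    print(($bold->match($partial) ? 'match' : 'no match'), "\n");    # match
    print(($bold->match($base)    ? 'match' : 'no match'), "\n");    # no match

Undefined components are printed as C<*> by C<toString>, which is why the merged
font shows seven trailing stars: only the first four components were ever set.
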
=cut

latexml-0.8.1/lib/LaTeXML/Common/Glue.pm

# /=====================================================================\ #
# | LaTeXML::Common::Glue                                               | #
# | Representation of Stretchy dimensions                               | #
# |=====================================================================| #
# | Part of LaTeXML:                                                    | #
# |  Public domain software, produced as part of work done by the       | #
# |  United States Government & not subject to copyright in the US.     | #
# |---------------------------------------------------------------------| #
# | Bruce Miller                                                #_#     | #
# | http://dlmf.nist.gov/LaTeXML/                              (o o)    | #
# \=========================================================ooo==U==ooo=/ #
package LaTeXML::Common::Glue;
use LaTeXML::Global;
use strict;
use warnings;
use base qw(LaTeXML::Common::Dimension);
use base qw(Exporter);
our @EXPORT = (qw(&Glue));

#======================================================================
# Exported constructor.
sub Glue { my ($scaledpoints, $plus, $pfill, $minus, $mfill) = @_; return LaTeXML::Common::Glue->new($scaledpoints, $plus, $pfill, $minus, $mfill); } #====================================================================== my %fillcode = (fil => 1, fill => 2, filll => 3); # [CONSTANT] my @FILL = ('', 'fil', 'fill', 'filll'); # [CONSTANT] my $num_re = qr/\d*\.?\d*/; # [CONSTANT] my $unit_re = qr/[a-zA-Z][a-zA-Z]/; # [CONSTANT] my $fill_re = qr/fil|fill|filll|[a-zA-Z][a-zA-Z]/; # [CONSTANT] my $plus_re = qr/\s+plus\s*($num_re)($fill_re)/; # [CONSTANT] my $minus_re = qr/\s+minus\s*($num_re)($fill_re)/; # [CONSTANT] our $GLUE_re = qr/(\+?\-?$num_re)($unit_re)($plus_re)?($minus_re)?/; # [CONSTANT] sub new { my ($class, $sp, $plus, $pfill, $minus, $mfill) = @_; if ((!defined $plus) && (!defined $pfill) && (!defined $minus) && (!defined $mfill)) { if ($sp =~ /^(\d*\.?\d*)$/) { } elsif ($sp =~ /^$GLUE_re$/) { my ($f, $u, $p, $pu, $m, $mu) = ($1, $2, $4, $5, $7, $8); $sp = $f * $STATE->convertUnit($u); if (!$pu) { } elsif ($fillcode{$pu}) { $plus = $p; $pfill = $pu; } else { $plus = $p * $STATE->convertUnit($pu); $pfill = 0; } if (!$mu) { } elsif ($fillcode{$mu}) { $minus = $m; $mfill = $mu; } else { $minus = $m * $STATE->convertUnit($mu); $mfill = 0; } } } return bless [$sp || "0", $plus || "0", $pfill || 0, $minus || "0", $mfill || 0], $class; } #sub getStretch { $_[0]->[1]; } #sub getShrink { $_[0]->[2]; } sub toString { my ($self) = @_; my ($sp, $plus, $pfill, $minus, $mfill) = @$self; my $string = LaTeXML::Common::Dimension::pointformat($sp); $string .= ' plus ' . ($pfill ? $plus . $FILL[$pfill] : LaTeXML::Common::Dimension::pointformat($plus)) if $plus != 0; $string .= ' minus ' . ($mfill ? $minus . $FILL[$mfill] : LaTeXML::Common::Dimension::pointformat($minus)) if $minus != 0; return $string; } sub toAttribute { my ($self) = @_; my ($sp, $plus, $pfill, $minus, $mfill) = @$self; my $string = LaTeXML::Common::Dimension::attributeformat($sp); $string .= ' plus ' . 
($pfill ? $plus . $FILL[$pfill] : LaTeXML::Common::Dimension::attributeformat($plus)) if $plus != 0; $string .= ' minus ' . ($mfill ? $minus . $FILL[$mfill] : LaTeXML::Common::Dimension::attributeformat($minus)) if $minus != 0; return $string; } sub negate { my ($self) = @_; my ($pts, $p, $pf, $m, $mf) = @$self; return (ref $self)->new(-$pts, -$p, $pf, -$m, $mf); } sub add { my ($self, $other) = @_; my ($pts, $p, $pf, $m, $mf) = @$self; if (ref $other eq 'LaTeXML::Common::Glue') { my ($pts2, $p2, $pf2, $m2, $mf2) = @$other; $pts += $pts2; if ($pf == $pf2) { $p += $p2; } elsif ($pf < $pf2) { $p = $p2; $pf = $pf2; } if ($mf == $mf2) { $m += $m2; } elsif ($mf < $mf2) { $m = $m2; $mf = $mf2; } return (ref $self)->new($pts, $p, $pf, $m, $mf); } else { return (ref $self)->new($pts + $other->valueOf, $p, $pf, $m, $mf); } } sub multiply { my ($self, $other) = @_; my ($pts, $p, $pf, $m, $mf) = @$self; $other = $other->valueOf if ref $other; return (ref $self)->new($pts * $other, $p * $other, $pf, $m * $other, $mf); } sub stringify { my ($self) = @_; return "Glue[" . join(',', @$self) . "]"; } #====================================================================== 1; __END__ =pod =head1 NAME C - representation of glue, skips, stretchy dimensions; extends L. =head2 Exported functions =over 4 =item C<< $glue = Glue($gluespec); >> =item C<< $glue = Glue($sp,$plus,$pfill,$minus,$mfill); >> Creates a Glue object. C<$gluespec> can be a string in the form that TeX recognizes (number units optional plus and minus parts). Alternatively, the dimension, plus and minus parts can be given separately: C<$pfill> and C<$mfill> are 0 (when the C<$plus> or C<$minus> part is in sp) or 1,2,3 for fil, fill or filll. =back =head1 AUTHOR Bruce Miller =head1 COPYRIGHT Public domain software, produced as part of work done by the United States Government & not subject to copyright in the US. 
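
=head1 EXAMPLE

The five-argument form of C<Glue> described above can be sketched as follows.
This is an illustration under stated assumptions rather than a definitive
usage: it assumes that LaTeXML 0.8.1 is installed and that
C<LaTeXML::Common::Glue> can be loaded on its own, and the variable names and
sizes are made up for the example. The expected output was derived by reading
the C<add>, C<multiply> and C<stringify> methods of this module.

    use strict;
    use warnings;
    use LaTeXML::Common::Glue;

    my $SP_PER_PT = 65536;    # TeX's scaled points per point

    # 10pt plus 1fil minus 2pt, given as separate parts (fill code 1 = fil).
    my $g1 = Glue(10 * $SP_PER_PT, 1, 1, 2 * $SP_PER_PT, 0);
    # 1pt plus 3pt: finite stretch (fill code 0) and no shrink.
    my $g2 = Glue(1 * $SP_PER_PT, 3 * $SP_PER_PT, 0, 0, 0);

    print $g1->stringify, "\n";                 # Glue[655360,1,1,131072,0]
    # On addition the higher-order stretch (fil) wins over the finite one.
    print $g1->add($g2)->stringify, "\n";       # Glue[720896,1,1,131072,0]
    # Multiplication scales the natural size, stretch and shrink alike.
    print $g1->multiply(2)->stringify, "\n";    # Glue[1310720,2,1,262144,0]

The string form, such as C<Glue("10pt plus 1fil")>, is left out of the sketch
on purpose, because converting units consults the global C<$STATE> and so
needs a running LaTeXML session.
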
=cut

latexml-0.8.1/lib/LaTeXML/Common/Model.pm

# /=====================================================================\ #
# | LaTeXML::Common::Model                                              | #
# | Stores representation of Document Type for use by Document          | #
# |=====================================================================| #
# | Part of LaTeXML:                                                    | #
# |  Public domain software, produced as part of work done by the       | #
# |  United States Government & not subject to copyright in the US.
| # # |---------------------------------------------------------------------| # # | Bruce Miller #_# | # # | http://dlmf.nist.gov/LaTeXML/ (o o) | # # \=========================================================ooo==U==ooo=/ # package LaTeXML::Common::Model; use strict; use warnings; use LaTeXML::Global; use LaTeXML::Common::Object; use LaTeXML::Common::Error; use LaTeXML::Common::Font; use LaTeXML::Common::XML; use LaTeXML::Util::Pathname; use base qw(LaTeXML::Common::Object); #********************************************************************** my $LTX_NAMESPACE = "http://dlmf.nist.gov/LaTeXML"; # [CONSTANT] sub new { my ($class, %options) = @_; my $self = bless { xpath => LaTeXML::Common::XML::XPath->new(), code_namespace_prefixes => {}, code_namespaces => {}, doctype_namespaces => {}, namespace_errors => 0, %options }, $class; $$self{xpath}->registerFunction('match-font', \&LaTeXML::Common::Font::match_font); $self->registerNamespace('xml', "http://www.w3.org/XML/1998/namespace"); return $self; } sub setDocType { my ($self, $roottag, $publicid, $systemid) = @_; $$self{schemadata} = ['DTD', $roottag, $publicid, $systemid]; return; } sub setRelaxNGSchema { my ($self, $schema) = @_; $$self{schemadata} = ['RelaxNG', $schema]; return; } sub loadSchema { my ($self) = @_; return $$self{schema} if $$self{schema_loaded}; my $name; if (!$$self{schemadata}) { Warn('expected', '', undef, "No Schema Model has been declared; assuming LaTeXML"); # article ??? or what ? undef gives problems! $self->setRelaxNGSchema("LaTeXML"); $self->registerNamespace(ltx => $LTX_NAMESPACE); $self->registerNamespace(svg => "http://www.w3.org/2000/svg"); $self->registerNamespace(xlink => "http://www.w3.org/1999/xlink"); # Needed for SVG $self->registerNamespace(m => "http://www.w3.org/1998/Math/MathML"); $self->registerNamespace(xhtml => "http://www.w3.org/1999/xhtml"); $$self{permissive} = 1; } # Actually, they could have declared all sorts of Tags.... 
my ($type, @data) = @{ $$self{schemadata} }; if ($type eq 'DTD') { my ($roottag, $publicid, $systemid) = @data; require LaTeXML::Common::Model::DTD; $name = $systemid; $$self{schema} = LaTeXML::Common::Model::DTD->new($self, $roottag, $publicid, $systemid); } elsif ($type eq 'RelaxNG') { ($name) = @data; require LaTeXML::Common::Model::RelaxNG; $$self{schema} = LaTeXML::Common::Model::RelaxNG->new($self, $name); } if (my $compiled = !$$self{no_compiled} && pathname_find($name, paths => $STATE->lookupValue('SEARCHPATHS'), types => ['model'], installation_subdir => "resources/$type")) { $self->loadCompiledSchema($compiled); } else { $$self{schema}->loadSchema; } $self->describeModel if $LaTeXML::Common::Model::DEBUG; $$self{schema_loaded} = 1; return $$self{schema}; } sub addSchemaDeclaration { my ($self, $document, $tag) = @_; $$self{schema}->addSchemaDeclaration($document, $tag); return; } #===================================================================== # Make provision to precompile the schema. sub compileSchema { my ($self) = @_; $$self{no_compiled} = 1; $self->loadSchema; foreach my $prefix (sort keys %{ $$self{document_namespaces} }) { print $prefix. '=' . $$self{document_namespaces}{$prefix} . "\n"; } if (my $defs = $$self{schemaclass}) { foreach my $classname (sort keys %$defs) { print $classname. ':=(' . join(',', sort keys %{ $$self{schemaclass}{$classname} }) . ')' . "\n"; } } foreach my $tag (sort keys %{ $$self{tagprop} }) { print $tag . '{' . join(',', sort keys %{ $$self{tagprop}{$tag}{attributes} }) . '}' . '(' . join(',', sort keys %{ $$self{tagprop}{$tag}{model} }) . ')' . 
"\n"; } return; } sub loadCompiledSchema { my ($self, $file) = @_; NoteBegin("Loading compiled schema $file"); my $MODEL; open($MODEL, '<', $file) or Fatal('I/O', $file, undef, "Cannot open Compiled Model $file for reading", $!); my $line; while ($line = <$MODEL>) { if ($line =~ /^([^\{]+)\{(.*?)\}\((.*?)\)$/) { my ($tag, $attr, $children) = ($1, $2, $3); $self->addTagAttribute($tag, split(/,/, $attr)); $self->addTagContent($tag, split(/,/, $children)); } elsif ($line =~ /^([^:=]+):=(.*?)$/) { my ($classname, $elements) = ($1, $2); $self->setSchemaClass($classname, { map { ($_ => 1) } split(/,/, $elements) }); } elsif ($line =~ /^([^=]+)=(.*?)$/) { my ($prefix, $namespace) = ($1, $2); $self->registerDocumentNamespace($prefix, $namespace); } else { Fatal('internal', $file, undef, "Compiled model '$file' is malformatted at \"$line\""); } } close($MODEL); NoteEnd("Loading compiled schema $file"); return; } #********************************************************************** # Namespaces #********************************************************************** # There are TWO namespace mappings!!! # One for coding, one for the DocType. # # Coding: this namespace mapping associates prefixes to namespace URIs for # use in the latexml code, constructors and such. # This must be a one to one mapping and there are no default namespaces. # Document: this namespace mapping associates prefixes to namespace URIs # as used in the generated document, and will be the # set of prefixes used in the generated output. # This mapping may also use a prefix of "#default" which is for # the unprefixed form of elements (not used for attributes!) 
sub registerNamespace { my ($self, $codeprefix, $namespace) = @_; if ($namespace) { $$self{code_namespace_prefixes}{$namespace} = $codeprefix; $$self{code_namespaces}{$codeprefix} = $namespace; $$self{xpath}->registerNS($codeprefix, $namespace); } else { my $prev = $$self{code_namespaces}{$codeprefix}; delete $$self{code_namespace_prefixes}{$prev} if $prev; delete $$self{code_namespaces}{$codeprefix}; } return; } # In the following: # $forattribute is 1 if the namespace is for an attribute (in which case, there must be a non-empty prefix) # $probe, if non 0, just test for namespace, without creating an entry if missing. # Get the (code) prefix associated with $namespace, # creating a dummy prefix and signalling an error if none has been registered. sub getNamespacePrefix { my ($self, $namespace, $forattribute, $probe) = @_; if ($namespace) { my $codeprefix = $$self{code_namespace_prefixes}{$namespace}; if ((!defined $codeprefix) && !$probe) { my $docprefix = $$self{document_namespace_prefixes}{$namespace}; # if there's a doc prefix and it's NOT already used in code namespace mapping if ($docprefix && !$$self{code_namespaces}{$docprefix}) { $codeprefix = $docprefix; } else { # Else synthesize one $codeprefix = "namespace" . (++$$self{namespace_errors}); } $self->registerNamespace($codeprefix, $namespace); Warn('malformed', $namespace, undef, "No prefix has been registered for namespace '$namespace' (in code)", "Using '$codeprefix' instead"); } return $codeprefix; } } sub getNamespace { my ($self, $codeprefix, $probe) = @_; my $ns = $$self{code_namespaces}{$codeprefix}; if ((!defined $ns) && !$probe) { $self->registerNamespace($codeprefix, $ns = "http://example.com/namespace" . 
        (++$$self{namespace_errors}));
    Error('malformed', $codeprefix, undef,
      "No namespace has been registered for prefix '$codeprefix' (in code)",
      "Using '$ns' instead"); }
  return $ns; }

sub registerDocumentNamespace {
  my ($self, $docprefix, $namespace) = @_;
  $docprefix = '#default' unless defined $docprefix;
  if ($namespace) {
    # Since the default namespace url can still ALSO have a prefix associated,
    # we prepend "DEFAULT#url" when using as a hash key in the prefixes table.
    my $regnamespace = ($docprefix eq '#default' ? "DEFAULT#" . $namespace : $namespace);
    $$self{document_namespace_prefixes}{$regnamespace} = $docprefix;
    $$self{document_namespaces}{$docprefix}            = $namespace; }
  else {
    my $prev = $$self{document_namespaces}{$docprefix};
    delete $$self{document_namespace_prefixes}{$prev} if $prev;
    delete $$self{document_namespaces}{$docprefix}; }
  return; }

sub getDocumentNamespacePrefix {
  my ($self, $namespace, $forattribute, $probe) = @_;
  if ($namespace) {
    # Get the prefix associated with the namespace url, noting that for elements, it might be "#default",
    # but for attributes would never be.
    my $docprefix = (!$forattribute && $$self{document_namespace_prefixes}{ "DEFAULT#" . $namespace })
      || $$self{document_namespace_prefixes}{$namespace};
    if ((!defined $docprefix) && !$probe) {
      $self->registerDocumentNamespace($docprefix = "namespace" . (++$$self{namespace_errors}), $namespace);
      Warn('malformed', $namespace, undef,
        "No prefix has been registered for namespace '$namespace' (in document)",
        "Using '$docprefix' instead"); }
    return (($docprefix || '#default') eq '#default' ? '' : $docprefix); } }

sub getDocumentNamespace {
  my ($self, $docprefix, $probe) = @_;
  $docprefix = '#default' unless defined $docprefix;
  my $ns = $$self{document_namespaces}{$docprefix};
  $ns =~ s/^DEFAULT#// if $ns;    # Remove the default hack, if present!
  if (($docprefix ne '#default') && (!defined $ns) && !$probe) {
    $self->registerDocumentNamespace($docprefix,
      $ns = "http://example.com/namespace" .
(++$$self{namespace_errors})); Error('malformed', $docprefix, undef, "No namespace has been registered for prefix '$docprefix' (in document)", "Using '$ns' instead"); } return $ns; } # Given a Qualified name, possibly prefixed with a namespace prefix, # as defined by the code namespace mapping, # return the NamespaceURI and localname. sub decodeQName { my ($self, $codetag) = @_; if ($codetag =~ /^([^:]+):(.+)$/) { my ($prefix, $localname) = ($1, $2); return (undef, $codetag) if $prefix eq 'xml'; return ($self->getNamespace($prefix), $localname); } else { return (undef, $codetag); } } sub encodeQName { my ($self, $ns, $name) = @_; my $codeprefix = $ns && $self->getNamespacePrefix($ns); return ($codeprefix ? "$codeprefix:$name" : $name); } # Get the node's qualified name in standard form # Ie. using the registered (code) prefix for that namespace. # NOTE: Reconsider how _Capture_ & _WildCard_ should be integrated!?! sub getNodeQName { my ($self, $node) = @_; my $type = $node->nodeType; if ($type == XML_TEXT_NODE) { return '#PCDATA'; } elsif ($type == XML_DOCUMENT_NODE) { return '#Document'; } elsif ($type == XML_COMMENT_NODE) { return '#Comment'; } elsif ($type == XML_PI_NODE) { return '#ProcessingInstruction'; } elsif ($type == XML_DTD_NODE) { return '#DTD'; } # Need others? elsif (($type != XML_ELEMENT_NODE) && ($type != XML_ATTRIBUTE_NODE)) { Fatal('misdefined', '', undef, "Should not ask for Qualified Name for node of type $type: " . Stringify($node)); return; } elsif (my $ns = $node->namespaceURI) { return $self->getNamespacePrefix($ns) . ":" . 
$node->localname; } else { return $node->localname; } } # Given a Document QName, convert to "code" form # Used to convert a possibly prefixed name from the DTD # (using the DTD's prefixes) # into a prefixed name using the Code's prefixes # NOTE: Used only for DTD sub recodeDocumentQName { my ($self, $docQName) = @_; my ($docprefix, $name) = (undef, $docQName); if ($docQName =~ /^(#PCDATA|#Comment|ANY|#ProcessingInstruction|#Document)$/) { return $docQName; } else { ($docprefix, $name) = ($1, $2) if $docQName =~ /^([^:]+):(.+)/; return $self->encodeQName($self->getDocumentNamespace($docprefix), $name); } } # Get an XPath context that knows about our namespace mappings. sub getXPath { my ($self) = @_; return $$self{xpath}; } #********************************************************************** # Accessors #********************************************************************** sub getTags { my ($self) = @_; return keys %{ $$self{tagprop} }; } sub getTagContents { my ($self, $tag) = @_; my $h = $$self{tagprop}{$tag}{model}; return $h ? keys %$h : (); } sub addTagContent { my ($self, $tag, @elements) = @_; $$self{tagprop}{$tag}{model} = {} unless $$self{tagprop}{$tag}{model}; map { $$self{tagprop}{$tag}{model}{$_} = 1 } @elements; return; } sub getTagAttributes { my ($self, $tag) = @_; my $h = $$self{tagprop}{$tag}{attributes}; return $h ? keys %$h : (); } sub addTagAttribute { my ($self, $tag, @attributes) = @_; $$self{tagprop}{$tag}{attributes} = {} unless $$self{tagprop}{$tag}{attributes}; map { $$self{tagprop}{$tag}{attributes}{$_} = 1 } @attributes; return; } sub setSchemaClass { my ($self, $classname, $content) = @_; $$self{schemaclass}{$classname} = $content; return; } #********************************************************************** # Document Structure Queries #********************************************************************** # NOTE: These are public, but perhaps should be passed # to submodel, in case it can evolve to more precision? 
# However, it would need more context to do that. # Can an element with (qualified name) $tag contain a $childtag element? sub canContain { my ($self, $tag, $childtag) = @_; $self->loadSchema unless $$self{schema_loaded}; # Handle obvious cases explicitly. return 0 if $tag eq '#PCDATA'; return 0 if $tag eq '#Comment'; return 1 if $tag =~ /(.*?:)?_Capture_$/; # with or without namespace prefix return 1 if $tag eq '_WildCard_'; return 1 if $childtag =~ /(.*?:)?_Capture_$/; return 1 if $childtag eq '_WildCard_'; return 1 if $childtag eq '#Comment'; return 1 if $childtag eq '#ProcessingInstruction'; return 1 if $childtag eq '#DTD'; # return 1 if $$self{permissive}; # No DTD? Punt! return 1 if $$self{permissive} && ($tag eq '#Document') && ($childtag ne '#PCDATA'); # No DTD? Punt! # Else query tag properties. my $model = $$self{tagprop}{$tag}{model}; return $$model{ANY} || $$model{$childtag}; } sub canHaveAttribute { my ($self, $tag, $attrib) = @_; $self->loadSchema unless $$self{schema_loaded}; return 0 if $tag eq '#PCDATA'; return 0 if $tag eq '#Comment'; return 0 if $tag eq '#Document'; return 0 if $tag eq '#ProcessingInstruction'; return 0 if $tag eq '#DTD'; return 1 if $tag =~ /(.*?:)?_Capture_$/; return 1 if $$self{permissive}; return $$self{tagprop}{$tag}{attributes}{$attrib}; } sub isInSchemaClass { my ($self, $classname, $tag) = @_; $tag = $self->getNodeQName($tag) if ref $tag; # In case tag is a node. my $class = $$self{schemaclass}{$classname}; return $class && $$class{$tag}; } #********************************************************************** sub describeModel { my ($self) = @_; print STDERR "Doctype\n"; foreach my $tag (sort keys %{ $$self{tagprop} }) { if (my $model = $$self{tagprop}{$tag}{model}) { if (keys %$model) { print STDERR "$tag can contain " . join(', ', sort keys %{ $$self{tagprop}{$tag}{model} }) . 
"\n"; } } else { print STDERR "$tag is empty\n"; } } return; } #********************************************************************** 1; __END__ =pod =head1 NAME C - represents the Document Model =head1 DESCRIPTION C encapsulates information about the document model to be used in converting a digested document into XML by the L. This information is based on the document schema (eg, DTD, RelaxNG), but is also modified by package modules; thus the model may not be complete until digestion is completed. The kinds of information that is relevant is not only the content model (what each element can contain contain), but also SGML-like information such as whether an element can be implicitly opened or closed, if needed to insert a new element into the document. Currently, only an approximation to the schema is understood and used. For example, we only record that certain elements can appear within another; we don't preserve any information about required order or number of instances. It extends L. =head2 Model Creation =over 4 =item C<< $model = LaTeXML::Common::Model->new(%options); >> Creates a new model. The only useful option is C<< permissive=>1 >> which ignores any DTD and allows the document to be built without following any particular content model. =back =head2 Document Type =over 4 =item C<< $model->setDocType($rootname,$publicid,$systemid,%namespaces); >> Declares the expected rootelement, the public and system ID's of the document type to be used in the final document. The hash C<%namespaces> specifies the namespace prefixes that are expected to be found in the DTD, along with the associated namespace URI. These prefixes may be different from the prefixes used in implementation code (eg. in ltxml files; see RegisterNamespace). The generated document will use the namespaces and prefixes defined here. =back =head2 Namespaces Note that there are I namespace mappings between namespace URIs and prefixes that are relevant to L. 
The `code' mapping is the one used in code implementing packages, and in
particular, constructors defined within those packages. The prefix C<ltx>
is used consistently to refer to LaTeXML's own namespace
(C<http://dlmf.nist.gov/LaTeXML>).

The other mapping, the `document' mapping, is used in the created document;
this may be different from the `code' mapping in order to accommodate
DTDs, for example, or for use by other applications that expect
a rigid namespace mapping.

=over 4

=item C<< $model->registerNamespace($prefix,$namespace_url); >>

Register C<$prefix> to stand for the namespace C<$namespace_url>.
This prefix can then be used to create nodes in constructors and Document methods.
It will also be recognized in XPath expressions.

=item C<< $model->getNamespacePrefix($namespace,$forattribute,$probe); >>

Return the prefix to use for the given C<$namespace>.
If C<$forattribute> is nonzero, then it looks up the prefix as appropriate for attributes.
If C<$probe> is nonzero, it only probes for the prefix, without creating a missing entry.

=item C<< $model->getNamespace($prefix,$probe); >>

Return the namespace URL for the given C<$prefix>.

=back

=head2 Model queries

=over 2

=item C<< $boole = $model->canContain($tag,$childtag); >>

Returns whether an element with qualified name C<$tag> can contain an element
with qualified name C<$childtag>.
The tag names #PCDATA, #Document, #Comment and #ProcessingInstruction
are specially recognized.

=item C<< $boole = $model->canHaveAttribute($tag,$attribute); >>

Returns whether an element with qualified name C<$tag> is allowed to have an attribute
with the given name.

=back

=head1 SEE ALSO

L, L.

=head1 AUTHOR

Bruce Miller

=head1 COPYRIGHT

Public domain software, produced as part of work done by the
United States Government & not subject to copyright in the US.
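
=head1 EXAMPLE

The model queries documented above can be illustrated with a small standalone
sketch. It does not load LaTeXML: C<TinyModel> is a hypothetical stand-in that
only mirrors how allowed children are stored (a hash of child names per tag)
and a few of the special cases in C<canContain>. The element names are
illustrative assumptions, not taken from any particular schema.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real model class; illustration only.
package TinyModel;

sub new {
  my ($class, %options) = @_;
  my $self = { tagprop => {}, permissive => $options{permissive} };
  return bless $self, $class; }

# Record that $tag may contain each of @elements (order and counts are ignored).
sub addTagContent {
  my ($self, $tag, @elements) = @_;
  $$self{tagprop}{$tag}{model}{$_} = 1 foreach @elements;
  return; }

# Simplified query: text and comments contain nothing, comments and
# processing instructions are allowed anywhere, else consult the table.
sub canContain {
  my ($self, $tag, $childtag) = @_;
  return 0 if $tag eq '#PCDATA';
  return 0 if $tag eq '#Comment';
  return 1 if $childtag eq '#Comment';
  return 1 if $childtag eq '#ProcessingInstruction';
  my $model = $$self{tagprop}{$tag}{model} || {};
  return ($$model{ANY} || $$model{$childtag}) ? 1 : 0; }

package main;

my $model = TinyModel->new();
$model->addTagContent('#Document',    'ltx:document');
$model->addTagContent('ltx:document', 'ltx:para');
$model->addTagContent('ltx:para',     'ltx:p');
$model->addTagContent('ltx:p',        '#PCDATA');

foreach my $pair (['ltx:document', 'ltx:para'], ['ltx:para', '#PCDATA'],
  ['ltx:p', '#PCDATA'], ['ltx:p', '#Comment']) {
  my ($tag, $child) = @$pair;
  printf "%-13s %s contain %s\n", $tag,
    ($model->canContain($tag, $child) ? 'can' : 'cannot'), $child; }
```

Because only membership is recorded, such a table can answer "may this child
appear here?" but not "how many times?" or "in what order?", which matches the
approximation described in the DESCRIPTION.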
=cut

latexml-0.8.1/lib/LaTeXML/Common/Model/DTD.pm

# /=====================================================================\ #
# |  LaTeXML::Common::Model::DTD                                        | #
# | Extract Model information from a DTD                                | #
# |=====================================================================| #
# | Part of LaTeXML:                                                    | #
# |  Public domain software, produced as part of work done by the       | #
# |  United States Government & not subject to copyright in the US.     | #
# |---------------------------------------------------------------------| #
# | Bruce Miller                                                #_#     | #
# | http://dlmf.nist.gov/LaTeXML/                              (o o)    | #
# \=========================================================ooo==U==ooo=/ #
package LaTeXML::Common::Model::DTD;
use strict;
use warnings;
use LaTeXML::Util::Pathname;
use LaTeXML::Global;
use LaTeXML::Common::Error;
use LaTeXML::Common::XML;

#**********************************************************************
# NOTE: Arglist is DTD specific.
# Effectively asks for DTD submodel.
sub new {
  my ($class, $model, $roottag, $publicid, $systemid) = @_;
  my $self = { model => $model, roottag => $roottag, public_id => $publicid, system_id => $systemid };
  bless $self, $class;
  return $self; }

# Question: if we don't have a doctype, can we rig the queries to
# let it build a `reasonable' document?

# This is responsible for setting any DocType, and adding any
# required namespace declarations to the root element.
sub addSchemaDeclaration {
  my ($self, $document, $tag) = @_;
  $document->getDocument->createInternalSubset($tag, $$self{public_id}, $$self{system_id});
  return; }

#**********************************************************************
# DTD Analysis
#**********************************************************************
# Uses XML::LibXML to read in the DTD. Then extracts a simplified
# model: which elements can appear within each element, ignoring
# (for now) the ordering, repeat, etc, of the elements.
# From this, and the Tag declarations of autoOpen (that an
# element can be opened automatically, if needed) we derive an implicit model.
# Thus, if we want to insert an element (or, say #PCDATA) into an
# element that doesn't allow it, we may find an implied element
# to create & insert, and insert the #PCDATA into it.

my $NAME_re = qr/[a-zA-Z0-9\-\_\:]+/;    # [CONSTANT]

sub loadSchema {
  my ($self) = @_;
  $$self{schema_loaded} = 1;
  # Parenthesized so that the system ID is used when there is no public ID.
  NoteBegin("Loading DTD " . ($$self{public_id} || $$self{system_id}));
  my $model = $$self{model};
  $model->addTagContent('#Document', $$self{roottag}) if $$self{roottag};
  # Parse the DTD
  my $dtd = $self->readDTD;
  return unless $dtd;

  NoteBegin("Analyzing DTD");
  # Extract all possible namespace attributes
  foreach my $node ($dtd->childNodes()) {
    if ($node->nodeType() == XML_ATTRIBUTE_DECL) {
      # Matches an attribute declaration: element name, attribute name, remainder.
      if ($node->toString =~ /^<!ATTLIST\s+($NAME_re)\s+($NAME_re)\s+(.*)>$/) {
        my ($tag, $attr, $extra) = ($1, $2, $3);
        if ($attr =~ /^xmlns(:($NAME_re))?$/) {
          my $prefix = ($1 ? $2 : '#default');
          my $ns;
          if ($extra =~ /^CDATA\s+#FIXED\s+(\'|\")(.*)\1\s*$/) {
            $ns = $2; }
          # Just record prefix, not element??
          $model->registerDocumentNamespace($prefix, $ns); } } } }
  # Extract all possible children for each tag.
  foreach my $node ($dtd->childNodes()) {
    if ($node->nodeType() == XML_ELEMENT_DECL) {
      my $decl = $node->toString();
      chomp($decl);
      # Matches an element declaration: element name, content specification.
      if ($decl =~ /^<!ELEMENT\s+($NAME_re)\s+(.*)>$/) {
        my ($tag, $content) = ($1, $2);
        # Drop the occurrence and grouping operators, leaving only the names.
        $content =~ s/[\+\*\?\,\(\)\|]/ /g;
        $content =~ s/\s+/ /g;
        $content =~ s/^\s+//;
        $content =~ s/\s+$//;
        $model->addTagContent($model->recodeDocumentQName($tag),
          ($content eq 'EMPTY'
            ? ()
            : map { $model->recodeDocumentQName($_) } split(/ /, $content))); }
      else {
        Warn('misdefined', $decl, undef, "Can't process DTD declaration '$decl'"); } }
    elsif ($node->nodeType() == XML_ATTRIBUTE_DECL) {
      if ($node->toString =~ /^<!ATTLIST\s+($NAME_re)\s+($NAME_re)\s+(.*)>$/) {
        my ($tag, $attr, $extra) = ($1, $2, $3);
        if ($attr !~ /^xmlns/) {
          $model->addTagAttribute($model->recodeDocumentQName($tag),
            ($attr =~ /:/ ? $model->recodeDocumentQName($attr) : $attr)); } } } }
  NoteEnd("Analyzing DTD");    # Done analyzing
  NoteEnd("Loading DTD " . ($$self{public_id} || $$self{system_id}));
  return; }

sub readDTD {
  my ($self) = @_;
  LaTeXML::Common::XML::initialize_catalogs();
  NoteBegin("Loading DTD for $$self{public_id} $$self{system_id}");
  # NOTE: setting XML_DEBUG_CATALOG makes this Fail!!!
  my $dtd = XML::LibXML::Dtd->new($$self{public_id}, $$self{system_id});
  if ($dtd) {
    NoteProgress(" via catalog "); }
  else {    # Couldn't find dtd in catalog, try finding the file. (search path?)
    my $dtdfile = pathname_find($$self{system_id},
      paths               => $STATE->lookupValue('SEARCHPATHS'),
      installation_subdir => 'resources/DTD');
    if ($dtdfile) {
      NoteProgress(" from $dtdfile ");
      $dtd = XML::LibXML::Dtd->new($$self{public_id}, $dtdfile);
      NoteProgress(" from $dtdfile ") if $dtd;
      Error('misdefined', $$self{system_id}, undef,
        "Parsing of DTD \"$$self{public_id}\" \"$$self{system_id}\" failed")
        unless $dtd; }
    else {
      Error('missing_file', $$self{system_id}, undef,
        "Can't find DTD \"$$self{public_id}\" \"$$self{system_id}\""); } }
  return $dtd; }

#======================================================================
1;

__END__

=head1 NAME

C<LaTeXML::Common::Model::DTD> - represents DTD document models;
extends L.

=head1 AUTHOR

Bruce Miller

=head1 COPYRIGHT

Public domain software, produced as part of work done by the
United States Government & not subject to copyright in the US.
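
=head1 EXAMPLE

The way C<loadSchema> reduces a DTD content specification to a plain list of
allowed child names can be tried in isolation. The sketch below reuses the same
substitutions on a literal string; the helper name C<flatten_content_spec> and
the sample content specifications are assumptions made for illustration, and no
DTD is actually parsed.

```perl
use strict;
use warnings;

# Hypothetical helper: strip the occurrence and grouping operators from a
# DTD content specification and return the names that remain.
sub flatten_content_spec {
  my ($content) = @_;
  $content =~ s/[\+\*\?\,\(\)\|]/ /g;    # operators become spaces
  $content =~ s/\s+/ /g;                 # collapse runs of whitespace
  $content =~ s/^\s+//;                  # trim leading whitespace
  $content =~ s/\s+$//;                  # trim trailing whitespace
  return ($content eq 'EMPTY' ? () : split(/ /, $content)); }

foreach my $spec ('(title?, (para | figure)*)', '(#PCDATA | emph)*', 'EMPTY') {
  my @children = flatten_content_spec($spec);
  print "$spec => " . (@children ? join(' ', @children) : '(nothing)') . "\n"; }
```

The ordering and repetition information is discarded on purpose; a name that
occurs more than once in a specification would simply be returned more than
once, which is harmless when the result is stored as hash keys.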
=cut

latexml-0.8.1/lib/LaTeXML/Common/Model/RelaxNG.pm

# /=====================================================================\ #
# |  LaTeXML::Common::Model::RelaxNG                                    | #
# | Extract Model information from a RelaxNG schema                     | #
# |=====================================================================| #
# | Part of LaTeXML:                                                    | #
# |  Public domain software, produced as part of work done by the       | #
# |  United States Government & not subject to copyright in the US.     | #
# |---------------------------------------------------------------------| #
# | Bruce Miller                                                #_#     | #
# | http://dlmf.nist.gov/LaTeXML/                              (o o)    | #
# \=========================================================ooo==U==ooo=/ #
package LaTeXML::Common::Model::RelaxNG;
use strict;
use warnings;
use LaTeXML::Util::Pathname;
use LaTeXML::Common::XML;
use LaTeXML::Global;
use LaTeXML::Common::Object;
use LaTeXML::Common::Error;
use Scalar::Util qw(weaken);

my $XMLPARSER = LaTeXML::Common::XML::Parser->new();    # [CONSTANT]

# $schema->documentModules;

# NOTE: Pending problem;
# Once we've got multiple namespaces in the schema,
# we haven't provided a means to specify the namespace <=> prefix mapping!
# It may be that rnc supplies that info, however!
# Alternatively, we could extract a symbol from the url,
# but it needs a sanity/collision check!
# NOTE: When a schema is composed from various modules, # some elements may not be "reachable" and (perhaps) should be removed # Scan a RelaxNG schema into an internal summary of modules. sub new { my ($class, $model, $name) = @_; my $self = { name => $name, model => $model, modules => [], elementdefs => {}, defs => {}, elements => {}, internal_grammars => 0 }; weaken($$self{model}); # circular back ref; weaked so can be garbage collected. bless $self, $class; return $self; } sub addSchemaDeclaration { my ($self, $document, $tag) = @_; $document->insertPI('latexml', RelaxNGSchema => $$self{name}); return; } sub loadSchema { my ($self) = @_; NoteBegin("Loading RelaxNG $$self{name}"); # Scan the schema file(s), and extract the info my @schema = $self->scanExternal($$self{name}); if ($LaTeXML::Common::Model::RelaxNG::DEBUG) { print STDERR "========================\nRaw Schema\n"; map { showSchema($_) } @schema; } @schema = map { $self->simplify($_) } @schema; if ($LaTeXML::Common::Model::RelaxNG::DEBUG) { print STDERR "========================\nSimplified Schema\n"; map { showSchema($_) } @schema; print STDERR "========================\nElements\n"; foreach my $tag (sort keys %{ $$self{elements} }) { showSchema(['element', $tag, @{ $$self{elements}{$tag} }]); } print STDERR "========================\nModules\n"; foreach my $mod (@{ $$self{modules} }) { showSchema($mod); } } # The resulting @schema should contain the "start" of the grammar. my ($startcontent) = $self->extractContent('#Document', @schema); $$self{model}->addTagContent('#Document', keys %$startcontent); if ($LaTeXML::Common::Model::RelaxNG::DEBUG) { print STDERR "========================\nStart\n" . join(', ', keys %$startcontent) . "\n"; } # NOTE: Do something automatic about this too!?! # We'll need to generate namespace prefixes for all namespaces found in the doc! 
$$self{model}->registerDocumentNamespace(undef, "http://dlmf.nist.gov/LaTeXML"); # Distill the info into allowed children & attributes for each element. foreach my $tag (sort keys %{ $$self{elements} }) { if ($tag eq 'ANY') { # Ignore any internal structure (side effect of restricted names) $$self{model}->addTagContent($tag, 'ANY'); next; } my @body = @{ $$self{elements}{$tag} }; my ($content, $attributes) = $self->extractContent($tag, @body); $$self{model}->addTagContent($tag, keys %$content); $$self{model}->addTagAttribute($tag, keys %$attributes); } # Extract definitions of symbols that define Schema Classes, too foreach my $symbol (sort keys %{ $$self{defs} }) { if ($symbol =~ /^grammar\d+:(.+?)\.class$/) { my $name = $1; my ($content, $attributes) = $self->extractContent($symbol, $$self{defs}{$symbol}); $$self{model}->setSchemaClass($name, $content); } } NoteEnd("Loading RelaxNG $$self{name}"); return; } # Return two hashrefs for content & attributes sub extractContent { my ($self, $tag, @body) = @_; my (%attr, %child); my @savebody = @body; while (@body) { my $item = shift(@body); if (ref $item eq 'ARRAY') { my ($op, $name, @args) = @$item; if ($op eq 'attribute') { $attr{$name} = 1; } elsif ($op eq 'elementref') { $child{$name} = 1; } elsif ($op eq 'doc') { } elsif ($op eq 'combination') { push(@body, @args); } elsif ($op eq 'grammar') { push(@body, $self->extractStart(@args)); } elsif ($op eq 'module') { push(@body, $self->extractStart(@args)); } elsif (($op eq 'ref') || ($op eq 'parentref')) { if (my $el = $$self{elementdefs}{$name}) { push(@body, ['elementref', $el]); } elsif (my $expansion = $$self{defs}{$name}) { push(@body, $expansion); } } elsif ($op eq 'element') { $child{$name} = 1; } # ??? 
      elsif (($op eq 'value') || ($op eq 'data')) {
        $child{'#PCDATA'} = 1; }
      else {
        print STDERR "Unknown child $op [$name] of element $tag in extractContent\n"; } }
    elsif ($item eq '#PCDATA') {
      $child{'#PCDATA'} = 1; } }
  return ({%child}, {%attr}); }

#======================================================================
# Internal Representation of a RelaxNG Schema
# This should build a usable intermediate structure
# WITHOUT side effects so that an (eventual) rnc parser
# can create the same intermediate!

# Intermediate structure is a list of
#     strings (representing the raw leaf data)
# and recursive items of the following form:
#    [$op, $name, @forms]
# where $op is one of:
#   ref        : defines or references a symbol
#   parentref  : references a symbol in parent's context [converted to ref by simplify]
#   def,defchoice,definterleave : defines $name to be @forms.
#                the last 2 combine w/existing values.
#   elementref : references an element [Added by expand]
#   grammar    : Collects a grammar's specifications; defined names are scoped (to $name),
#                and the start is the effective pattern.
#                (due to <grammar> or <externalRef>) [Replaced by its start by simplify]
#   override   : The @forms consist of a module, and the rest are replacement rules.
#                (due to <include>) [Reduced to 'module' by simplify]
#   element    : $name is the tag QName, @forms are the content/attribute patterns
#   attribute  : $name is the attribute's QName, @forms are the patterns for the value.
#   start      : the grammar's start pattern ($name is undef)
#   value      : a literal value (typically for attributes)
#   data       : a data type
#   doc        : An annotation ($name is undef), the @forms are documentation strings.
# combination: combines various other patterns in @forms, # $name is one of group, interleave, choice, optional, zeroOrMore, oneOrMore, list # module : collects the specifications coming from a separate schema file # for documentation purposes; $name is the name of the file # [Recored in the modules list and replaced by its content in simplify] # Tricky is getting the thing scanned and creating blocks # that should be separately documentable. # Each external schema (whether include or external)? #====================================================================== # SCAN: Walk through the XML Representation compiling information about # modules, definitions and elements. # # The representation built here has minimal processing done, # so that hopefully it will be feasable to generate the same structure # from a parsed RelaxNG Compact, without duplicating the processing. #====================================================================== my %RNGNSMAP = ( # [CONSTANT] "http://relaxng.org/ns/structure/1.0" => 'rng', "http://relaxng.org/ns/compatibility/annotations/1.0" => 'rnga'); local @LaTeXML::Common::Model::RelaxNG::PATHS = (); sub scanExternal { my ($self, $name, $inherit_ns) = @_; my $modname = $name; $modname =~ s/\.rn(g|c)$//; my $paths = [@LaTeXML::Common::Model::RelaxNG::PATHS, @{ $STATE->lookupValue('SEARCHPATHS') }]; if (my $schemadoc = LaTeXML::Common::XML::RelaxNG->new($name, searchpaths => $paths)) { my $uri = $schemadoc->URI; NoteBegin("Loading RelaxNG schema from $uri"); local @LaTeXML::Common::Model::RelaxNG::PATHS = (pathname_directory($schemadoc->URI), @LaTeXML::Common::Model::RelaxNG::PATHS); my $node = $schemadoc->documentElement; # Fetch any additional namespaces foreach my $ns ($node->getNamespaces) { my ($prefix, $nsuri) = ($ns->getLocalName, $ns->getData); next if $nsuri =~ m|^http://relaxng.org|; # Ignore RelaxNG namespaces(!!) 
$$self{model}->registerDocumentNamespace($prefix, $nsuri); } my $mod = (['module', $modname, $self->scanPattern($node, $inherit_ns)]); NoteEnd("Loading RelaxNG schema from $uri"); return $mod; } else { Warn('expected', $name, undef, "Failed to find RelaxNG schema for name '$name'"); return (); } } sub getRelaxOp { my ($node) = @_; return unless $node->nodeType == XML_ELEMENT_NODE; my $ns = $node->namespaceURI; my $prefix = $ns && $RNGNSMAP{$ns}; return ($prefix ? $prefix : "{$ns}") . ":" . $node->localname; } sub getElements { my ($node) = @_; return grep { $_->nodeType == XML_ELEMENT_NODE } $node->childNodes; } my $COMBINER_re = # [CONSTANT] qr/group|interleave|choice|optional|zeroOrMore|oneOrMore|list/; sub scanPattern { my ($self, $node, $inherit_ns) = @_; if (my $relaxop = getRelaxOp($node)) { my $ns = $node->getAttribute('ns') || $inherit_ns; # Possibly bind new namespace # Element description if ($relaxop eq 'rng:element') { return $self->scanPattern_element($ns, $node); } # Attribute description elsif ($relaxop eq 'rng:attribute') { return $self->scanPattern_attribute($ns, $node); } # Various combiners elsif ($relaxop =~ /^rng:($COMBINER_re)$/) { my $op = $1; return (['combination', $op, $self->scanChildren($ns, getElements($node))]); } # Mixed is a combiner but includes #PCDATA elsif ($relaxop eq 'rng:mixed') { return (['combination', 'interleave', '#PCDATA', $self->scanChildren($ns, getElements($node))]); } # Reference to a defined symbol, or parent grammar's defined symbol elsif ($relaxop =~ /^rng:(ref|parentRef)$/) { my $op = lc($1); return ([$op, $node->getAttribute('name')]); } elsif ($relaxop =~ /^rng:(empty|notAllowed)$/) { # Ignorable here return (); } elsif ($relaxop eq 'rng:text') { return ('#PCDATA'); } elsif ($relaxop eq 'rng:value') { # Not interested in details here. return (['value', undef, $node->textContent]); } elsif ($relaxop eq 'rng:data') { # Not interested in details here. 
return (['data', undef, $node->getAttribute('type')]); } # Include an external grammar elsif ($relaxop eq 'rng:externalRef') { return $self->scanExternal($node->getAttribute('href'), $ns); } # Include an internal grammar elsif ($relaxop eq 'rng:grammar') { return (['grammar', "grammar" . (++$$self{internal_grammars}), $self->scanGrammarContent($ns, getElements($node))]); } elsif ($relaxop =~ /^rnga:documentation$/) { return (['doc', undef, $node->textContent]); } else { Warn('misdefined', $relaxop, undef, "Didn't expect '$relaxop' in RelaxNG Schema (scanPattern)"); return (); } } else { return (); } } sub scanPattern_element { my ($self, $ns, $node) = @_; my @children = getElements($node); if (my $name = $node->getAttribute('name')) { return (['element', $$self{model}->encodeQName($ns, $name), $self->scanChildren($ns, @children)]); } else { my $namenode = shift(@children); my @names = $self->scanNameClass($namenode, $ns); return map { ['element', $_, $self->scanChildren($ns, @children)] } @names; } } sub scanPattern_attribute { my ($self, $ns, $node) = @_; $ns = $node->getAttribute('ns'); # ONLY explicit declaration! 
my @children = getElements($node); if (my $name = $node->getAttribute('name')) { return (['attribute', $$self{model}->encodeQName($ns, $name), $self->scanChildren($ns, @children)]); } else { my $namenode = shift(@children); my @names = $self->scanNameClass($namenode, $ns); return map { ['attribute', $_, $self->scanChildren($ns, @children)] } @names; } } sub scanChildren { my ($self, $ns, @children) = @_; return grep { $_ } map { ($self->scanPattern($_, $ns)) } @children; } sub scanGrammarContent { my ($self, $ns, @content) = @_; return map { $self->scanGrammarItem($_, $ns) } @content; } sub scanGrammarItem { my ($self, $node, $inherit_ns) = @_; if (my $relaxop = getRelaxOp($node)) { my @children = getElements($node); my $ns = $node->getAttribute('ns') || $inherit_ns; # Possibly bind new namespace # The start element's content is returned if ($relaxop eq 'rng:start') { return (['start', undef, $self->scanChildren($ns, @children)]); } elsif ($relaxop eq 'rng:define') { my $name = $node->getAttribute('name'); my $op = $node->getAttribute('combine') || ''; return (['def' . $op, $name, $self->scanChildren($ns, @children)]); } elsif ($relaxop eq 'rng:div') { return $self->scanGrammarContent($ns, @children); } elsif ($relaxop eq 'rng:include') { my $name = $node->getAttribute('href'); my $paths = [@LaTeXML::Common::Model::RelaxNG::PATHS, @{ $STATE->lookupValue('SEARCHPATHS') }]; if (my $schemadoc = LaTeXML::Common::XML::RelaxNG->new($name, searchpaths => $paths)) { local @LaTeXML::Common::Model::RelaxNG::PATHS = (pathname_directory($schemadoc->URI), @LaTeXML::Common::Model::RelaxNG::PATHS); my @patterns; # Hopefully, just a file, not a URL? 
my $doc = $schemadoc->documentElement; # Ignore the grammar level, if any, since we do NOT establish a binding with include if (getRelaxOp($doc) eq 'rng:grammar') { my $nns = $doc->getAttribute('ns') || $inherit_ns; # Possibly bind new namespace @patterns = $self->scanGrammarContent($nns, getElements($doc)); } else { @patterns = $self->scanPattern($doc, undef); } # The rule is "includeContent", same as grammarContent # except that it shouldn't have nested rng:include; # we'll just assume there aren't any. my $mod = $name; $mod =~ s/\.rn(g|c)$//; if (my @replacements = $self->scanGrammarContent($ns, @children)) { return (['override', undef, ['module', $mod, @patterns], @replacements]); } else { return (['module', $mod, @patterns]); } } else { return (); } } } else { return (); } } sub scanNameClass { my ($self, $node, $ns) = @_; my $relaxop = getRelaxOp($node); if ($relaxop eq 'rng:name') { return ($$self{model}->encodeQName($ns, $node->textContent)); } elsif ($relaxop eq 'rng:anyName') { Info('unexpected', $relaxop, undef, "Can't handle RelaxNG operation '$relaxop'", "Treating " . ToString($node) . " as ANY") if $node->hasChildNodes; return ('ANY'); } elsif ($relaxop eq 'rng:nsName') { Info('unexpected', $relaxop, undef, "Can't handle RelaxNG operation '$relaxop'", "Treating " . ToString($node) . " as ANY"); # NOTE: We _could_ conceivably use a namespace predicate, # but Model has to be extended to support it! return ('ANY'); } elsif ($relaxop eq 'rng:choice') { my %names = (); foreach my $choice ($node->childNodes) { map { $names{$_} = 1 } $self->scanNameClass($choice, $ns); } return ($names{ANY} ? 
('ANY') : keys %names); } else { my $op = $node->nodeName; Fatal('misdefined', $op, undef, "Expected a RelaxNG name element (rng:name|rng:anyName|rng:nsName|rng:choice), got '$op'"); return; } } #====================================================================== # Simplify # Various simplifications: # grammar : the binding of the separate space of defines is applied. # and the result is simplified, and replaced by the start. # module : stored for any documentation purposes, and simplified content returned. # ref, parentref : replaced by a ref of appropriately scoped symbol. # def : store the combined (but unexpanded) definitions # and # symbols, elements are recorded. sub eqOp { my ($form, $op) = @_; return (ref $form eq 'ARRAY') && ($$form[0] eq $op); } sub extractStart { my ($self, @items) = @_; my @starts = (); foreach my $item (@items) { if (ref $item eq 'ARRAY') { my ($op, $name, @args) = @$item; if ($op eq 'start') { push(@starts, @args); } elsif ($op eq 'module') { push(@starts, $self->extractStart(@args)); } elsif ($op eq 'grammar') { push(@starts, $self->extractStart(@args)); } } } return @starts; } # NOTE: Reconsider this process. # In particular, how we're returning throwing away stuff (after it gets recorded). # Mainly it's an issue for being able to document a schema, # having separate sections for each "module". # What order should we be simplifing and expanding? # For documentable modules we want: # the content, grammars NOT yet replaced by start, # elementdef's sorted out # For model extraction we also want # models flattened, grammars replaced by start, all symbols joined & expanded ##### # In Simplify # grammar: extract & return start # [for doc of a module, shouldn't do this, but for doc of an element, should!] 
# [Actually, I'm not even sure how to document an embedded grammar] # override: make replacements in module, return module # [this should always happen] # element : store in elements table # [OK] # ref : adjust name # [OK] # parentref : adjust name, convert to ref # [OK] # defchoice, definterleave, def: add to defns table, possibly combining with existing # [OK] # module : store in modules list, return contents # [for doc,we'd like to return nothing? but for getting grammar start we want content?] sub simplify { my ($self, $form, $binding, $parentbinding, $container) = @_; if (ref $form eq 'ARRAY') { my ($op, $name, @args) = @$form; if ($op eq 'grammar') { return (['grammar', $name, $self->simplify_args($name, $binding, $container, @args)]); } elsif ($op eq 'override') { return $self->simplify_override($binding, $parentbinding, $container, @args); } elsif ($op eq 'module') { my $module = ['module', $name]; push(@{ $$self{modules} }, $module); # Keep in order: push first, then scan contents push(@$module, $self->simplify_args($binding, $parentbinding, $container, @args)); return ($module); } elsif ($op eq 'element') { @args = $self->simplify_args($binding, $parentbinding, "element:$name", @args); push(@{ $$self{elements}{$name} }, @args); return (['element', $name, @args]); } elsif ($op =~ /^(ref|parentref)$/) { my $qname = ($1 eq 'parentref' ? $parentbinding : $binding) . ":" . $name; @args = $self->simplify_args($binding, $parentbinding, $container, @args); $$self{usesname}{$qname}{$container} = 1 if $container; return (['ref', $qname]); } elsif ($op =~ /^def(choice|interleave|)$/) { my $combination = $1 || 'group'; my $qname = $binding . ":" . 
$name; $$self{usesname}{$qname}{$container} = 1 if $container; @args = $self->simplify_args($binding, $parentbinding, "pattern:$qname", @args); # Special case: simple definition of an element if (($combination eq 'group') && (scalar(@args) == 1) && eqOp($args[0], 'element')) { $$self{elementdefs}{$qname} = $args[0][1]; $$self{elementreversedefs}{ $args[0][1] } = $qname; return @args; } else { my $prev = $$self{defs}{$qname}; my $prevc = $$self{def_combiner}{$qname}; my @xargs = grep { !eqOp($_, 'doc') } @args; # Remove annotations if ($prev) { # Previoud definition? if (($combination eq 'group') && ($prevc eq 'group')) { # apparently RE-defining $qname? $prev = undef; } elsif (($combination eq 'group') && ($prevc ne 'group')) { # Use old combination!?!?!?!? $combination = $prevc; } } $$self{defs}{$qname} = simplifyCombination(['combination', $combination, ($prev ? @$prev : ()), @xargs]); $$self{def_combiner}{$qname} = $combination; return ([$op, $qname, @args]); } } else { return ([$op, $name, $self->simplify_args($binding, $parentbinding, $container, @args)]); } } else { return ($form); } } sub simplify_args { my ($self, $binding, $parentbinding, $container, @forms) = @_; return map { $self->simplify($_, $binding, $parentbinding, $container) } @forms; } sub simplify_override { my ($self, $binding, $parentbinding, $container, @args) = @_; # Note that we do NOT simplify till we've made the replacements! my ($module, @replacements) = @args; my ($modop, $modname, @patterns) = @$module; # Replace any start from @patterns by that from @replacement, if any. 
if (my @replacement_start = grep { eqOp($_, 'start') } @replacements) { @patterns = grep { !eqOp($_, 'start') } @patterns; } foreach my $def (grep { (ref $_ eq 'ARRAY') && ($$_[0] =~ /^def/) } @replacements) { my ($defop, $symbol) = @$def; @patterns = grep { !(eqOp($_, $defop) && ($$_[1] eq $symbol)) } @patterns; } # Recurse on the overridden module return $self->simplify(['module', "$modname (overridden)", @patterns, @replacements], $binding, $parentbinding, $container); } sub simplifyCombination { my ($combination) = @_; if ((ref $combination) && ($$combination[0] eq 'combination')) { my ($c, $op, @stuff) = @$combination; @stuff = map { simplifyCombination($_) } @stuff; if ($op =~ /^(group|choice)$/) { # These can be flattened. @stuff = map { ((ref $_) && ($$_[0] eq 'combination') && ($$_[1] eq $op) ? @$_[2 .. $#$_] : ($_)) } @stuff; } return [$c, $op, @stuff]; } else { return $combination; } } #====================================================================== # For debugging... sub showSchema { my ($item, $level) = @_; $level = 0 unless defined $level; if (ref $item eq 'ARRAY') { my ($op, $name, @args) = @$item; if ($op eq 'doc') { $name = "..."; @args = (); } print STDERR "" . (' ' x (2 * $level)) . $op . ($name ? " " . $name : '') . "\n"; foreach my $arg (@args) { showSchema($arg, $level + 1); } } else { print STDERR "" . (' ' x (2 * $level)) . $item . "\n"; } return; } #====================================================================== # Generate TeX documentation for a Schema #====================================================================== # The svg schema can only just barely be read in and recognized, # but it is structured in a way that makes a joke of our attempt at automatic documentation my $SKIP_SVG = 1; # [CONFIGURABLE?] sub documentModules { my ($self) = @_; my $docs = ""; $$self{defined_patterns} = {}; foreach my $module (@{ $$self{modules} }) { my ($op, $name, @content) = @$module; next if $SKIP_SVG && $name =~ /:svg:/; # !!!! 
$name =~ s/^urn:x-LaTeXML:RelaxNG://; # Remove the urn part. $docs = join("\n", $docs, "\\begin{schemamodule}{$name}", (map { $self->toTeX($_) } @content), "\\end{schemamodule}"); } foreach my $name (keys %{ $$self{defined_patterns} }) { if ($$self{defined_patterns}{$name} < 0) { $docs =~ s/\\patternadd\{$name\}/\\patterndefadd{$name}/s; } } return $docs; } sub cleanTeX { my ($string) = @_; return '\typename{text}' if $string eq '#PCDATA'; $string =~ s/\#/\\#/g; $string =~ s/<([^>]*)>/\\texttt{$1}/g; # An apparent convention == ttfont? $string =~ s/_/\\_/g; return $string; } sub cleanTeXName { my ($string) = @_; $string = cleanTeX($string); $string =~ s/^ltx://; # $string =~ s/:/../; return $string; } sub toTeX { my ($self, $object) = @_; if (ref $object eq 'HASH') { return join(', ', map { "$_=" . $self->toTeX($$object{$_}) } sort keys %$object); } elsif (ref $object eq 'ARRAY') { # an object? my ($op, $name, @data) = @$object; if ($op eq 'doc') { return join(' ', map { cleanTeX($_) } @data) . "\n"; } elsif ($op eq 'ref') { return $self->toTeX_ref($op, $name); } elsif ($op =~ /^def(choice|interleave|)$/) { return $self->toTeX_def($1, $name, @data); } elsif ($op eq 'element') { return $self->toTeX_element($name, @data); } elsif ($op eq 'attribute') { return $self->toTeX_attribute($name, @data); } elsif ($op eq 'combination') { return $self->toTeX_combination($name, @data); } elsif ($op eq 'data') { return "\\typename{" . cleanTeX($data[0]) . "}"; } elsif ($op eq 'value') { return '\attrval{' . cleanTeX($data[0]) . "}"; } elsif ($op eq 'start') { my ($docs, @spec) = $self->toTeXExtractDocs(@data); my $content = join(' ', map { $self->toTeX($_) } @spec); return "\\item[\\textit{Start}]\\textbf{==}\\ $content" . ($docs ? " \\par$docs" : ''); } elsif ($op eq 'grammar') { # Don't otherwise mention it? 
## join("\n",'\item[\textit{Grammar}:] '.$name, return join("\n", map { $self->toTeX($_) } @data); } elsif ($op eq 'module') { $name =~ s/^urn:x-LaTeXML:RelaxNG://; # Remove the urn part. if (($name =~ /^svg/) && $SKIP_SVG) { return '\item[\textit{Module }' . cleanTeX($name) . '] included.'; } else { return '\item[\textit{Module }\moduleref{' . cleanTeX($name) . '}] included.'; } } else { Warn('unexpected', $op, undef, "RelaxNG->toTeX: Unrecognized item $op"); return "[$op: " . join(', ', map { $self->toTeX($_) } @data) . "]"; } } else { return cleanTeX($object); } } sub toTeX_ref { my ($self, $op, $name) = @_; if (my $el = $$self{elementdefs}{$name}) { $el = cleanTeXName($el); return "\\elementref{$el}"; } else { $name =~ s/^\w+://; # Strip off qualifier!!!! (watch for clash in docs?) return "\\patternref{" . cleanTeX($name) . "}"; } } sub toTeX_def { my ($self, $combiner, $name, @data) = @_; my $qname = $name; $name =~ s/^\w+://; # Strip off qualifier!!!! (watch for clash in docs?) $name = cleanTeX($name); my ($docs, @spec) = $self->toTeXExtractDocs(@data); my ($attr, $content) = $self->toTeXBody(@spec); if ($combiner) { my $body = $attr; $body .= '\item[' . ($combiner eq 'choice' ? '\textbar=' : '\&=') . '] ' . $content if $content; $$self{defined_patterns}{$name} = -1 unless defined $$self{defined_patterns}{$name}; return "\\patternadd{$name}{$docs}{$body}\n"; } # elsif((scalar(@data)==1) && (ref $data[0] eq 'ARRAY') && ($data[0][0] eq 'grammar')){ else { $attr = '\item[\textit{Attributes:}] \textit{empty}' if !$attr && ($name =~ /\\_attributes/); $content = '\textit{empty}' if !$content && ($name =~ /\\_model/); my $body = $attr; $body .= '\item[\textit{Content}:] ' . $content if $content; my ($xattr, $xcontent) = $self->toTeXBody($$self{defs}{$qname}); $body .= '\item[\textit{Expansion}:] ' . 
$xcontent if !$attr && !$xattr && $xcontent && ($xcontent ne $content); if ($name !~ /_(?:attributes|model)$/) { # Skip the "used by" if element-specific attributes or moel. if (my $uses = $self->getSymbolUses($qname)) { $body .= '\item[\textit{Used by}:] ' . $uses; } } if ((defined $$self{defined_patterns}{$name}) && ($$self{defined_patterns}{$name} > 0)) { # Already been defined??? return ''; } else { $$self{defined_patterns}{$name} = 1; return "\\patterndef{$name}{$docs}{$body}\n"; } } } sub toTeX_element { my ($self, $name, @data) = @_; my $qname = $name; $name =~ s/^ltx://; $name = cleanTeXName($name); my ($docs, @spec) = $self->toTeXExtractDocs(@data); my ($attr, $content) = $self->toTeXBody(@spec); $content = "\\typename{empty}" unless $content; # Shorten display for element-specific attributes & model, ASSUMING they immediately folllow! $attr = '' if $attr eq '\item[\textit{Attributes}:] \patternref{' . $name . '\\_attributes}'; $content = '' if $content eq '\patternref{' . $name . '\\_model}'; my $body = $attr; $body .= '\item[\textit{Content}:] ' . $content if $content; if (my $ename = $$self{elementreversedefs}{$qname}) { if (my $uses = $self->getSymbolUses($ename)) { $body .= '\item[\textit{Used by}:] ' . $uses; } } return "\\elementdef{$name}{$docs}{$body}\n"; } sub toTeX_attribute { my ($self, $name, @data) = @_; $name = cleanTeXName($name); my ($docs, @spec) = $self->toTeXExtractDocs(@data); my $content = join(' ', map { $self->toTeX($_) } @spec) || '\typename{text}'; return "\\attrdef{$name}{$docs}{$content}"; } sub toTeX_combination { my ($self, $name, @data) = @_; if ($name eq 'group') { return "(" . join(', ', map { $self->toTeX($_) } @data) . ")"; } elsif ($name eq 'interleave') { return "(" . join(' ~\&~ ', map { $self->toTeX($_) } @data) . ")"; } # ? elsif ($name eq 'choice') { return "(" . join(' ~\textbar~ ', map { $self->toTeX($_) } @data) . 
")"; } elsif ($name eq 'optional') { if ((@data == 1) && eqOp($data[0], 'attribute')) { return $self->toTeX($data[0]); } else { return $self->toTeX($data[0]) . "?"; } } elsif ($name eq 'zeroOrMore') { return $self->toTeX($data[0]) . "*"; } elsif ($name eq 'oneOrMore') { return $self->toTeX($data[0]) . "+"; } elsif ($name eq 'list') { return "(" . join(', ', map { $self->toTeX($_) } @data) . ")"; } # ? else { Warn('unexpected', $name, undef, "RelaxNG->toTeX: Unrecognized combination $name"); return; } } sub getSymbolUses { my ($self, $qname) = @_; if (my $uses = $$self{usesname}{$qname}) { my @uses = sort keys %$uses; @uses = grep { !/\bSVG./ } @uses if $SKIP_SVG; # !!! return join(', ', (map { /^pattern:[^:]*:(.*)$/ ? ('\patternref{' . cleanTeX($1) . '}') : () } @uses), (map { /^pattern:[^:]*:(.*)$/ ? ('\patternref{' . cleanTeX($1) . '}') : () } @uses), (map { /^element:(.*)$/ ? ('\elementref{' . cleanTeXName($1) . '}') : () } @uses)); } else { return ''; } } # Extract any documentation nodes from @data sub toTeXExtractDocs { my ($self, @data) = @_; my $docs = ""; my @rest = (); while (my $item = shift(@data)) { if ((ref $item eq 'ARRAY') && ($$item[0] eq 'doc')) { $docs .= $self->toTeX($item); } else { push(@rest, $item); } } return ($docs, @rest); } # Format the attributes & content model of a named pattern or element. # This generates a sequence of \item's to be put in a definition list. sub toTeXBody { my ($self, @data) = @_; my (@attributes, @content, @patterns); while (my $item = shift(@data)) { if (ref $item eq 'ARRAY') { my ($op, $name, @args) = @$item; # NOTE: W/o the simplification of optional(attribute), above, # we've got to do some extra work here! if ($op eq 'attribute') { push(@attributes, $self->toTeX($item)); } elsif (($op eq 'combination') && ($name eq 'optional') && (@args == 1) && eqOp($args[0], 'attribute')) { unshift(@data, $args[0]); } # Note dubious assumption about naming convention! 
      elsif (($op eq 'ref') && ($name =~ /[^a-zA-Z]attributes$/)) {
        push(@patterns, $self->toTeX($item)); }
      else {
        push(@content, $self->toTeX($item)); } }
    else {
      push(@content, $self->toTeX($item)); } }
  return (join('',
      (@patterns
        ? '\item[\textit{'
          . ((grep { $_ !~ /[^a-zA-Z]attributes\}*?$/ } @patterns) ? 'Includes' : 'Attributes')
          . '}:] '
          . join(', ', @patterns)
        : ''),
      @attributes),
    join(', ', @content)); }

#======================================================================
1;

__END__

=head1 NAME

C - represents RelaxNG document models; extends L.

=head1 AUTHOR

Bruce Miller

=head1 COPYRIGHT

Public domain software, produced as part of work done by the
United States Government & not subject to copyright in the US.

=cut

latexml-0.8.1/lib/LaTeXML/Common/Number.pm

# /=====================================================================\ #
# | LaTeXML::Common::Number                                             | #
# | Representation of numbers                                           | #
# |=====================================================================| #
# | Part of LaTeXML:                                                    | #
# | Public domain software, produced as part of work done by
the        | #
# | United States Government & not subject to copyright in the US.      | #
# |---------------------------------------------------------------------| #
# | Bruce Miller                                                #_#     | #
# | http://dlmf.nist.gov/LaTeXML/                              (o o)    | #
# \=========================================================ooo==U==ooo=/ #
package LaTeXML::Common::Number;
use LaTeXML::Global;
use strict;
use warnings;
use LaTeXML::Common::Object;
use LaTeXML::Core::Token;
use base qw(LaTeXML::Common::Object);
use base qw(Exporter);
our @EXPORT = (qw(&Number));

#======================================================================
# Exported constructor.

sub Number {
  my ($number) = @_;
  return LaTeXML::Common::Number->new($number); }

#======================================================================

sub new {
  my ($class, $number) = @_;
  return bless [$number || "0"], $class; }

sub valueOf {
  my ($self) = @_;
  return $$self[0]; }

sub toString {
  my ($self) = @_;
  return $$self[0]; }

my @SCALES = (1, 10, 100, 1000, 10000, 100000);

# smallest number that makes a difference added to 1 in Perl's float format.
my $EPSILON = 1.0;
while (1.0 + $EPSILON / 2 != 1) {
  $EPSILON /= 2.0; }

# Round $number to $prec decimals (0...5; default 2)
# attempting to do so portably.
sub roundto {
  my ($number, $prec) = @_;
  $prec = 2 unless defined $prec;
  $prec = 0 if $prec < 0;
  $prec = 5 if $prec > 5;
  my $scale = $SCALES[$prec];
  # scale to integer, w/some slop in case arbitrarily close to an integer...
  my $n = $number * $scale * (1 + 100 * $EPSILON);
  return int($n < -$EPSILON ? $n - 0.5 : ($n > $EPSILON ? $n + 0.5 : 0.0)) / $scale; }

# The stored value is in scaled points (65536 sp = 1 pt).
sub ptValue {
  my ($self, $prec) = @_;
  return roundto($$self[0] / 65536, $prec); }

# Pixels = points * DPI / 72.27, with DPI defaulting to 100 when unset.
# Note the parentheses: '/' binds tighter than '||', so without them
# the division by 72.27 would apply only to the default value.
sub pxValue {
  my ($self, $prec) = @_;
  return roundto($$self[0] / 65536 * ($STATE->lookupValue('DPI') || 100) / 72.27, $prec); }

sub unlist {
  my ($self) = @_;
  return $self; }

sub revert {
  my ($self) = @_;
  return ExplodeText($self->toString); }

sub smaller {
  my ($self, $other) = @_;
  return ($self->valueOf < $other->valueOf) ?
    $self : $other; }

sub larger {
  my ($self, $other) = @_;
  return ($self->valueOf > $other->valueOf) ? $self : $other; }

sub absolute {
  my ($self) = @_;
  return (ref $self)->new(abs($self->valueOf)); }

sub sign {
  my ($self) = @_;
  return ($self->valueOf < 0) ? -1 : (($self->valueOf > 0) ? 1 : 0); }

sub negate {
  my ($self) = @_;
  return (ref $self)->new(-$self->valueOf); }

sub add {
  my ($self, $other) = @_;
  return (ref $self)->new($self->valueOf + $other->valueOf); }

sub subtract {
  my ($self, $other) = @_;
  return (ref $self)->new($self->valueOf - $other->valueOf); }

# arg 2 is a number
sub multiply {
  my ($self, $other) = @_;
  return (ref $self)->new(int($self->valueOf * (ref $other ? $other->valueOf : $other))); }

sub stringify {
  my ($self) = @_;
  return "Number[" . $$self[0] . "]"; }

#======================================================================
1;

__END__

=pod

=head1 NAME

C<LaTeXML::Common::Number> - representation of numbers;
extends L<LaTeXML::Common::Object>.

=head2 Exported functions

=over 4

=item C<< $number = Number($num); >>

Creates a Number object representing C<$num>.

=back

=head2 Methods

=over 4

=item C<< @tokens = $object->unlist; >>

Return a list of the tokens making up this C<$object>.

=item C<< $string = $object->toString; >>

Return a string representing C<$object>.

=item C<< $string = $object->ptValue; >>

Return a value representing C<$object> without the measurement unit (pt)
with limited decimal places.

=item C<< $string = $object->pxValue; >>

Return a value representing C<$object> in pixels,
with limited decimal places.
Uses the state variable C<DPI> (dots per inch).

=item C<< $n = $object->valueOf; >>

Return the value in scaled points (ignoring shrink and stretch, if any).

=item C<< $n = $object->smaller($other); >>

Return C<$object> or C<$other>, whichever is smaller.

=item C<< $n = $object->larger($other); >>

Return C<$object> or C<$other>, whichever is larger.

=item C<< $n = $object->absolute; >>

Return an object representing the absolute value of the C<$object>.

=item C<< $n = $object->sign; >>

Return an integer: -1 for negatives, 0 for 0 and 1 for positives.

=item C<< $n = $object->negate; >>

Return an object representing the negative of the C<$object>.

=item C<< $n = $object->add($other); >>

Return an object representing the sum of C<$object> and C<$other>.

=item C<< $n = $object->subtract($other); >>

Return an object representing the difference between C<$object> and C<$other>.

=item C<< $n = $object->multiply($n); >>

Return an object representing the product of C<$object> and C<$n> (a regular number).

=back

=head1 AUTHOR

Bruce Miller

=head1 COPYRIGHT

Public domain software, produced as part of work done by the
United States Government & not subject to copyright in the US.

=cut

latexml-0.8.1/lib/LaTeXML/Common/Object.pm

# /=====================================================================\ #
# | LaTeXML::Common::Object                                             | #
# | Abstract base class for LaTeXML objects                             | #
# |=====================================================================| #
# | Part of LaTeXML:                                                    | #
# | Public domain software, produced as part of work done by the        | #
# | United States Government & not subject to copyright in the US.
| # # |---------------------------------------------------------------------| # # | Bruce Miller #_# | # # | http://dlmf.nist.gov/LaTeXML/ (o o) | # # \=========================================================ooo==U==ooo=/ # package LaTeXML::Common::Object; use strict; use warnings; use LaTeXML::Global; use XML::LibXML; # Need XML_xxx constants! use base qw(Exporter); our @EXPORT = ( qw(&Stringify &ToString &Revert &Equals), ); #====================================================================== # Exported generic functions for dealing with LaTeXML's objects #====================================================================== my %NOBLESS = map { ($_ => 1) } qw( SCALAR HASH ARRAY CODE REF GLOB LVALUE); # [CONSTANT] sub Stringify { my ($object) = @_; if (!defined $object) { return 'undef'; } elsif (!ref $object) { return $object; } elsif ($NOBLESS{ ref $object }) { return "$object"; } elsif ($object->can('stringify')) { return $object->stringify; } # Have to handle LibXML stuff explicitly (unless we want to add methods...?) elsif ($object->isa('XML::LibXML::Node')) { if ($object->nodeType == XML_ELEMENT_NODE) { my $tag = $STATE->getModel->getNodeQName($object); my $attributes = ''; foreach my $attr ($object->attributes) { my $name = $attr->nodeName; my $val = $attr->getData; $val = substr($val, 0, 30) . "..." if length($val) > 35; $attributes .= ' ' . $name . "=\"" . $val . "\""; } return "<" . $tag . $attributes . ($object->hasChildNodes ? ">..." : "/>"); } elsif ($object->nodeType == XML_TEXT_NODE) { return "XMLText[" . $object->data . "]"; } elsif ($object->nodeType == XML_DOCUMENT_NODE) { return "XMLDocument[" . $$object . "]"; } elsif ($object->nodeType == XML_DOCUMENT_FRAG_NODE) { return "XMLFragment[" . join('', map { Stringify($_) } $object->childNodes) . "]"; } else { return "$object"; } } else { return "$object"; } } sub ToString { my ($object) = @_; my $r; return (defined $object ? (($r = ref $object) && !$NOBLESS{$r} ? 
$object->toString : "$object") : ''); } # Just how deep of an equality test should this be? sub Equals { my ($a, $b) = @_; return 1 if !(defined $a) && !(defined $b); # both undefined, equal, I guess return 0 unless (defined $a) && (defined $b); # else both must be defined my $refa = (ref $a) || '_notype_'; my $refb = (ref $b) || '_notype_'; return 0 if $refa ne $refb; # same type? return $a eq $b if ($refa eq '_notype_') || $NOBLESS{$refa}; # Deep comparison of builtins? return 1 if $a->equals($b); # semi-shallow comparison? # Special cases? (should be methods, but that embeds State knowledge too low) if ($refa eq 'LaTeXML::Core::Token') { # Check if they've been \let to the same defn. my $defa = $STATE->lookupDefinition($a); my $defb = $STATE->lookupDefinition($b); return $defa && $defb && ($defa eq $defb); } return 0; } # Reverts an object into TeX code, as a Tokens list, that would create it. # Note that this is not necessarily the original TeX. sub Revert { my ($thing) = @_; return (defined $thing ? (ref $thing ? map { $_->unlist } $thing->revert : LaTeXML::Core::Token::Explode($thing)) # Ugh!! : ()); } #====================================================================== # LaTeXML Object # Base object for all LaTeXML Objects; # Defines basic default methods for comparison, printing # Tried to use overloading, but the Magic methods lead to hard-to-find # (and occasionally quite serious) performance issues -- at least, if you # try to have stringify do too much. 
#======================================================================

sub stringify {
  my ($object) = @_;
  my $string = "$object"; overload::StrVal($object);
  $string =~ s/^LaTeXML:://;
  $string =~ s/=(SCALAR|HASH|ARRAY|CODE|REF|GLOB|LVALUE|)\(/\[@/;
  $string =~ s/\)$/\]/;
  return $string; }

sub toString {
  my ($self) = @_;
  return $self->stringify; }

sub toAttribute {
  my ($self) = @_;
  return $self->toString; }

sub equals {
  my ($a, $b) = @_;
  return "$a" eq "$b"; }    # overload::StrVal($a) eq overload::StrVal($b); }

sub notequals {
  my ($a, $b) = @_;
  return !($a->equals($b)); }

sub isaToken      { return 0; }
sub isaBox        { return 0; }
sub isaDefinition { return 0; }

# These should really only make sense for Data objects within the
# processing stream.
# Defaults (probably poor)
sub beDigested {
  my ($self) = @_;
  return $self; }

sub beAbsorbed {
  my ($self, $document) = @_;
  return $document->openText($self->toString, $document->getNodeFont($document->getElement)); }

sub unlist {
  my ($self) = @_;
  return $self; }

#**********************************************************************
1;

__END__

=pod

=head1 NAME

C<LaTeXML::Common::Object> - abstract base class for most LaTeXML objects.

=head1 DESCRIPTION

C<LaTeXML::Common::Object> serves as an abstract base class for all other objects
(both the data objects and control objects). It provides common methods for
stringification and comparison operations to simplify coding and
to beautify error reporting.

=head2 Generic functions

=over 4

=item C<< $string = Stringify($object); >>

Returns a string identifying C<$object>, for debugging.
Works on any values and objects, but invokes the stringify method on
blessed objects.
More informative than the default perl conversion to a string.

=item C<< $string = ToString($object); >>

Converts C<$object> to a string, attempting, when possible,
to generate straight text without TeX markup.
This is most useful for converting Tokens or Boxes to document
content or attribute values, or values to be used for pathnames,
keywords, etc.
Generally, however, it is not possible
to convert Whatsits generated by Constructors into clean strings,
without TeX markup.
Works on any values and objects, but invokes
the toString method on blessed objects.

=item C<< $boolean = Equals($a,$b); >>

Compares the two objects for equality. Works on any values and objects,
but invokes the equals method on blessed objects, which does a
deep comparison of the two objects.

=item C<< $tokens = Revert($object); >>

Returns a Tokens list containing the TeX that would create C<$object>.
Note that this is not necessarily the original TeX code;
expansions or other substitutions may have taken place.

=back

=head2 Methods

=over 4

=item C<< $string = $object->stringify; >>

Returns a readable representation of C<$object>, useful for debugging.

=item C<< $string = $object->toString; >>

Returns the string content of C<$object>; most useful for extracting
a clean, usable, Unicode string from tokens or boxes that might
represent a filename or such. To the extent possible, this should provide
a string that can be used as XML content, or attribute values, or for
filenames or whatever. However, control sequences defined as Constructors
may leave TeX code in the value.

=item C<< $boole = $object->equals($other); >>

Returns whether C<$object> and C<$other> are equal. Should perform
a deep comparison, but the default implementation just compares
for object identity.

=item C<< $boole = $object->isaToken; >>

Returns whether C<$object> is an L<LaTeXML::Core::Token>.

=item C<< $boole = $object->isaBox; >>

Returns whether C<$object> is an L<LaTeXML::Core::Box>.

=item C<< $boole = $object->isaDefinition; >>

Returns whether C<$object> is an L<LaTeXML::Core::Definition>.

=item C<< $digested = $object->beDigested; >>

Does whatever is needed to digest the object, and
return the digested representation. Tokens would be digested
into boxes; some objects, such as numbers, can just return themselves.

=item C<< $object->beAbsorbed($document); >>

Do whatever is needed to absorb the C<$object> into the C<$document>,
typically by invoking appropriate methods on the C<$document>.

=back

=head1 AUTHOR

Bruce Miller

=head1 COPYRIGHT

Public domain software, produced as part of work done by the
United States Government & not subject to copyright in the US.

=cut

latexml-0.8.1/lib/LaTeXML/Common/XML.pm

# /=====================================================================\ #
# | LaTeXML::Common::XML                                                | #
# | XML representation common to LaTeXML & Post                         | #
# |=====================================================================| #
# | Part of LaTeXML:                                                    | #
# | Public domain software, produced as part of work done by the        | #
# | United States Government & not subject to copyright in the US.
| # # |---------------------------------------------------------------------| # # | Bruce Miller #_# | # # | http://dlmf.nist.gov/LaTeXML/ (o o) | # # \=========================================================ooo==U==ooo=/ # ###################################################################### # This is (the beginnings of) a common interface to XML, # specifically XML::LibXML, used in LaTeXML and also Post processing. # Collecting this here will hopefully allow us to # * (eventually) make useful extensions to the DOM api. # * hide any version specific patches that have become necessary # Convenience Utilities to simplify using XML::LibXML # #====================================================================== # An eventual possibility which would be to wrap all XML::LibXML objects # in our own classes. This would give a cleaner way to extend the API, # [the extensions _should_ be methods, not random exported functions!!!] # and also to implement patches [currently kinda worrisome]. # # However, it would require some clumsy (& probably expensive) # re-blessing or wrapping of all # common LibXML accessors # [ie. nodeChildren would need to convert all children to the new type]. # ###################################################################### # One concern is to clone any nodes ..... package LaTeXML::Common::XML; use strict; use warnings; use XML::LibXML qw(:all); use XML::LibXML::XPathContext; use LaTeXML::Util::Pathname; use Encode; use Carp; # ? require LaTeXML::Common::XML::Parser; require LaTeXML::Common::XML::XPath; require LaTeXML::Common::XML::XSLT; require LaTeXML::Common::XML::RelaxNG; # we're too low-level to use LaTeXML's error handling, but at least use Carp....(?) use base qw(Exporter); our @EXPORT = ( # Export just these symbols from XML::LibXML # Possibly (if/when we abstract away from XML::LibXML), we should be selective? 
  qw(
    XML_ELEMENT_NODE
    XML_ATTRIBUTE_NODE
    XML_TEXT_NODE
    XML_CDATA_SECTION_NODE
    XML_ENTITY_REF_NODE
    XML_ENTITY_NODE
    XML_PI_NODE
    XML_COMMENT_NODE
    XML_DOCUMENT_NODE
    XML_DOCUMENT_TYPE_NODE
    XML_DOCUMENT_FRAG_NODE
    XML_NOTATION_NODE
    XML_HTML_DOCUMENT_NODE
    XML_DTD_NODE
    XML_ELEMENT_DECL
    XML_ATTRIBUTE_DECL
    XML_ENTITY_DECL
    XML_NAMESPACE_DECL
    XML_XINCLUDE_END
    XML_XINCLUDE_START
    encodeToUTF8
    decodeFromUTF8
    ),
  @XML::LibXML::EXPORT,
  # Possibly (later) export these utility functions
  qw(&element_nodes &text_in_node &new_node &append_nodes &clear_node &maybe_clone
    &valid_attributes &copy_attributes &rename_attribute &remove_attr &get_attr
    &isTextNode &isElementNode &isChild &isDescendant &isDescendantOrSelf
    &set_RDFa_prefixes &initialize_catalogs)
);

# These really should be constant, but visible outside!
our $XMLNS_NS = 'http://www.w3.org/2000/xmlns/';           # [CONSTANT]
our $XML_NS   = 'http://www.w3.org/XML/1998/namespace';    # [CONSTANT]

#======================================================================
# XML Utilities

# Return the element children of $node (skipping text, comments, etc.)
sub element_nodes {
  my ($node) = @_;
  return grep { $_->nodeType == XML_ELEMENT_NODE } $node->childNodes; }

# Return the text of the immediate text children of $node, joined by newlines.
sub text_in_node {
  my ($node) = @_;
  return join("\n", map { $_->data } grep { $_->nodeType == XML_TEXT_NODE } $node->childNodes); }

sub isTextNode {
  my ($node) = @_;
  return $node->nodeType == XML_TEXT_NODE; }

sub isElementNode {
  my ($node) = @_;
  return $node->nodeType == XML_ELEMENT_NODE; }

# Is $child a child of $parent?
sub isChild {
  my ($child, $parent) = @_;
  my $p = $child && $child->parentNode;
  return 1 if $p && $p->isSameNode($parent);
  return 0; }

# Is $child a descendant of $parent?
sub isDescendant {
  my ($child, $parent) = @_;
  my $p = $child && $child->parentNode;
  while ($p) {
    return 1 if $p->isSameNode($parent);
    $p = $p->parentNode; }
  return 0; }

# Is $child the same as $parent, or a descendant of $parent?
sub isDescendantOrSelf {
  my ($child, $parent) = @_;
  my $p = $child;
  while ($p) {
    return 1 if $p->isSameNode($parent);
    $p = $p->parentNode; }
  return 0; }

sub new_node {
  my ($nsURI, $tag, $children, %attributes) = @_;
  #  print "\n\n\nnsURI: $nsURI, tag: $tag, children: $children\n";
  my ($nspre, $rawtag) = (undef, $tag);
  if ($tag =~ /^(\w+):(.*)$/) {
    ($nspre, $rawtag) = ($1, $2 || $tag); }
  my $node = XML::LibXML::Element->new($rawtag);
  #  my $node=$LaTeXML::Post::DOC->createElement($tag);
  #  my $node=$LaTeXML::Post::DOC->createElementNS($nsURI,$tag);
  if ($nspre) {
    $node->setNamespace($nsURI, $nspre, 1); }
  else {
    $node->setNamespace($nsURI); }
  append_nodes($node, $children);
  foreach my $key (sort keys %attributes) {
    $node->setAttribute($key, $attributes{$key}) if defined $attributes{$key}; }
  return $node; }

# Append the given nodes (which might also be array refs of nodes, or even strings)
# to $node.  This takes care to clone any node that already has a parent.
sub append_nodes {
  my ($node, @children) = @_;
  foreach my $child (@children) {
    if (ref $child eq 'ARRAY') {
      append_nodes($node, @$child); }
    elsif (ref $child) {    #eq 'XML::LibXML::Element'){
      $node->appendChild(maybe_clone($child)); }
    elsif (defined $child) {
      $node->appendText($child); } }
  return $node; }

# Remove (and return) the element and text children of $node.
sub clear_node {
  my ($node) = @_;
  return map { $node->removeChild($_) }
    grep { ($_->nodeType == XML_ELEMENT_NODE) || ($_->nodeType == XML_TEXT_NODE) }
    $node->childNodes; }

# We have to be _extremely_ careful when rearranging trees when using XML::LibXML!!!
# If we add one node to another, it is _silently_ removed from its previous
# parent, if any!
# Hopefully, this test is sufficient?
sub maybe_clone {
  my ($node) = @_;
  return ($node->parentNode ? $node->cloneNode(1) : $node); }

# the attributes list may contain undefined values
# and attributes with no name (?)
sub valid_attributes { my ($node) = @_; return grep { $_ && $_->getName } $node->attributes; } # copy @attr attributes from $from to $to sub copy_attributes { my ($to, $from) = @_; foreach my $attr ($from->attributes) { my $key = $attr->getName; $to->setAttribute($key, $from->getAttribute($key)); } return; } sub rename_attribute { my ($node, $from, $to) = @_; $node->setAttribute($to, $node->getAttribute($from)); $node->removeAttribute($from); return; } sub remove_attr { my ($node, @attr) = @_; map { $node->removeAttribute($_) } @attr; return; } sub get_attr { my ($node, @attr) = @_; return map { $node->getAttribute($_) } @attr; } # NOTE: This really should be part of some top-level 'common' initialization # and probably should accommodate catalogs being given as configuration options! # However, it presumably sets some global state in XML::LibXML, # so it's safe to do ( record! ) once, even across Daemon calls. my $catalogs_initialized = 0; # [CONFIGURATION] sub initialize_catalogs { return if $catalogs_initialized; $catalogs_initialized = 1; foreach my $catalog (pathname_findall('LaTeXML.catalog', installation_subdir => '.')) { XML::LibXML->load_catalog($catalog); } return; } # FINISH THIS EXPERIMENT LATER.... # We need to be able to find various XML resources: XSLT, RelaxNG and other random xml. # Catalogs provide one means to provide a level of abstraction in pathname location. # But at least at the top level, files ought to be searched for according to the current # search paths (being command line arguments, relative to source files, etc); # Possibly files might be referred to within XML files that libxml is already parsing # and these could benefit from the searchpath approach? # # We can also provide InputCallbacks to the various XML::LibXML objects that allow # us to programatically find & read these XML items according to the searchpaths. 
# One problem is that we don't have (from this level of the API)
# a clean method of accessing the current search paths!
# Another problem is that embedded references to oddly-located relative files will usually
# get turned into paths relative to the top-level document that libxml is currently reading!
# So, we'll be given an absolute path before we have a chance to search the searchpaths for it!
#
# Perhaps we should even handle the catalog functionality here?
#
# How should we find the searchpaths?????
# Note also that if you use relative pathnames to refer to xml objects from within another,
# that libxml2 will already have _assumed_ that it is relative to the base document!
# That is, we're getting an absolute path, here.  Of course, the original request
# could have been an absolute path, so we probably shouldn't be blithely rewriting abs paths!!!
# We could take over the whole catalog business, however...

# sub initialize_input_callbacks {
#   my($object,%options) = @_;
#
#   return;
#   # THIS IS TOTALLY WRONG!!!! Figure out how we'll find out about search paths!
#   my $paths = $LaTeXML::SEARCHPATHS;
#   # options might be installation_subdirs, or such pathname_find things.
#   my $cb = XML::LibXML::InputCallback->new();
#   $cb->register_callbacks([
#     sub {    # Matcher
#       my($uri)=@_;
#       print STDERR "INPUT CHECK: $uri\n";
#       # We don't want to do the search here, 'cause we'll have to do it again in open!
#       return 0 if $uri =~ m|^file://|; # pass on absolute pathnames
#       return 0 if pathname_is_absolute($uri); # a
#       # if($uri =~ /^urn:x-LaTeXML:([^:]*):(.*)$/){
#       print STDERR "INPUT ACCEPT: $uri\n";
#       return 1; },
#     sub {    # Opener
#       my($uri)=@_;
#       my $handle;
#       $uri =~ s|^file://||;
#       # WARNING!!! Kludge alert!
#       my @paths = ('.');
#       push(@paths, @$LaTeXML::SEARCHPATHS) if $LaTeXML::SEARCHPATHS;
#       push(@paths, @{$LaTeXML::POST{searchpaths}}) if $LaTeXML::POST;
#       push(@paths, $LaTeXML::DOCUMENT->getSearchPaths) if $LaTeXML::DOCUMENT;
#       if(my $pathname = pathname_find($uri,
#           # types => ['xsl'], installation_subdir => 'resources/XSLT',
#           paths=>[@paths])){
#         open($handle,$pathname);
#         return $handle; }
#       else {
#         Error('missing-file',$uri,undef,
#           "Couldn't find file '$uri' in search paths",
#           "Search paths were ".join(',',@paths));
#         return; }},
#     sub {    # Reader
#       my($handle,$length)=@_;
#       my $buffer;
#       read($handle,$buffer,$length);
#       return $buffer; },
#     sub {    # Closer
#       my($handle)=@_;
#       close($handle);
#       return; }]);
#   $object->input_callbacks($cb);
#   return; }

#======================================================================
# Odd place for this utility, but it is needed in both conversion & post
# ALSO needs error reporting capability.
my @RDF_TERM_ATTRIBUTES = (    # [CONSTANT]
  qw(about resource property typeof rel rev datatype));
my %NON_RDF_PREFIXES = map { ($_ => 1) } qw(http https ftp);    # [CONSTANT]

sub set_RDFa_prefixes {
  my ($document, $map) = @_;
  my $root     = $document->documentElement;
  my %prefixes = ();
  my %localmap = map { ($_ => $$map{$_}) } keys %$map;
  if (my $prefixes = $root->getAttribute('prefix')) {
    my @x = split(/\s/, $prefixes);
    while (@x) {
      my ($prefix, $uri) = (shift(@x), shift(@x));
      $prefix =~ s/:$//;
      $prefixes{$prefix} = 1;
      if (!$localmap{$prefix}) {
        $localmap{$prefix} = $uri; }
      elsif ($localmap{$prefix} ne $uri) {
        carp "Clash of RDFa prefix '$prefix' ('$uri' vs '$localmap{$prefix}'); "
          . "Skipping RDFa prefix management";
        return; } } }
  if (my @n = $document->findnodes('descendant::*[@prefix]')) {
    if ((scalar(@n) > 1) || !$root->isSameNode($n[0])) {
      carp "RDFa attribute 'prefix' on non-root node; "
        . "Skipping RDFa prefix management";
      return; } }
  if (my @n = $document->findnodes('descendant::*[@vocab]')) {
    carp "RDFa attribute 'vocab' on non-root node; "
      . "Skipping RDFa prefix management";
    return; }
  my $xpath = 'descendant::*[' . join(' or ', map { '@' . $_ } @RDF_TERM_ATTRIBUTES) . ']';
  foreach my $node ($document->findnodes($xpath)) {
    foreach my $k (@RDF_TERM_ATTRIBUTES) {
      if (my $v = $node->getAttribute($k)) {
        foreach my $term (split(/\s/, $v)) {
          if (($term =~ /^(\w+):/) && !$NON_RDF_PREFIXES{$1}) {
            $prefixes{$1} = 1 if $localmap{$1}; } } } } }
  # A prefix is a prefix IFF there is a mapping!!
  if (my $prefixes = join(' ', map { $_ . ": " . $localmap{$_} } sort keys %prefixes)) {
    $root->setAttribute(prefix => $prefixes); }
  return; }

######################################################################
# PATCH Section
######################################################################
# Various versions of XML::LibXML have introduced incompatible improvements
# We can run using older versions, but have to patch things up to
# a consistent level.

our $original_XML_LibXML_Document_toString;       # [CONFIGURATION]
our $original_XML_LibXML_Element_getAttribute;    # [CONFIGURATION]
our $original_XML_LibXML_Element_hasAttribute;    # [CONFIGURATION]
our $original_XML_LibXML_Element_setAttribute;    # [CONFIGURATION]

BEGIN {
  *original_XML_LibXML_Document_toString    = *XML::LibXML::Document::toString;
  *original_XML_LibXML_Element_getAttribute = *XML::LibXML::Element::getAttribute;
  *original_XML_LibXML_Element_hasAttribute = *XML::LibXML::Element::hasAttribute;
  *original_XML_LibXML_Element_setAttribute = *XML::LibXML::Element::setAttribute;
}

# As of 1.63, LibXML converts a document "to String" as bytes, not characters (?)
sub encoding_XML_LibXML_Document_toString {
  my ($self, $depth) = @_;
  #  Encode::encode("utf-8", $self->original_XML_LibXML_Document_toString($depth)); }
  return Encode::encode("utf-8", original_XML_LibXML_Document_toString($self, $depth)); }

# As of 1.59, element attribute methods accept attribute names as "xml:foo"
# (in particular, xml:id), without explicitly calling the NS versions.
# The new form is considerably more convenient.
sub xmlns_XML_LibXML_Element_getAttribute {
  my ($self, $name) = @_;
  if ($name =~ /^xml:(.*)$/) {
    my $attr = $1;
    return $self->getAttributeNS($LaTeXML::Common::XML::XML_NS, $attr); }
  else {
    return original_XML_LibXML_Element_getAttribute($self, $name); } }

sub xmlns_XML_LibXML_Element_hasAttribute {
  my ($self, $name) = @_;
  if ($name =~ /^xml:(.*)$/) {
    my $attr = $1;
    return $self->hasAttributeNS($LaTeXML::Common::XML::XML_NS, $attr); }
  else {
    return original_XML_LibXML_Element_hasAttribute($self, $name); } }

sub xmlns_XML_LibXML_Element_setAttribute {
  my ($self, $name, $value) = @_;
  if ($name =~ /^xml:(.*)$/) {
    my $attr = $1;
    return $self->setAttributeNS($LaTeXML::Common::XML::XML_NS, $attr, $value); }
  else {
    return original_XML_LibXML_Element_setAttribute($self, $name, $value); } }

our $xml_libxml_version;    # [CONFIGURATION]

BEGIN {
  $xml_libxml_version = $XML::LibXML::VERSION;
  $xml_libxml_version =~ s/_\d+$//;
  ###  print STDERR "XML::LibXML Version $XML::LibXML::VERSION => $xml_libxml_version\n";
  if ($xml_libxml_version < 1.63) {
    *XML::LibXML::Document::toString = *encoding_XML_LibXML_Document_toString; }
  if ($xml_libxml_version < 1.59) {
    *XML::LibXML::Element::getAttribute = *xmlns_XML_LibXML_Element_getAttribute;
    *XML::LibXML::Element::hasAttribute = *xmlns_XML_LibXML_Element_hasAttribute;
    *XML::LibXML::Element::setAttribute = *xmlns_XML_LibXML_Element_setAttribute; } }

#======================================================================
1;
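
=pod

The PATCH section above installs replacement methods only when the installed
library is older than a given version. The following is a minimal,
self-contained sketch of that idea, not the module's own code:
C<Toy::Element>, its version number and C<patched_getAttribute> are
hypothetical stand-ins for illustration, and a code reference is used to keep
the original method instead of the glob aliasing done above.

```perl
use strict;
use warnings;

# Hypothetical toy class standing in for a library class (NOT XML::LibXML).
package Toy::Element;
our $VERSION = '1.58';
sub new { my ($class) = @_; return bless { attr => {} }, $class; }
sub getAttribute { my ($self, $name) = @_; return $$self{attr}{$name}; }
sub setAttribute { my ($self, $name, $value) = @_; $$self{attr}{$name} = $value; return; }

package main;

# Keep a handle on the original implementation before replacing it.
my $original_getAttribute = \&Toy::Element::getAttribute;

# Replacement: strip a leading "xml:" and delegate to the original method.
sub patched_getAttribute {
  my ($self, $name) = @_;
  $name =~ s/^xml://;
  return $original_getAttribute->($self, $name); }

# Drop any developer-release suffix (eg. 1.58_01), then patch old versions only.
(my $version = $Toy::Element::VERSION) =~ s/_\d+$//;
if ($version < 1.59) {
  no warnings 'redefine';
  *Toy::Element::getAttribute = \&patched_getAttribute; }

my $element = Toy::Element->new();
$element->setAttribute(id => 'x1');
print $element->getAttribute('xml:id'), "\n";
```

Keeping the version test in one place means callers can use the newer calling
convention everywhere, whichever library version is installed.

=cut
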
latexml-0.8.1/lib/LaTeXML/Common/XML/Parser.pm
# /=====================================================================\ #
# | LaTeXML::Common::XML::Parser                                        | #
# | XML Parser (wrapper for XML::LibXML)                                | #
# |=====================================================================| #
# | Part of LaTeXML:                                                    | #
# | Public domain software, produced as part of work done by the        | #
# | United States Government & not subject to copyright in the US.      | #
# |---------------------------------------------------------------------| #
# | Bruce Miller                                                #_#     | #
# | http://dlmf.nist.gov/LaTeXML/                              (o o)    | #
# \=========================================================ooo==U==ooo=/ #
package LaTeXML::Common::XML::Parser;
use strict;
use warnings;
use XML::LibXML;

sub new {
  my ($class) = @_;
  my $parser = XML::LibXML->new();
  $parser->validation(0);
  return bless { parser => $parser }, $class; }

sub parseFile {
  my ($self, $file) = @_;
  LaTeXML::Common::XML::initialize_catalogs();
  #  LaTeXML::Common::XML::initialize_input_callbacks($$self{parser});
  return $$self{parser}->parse_file($file); }

sub parseString {
  my ($self, $string) = @_;
  return $$self{parser}->parse_string($string); }

sub parseChunk {
  my ($self, $string) = @_;
  my $hasxmlns = $string =~ /\Wxml:id\W/;
  #  print STDERR "\nFISHY!!\n" if $hasxmlns;
  my $xml = $$self{parser}->parse_xml_chunk($string);
  # Simplify, if we get a single node Document Fragment.
  #[which we, apparently, always do]
  if ($xml && (ref $xml eq 'XML::LibXML::DocumentFragment')) {
    my @k = $xml->childNodes;
    $xml = $k[0] if (scalar(@k) == 1); }
  #  $xml = $xml->cloneNode(1); ####
  # In 1.58, the prefix for the XML_NS, which should be DEFINED to be "xml"
  # is sometimes unbound, leading to mysterious segfaults!!!
  ###  if (($xml_libxml_version < 1.59) && $hasxmlns) {
  if (($LaTeXML::Common::XML::xml_libxml_version < 1.59) && $hasxmlns) {
    #print STDERR "Patchup...\n";
    # Re-create all xml:id entries, hopefully with correct NS!
    # We assume all id are, in fact, xml:id,
    # because we seemingly can't probe the namespace!
    foreach my $attr ($xml->findnodes("descendant-or-self::*/attribute::*[local-name()='id']")) {
      my $element = $attr->parentNode;
      my $id      = $attr->getValue();
      #print STDERR "RESET ID: $id\n";
      $attr->unbindNode();
      $element->setAttributeNS($LaTeXML::Common::XML::XML_NS, 'id', $id); }
    #  print STDERR "\nXML: ".$xml->toString."\n";
  }
  return $xml; }

#======================================================================
1;
latexml-0.8.1/lib/LaTeXML/Common/XML/RelaxNG.pm
# /=====================================================================\ #
# | LaTeXML::Common::XML::RelaxNG                                       | #
# | wrapper for XML::LibXML                                             | #
# |=====================================================================| #
# | Part of LaTeXML:                                                    | #
# | Public domain software, produced as part of work done by the        | #
# | United States Government & not subject to copyright in the US.      | #
# |---------------------------------------------------------------------| #
# | Bruce Miller                                                #_#     | #
# | http://dlmf.nist.gov/LaTeXML/                              (o o)    | #
# \=========================================================ooo==U==ooo=/ #
package LaTeXML::Common::XML::RelaxNG;
use strict;
use warnings;
use XML::LibXML;
use LaTeXML::Util::Pathname;

# Note: XML::LibXML::RelaxNG->new(...) takes
#  location=>$filename_or_url;
#  string=>$schemastring
#  DOM=>$doc
# options: nocatalogs, searchpaths

# Create a Wrapper for a RelaxNG,
# containing the XML document representing the schema
# deferring converting it to an actual RelaxNG object.
sub new {
  my ($class, $name, %options) = @_;
  LaTeXML::Common::XML::initialize_catalogs();
  my $xmlparser = LaTeXML::Common::XML::Parser->new();
  my $schemadoc;
  $name .= ".rng" unless $name =~ /\.rng$/;
  # First, try to load directly, in case it's found via libxml's catalogs...
  # But be careful calling C library; its failures are harder to trap w/eval
  if (!$options{nocatalogs}) {
    $schemadoc = eval {
      no warnings 'all';
      local $SIG{'__DIE__'} = undef;
      $xmlparser->parseFile($name); }; }
  if (!$schemadoc) {
    if (my $path = pathname_find($name, paths => $options{searchpaths} || ['.'],
        types               => ['rng'],    # Eventually, rnc?
        installation_subdir => 'resources/RelaxNG')) {
      # Hopefully, just a file, not a URL?
      $schemadoc = $xmlparser->parseFile($path); }
    else {
      return;    # ???
    } }
  return bless { schemadoc => $schemadoc }, $class; }

sub validate {
  my ($self, $document) = @_;
  # Lazy conversion of the Schema's XML doc into an actual RelaxNG object.
  if (!$$self{schema} && $$self{schemadoc}) {
    $$self{schema} = XML::LibXML::RelaxNG->new(DOM => $$self{schemadoc}); }
  return $$self{schema}->validate($document); }

# This returns the root element of the XML document representing the schema!
sub documentElement {
  my ($self) = @_;
  return $$self{schemadoc}->documentElement; }

sub URI {
  my ($self) = @_;
  return $$self{schemadoc}->URI; }

#======================================================================
1;
latexml-0.8.1/lib/LaTeXML/Common/XML/XPath.pm
# /=====================================================================\ #
# | LaTeXML::Common::XML::XPath                                         | #
# | XML Parser (wrapper for XML::LibXML)                                | #
# |=====================================================================| #
# | Part of LaTeXML:                                                    | #
# | Public domain software, produced as part of work done by the        | #
# | United States Government & not subject to copyright in the US.      | #
# |---------------------------------------------------------------------| #
# | Bruce Miller                                                #_#     | #
# | http://dlmf.nist.gov/LaTeXML/                              (o o)    | #
# \=========================================================ooo==U==ooo=/ #
package LaTeXML::Common::XML::XPath;
use strict;
use warnings;
use XML::LibXML::XPathContext;

sub new {
  my ($class, %mappings) = @_;
  my $context = XML::LibXML::XPathContext->new();
  foreach my $prefix (keys %mappings) {
    $context->registerNs($prefix => $mappings{$prefix}); }
  return bless { context => $context }, $class; }

sub registerNS {
  my ($self, $prefix, $url) = @_;
  $$self{context}->registerNs($prefix => $url);
  return; }

sub registerFunction {
  my ($self, $name, $function) = @_;
  $$self{context}->registerFunction($name => $function);
  return; }

sub findnodes {
  my ($self, $xpath, $node) = @_;
  return $$self{context}->findnodes($xpath, $node); }

sub findvalue {
  my ($self, $xpath, $node) = @_;
  return $$self{context}->findvalue($xpath, $node); }

#======================================================================
1;
latexml-0.8.1/lib/LaTeXML/Common/XML/XSLT.pm
# /=====================================================================\ #
# | LaTeXML::Common::XML::XSLT                                          | #
# | wrapper for XML::LibXML                                             | #
# |=====================================================================| #
# | Part of LaTeXML:                                                    | #
# | Public domain software, produced as part of work done by the        | #
# | United States Government & not subject to copyright in the US.      | #
# |---------------------------------------------------------------------| #
# | Bruce Miller                                                #_#     | #
# | http://dlmf.nist.gov/LaTeXML/                              (o o)    | #
# \=========================================================ooo==U==ooo=/ #
package LaTeXML::Common::XML::XSLT;
use strict;
use warnings;
use XML::LibXSLT;

sub new {
  my ($class, $stylesheet) = @_;
  my $xslt = XML::LibXSLT->new();
  LaTeXML::Common::XML::initialize_catalogs();
  #  LaTeXML::Common::XML::initialize_input_callbacks($xslt,installation_subdir => 'resources/XSLT');
  # Do we still need this logic, if callbacks work?
  if (!ref $stylesheet) {
    $stylesheet = LaTeXML::Common::XML::Parser->new()->parseFile($stylesheet); }
  #    $stylesheet = $xslt->parse_stylesheet_file($stylesheet); }
  if (ref $stylesheet eq 'XML::LibXML::Document') {
    $stylesheet = $xslt->parse_stylesheet($stylesheet); }
  return bless { stylesheet => $stylesheet }, $class; }

sub transform {
  my ($self, $document, %params) = @_;
  return $$self{stylesheet}->transform($document, %params); }

#======================================================================
1;
latexml-0.8.1/lib/LaTeXML/Core.pm
# /=====================================================================\ #
# | LaTeXML                                                             | #
# | Core Module for TeX conversion                                      | #
# |=====================================================================| #
# | Part of LaTeXML:                                                    | #
# | Public domain software, produced as part of work done by the        | #
# | United States Government & not subject to copyright in the US.      | #
# |---------------------------------------------------------------------| #
# | Bruce Miller                                                #_#     | #
# | http://dlmf.nist.gov/LaTeXML/                              (o o)    | #
# \=========================================================ooo==U==ooo=/ #
package LaTeXML::Core;
use strict;
use warnings;
use LaTeXML::Global;
#use LaTeXML::Common::Object;
use LaTeXML::Common::Error;
use LaTeXML::Core::State;
use LaTeXML::Core::Token;
use LaTeXML::Core::Tokens;
use LaTeXML::Core::Stomach;
use LaTeXML::Core::Document;
use LaTeXML::Common::Model;
use LaTeXML::MathParser;
use LaTeXML::Util::Pathname;
use LaTeXML::Pre::BibTeX;
use LaTeXML::Package;    # !!!!
use LaTeXML::Version;
use Encode;
use FindBin;
use base qw(LaTeXML::Common::Object);

#**********************************************************************
sub new {
  my ($class, %options) = @_;
  my $state = LaTeXML::Core::State->new(catcodes => 'standard',
    stomach => LaTeXML::Core::Stomach->new(),
    model   => $options{model} || LaTeXML::Common::Model->new());
  $state->assignValue(VERBOSITY => (defined $options{verbosity} ? $options{verbosity} : 0),
    'global');
  $state->assignValue(STRICT => (defined $options{strict} ? $options{strict} : 0),
    'global');
  $state->assignValue(INCLUDE_COMMENTS => (defined $options{includeComments} ? $options{includeComments} : 1),
    'global');
  $state->assignValue(DOCUMENTID => (defined $options{documentid} ?
      $options{documentid} : ''),
    'global');
  $state->assignValue(SEARCHPATHS => [map { pathname_absolute(pathname_canonical($_)) }
        @{ $options{searchpaths} || [] }],
    'global');
  $state->assignValue(GRAPHICSPATHS => [map { pathname_absolute(pathname_canonical($_)) }
        @{ $options{graphicspaths} || [] }],
    'global');
  $state->assignValue(INCLUDE_STYLES => $options{includeStyles} || 0, 'global');
  $state->assignValue(PERL_INPUT_ENCODING => $options{inputencoding}) if $options{inputencoding};
  return bless { state => $state,
    nomathparse => $options{nomathparse} || 0,
    preload     => $options{preload},
  }, $class; }

#%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
# High-level API.

sub convertAndWriteFile {
  my ($self, $file) = @_;
  $file =~ s/\.tex$//;
  my $dom = $self->convertFile($file);
  $dom->toFile("$file.xml", 1) if $dom;
  return $dom; }

sub convertFile {
  my ($self, $file) = @_;
  my $digested = $self->digestFile($file);
  return unless $digested;
  return $self->convertDocument($digested); }

sub getStatusMessage {
  my ($self) = @_;
  return $$self{state}->getStatusMessage; }

sub getStatusCode {
  my ($self) = @_;
  return $$self{state}->getStatusCode; }

# You'd typically do this after both digestion AND conversion...
sub showProfile {
  my ($self, $digested) = @_;
  return
    $self->withState(sub {
      LaTeXML::Core::Definition::showProfile();    # Show profile (if any)
    }); }

#%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
# Mid-level API.

# options are currently being evolved to accommodate the Daemon:
#    mode  : the processing mode, ie the pool to preload: TeX or BibTeX
#    noinitialize : if defined, it does not initialize State.
#    preamble = names a tex file (or standard_preamble.tex)
#    postamble = names a tex file (or standard_postamble.tex)

our %MODE_EXTENSION = (    # CONFIGURATION?
  TeX => 'tex', LaTeX => 'tex', AmSTeX => 'tex', BibTeX => 'bib');

sub digestFile {
  my ($self, $request, %options) = @_;
  my ($dir, $name, $ext);
  my $mode = $options{mode} || 'TeX';
  if (pathname_is_literaldata($request)) {
    $dir  = undef;
    $ext  = $MODE_EXTENSION{$mode};
    $name = "Anonymous String"; }
  elsif (pathname_is_url($request)) {
    $dir  = undef;
    $ext  = $MODE_EXTENSION{$mode};
    $name = $request; }
  else {
    $request =~ s/\.\Q$MODE_EXTENSION{$mode}\E$//;
    if (my $pathname = pathname_find($request, types => [$MODE_EXTENSION{$mode}, ''])) {
      $request = $pathname;
      ($dir, $name, $ext) = pathname_split($request); }
    else {
      $self->withState(sub {
          Fatal('missing_file', $request, undef, "Can't find $mode file $request"); }); } }
  return
    $self->withState(sub {
      my ($state) = @_;
      NoteBegin("Digesting $mode $name");
      $self->initializeState($mode . ".pool", @{ $$self{preload} || [] }) unless $options{noinitialize};
      $state->assignValue(SOURCEFILE      => $request) if (!pathname_is_literaldata($request));
      $state->assignValue(SOURCEDIRECTORY => $dir)     if defined $dir;
      $state->unshiftValue(SEARCHPATHS => $dir)
        if defined $dir && !grep { $_ eq $dir } @{ $state->lookupValue('SEARCHPATHS') };
      $state->unshiftValue(GRAPHICSPATHS => $dir)
        if defined $dir && !grep { $_ eq $dir } @{ $state->lookupValue('GRAPHICSPATHS') };

      $state->installDefinition(LaTeXML::Core::Definition::Expandable->new(T_CS('\jobname'), undef,
          Tokens(Explode($name))));
      # Reverse order, since last opened is first read!
      $self->loadPostamble($options{postamble}) if $options{postamble};
      LaTeXML::Package::InputContent($request);
      $self->loadPreamble($options{preamble}) if $options{preamble};

      # Now for the Hacky part for BibTeX!!!
      if ($mode eq 'BibTeX') {
        my $bib = LaTeXML::Pre::BibTeX->newFromGullet($name, $state->getStomach->getGullet);
        LaTeXML::Package::InputContent("literal:" .
            $bib->toTeX); }
      my $list = $self->finishDigestion;
      NoteEnd("Digesting $mode $name");
      return $list; }); }

sub finishDigestion {
  my ($self)  = @_;
  my $state   = $$self{state};
  my $stomach = $state->getStomach;
  my @stuff   = ();
  while ($stomach->getGullet->getMouth->hasMoreInput) {
    push(@stuff, $stomach->digestNextBody); }
  if (my $env = $state->lookupValue('current_environment')) {
    Error('expected', "\\end{$env}", $stomach,
      "Input ended while environment $env was open"); }
  my $ifstack = $state->lookupValue('if_stack');
  if ($ifstack && $$ifstack[0]) {
    Error('expected', '\fi', $stomach,
      "Input ended while conditional " . ToString($$ifstack[0]{token}) . " was incomplete",
      "started at " . ToString($$ifstack[0]{start})); }
  $stomach->getGullet->flush;
  return List(@stuff); }

sub loadPreamble {
  my ($self, $preamble) = @_;
  my $gullet = $$self{state}->getStomach->getGullet;
  if ($preamble eq 'standard_preamble.tex') {
    $preamble = 'literal:\documentclass{article}\begin{document}'; }
  return LaTeXML::Package::InputContent($preamble); }

sub loadPostamble {
  my ($self, $postamble) = @_;
  my $gullet = $$self{state}->getStomach->getGullet;
  if ($postamble eq 'standard_postamble.tex') {
    $postamble = 'literal:\end{document}'; }
  return LaTeXML::Package::InputContent($postamble); }

sub convertDocument {
  my ($self, $digested) = @_;
  return
    $self->withState(sub {
      my ($state)  = @_;
      my $model    = $state->getModel;    # The document model.
      my $document = LaTeXML::Core::Document->new($model);
      local $LaTeXML::DOCUMENT = $document;
      NoteBegin("Building");
      $model->loadSchema();               # If needed?
      if (my $paths = $state->lookupValue('SEARCHPATHS')) {
        if ($state->lookupValue('INCLUDE_COMMENTS')) {
          $document->insertPI('latexml', searchpaths => join(',', @$paths)); } }
      foreach my $preload (@{ $$self{preload} }) {
        next if $preload =~ /\.pool$/;
        my $options = undef;    # Stupid perlcritic policy
        if ($preload =~ s/^\[([^\]]*)\]//) { $options = $1; }
        if ($preload =~ s/\.cls$//) {
          $document->insertPI('latexml', class => $preload, ($options ? (options => $options) : ())); }
        else {
          $preload =~ s/\.sty$//;
          $document->insertPI('latexml', package => $preload, ($options ? (options => $options) : ())); } }
      $document->absorb($digested);
      NoteEnd("Building");

      if (my $rules = $state->lookupValue('DOCUMENT_REWRITE_RULES')) {
        NoteBegin("Rewriting");
        $document->markXMNodeVisibility;
        foreach my $rule (@$rules) {
          $rule->rewrite($document, $document->getDocument->documentElement); }
        NoteEnd("Rewriting"); }

      LaTeXML::MathParser->new()->parseMath($document) unless $$self{nomathparse};
      NoteBegin("Finalizing");
      my $xmldoc = $document->finalize();
      NoteEnd("Finalizing");
      return $xmldoc; }); }

sub withState {
  my ($self, $closure) = @_;
  local $STATE = $$self{state};
  # And, set fancy error handler for ANY die!
  # Could be useful to distill the more common messages so they provide useful build statistics?
  local $SIG{__DIE__}  = sub { LaTeXML::Common::Error::perl_die_handler(@_); };
  local $SIG{INT}      = sub { LaTeXML::Common::Error::Fatal('perl', 'interrupt', undef, "LaTeXML was interrupted", @_); };
  local $SIG{__WARN__} = sub { LaTeXML::Common::Error::perl_warn_handler(@_); };
  local $LaTeXML::DUAL_BRANCH = '';

  return &$closure($STATE); }

sub initializeState {
  my ($self, @files) = @_;
  my $state   = $$self{state};
  my $stomach = $state->getStomach;    # The current Stomach;
  my $gullet  = $stomach->getGullet;
  $stomach->initialize;
  my $paths = $state->lookupValue('SEARCHPATHS');
  foreach my $preload (@files) {
    my ($options, $type);
    $options = $1 if $preload =~ s/^\[([^\]]*)\]//;
    $type = ($preload =~ s/\.(\w+)$// ? $1 : 'sty');
    my $handleoptions = ($type eq 'sty') || ($type eq 'cls');
    if ($options) {
      if ($handleoptions) {
        $options = [split(/,/, $options)]; }
      else {
        Warn('unexpected', 'options',
          "Attempting to pass options to $preload.$type (not style or class)",
          "The options were [$options]"); } }
    # Attach extension back if HTTP protocol:
    if (pathname_is_url($preload)) {
      $preload .= '.' . $type; }
    LaTeXML::Package::InputDefinitions($preload, type => $type,
      handleoptions => $handleoptions, options => $options); }
  return; }

sub writeDOM {
  my ($self, $dom, $name) = @_;
  $dom->toFile("$name.xml", 1);
  return 1; }

#**********************************************************************
# Should post processing be managed from here too?
# Problem: with current DOM setup, I pretty much have to write the
# file and reread it anyway...
# Also, want to inhibit loading an extreme number of classes if not needed.
#**********************************************************************
1;

__END__

=pod

=head1 NAME

C<LaTeXML::Core> - transforms TeX into XML.

=head1 SYNOPSIS

    use LaTeXML::Core;
    my $latexml = LaTeXML::Core->new();
    $latexml->convertAndWriteFile("adocument");

But also see the convenient command line script L which suffices for most purposes.
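
Entries of the C<preload> option may carry options and a file type, as in
C<[a4paper,12pt]article.cls>. The following standalone sketch mirrors how
C<initializeState> takes such an entry apart; C<parse_preload> is a
hypothetical helper written for illustration and is not part of LaTeXML.

```perl
use strict;
use warnings;

# Hypothetical helper (not a LaTeXML function): split "[opts]name.ext"
# into its parts, following the parsing done in initializeState.
sub parse_preload {
  my ($preload) = @_;
  my ($options, $type);
  $options = $1 if $preload =~ s/^\[([^\]]*)\]//;      # leading [opt1,opt2]
  $type = ($preload =~ s/\.(\w+)$// ? $1 : 'sty');      # trailing .ext, else 'sty'
  my $handleoptions = ($type eq 'sty') || ($type eq 'cls');
  # Only styles and classes get their options split into a list;
  # for other types the option string is left untouched here.
  $options = [split(/,/, $options)] if $options && $handleoptions;
  return { name => $preload, type => $type, options => $options }; }

my $spec = parse_preload('[a4paper,12pt]article.cls');
print "$$spec{name} ($$spec{type}): @{ $$spec{options} }\n";
```

An entry without an extension, such as C<amsmath>, is treated as a style file
with no options.
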
=head1 DESCRIPTION

=head2 METHODS

=over 4

=item C<< my $latexml = LaTeXML::Core->new(%options); >>

Creates a new LaTeXML object for transforming TeX files into XML.

 verbosity  : Controls verbosity; higher is more verbose,
              smaller is quieter. 0 is the default.
 strict     : If true, undefined control sequences and
              invalid document constructs give fatal
              errors, instead of warnings.
 includeComments : If false, comments will be excluded
              from the result document.
 preload    : an array of modules to preload
 searchpaths : an array of paths to be searched for Packages
              and style files.

(these generally set config variables in the L object)

=item C<< $latexml->convertAndWriteFile($file); >>

Reads the TeX file C<$file>.tex, digests and converts it to XML, and saves it in C<$file>.xml.

=item C<< $doc = $latexml->convertFile($file); >>

Reads the TeX file C<$file>, digests and converts it to XML and returns the
resulting L.

=item C<< $doc = $latexml->convertString($string); >>

B Use C<< $latexml->convertFile("literal:$string"); >> instead.

=item C<< $latexml->writeDOM($doc,$name); >>

Writes the XML document to $name.xml.

=item C<< $box = $latexml->digestFile($file); >>

Reads the TeX file C<$file>, and digests it returning the L representation.

=item C<< $box = $latexml->digestString($string); >>

B Use C<< $latexml->digestFile("literal:$string"); >> instead.

=item C<< $doc = $latexml->convertDocument($digested); >>

Converts C<$digested> (the L representation) into XML,
returning the L.

=back

=head2 Customization

In the simplest case, LaTeXML will understand your source file and convert it
automatically. With more complicated (realistic) documents, you will likely
need to make document specific declarations for it to understand local macros,
your mathematical notations, and so forth. Before processing a file I,
LaTeXML reads the file I, if present. Likewise, the LaTeXML implementation of
a TeX style file, say I is provided by a file I.
See L for documentation of these customization and implementation files.

=head1 SEE ALSO

See L for a simple command line script.

See L for documentation of these customization and implementation files.

For cases when the high-level declarations described in L
are not enough, or for understanding more of LaTeXML's internals, see

=over 2

=item L

maintains the current state of processing, bindings or
variables, definitions, etc.

=item L, L and L

deal with tokens, tokenization of strings and files, and
basic TeX sequences such as arguments, dimensions and so forth.

=item L and L

deal with digestion of tokens into boxes.

=item L, L, L

dealing with conversion of the digested boxes into XML.

=item L and L

representation of LaTeX macros, primitives, registers and constructors.

=item L

the math parser.

=item L, L, L

other random modules.

=back

=head1 AUTHOR

Bruce Miller

=head1 COPYRIGHT

Public domain software, produced as part of work done by the
United States Government & not subject to copyright in the US.
=cut
latexml-0.8.1/lib/LaTeXML/Core/0000755000175000017500000000000012507513572015751 5ustar norbertnorbert
latexml-0.8.1/lib/LaTeXML/Core/Alignment.pm0000644000175000017500000012264212507513572020234 0ustar norbertnorbert
# /=====================================================================\ #
# | LaTeXML::Core::Alignment                                            | #
# | Support for tabular/array environments                              | #
# |=====================================================================| #
# | Part of LaTeXML:                                                    | #
# | Public domain software, produced as part of work done by the        | #
# | United States Government & not subject to copyright in the US.
| #
# |---------------------------------------------------------------------| #
# | Bruce Miller                                                #_#     | #
# | http://dlmf.nist.gov/LaTeXML/                              (o o)    | #
# \=========================================================ooo==U==ooo=/ #
package LaTeXML::Core::Alignment;
use strict;
use warnings;
use LaTeXML::Global;
use LaTeXML::Common::Error;
use LaTeXML::Core::Token;
use LaTeXML::Core::Tokens;
use LaTeXML::Common::Object;
use LaTeXML::Common::XML;
use LaTeXML::Common::Dimension;
use LaTeXML::Core::Alignment::Template;
use List::Util qw(max sum);
use base qw(LaTeXML::Core::Whatsit);
use base qw(Exporter);
our @EXPORT = (qw(
    &constructAlignment
    &ReadAlignmentTemplate &MatrixTemplate));

#%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
# An "Alignment" is an array/tabular construct as:
#
#   ...
# or, for math mode
#   ...
# (where initially, each XMCell will contain an XMArg to indicate
# individual parsing of each cell's content is desired)
#
# An Alignment object is a sort of fake Whatsit;
# It takes some magic to sneak it into the Digestion stream
# (see TeX.pool \@open@alignment), but it needs to be created
# BEFORE the contents of the alignment are digested,
# since we stuff a lot of information into it
# (row, column boxes, borders, spacing, etc...)
# But once it has been captured, it should otherwise act
# like a Whatsit and be responsible for construction (beAbsorbed),
# and sizing estimation (computeSize)
#
# Ultimately, this should be better tied into DefConstructor
# because an Alignment currently doesn't know what CS created it (debugging!);
# Also, it would better connect the things being constructed, reversion, etc.
#======================================================================
# Create a new Alignment.
# %data can contain:
#    template : an Alignment::Template object
#    openContainer  = sub($doc,%attrib); creates the container element with given attributes
#    closeContainer = sub($doc); closes the container
#    openRow        = sub($doc,%attrib); creates the row element with given attributes
#    closeRow       = closes the row
#    openColumn     = sub($doc,%attrib); creates the column element with given attributes
#    closeColumn    = closes the column
#    attributes = hashref containing extra attributes for the container element.
sub new {
  my ($class, %data) = @_;
  my $self = bless {%data}, $class;
  $$self{template} = LaTeXML::Core::Alignment::Template->new() unless $$self{template};
  $$self{template} = parseAlignmentTemplate($$self{template}) unless ref $$self{template};
  $$self{rows}           = [];
  $$self{current_column} = 0;
  $$self{current_row}    = undef;
  $$self{properties}     = {} unless $$self{properties};
  # Copy any attribute width, height, depth to main properties.
if (my $attributes = $$self{properties}{attributes}) { $$self{properties}{width} = $$attributes{width} if $$attributes{width}; $$self{properties}{height} = $$attributes{height} if $$attributes{height}; $$self{properties}{depth} = $$attributes{depth} if $$attributes{depth}; } return $self; } #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # Alignment specific accessors sub getTemplate { my ($self, $template) = @_; return $$self{template}; } sub currentRow { my ($self) = @_; return $$self{current_row}; } sub newRow { my ($self) = @_; my $row = $$self{template}->clone; $$self{current_row} = $row; $$self{current_column} = 0; push(@{ $$self{rows} }, $row); return $row; } sub removeRow { my ($self) = @_; my @rows = @{ $$self{rows} }; if (@rows) { my $row = pop(@rows); $$self{rows} = [@rows]; return $row; } else { return; } } sub prependRows { my ($self, @rows) = @_; unshift(@{ $$self{rows} }, @rows); return; } sub appendRows { my ($self, @rows) = @_; push(@{ $$self{rows} }, @rows); return; } sub rows { my ($self) = @_; return @{ $$self{rows} }; } sub addLine { my ($self, $border, @cols) = @_; my $row = $$self{current_row}; if (@cols) { foreach my $c (@cols) { my $colspec = $row->column($c); $$colspec{border} .= $border; } } else { foreach my $colspec (@{ $$row{columns} }) { $$colspec{border} .= $border; } } return; } sub nextColumn { my ($self) = @_; my $colspec = $$self{current_row}->column(++$$self{current_column}); if (!$colspec) { Error('unexpected', '&', $STATE->getStomach->getGullet, "Extra alignment tab '&'"); $$self{current_row}->addColumn(align => 'center'); $colspec = $$self{current_row}->column($$self{current_column}); } return $colspec; } sub currentColumnNumber { my ($self) = @_; return $$self{current_column}; } sub currentRowNumber { my ($self) = @_; return scalar(@{ $$self{rows} }); } sub currentColumn { my ($self) = @_; return $$self{current_row}->column($$self{current_column}); } sub getColumn { my ($self, $n) = @_; return 
$$self{current_row}->column($n); } # Ugh... these take boxes; adding before/after columns takes tokens! sub addBeforeRow { my ($self, @boxes) = @_; $$self{current_row}{before} = [@{ $$self{current_row}{before} || [] }, @boxes]; return; } sub addAfterRow { my ($self, @boxes) = @_; $$self{current_row}{after} = [@{ $$self{current_row}{after} || [] }, @boxes]; return; } #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # Making the Alignment act like a Whatsit sub toString { my ($self) = @_; return "Alignment[]"; } # Methods for overloaded operators sub stringify { my ($self) = @_; return "Alignment[]"; } sub revert { my ($self) = @_; return $self->getBody->revert; } sub computeSize { my ($self, %options) = @_; $self->normalizeAlignment; my $props = $self->getPropertiesRef; my @rowheights = (); my @colwidths = (); # add \baselineskip between rows? Or max the row heights with it ... my $base = $STATE->lookupDefinition(T_CS('\baselineskip'))->valueOf->valueOf; foreach my $row (@{ $$self{rows} }) { # Do we need to account for any space in the $$row{before} or $$row{after}? my @cols = @{ $$row{columns} }; my $ncols = scalar(@cols); if (my $short = $ncols - scalar(@colwidths)) { push(@colwidths, map { 0 } 1 .. $short); } my ($rowh, $rowd) = ($base * 0.7, $base * 0.3); for (my $i = 0 ; $i < $ncols ; $i++) { my $cell = $cols[$i]; next if $$cell{skipped}; next unless $$cell{boxes}; my ($w, $h, $d) = $$cell{boxes}->getSize(align => $$cell{align}, width => $$cell{width}, vattach => $$cell{vattach}); if (($$cell{colspan} || 1) == 1) { $colwidths[$i] = max($colwidths[$i], $w->valueOf); } else { } # Could check afterwards that spanned columns are wide enough? if (($$cell{rowspan} || 1) == 1) { $rowh = max($rowh, $h->valueOf); $rowd = max($rowd, $d->valueOf); } else { } # Ditto spanned rows } push(@rowheights, $rowh + $rowd + 0.5 * $base); } # somehow our heights are way too short???? # Should we check for space for spanned rows & columns ? 
# sum my $ww = Dimension(sum(@colwidths)); my $hh = Dimension(sum(@rowheights)); my $dd = Dimension(0); $$props{width} = $ww unless defined $$props{width}; $$props{height} = $hh unless defined $$props{height}; $$props{depth} = $dd unless defined $$props{depth}; return; } #====================================================================== # Constructing the XML for the alignment. sub beAbsorbed { my ($self, $document) = @_; my $attr = $self->getProperty('attributes'); my $body = $self->getBody; my $ismath = $$self{isMath}; $self->normalizeAlignment; # We _should_ attach boxes to the alignment and rows, # but (ATM) we've only got sensible boxes for the cells. &{ $$self{openContainer} }($document, ($attr ? %$attr : ())); foreach my $row (@{ $$self{rows} }) { &{ $$self{openRow} }($document, 'xml:id' => $$row{id}, refnum => $$row{refnum}, frefnum => $$row{frefnum}, rrefnum => $$row{rrefnum}); if (my $before = $$row{before}) { map { $document->absorb($_) } @$before; } foreach my $cell (@{ $$row{columns} }) { next if $$cell{skipped}; # Normalize the border attribute my $border = join(' ', sort(map { split(/ */, $_) } $$cell{border} || '')); $border =~ s/(.) \1/$1$1/g; my $empty = !$$cell{boxes} || !scalar($$cell{boxes}->unlist); $$cell{cell} = &{ $$self{openColumn} }($document, align => $$cell{align}, width => $$cell{width}, vattach => $$cell{vattach}, (($$cell{colspan} || 1) != 1 ? (colspan => $$cell{colspan}) : ()), (($$cell{rowspan} || 1) != 1 ? (rowspan => $$cell{rowspan}) : ()), ($border ? (border => $border) : ()), ($$cell{head} ? (thead => 'true') : ())); if (!$empty) { local $LaTeXML::BOX = $$cell{boxes}; $document->openElement('ltx:XMArg', rule => 'Anything,') if $ismath; # Hacky! 
$document->absorb($$cell{boxes}); $document->closeElement('ltx:XMArg') if $ismath; } &{ $$self{closeColumn} }($document); } if (my $after = $$row{after}) { map { $document->absorb($_) } @$after; } &{ $$self{closeRow} }($document); } my $node = &{ $$self{closeContainer} }($document); # If requested to guess headers & we're not nested inside another tabular # This should be an afterConstruct somewhere? if ($self->getProperty('guess_headers') && !$document->findnodes("ancestor::ltx:tabular", $node)) { # If no cells are already marked as being thead, apply heuristic if (!$document->findnodes('descendant::ltx:td[contains(@class,"thead")]', $node)) { guess_alignment_headers($document, $node, $self); } # Otherwise, if not a math array, group thead & tbody rows elsif (!$body->isMath) { # in case already marked w/thead|tbody alignment_regroup_rows($document, $node); } } return $node; } # Deprecated sub constructAlignment { my ($document, $body, %props) = @_; Info('deprecated', 'constructAlignment', $document, "The sub constructAlignment is a deprecated way to create Alignments", "See LaTeX.pool {tabular} for the new way."); my $alignment; # Find the alignment that is embedded somewhere within body! # See \@open@alignment for where this gets set into the Whatsit! 
  while (!($alignment = $body->getProperty('alignment'))) {
    ($body) = grep { $_->getProperty('alignment') } $body->unlist; }
  $$alignment{isMath} = 1 if $body->isMath;
  # Merge the specified props with what's already in the Alignment
  if (my $attr = $props{attributes}) {
    map { $$alignment{properties}{attributes}{$_} = $$attr{$_} } keys %$attr; }
  map { $$alignment{properties}{$_} = $props{$_} } grep { $_ ne 'attributes' } keys %props;
  return $alignment->beAbsorbed($document); }

#======================================================================
# Normalize an alignment before construction
# * scan for empty rows and collapse them
# * mark columns covered by row & column spans
# * tweak borders into the right places while doing this.
# Tasks:
#  (1) a trailing \\ in the alignment will generate an empty row.
#     Note that the trailing \\ is required to get an \hline at the bottom!
#     It is empty in the sense that no cells have "real" content
#     but may have content generated from the template!
#     This emptiness is sensed by inner@column.
#     So, if we find such an empty row, we need to remove it,
#     but copy its top border to a bottom border of the preceding row!
#  (2) Some table constructs, particularly Knuth's fancy ones,
#     have empty columns for spacing purposes.
#     These likely should be removed from the "logical" table we construct.
#     Here, emptiness should probably be that there is no text content
#     in the cells at all (template data is presumably meaningful).
#     But here, also, border data may need to be moved (but l/r borders)
#  (3) put border attributes in a "normal" form to ease use as html's class attribute.
#     I.e.: group by l/r/t/b w/ spaces between groups.

# NOTE: Another cleanup issue:
# With \halign, Knuth seems to like to introduce many empty columns for spacing.
# It may be useful to remove such columns?
# Probably have to sub normalizeAlignment { my ($self) = @_; return if $$self{normalized}; my @filtering = @{ $$self{rows} }; my @rows = (); while (my $row = shift(@filtering)) { foreach my $c (@{ $$row{columns} }) { # Fill in empty on completely empty columns $$c{empty} = 1 unless $$c{boxes} && $$c{boxes}->unlist; } if (grep { !$$_{empty} } @{ $$row{columns} }) { # Not empty! so keep it push(@rows, $row); } elsif (my $next = $filtering[0]) { # Remove empty row, but copy top border to NEXT row if ($$row{empty}) { # Only remove middle rows if EXPLICITLY marked (\noalign) my $nc = scalar(@{ $$row{columns} }); for (my $c = 0 ; $c < $nc ; $c++) { my $border = $$row{columns}[$c]{border} || ''; $border =~ s/[^tTbB]//g; # mask all but top & bottom border $border =~ s/./t/g; # but convert to top $$next{columns}[$c]{border} .= $border; } } # add to next row else { push(@rows, $row); } } else { # Remove empty last row, but copy top border to bottom of prev. my $prev = $rows[-1]; my $nc = scalar(@{ $$row{columns} }); for (my $c = 0 ; $c < $nc ; $c++) { my $border = $$row{columns}[$c]{border} || ''; $border =~ s/[^tT]//g; # mask all but top border $border =~ s/./b/g; # convert to bottom $$prev{columns}[$c]{border} .= $border; } } # add to previous row. } $$self{rows} = [@rows]; # Mark any cells that are covered by rowspans for (my $i = 0 ; $i < scalar(@rows) ; $i++) { my @row = @{ $rows[$i]->{columns} }; for (my $j = 0 ; $j < scalar(@row) ; $j++) { my $col = $row[$j]; # scan the row for spanned columns that contain spanned rows! 
if (my $nc = $$col{colspan} || 1) { if ($nc > 1) { foreach (my $jj = $j + 1 ; $jj < $j + $nc ; $jj++) { if (my $nr = $row[$jj]{rowspan}) { $$col{rowspan} = $nr; } } } } my $nr = $$col{rowspan} || 1; if ($nr > 1) { my $nc = $$col{colspan} || 1; for (my $ii = $i + 1 ; $ii < $i + $nr ; $ii++) { if (my $rrow = $rows[$ii]) { for (my $jj = $j ; $jj < $j + $nc ; $jj++) { if (my $ccol = $$rrow{columns}[$jj]) { $$ccol{skipped} = 1; } } } } # And, if the last (skipped) columns have a bottom border, copy that to the rowspanned col if (my $rrow = $rows[$i + $nr - 1]) { my $sborder = ''; for (my $jj = $j ; $jj < $j + $nc ; $jj++) { if (my $ccol = $$rrow{columns}[$jj]) { my $border = $$ccol{border} || ''; $border =~ s/[^bB]//g; # mask all but bottom border $sborder = $border unless $sborder; } } $$col{border} .= $sborder if $sborder; } } } } $$self{normalized} = 1; return; } #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # Dealing with templates # newcolumntype # defines \NC@rewrite@ # As macro # or "constructor" (or just sub that creates a column) sub ReadAlignmentTemplate { my ($gullet) = @_; $gullet->skipSpaces; local $LaTeXML::BUILD_TEMPLATE = LaTeXML::Core::Alignment::Template->new(columns => [], tokens => []); my @tokens = (T_BEGIN); my $nopens = 0; while (my $open = $gullet->readToken) { if ($open->equals(T_BEGIN)) { $nopens++; } else { $gullet->unread($open); last; } } my $defn; while (my $op = $gullet->readToken) { if ($op->equals(T_SPACE)) { } elsif ($op->equals(T_END)) { while (--$nopens && ($op = $gullet->readToken)->equals(T_END)) { } last unless $nopens; $gullet->unread($op); } elsif (defined($defn = $STATE->lookupDefinition(T_CS('\NC@rewrite@' . 
ToString($op)))) && $defn->isExpandable) { # A variation on $defn->invoke, so we can reconstruct the reversion my @args = $defn->readArguments($gullet); my @exp = $defn->doInvocation($gullet, @args); if (@exp) { # This just expanded into other stuff $gullet->unread(@exp); } else { push(@tokens, $op); if (my $param = $defn->getParameters) { push(@tokens, $param->revertArguments(@args)); } } } elsif ($op->equals(T_BEGIN)) { # Wrong, but a safety valve $gullet->unread($gullet->readBalanced->unlist); } else { Warn('unexpected', $op, $gullet, "Unrecognized tabular template '" . Stringify($op) . "'"); } last unless $nopens; } push(@tokens, T_END); $LaTeXML::BUILD_TEMPLATE->setReversion(@tokens); return $LaTeXML::BUILD_TEMPLATE; } sub parseAlignmentTemplate { my ($spec) = @_; return $STATE->getStomach->getGullet->readingFromMouth(LaTeXML::Core::Mouth->new("{" . $spec . "}"), sub { ReadAlignmentTemplate($_[0]); }); } sub MatrixTemplate { return LaTeXML::Core::Alignment::Template->new(repeated => [{ before => Tokens(T_CS('\hfil')), after => Tokens(T_CS('\hfil')) }]); } #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # Experimental alignment heading heuristications. #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # We attempt to recognize patterns of rows/columns that indicate which might be headers. # We'll characterize the cells by alignment, content and borders. # Then, assuming that headers will be first and be noticably `different' from data lines, # and also that the data lines will have similar structure, we'll attempt to # recognize groups of header lines and groups data lines, possibly alternating. sub guess_alignment_headers { my ($document, $table, $alignment) = @_; # Assume that headers don't make sense for nested tables. # OR Maybe we should only do this within table environments??? 
return if $document->findnodes("ancestor::ltx:tabular", $table); my $tag = $document->getModel->getNodeQName($table); my $x; print STDERR "\n" . ('=' x 50) . "\nGuessing alignment headers for " . (($x = $document->findnode('ancestor-or-self::*[@xml:id]', $table)) ? $x->getAttribute('xml:id') : $tag) . "\n" if $LaTeXML::Core::Alignment::DEBUG; my $ismath = $tag eq 'ltx:XMArray'; local $LaTeXML::TR = ($ismath ? 'ltx:XMRow' : 'ltx:tr'); local $LaTeXML::TD = ($ismath ? 'ltx:XMCell' : 'ltx:td'); my $reversed = 0; # Build a view of the table by extracting the rows, collecting & characterizing each cell. my @rows = collect_alignment_rows($document, $table, $alignment); # Flip the rows around to produce a column view. my @cols = (); return unless @rows; for (my $c = 0 ; $c < scalar(@{ $rows[0] }) ; $c++) { push(@cols, [map { $$_[$c] } @rows]); } # Attempt to recognize header lines. if (alignment_characterize_lines(0, 0, @rows)) { } # This usually does something unpleasant ## else { ## print STDERR "Retry characterizing lines in reverse\n" if $LaTeXML::Core::Alignment::DEBUG; ## $reversed=alignment_characterize_lines(0,1,reverse(@rows)); } alignment_characterize_lines(1, 0, @cols); # Did we go overboard? my %n = (h => 0, d => 0); foreach my $r (@rows) { foreach my $c (@$r) { $n{ $$c{cell_type} }++; } } print STDERR "$n{h} header, $n{d} data cells\n" if $LaTeXML::Core::Alignment::DEBUG; if ($n{d} == 1) { # Or any other heuristic? foreach my $r (@rows) { foreach my $c (@$r) { $$c{cell_type} = 'd'; $$c{cell}->removeAttribute('thead') if $$c{cell}; } } } # Regroup the rows into thead & tbody elements. # But not if it's a math array, or if reversed (since browsers get confused?) if (!$ismath && !$reversed) { alignment_regroup_rows($document, $table); } # Debugging report! 
summarize_alignment([@rows], [@cols]) if $LaTeXML::Core::Alignment::DEBUG; return; } #====================================================================== # Regroup the rows into thead & tbody # (not messing with tfoot, ATM; HTML4 wants it ONLY after the thead; HTML5 also allows at end!) # Any leading rows, all of whose cells have attribute thead should be in thead. # UNLESS any of them have a rowspan that extends PAST the end of the thead!!!! sub alignment_regroup_rows { my ($document, $table) = @_; my @rows = $document->findnodes("ltx:tr", $table); my @heads = (); my $maxreach = 0; while (@rows) { my @cells = $document->findnodes('ltx:td', $rows[0]); # Non header cells, done. last if scalar(grep { (!$_->getAttribute('thead')) && (($_->getAttribute('class') || '') !~ /\bthead\b/) } @cells); push(@heads, shift(@rows)); my $line = scalar(@heads); $maxreach = max($maxreach, map { ($_->getAttribute('rowspan') || 0) + $line } @cells); } if ($maxreach > scalar(@heads)) { # rowspan crossed over thead boundary! unshift(@rows, @heads); @heads = (); } $document->wrapNodes('ltx:thead', @heads) if @heads; $document->wrapNodes('ltx:tbody', @rows) if @rows; return; } #====================================================================== # Build a View of the alignment, with characterized cells, for analysis. my %ALIGNMENT_CODE = ( # CONSTANT right => 'r', left => 'l', center => 'c', justify => 'p'); sub collect_alignment_rows { my ($document, $table, $alignment) = @_; my @arows = @{ $$alignment{rows} }; my $nrows = scalar(@arows); my $ncols = 0; foreach my $arow (@arows) { my $n = scalar(@{ $$arow{columns} }); $ncols = $n if $n > $ncols; } my @rows = (); my ($h, $v) = (0, 0); foreach my $arow (@arows) { push(@rows, []); my @cols = @{ $$arow{columns} }; foreach my $col (@cols) { push(@{ $rows[-1] }, $col); $$col{cell_type} = 'd'; $$col{content_class} = (($$col{align} || '') eq 'justify' ? 'mx' # Assume mixed content for any justified cell??? : ($$col{cell} ? 
classify_alignment_cell($document, $$col{cell}) : '?')); $$col{content_length} = ($$col{content_class} eq 'g' ? 1000 : ($$col{cell} ? length($$col{cell}->textContent) : 0)); my %border = (t => 0, r => 0, b => 0, l => 0); # Decode border map { $border{$_}++ } split(/ */, $$col{border} || ''); $h = 1 if $border{t} || $border{b}; $v = 1 if $border{r} || $border{l}; map { $$col{$_} = $border{$_} } keys %border; } # pad the columns out. for (my $c = scalar(@cols) ; $c < $ncols ; $c++) { my $col = {}; push(@{ $rows[-1] }, $col); $$col{align} = 'c'; $$col{cell_type} = 'd'; $$col{content_class} = '_'; $$col{content_length} = 0; map { $$col{$_} = 0 } qw(t r b l); } } # copy the characterizations to spanned cells for (my $r = 0 ; $r < $nrows ; $r++) { for (my $c = 0 ; $c < $ncols ; $c++) { my $rs = $rows[$r][$c]{rowspan} || 1; my $cs = $rows[$r][$c]{colspan} || 1; my $ca = $rows[$r][$c]{align}; my $cc = $rows[$r][$c]{content_class}; my $cl = $rows[$r][$c]{content_length}; my $rb = $rows[$r][$c]{r}; $rows[$r][$c]{r} = 0; my $bb = $rows[$r][$c]{b}; $rows[$r][$c]{b} = 0; for (my $sc = 1 ; $sc < $cs ; $sc++) { $rows[$r][$c + $sc]{align} = $ca; $rows[$r][$c + $sc]{content_class} = $cc; $rows[$r][$c + $sc]{content_length} = $cl; } for (my $sr = 1 ; $sr < $rs ; $sr++) { for (my $sc = 0 ; $sc < $cs ; $sc++) { $rows[$r + $sr][$c + $sc]{align} = $ca; $rows[$r + $sr][$c + $sc]{content_class} = $cc; $rows[$r + $sr][$c + $sc]{content_length} = $cl; } } # move the outer borders for (my $sr = 0 ; $sr < $rs ; $sr++) { $rows[$r + $sr][$c + $cs - 1]{r} = $rb; } for (my $sc = 0 ; $sc < $cs ; $sc++) { $rows[$r + $rs - 1][$c + $sc]{b} = $bb; } } } # Now, do some border massaging... 
for (my $r = 0 ; $r < $nrows ; $r++) { $rows[$r][0]{l} = $v; $rows[$r][0]{r} = $rows[$r][1]{l} if ($ncols > 1) && $rows[$r][1]{l}; $rows[$r][$ncols - 1]{l} = $rows[$r][$ncols - 2]{r} if ($ncols > 1) && $rows[$r][$ncols - 2]{r}; $rows[$r][$ncols - 1]{r} = $v; } for (my $c = 0 ; $c < $ncols ; $c++) { $rows[0][$c]{t} = $h; $rows[0][$c]{b} = $rows[1][$c]{t} if ($nrows > 1) && $rows[1][$c]{t}; $rows[$nrows - 1][$c]{t} = $rows[$nrows - 2][$c]{b} if ($nrows > 1) && $rows[$nrows - 2][$c]{b}; $rows[$nrows - 1][$c]{b} = $h; } for (my $r = 1 ; $r < $nrows - 1 ; $r++) { for (my $c = 1 ; $c < $ncols - 1 ; $c++) { $rows[$r][$c]{t} = $rows[$r - 1][$c]{b} if $rows[$r - 1][$c]{b}; $rows[$r][$c]{b} = $rows[$r + 1][$c]{t} if $rows[$r + 1][$c]{t}; $rows[$r][$c]{l} = $rows[$r][$c - 1]{r} if $rows[$r][$c - 1]{r}; $rows[$r][$c]{r} = $rows[$r][$c + 1]{l} if $rows[$r][$c + 1]{l}; } } if ($LaTeXML::Core::Alignment::DEBUG) { print STDERR "\nCell characterizations:\n"; for (my $r = 0 ; $r < $nrows ; $r++) { for (my $c = 0 ; $c < $ncols ; $c++) { my $col = $rows[$r][$c]; print STDERR "[$r,$c]=>" . ($$col{cell_type} || '?') . ($$col{align} ? $ALIGNMENT_CODE{ $$col{align} } : ' ') . ($$col{content_class} || '?') . ' ' . $$col{content_length} . ' ' . $$col{border} . "=>" . join('', grep { $$col{$_} } qw(t r b l)) . (($$col{rowspan} || 1) > 1 ? " rowspan=" . $$col{rowspan} : '') . (($$col{colspan} || 1) > 1 ? " colspan=" . $$col{colspan} : '') . "\n"; } } } return @rows; } # Return one of: i(nteger), t(ext), m(ath), ? (unknown) or '_' (empty) (or some combination) # or 'mx' for alternating text & math. 
sub classify_alignment_cell {
  my ($document, $xcell) = @_;
  my $content = $xcell->textContent;
  my $class   = '';
  #  if($content =~ /^\s*\d+\s*$/){
  if ($content =~ /^[\s\d]+$/) {
    $class = 'i'; }
  else {
    my @nodes = $xcell->childNodes;
    while (@nodes) {
      my $ch     = shift(@nodes);
      my $chtype = $ch->nodeType;
      if ($chtype == XML_TEXT_NODE) {
        my $text = $ch->textContent;
        $class .= 't'
          unless $text =~ /^\s*$/ || (($class eq 'm') && ($text =~ /^\s*[\.,;]\s*$/)); }
      elsif ($chtype == XML_ELEMENT_NODE) {
        my $chtag = $document->getModel->getNodeQName($ch);
        if ($chtag eq 'ltx:text') {    # Font would be useful, but haven't "resolved" it, yet!
          $class .= 't' unless $class eq 't'; }
        elsif ($chtag eq 'ltx:graphics') {
          $class .= 'g' unless $class eq 'g'; }
        elsif ($chtag eq 'ltx:Math') {
          $class .= 'm' unless $class eq 'm'; }
        elsif ($chtag eq 'ltx:XMText') {
          $class .= 't' unless $class eq 't'; }
        elsif ($chtag eq 'ltx:XMArg') {
          unshift(@nodes, $ch->childNodes); }
        elsif ($chtag =~ /^ltx:XM/) {
          $class .= 'm' unless $class eq 'm'; }
        else {
          $class .= '?' unless $class; }
  } } }
  $class = 'mx' if $class && (($class =~ /^((m|i)t)+(m|i)?$/) || ($class =~ /^(t(m|i))+t?$/));
  return $class || '_'; }

#======================================================================
# Scan pairs of rows/columns attempting to recognize differences that
# might indicate which are headers and which are data.
# Warning: This section is full of "magic numbers"
# guessed by sampling various test cases.

my $MIN_ALIGNMENT_DATA_LINES   = 1;    #  (or 2?) [CONSTANT]
my $MAX_ALIGNMENT_HEADER_LINES = 4;    # [CONSTANT]

# We expect to find header lines at the beginning, noticeably different from the eventual data lines.
# Both header lines and data lines can consist of several neighboring lines.
# Check that header lines are `similar' to each other.  So, the strategy is to look
# for a `hump' in the line differences and consider blocks containing these lines to be potential headers.
sub alignment_characterize_lines { my ($axis, $reversed, @lines) = @_; my $n = scalar(@lines); return if $n < 2; local @::TABLINES = @lines; print STDERR "\nCharacterizing $n " . ($axis ? "columns" : "rows") . "\n " if $LaTeXML::Core::Alignment::DEBUG; # Establish a scale of differences for the table. my ($diffhi, $difflo, $diffavg) = (0, 99999999, 0); for (my $l = 0 ; $l < $n - 1 ; $l++) { my $d = alignment_compare($axis, 1, $reversed, $l, $l + 1); $diffavg += $d; $diffhi = $d if $d > $diffhi; $difflo = $d if $d < $difflo; } $diffavg = $diffavg / ($n - 1); if ($diffhi < 0.05) { # virtually no differences. print STDERR "Lines are almost identical => Fail\n" if $LaTeXML::Core::Alignment::DEBUG; return; } if (($n > 2) && (($diffhi - $difflo) < $diffhi * 0.5)) { # differences too similar to establish pattern print STDERR "Differences between lines are almost identical => Fail\n" if $LaTeXML::Core::Alignment::DEBUG; return; } # local $::TAB_THRESHOLD = $difflo + 0.4*($diffhi-$difflo); local $::TAB_THRESHOLD = $difflo + 0.3 * ($diffhi - $difflo); # local $::TAB_THRESHOLD = $difflo + 0.2*($diffhi-$difflo); # local $::TAB_THRESHOLD = $diffavg; local $::TAB_AXIS = $axis; print STDERR "\nDifferences $difflo -- $diffhi => threshold = $::TAB_THRESHOLD\n" if $LaTeXML::Core::Alignment::DEBUG; # Find the first hump in differences. These are candidates for header lines. print STDERR "Scanning for headers\n " if $LaTeXML::Core::Alignment::DEBUG; my $diff; my ($minh, $maxh) = (1, 1); while (($diff = alignment_compare($axis, 1, $reversed, $maxh - 1, $maxh)) < $::TAB_THRESHOLD) { $maxh++; } return if $maxh > $MAX_ALIGNMENT_HEADER_LINES; # too many before even finding diffs? give up! 
# while( alignment_compare($axis,1,$reversed,$maxh,$maxh+1) > $difflo + ($diff-$difflo)/6){ while (alignment_compare($axis, 1, $reversed, $maxh, $maxh + 1) > $::TAB_THRESHOLD) { $maxh++; } $maxh = $MAX_ALIGNMENT_HEADER_LINES if $maxh > $MAX_ALIGNMENT_HEADER_LINES; print STDERR "\nFound from $minh--$maxh potential headers\n" if $LaTeXML::Core::Alignment::DEBUG; my $nn = scalar(@{ $lines[0] }) - 1; # The sets of lines 1--$minh, .. 1--$maxh are potential headers. for (my $nh = $maxh ; $nh >= $minh ; $nh--) { # for(my $nh = $minh; $nh <= $maxh; $nh++){ # Check whether the set 1..$nh is plausable. if (my @heads = alignment_test_headers($nh)) { # Now, change all cells marked as header from td => th. foreach my $h (@heads) { my $i = 0; foreach my $cell (@{ $lines[$h] }) { $$cell{cell_type} = 'h'; if (my $xcell = $$cell{cell}) { if (($$cell{content_class} eq '_') # But NOT empty cells on outer edges. && ((($i == 0) && !$$cell{ ($axis == 0 ? 'l' : 't') }) || (($i == $nn) && !$$cell{ ($axis == 0 ? 'r' : 'b') }))) { } else { $$cell{cell}->setAttribute(thead => 'true'); } } $i++; } } return 1; } } return; } # Test whether $nhead lines makes a good fit for the headers sub alignment_test_headers { my ($nhead) = @_; print STDERR "Testing $nhead headers\n" if $LaTeXML::Core::Alignment::DEBUG; my ($headlength, $datalength) = (0, 0); my @heads = (0 .. $nhead - 1); # The indices of heading lines. $headlength = alignment_max_content_length($headlength, 0, $nhead - 1); my $nextline = $nhead; # Start from the end of the proposed headings. # Watch out for the assumed header being really data that is a repeated pattern. my $nrep = scalar(@::TABLINES) / $nhead; if (($nhead > 1) && ($nrep == int($nrep))) { print STDERR "Check for apparent header repeated $nrep times\n" if $LaTeXML::Core::Alignment::DEBUG; my $matched = 1; for (my $r = 1 ; $r < $nrep ; $r++) { $matched &&= alignment_match_head(0, $r * $nhead, $nhead); } print STDERR "Repeated headers: " . ($matched ? 
"Matched=> Fail" : "Nomatch => Succeed") . "\n" if $LaTeXML::Core::Alignment::DEBUG; return if $matched; } # And find a following grouping of data lines. my $ndata = alignment_skip_data($nextline); return if $ndata < $nhead; # ???? Well, maybe if _really_ convincing??? return if ($ndata < $nhead) && ($ndata < 2); # Check that the content of the headers isn't dramatically larger than the content in the data $datalength = alignment_max_content_length($datalength, $nextline, $nextline + $ndata - 1); $nextline += $ndata; my $nd; # If there are more lines, they should match either the previous data block, or the head/data pattern. while ($nextline < scalar(@::TABLINES)) { # First try to match a repeat of the 1st data block; # This would be the case when groups of data have borders around them. # Could want to match a variable number of datalines, but they should be similar!!!??!?!? if (($ndata > 1) && ($nd = alignment_match_data($nhead, $nextline, $ndata))) { $datalength = alignment_max_content_length($datalength, $nextline, $nextline + $nd - 1); $nextline += $nd; } # Else, try to match the first header block; less common. elsif (alignment_match_head(0, $nextline, $nhead)) { push(@heads, $nextline .. $nextline + $nhead - 1); $headlength = alignment_max_content_length($headlength, $nextline, $nextline + $nhead - 1); $nextline += $nhead; # Then attempt to match a new data block. # my $d = alignment_skip_data($nextline); # return unless ($d >= $nhead) || ($d >= 2); # $nextline += $d; } # No, better be the same data block? return unless ($nd = alignment_match_data($nhead, $nextline, $ndata)); $datalength = alignment_max_content_length($datalength, $nextline, $nextline + $nd - 1); $nextline += $nd; } else { return; } } # Header content seems too large relative to data? 
print STDERR "header content = $headlength; data content = $datalength\n" if $LaTeXML::Core::Alignment::DEBUG; ## if(($headlength > 10) && (0.3*$headlength > $datalength)){ if (($headlength > 10) && (0.25 * $headlength > $datalength)) { print STDERR "header content too much longer than data content\n" if $LaTeXML::Core::Alignment::DEBUG; return; } print STDERR "Succeeded with $nhead headers\n" if $LaTeXML::Core::Alignment::DEBUG; return @heads; } sub alignment_match_head { my ($p1, $p2, $nhead) = @_; print STDERR "Try match $nhead header lines from $p1 to $p2\n " if $LaTeXML::Core::Alignment::DEBUG; my $nh = alignment_match_lines($p1, $p2, $nhead); my $ok = $nhead == $nh; print STDERR "\nMatched $nh header lines => " . ($ok ? "Succeed" : "Failed") . "\n" if $LaTeXML::Core::Alignment::DEBUG; return ($ok ? $nhead : 0); } sub alignment_match_data { my ($p1, $p2, $ndata) = @_; print STDERR "Try match $ndata data lines from $p1 to $p2\n " if $LaTeXML::Core::Alignment::DEBUG; my $nd = alignment_match_lines($p1, $p2, $ndata); my $ok = ($nd * 1.0) / $ndata > 0.66; print STDERR "\nMatched $nd data lines => " . ($ok ? "Succeed" : "Failed") . "\n" if $LaTeXML::Core::Alignment::DEBUG; return ($ok ? $nd : 0); } # Match the $n lines starting at $i2 to those starting at $i1. sub alignment_match_lines { my ($p1, $p2, $n) = @_; for (my $i = 0 ; $i < $n ; $i++) { return $i if ($p1 + $i >= scalar(@::TABLINES)) || ($p2 + $i >= scalar(@::TABLINES)) || alignment_compare($::TAB_AXIS, 0, 0, $p1 + $i, $p2 + $i) >= $::TAB_THRESHOLD; } return $n; } # Skip through a block of lines starting at $i that appear to be data, returning the number of lines. # We'll assume the 1st line is data, compare it to following lines, # but also accept `continuation' data lines. 
sub alignment_skip_data {
  my ($i) = @_;
  return 0 if $i >= scalar(@::TABLINES);
  print STDERR "Scanning for data\n " if $LaTeXML::Core::Alignment::DEBUG;
  my $n = 1;
  while ($i + $n < scalar(@::TABLINES)) {
    last if (alignment_compare($::TAB_AXIS, 1, 0, $i + $n - 1, $i + $n) >= $::TAB_THRESHOLD)
      # Accept an outlying `continuation line' as data, if mostly empty
      && (($n < 2)
      # Compare against the number of cells in a line (not the array reference itself).
      || (scalar(grep { $$_{content_class} eq '_' } @{ $::TABLINES[$i + $n] })
        <= 0.4 * scalar(@{ $::TABLINES[0] })));
    $n++; }
  print STDERR "\nFound $n data lines at $i\n" if $LaTeXML::Core::Alignment::DEBUG;
  return ($n >= $MIN_ALIGNMENT_DATA_LINES ? $n : 0); }

sub XXXalignment_max_content_length {
  my ($length, $from, $to) = @_;
  foreach my $j (($from .. $to)) {
    foreach my $cell (@{ $::TABLINES[$j] }) {
      $length = $$cell{content_length}
        if $$cell{content_length} && ($$cell{content_length} > $length); } }
  return $length; }

# Return the maximum "content length" for lines from $from to $to.
sub alignment_max_content_length {
  my ($length, $from, $to) = @_;
  foreach my $j (($from .. $to)) {
    my $l = 0;
    foreach my $cell (@{ $::TABLINES[$j] }) {
      $l += $$cell{content_length} || 0; }    # cells without a recorded length count as 0
    $length = $l if $l > $length; }
  return $length; }

#======================================================================
# The comparator.
# our %cell_class_diff =
#   ('_'=>{'_'=>0.0, m=>0.1, i=>0.1, t=>0.1, '?'=>0.1, mx=>0.1},
#    m =>{'_'=>0.1, m=>0.0, i=>0.1, mx=>0.2},
#    i =>{'_'=>0.1, m=>0.1, i=>0.0, mx=>0.2},
#    t =>{'_'=>0.1, t=>0.0, mx=>0.2},
#    '?'=>{'_'=>0.1, '?'=>0.0, mx=>0.2},
#    mx=>{'_'=>0.1, m=>0.2, i=>0.2, t=>0.2, '?'=>0.2, mx=>0.0});

my %cell_class_diff = (    # [CONSTANT]
  '_' => { '_' => 0.0, m => 0.05, i => 0.05, t => 0.05, '?' => 0.05, mx => 0.05 },
  m   => { '_' => 0.05, m => 0.0, i => 0.1, mx => 0.2 },
  i   => { '_' => 0.05, m => 0.1, i => 0.0, mx => 0.2 },
  t   => { '_' => 0.05, t => 0.0, mx => 0.2 },
  '?' => { '_' => 0.05, '?' => 0.0, mx => 0.2 },
  mx  => { '_' => 0.05, m => 0.2, i => 0.2, t => 0.2, '?'
=> 0.2, mx => 0.0 }); # Compare two lines along $axis (0=row,1=column), returning a measure of the difference. # The borders are compared differently if # $foradjacency: we adjacent lines that might belong to the same block, # otherwise : comparing two lines that ought to have identical patterns (eg. in a repeated block) sub alignment_compare { my ($axis, $foradjacency, $reversed, $p1, $p2) = @_; my $line1 = $::TABLINES[$p1]; my $line2 = $::TABLINES[$p2]; return 0 if !($line1 && $line2); return 999999 if $line1 xor $line2; my @cells1 = @$line1; my @cells2 = @$line2; my $ncells = scalar(@cells1); my $diff = 0.0; while (@cells1 && @cells2) { my $cell1 = shift(@cells1); my $cell2 = shift(@cells2); # $diff += 0.5 if (($$cell1{align}||'') ne ($$cell2{align}||'')) $diff += 0.75 if (($$cell1{align} || '') ne ($$cell2{align} || '')) && ($$cell1{content_class} ne '_') && ($$cell2{content_class} ne '_'); if (my $d = $cell_class_diff{ $$cell1{content_class} }{ $$cell2{content_class} }) { $diff += $d; } elsif ($$cell1{content_class} ne $$cell2{content_class}) { $diff += 0.75; } # compare certain edges if ($foradjacency) { # Compare edges for adjacent rows of potentially different purpose $diff += 0.3 * scalar(grep { $$cell1{$_} != $$cell2{$_} } ($axis == 0 ? qw(r l) : qw(t b))); # Penalty for apparent divider between. my $pedge = ($axis == 0 ? ($reversed ? 't' : 'b') : ($reversed ? 'l' : 'r')); if ($$cell1{$pedge} && ($$cell1{$pedge} != $$cell2{$pedge})) { $diff += abs($$cell1{$pedge} - $$cell2{$pedge}) * 1.0; } } else { # Compare edges for rows from diff places for potential similarity $diff += 0.3 * scalar(grep { $$cell1{$_} != $$cell2{$_} } qw(r l t b)); } } $diff /= $ncells; print STDERR "$p1-$p2 => $diff; " if $LaTeXML::Core::Alignment::DEBUG; return $diff; } #====================================================================== # Debugging. 
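# Usage sketch (an illustration only, not part of the original module).
# Judging from the keys read by summarize_alignment below, it expects an
# array-ref of rows, each row an array-ref of cell hashes carrying border
# counts (t, b, l, r), a cell_type and a content_class. The cell values are
# made up for the example; align is left unset so %ALIGNMENT_CODE is not consulted.
#
#   my @row = map { +{ cell_type => 'h', content_class => 'm',
#         t => 1, b => 1, l => 0, r => 0 } } 1 .. 3;
#   summarize_alignment([[@row]], []);    # $cols is only used in commented-out code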
sub summarize_alignment { my ($rows, $cols) = @_; my $r = 0; my ($nrows, $ncols) = (scalar(@$rows), scalar(@{ $$rows[0] })); print STDERR "\n"; foreach my $cell (@{ $$rows[0] }) { print STDERR ' ' . ($$cell{t} ? ('-' x 6) : (' ' x 6)); } print STDERR "\n"; foreach my $row (@$rows) { my $maxb = 0; print STDERR ($$row[0]{l} ? ('|' x $$row[0]{l}) : ' '); foreach my $cell (@$row) { print STDERR sprintf(" %4s ", ($$cell{cell_type} || '?') . ($$cell{align} ? $ALIGNMENT_CODE{ $$cell{align} } : ' ') . ($$cell{content_class} || '?') . ($$cell{r} ? ('|' x $$cell{r}) : ' ')); $maxb = $$cell{b} if $$cell{b} > $maxb; } # print STDERR sprintf("%.3f",alignment_compare(0,1,$$rows[$r],$$rows[$r+1])) if ($r < $nrows-1); print STDERR "\n"; for (my $b = 0 ; $b < $maxb ; $b++) { foreach my $cell (@$row) { print STDERR ' ' . ($b < $$cell{b} ? ('-' x 6) : (' ' x 6)); } print STDERR "\n"; } $r++; } print STDERR " "; # for(my $c = 0; $c < $ncols-1; $c++){ # print STDERR sprintf(" %.3f ",alignment_compare(1,1,$$cols[$c],$$cols[$c+1])); } print STDERR "\n"; return; } #====================================================================== 1; __END__ =pod =head1 NAME C - representation of aligned structures =head1 DESCRIPTION This module defines aligned structures. It needs more documentation. It extends L. =head1 AUTHOR Bruce Miller =head1 COPYRIGHT Public domain software, produced as part of work done by the United States Government & not subject to copyright in the US. 
=cut

latexml-0.8.1/lib/LaTeXML/Core/Alignment/0000755000175000017500000000000012507513572017667 5ustar norbertnorbert
latexml-0.8.1/lib/LaTeXML/Core/Alignment/Template.pm0000644000175000017500000001073012507513572022001 0ustar norbertnorbert
# /=====================================================================\ #
# | LaTeXML::Core::Alignment | #
# | Support for tabular/array environments | #
# |=====================================================================| #
# | Part of LaTeXML: | #
# | Public domain software, produced as part of work done by the | #
# | United States Government & not subject to copyright in the US. 
| # # |---------------------------------------------------------------------| # # | Bruce Miller #_# | # # | http://dlmf.nist.gov/LaTeXML/ (o o) | # # \=========================================================ooo==U==ooo=/ # package LaTeXML::Core::Alignment::Template; use strict; use warnings; use base qw(LaTeXML::Common::Object); use LaTeXML::Global; use LaTeXML::Common::Object; use LaTeXML::Core::Tokens; sub new { my ($class, %data) = @_; $data{columns} = [] unless $data{columns}; $data{repeating} = 1 if $data{repeating} || $data{repeated}; $data{repeated} = [] unless $data{repeated}; $data{non_repeating} = scalar(@{ $data{columns} }); $data{save_before} = [] unless $data{save_before}; $data{save_between} = [] unless $data{save_between}; # between comes before before! map { $$_{empty} = 1 } @{ $data{columns} }; map { $$_{empty} = 1 } @{ $data{repeated} }; return bless {%data}, $class; } sub revert { my ($self) = @_; return @{ $$self{tokens} }; } # Methods for constructing a template. sub setReversion { my ($self, @tokens) = @_; $$self{tokens} = [@tokens]; return; } sub setRepeating { my ($self) = @_; $$self{repeating} = 1; return; } # These add material before & after the current column sub addBeforeColumn { my ($self, @tokens) = @_; unshift(@{ $$self{save_before} }, @tokens); # NOTE: goes all the way to front! return; } sub addAfterColumn { my ($self, @tokens) = @_; $$self{current_column}{after} = Tokens(@tokens, @{ $$self{current_column}{after} }); return; } # Or between this column & next... 
sub addBetweenColumn { my ($self, @tokens) = @_; my @cols = @{ $$self{columns} }; if ($$self{current_column}) { $$self{current_column}{after} = Tokens(@{ $$self{current_column}{after} }, @tokens); } else { push(@{ $$self{save_between} }, @tokens); } return; } sub addColumn { my ($self, %properties) = @_; my $col = {%properties}; my @before = (); push(@before, @{ $$self{save_between} }) if $$self{save_between}; push(@before, $properties{before}->unlist) if $properties{before}; push(@before, @{ $$self{save_before} }) if $$self{save_before}; $$col{before} = Tokens(@before); $$col{after} = Tokens() unless $properties{after}; $$col{head} = $properties{head}; $$col{empty} = 1; $$self{save_between} = []; $$self{save_before} = []; $$self{current_column} = $col; if ($$self{repeating}) { $$self{non_repeating} = scalar(@{ $$self{columns} }); push(@{ $$self{repeated} }, $col); } else { push(@{ $$self{columns} }, $col); } return; } # Methods for using a template. sub clone { my ($self) = @_; my @dup = (); foreach my $cell (@{ $$self{columns} }) { push(@dup, {%$cell}); } return bless { columns => [@dup], repeated => $$self{repeated}, non_repeating => $$self{non_repeating}, repeating => $$self{repeating} }, ref $self; } sub show { my ($self) = @_; my @strings = (); push(@strings, "\nColumns:\n"); foreach my $col (@{ $$self{columns} }) { push(@strings, "\n{" . join(', ', map { "$_=>" . Stringify($$col{$_}) } keys %$col) . '}'); } if ($$self{repeating}) { push(@strings, "\nRepeated Columns:\n"); foreach my $col (@{ $$self{repeated} }) { push(@strings, "\n{" . join(', ', map { "$_=>" . Stringify($$col{$_}) } keys %$col) . 
        '}'); } }
  return join(', ', @strings); }

sub column {
  my ($self, $n) = @_;
  my $N = scalar(@{ $$self{columns} });
  if (($n > $N) && $$self{repeating}) {
    my @rep = @{ $$self{repeated} };
    if (my $m = scalar(@rep)) {
      for (my $i = $N ; $i < $n ; $i++) {
        my %dup = %{ $rep[($i - $$self{non_repeating}) % $m] };
        push(@{ $$self{columns} }, {%dup}); } } }
  return $$self{columns}->[$n - 1]; }

sub columns {
  my ($self) = @_;
  return @{ $$self{columns} }; }

#======================================================================
1;

latexml-0.8.1/lib/LaTeXML/Core/Array.pm0000644000175000017500000001100612507513572017363 0ustar norbertnorbert
# /=====================================================================\ #
# | LaTeXML::Core::Array | #
# | Support for Lists or Arrays of digestable stuff for LaTeXML | #
# |=====================================================================| #
# | Part of LaTeXML: | #
# | Public domain software, produced as part of work done by the | #
# | United States Government & not subject to copyright in the US. 
| # # |---------------------------------------------------------------------| # # | Bruce Miller #_# | # # | http://dlmf.nist.gov/LaTeXML/ (o o) | # # \=========================================================ooo==U==ooo=/ # package LaTeXML::Core::Array; use strict; use warnings; use LaTeXML::Global; use LaTeXML::Common::Object; use base qw(LaTeXML::Common::Object); # The following tokens (individual Token's or Tokens') describe how to revert the Array # open,close and separator are the outermost delimiter and separator between items # itemopen,itemclose are delimiters for each item sub new { my ($class, %options) = @_; return bless { type => $options{type}, open => $options{open}, close => $options{close}, separator => $options{separator}, itemopen => $options{itemopen}, itemclose => $options{itemclose}, values => $options{values} }, $class; } sub getValue { my ($self, $n) = @_; return $$self{values}[$n]; } sub setValue { my ($self, $n, $value) = @_; return $$self{values}[$n] = $value; } sub getValues { my ($self) = @_; return @{ $$self{values} }; } sub beDigested { my ($self, $stomach) = @_; my @v = (); foreach my $item (@{ $$self{values} }) { # Yuck my $typedef = $$self{type} && $STATE->lookupMapping('PARAMETER_TYPE', $$self{type}); my $dodigest = (ref $item) && (!$typedef || !$$typedef{undigested}); my $semiverb = $dodigest && $typedef && $$typedef{semiverbatim}; $STATE->beginSemiverbatim() if $semiverb; push(@v, ($dodigest ? 
        $item->beDigested($stomach) : $item));
    $STATE->endSemiverbatim() if $semiverb; }
  return (ref $self)->new(open => $$self{open}, close => $$self{close}, separator => $$self{separator},
    itemopen => $$self{itemopen}, itemclose => $$self{itemclose},
    type     => $$self{type},
    values   => [@v]); }

sub revert {
  my ($self) = @_;
  my @tokens = ();
  foreach my $item (@{ $$self{values} }) {
    push(@tokens, $$self{separator}->unlist) if $$self{separator} && @tokens;
    push(@tokens, $$self{itemopen}->unlist)  if $$self{itemopen};
    push(@tokens, Revert($item));
    push(@tokens, $$self{itemclose}->unlist) if $$self{itemclose}; }
  unshift(@tokens, $$self{open}->unlist) if $$self{open};
  push(@tokens, $$self{close}->unlist) if $$self{close};
  return @tokens; }

sub unlist {
  my ($self) = @_;
  return @{ $$self{values} }; }    # ????

sub toString {
  my ($self) = @_;
  my $string = '';
  foreach my $item (@{ $$self{values} }) {
    $string .= ', ' if $string;
    $string .= ToString($item); }
  return '[[' . $string . ']]'; }

#======================================================================
1;

__END__

=pod

=head1 NAME

C<LaTeXML::Core::Array> - support for Arrays of objects

=head1 DESCRIPTION

Provides a representation of arrays of digested objects.
It extends L<LaTeXML::Common::Object>.

=head2 Methods

=over 4

=item C<< LaTeXML::Core::Array->new(%options); >>

Creates an Array object. Options are

  values      List of values; typically Tokens, initially.
  type        The type of objects (as a ParameterType)

The following are Tokens lists that are used for reverting to raw TeX;
each can be undef

  open        the opening delimiter eg "{"
  close       the closing delimiter eg "}"
  separator   the separator between items, eg ","
  itemopen    the opening delimiter for each item
  itemclose   the closing delimiter for each item

=back

=head2 Accessors

=over 4

=item C<< $value = $array->getValue($n) >>

Return the C<$n>-th item in the list.

=item C<< $array->setValue($n,$value) >>

Sets the C<$n>-th value to C<$value>.

=item C<< @values = $array->getValues(); >>

Return the list of values.

=item C<< $array->beDigested($stomach); >>

Return a new C<LaTeXML::Core::Array> object with all values digested as appropriate.

=back

=head1 AUTHOR

Bruce Miller

=head1 COPYRIGHT

Public domain software, produced as part of work done by the
United States Government & not subject to copyright in the US.

=cut

latexml-0.8.1/lib/LaTeXML/Core/Box.pm0000644000175000017500000002070312507513572017041 0ustar norbertnorbert
# /=====================================================================\ #
# | LaTeXML::Core::Box | #
# | Digested objects produced in the Stomach | #
# |=====================================================================| #
# | Part of LaTeXML: | #
# | Public domain software, produced as part of work done by the | #
# | United States Government & not subject to copyright in the US. 
| # # |---------------------------------------------------------------------| # # | Bruce Miller #_# | # # | http://dlmf.nist.gov/LaTeXML/ (o o) | # # \=========================================================ooo==U==ooo=/ # package LaTeXML::Core::Box; use strict; use warnings; use LaTeXML::Global; use LaTeXML::Common::Object; use base qw(LaTeXML::Common::Object); use base qw(Exporter); our @EXPORT = ( qw( &Box ), ); #====================================================================== # Exported constructors sub Box { my ($string, $font, $locator, $tokens, %properties) = @_; $font = $STATE->lookupValue('font') unless defined $font; $locator = $STATE->getStomach->getGullet->getLocator unless defined $locator; $tokens = LaTeXML::Core::Token::T_OTHER($string) if $string && !defined $tokens; my $state = $STATE; if ($state->lookupValue('IN_MATH')) { my $attr = (defined $string) && $state->lookupValue('math_token_attributes_' . $string); return LaTeXML::Core::Box->new($string, $font->specialize($string), $locator, $tokens, mode => 'math', ($attr ? %$attr : ()), %properties); } else { return LaTeXML::Core::Box->new($string, $font, $locator, $tokens, %properties); } } #====================================================================== # Box Object sub new { my ($class, $string, $font, $locator, $tokens, %properties) = @_; return bless [$string, $font, $locator, $tokens, {%properties}], $class; } # Accessors sub isaBox { return 1; } sub getString { my ($self) = @_; return $$self[0]; } # Return the string contents of the box sub getFont { my ($self) = @_; return $$self[1]; } # Return the font this box uses. sub isMath { my ($self) = @_; return ($$self[4]{mode} || 'text') eq 'math'; } sub getLocator { my ($self) = @_; return $$self[2]; } sub getSource { my ($self) = @_; return $$self[2]; } # So a Box can stand in for a List sub unlist { my ($self) = @_; return ($self); } # Return list of the boxes sub revert { my ($self) = @_; return ($$self[3] ? 
$$self[3]->unlist : ()); } sub toString { my ($self) = @_; return $$self[0]; } # Methods for overloaded operators sub stringify { my ($self) = @_; my $type = ref $self; $type =~ s/^LaTeXML::Core:://; my $font = (defined $$self[1]) && $$self[1]->stringify; # show font, too, if interesting return $type . '[' . (defined $$self[0] ? $$self[0] : (defined $$self[3] ? '[' . ToString($$self[3]) . ']' : '')) . ($font && ($font ne 'Font[]') ? ' ' . $font : '') . ']'; } # Should this compare fonts too? sub equals { my ($a, $b) = @_; return (defined $b) && ((ref $a) eq (ref $b)) && ($$a[0] eq $$b[0]) && ($$a[1]->equals($$b[1])); } sub beAbsorbed { my ($self, $document) = @_; my $string = $$self[0]; my $mode = $$self[4]{mode} || 'text'; return ((defined $string) && ($string ne '') ? ($mode eq 'math' ? $document->insertMathToken($string, font => $$self[1], %{ $$self[4] }) : $document->openText($string, $$self[1])) : undef); } sub getProperty { my ($self, $key) = @_; if ($key eq 'isSpace') { my $tex = LaTeXML::Core::Token::UnTeX($$self[3]); # ! return (defined $tex) && ($tex =~ /^\s*$/); } # Check the TeX code, not (just) the string! 
  else {
    return $$self[4]{$key}; } }

sub getProperties {
  my ($self) = @_;
  return %{ $$self[4] }; }

sub getPropertiesRef {
  my ($self) = @_;
  return $$self[4]; }

sub setProperty {
  my ($self, $key, $value) = @_;
  $$self[4]{$key} = $value;
  return; }

sub setProperties {
  my ($self, %props) = @_;
  while (my ($key, $value) = each %props) {
    # A Box is a blessed array; its properties hash lives in slot 4.
    $$self[4]{$key} = $value if defined $value; }
  return; }

sub getWidth {
  my ($self, %options) = @_;
  my $props = $self->getPropertiesRef;
  $self->computeSize(%options) unless defined $$props{width};
  return $$props{width}; }

sub getHeight {
  my ($self, %options) = @_;
  my $props = $self->getPropertiesRef;
  $self->computeSize(%options) unless defined $$props{height};
  return $$props{height}; }

sub getDepth {
  my ($self, %options) = @_;
  my $props = $self->getPropertiesRef;
  $self->computeSize(%options) unless defined $$props{depth};
  return $$props{depth}; }

sub getTotalHeight {
  my ($self, %options) = @_;
  my $props = $self->getPropertiesRef;
  $self->computeSize(%options) unless defined $$props{height} && defined $$props{depth};
  return $$props{height}->add($$props{depth}); }

sub setWidth {
  my ($self, $width) = @_;
  my $props = $self->getPropertiesRef;
  $$props{width} = $width;
  return; }

sub setHeight {
  my ($self, $height) = @_;
  my $props = $self->getPropertiesRef;
  $$props{height} = $height;
  return; }

sub setDepth {
  my ($self, $depth) = @_;
  my $props = $self->getPropertiesRef;
  $$props{depth} = $depth;
  return; }

sub getSize {
  my ($self, %options) = @_;
  my $props = $self->getPropertiesRef;
  $self->computeSize(%options)
    unless (defined $$props{width}) && (defined $$props{height}) && (defined $$props{depth});
  return ($$props{width}, $$props{height}, $$props{depth}); }

# for debugging....
sub showSize {
  my ($self) = @_;
  return '[' . ToString($self->getWidth) . ' x ' . ToString($self->getHeight) . ' + ' . ToString($self->getDepth) . ']'; }

#omg
# Fake computing the dimensions of strings (typically single chars).
# Eventually, this needs to link into real font data sub computeSize { my ($self, %options) = @_; my $props = $self->getPropertiesRef; $options{width} = $$props{width} if $$props{width}; $options{height} = $$props{height} if $$props{height}; $options{depth} = $$props{depth} if $$props{depth}; my ($w, $h, $d) = ($$self[1] || LaTeXML::Common::Font->textDefault)->computeStringSize($$self[0], %options); $$props{width} = $w unless defined $$props{width}; $$props{height} = $h unless defined $$props{height}; $$props{depth} = $d unless defined $$props{depth}; return; } #********************************************************************** # What about Kern, Glue, Penalty ... #====================================================================== 1; __END__ =pod =head1 NAME C - Representations of digested objects; extends L. =head2 Exported Functions =over 4 =item C<< $box = Box($string,$font,$locator,$tokens); >> Creates a Box representing the C<$string> in the given C<$font>. The C<$locator> records the document source position. The C<$tokens> is a Tokens list containing the TeX that created (or could have) the Box. If C<$font> or C<$locator> are undef, they are obtained from the currently active L. Note that $string can be undef which contributes nothing to the generated document, but does record the TeX code (in C<$tokens>). =back =head2 METHODS =over 4 =item C<< $font = $digested->getFont; >> Returns the font used by C<$digested>. =item C<< $boole = $digested->isMath; >> Returns whether C<$digested> was created in math mode. =item C<< @boxes = $digested->unlist; >> Returns a list of the boxes contained in C<$digested>. It is also defined for the Boxes and Whatsit (which just return themselves) so they can stand-in for a List. =item C<< $string = $digested->toString; >> Returns a string representing this C<$digested>. =item C<< $string = $digested->revert; >> Reverts the box to the list of Cs that created (or could have created) it. 
=item C<< $string = $digested->getLocator; >>

Get a string describing the location in the original source that gave rise to C<$digested>.

=item C<< $digested->beAbsorbed($document); >>

C<$digested> should get itself absorbed into the C<$document> in whatever way is appropriate.

=item C<< $string = $box->getString; >>

Returns the string part of the C<$box>.

=back

=head1 AUTHOR

Bruce Miller

=head1 COPYRIGHT

Public domain software, produced as part of work done by the
United States Government & not subject to copyright in the US.

=cut

latexml-0.8.1/lib/LaTeXML/Core/Comment.pm0000644000175000017500000000353512507513572017717 0ustar norbertnorbert
# /=====================================================================\ #
# | LaTeXML::Core::Comment | #
# | Digested objects produced in the Stomach | #
# |=====================================================================| #
# | Part of LaTeXML: | #
# | Public domain software, produced as part of work done by the | #
# | United States Government & not subject to copyright in the US. 
| #
# |---------------------------------------------------------------------| #
# | Bruce Miller #_# | #
# | http://dlmf.nist.gov/LaTeXML/ (o o) | #
# \=========================================================ooo==U==ooo=/ #
package LaTeXML::Core::Comment;
use strict;
use warnings;
use LaTeXML::Global;
use LaTeXML::Common::Dimension;
use base qw(LaTeXML::Core::Box);

sub revert   { return (); }
sub toString { return ''; }

sub beAbsorbed {
  my ($self, $document) = @_;
  return $document->insertComment($$self[0]); }

sub getWidth       { return Dimension(0); }
sub getHeight      { return Dimension(0); }
sub getTotalHeight { return Dimension(0); }
sub getDepth       { return Dimension(0); }
sub getSize        { return (Dimension(0), Dimension(0), Dimension(0)); }

#======================================================================
1;

__END__

=pod

=head1 NAME

C<LaTeXML::Core::Comment> - Representations of digested objects.

=head1 DESCRIPTION

C<LaTeXML::Core::Comment> is a representation of digested objects.
It extends L<LaTeXML::Core::Box>.

=head1 AUTHOR

Bruce Miller

=head1 COPYRIGHT

Public domain software, produced as part of work done by the
United States Government & not subject to copyright in the US.

=cut

latexml-0.8.1/lib/LaTeXML/Core/Definition.pm0000644000175000017500000001676012507513572020411 0ustar norbertnorbert
# /=====================================================================\ #
# | LaTeXML::Core::Definition | #
# | Representation of definitions of Control Sequences | #
# |=====================================================================| #
# | Part of LaTeXML: | #
# | Public domain software, produced as part of work done by the | #
# | United States Government & not subject to copyright in the US. | #
# |---------------------------------------------------------------------| #
# | Bruce Miller #_# | #
# | http://dlmf.nist.gov/LaTeXML/ (o o) | #
# \=========================================================ooo==U==ooo=/ #
package LaTeXML::Core::Definition;
use strict;
use warnings;
use LaTeXML::Global;
use LaTeXML::Common::Object;
use LaTeXML::Core::Token;
use LaTeXML::Core::Parameters;
use Time::HiRes;
use base qw(LaTeXML::Common::Object);
# Make these present, but do not import.
require LaTeXML::Core::Definition::Expandable; require LaTeXML::Core::Definition::Conditional; require LaTeXML::Core::Definition::Primitive; require LaTeXML::Core::Definition::Register; require LaTeXML::Core::Definition::CharDef; require LaTeXML::Core::Definition::Constructor; #********************************************************************** sub isaDefinition { return 1; } sub getCS { my ($self) = @_; return $$self{cs}; } sub getCSName { my ($self) = @_; return (defined $$self{alias} ? $$self{alias} : $$self{cs}->getCSName); } sub isExpandable { return 0; } sub isRegister { return ''; } sub isPrefix { return 0; } sub getLocator { my ($self) = @_; return $$self{locator}; } sub readArguments { my ($self, $gullet) = @_; my $params = $self->getParameters; return ($params ? $params->readArguments($gullet, $self) : ()); } sub getParameters { my ($self) = @_; # Allow defering these until the Definition is actually used. if ((defined $$self{parameters}) && !ref $$self{parameters}) { require LaTeXML::Package; $$self{parameters} = LaTeXML::Package::parseParameters($$self{parameters}, $$self{cs}); } return $$self{parameters}; } #====================================================================== # Overriding methods sub stringify { my ($self) = @_; my $type = ref $self; $type =~ s/^LaTeXML:://; my $name = ($$self{alias} || $$self{cs}->getCSName); return $type . '[' . ($$self{parameters} ? $name . ' ' . Stringify($$self{parameters}) : $name) . ']'; } sub toString { my ($self) = @_; return ($$self{parameters} ? ToString($$self{cs}) . ' ' . ToString($$self{parameters}) : ToString($$self{cs})); } # Return the Tokens that would invoke the given definition with arguments. sub invocation { my ($self, @args) = @_; my $params = $self->getParameters; return ($$self{cs}, ($params ? 
      $params->revertArguments(@args) : ())); }

#======================================================================
# Profiling support
#======================================================================
# If the value PROFILING is true, we'll collect some primitive profiling info.

# Start profiling $CS (typically $LaTeXML::CURRENT_TOKEN)
# Call from within ->invoke.
sub startProfiling {
  my ($cs) = @_;
  my $name = $cs->getCSName;
  my $entry = $STATE->lookupMapping('runtime_profile', $name);
  # [#calls, total time, starts of pending calls...]
  if (!defined $entry) {
    $entry = [0, 0];
    $STATE->assignMapping('runtime_profile', $name, $entry); }
  $$entry[0]++;    # One more call.
  push(@$entry, [Time::HiRes::gettimeofday]);    # started new call
  return; }

# Stop profiling $CS, if it was being profiled.
sub stopProfiling {
  my ($cs) = @_;
  $cs = $cs->getString if $cs->getCatcode == CC_MARKER;    # Special case for macros!!
  my $name = $cs->getCSName;
  if (my $entry = $STATE->lookupMapping('runtime_profile', $name)) {
    if (scalar(@$entry) > 2) {
      # Hopefully the pop gets the corresponding start time!?!?!
      $$entry[1] += Time::HiRes::tv_interval(pop(@$entry), [Time::HiRes::gettimeofday]); } }
  return; }

our $MAX_PROFILE_ENTRIES = 50;    # [CONSTANT]

# Print out profiling information, if any was collected
sub showProfile {
  if (my $profile = $STATE->lookupValue('runtime_profile')) {
    my @cs         = keys %$profile;
    my @unfinished = ();
    foreach my $cs (@cs) {
      push(@unfinished, $cs) if scalar(@{ $$profile{$cs} }) > 2; }
    my @frequent = sort { $$profile{$b}[0] <=> $$profile{$a}[0] } @cs;
    # Keep at most $MAX_PROFILE_ENTRIES; a fixed slice would pad short lists with undef.
    splice(@frequent, $MAX_PROFILE_ENTRIES) if scalar(@frequent) > $MAX_PROFILE_ENTRIES;
    my @expensive = sort { $$profile{$b}[1] <=> $$profile{$a}[1] } @cs;
    splice(@expensive, $MAX_PROFILE_ENTRIES) if scalar(@expensive) > $MAX_PROFILE_ENTRIES;
    print STDERR "\nProfiling results:\n";
    print STDERR "Most frequent:\n " . join(', ', map { $_ . ':' . $$profile{$_}[0] } @frequent) . "\n";
    print STDERR "Most expensive (inclusive):\n " . join(', ', map { $_ . ':' . 
          sprintf("%.2fs", $$profile{$_}[1]) } @expensive) . "\n";

    if (@unfinished) {
      print STDERR "The following were never marked as done:\n " . join(', ', @unfinished) . "\n"; } }
  return; }

#===============================================================================
1;

__END__

=pod

=head1 NAME

C<LaTeXML::Core::Definition> - Control sequence definitions.

=head1 DESCRIPTION

This abstract class represents the various executables corresponding to control sequences.
See L<LaTeXML::Package> for the most convenient means to create them.

It extends L<LaTeXML::Common::Object>.

=head2 Methods

=over 4

=item C<< $token = $defn->getCS; >>

Returns the (main) token that is bound to this definition.

=item C<< $string = $defn->getCSName; >>

Returns the string form of the token bound to this definition,
taking into account any alias for this definition.

=item C<< $defn->readArguments($gullet); >>

Reads the arguments for this C<$defn> from the C<$gullet>,
returning a list of L<LaTeXML::Core::Tokens>.

=item C<< $parameters = $defn->getParameters; >>

Returns the C<LaTeXML::Core::Parameters> object representing the formal parameters
of the definition.

=item C<< @tokens = $defn->invocation(@args); >>

Returns the tokens that would invoke the given definition with the
provided arguments. This is used to recreate the TeX code (or its
equivalent).

=item C<< $defn->invoke; >>

Invoke the action of the C<$defn>. For expandable definitions, this is done in
the Gullet, and returns a list of L<LaTeXML::Core::Token>s. For primitives, it
is carried out in the Stomach, and returns a list of L<LaTeXML::Core::Box>es.
For a constructor, it is also carried out by the Stomach, and returns a
L<LaTeXML::Core::Whatsit>. That whatsit will be responsible for constructing the
XML document fragment, when the L<LaTeXML::Core::Document> invokes
C<< $whatsit->beAbsorbed($document); >>.

Primitives and Constructors also support before and after daemons, lists of
subroutines that are executed before and after digestion. These can be useful
for changing modes, etc.

=back

=head1 SEE ALSO

L<LaTeXML::Core::Definition::Expandable>,
L<LaTeXML::Core::Definition::Conditional>,
L<LaTeXML::Core::Definition::Primitive>,
L<LaTeXML::Core::Definition::Register>,
L<LaTeXML::Core::Definition::CharDef> and
L<LaTeXML::Core::Definition::Constructor>.
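
=head1 EXAMPLE

The profiling support in this module keeps, for each control sequence, an array of the form C<[#calls, total time, pending start times...]>. The following is a minimal, self-contained sketch of that bookkeeping, not LaTeXML's own code. It assumes a plain hash C<%PROFILE> in place of the C<runtime_profile> mapping held in C<$STATE>, and the names C<start_profiling> and C<stop_profiling> are hypothetical stand-ins for C<startProfiling> and C<stopProfiling>.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Hypothetical stand-in for the 'runtime_profile' mapping kept in $STATE.
our %PROFILE;

# Record the start of one call to $name.
# Entry layout: [#calls, total time, starts of pending calls...]
sub start_profiling {
  my ($name) = @_;
  my $entry = ($PROFILE{$name} ||= [0, 0]);
  $$entry[0]++;                      # one more call
  push(@$entry, [gettimeofday]);     # remember when this call started
  return; }

# Record the end of the most recently started call to $name, if any is pending.
sub stop_profiling {
  my ($name) = @_;
  my $entry = $PROFILE{$name};
  if ($entry && (scalar(@$entry) > 2)) {
    # Pop the latest pending start time and add the elapsed seconds to the total.
    $$entry[1] += tv_interval(pop(@$entry), [gettimeofday]); }
  return; }

# Two nested calls, followed by one stop too many (which is ignored).
start_profiling('\foo');
start_profiling('\foo');
stop_profiling('\foo');
stop_profiling('\foo');
stop_profiling('\foo');

my ($calls, $total, @pending) = @{ $PROFILE{'\foo'} };
printf("%s: %d calls, %.6fs total, %d pending\n", '\foo', $calls, $total, scalar(@pending));
```

Because pending start times are pushed and popped on the same array, nested calls to the same control sequence pair up in last-in, first-out order, and any entry still longer than two elements at the end corresponds to a call that was never marked as done.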

=head1 AUTHOR

Bruce Miller

=head1 COPYRIGHT

Public domain software, produced as part of work done by the
United States Government & not subject to copyright in the US.

=cut

latexml-0.8.1/lib/LaTeXML/Core/Definition/CharDef.pm
# /=====================================================================\ #
# | LaTeXML::Core::Definition::CharDef                                  | #
# | Representation of definitions of Control Sequences                  | #
# |=====================================================================| #
# | Part of LaTeXML:                                                    | #
# |  Public domain software, produced as part of work done by the       | #
# |  United States Government & not subject to copyright in the US.     | #
# |---------------------------------------------------------------------| #
# | Bruce Miller                                                #_#     | #
# | http://dlmf.nist.gov/LaTeXML/                              (o o)    | #
# \=========================================================ooo==U==ooo=/ #
package LaTeXML::Core::Definition::CharDef;
use strict;
use warnings;
use LaTeXML::Global;
use LaTeXML::Common::Error;
use base qw(LaTeXML::Core::Definition::Register);

# A CharDef is a specialized register;
# You can't assign it; when you invoke the control sequence, it returns
# the result of evaluating the character (more like a regular primitive).

sub new {
  my ($class, $cs, $value, $internalcs, %traits) = @_;
  return bless { cs => $cs, parameters => undef,
    value => $value, internalcs => $internalcs,
    registerType => 'Number', readonly => 1,
    locator => "from " . $STATE->getStomach->getGullet->getMouth->getLocator(-1),
    %traits }, $class; }

sub valueOf {
  my ($self) = @_;
  return $$self{value}; }

sub setValue {
  my ($self, $value) = @_;
  Error('unexpected', $self, undef, "Can't assign to chardef " . $self->getCSName);
  return; }

sub invoke {
  my ($self, $stomach) = @_;
  my $cs = $$self{internalcs};
  # Tracing ?
  return (defined $cs ? $stomach->invokeToken($cs) : undef); }

#===============================================================================
1;

__END__

=pod

=head1 NAME

C<LaTeXML::Core::Definition::CharDef> - Control sequence definitions for chardefs.

=head1 DESCRIPTION

Representation as a further specialized Register for chardef.
See L<LaTeXML::Package> for the most convenient means to create them.

It extends L<LaTeXML::Core::Definition::Register>.

=head1 AUTHOR

Bruce Miller

=head1 COPYRIGHT

Public domain software, produced as part of work done by the
United States Government & not subject to copyright in the US.

=cut

latexml-0.8.1/lib/LaTeXML/Core/Definition/Conditional.pm
# /=====================================================================\ #
# | LaTeXML::Core::Definition::Conditional                              | #
# | Representation of definitions of Control Sequences                  | #
# |=====================================================================| #
# | Part of LaTeXML:                                                    | #
# |  Public domain software, produced as part of work done by the       | #
# |  United States Government & not subject to copyright in the US.     | #
# |---------------------------------------------------------------------| #
# | Bruce Miller                                                #_#     | #
# | http://dlmf.nist.gov/LaTeXML/                              (o o)    | #
# \=========================================================ooo==U==ooo=/ #
package LaTeXML::Core::Definition::Conditional;
use strict;
use warnings;
use LaTeXML::Global;
use LaTeXML::Common::Object;
use LaTeXML::Common::Error;
use LaTeXML::Core::Token;
use base qw(LaTeXML::Core::Definition::Expandable);

# Conditional control sequences; Expandable
#   Expand enough to determine true/false, then maybe skip
#   record a flag somewhere so that \else or \fi is recognized
#   (otherwise, they should signal an error)

sub new {
  my ($class, $cs, $parameters, $test, %traits) = @_;
  my $source = $STATE->getStomach->getGullet->getMouth;
  return bless { cs => $cs, parameters => $parameters, test => $test,
    locator => "from " .
      $source->getLocator(-1),
    %traits }, $class; }

sub getTest {
  my ($self) = @_;
  return $$self{test}; }

# Note that although conditionals are Expandable,
# they are NOT defined as macros, so they don't need to handle doInvocation.
sub invoke {
  my ($self, $gullet) = @_;
  # A real conditional must have conditional_type set
  if (my $cond_type = $$self{conditional_type}) {
    if ($cond_type eq 'if') {
      return $self->invoke_conditional($gullet); }
    elsif ($cond_type eq 'else') {
      return $self->invoke_else($gullet); }
    elsif ($cond_type eq 'or') {
      return $self->invoke_else($gullet); }
    elsif ($cond_type eq 'fi') {
      return $self->invoke_fi($gullet); } }
  Error('unexpected', $$self{cs}, $gullet,
    "Unknown conditional control sequence " . Stringify($LaTeXML::CURRENT_TOKEN));
  return; }

sub invoke_conditional {
  my ($self, $gullet) = @_;
  # Keep a stack of the conditionals we are processing.
  my $ifid = $STATE->lookupValue('if_count') || 0;
  $STATE->assignValue(if_count => ++$ifid, 'global');
  local $LaTeXML::IFFRAME = { token => $LaTeXML::CURRENT_TOKEN, start => $gullet->getLocator,
    parsing => 1, elses => 0, ifid => $ifid };
  $STATE->unshiftValue(if_stack => $LaTeXML::IFFRAME);

  my @args = $self->readArguments($gullet);
  $$LaTeXML::IFFRAME{parsing} = 0;    # Now, we're done parsing the Test clause.
  my $tracing = $STATE->lookupValue('TRACINGCOMMANDS');
  print STDERR '{' . ToString($LaTeXML::CURRENT_TOKEN) . "} [#$ifid]\n" if $tracing;
  if (my $test = $self->getTest) {
    my $result = &$test($gullet, @args);
    if ($result) {
      print STDERR "{true}\n" if $tracing; }
    else {
      my $to = skipConditionalBody($gullet, -1);
      print STDERR "{false} [skipped to " . ToString($to) . "]\n" if $tracing; } }
  # If there's no test, it must be the Special Case, \ifcase
  else {
    my $num = $args[0]->valueOf;
    if ($num > 0) {
      my $to = skipConditionalBody($gullet, $num);
      print STDERR "{$num} [skipped to " . ToString($to) . "]\n" if $tracing; } }
  return; }

#======================================================================
# Support for conditionals:

# Skipping for conditionals
#   0 : skip to \fi
#  -1 : skip to \else, if any, or \fi
#   n : skip to n-th \or, if any, or \else, if any, or \fi.

# NOTE that there are 2 kinds of "nested" ifs.
#  \if's inside the body of either the true or false branch
#  are easily skipped by tracking a level of if nesting and skipping over the
#  same number of \fi as you find \if.
#  \if's that get expanded while evaluating the test clause itself
#  are considerably trickier. There's a frame on the if-stack for this \if
#  that's above the one we're currently processing; typically the \else & \fi
#  may still remain, but we need to either evaluate them as normal
#  if we're continuing to follow the true branch, or skip over them if
#  we're trying to find the \else for the false branch.
# The danger is mistaking the \else that's associated with the test clause's \if
# for the \else that we're skipping to!
# Canonical example:
#   \if\ifx AA XY junk \else blah \fi True \else False \fi
# The inner \ifx should expand to "XY junk", since A==A
# Returns the token we've skipped to.
sub skipConditionalBody {
  my ($gullet, $nskips) = @_;
  my $level = 1;
  my $n_ors = 0;
  my $start = $gullet->getLocator;
  # NOTE: Open-coded manipulation of if_stack!
  # [we're only reading tokens & looking up, so State shouldn't change behind our backs]
  my $stack = $STATE->lookupValue('if_stack');
  while (my $t = $gullet->readToken) {
    # The only Interesting tokens are bound to defns (defined OR \let!!!)
    if (defined(my $defn = $STATE->lookupDefinition($t))) {
      if (my $cond_type = $$defn{conditional_type}) {
        if ($cond_type eq 'if') {    # Found a \ifxx of some sort
          $level++; }
        elsif ($cond_type eq 'fi') {    # Found a \fi
          if ($$stack[0] ne $LaTeXML::IFFRAME) {    # But is it for a condition nested in the test clause?
            shift(@$stack); }    # then DO pop that conditional's frame; it's DONE!
          elsif (!--$level) {    # If no more nesting, we're done.
            shift(@$stack);      # Done with this frame
            return $t; } }       # AND Return the finishing token.
        elsif ($level > 1) {     # Ignore \else,\or nested in the body.
        }
        elsif (($cond_type eq 'or') && (++$n_ors == $nskips)) {
          return $t; }
        elsif (($cond_type eq 'else') && $nskips
          # Found \else and we're looking for one?
          # Make sure this \else is NOT for a nested \if that is part of the test clause!
          && ($$stack[0] eq $LaTeXML::IFFRAME)) {
          # No need to actually call elseHandler, but note that we've seen an \else!
          $$stack[0]{elses} = 1;
          return $t; } } } }
  Error('expected', '\fi', $gullet,
    "Missing \\fi or \\else, conditional fell off end",
    "Conditional started at $start");
  return; }

sub invoke_else {
  my ($self, $gullet) = @_;
  my $stack = $STATE->lookupValue('if_stack');
  if (!($stack && $$stack[0])) {    # No if stack entry ?
    Error('unexpected', $LaTeXML::CURRENT_TOKEN, $gullet,
      "Didn't expect a " . Stringify($LaTeXML::CURRENT_TOKEN)
        . " since we seem not to be in a conditional");
    return; }
  elsif ($$stack[0]{parsing}) {    # Defer expanding the \else if we're still parsing the test
    return (T_CS('\relax'), $LaTeXML::CURRENT_TOKEN); }
  elsif ($$stack[0]{elses}) {      # Already seen an \else at this level?
    Error('unexpected', $LaTeXML::CURRENT_TOKEN, $gullet,
      "Extra " . Stringify($LaTeXML::CURRENT_TOKEN),
      "already saw \\else for " . Stringify($$stack[0]{token})
        . " [" . $$stack[0]{ifid} . "] at " . $$stack[0]{start});
    return; }
  else {
    local $LaTeXML::IFFRAME = $$stack[0];
    my $t = skipConditionalBody($gullet, 0);
    print STDERR '{' . ToString($LaTeXML::CURRENT_TOKEN) . '}'
      . " [for " . ToString($$LaTeXML::IFFRAME{token}) . " #" . $$LaTeXML::IFFRAME{ifid}
      . " skipping to " . ToString($t) . "]\n"
      if $STATE->lookupValue('TRACINGCOMMANDS');
    return; } }

sub invoke_fi {
  my ($self, $gullet) = @_;
  my $stack = $STATE->lookupValue('if_stack');
  if (!($stack && $$stack[0])) {    # No if stack entry ?
    Error('unexpected', $LaTeXML::CURRENT_TOKEN, $gullet,
      "Didn't expect a " . Stringify($LaTeXML::CURRENT_TOKEN)
        . " since we seem not to be in a conditional");
    return; }
  elsif ($$stack[0]{parsing}) {    # Defer expanding the \fi if we're still parsing the test
    return (T_CS('\relax'), $LaTeXML::CURRENT_TOKEN); }
  else {                           # "expand" by removing the stack entry for this level
    local $LaTeXML::IFFRAME = $$stack[0];
    $STATE->shiftValue('if_stack');    # Done with this frame
    print STDERR '{' . ToString($LaTeXML::CURRENT_TOKEN) . '}'
      . " [for " . Stringify($$LaTeXML::IFFRAME{token}) . " #" . $$LaTeXML::IFFRAME{ifid} . "]\n"
      if $STATE->lookupValue('TRACINGCOMMANDS');
    return; } }

#===============================================================================
1;

__END__

=pod

=head1 NAME

C<LaTeXML::Core::Definition::Conditional> - Control sequence definitions for conditionals.

=head1 DESCRIPTION

These represent the control sequences for conditionals, as well as
C<\else>, C<\or> and C<\fi>.
See L<LaTeXML::Package> for the most convenient means to create them.

It extends L<LaTeXML::Core::Definition::Expandable>.

=head1 AUTHOR

Bruce Miller

=head1 COPYRIGHT

Public domain software, produced as part of work done by the
United States Government & not subject to copyright in the US.

=cut

latexml-0.8.1/lib/LaTeXML/Core/Definition/Constructor.pm
# /=====================================================================\ #
# | LaTeXML::Core::Definition::Constructor                              | #
# | Representation of definitions of Control Sequences                  | #
# |=====================================================================| #
# | Part of LaTeXML:                                                    | #
# |  Public domain software, produced as part of work done by the       | #
# |  United States Government & not subject to copyright in the US.     | #
# |---------------------------------------------------------------------| #
# | Bruce Miller                                                #_#     | #
# | http://dlmf.nist.gov/LaTeXML/                              (o o)    | #
# \=========================================================ooo==U==ooo=/ #
package LaTeXML::Core::Definition::Constructor;
use strict;
use warnings;
use LaTeXML::Global;
use LaTeXML::Common::Object;
use LaTeXML::Common::Error;
use LaTeXML::Core::Whatsit;
use base qw(LaTeXML::Core::Definition::Primitive);
use LaTeXML::Core::Definition::Constructor::Compiler;

#**********************************************************************
# Constructor control sequences.
# They are first converted to a Whatsit in the Stomach, and that Whatsit's
# construction is carried out to form parts of the document.
# In particular, beforeDigest, reading args and afterDigest are executed
# in the Stomach.
#**********************************************************************
# Known traits:
#    beforeDigest, afterDigest : code for before/after digestion daemons
#    afterDigestBody : similar, but applies only to the \begin{environment} commands.
#    reversion : CODE or TOKENS for reverting to TeX form
#    captureBody : whether to capture the following List as a `body`
#        (for environments, math modes)
#        If this is a token, it is the token that will be matched to end the body.
#    properties : a hash of default values for properties to store in the Whatsit.
sub new {
  my ($class, $cs, $parameters, $replacement, %traits) = @_;
  my $source = $STATE->getStomach->getGullet->getMouth;
  Fatal('misdefined', $cs, $source,
    "Constructor replacement for '" . ToString($cs) . "' is not a string or CODE",
    "Replacement is $replacement")
    if !(defined $replacement) || ((ref $replacement) && !(ref $replacement eq 'CODE'));
  return bless { cs => $cs, parameters => $parameters, replacement => $replacement,
    locator => "from " . $source->getLocator(-1), %traits,
    ## nargs => (defined $traits{nargs} ? $traits{nargs}
    ##   : ($parameters ? $parameters->getNumArgs : 0))
    nargs => $traits{nargs}
  }, $class; }

sub getReversionSpec {
  my ($self) = @_;
  my $spec = $$self{reversion};
  if ($spec && !ref $spec) {
    $spec = $$self{reversion} = LaTeXML::Package::TokenizeInternal($spec); }
  return $spec; }

sub getSizer {
  my ($self) = @_;
  return $$self{sizer}; }

sub getAlias {
  my ($self) = @_;
  return $$self{alias}; }

sub getNumArgs {
  my ($self) = @_;
  return $$self{nargs} if defined $$self{nargs};
  my $params = $self->getParameters;
  $$self{nargs} = ($params ? $params->getNumArgs : 0);
  return $$self{nargs}; }

# Digest the constructor; This should occur in the Stomach to create a Whatsit.
# The whatsit which will be further processed to create the document. sub invoke { my ($self, $stomach) = @_; # Call any `Before' code. my $profiled = $STATE->lookupValue('PROFILING') && ($LaTeXML::CURRENT_TOKEN || $$self{cs}); LaTeXML::Core::Definition::startProfiling($profiled) if $profiled; my @pre = $self->executeBeforeDigest($stomach); if ($STATE->lookupValue('TRACINGCOMMANDS')) { print STDERR '{' . $self->getCSName . "}\n"; } # Get some info before we process arguments... my $font = $STATE->lookupValue('font'); my $ismath = $STATE->lookupValue('IN_MATH'); # Parse AND digest the arguments to the Constructor my $params = $self->getParameters; my @args = ($params ? $params->readArgumentsAndDigest($stomach, $self) : ()); my $nargs = $self->getNumArgs; @args = @args[0 .. $nargs - 1]; # Compute any extra Whatsit properties (many end up as element attributes) my $properties = $$self{properties}; my %props = (!defined $properties ? () : (ref $properties eq 'CODE' ? &$properties($stomach, @args) : %$properties)); foreach my $key (keys %props) { my $value = $props{$key}; if (ref $value eq 'CODE') { $props{$key} = &$value($stomach, @args); } } $props{font} = $font unless defined $props{font}; $props{locator} = $stomach->getGullet->getMouth->getLocator unless defined $props{locator}; $props{isMath} = $ismath unless defined $props{isMath}; $props{level} = $stomach->getBoxingLevel; # Now create the Whatsit, itself. my $whatsit = LaTeXML::Core::Whatsit->new($self, [@args], %props); # Call any 'After' code. my @post = $self->executeAfterDigest($stomach, $whatsit); if (my $cap = $$self{captureBody}) { $whatsit->setBody(@post, $stomach->digestNextBody((ref $cap ? 
          $cap : undef)));
    @post = (); }

  my @postpost = $self->executeAfterDigestBody($stomach, $whatsit);
  LaTeXML::Core::Definition::stopProfiling($profiled) if $profiled;
  return (@pre, $whatsit, @post, @postpost); }

# Similar to executeAfterDigest
sub executeAfterDigestBody {
  my ($self, $stomach, @whatever) = @_;
  local $LaTeXML::Core::State::UNLOCKED = 1;
  my $post = $$self{afterDigestBody};
  return ($post ? map { &$_($stomach, @whatever) } @$post : ()); }

sub doAbsorbtion {
  my ($self, $document, $whatsit) = @_;
  # First, compile the constructor pattern, if needed.
  my $replacement = $$self{replacement};
  if (!ref $replacement) {
    $$self{replacement} = $replacement = LaTeXML::Core::Definition::Constructor::Compiler::compileConstructor($self); }
  # Now do the absorption.
  if (my $pre = $$self{beforeConstruct}) {
    map { &$_($document, $whatsit) } @$pre; }
  &{$replacement}($document, $whatsit->getArgs, $whatsit->getProperties);
  if (my $post = $$self{afterConstruct}) {
    map { &$_($document, $whatsit) } @$post; }
  return; }

#===============================================================================
1;

__END__

=pod

=head1 NAME

C<LaTeXML::Core::Definition::Constructor> - Control sequence definitions.

=head1 DESCRIPTION

This class represents control sequences that contribute arbitrary XML fragments
to the document tree. During digestion, a C<LaTeXML::Core::Definition::Constructor>
records the arguments used in the invocation to produce a L<LaTeXML::Core::Whatsit>.
The resulting L<LaTeXML::Core::Whatsit> (usually) generates an XML document fragment
when absorbed by an instance of L<LaTeXML::Core::Document>. Additionally,
a C<LaTeXML::Core::Definition::Constructor> may have beforeDigest and afterDigest
daemons defined, which are executed for side effect or to add boxes to the output.

It extends L<LaTeXML::Core::Definition::Primitive>.

More documentation is needed, but see LaTeXML::Package for the main user access to these.

=head2 More about Constructors

=begin latex

\label{LaTeXML::Core::Definition::ConstructorCompiler}

=end latex

A constructor has as its C<replacement> a subroutine or a string pattern representing
the XML fragment it should generate.
In the case of a string pattern, the pattern is compiled into a subroutine on
first usage by the internal class C<LaTeXML::Core::Definition::Constructor::Compiler>.
Like primitives, constructors may have C<beforeDigest> and C<afterDigest>.

=head1 AUTHOR

Bruce Miller

=head1 COPYRIGHT

Public domain software, produced as part of work done by the
United States Government & not subject to copyright in the US.

=cut

latexml-0.8.1/lib/LaTeXML/Core/Definition/Constructor/Compiler.pm
# /=====================================================================\ #
# | LaTeXML::Core::Definition::Constructor::Compiler                    | #
# | Compiler for Constructor Control Sequences                          | #
# |=====================================================================| #
# | Part of LaTeXML:                                                    | #
# |  Public domain software, produced as part of work done by the       | #
# |  United States Government & not subject to copyright in the US.     | #
# |---------------------------------------------------------------------| #
# | Bruce Miller                                                #_#     | #
# | http://dlmf.nist.gov/LaTeXML/                              (o o)    | #
# \=========================================================ooo==U==ooo=/ #
package LaTeXML::Core::Definition::Constructor::Compiler;
use strict;
use warnings;
use LaTeXML::Global;
use LaTeXML::Common::Object;
use LaTeXML::Common::Error;
use LaTeXML::Common::XML;
use Scalar::Util qw(refaddr);

my $VALUE_RE = "(\\#|\\&[\\w\\:]*\\()";    # [CONSTANT]
my $COND_RE = "\\?$VALUE_RE";              # [CONSTANT]
# Attempt to follow XML Spec, Appendix B
my $QNAME_RE = "((?:\\p{Ll}|\\p{Lu}|\\p{Lo}|\\p{Lt}|\\p{Nl}|_|:)"    # [CONSTANT]
  . "(?:\\p{Ll}|\\p{Lu}|\\p{Lo}|\\p{Lt}|\\p{Nl}|_|:|\\p{M}|\\p{Lm}|\\p{Nd}|\\.|\\-)*)";
my $TEXT_RE = "(.[^\\#<\\?\\)\\&\\,]*)";    # [CONSTANT]

sub compileConstructor {
  my ($constructor) = @_;
  my $replacement = $$constructor{replacement};
  return sub { } unless $replacement;
  my $cs = $constructor->getCS;
  my $name = $cs->getCSName;
  my $nargs = $constructor->getNumArgs;
  local $LaTeXML::Core::Definition::Constructor::CONSTRUCTOR = $constructor;
  local $LaTeXML::Core::Definition::Constructor::NAME = $name;
  local $LaTeXML::Core::Definition::Constructor::NARGS = $nargs;
  $name =~ s/\W//g;
  my $uid = refaddr $constructor;
  $name = "LaTeXML::Package::Pool::constructor_" . $name . '_' . $uid;
  my $floats = ($replacement =~ s/^\^\s*//);    # Grab float marker.
  my $body = translate_constructor($replacement, $floats);
  # Compile the constructor pattern into a named sub that will construct the requested XML.
  my $code =
    # Put the function in the Pool package, so that functions defined there can be used within it,
    # and also so that these definitions get cleaned up by the Daemon.
    " package LaTeXML::Package::Pool;\n"
    . "sub $name {\n"
    . "my(" . join(', ', '$document', (map { "\$arg$_" } 1 .. $nargs), '%prop') . ")=\@_;\n"
    . ($floats ? "my \$savenode;\n" : '')
    . $body
    . ($floats ? "\$document->setNode(\$savenode) if defined \$savenode;\n" : '')
    . "}\n"
    . "1;\n";
  ###print STDERR "Compilation of \"$replacement\" => \n$code\n";
  my $result = eval $code;
  Fatal('misdefined', $name, $constructor,
    "Compilation of constructor code for '$name' failed",
    "\"$replacement\" => $code", $@) if !$result || $@;
  return \&$name; }

sub translate_constructor {
  my ($constructor, $float) = @_;
  my $code = '';
  local $_ = $constructor;
  while ($_) {
    if (/^$COND_RE/so) {
      my ($bool, $if, $else) = parse_conditional();
      $code .= "if($bool){\n" . translate_constructor($if) . "}\n"
        . ($else ? "else{\n" . translate_constructor($else) . "}\n" : ''); }
    # Processing instruction: <?name a=v ...?>
    elsif (s|^\s*<\?$QNAME_RE||so) {
      my ($pi, $av) = ($1, translate_avpairs());
      $code .= "\$document->insertPI('$pi'" . ($av ? ", $av" : '') . ");\n";
      Fatal('misdefined', $LaTeXML::Core::Definition::Constructor::NAME, $LaTeXML::Core::Definition::Constructor::CONSTRUCTOR,
        "Missing \"?>\" in constructor template at \"$_\"") unless s|^\s*\?>||; }
    # Open tag: <name a=v ...> or .../> (for empty element)
    elsif (s|^\s*<$QNAME_RE||so) {
      my ($tag, $av) = ($1, translate_avpairs());
      if ($float) {
        $code .= "\$savenode=\$document->floatToElement('$tag');\n";
        $float = undef; }
      $code .= "\$document->openElement('$tag'" . ($av ? ", $av" : '') . ");\n";
      $code .= "\$document->closeElement('$tag');\n" if s|^/||;    # Empty element.
      Fatal('misdefined', $LaTeXML::Core::Definition::Constructor::NAME, $LaTeXML::Core::Definition::Constructor::CONSTRUCTOR,
        "Missing \">\" in constructor template at \"$_\"") unless s|^>||; }
    # Close tag: </name>
    elsif (s|^</$QNAME_RE\s*>||so) {
      $code .= "\$document->closeElement('$1');\n"; }
    # Substitutable value: argument, property...
    elsif (/^$VALUE_RE/o) {
      $code .= "\$document->absorb(" . translate_value() . ",\%prop);\n"; }
    # Attribute: a=v; assigns in current node? [May conflict with random text!?!]
elsif (s|^$QNAME_RE\s*=\s*||so) { my $key = $1; my $value = translate_string(); if (defined $value) { if ($float) { $code .= "\$savenode=\$document->floatToAttribute('$key');\n"; $float = undef; } $code .= "\$document->setAttribute(\$document->getElement,'$key'," . $value . ");\n"; } else { # attr value didn't match value pattern? treat whole thing as random text! $code .= "\$document->absorb('" . slashify($key) . "=',\%prop);\n"; } } # Else random text elsif (s/^$TEXT_RE//so) { # Else, just some text. $code .= "\$document->absorb('" . slashify($1) . "',\%prop);\n"; } } return $code; } sub slashify { my ($string) = @_; $string =~ s/\\/\\\\/g; return $string; } # parse a conditional in a constructor # Conditionals are of the form ?value(...)(...), # Return the translated condition, along with the strings for the if and else clauses. use Text::Balanced; sub parse_conditional { s/^\?//; # Remove leading "?" my $bool = 'ToString(' . translate_value() . ')'; if (my $if = Text::Balanced::extract_bracketed($_, '()')) { $if =~ s/^\(//; $if =~ s/\)$//; my $else = Text::Balanced::extract_bracketed($_, '()'); $else =~ s/^\(// if $else; $else =~ s/\)$// if $else; return ($bool, $if, $else); } else { Fatal('misdefined', $LaTeXML::Core::Definition::Constructor::NAME, $LaTeXML::Core::Definition::Constructor::CONSTRUCTOR, "Unbalanced conditional in constructor template \"$_\""); return; } } # Parse a substitutable value from the constructor (in $_) # Recognizes the #1, #prop, and also &function(args,...) sub translate_value { my $value; if (s/^\&([\w\:]*)\(//) { # Recognize a function call, w/args my $fcn = $1; my @args = (); while (!/^\s*\)/) { if (/^\s*[\'\"]/) { push(@args, translate_string()); } else { push(@args, translate_value()); } last unless s/^\s*\,\s*//; } Error('misdefined', $LaTeXML::Core::Definition::Constructor::NAME, $LaTeXML::Core::Definition::Constructor::CONSTRUCTOR, "Missing ')' in &$fcn(...) 
in constructor pattern for $LaTeXML::Core::Definition::Constructor::NAME") unless s/\)//; $value = "$fcn(" . join(',', @args) . ")"; } elsif (s/^\#(\d+)//) { # Recognize an explicit #1 for whatsit args my $n = $1; if (($n < 1) || ($n > $LaTeXML::Core::Definition::Constructor::NARGS)) { Error('misdefined', $LaTeXML::Core::Definition::Constructor::NAME, $LaTeXML::Core::Definition::Constructor::CONSTRUCTOR, "Illegal argument number $n in constructor for " . "$LaTeXML::Core::Definition::Constructor::NAME which takes $LaTeXML::Core::Definition::Constructor::NARGS args"); $value = "\"Missing\""; } else { $value = "\$arg$n" } } elsif (s/^\#([\w\-_]+)//) { $value = "\$prop{'$1'}"; } # Recognize #prop for whatsit properties elsif (s/$TEXT_RE//so) { $value = "'" . slashify($1) . "'"; } return $value; } # Parse a delimited string from the constructor (in $_), # for example, an attribute value. Can contain substitutions (above), # the result is a string. # NOTE: UNLESS there is ONLY one substituted value, then return the value object. # This is (hopefully) temporary to handle font objects as attributes. # The DOM holds the font objects, rather than strings, # to resolve relative fonts on output. sub translate_string { my @values = (); if (s/^\s*([\'\"])//) { my $quote = $1; while ($_ && !s/^$quote//) { if (/^$COND_RE/o) { my ($bool, $if, $else) = parse_conditional(); my $code = "($bool ?"; { local $_ = $if; $code .= translate_value(); } $code .= ":"; if ($else) { local $_ = $else; $code .= translate_value(); } else { $code .= "''"; } $code .= ")"; push(@values, $code); } elsif (/^$VALUE_RE/o) { push(@values, translate_value()); } elsif (s/^(.[^\#<\?\!$quote]*)//) { push(@values, "'" . slashify($1) . "'"); } } } if (!@values) { return; } elsif (@values == 1) { return $values[0]; } else { return join('.', (map { (/^\'/ ? 
        $_ : " ToString($_)") } @values)); } }

# Parse a set of attribute value pairs from a constructor pattern,
# substituting argument and property values from the whatsit.
sub translate_avpairs {
  my @avs = ();
  s|^\s*||;
  while ($_) {
    if (/^$COND_RE/o) {
      my ($bool, $if, $else) = parse_conditional();
      my $code = "($bool ? (";
      { local $_ = $if; $code .= translate_avpairs(); }
      $code .= ") : (";
      { local $_ = $else; $code .= translate_avpairs() if $else; }
      $code .= "))";
      push(@avs, $code); }
    elsif (/^%$VALUE_RE/) {    # Hash? Assume the value can be turned into a hash!
      s/^%//;                  # Eat the "%"
      push(@avs, '%{' . translate_value() . '}'); }
    elsif (s|^$QNAME_RE\s*=\s*||o) {
      my ($key, $value) = ($1, translate_string());
      push(@avs, "'$key'=>$value"); }    # if defined $value; }
    else { last; }
    s|^\s*||; }
  return join(', ', @avs); }

#===============================================================================
1;

latexml-0.8.1/lib/LaTeXML/Core/Definition/Expandable.pm
# /=====================================================================\ #
# | LaTeXML::Core::Definition                                           | #
# | Representation of definitions of Control Sequences                  | #
# |=====================================================================| #
# | Part of LaTeXML:                                                    | #
# |  Public domain software, produced as part of work done by the       | #
# |  United States Government & not subject to copyright in the US.     | #
# |---------------------------------------------------------------------| #
# | Bruce Miller                                                #_#     | #
# | http://dlmf.nist.gov/LaTeXML/                              (o o)    | #
# \=========================================================ooo==U==ooo=/ #
package LaTeXML::Core::Definition::Expandable;
use strict;
use warnings;
use LaTeXML::Global;
use LaTeXML::Common::Object;
use LaTeXML::Common::Error;
use LaTeXML::Core::Token;
use LaTeXML::Core::Tokens;
use LaTeXML::Core::Parameters;
use base qw(LaTeXML::Core::Definition);

sub new {
  my ($class, $cs, $parameters, $expansion, %traits) = @_;
  $expansion = Tokens($expansion) if ref $expansion eq 'LaTeXML::Core::Token';
  my $source = $STATE->getStomach->getGullet->getMouth;
  if (ref $expansion eq 'LaTeXML::Core::Tokens') {
    my $level = 0;
    foreach my $t ($expansion->unlist) {
      $level++ if $t->equals(T_BEGIN);
      $level-- if $t->equals(T_END); }
    Fatal('misdefined', $cs, $source, "Expansion of '" . ToString($cs) . "' has unbalanced {}",
      "Expansion is " . ToString($expansion)) if $level; }
  return bless { cs => $cs, parameters => $parameters, expansion => $expansion,
    locator => "from " . $source->getLocator(-1),
    isProtected => $STATE->getPrefix('protected'),
    %traits }, $class; }

sub isExpandable {
  return 1; }

sub isProtected {
  my ($self) = @_;
  return $$self{isProtected}; }

sub getExpansion {
  my ($self) = @_;
  if (!ref $$self{expansion}) {
    # Tokenization DEFERRED till actually used (shaves > 5%)
    require LaTeXML::Package;    # make sure present, but no imports
    $$self{expansion} = LaTeXML::Package::TokenizeInternal($$self{expansion}); }
  return $$self{expansion}; }

# Expand the expandable control sequence. This should be carried out by the Gullet.
sub invoke { my ($self, $gullet) = @_; return $self->doInvocation($gullet, $self->readArguments($gullet)); } sub doInvocation { my ($self, $gullet, @args) = @_; my $expansion = $self->getExpansion; my $r; my $profiled = $STATE->lookupValue('PROFILING') && ($LaTeXML::CURRENT_TOKEN || $$self{cs}); LaTeXML::Core::Definition::startProfiling($profiled) if $profiled; my @result; if ($STATE->lookupValue('TRACINGMACROS')) { # More involved... if (ref $expansion eq 'CODE') { # Harder to emulate \tracingmacros here. @result = &$expansion($gullet, @args); print STDERR "\n" . ToString($self->getCSName) . ' ==> ' . tracetoString(Tokens(@result)) . "\n"; my $i = 1; foreach my $arg (@args) { print STDERR '#' . $i++ . '<-' . ToString($arg) . "\n"; } } else { # for "real" macros, make sure all args are Tokens my @targs = map { $_ && (($r = ref $_) && ($r eq 'LaTeXML::Core::Tokens') ? $_ : ($r && ($r eq 'LaTeXML::Core::Token') ? Tokens($_) : Tokens(Revert($_)))) } @args; print STDERR "\n" . ToString($self->getCSName) . ' -> ' . tracetoString($expansion) . "\n"; my $i = 1; foreach my $arg (@targs) { print STDERR '#' . $i++ . '<-' . ToString($arg) . "\n"; } @result = substituteTokens($expansion, @targs); } } else { @result = (ref $expansion eq 'CODE' ? &$expansion($gullet, @args) : substituteTokens($expansion, map { $_ && (($r = ref $_) && ($r eq 'LaTeXML::Core::Tokens') ? $_ : ($r && ($r eq 'LaTeXML::Core::Token') ? Tokens($_) : Tokens(Revert($_)))) } @args)); } # This would give (something like) "inclusive time" # LaTeXML::Core::Definition::stopProfiling($profiled) if $profiled; # This gives (something like) "exclusive time" # but requires dubious Gullet support! push(@result, T_MARKER($profiled)) if $profiled; return @result; } # print a string of tokens like TeX would when tracing. sub tracetoString { my ($tokens) = @_; return join('', map { ($_->getCatcode == CC_CS ? $_->getString . 
' ' : $_->getString) } $tokens->unlist); } # NOTE: Assumes $tokens is a Tokens list of Token's and each arg either undef or also Tokens # Using inline accessors on those assumptions sub substituteTokens { my ($tokens, @args) = @_; my @in = @{$tokens}; # ->unlist my @result = (); while (@in) { my $token; if (($token = shift(@in))->[1] != CC_PARAM) { # Non '#'; copy it push(@result, $token); } elsif (($token = shift(@in))->[1] != CC_PARAM) { # Not multiple '#'; read arg. if (my $arg = $args[ord($$token[0]) - ord('0') - 1]) { push(@result, @$arg); } } # ->unlist, assuming it's a Tokens() !!! else { # Duplicated '#', copy 2nd '#' push(@result, $token); } } return @result; } sub equals { my ($self, $other) = @_; return (defined $other && (ref $self) eq (ref $other)) && Equals($self->getParameters, $other->getParameters) && Equals($self->getExpansion, $other->getExpansion); } #====================================================================== 1; __END__ =pod =head1 NAME C - Expandable Control sequence definitions. =head1 DESCRIPTION These represent macros and other expandable control sequences that are carried out in the Gullet during expansion. The results of invoking an C should be a list of Cs. See L for the most convenient means to create Expandables. It extends L. =head1 AUTHOR Bruce Miller =head1 COPYRIGHT Public domain software, produced as part of work done by the United States Government & not subject to copyright in the US. 
=cut latexml-0.8.1/lib/LaTeXML/Core/Definition/Primitive.pm0000644000175000017500000001017712507513572022355 0ustar norbertnorbert# /=====================================================================\ # # | LaTeXML::Core::Definition::Primitive | # # | Representation of definitions of Control Sequences | # # |=====================================================================| # # | Part of LaTeXML: | # # | Public domain software, produced as part of work done by the | # # | United States Government & not subject to copyright in the US. | # # |---------------------------------------------------------------------| # # | Bruce Miller #_# | # # | http://dlmf.nist.gov/LaTeXML/ (o o) | # # \=========================================================ooo==U==ooo=/ # package LaTeXML::Core::Definition::Primitive; use strict; use warnings; use LaTeXML::Global; use LaTeXML::Common::Object; use LaTeXML::Common::Error; use base qw(LaTeXML::Core::Definition); # Known traits: # isPrefix : whether this primitive is a TeX prefix, \global, etc. sub new { my ($class, $cs, $parameters, $replacement, %traits) = @_; # Could conceivably have $replacement being a List or Box? my $source = $STATE->getStomach->getGullet->getMouth; Fatal('misdefined', $cs, $source, "Primitive replacement for '" .
ToString($cs) . "' is not CODE", "Replacement is $replacement") unless ref $replacement eq 'CODE'; return bless { cs => $cs, parameters => $parameters, replacement => $replacement, locator => "from " . $source->getLocator(-1), %traits }, $class; } sub isPrefix { my ($self) = @_; return $$self{isPrefix}; } sub executeBeforeDigest { my ($self, $stomach) = @_; local $LaTeXML::Core::State::UNLOCKED = 1; my $pre = $$self{beforeDigest}; return ($pre ? map { &$_($stomach) } @$pre : ()); } sub executeAfterDigest { my ($self, $stomach, @whatever) = @_; local $LaTeXML::Core::State::UNLOCKED = 1; my $post = $$self{afterDigest}; return ($post ? map { &$_($stomach, @whatever) } @$post : ()); } # Digest the primitive; this should occur in the stomach. sub invoke { my ($self, $stomach) = @_; my $profiled = $STATE->lookupValue('PROFILING') && ($LaTeXML::CURRENT_TOKEN || $$self{cs}); LaTeXML::Core::Definition::startProfiling($profiled) if $profiled; if ($STATE->lookupValue('TRACINGCOMMANDS')) { print STDERR '{' . $self->getCSName . "}\n"; } my @result = ( $self->executeBeforeDigest($stomach), &{ $$self{replacement} }($stomach, $self->readArguments($stomach->getGullet)), $self->executeAfterDigest($stomach)); LaTeXML::Core::Definition::stopProfiling($profiled) if $profiled; return @result; } sub equals { my ($self, $other) = @_; return (defined $other && (ref $self) eq (ref $other)) && Equals($self->getParameters, $other->getParameters) && Equals($$self{replacement}, $$other{replacement}); } #=============================================================================== 1; __END__ =pod =head1 NAME C - Primitive Control sequence definitions. =head1 DESCRIPTION These represent primitive control sequences that are converted directly to Boxes or Lists containing basic Unicode content, rather than structured XML, or those executed for side effect during digestion in the L, changing the L. The results of invoking a C, if any, should be a list of digested items (C, C or C). It extends L. 
Primitive definitions may have lists of daemon subroutines, C<beforeDigest> and C<afterDigest>, that are executed before digestion (and before the arguments are read) and after digestion, respectively. These should either end with C, C<()>, or return a list of digested objects (L, etc) that will be contributed to the current list. =head1 AUTHOR Bruce Miller =head1 COPYRIGHT Public domain software, produced as part of work done by the United States Government & not subject to copyright in the US. =cut latexml-0.8.1/lib/LaTeXML/Core/Definition/Register.pm0000644000175000017500000000760312507513572022171 0ustar norbertnorbert# /=====================================================================\ # # | LaTeXML::Core::Definition::Register | # # | Representation of definitions of Control Sequences | # # |=====================================================================| # # | Part of LaTeXML: | # # | Public domain software, produced as part of work done by the | # # | United States Government & not subject to copyright in the US.
| # # |---------------------------------------------------------------------| # # | Bruce Miller #_# | # # | http://dlmf.nist.gov/LaTeXML/ (o o) | # # \=========================================================ooo==U==ooo=/ # package LaTeXML::Core::Definition::Register; use strict; use warnings; use LaTeXML::Global; use base qw(LaTeXML::Core::Definition::Primitive); # Known Traits: # registerType : the type of register (a LaTeXML class) # getter : a sub to get the value (essentially required) # setter : a sub to set the value (essentially required) # beforeDigest, afterDigest : code for before/after digestion daemons # readonly : whether this register can only be read sub new { my ($class, $cs, $parameters, %traits) = @_; return bless { cs => $cs, parameters => $parameters, locator => "from " . $STATE->getStomach->getGullet->getMouth->getLocator(-1), %traits }, $class; } sub isPrefix { return 0; } sub isRegister { my ($self) = @_; return $$self{registerType}; } sub isReadonly { my ($self) = @_; return $$self{readonly}; } sub valueOf { my ($self, @args) = @_; return &{ $$self{getter} }(@args); } sub setValue { my ($self, $value, @args) = @_; &{ $$self{setter} }($value, @args); return; } # No before/after daemons ??? # (other than afterassign) sub invoke { my ($self, $stomach) = @_; my $profiled = $STATE->lookupValue('PROFILING') && ($LaTeXML::CURRENT_TOKEN || $$self{cs}); LaTeXML::Core::Definition::startProfiling($profiled) if $profiled; my $gullet = $stomach->getGullet; my @args = $self->readArguments($gullet); $gullet->readKeyword('='); # Ignore my $value = $gullet->readValue($self->isRegister); $self->setValue($value, @args); # Tracing ? if (my $after = $STATE->lookupValue('afterAssignment')) { $STATE->assignValue(afterAssignment => undef, 'global'); $gullet->unread($after); } # primitive returns boxes, so these need to be digested!
LaTeXML::Core::Definition::stopProfiling($profiled) if $profiled; return; } #=============================================================================== 1; __END__ =pod =head1 NAME C<LaTeXML::Core::Definition::Register> - Control sequence definitions for Registers. =head1 DESCRIPTION These are set up as a specialized primitive with a getter and setter to access and store values in the Stomach. See L for the most convenient means to create them. It extends L<LaTeXML::Core::Definition::Primitive>. Registers generally store some value in the current C, but are not required to. Like TeX's registers, when they are digested, they expect an optional C<=>, and then a value of the appropriate type. Register definitions support these additional methods: =head1 Methods =over 4 =item C<< $value = $register->valueOf(@args); >> Return the value associated with the register, by invoking its C<getter> function. The additional args are used by some registers to index into a set, such as the index to C<\count>. =item C<< $register->setValue($value,@args); >> Assign a value to the register, by invoking its C<setter> function. =back =head1 AUTHOR Bruce Miller =head1 COPYRIGHT Public domain software, produced as part of work done by the United States Government & not subject to copyright in the US.
=cut latexml-0.8.1/lib/LaTeXML/Core/Document.pm0000644000175000017500000024666012507513572020103 0ustar norbertnorbert# /=====================================================================\ # # | LaTeXML::Core::Document | # # | Constructs the Document from digested material | # # |=====================================================================| # # | Part of LaTeXML: | # # | Public domain software, produced as part of work done by the | # # | United States Government & not subject to copyright in the US. | # # |---------------------------------------------------------------------| # # | Bruce Miller #_# | # # | http://dlmf.nist.gov/LaTeXML/ (o o) | # # \=========================================================ooo==U==ooo=/ # package LaTeXML::Core::Document; use strict; use warnings; use LaTeXML::Global; use LaTeXML::Common::Object; use LaTeXML::Core::List; use LaTeXML::Common::Error; use LaTeXML::Common::XML; use LaTeXML::Util::Radix; use Unicode::Normalize; use base qw(LaTeXML::Common::Object); #********************************************************************** # These two element names are `leaks' of the document structure into # the Core of LaTeXML... In principle, we should be more abstract! our $FONT_ELEMENT_NAME = "ltx:text"; our $MATH_TOKEN_NAME = "ltx:XMTok"; #********************************************************************** # [could conceivably make more sense to let the Stomach create the Document?]
# Mystery attributes: # font : # Probably should keep font ONLY in extra properties, # THEN once complete, compute the relative font at each node that accepts # a font attribute, and add the attribute. # locator : a box sub new { my ($class, $model) = @_; my $doc = XML::LibXML::Document->new("1.0", "UTF-8"); # We'll set the DocType when the 1st Element gets added. return bless { document => $doc, node => $doc, model => $model, idstore => {}, labelstore => {}, node_fonts => {}, node_boxes => {}, node_properties => {}, pending => [], progress => 0 }, $class; } #********************************************************************** # Basic Accessors # This will be a node of type XML_DOCUMENT_NODE sub getDocument { my ($self) = @_; return $$self{document}; } sub getModel { my ($self) = @_; return $$self{model}; } # Get the node representing the current insertion point. # The node will have nodeType of # XML_DOCUMENT_NODE if the document is empty, so far. # XML_ELEMENT_NODE for normal elements. # XML_TEXT_NODE if the last insertion was text # The other node types will not appear here. sub getNode { my ($self) = @_; return $$self{node}; } sub setNode { my ($self, $node) = @_; my $type = $node->nodeType; if ($type == XML_DOCUMENT_FRAG_NODE) { # Whoops my @n = $node->childNodes; if (@n > 1) { Error('unexpected', 'multiple-nodes', $self, "Cannot set insertion point to a DOCUMENT_FRAG_NODE", Stringify($node)); } elsif (@n < 1) { Error('unexpected', 'empty-nodes', $self, "Cannot set insertion point to an empty DOCUMENT_FRAG_NODE"); } $node = $n[0]; } $$self{node} = $node; return; } sub getLocator { my ($self, @args) = @_; if (my $box = $self->getNodeBox($$self{node})) { return $box->getLocator(@args); } else { return 'EOF?'; } } # well? # Get the element at (or containing) the current insertion point. sub getElement { my ($self) = @_; my $node = $$self{node}; $node = $node->parentNode if $node->getType == XML_TEXT_NODE; return ($node->getType == XML_DOCUMENT_NODE ? 
undef : $node); } # Get the child elements of the given $node sub getChildElements { my ($self, $node) = @_; return grep { $_->nodeType == XML_ELEMENT_NODE } $node->childNodes; } # Get the last element node (if any) in $node sub getLastChildElement { my ($self, $node) = @_; if ($node->hasChildNodes) { my $n = $node->lastChild; while ($n && $n->nodeType != XML_ELEMENT_NODE) { $n = $n->previousSibling; } return $n; } return; } # Get the first element node (if any) in $node sub getFirstChildElement { my ($self, $node) = @_; if ($node->hasChildNodes) { my $n = $node->firstChild; while ($n && $n->nodeType != XML_ELEMENT_NODE) { $n = $n->nextSibling; } return $n; } return; } # Find the nodes according to the given $xpath expression, # the xpath is relative to $node (if given), otherwise to the document node. sub findnodes { my ($self, $xpath, $node) = @_; return $$self{model}->getXPath->findnodes($xpath, ($node || $$self{document})); } # Like findnodes, but only returns the first matched node sub findnode { my ($self, $xpath, $node) = @_; my @nodes = $$self{model}->getXPath->findnodes($xpath, ($node || $$self{document})); return (@nodes ? $nodes[0] : undef); } # Get the node's qualified name in standard form # I.e. using the registered prefix for that namespace. # NOTE: Reconsider how _Capture_ & _WildCard_ should be integrated!?! # NOTE: Should Deprecate! (use model) sub getNodeQName { my ($self, $node) = @_; return $$self{model}->getNodeQName($node); } #********************************************************************** # Extensions of Model. sub canContain { my ($self, $tag, $child) = @_; my $model = $$self{model}; $tag = $model->getNodeQName($tag) if ref $tag; # In case tag is a node. $child = $model->getNodeQName($child) if ref $child; # In case child is a node. return $model->canContain($tag, $child); } # Can an element with (qualified name) $tag contain a $childtag element indirectly? # That is, by opening some number of autoOpen'able tags?
# And if so, return the tag to open. sub canContainIndirect { my ($self, $tag, $child) = @_; my $model = $$self{model}; $tag = $model->getNodeQName($tag) if ref $tag; # In case tag is a node. $child = $model->getNodeQName($child) if ref $child; # In case child is a node. # $imodel{$tag}{$child} => $intermediate || $child my $imodel = $STATE->lookupValue('INDIRECT_MODEL'); if (!$imodel) { $imodel = $self->computeIndirectModel(); $STATE->assignValue(INDIRECT_MODEL => $imodel, 'global'); } return $$imodel{$tag}{$child}; } # The indirect model includes all elements allowed as direct children, # and all descendants of a node that can be inserted after autoOpen'ing intermediate elements. # This model therefore includes information from the Schema, as well as # autoOpen information that may be introduced in binding files. # [Thus it should NOT be modifying the Model object, which may cover several documents in Daemon] sub computeIndirectModel { my ($self) = @_; my $model = $$self{model}; my $imodel = {}; # Determine any indirect paths to each descendant via an `autoOpen-able' tag. foreach my $tag ($model->getTags) { local %::DESC = (); computeIndirectModel_aux($model, $tag, ''); $$imodel{$tag} = {%::DESC}; } # PATCHUP if ($$model{permissive}) { # !!! Alarm!!! $$imodel{'#Document'}{'#PCDATA'} = 'ltx:p'; } return $imodel; } sub computeIndirectModel_aux { my ($model, $tag, $start) = @_; my $x; foreach my $kid ($model->getTagContents($tag)) { next if $::DESC{$kid}; # already seen $::DESC{$kid} = $start if $start; if (($kid ne '#PCDATA') && ($x = $STATE->lookupMapping('TAG_PROPERTIES', $kid)) && $$x{autoOpen}) { computeIndirectModel_aux($model, $kid, $start || $kid); } } return; } sub canContainSomehow { my ($self, $tag, $child) = @_; my $model = $$self{model}; $tag = $model->getNodeQName($tag) if ref $tag; # In case tag is a node. $child = $model->getNodeQName($child) if ref $child; # In case child is a node.
return $model->canContain($tag, $child) || $self->canContainIndirect($tag, $child); } sub canHaveAttribute { my ($self, $tag, $attrib) = @_; my $model = $$self{model}; $tag = $model->getNodeQName($tag) if ref $tag; # In case tag is a node. return $model->canHaveAttribute($tag, $attrib); } sub canAutoOpen { my ($self, $tag) = @_; if (my $props = $STATE->lookupMapping('TAG_PROPERTIES', $tag)) { return $$props{autoOpen}; } return; } # Dirty little secrets: # You can generically allow an element to autoClose using Tag. # OR you can indicate a specific node can autoClose, or forbid it, using # the _autoclose or _noautoclose attributes! sub canAutoClose { my ($self, $node) = @_; my $t = $node->nodeType; my $model = $$self{model}; my $props; return ($t == XML_TEXT_NODE) || ($t == XML_COMMENT_NODE) # text or comments auto close || (($t == XML_ELEMENT_NODE) # otherwise must be element && !$node->getAttribute('_noautoclose') # without _noautoclose && ($node->getAttribute('_autoclose') # and either with _autoclose # OR it has autoClose set on tag properties || (($props = $STATE->lookupMapping('TAG_PROPERTIES', $self->getNodeQName($node))) && $$props{autoClose}))); } # get the actions that should be performed on afterOpen or afterClose sub getTagActionList { my ($self, $tag, $when) = @_; $tag = $$self{model}->getNodeQName($tag) if ref $tag; # In case tag is a node. my ($p, $n) = (undef, $tag); if ($tag =~ /^([^:]+):(.+)$/) { ($p, $n) = ($1, $2); } my $when0 = $when . ':early'; my $when1 = $when . ':late'; my $taghash = $STATE->lookupMapping('TAG_PROPERTIES', $tag) || {}; my $nshash = ((defined $p) && $STATE->lookupMapping('TAG_PROPERTIES', $p . ':*')) || {}; my $allhash = $STATE->lookupMapping('TAG_PROPERTIES', '*') || {}; my $v; return ( (($v = $$taghash{$when0}) ? @$v : ()), (($v = $$nshash{$when0}) ? @$v : ()), (($v = $$allhash{$when0}) ? @$v : ()), (($v = $$taghash{$when}) ? @$v : ()), (($v = $$nshash{$when}) ? @$v : ()), (($v = $$allhash{$when}) ?
@$v : ()), (($v = $$taghash{$when1}) ? @$v : ()), (($v = $$nshash{$when1}) ? @$v : ()), (($v = $$allhash{$when1}) ? @$v : ()), ); } #********************************************************************** # This is a diagnostic tool that MIGHT help locate XML::LibXML bugs; # It simply walks through the document tree. Use it before and after # places where some sort of data corruption might have taken place. sub doctest { my ($self, $when, $severe) = @_; local $LaTeXML::NNODES = 0; print STDERR "\nSTART DOC TEST $when....." . ($severe ? "\n" : ''); if (my $root = $self->getDocument->documentElement) { $self->doctest_rec(undef, $root, $severe); } print STDERR "...(" . $LaTeXML::NNODES . " nodes)....DONE\n"; return; } sub doctest_rec { my ($self, $parent, $node, $severe) = @_; # Check consistency of document, parent & type, before proceeding $self->doctest_head($parent, $node, $severe); my $type = $node->nodeType; if ($type == XML_ELEMENT_NODE) { print STDERR "ELEMENT " . join(' ', "<" . $$self{model}->getNodeQName($node), (map { $_->nodeName . '="' . $_->getValue . '"' } $node->attributes)) . ">\n" if $severe; $self->doctest_children($node, $severe); } elsif ($type == XML_ATTRIBUTE_NODE) { print STDERR "ATTRIBUTE " . $node->nodeName . "=>" . $node->getValue . "\n" if $severe; } elsif ($type == XML_TEXT_NODE) { print STDERR "TEXT " . $node->textContent . "\n" if $severe; } elsif ($type == XML_CDATA_SECTION_NODE) { print STDERR "CDATA " . $node->textContent . "\n" if $severe; } # elsif($type == XML_ENTITY_REF_NODE){} # elsif($type == XML_ENTITY_NODE){} elsif ($type == XML_PI_NODE) { print STDERR "PI " . $node->localname . " " . $node->getData . "\n" if $severe; } elsif ($type == XML_COMMENT_NODE) { print STDERR "COMMENT " . $node->textContent . 
"\n" if $severe; } # elsif($type == XML_DOCUMENT_NODE){} # elsif($type == XML_DOCUMENT_TYPE_NODE){ elsif ($type == XML_DOCUMENT_FRAG_NODE) { print STDERR "DOCUMENT_FRAG \n" if $severe; $self->doctest_children($node, $severe); } # elsif($type == XML_NOTATION_NODE){} # elsif($type == XML_HTML_DOCUMENT_NODE){} # elsif($type == XML_DTD_NODE){} else { print STDERR "OTHER $type\n" if $severe; } return; } sub doctest_head { my ($self, $parent, $node, $severe) = @_; # Check consistency of document, parent & type, before proceeding print STDERR " NODE $$node [" if $severe; # BEFORE checking nodeType! print STDERR "d" if $severe; if (!$node->ownerDocument->isSameNode($self->getDocument)) { print STDERR "!" if $severe; } print STDERR "p" if $severe; if ($parent && !$node->parentNode->isSameNode($parent)) { print STDERR "!" if $severe; } print STDERR "t" if $severe; my $type = $node->nodeType; print STDERR "] " if $severe; return; } sub doctest_children { my ($self, $node, $severe) = @_; print STDERR "[fc" if $severe; my $c = $node->firstChild; while ($c) { print STDERR "]\n" if $severe; $self->doctest_rec($node, $c, $severe); print STDERR "[nc" if $severe; $c = $c->nextSibling; } print STDERR "]done\n" if $severe; return; } #********************************************************************** # This should be called before returning the final XML::LibXML::Document to the # outside world. It resolves the fonts for each node relative to it's ancestors. # It removes the `helper' attributes that store fonts, source box, etc. 
sub finalize { my ($self) = @_; $self->pruneXMDuals; if (my $root = $self->getDocument->documentElement) { local $LaTeXML::FONT = $self->getNodeFont($root); $self->finalize_rec($root); set_RDFa_prefixes($self->getDocument, $STATE->lookupValue('RDFa_prefixes')); } return $$self{document}; } sub finalize_rec { my ($self, $node) = @_; my $model = $$self{model}; my $qname = $model->getNodeQName($node); my $declared_font = $LaTeXML::FONT; my $desired_font = $LaTeXML::FONT; my %pending_declaration = (); if (my $comment = $node->getAttribute('_pre_comment')) { $node->parentNode->insertBefore(XML::LibXML::Comment->new($comment), $node); } if (my $comment = $node->getAttribute('_comment')) { $node->parentNode->insertAfter(XML::LibXML::Comment->new($comment), $node); } if (my $font_attr = $node->getAttribute('_font')) { $desired_font = $$self{node_fonts}{$font_attr}; %pending_declaration = $desired_font->relativeTo($declared_font); if (($node->hasChildNodes || $node->getAttribute('_force_font')) && scalar(keys %pending_declaration)) { foreach my $attr (keys %pending_declaration) { if ($model->canHaveAttribute($qname, $attr)) { $self->setAttribute($node, $attr => $pending_declaration{$attr}{value}); # Merge to set the font currently in effect $declared_font = $declared_font->merge(%{ $pending_declaration{$attr}{properties} }); delete $pending_declaration{$attr}; } } } } local $LaTeXML::FONT = $declared_font; foreach my $child ($node->childNodes) { my $type = $child->nodeType; if ($type == XML_ELEMENT_NODE) { my $was_forcefont = $child->getAttribute('_force_font'); $self->finalize_rec($child); # Also check if child is $FONT_ELEMENT_NAME AND has no attributes # AND providing $node can contain that child's content, we'll collapse it. 
if (($model->getNodeQName($child) eq $FONT_ELEMENT_NAME) && !$was_forcefont && !$child->hasAttributes) { my @grandchildren = $child->childNodes; if (!grep { !$self->canContain($qname, $_) } @grandchildren) { $self->replaceNode($child, @grandchildren); } } } # On the other hand, if the font declaration has NOT been effected, # We'll need to put an extra wrapper around the text! elsif ($type == XML_TEXT_NODE) { if ($self->canContain($qname, $FONT_ELEMENT_NAME) && scalar(keys %pending_declaration)) { # Too late to do wrapNodes? my $text = $self->wrapNodes($FONT_ELEMENT_NAME, $child); foreach my $attr (keys %pending_declaration) { $self->setAttribute($text, $attr => $pending_declaration{$attr}{value}); } $self->finalize_rec($text); # Now have to clean up the new node! } } } # Attributes that begin with (the semi-legal) "_" are for Bookkeeping. # Remove them now. foreach my $attr ($node->attributes) { my $n = $attr->nodeName; $node->removeAttribute($n) if $n =~ /^_/; } return; } #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # Document construction at the Current Insertion Point. #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% #********************************************************************** # absorb the given $box into the DOM (called from constructors). # This will return a list of whatever nodes were created. # Note that this may include nodes that are children of other nodes in the list # or nodes that are no longer in the document. # Also, note that when a text node is appended to, the complete text node is in the list, # not just the portion that was added.
# [Note that recording the nodes being constructed isn't all that costly, # but filtering them for parent/child relations IS, particularly since it usually isn't needed] # # A $box that is a Box, or List, or Whatsit, is responsible for carrying out # its own insertion, but it should ultimately call methods of Document # that will record the nodes that were created. # $box can also be a plain string which will be inserted according to whatever # font, mode, etc, are in %props. sub absorb { my ($self, $box, %props) = @_; # Nothing? Skip it if (!defined $box) { return; } # A Proper Box or List or Whatsit? It will handle it. elsif (ref $box) { local $LaTeXML::BOX = $box; # [ATTEMPT to] only record if we're running in NON-VOID context. # [but wantarray seems defined MUCH more than I would have expected!?] if ($LaTeXML::RECORDING_CONSTRUCTION || defined wantarray) { my @n = (); { local $LaTeXML::RECORDING_CONSTRUCTION = 1; local @LaTeXML::CONSTRUCTED_NODES = (); $box->beAbsorbed($self); @n = @LaTeXML::CONSTRUCTED_NODES; } # These were created just now map { $self->recordConstructedNode($_) } @n; # record these for OUTER caller! return @n; } # but return only the most recent set. else { return $box->beAbsorbed($self); } } # Else, plain string in text mode. elsif (!$props{isMath}) { return $self->openText($box, $props{font} || ($LaTeXML::BOX && $LaTeXML::BOX->getFont)); } # Or plain string in math mode. # Note text nodes can ONLY appear in or !!! # Have we already opened an XMTok? Then insert into it. elsif ($$self{model}->getNodeQName($$self{node}) eq $MATH_TOKEN_NAME) { return $self->openMathText_internal($box); } # Else create the XMTok now. else { # Odd case: constructors that work in math & text can insert raw strings in Math mode. return $self->insertMathToken($box, font => $props{font}); } } # Note that a box has been absorbed creating $node; # This does book keeping so that we can return the sequence of nodes # that were added by absorbing material. 
sub recordConstructedNode {
  my ($self, $node) = @_;
  if ((defined $LaTeXML::RECORDING_CONSTRUCTION)    # If we're recording!
    && (!@LaTeXML::CONSTRUCTED_NODES                # and this node isn't already recorded
      || !$node->isSameNode($LaTeXML::CONSTRUCTED_NODES[-1]))) {
    push(@LaTeXML::CONSTRUCTED_NODES, $node);
  }
  return;
}

sub filterDeletions {
  my ($self, @nodes) = @_;
  my $doc = $$self{document};
  # This test seems to successfully determine inclusion,
  # without requiring the (dangerous? & dubious?) unbindNode to be used.
  return grep { isDescendantOrSelf($_, $doc) } @nodes;
}

# Given a list of nodes such as from ->absorb,
# filter out all the nodes that are children of other nodes in the list.
sub filterChildren {
  my ($self, @node) = @_;
  #  return @node;
  #  return ();
  return () unless @node;
  my @n = (shift(@node));
  foreach my $n (@node) {
    push(@n, $n) unless grep { isDescendantOrSelf($n, $_); } @n;
  }
  return @n;
}

#%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
# Shorthand for open,absorb,close, but returns the new node.
sub insertElement {
  my ($self, $qname, $content, %attrib) = @_;
  my $node = $self->openElement($qname, %attrib);
  if (ref $content eq 'ARRAY') {
    map { $self->absorb($_) } @$content;
  }
  elsif (defined $content) {
    $self->absorb($content);
  }
  # In obscure situations, $node may have already gotten closed?
  # close it if it is still open.
my $c = $$self{node}; while ($c && ($c->nodeType != XML_DOCUMENT_NODE) && !$c->isSameNode($node)) { $c = $c->parentNode; } if ($c->isSameNode($node)) { $self->closeElement($qname); } return $node; } sub insertMathToken { my ($self, $string, %attributes) = @_; $attributes{role} = 'UNKNOWN' unless $attributes{role}; my $node = $self->openElement($MATH_TOKEN_NAME, %attributes); my $box = $attributes{_box} || $LaTeXML::BOX; my $font = $attributes{font} || $box->getFont; $self->setNodeFont($node, $font); $self->setNodeBox($node, $box); $self->openMathText_internal($string) if defined $string; $self->closeNode_internal($node); # Should be safe. return $node; } # Insert a new comment, or append to previous comment. # Does NOT move the current insertion point to the Comment, # but may move up past a text node. sub insertComment { my ($self, $text) = @_; chomp($text); $text =~ s/\-\-+/__/g; $self->closeText_internal; # Close any open text node. my $comment; if ($$self{node}->nodeType == XML_DOCUMENT_NODE) { push(@{ $$self{pending} }, $comment = $$self{document}->createComment(' ' . $text . ' ')); } elsif (($comment = $$self{node}->lastChild) && ($comment->nodeType == XML_COMMENT_NODE)) { $comment->setData($comment->data . "\n " . $text); } else { $comment = $$self{node}->appendChild($$self{document}->createComment(' ' . $text . ' ')); } return $comment; } # Insert a ProcessingInstruction of the form # Does NOT move the current insertion point to the PI, # but may move up past a text node. sub insertPI { my ($self, $op, %attrib) = @_; # We'll just put these on the document itself. # Put these in an attractive order, main "operator" first my @keys = ((map { ($attrib{$_} ? ($_) : ()) } qw(class package options)), (grep { $_ !~ /^(?:class|package|options)$/ } sort keys %attrib)); my $data = join(' ', map { $_ . "=\"" . ToString($attrib{$_}) . 
"\"" } @keys); my $pi = $$self{document}->createProcessingInstruction($op, $data); $self->closeText_internal; # Close any open text node if ($$self{node}->nodeType == XML_DOCUMENT_NODE) { push(@{ $$self{pending} }, $pi); } else { $$self{document}->insertBefore($pi, $$self{document}->documentElement); } return $pi; } #********************************************************************** # Middle level, mostly public, API. # Handlers for various construction operations. # General naming: 'open' opens a node at current pos and sets it to current, # 'close' closes current node(s), inserts opens & closes, ie. w/o moving current # Tricky: Insert some text in a particular font. # We need to find the current effective -- being the closest _declared_ font, # (ie. it will appear in the elements attributes). We may also want # to open/close some elements in such a way as to minimize the font switchiness. # I guess we should only open/close "text" elements, though. # [Actually, we'd like the user to _declare_ what element to use.... # I don't like having "text" built in here! # AND, we've assumed that "font" names the relevant attribute!!!] sub openText { my ($self, $text, $font) = @_; my $node = $$self{node}; my $t = $node->nodeType; return if $text =~ /^\s+$/ && (($t == XML_DOCUMENT_NODE) # Ignore initial whitespace || (($t == XML_ELEMENT_NODE) && !$self->canContain($node, '#PCDATA'))); return if $font->getFamily eq 'nullfont'; print STDERR "Insert text \"$text\" /" . Stringify($font) . " at " . Stringify($node) . "\n" if $LaTeXML::Core::Document::DEBUG; if (($t != XML_DOCUMENT_NODE) # If not at document begin && !(($t == XML_TEXT_NODE) && # And not appending text in same font. ($font->distance($self->getNodeFont($node->parentNode)) == 0))) { # then we'll need to do some open/close to get fonts matched. $node = $self->closeText_internal; # Close text node, if any. 
my ($bestdiff, $closeto) = (99999, $node); my $n = $node; while ($n->nodeType != XML_DOCUMENT_NODE) { my $d = $font->distance($self->getNodeFont($n)); #print STDERR "Font Compare: ".Stringify($n)." w/font=".Stringify($self->getNodeFont($n))." ==>$d\n"; if ($d < $bestdiff) { $bestdiff = $d; $closeto = $n; last if ($d == 0); } last if ($$self{model}->getNodeQName($n) ne $FONT_ELEMENT_NAME) || $n->getAttribute('_noautoclose'); $n = $n->parentNode; } $self->closeToNode($closeto) if $closeto ne $node; # Move to best starting point for this text. $self->openElement($FONT_ELEMENT_NAME, font => $font, _fontswitch => 1) if $bestdiff > 0; # Open if needed. } # Finally, insert the darned text. my $tnode = $self->openText_internal($text); $self->recordConstructedNode($tnode); return $tnode; } # Mystery: # How to deal with font declarations? # font vs _font; either must redirect to Font object until they are relativized, at end. # When relativizing, should it depend on font attribute on element and/or DTD allowed attribute? sub openElement { my ($self, $qname, %attributes) = @_; NoteProgress('.') if ($$self{progress}++ % 25) == 0; print STDERR "Open element $qname at " . Stringify($$self{node}) . "\n" if $LaTeXML::Core::Document::DEBUG; my $point = $self->find_insertion_point($qname); $attributes{_box} = $LaTeXML::BOX unless $attributes{_box}; my $newnode = $self->openElementAt($point, $qname, _font => $attributes{font} || $attributes{_box}->getFont, %attributes); $self->setNode($newnode); return $newnode; } # Note: This closes the deepest open node of a given type. # This can cause problems with auto-opened nodes, esp. ones for fontswitches! # Since this is an "explicit request", we're currently skipping over those nodes, # ie. we're automatically closing them, even if they're the same type as we're asking to close!!! # This is kinda risky! Maybe we should try to request closing of specific nodes. 
sub closeElement {
  my ($self, $qname) = @_;
  print STDERR "Close element $qname at " . Stringify($$self{node}) . "\n" if $LaTeXML::Core::Document::DEBUG;
  $self->closeText_internal();
  my ($node, @cant_close) = ($$self{node});
  while ($node->nodeType != XML_DOCUMENT_NODE) {
    my $t = $$self{model}->getNodeQName($node);
    # autoclose until node of same name BUT also close nodes opened for font switches!
    last if ($t eq $qname) && !(($t eq $FONT_ELEMENT_NAME) && $node->getAttribute('_fontswitch'));
    push(@cant_close, $node) unless $self->canAutoClose($node);
    $node = $node->parentNode;
  }
  if ($node->nodeType == XML_DOCUMENT_NODE) {    # Didn't find $qname at all!!
    Error('malformed', $qname, $self,
      "Attempt to close " . ($qname eq '#PCDATA' ? $qname : '</' . $qname . '>') . ", which isn't open",
      "Currently in " . $self->getInsertionContext);
    return;
  }
  else {                                         # Found node.
    # Intervening non-auto-closeable nodes!!
    Error('malformed', $qname, $self,
      "Closing " . ($qname eq '#PCDATA' ? $qname : '</' . $qname . '>')
        . " whose open descendents do not auto-close",
      "Descendents are " . join(', ', map { Stringify($_) } @cant_close))
      if @cant_close;
    # So, now close up to the desired node.
    $self->closeNode_internal($node);
    return $node;
  }
}

# Check whether it is possible to open $qname at this point,
# possibly by autoOpen'ing & autoClosing other tags.
sub isOpenable {
  my ($self, $qname) = @_;
  my $node = $$self{node};
  while ($node) {
    return 1 if $self->canContainSomehow($node, $qname);
    return 0 unless $self->canAutoClose($node);    # could close, then check if parent can contain
    $node = $node->parentNode;
  }
  return 0;
}

# Check whether it is possible to close each element in @tags,
# any intervening nodes must be autocloseable.
# returning the last node that would be closed if it is possible,
# otherwise undef.
sub isCloseable { my ($self, @tags) = @_; my $node = $$self{node}; $node = $node->parentNode if $node->nodeType == XML_TEXT_NODE; while (my $qname = shift(@tags)) { while (1) { return if $node->nodeType == XML_DOCUMENT_NODE; my $this_qname = $$self{model}->getNodeQName($node); last if $this_qname eq $qname; return unless $self->canAutoClose($node); $node = $node->parentNode; } $node = $node->parentNode if @tags; } return $node; } # Close $qname, if it is closeable. sub maybeCloseElement { my ($self, $qname) = @_; if (my $node = $self->isCloseable($qname)) { $self->closeNode_internal($node); return $node; } } # This closes all nodes until $node becomes the current point. sub closeToNode { my ($self, $node, $ifopen) = @_; my $model = $$self{model}; my ($t, @cant_close) = (); my $n = $$self{node}; my $lastopen; # go up the tree from current node, till we find $node while ((($t = $n->getType) != XML_DOCUMENT_NODE) && !$n->isSameNode($node)) { push(@cant_close, $n) unless $self->canAutoClose($n); $lastopen = $n; $n = $n->parentNode; } if ($t == XML_DOCUMENT_NODE) { # Didn't find $node at all!! Error('malformed', $model->getNodeQName($node), $self, "Attempt to close " . Stringify($node) . ", which isn't open", "Currently in " . $self->getInsertionContext) unless $ifopen; return; } else { # Found node. Error('malformed', $model->getNodeQName($node), $self, "Closing " . Stringify($node) . " whose open descendents do not auto-close", "Descendents are " . join(', ', map { Stringify($_) } @cant_close)) if @cant_close; # But found has intervening non-auto-closeable nodes!! $self->closeNode_internal($lastopen) if $lastopen; } return; } # This closes all nodes until $node is closed. 
sub closeNode { my ($self, $node) = @_; my $model = $$self{model}; my ($t, @cant_close) = (); my $n = $$self{node}; while ((($t = $n->getType) != XML_DOCUMENT_NODE) && !$n->isSameNode($node)) { push(@cant_close, $n) unless $self->canAutoClose($n); $n = $n->parentNode; } if ($t == XML_DOCUMENT_NODE) { # Didn't find $qname at all!! Error('malformed', $model->getNodeQName($node), $self, "Attempt to close " . Stringify($node) . ", which isn't open", "Currently in " . $self->getInsertionContext); } else { # Found node. # Intervening non-auto-closeable nodes!! Error('malformed', $model->getNodeQName($node), $self, "Closing " . Stringify($node) . " whose open descendents do not auto-close", "Descendents are " . join(', ', map { Stringify($_) } @cant_close)) if @cant_close; $self->closeNode_internal($node); } return; } # Add the given attribute to the nearest node that is allowed to have it. sub addAttribute { my ($self, $key, $value) = @_; return unless defined $value; my $node = $$self{node}; $node = $node->parentNode if $node->nodeType == XML_TEXT_NODE; while (($node->nodeType != XML_DOCUMENT_NODE) && !$$self{model}->canHaveAttribute($node, $key)) { $node = $node->parentNode; } if ($node->nodeType == XML_DOCUMENT_NODE) { Error('malformed', $key, $self, "Attribute $key not allowed in this node or ancestors"); } else { $self->setAttribute($node, $key, $value); } return; } #********************************************************************** # Low level internal interface # Return a string indicating the path to the current insertion point in the document. 
# if $levels is defined, show only that many levels
sub getInsertionContext {
  my ($self, $levels) = @_;
  my $node = $$self{node};
  my $type = $node->nodeType;
  if (($type != XML_TEXT_NODE) && ($type != XML_ELEMENT_NODE) && ($type != XML_DOCUMENT_NODE)) {
    Error('internal', 'context', $self,
      "Insertion point is not an element, document or text: ", Stringify($node));
    return;
  }
  my $path = Stringify($node);
  while ($node = $node->parentNode) {
    if ((defined $levels) && (--$levels <= 0)) {
      $path = '...' . $path;
      last;
    }
    $path = Stringify($node) . $path;
  }
  return $path;
}

# Find the node where an element with qualified name $qname can be inserted.
# This will move up the tree (closing auto-closeable elements),
# or down (inserting auto-openable elements), as needed.
sub find_insertion_point {
  my ($self, $qname) = @_;
  $self->closeText_internal;    # Close any current text node.
  my $cur_qname = $$self{model}->getNodeQName($$self{node});
  my $inter;
  # If $qname is allowed at the current point, we're done.
  if ($self->canContain($cur_qname, $qname)) {
    return $$self{node};
  }
  # Else, if we can create an intermediate node that accepts $qname, we'll do that.
  elsif (($inter = $self->canContainIndirect($cur_qname, $qname))
    && ($inter ne $qname) && ($inter ne $cur_qname)) {
    $self->openElement($inter, font => $self->getNodeFont($$self{node}));
    return $self->find_insertion_point($qname);    # And retry insertion (should work now).
  }
  else {    # Now we're getting more desperate...
    # Check if we can auto close some nodes, and _then_ insert the $qname.
    my ($node, $closeto) = ($$self{node});
    while (($node->nodeType != XML_DOCUMENT_NODE) && $self->canAutoClose($node)) {
      my $parent = $node->parentNode;
      if ($self->canContainSomehow($parent, $qname)) {
        $closeto = $node;
        last;
      }
      $node = $parent;
    }
    if ($closeto) {
      $self->closeNode_internal($closeto);           # Close the auto closeable nodes.
      return $self->find_insertion_point($qname);    # Then retry, possibly w/auto open's
    }
    else {                                           # Didn't find a legit place.
Error('malformed', $qname, $self, ($qname eq '#PCDATA' ? $qname : '<' . $qname . '>') . " isn't allowed here", "Currently in " . $self->getInsertionContext); return $$self{node}; } } } # But we'll do it anyway, unless Error => Fatal. sub getInsertionCandidates { my ($node) = @_; my @nodes = (); # Check the current element FIRST, then build list of candidates. my $first = $node; $first = $first->parentNode if $first && $first->getType == XML_TEXT_NODE; my $isCapture = $first && ($first->localname || '') eq '_Capture_'; push(@nodes, $first) if $first && $first->getType != XML_DOCUMENT_NODE && !$isCapture; $node = $node->lastChild if $node && $node->hasChildNodes; while ($node && ($node->nodeType != XML_DOCUMENT_NODE)) { my $n = $node; while ($n) { if (($n->localname || '') eq '_Capture_') { push(@nodes, element_nodes($n)); } else { push(@nodes, $n); } $n = $n->previousSibling; } $node = $node->parentNode; } push(@nodes, $first) if $isCapture; return @nodes; } # The following two "floatTo" operations find an appropriate point # within the document tree preceding the current insertion point. # They return undef (& issue a warning) if such a point cannot be found. # Otherwise, they move the current insertion point to the appropriate node, # and return the previous insertion point. # After you make whatever changes (insertions or whatever) to the tree, # you should do # $document->setNode($savenode) # to reset the insertion point to where it had been. # Find a node in the document that can contain an element $qname sub floatToElement { my ($self, $qname) = @_; my @candidates = getInsertionCandidates($$self{node}); while (@candidates && !$self->canContain($candidates[0], $qname)) { shift(@candidates); } if (my $n = shift(@candidates)) { my $savenode = $$self{node}; $self->setNode($n); print STDERR "Floating from " . Stringify($savenode) . " to " . Stringify($n) . 
" for $qname\n" if ($$savenode ne $$n) && $LaTeXML::Core::Document::DEBUG; return $savenode; } else { Warn('malformed', $qname, $self, "No open node can contain element '$qname'", $self->getInsertionContext()) unless $self->canContainSomehow($$self{node}, $qname); return; } } # Find a node in the document that can accept the attribute $key sub floatToAttribute { my ($self, $key) = @_; my @candidates = getInsertionCandidates($$self{node}); while (@candidates && !$self->canHaveAttribute($candidates[0], $key)) { shift(@candidates); } if (my $n = shift(@candidates)) { my $savenode = $$self{node}; $self->setNode($n); return $savenode; } else { Warn('malformed', $key, $self, "No open node can get attribute '$key'", $self->getInsertionContext()); return; } } sub openText_internal { my ($self, $text) = @_; my $qname; if ($$self{node}->nodeType == XML_TEXT_NODE) { # current node already is a text node. print STDERR "Appending text \"$text\" to " . Stringify($$self{node}) . "\n" if $LaTeXML::Core::Document::DEBUG; $$self{node}->appendData($text); } elsif (($text =~ /\S/) # If non space || $self->canContain($$self{node}, '#PCDATA')) { # or text allowed here my $point = $self->find_insertion_point('#PCDATA'); my $node = $$self{document}->createTextNode($text); print STDERR "Inserting text node for \"$text\" into " . Stringify($point) . "\n" if $LaTeXML::Core::Document::DEBUG; $point->appendChild($node); $self->setNode($node); } return $$self{node}; } # return the text node (current) # Question: Why do I have math ligatures handled within openMathText_internal, # but text ligatures handled within closeText_internal ??? sub openMathText_internal { my ($self, $string) = @_; # And if there's already text??? 
  my $node = $$self{node};
  my $font = $self->getNodeFont($node);
  $node->appendText($string);
  ##print STDERR "Trying Math Ligatures at \"$string\"\n";
  $self->applyMathLigatures($node);
  return $node;
}

# New strategy (but inefficient): apply ligatures until one succeeds,
# then remove it, and repeat until ALL (remaining) fail.
sub applyMathLigatures {
  my ($self, $node) = @_;
  if (my $ligatures = $STATE->lookupValue('MATH_LIGATURES')) {
    my @ligatures = @$ligatures;
    while (@ligatures) {
      my $matched = 0;
      foreach my $ligature (@ligatures) {
        if ($self->applyMathLigature($node, $ligature)) {
          @ligatures = grep { $_ ne $ligature } @ligatures;
          $matched = 1;
          last;
        }
      }
      return unless $matched;
    }
  }
  return;
}

# Apply ligature operation to $node, presumed the last insertion into its parent(?)
sub applyMathLigature {
  my ($self, $node, $ligature) = @_;
  my @sibs = $node->parentNode->childNodes;
  my ($nmatched, $newstring, %attr) = &{ $$ligature{matcher} }($self, @sibs);
  if ($nmatched) {
    my @boxes = ($self->getNodeBox($node));
    $node->firstChild->setData($newstring);
    for (my $i = 0 ; $i < $nmatched - 1 ; $i++) {
      my $remove = $node->previousSibling;
      unshift(@boxes, $self->getNodeBox($remove));
      $self->removeNode($remove);
    }
    ## This fragment replaces the node's box by the composite boxes it replaces
    ## HOWEVER, this gets things out of sync because parent lists of boxes still
    ## have the old ones.  Unless we could recursively replace all of them, we'd better skip it(??)
    if (scalar(@boxes) > 1) {
      $self->setNodeBox($node, List(@boxes, mode => 'math'));
    }
    foreach my $key (sort keys %attr) {
      my $value = $attr{$key};
      if (defined $value) {
        $node->setAttribute($key => $value);
      }
      else {
        $node->removeAttribute($key);
      }
    }
    return 1;
  }
  else {
    return;
  }
}

# Closing a text node is a good time to apply regexps (aka. Ligatures)
sub closeText_internal {
  my ($self) = @_;
  my $node = $$self{node};
  if ($node->nodeType == XML_TEXT_NODE) {    # Current node is text?
my $parent = $node->parentNode; my $font = $self->getNodeFont($parent); my $string = $node->data; my $ostring = $string; my $fonttest; if (my $ligatures = $STATE->lookupValue('TEXT_LIGATURES')) { foreach my $ligature (@$ligatures) { next if ($fonttest = $$ligature{fontTest}) && !&$fonttest($font); $string = &{ $$ligature{code} }($string); } } $node->setData($string) unless $string eq $ostring; $self->setNode($parent); # Now, effectively Closed return $parent; } else { return $node; } } # Close $node, and any current nodes below it. # No checking! Use this when you've already verified that $node can be closed. # and, of course, $node must be current or some ancestor of it!!! sub closeNode_internal { my ($self, $node) = @_; my $closeto = $node->parentNode; # Grab now in case afterClose screws the structure. my $n = $self->closeText_internal; # Close any open text node. while ($n->nodeType == XML_ELEMENT_NODE) { $self->closeElementAt($n); $self->autoCollapseChildren($n); last if $node->isSameNode($n); $n = $n->parentNode; } $self->setNode($closeto); # $self->autoCollapseChildren($node); return $$self{node}; } # Avoid redundant nesting of font switching elements: # If we're closing a node that can take font switches and it contains # a single FONT_ELEMENT_NAME node; pull it up. 
sub autoCollapseChildren { my ($self, $node) = @_; my $model = $$self{model}; my $qname = $model->getNodeQName($node); my @c; if ((scalar(@c = $node->childNodes) == 1) # with single child && ($model->getNodeQName($c[0]) eq $FONT_ELEMENT_NAME) # AND, $node can have all the attributes that the child has (but at least 'font') && !(grep { !$model->canHaveAttribute($qname, $_) } ('font', grep { /^[^_]/ } map { $_->nodeName } $c[0]->attributes)) # BUT, it isn't being forced somehow && !$c[0]->hasAttribute('_force_font')) { my $c = $c[0]; $self->setNodeFont($node, $self->getNodeFont($c)); $self->removeNode($c); foreach my $gc ($c->childNodes) { $node->appendChild($gc); } # Merge the attributes from the child onto $node foreach my $attr ($c->attributes()) { if ($attr->nodeType == XML_ATTRIBUTE_NODE) { my $key = $attr->nodeName; my $val = $attr->getValue; # Special case attributes if ($key eq 'xml:id') { # Use the replacement id if (!$node->hasAttribute($key)) { $val = $self->recordID($val, $node); $node->setAttribute($key, $val); } } elsif ($key eq 'class') { # combine $class if (my $class = $node->getAttribute($key)) { $node->setAttribute($key, $class . ' ' . $val); } else { $node->setAttribute($key, $val); } } # xoffset, yoffset, pad-width, pad-height should sum up, if present on both. elsif ($key =~ /^(xoffset|yoffset|pad-height|pad-width)$/) { if (my $val2 = $node->getAttribute($key)) { my $v1 = $val =~ /^([\+\-\d\.]*)pt$/ && $1; my $v2 = $val2 =~ /^([\+\-\d\.]*)pt$/ && $1; $node->setAttribute($key => ($v1 + $v2) . 'pt'); } else { $node->setAttribute($key => $val); } } # Remaining attributes should prefer the inner (child's) values, if any # (font, size, color, framed) # (width,height, depth, align, vattach, float) elsif (my $ns = $attr->namespaceURI) { $node->setAttributeNS($ns, $attr->name, $val); } else { $node->setAttribute($attr->localname, $val); } } } } return; } #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # Document surgery (?) 
#%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # The following carry out DOM modification but NOT relative to any current # insertion point (eg $$self{node}), but rather relative to nodes specified # in the arguments. # Set any allowed attribute on a node, decoding the prefix, if any. # Also records, and checks, any id attributes. # [xml:id and namespaced attributes are always allowed] sub setAttribute { my ($self, $node, $key, $value) = @_; $value = $value->toAttribute if ref $value; if ((defined $value) && ($value ne '')) { # Skip if `empty'; but 0 is OK! if ($key eq 'xml:id') { # If it's an ID attribute $value = $self->recordID($value, $node); # Do id book keeping $node->setAttributeNS($LaTeXML::Common::XML::XML_NS, 'id', $value); } # and bypass all ns stuff elsif ($key !~ /:/) { # No colon; no namespace (the common case!) # Ignore attributes not allowed by the model, # but accept "internal" attributes. my $model = $$self{model}; my $qname = $model->getNodeQName($node); if ($model->canHaveAttribute($qname, $key) || $key =~ /^_/) { $node->setAttribute($key => $value); } } else { # Accept any namespaced attributes my ($ns, $name) = $$self{model}->decodeQName($key); if ($ns) { # If namespaced attribute (must have prefix! my $prefix = $node->lookupNamespacePrefix($ns); # namespace already declared? if (!$prefix) { # if namespace not already declared $prefix = $$self{model}->getDocumentNamespacePrefix($ns, 1); # get the prefix to use $self->getDocument->documentElement->setNamespace($ns, $prefix, 0); } # and declare it if ($prefix eq '#default') { # Probably shouldn't happen...? $node->setAttribute($name => $value); } else { $node->setAttributeNS($ns, "$prefix:$name" => $value); } } else { $node->setAttribute($name => $value); } } } # redundant case... 
  return;
}

#**********************************************************************
# Association of nodes and ids (xml:id)

sub recordID {
  my ($self, $id, $node) = @_;
  if (my $prev = $$self{idstore}{$id}) {    # Whoops! Already assigned!!!
    # Can we recover?
    if (!$node->isSameNode($prev)) {
      my $badid = $id;
      $id = $self->modifyID($id);
      Info('malformed', 'id', $node, "Duplicated attribute xml:id",
        "Using id='$id' on " . Stringify($node),
        "id='$badid' already set on " . Stringify($prev));
    }
  }
  $$self{idstore}{$id} = $node;
  return $id;
}

sub unRecordID {
  my ($self, $id) = @_;
  delete $$self{idstore}{$id};
  return;
}

# These are used to record or unrecord, in bulk, all the ids within a node (tree).
sub recordNodeIDs {
  my ($self, $node) = @_;
  # Note: the XPath axis is spelled 'descendant-or-self' (as in unRecordNodeIDs).
  foreach my $idnode ($self->findnodes('descendant-or-self::*[@xml:id]', $node)) {
    if (my $id = $idnode->getAttribute('xml:id')) {
      my $newid = $self->recordID($id, $idnode);
      $idnode->setAttribute('xml:id' => $newid) if $newid ne $id;
    }
  }
  return;
}

sub unRecordNodeIDs {
  my ($self, $node) = @_;
  foreach my $idnode ($self->findnodes('descendant-or-self::*[@xml:id]', $node)) {
    if (my $id = $idnode->getAttribute('xml:id')) {
      $self->unRecordID($id);
    }
  }
  return;
}

# Get a new, related, but unique id
# Sneaky option: try $LaTeXML::Core::Document::ID_SUFFIX as a suffix for id, first.
sub modifyID {
  my ($self, $id) = @_;
  if (my $prev = $$self{idstore}{$id}) {    # Whoops! Already assigned!!!
    # Can we recover?
    my $badid = $id;
    if (!$LaTeXML::Core::Document::ID_SUFFIX
      || $$self{idstore}{ $id = $badid . $LaTeXML::Core::Document::ID_SUFFIX }) {
      foreach my $s1 (1 .. 26 * 26 * 26) {    # Gotta give up, eventually; is 3 letters enough?
        return $id unless $$self{idstore}{ $id = $badid .
            radix_alpha($s1) };
      }
      # At this point $id holds the last candidate tried and $badid the original id.
      Fatal('malformed', 'id', $self, "Automatic incrementing of ID counters failed",
        "Last alternative for '$badid' is '$id'");
    }
  }
  return $id;
}

sub lookupID {
  my ($self, $id) = @_;
  return $$self{idstore}{$id};
}

#======================================================================
# Odd bit:
# In an XMDual, in each branch (content, presentation) there will be atoms
# that correspond to the input (one will be real, the other an XMRef to the first).
# But also there will be additional "decoration" (delimiters, punctuation, etc on the presentation
# side; other symbols, bindings, whatever, on the content side).
# These decorations should NOT be subject to rewrite rules,
# and in cross-linked parallel markup, they should be attributed to the
# upper containing object's ID, rather than left dangling.
#
# To determine this, we mark all math nodes as to whether they are "visible" from
# presentation, content or both (the default top-level being both).
# Decorations are the nodes that are visible to only one mode.
# Note that nodes that are not visible at all CAN occur (& do currently when the parser
# creates XMDuals); pruneXMDuals (below) gets rid of them.

# NOTE: This should ultimately be in a base Document class,
# since it is also needed before conversion to parallel markup!
sub markXMNodeVisibility { my ($self) = @_; my @xmath = $self->findnodes('//ltx:XMath/*'); foreach my $math (@xmath) { foreach my $node ($self->findnodes('descendant-or-self::*[@_pvis or @_cvis]', $math)) { $node->removeAttribute('_pvis'); $node->removeAttribute('_cvis'); } } foreach my $math (@xmath) { $self->markXMNodeVisibility_aux($math, 1, 1); } return; } sub markXMNodeVisibility_aux { my ($self, $node, $cvis, $pvis) = @_; my $qname = $self->getNodeQName($node); return if (!$cvis || $node->getAttribute('_cvis')) && (!$pvis || $node->getAttribute('_pvis')); # Special case: for XMArg used to wrap "formal" arguments on the content side, # mark them as visible as presentation as well. $pvis = 1 if $cvis && ($qname eq 'ltx:XMArg'); $node->setAttribute('_cvis' => 1) if $cvis; $node->setAttribute('_pvis' => 1) if $pvis; if ($qname eq 'ltx:XMDual') { my ($c, $p) = element_nodes($node); $self->markXMNodeVisibility_aux($c, 1, 0) if $cvis; $self->markXMNodeVisibility_aux($p, 0, 1) if $pvis; } elsif ($qname eq 'ltx:XMRef') { # $self->markXMNodeVisibility_aux($self->realizeXMNode($node),$cvis,$pvis); } my $id = $node->getAttribute('idref'); if (!$id) { Warn('expected', 'id', $self, "Missing id on ltx:XMRef"); return; } my $reffed = $self->lookupID($id); if (!$reffed) { Warn('expected', 'node', $self, "No node found with id=$id (referred to from ltx:XMRef)"); return; } $self->markXMNodeVisibility_aux($reffed, $cvis, $pvis); } else { foreach my $child (element_nodes($node)) { $self->markXMNodeVisibility_aux($child, $cvis, $pvis); } } return; } # Reduce any ltx:XMDual's to just the visible branch, if the other is not visible # (according to markXMNodeVisibility) # If we could be 100% sure that the marking had stayed consistent (after various doc surgery) # we could avoid re-marking, but we'd better be sure before removing nodes! sub pruneXMDuals { my ($self) = @_; # RE-mark visibility! 
$self->markXMNodeVisibility; # will reversing keep from problems removing nodes from trees that already have been removed? foreach my $dual (reverse $self->findnodes('descendant-or-self::ltx:XMDual')) { my ($content, $presentation) = element_nodes($dual); if (!$self->findnode('descendant-or-self::*[@_pvis or @_cvis]', $content)) { # content never seen $self->replaceTree($presentation, $dual); } elsif (!$self->findnode('descendant-or-self::*[@_pvis or @_cvis]', $presentation)) { # pres. $self->replaceTree($content, $dual); } } return; } #********************************************************************** # Record the Box that created this node. sub setNodeBox { my ($self, $node, $box) = @_; return unless $box; my $boxid = "$box"; $$self{node_boxes}{$boxid} = $box; return $node->setAttribute(_box => $boxid); } sub getNodeBox { my ($self, $node) = @_; my $t = $node->nodeType; return if $t != XML_ELEMENT_NODE; if (my $boxid = $node->getAttribute('_box')) { return $$self{node_boxes}{$boxid}; } } sub setNodeFont { my ($self, $node, $font) = @_; return unless ref $font; # ? 
  my $fontid = $font->toString;
  $$self{node_fonts}{$fontid} = $font;
  if ($node->nodeType == XML_ELEMENT_NODE) {
    $node->setAttribute(_font => $fontid);
  }
  else {
    Warn('malformed', 'font', $node, "Can't set font on this node");
  }
  return;
}

sub getNodeFont {
  my ($self, $node) = @_;
  my $t = $node->nodeType;
  return (($t == XML_ELEMENT_NODE) && $$self{node_fonts}{ $node->getAttribute('_font') })
    || LaTeXML::Common::Font->textDefault();
}

sub getNodeLanguage {
  my ($self, $node) = @_;
  my ($font, $lang);
  while ($node && ($node->nodeType == XML_ELEMENT_NODE)
    && !(($lang = $node->getAttribute('xml:lang'))
      || (($font = $$self{node_fonts}{ $node->getAttribute('_font') })
        && ($lang = $font->getLanguage)))) {
    $node = $node->parentNode;
  }
  return $lang || 'en';
}

sub decodeFont {
  my ($self, $fontid) = @_;
  return $$self{node_fonts}{$fontid} || LaTeXML::Common::Font->textDefault();
}

# Remove a node from the document (from its parent)
sub removeNode {
  my ($self, $node) = @_;
  if ($node) {
    my $chopped = $$self{node}->isSameNode($node);    # Note if we're removing insertion point
    if ($node->nodeType == XML_ELEMENT_NODE) {        # If an element, do ID bookkeeping.
      if (my $id = $node->getAttribute('xml:id')) {
        $self->unRecordID($id);
      }
      $chopped ||= grep { $self->removeNode_aux($_) } $node->childNodes;
    }
    my $parent = $node->parentNode;
    if ($chopped) {                                   # Don't remove insertion point!
      $self->setNode($parent);
    }
    $parent->removeChild($node);
  }
  return $node;
}

sub removeNode_aux {
  my ($self, $node) = @_;
  my $chopped = $$self{node}->isSameNode($node);
  if ($node->nodeType == XML_ELEMENT_NODE) {          # If an element, do ID bookkeeping.
    if (my $id = $node->getAttribute('xml:id')) {
      $self->unRecordID($id);
    }
    $chopped ||= grep { $self->removeNode_aux($_) } $node->childNodes;
  }
  return $chopped;
}

#**********************************************************************
# Inserting new nodes at random points into the document,
# typically, later in the process or during some kind of rearrangement.
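#
# A minimal usage sketch, kept as comments so the module itself is unchanged.
# The names $document, $point and $font, and the choice of 'ltx:text', are
# illustrative assumptions. Assuming $document is a LaTeXML::Core::Document and
# $point is an element already in its tree, a node might be added at that point,
# independently of the current insertion point, roughly like this:
#   my $new = $document->openElementAt($point, 'ltx:text', font => $font);
#   $new->appendText('some text');       # plain XML::LibXML call on the new element
#   $document->closeElementAt($new);     # as closeNode_internal does for each node it closes
#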
# This is a somewhat strange situation; There are commands and environments # that do some interesting thing to their contents. This include things like # center, flushleft, or rotate, or ... # Naively one is tempted to create a containing block with appropriate type & attributes. # However, since these things can be allowed in so many places by LaTeX, that # one has a difficult time creating a sensible document model. # The purpose of transformingBlock is to set the contents (possibly creating a # consistent

around them, if called for), and returning the list of newly # created nodes. These nodes can then have appropriate attributes added as needed # for each specific case. # Since this situation can occur in both LaTeX and AmSTeX type documents, # we'll put it in the TeX pool so it can be reused. # Tricky bit for creating nodes late in the game, ###### ### See createElementAt # This opens a new element at the _specified_ point, rather than the current insertion point. # This is useful during document rearrangement or augmentation that may be needed later # in the process. sub openElementAt { my ($self, $point, $qname, %attributes) = @_; my ($ns, $tag) = $$self{model}->decodeQName($qname); my $newnode; my $font = $attributes{_font} || $attributes{font}; my $box = $attributes{_box}; $box = $$self{node_boxes}{$box} if $box && !ref $box; # may already be the string key # If this will be the document root node, things are slightly more involved. if ($point->nodeType == XML_DOCUMENT_NODE) { # First node! (?) $$self{model}->addSchemaDeclaration($self, $tag); map { $$self{document}->appendChild($_) } @{ $$self{pending} }; # Add saved comments, PI's $newnode = $$self{document}->createElement($tag); $self->recordConstructedNode($newnode); $$self{document}->setDocumentElement($newnode); if ($ns) { # Here, we're creating the initial, document element, which will hold ALL of the namespace declarations. # If there is a default namespace (no prefix), that will also be declared, and applied here. # However, if there is ALSO a prefix associated with that namespace, we have to declare it FIRST # due to the (apparently) buggy way that XML::LibXML works with namespaces in setAttributeNS. 
my $prefix = $$self{model}->getDocumentNamespacePrefix($ns); my $attprefix = $$self{model}->getDocumentNamespacePrefix($ns, 1, 1); if (!$prefix && $attprefix) { $newnode->setNamespace($ns, $attprefix, 0); } $newnode->setNamespace($ns, $prefix, 1); } } else { $font = $self->getNodeFont($point) unless $font; $box = $self->getNodeBox($point) unless $box; $newnode = $self->openElement_internal($point, $ns, $tag); } foreach my $key (sort keys %attributes) { next if $key eq 'font'; # !!! next if $key eq 'locator'; # !!! $self->setAttribute($newnode, $key, $attributes{$key}); } $self->setNodeFont($newnode, $font) if $font; $self->setNodeBox($newnode, $box) if $box; print STDERR "Inserting " . Stringify($newnode) . " into " . Stringify($point) . "\n" if $LaTeXML::Core::Document::DEBUG; # Run afterOpen operations $self->afterOpen($newnode); return $newnode; } sub openElement_internal { my ($self, $point, $ns, $tag) = @_; my $newnode; if ($ns) { if (!defined $point->lookupNamespacePrefix($ns)) { # namespace not already declared? $self->getDocument->documentElement ->setNamespace($ns, $$self{model}->getDocumentNamespacePrefix($ns), 0); } $newnode = $point->addNewChild($ns, $tag); } else { $newnode = $point->appendChild($$self{document}->createElement($tag)); } $self->recordConstructedNode($newnode); return $newnode; } # Whenever a node has been created using openElementAt, # closeElementAt ought to be used to close it, when you're finished inserting into $node. # Basically, this just runs any afterClose operations. sub closeElementAt { my ($self, $node) = @_; return $self->afterClose($node); } sub afterOpen { my ($self, $node) = @_; # Set current point to this node, just in case the afterOpen's use it. 
  my $savenode = $$self{node};
  $self->setNode($node);
  my $box = $self->getNodeBox($node);
  map { &$_($self, $node, $box) } $self->getTagActionList($node, 'afterOpen');
  $self->setNode($savenode);
  return $node; }

sub afterClose {
  my ($self, $node) = @_;
  # Should we set point to this node? (or to last child, or something?)
  my $savenode = $$self{node};
  my $box      = $self->getNodeBox($node);
  map { &$_($self, $node, $box) } $self->getTagActionList($node, 'afterClose');
  $self->setNode($savenode);
  return $node; }

#**********************************************************************
# Appending clones of nodes

# Inserting clones of nodes into the document.
# Nodes that exist in some other part of the document (or some other document)
# will need to be cloned so that they can be part of the new document;
# otherwise, they would be removed from their previous document.
# Also, we want to have a clean namespace node structure
# (otherwise, libxml2 has a tendency to introduce annoying "default" namespace prefix declarations)
# And, finally, we need to modify any id's present in the old nodes,
# since otherwise they may be duplicated.

# Should have variants here for prepend, insert before, insert after.... ???
sub appendClone {
  my ($self, $node, @newchildren) = @_;
  # Expand any document fragments
  @newchildren = map { ($_->nodeType == XML_DOCUMENT_FRAG_NODE ? $_->childNodes : $_) } @newchildren;
  # Now find all xml:id's in the newchildren and record replacement id's for them
  local %LaTeXML::Core::Document::IDMAP = ();
  # Find all id's defined in the copy and change the id.
  foreach my $child (@newchildren) {
    foreach my $idnode ($self->findnodes('.//@xml:id', $child)) {
      my $id = $idnode->getValue;
      $LaTeXML::Core::Document::IDMAP{$id} = $self->modifyID($id); } }
  # Now do the cloning (actually copying) and insertion.
$self->appendClone_aux($node, @newchildren); return $node; } sub appendClone_aux { my ($self, $node, @newchildren) = @_; foreach my $child (@newchildren) { my $type = $child->nodeType; if ($type == XML_ELEMENT_NODE) { my $new = $self->openElement_internal($node, $child->namespaceURI, $child->localname); foreach my $attr ($child->attributes) { if ($attr->nodeType == XML_ATTRIBUTE_NODE) { my $key = $attr->nodeName; if ($key eq 'xml:id') { # Use the replacement id my $newid = $LaTeXML::Core::Document::IDMAP{ $attr->getValue }; $newid = $self->recordID($newid, $new); $new->setAttribute($key, $newid); } elsif ($key eq 'idref') { # Refer to the replacement id if it was replaced my $id = $attr->getValue; $new->setAttribute($key, $LaTeXML::Core::Document::IDMAP{$id} || $id); } elsif (my $ns = $attr->namespaceURI) { $new->setAttributeNS($ns, $attr->name, $attr->getValue); } else { $new->setAttribute($attr->localname, $attr->getValue); } } } $self->afterOpen($new); $self->appendClone_aux($new, $child->childNodes); $self->afterClose($new); } elsif ($type == XML_TEXT_NODE) { $node->appendTextNode($child->textContent); } } return $node; } #********************************************************************** # Wrapping & Unwrapping nodes by another element. # Wrap @nodes with an element named $qname, making the new element replace the first $node, # and all @nodes becomes the child of the new node. # [this makes most sense if @nodes are a sequence of siblings] # Returns undef if $qname isn't allowed in the parent, or if @nodes aren't allowed in $qname, # otherwise, returns the newly created $qname. 
sub wrapNodes { my ($self, $qname, @nodes) = @_; return unless @nodes; my $model = $$self{model}; my $parent = $nodes[0]->parentNode; my ($ns, $tag) = $model->decodeQName($qname); my $new = $self->openElement_internal($parent, $ns, $tag); $self->afterOpen($new); $parent->replaceChild($new, $nodes[0]); if (my $font = $self->getNodeFont($parent)) { $self->setNodeFont($new, $font); } if (my $box = $self->getNodeBox($parent)) { $self->setNodeBox($new, $box); } foreach my $node (@nodes) { $new->appendChild($node); } $self->afterClose($new); return $new; } # Unwrap the children of $node, by replacing $node by its children. sub unwrapNodes { my ($self, $node) = @_; return $self->replaceNode($node, $node->childNodes); } # Replace $node by @nodes (presumably descendants of some kind?) sub replaceNode { my ($self, $node, @nodes) = @_; my $parent = $node->parentNode; my $c0; while (my $c1 = shift(@nodes)) { if ($c0) { $parent->insertAfter($c1, $c0); } else { $parent->replaceChild($c1, $node); } $c0 = $c1; } $self->removeNode($node); return $node; } # initially since $node->setNodeName was broken in XML::LibXML 1.58 # but this can provide for more options & correctness? sub renameNode { my ($self, $node, $newname) = @_; my $model = $$self{model}; my ($ns, $tag) = $model->decodeQName($newname); my $parent = $node->parentNode; my $new = $self->openElement_internal($parent, $ns, $tag); my $id; # Move to the position AFTER $node $parent->insertAfter($new, $node); # Copy ALL attributes from $node to $newnode foreach my $attr ($node->attributes) { my $key = $attr->getName; my $value = $node->getAttribute($key); $id = $value if $key eq 'xml:id'; # Save to register after removal of old node. $new->setAttribute($key, $value); } # AND move all content from $node to $newnode foreach my $child ($node->childNodes) { $new->appendChild($child); } ## THEN call afterOpen... ? # It would normally be called before children added, # but how can we know if we're duplicated auto-added stuff? 
$self->afterOpen($new); $self->afterClose($new); # Finally, remove the old node $self->removeNode($node); # and FINALLY, we can register the new node under the id. if ($id) { my $newid = $self->recordID($id, $new); $new->setAttribute('xml:id' => $newid) if $newid ne $id; } return $new; } #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # Finally, another set of surgery methods # These take an array representation of the XML Tree to append # [tagname,{attributes..}, children] # THESE SHOULD BE PART OF A COMMON BASE CLASS; DUPLICATED IN Post::Document #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% sub replaceTree { my ($self, $new, $old) = @_; my $parent = $old->parentNode; my @following = (); # Collect the matching and following nodes while (my $sib = $parent->lastChild) { last if $sib->isSameNode($old); $parent->removeChild($sib); # We're putting these back, in a moment! unshift(@following, $sib); } $self->removeNode($old); $self->appendTree($parent, $new); my $inserted = $parent->lastChild; map { $parent->appendChild($_) } @following; # No need for clone return $inserted; } sub appendTree { my ($self, $node, @data) = @_; foreach my $child (@data) { if (ref $child eq 'ARRAY') { my ($tag, $attributes, @children) = @$child; my $new = $self->openElementAt($node, $tag, ($attributes ? %$attributes : ())); $self->appendTree($new, @children); } elsif ((ref $child) =~ /^XML::LibXML::/) { my $type = $child->nodeType; if ($type == XML_ELEMENT_NODE) { my $tag = $self->getNodeQName($child); my %attributes = map { $_->nodeType == XML_ATTRIBUTE_NODE ? ($_->nodeName => $_->getValue) : () } $child->attributes; # DANGER: REMOVE the xml:id attribute from $child!!!! 
# This protects against some versions of XML::LibXML that warn against duplicate id's # Hopefully, you shouldn't be using the node any more $child->removeAttribute('xml:id') if $attributes{'xml:id'}; my $new = $self->openElementAt($node, $tag, %attributes); $self->appendTree($new, $child->childNodes); } elsif ($type == XML_DOCUMENT_FRAG_NODE) { $self->appendTree($node, $child->childNodes); } elsif ($type == XML_TEXT_NODE) { $node->appendTextNode($child->textContent); } } elsif ((ref $child) && $child->isaBox) { my $savenode = $self->getNode; $self->setNode($node); $self->absorb($child); $self->setNode($savenode); } elsif (ref $child) { Warn('malformed', $child, $node, "Dont know how to add '$child' to document; ignoring"); } elsif (defined $child) { $node->appendTextNode($child); } } return; } #********************************************************************** 1; __END__ =pod =head1 NAME C - represents an XML document under construction. =head1 DESCRIPTION A C represents an XML document being constructed by LaTeXML, and also provides the methods for constructing it. It extends L. LaTeXML will have digested the source material resulting in a L (from a L) of Ls, Ls and sublists. At this stage, a document is created and it is responsible for `absorbing' the digested material. Generally, the Ls and Ls create text nodes, whereas the Ls create C document fragments, elements and attributes according to the defining L. Most document construction occurs at a I where material will be added, and which moves along with the inserted material. The L, derived from various declarations and document type, is consulted to determine whether an insertion is allowed and when elements may need to be automatically opened or closed in order to carry out a given insertion. For example, a C element will typically be closed automatically when it is attempted to open a C

element.

In the methods described here, the term C<$qname> is used for XML qualified names.
These are tag names with a namespace prefix. The prefix should be one
registered with the current Model, for use within the code. This prefix is
not necessarily the same as the one used in any DTD, but should be mapped
to a Namespace URI that was registered for the DTD.

The arguments named C<$node> are an XML::LibXML node.

The methods here are grouped into three sections covering basic access to the
document, insertion methods at the current insertion point,
and less commonly used, lower-level, document manipulation methods.

=head2 Accessors

=over 4

=item C<< $doc = $document->getDocument; >>

Returns the C currently being constructed.

=item C<< $doc = $document->getModel; >>

Returns the C that represents the document model used for this document.

=item C<< $node = $document->getNode; >>

Returns the node at the I<current insertion point> during construction. This node
is considered still to be `open'; any insertions will go into it (if possible).
The node will be an C, C or, initially, C.

=item C<< $node = $document->getElement; >>

Returns the closest ancestor to the current insertion point that is an Element.

=item C<< @nodes = $document->getChildElement($node); >>

Returns a list of the child elements, if any, of the C<$node>.

=item C<< $node = $document->getLastChildElement($node); >>

Returns the last child element of the C<$node>, if it has one, else undef.

=item C<< $node = $document->getFirstChildElement($node); >>

Returns the first child element of the C<$node>, if it has one, else undef.

=item C<< @nodes = $document->findnodes($xpath,$node); >>

Returns a list of nodes matching the given C<$xpath> expression.
The I<context node> for C<$xpath> is C<$node>, if given,
otherwise it is the document element.

=item C<< $node = $document->findnode($xpath,$node); >>

Returns the first node matching the given C<$xpath> expression.
The I<context node> for C<$xpath> is C<$node>, if given,
otherwise it is the document element.

=item C<< $node = $document->getNodeQName($node); >>

Returns the qualified name (localname with namespace prefix)
of the given C<$node>. The namespace prefix mapping is the
code mapping of the current document model.

=item C<< $boolean = $document->canContain($tag,$child); >>

Returns whether an element C<$tag> can contain a child C<$child>.
C<$tag> and C<$child> can be nodes, qualified names of nodes
(prefix:localname), or one of a set of special symbols
C<#PCDATA>, C<#Comment>, C<#Document> or C<#ProcessingInstruction>.

=item C<< $boolean = $document->canContainIndirect($tag,$child); >>

Returns whether an element C<$tag> can contain a child C<$child>
either directly, or after automatically opening one or more autoOpen-able elements.

=item C<< $boolean = $document->canContainSomehow($tag,$child); >>

Returns whether an element C<$tag> can contain a child C<$child>
either directly, or after automatically opening one or more autoOpen-able elements.

=item C<< $boolean = $document->canHaveAttribute($tag,$attrib); >>

Returns whether an element C<$tag> can have an attribute named C<$attrib>.

=item C<< $boolean = $document->canAutoOpen($tag); >>

Returns whether an element C<$tag> is able to be automatically opened.

=item C<< $boolean = $document->canAutoClose($node); >>

Returns whether the node C<$node> can be automatically closed.

=back

=head2 Construction Methods

These methods are the most common ones used for construction of documents.
They generally operate by creating new material at the I<current insertion point>.
That point initially is just the document itself, but it moves along to
follow any new insertions. These methods also adapt to the document model so as to
automatically open or close elements, when it is required for the pending insertion
and allowed by the document model (See L).

=over 4

=item C<< $xmldoc = $document->finalize; >>

This method finalizes the document by cleaning up various temporary
attributes, and returns the L that was constructed.

=item C<< @nodes = $document->absorb($digested); >>

Absorb the C<$digested> object into the document at the current insertion point
according to its type. Various other methods are invoked as needed,
and document nodes may be automatically opened or closed according to the document
model.

This method returns the nodes that were constructed.
Note that the nodes may include children of other nodes,
and nodes that may already have been removed from the document
(See filterChildren and filterDeletions).
Also, text insertions are often merged with existing text nodes;
in such cases, the whole text node is included in the result.

=item C<< $document->insertElement($qname,$content,%attributes); >>

This is a shorthand for creating an element C<$qname> (with given attributes),
absorbing C<$content> from within that new node, and then closing it.
The C<$content> must be digested material, either a single box, or
an array of boxes, which will be absorbed into the element.
This method returns the newly created node,
although it will no longer be the current insertion point.

=item C<< $document->insertMathToken($string,%attributes); >>

Insert a math token (XMTok) containing the string C<$string> with the given attributes.
Useful attributes would be name, role, font.
Returns the newly inserted node.

=item C<< $document->insertComment($text); >>

Insert, and return, a comment with the given C<$text> into the current node.

=item C<< $document->insertPI($op,%attributes); >>

Insert, and return, a ProcessingInstruction into the current node.

=item C<< $document->openText($text,$font); >>

Open a text node in font C<$font>, performing any required automatic opening
and closing of intermediate nodes (including those needed for font changes)
and inserting the string C<$text> into it.

=item C<< $document->openElement($qname,%attributes); >>

Open an element, named C<$qname> and with the given attributes.
This will be inserted into the current node while performing
any required automatic opening and closing of intermediate nodes.
The new element is returned, and also becomes the current insertion point.
An error (fatal if in C mode) is signalled if there is no allowed way
to insert such an element into the current node.

=item C<< $document->closeElement($qname); >>

Close the closest open element named C<$qname> including any intermediate nodes that
may be automatically closed. If that is not possible, signal an error.
The closed node's parent becomes the current node.
This method returns the closed node.

=item C<< $node = $document->isOpenable($qname); >>

Check whether it is possible to open a C<$qname> element
at the current insertion point.

=item C<< $node = $document->isCloseable($qname); >>

Check whether it is possible to close a C<$qname> element,
returning the node that would be closed if possible,
otherwise undef.

=item C<< $document->maybeCloseElement($qname); >>

Close a C<$qname> element, if it is possible to do so;
returns the closed node if it was found, else undef.

=item C<< $document->addAttribute($key=>$value); >>

Add the given attribute to the node nearest to the current insertion point
that is allowed to have it. This does not change the current insertion point.

=item C<< $document->closeToNode($node); >>

This method closes all children of C<$node> until C<$node>
becomes the insertion point. Note that it closes any
open nodes, not only autoCloseable ones.

=back

=head3 Internal Insertion Methods

These are described as an aid to understanding the code;
they rarely, if ever, should be used outside this module.

=over 4

=item C<< $document->setNode($node); >>

Sets the I<current insertion point> to be C<$node>.
This should be rarely used, if at all; the construction methods of document
generally maintain the notion of insertion point automatically.
This may be useful to allow insertion into a different part of the document,
but you probably want to set the insertion point back to the previous
node, afterwards.

=item C<< $string = $document->getInsertionContext($levels); >>

For debugging, return a string showing the context of the current insertion point;
that is, the string of the nodes leading up to it.
If C<$levels> is defined, show only that many nodes.

=item C<< $node = $document->find_insertion_point($qname); >>

This internal method is used to find the appropriate point,
relative to the current insertion point, at which an element with
the specified C<$qname> can be inserted. That position may
require automatic opening or closing of elements, according
to what is allowed by the document model.

=item C<< @nodes = getInsertionCandidates($node); >>

Returns a list of elements where an arbitrary insertion might take place.
Roughly this is a list starting with C<$node>,
followed by its parent and the parent's siblings (in reverse order),
followed by the grandparent and siblings (in reverse order).

=item C<< $node = $document->floatToElement($qname); >>

Finds the nearest element at or preceding the current insertion point
(see C), that can accept an element C<$qname>;
it moves the insertion point to that point, and returns the previous insertion point.
Generally, after doing whatever you need at the new insertion point,
you should call C<< $document->setNode($node); >> to
restore the insertion point.
If no such point is found, the insertion point is left unchanged,
and undef is returned.

=item C<< $node = $document->floatToAttribute($key); >>

This method works the same as C, but finds
the nearest element that can accept the attribute C<$key>.

=item C<< $node = $document->openText_internal($text); >>

This is an internal method, used by C, that assumes the insertion point has
been appropriately adjusted.

=item C<< $node = $document->openMathText_internal($text); >>

This internal method appends C<$text> to the current insertion point,
which is assumed to be a math node. It checks for math ligatures and
carries out any combinations called for.

=item C<< $node = $document->closeText_internal(); >>

This internal method closes the current node, which should be a text node.
It carries out any text ligatures on the content.

=item C<< $node = $document->closeNode_internal($node); >>

This internal method closes any open text or element nodes starting
at the current insertion point, up to and including C<$node>.
Afterwards, the parent of C<$node> will be the current insertion point.
It condenses the tree to avoid redundant font switching elements.

=item C<< $document->afterOpen($node); >>

Carries out any afterOpen operations that have been recorded (using C)
for the element name of C<$node>.

=item C<< $document->afterClose($node); >>

Carries out any afterClose operations that have been recorded (using C)
for the element name of C<$node>.

=back

=head2 Document Modification

The following methods are used to perform various sorts of modification
and rearrangements of the document, after the normal flow of insertion
has taken place. These may be needed after an environment (or perhaps the whole document)
has been completed and one needs to analyze what it contains to decide
on the appropriate representation.

=over 4

=item C<< $document->setAttribute($node,$key,$value); >>

Sets the attribute C<$key> to C<$value> on C<$node>.
This method is preferred over the direct LibXML one, since it
takes care of decoding namespaces (if C<$key> is a qname),
and also manages recording of xml:id's.

=item C<< $document->recordID($id,$node); >>

Records the association of the given C<$node> with the C<$id>,
which should be the C attribute of the C<$node>.
Usually this association will be maintained by the methods
that create nodes or set attributes.

=item C<< $document->unRecordID($id); >>

Removes the node associated with the given C<$id>, if any.
This might be needed if a node is deleted.

=item C<< $document->modifyID($id); >>

Adjusts C<$id>, if needed, so that it is unique.
It does this by appending a letter and incrementing until it
finds an id that is not yet associated with a node.

=item C<< $node = $document->lookupID($id); >>

Returns the node, if any, that is associated with the given C<$id>.

=item C<< $document->setNodeBox($node,$box); >>

Records the C<$box> (being a Box, Whatsit or List), that
was (presumably) responsible for the creation of the element C<$node>.
This information is useful for determining source locations,
original TeX strings, and so forth.

=item C<< $box = $document->getNodeBox($node); >>

Returns the C<$box> that was responsible for creating the element C<$node>.

=item C<< $document->setNodeFont($node,$font); >>

Records the font object that encodes the font that should be
used to display any text within the element C<$node>.

=item C<< $font = $document->getNodeFont($node); >>

Returns the font object associated with the element C<$node>.

=item C<< $node = $document->openElementAt($point,$qname,%attributes); >>

Opens a new child element in C<$point> with the qualified name C<$qname>
and with the given attributes. This method is not affected by, nor does
it affect, the current insertion point. It does manage namespaces,
xml:id's and associating a box, font and locator with the new element,
as well as running any C operations.

=item C<< $node = $document->closeElementAt($node); >>

Closes C<$node>. This method is not affected by, nor does
it affect, the current insertion point.
However, it does run any C operations, so any element that
was created using the lower-level C should
be closed using this method.

=item C<< $node = $document->appendClone($node,@newchildren); >>

Appends clones of C<@newchildren> to C<$node>.
This method modifies any ids found within C<@newchildren>
(using C), and fixes up any references to those ids
within the clones so that they refer to the modified id.

=item C<< $node = $document->wrapNodes($qname,@nodes); >>

This method wraps the C<@nodes> by a new element with qualified name C<$qname>;
that new node replaces the first of C<@nodes>.
The remaining nodes in C<@nodes> must be following siblings of the first one.

NOTE: Does this need multiple nodes?
If so, perhaps some kind of movenodes helper?
Otherwise, what about attributes?

=item C<< $node = $document->unwrapNodes($node); >>

Unwrap the children of C<$node>, by replacing C<$node> by its children.

=item C<< $node = $document->replaceNode($node,@nodes); >>

Replace C<$node> by C<@nodes>; presumably they are some sort of descendant nodes.

=item C<< $node = $document->renameNode($node,$newname); >>

Rename C<$node> to the tagname C<$newname>; equivalently replace C<$node> by
a new node with name C<$newname> and copy the attributes and contents.
It is assumed that C<$newname> can contain those attributes and contents.

=item C<< @nodes = $document->filterDeletions(@nodes); >>

This function is useful with C<< $doc->absorb($box) >>,
when you want to filter out any nodes that have been deleted and
no longer appear in the document.

=item C<< @nodes = $document->filterChildren(@nodes); >>

This function is useful with C<< $doc->absorb($box) >>,
when you want to filter out any nodes that are children of other nodes in C<@nodes>.

=back

=head1 AUTHOR

Bruce Miller

=head1 COPYRIGHT

Public domain software, produced as part of work done by the
United States Government & not subject to copyright in the US.

=cut

latexml-0.8.1/lib/LaTeXML/Core/Gullet.pm

# /=====================================================================\ #
# | LaTeXML::Core::Gullet                                               | #
# | Analog of TeX's Gullet; deals with expansion and arg parsing        | #
# |=====================================================================| #
# | Part of LaTeXML:                                                    | #
# | Public domain software, produced as part of work done by the        | #
# | United States Government & not subject to copyright in the US.      | #
# |---------------------------------------------------------------------| #
# | Bruce Miller                                                #_#     | #
# | http://dlmf.nist.gov/LaTeXML/                              (o o)    | #
# \=========================================================ooo==U==ooo=/ #
package LaTeXML::Core::Gullet;
use strict;
use warnings;
use LaTeXML::Global;
use LaTeXML::Common::Object;
use LaTeXML::Common::Error;
use LaTeXML::Core::Token;
use LaTeXML::Core::Tokens;
use LaTeXML::Core::Mouth;
use LaTeXML::Util::Pathname;
use LaTeXML::Common::Number;
use LaTeXML::Common::Float;
use LaTeXML::Common::Dimension;
use LaTeXML::Common::Glue;
use LaTeXML::Core::MuDimension;
use LaTeXML::Core::MuGlue;
use base qw(LaTeXML::Common::Object);
#**********************************************************************
sub new {
  my ($class) = @_;
  return bless {
    mouth => undef, mouthstack => [], pushback => [], autoclose => 1, pending_comments => []
  }, $class; }

#**********************************************************************
# Start reading tokens from a new Mouth.
# This pushes the mouth as the current source that $gullet->readToken (etc) will read from.
# Once this Mouth has been exhausted, readToken, etc, will return undef,
# until you call $gullet->closeMouth to clear the source.
# Exception: if $toplevel=1, readXToken will step to next source
# Note that a Tokens can act as a Mouth.
sub openMouth {
  my ($self, $mouth, $noautoclose) = @_;
  return unless $mouth;
  unshift(@{ $$self{mouthstack} }, [$$self{mouth}, $$self{pushback}, $$self{autoclose}]) if $$self{mouth};
  $$self{mouth}     = $mouth;
  $$self{pushback}  = [];
  $$self{autoclose} = !$noautoclose;
  return; }

sub closeMouth {
  my ($self, $forced) = @_;
  if (!$forced && (@{ $$self{pushback} } || $$self{mouth}->hasMoreInput)) {
    my $next = Stringify($self->readToken);
    Error('unexpected', $next, $self, "Closing mouth with input remaining '$next'"); }
  $$self{mouth}->finish;
  if (@{ $$self{mouthstack} }) {
    ($$self{mouth}, $$self{pushback}, $$self{autoclose}) = @{ shift(@{ $$self{mouthstack} }) }; }
  else {
    $$self{pushback} = [];
    ##    $$self{mouth}=Tokens();
    $$self{mouth} = LaTeXML::Core::Mouth->new();
    ####    $$self{mouth}=undef;
    $$self{autoclose} = 1; }
  return; }

sub getMouth {
  my ($self) = @_;
  return $$self{mouth}; }

sub mouthIsOpen {
  my ($self, $mouth) = @_;
  return ($$self{mouth} eq $mouth)
    || grep { $_ && ($$_[0] eq $mouth) } @{ $$self{mouthstack} }; }

# This flushes a mouth so that it will be automatically closed, next time it's read
# Corresponds (I think) to TeX's \endinput
sub flushMouth {
  my ($self) = @_;
  $$self{mouth}->finish;     # but not close!
  $$self{pushback}  = [];    # And don't read anything more from it.
  $$self{autoclose} = 1;
  return; }

# Obscure, but the only way I can think of to End!! (see \bye or \end{document})
# Flush all sources (close all pending mouths)
sub flush {
  my ($self) = @_;
  $$self{mouth}->finish;
  while (@{ $$self{mouthstack} }) {
    my $entry = shift @{ $$self{mouthstack} };
    $$entry[0]->finish; }
  $$self{pushback} = [];
  ##    $$self{mouth}=Tokens();
  $$self{mouth} = LaTeXML::Core::Mouth->new();
  ####    $$self{mouth}=undef;
  $$self{autoclose}  = 1;
  $$self{mouthstack} = [];
  return; }

# Do something, while reading stuff from a specific Mouth.
# This reads ONLY from that mouth (or any mouth opened by code in that source),
# and the mouth should end up empty afterwards, and only be closed here.
sub readingFromMouth { my ($self, $mouth, $closure) = @_; $self->openMouth($mouth, 1); # only allow mouth to be explicitly closed here. my ($result, @result); if (wantarray) { @result = &$closure($self); } else { $result = &$closure($self); } # $mouth must still be open, with (at worst) empty autoclosable mouths in front of it while (1) { if ($$self{mouth} eq $mouth) { $self->closeMouth(1); last; } elsif (!@{ $$self{mouthstack} }) { Error('unexpected', '', $self, "Mouth is unexpectedly already closed", "Reading from " . Stringify($mouth) . ", but it has already been closed."); } elsif (!$$self{autoclose} || @{ $$self{pushback} } || $$self{mouth}->hasMoreInput) { my $next = Stringify($self->readToken); Error('unexpected', $next, $self, "Unexpected input remaining: '$next'", "Finished reading from " . Stringify($mouth) . ", but it still has input."); $$self{mouth}->finish; $self->closeMouth(1); } # ?? if we continue? else { $self->closeMouth; } } return (wantarray ? @result : $result); } # User feedback for where something (error?) occurred. sub getLocator { my ($self, $long) = @_; my $mouth = $$self{mouth}; my $i = 0; while ((defined $mouth) && (($$mouth{source} || '') eq 'Anonymous String') && ($i < scalar(@{ $$self{mouthstack} }))) { $mouth = $$self{mouthstack}[$i++][0]; } my $loc = (defined $mouth ? $mouth->getLocator($long) : ''); if (!$loc || $long) { $loc .= show_pushback($$self{pushback}) if $long; foreach my $frame (@{ $$self{mouthstack} }) { my $ml = $$frame[0]->getLocator($long); $loc .= ' ' . 
$ml if $ml; last if $loc && !$long; $loc .= show_pushback($$frame[1]) if $long; } } return $loc; } sub getSource { my ($self) = @_; my $source = defined $$self{mouth} && $$self{mouth}->getSource; if (!$source) { foreach my $frame (@{ $$self{mouthstack} }) { $source = $$frame[0]->getSource; last if $source; } } return $source; } sub getSourceMouth { my ($self) = @_; my $mouth = $$self{mouth}; my $source = defined $mouth && $mouth->getSource; if (!$source || ($source eq "Anonymous String")) { foreach my $frame (@{ $$self{mouthstack} }) { $mouth = $$frame[0]; $source = $mouth->getSource; last if $source && $source ne "Anonymous String"; } } return $mouth; } # Handy message generator when we didn't get something expected. sub showUnexpected { my ($self) = @_; my $token = $self->readToken; my $message = ($token ? "Next token is " . Stringify($token) : "Input is empty"); $self->unread($token); return $message; } sub show_pushback { my ($pb) = @_; my @pb = @$pb; @pb = (@pb[0 .. 50], T_OTHER('...')) if scalar(@pb) > 55; return (@pb ? "\n To be read again " . ToString(Tokens(@pb)) : ''); } #********************************************************************** # Not really 100% sure how this is supposed to work # See TeX Ch 20, p216 regarding noexpand, \edef with token list registers, etc. # Solution: Duplicate param tokens, stick NOTEXPANDED infront of expandable tokens. sub neutralizeTokens { my ($self, @tokens) = @_; my @result = (); foreach my $token (@tokens) { if ($$token[1] == CC_PARAM) { # Inline ->getCatcode! 
push(@result, $token); } elsif (defined(my $defn = $STATE->lookupDefinition($token))) { push(@result, Token('\noexpand', CC_NOTEXPANDED)); } push(@result, $token); } return @result; } #********************************************************************** # Low-level readers: read token, read expanded token #********************************************************************** # Note that every char (token) comes through here (maybe even twice, through args parsing), # So, be Fast & Clean! This method only reads from the current input stream (Mouth). sub readToken { my ($self) = @_; # my $token = shift(@{$$self{pushback}}); my $token; my $cc; # Check in pushback first.... while (defined($token = shift(@{ $$self{pushback} })) && ($cc = $$token[1]) && (($cc == CC_COMMENT) || ($cc == CC_MARKER))) { # NOTE: Inlined if ($cc == CC_COMMENT) { push(@{ $$self{pending_comments} }, $token); } elsif ($cc == CC_MARKER) { LaTeXML::Core::Definition::stopProfiling($token); } } return $token if defined $token; while (defined($token = $$self{mouth}->readToken()) && ($cc = $$token[1]) && (($cc == CC_COMMENT) || ($cc == CC_MARKER))) { # NOTE: Inlined if ($cc == CC_COMMENT) { push(@{ $$self{pending_comments} }, $token); } # What to do with comments??? elsif ($cc == CC_MARKER) { LaTeXML::Core::Definition::stopProfiling($token); } } return $token; } # Unread tokens are assumed to be not-yet expanded. sub unread { my ($self, @tokens) = @_; my $r; unshift(@{ $$self{pushback} }, map { (!defined $_ ? () : (($r = ref $_) eq 'LaTeXML::Core::Token' ? $_ : ($r eq 'LaTeXML::Core::Tokens' ? @$_ : Fatal('misdefined', $r, undef, "Expected a Token, got " . Stringify($_))))) } @tokens); return; } # Read the next non-expandable token (expanding tokens until there's a non-expandable one). # Note that most tokens pass through here, so be Fast & Clean! readToken is folded in. 
# `Toplevel' processing, (if $toplevel is true), used at the toplevel processing by Stomach, # will step to the next input stream (Mouth) if one is available, # If $commentsok is true, will also pass comments. sub readXToken { my ($self, $toplevel, $commentsok) = @_; $toplevel = 1 unless defined $toplevel; return shift(@{ $$self{pending_comments} }) if $commentsok && @{ $$self{pending_comments} }; my ($token, $cc, $defn); while (1) { if (!defined($token = (@{ $$self{pushback} } ? shift(@{ $$self{pushback} }) : $$self{mouth}->readToken()))) { return unless $$self{autoclose} && $toplevel && @{ $$self{mouthstack} }; $self->closeMouth; } # Next input stream. elsif (($cc = $$token[1]) == CC_NOTEXPANDED) { # NOTE: Inlined ->getCatcode # Should only occur IMMEDIATELY after expanding \noexpand (by readXToken), # so this token should never leak out through an EXTERNAL call to readToken. return $self->readToken; } # Just return the next token. elsif ($cc == CC_COMMENT) { return $token if $commentsok; push(@{ $$self{pending_comments} }, $token); } # What to do with comments??? elsif ($cc == CC_MARKER) { LaTeXML::Core::Definition::stopProfiling($token); } elsif (defined($defn = $STATE->lookupDefinition($token)) && $defn->isExpandable && ($toplevel || !$defn->isProtected)) { # is this the right logic here? don't expand unless digesting? local $LaTeXML::CURRENT_TOKEN = $token; my $t; # Do the check here, to be more forgiving and more informative my @expansion = map { (($t = ref $_) eq 'LaTeXML::Core::Token' ? $_ : ($t eq 'LaTeXML::Core::Tokens' ? @$_ : (Error('misdefined', $token, undef, "Expected a Token in expansion of " . ToString($token), "got " . Stringify($_)), ()))) } $defn->invoke($self); # already checked tokens, so just push to be re-read (like ->unread(@expansion); ) unshift(@{ $$self{pushback} }, @expansion); } else { return $token; } # just return it } return; } # never get here. 
# Read the next raw line (string); # primarily to read from the Mouth, but keep any unread input! sub readRawLine { my ($self) = @_; # If we've got unread tokens, they presumably should come before the Mouth's raw data # but we'll convert them back to string. my @tokens = @{ $$self{pushback} }; my @markers = grep { $_->getCatcode == CC_MARKER } @tokens; if (@markers) { # Whoops, profiling markers! @tokens = grep { $_->getCatcode != CC_MARKER } @tokens; # Remove map { LaTeXML::Core::Definition::stopProfiling($_) } @markers; } $$self{pushback} = []; # If we still have peeked tokens, we ONLY want to combine it with the remainder # of the current line from the Mouth (NOT reading a new line) if (@tokens) { return ToString(Tokens(@tokens)) . $$self{mouth}->readRawLine(1); } # Otherwise, read the next line from the Mouth. else { return $$self{mouth}->readRawLine; } } #********************************************************************** # Mid-level readers: checking and matching tokens, strings etc. #********************************************************************** # The following higher-level parsing methods are built upon readToken & unread. sub readNonSpace { my ($self) = @_; my $token; do { $token = $self->readToken(); } while (defined $token && $$token[1] == CC_SPACE); # Inline ->getCatcode! return $token; } sub skipSpaces { my ($self) = @_; my $tok = $self->readNonSpace; $self->unread($tok) if defined $tok; return; } sub skip1Space { my ($self) = @_; my $token = $self->readToken(); $self->unread($token) if $token && ($$token[1] != CC_SPACE); # Inline ->getCatcode! return; } # = | \relax sub skipFiller { my ($self) = @_; while (1) { my $tok = $self->readNonSpace; return unless defined $tok; # Should \foo work too (where \let\foo\relax) ?? if ($tok->getString ne '\relax') { $self->unread($tok); return; } } return; } # Read a sequence of tokens balanced in {} # assuming the { has already been read. 
# Returns a Tokens list of the balanced sequence, omitting the closing } sub readBalanced { my ($self) = @_; my @tokens = (); my ($token, $level) = (undef, 1); while ($level && defined($token = $self->readToken())) { my $cc = $$token[1]; # Inline ->getCatcode! $level++ if $cc == CC_BEGIN; $level-- if $cc == CC_END; push(@tokens, $token) if $level; } return Tokens(@tokens); } sub ifNext { my ($self, $token) = @_; if (my $tok = $self->readToken()) { $self->unread($tok); return $tok->equals($token); } else { return 0; } } # Match the input against one of the Token or Tokens in @choices; return the matching one or undef. sub readMatch { my ($self, @choices) = @_; foreach my $choice (@choices) { my @tomatch = $choice->unlist; my @matched = (); my $token; while (@tomatch && defined($token = $self->readToken) && push(@matched, $token) && ($token->equals($tomatch[0]))) { shift(@tomatch); if ($$token[1] == CC_SPACE) { # If this was space, SKIP any following!!! while (defined($token = $self->readToken) && ($$token[1] == CC_SPACE)) { push(@matched, $token); } $self->unread($token) if $token; } } return $choice unless @tomatch; # All matched!!! $self->unread(@matched); # Put 'em back and try next! } return; } # Match the input against a set of keywords; Similar to readMatch, but the keywords are strings, # and Case and catcodes are ignored; additionally, leading spaces are skipped. # AND, macros are expanded. sub readKeyword { my ($self, @keywords) = @_; $self->skipSpaces; foreach my $keyword (@keywords) { $keyword = ToString($keyword) if ref $keyword; my @tomatch = split('', uc($keyword)); my @matched = (); my $tok; while (@tomatch && defined($tok = $self->readXToken(0)) && push(@matched, $tok) && (uc($tok->getString) eq $tomatch[0])) { shift(@tomatch); } return $keyword unless @tomatch; # All matched!!! $self->unread(@matched); # Put 'em back and try next! } return; } # Return a (balanced) sequence tokens until a match against one of the Tokens in @delims. 
# In list context, also returns the found delimiter. sub readUntil { my ($self, @delims) = @_; my ($n, $found, @tokens) = (0); while (!defined($found = $self->readMatch(@delims))) { my $token = $self->readToken(); # Copy next token to args return unless defined $token; push(@tokens, $token); $n++; if ($$token[1] == CC_BEGIN) { # And if it's a BEGIN, copy till balanced END push(@tokens, $self->readBalanced->unlist, T_END); } } # Notice that IFF the arg looks like {balanced}, the outer braces are stripped # so that delimited arguments behave more similarly to simple, undelimited arguments. if (($n == 1) && ($tokens[0][1] == CC_BEGIN)) { shift(@tokens); pop(@tokens); } return (wantarray ? (Tokens(@tokens), $found) : Tokens(@tokens)); } #********************************************************************** # Higher-level readers: Read various types of things from the input: # tokens, non-expandable tokens, args, Numbers, ... #********************************************************************** sub readArg { my ($self) = @_; my $token = $self->readNonSpace; if (!defined $token) { return; } elsif ($$token[1] == CC_BEGIN) { # Inline ->getCatcode! return $self->readBalanced; } else { return Tokens($token); } } # Note that this returns an empty array if [] is present, # otherwise $default or undef. sub readOptional { my ($self, $default) = @_; my $tok = $self->readNonSpace; if (!defined $tok) { return; } elsif (($tok->equals(T_OTHER('[')))) { return $self->readUntil(T_OTHER(']')); } else { $self->unread($tok); return $default; } } #********************************************************************** # Numbers, Dimensions, Glue # See TeXBook, Ch.24, pp.269-271. 
#********************************************************************** sub readValue { my ($self, $type) = @_; if ($type eq 'Number') { return $self->readNumber; } elsif ($type eq 'Dimension') { return $self->readDimension; } elsif ($type eq 'Glue') { return $self->readGlue; } elsif ($type eq 'MuGlue') { return $self->readMuGlue; } elsif ($type eq 'Tokens') { return $self->readTokensValue; } elsif ($type eq 'Token') { return $self->readToken; } elsif ($type eq 'any') { return $self->readArg; } else { Error('unexpected', $type, $self, "Gullet->readValue Didn't expect this type: $type"); return; } } sub readRegisterValue { my ($self, $type) = @_; my $token = $self->readXToken(0); return unless defined $token; my $defn = $STATE->lookupDefinition($token); if ((defined $defn) && ($defn->isRegister eq $type)) { return $defn->valueOf($defn->readArguments($self)); } else { $self->unread($token); return; } } # Apparent behaviour of a token value (ie \toks#=) sub readTokensValue { my ($self) = @_; my $token = $self->readNonSpace; if (!defined $token) { return; } elsif ($$token[1] == CC_BEGIN) { # Inline ->getCatcode! return $self->readBalanced; } elsif (my $defn = $STATE->lookupDefinition($token)) { if ($defn->isRegister eq 'Tokens') { return $defn->valueOf($defn->readArguments($self)); } elsif ($defn->isExpandable) { $self->unread($defn->invoke($self)); return $self->readTokensValue; } else { return $token; } } # ? else { return $token; } } #====================================================================== # some helpers... 
# <optional signs> = <optional spaces> | <optional signs><plus or minus><optional spaces>
# return +1 or -1
sub readOptionalSigns {
  my ($self) = @_;
  my ($sign, $t) = ("+1", '');
  while (defined($t = $self->readXToken(0))
    && (($t->getString eq '+') || ($t->getString eq '-') || ($t->equals(T_SPACE)))) {
    $sign = -$sign if ($t->getString eq '-'); }
  $self->unread($t) if $t;
  return $sign; }

sub readDigits {
  my ($self, $range, $skip) = @_;
  my $string = '';
  my ($token, $digit);
  while (($token = $self->readXToken(0)) && (($digit = $token->getString) =~ /^[$range]$/)) {
    $string .= $digit; }
  $self->unread($token) if $token && !($skip && $$token[1] == CC_SPACE);    # Inline ->getCatcode!
  return $string; }

# <factor> = <normal integer> | <decimal constant>
# <decimal constant> = . | , | <digit><decimal constant> | <decimal constant><digit>
# Return a number (perl number)
sub readFactor {
  my ($self) = @_;
  my $string = $self->readDigits('0-9');
  my $token  = $self->readXToken(0);
  if ($token && $token->getString =~ /^[\.\,]$/) {
    $string .= '.' . $self->readDigits('0-9');
    $token = $self->readXToken(0); }
  if (length($string) > 0) {
    $self->unread($token) if $token && $$token[1] != CC_SPACE;    # Inline ->getCatcode!
    return $string; }
  else {
    $self->unread($token);
    my $n = $self->readNormalInteger;
    return (defined $n ? $n->valueOf : undef); } }

#======================================================================
# Integer, Number
#======================================================================
# <number> = <optional signs><unsigned number>
# <unsigned number> = <normal integer> | <coerced integer>
# <coerced integer> = <internal dimen> | <internal glue>

sub readNumber {
  my ($self) = @_;
  my $s = $self->readOptionalSigns;
  if (defined(my $n = $self->readNormalInteger)) {
    return ($s < 0 ? $n->negate : $n); }
  elsif (defined($n = $self->readInternalDimension)) {
    return Number($s * $n->valueOf); }
  elsif (defined($n = $self->readInternalGlue)) {
    return Number($s * $n->valueOf); }
  else {
    my $next = $self->readToken();
    $self->unread($next);
    Warn('expected', '', $self, "Missing number, treated as zero",
      "while processing " . ToString($LaTeXML::CURRENT_TOKEN), "next token is " .
        ToString($next));
    return Number(0); } }

# <normal integer> = <internal integer> | <integer constant>
#   | '<octal constant><one optional space> | "<hexadecimal constant><one optional space>
#   | `<character token><one optional space>
# Return a Number or undef
sub readNormalInteger {
  my ($self) = @_;
  my $token = $self->readXToken(0);
  if (!defined $token) {
    return; }
  elsif (($$token[1] == CC_OTHER) && ($token->getString =~ /^[0-9]$/)) {    # Read decimal literal
    return Number(int($token->getString . $self->readDigits('0-9', 1))); }
  elsif ($token->equals(T_OTHER("\'"))) {                                   # Read Octal literal
    return Number(oct($self->readDigits('0-7', 1))); }
  elsif ($token->equals(T_OTHER("\""))) {                                   # Read Hex literal
    return Number(hex($self->readDigits('0-9A-F', 1))); }
  elsif ($token->equals(T_OTHER("\`"))) {                                   # Read Charcode
    my $s = $self->readToken->getString;
    $s =~ s/^\\//;
    return Number(ord($s)); }    # Only a character token!!! NOT expanded!!!!
  else {
    $self->unread($token);
    return $self->readInternalInteger; } }

sub readInternalInteger {
  my ($self) = @_;
  return $self->readRegisterValue('Number'); }

#======================================================================
# Float, a floating point number.
# Similar to factor, but does NOT accept comma!
# This is NOT part of TeX, but is convenient.
sub readFloat {
  my ($self) = @_;
  my $s      = $self->readOptionalSigns;
  my $string = $self->readDigits('0-9');
  my $token  = $self->readXToken(0);
  if ($token && $token->getString =~ /^[\.]$/) {
    $string .= '.' . $self->readDigits('0-9');
    $token = $self->readXToken(0); }
  my $n;
  if (length($string) > 0) {
    $self->unread($token) if $token && $$token[1] != CC_SPACE;    # Inline ->getCatcode!
    $n = $string; }
  else {
    $self->unread($token) if $token;
    $n = $self->readNormalInteger;
    $n = $n->valueOf if defined $n; }
  return (defined $n ? Float($s * $n) : undef); }

#======================================================================
# Dimensions
#======================================================================
# <dimen> = <optional signs><unsigned dimen>
# <unsigned dimen> = <normal dimen> | <coerced dimen>
# <coerced dimen> = <internal glue>

sub readDimension {
  my ($self) = @_;
  my $s = $self->readOptionalSigns;
  if (defined(my $d = $self->readInternalDimension)) {
    return ($s < 0 ?
        $d->negate : $d); }
  elsif (defined($d = $self->readInternalGlue)) {
    return Dimension($s * $d->valueOf); }
  elsif (defined($d = $self->readFactor)) {
    my $unit = $self->readUnit;
    if (!defined $unit) {
      Warn('expected', '', $self, "Illegal unit of measure (pt inserted).");
      $unit = 65536; }
    return Dimension($s * $d * $unit); }
  else {
    Warn('expected', '', $self, "Missing number, treated as zero.",
      "while processing " . ToString($LaTeXML::CURRENT_TOKEN));
    return Dimension(0); } }

# <unit of measure> = <optional spaces><internal unit>
#     | <optional true><physical unit><one optional space>
# <internal unit> = em <one optional space> | ex <one optional space>
#     | <internal integer> | <internal dimen> | <internal glue>
# <physical unit> = pt | pc | in | bp | cm | mm | dd | cc | sp

# Read a unit, returning the equivalent number of scaled points,
sub readUnit {
  my ($self) = @_;
  if (defined(my $u = $self->readKeyword('ex', 'em'))) {
    $self->skip1Space;
    return $STATE->convertUnit($u); }
  elsif (defined($u = $self->readInternalInteger)) {
    return $u->valueOf; }    # These are coerced to number=>sp
  elsif (defined($u = $self->readInternalDimension)) {
    return $u->valueOf; }
  elsif (defined($u = $self->readInternalGlue)) {
    return $u->valueOf; }
  else {
    $self->readKeyword('true');    # But ignore, we're not bothering with mag...
    $u = $self->readKeyword('pt', 'pc', 'in', 'bp', 'cm', 'mm', 'dd', 'cc', 'sp');
    if ($u) {
      $self->skip1Space;
      return $STATE->convertUnit($u); }
    else {
      return; } } }

# Return a dimension value or undef
sub readInternalDimension {
  my ($self) = @_;
  return $self->readRegisterValue('Dimension'); }

#======================================================================
# Mu Dimensions
#======================================================================
# <mudimen> = <optional signs><unsigned mudimen>
# <unsigned mudimen> = <normal mudimen> | <coerced mudimen>
# <normal mudimen> = <factor><mu unit>
# <mu unit> = <optional spaces><internal muglue> | mu <one optional space>
# <coerced mudimen> = <internal muglue>
sub readMuDimension {
  my ($self) = @_;
  my $s = $self->readOptionalSigns;
  if (defined(my $m = $self->readFactor)) {
    my $munit = $self->readMuUnit;
    if (!defined $munit) {
      Warn('expected', '', $self, "Illegal unit of measure (mu inserted).");
      $munit = $STATE->convertUnit('mu'); }
    return MuDimension($s * $m * $munit); }
  elsif (defined($m = $self->readInternalMuGlue)) {
    return MuDimension($s * $m->valueOf); }
  else {
    Warn('expected', '', $self, "Expecting mudimen; assuming 0");
    return MuDimension(0); } }

sub readMuUnit {
  my ($self) = @_;
  if (my $m = $self->readKeyword('mu')) {
    $self->skip1Space;
    return $STATE->convertUnit($m); }
  elsif ($m = $self->readInternalMuGlue) {
    return $m->valueOf; }
  else {
    return; } }

#======================================================================
# Glue
#======================================================================
# <glue> = <optional signs><internal glue> | <dimen><stretch><shrink>
# <stretch> = plus <dimen> | plus <fil dimen> | <optional spaces>
# <shrink>  = minus <dimen> | minus <fil dimen> | <optional spaces>
sub readGlue {
  my ($self) = @_;
  my $s = $self->readOptionalSigns;
  my $n;
  if (defined($n = $self->readInternalGlue)) {
    return ($s < 0 ? $n->negate : $n); }
  else {
    my $d = $self->readDimension;
    if (!$d) {
      Warn('expected', '', $self, "Missing number, treated as zero.",
        "while processing " .
          ToString($LaTeXML::CURRENT_TOKEN));
      return Glue(0); }
    $d = $d->negate if $s < 0;
    my ($r1, $f1, $r2, $f2);
    ($r1, $f1) = $self->readRubber if $self->readKeyword('plus');
    ($r2, $f2) = $self->readRubber if $self->readKeyword('minus');
    return Glue($d->valueOf, $r1, $f1, $r2, $f2); } }

my %FILLS = (fil => 1, fill => 2, filll => 3);    # [CONSTANT]

sub readRubber {
  my ($self, $mu) = @_;
  my $s = $self->readOptionalSigns;
  my $f = $self->readFactor;
  if (!defined $f) {
    $f = ($mu ? $self->readMuDimension : $self->readDimension);
    return ($f->valueOf * $s, 0); }
  elsif (defined(my $fil = $self->readKeyword('filll', 'fill', 'fil'))) {
    return ($s * $f, $FILLS{$fil}); }
  elsif ($mu) {
    my $u = $self->readMuUnit;
    if (!defined $u) {
      Warn('expected', '', $self, "Illegal unit of measure (mu inserted).");
      $u = $STATE->convertUnit('mu'); }
    return ($s * $f * $u, 0); }
  else {
    my $u = $self->readUnit;
    if (!defined $u) {
      Warn('expected', '', $self, "Illegal unit of measure (pt inserted).");
      $u = 65536; }
    return ($s * $f * $u, 0); } }

# Return a glue value or undef.
sub readInternalGlue {
  my ($self) = @_;
  return $self->readRegisterValue('Glue'); }

#======================================================================
# Mu Glue
#======================================================================
# <muglue> = <optional signs><internal muglue> | <mudimen><mustretch><mushrink>
# <mustretch> = plus <mudimen> | plus <fil dimen> | <optional spaces>
# <mushrink> = minus <mudimen> | minus <fil dimen> | <optional spaces>
sub readMuGlue {
  my ($self) = @_;
  my $s = $self->readOptionalSigns;
  my $n;
  if (defined($n = $self->readInternalMuGlue)) {
    return ($s < 0 ? $n->negate : $n); }
  else {
    my $d = $self->readMuDimension;
    if (!$d) {
      Warn('expected', '', $self, "Missing number, treated as zero.",
        "while processing " . ToString($LaTeXML::CURRENT_TOKEN));
      return MuGlue(0); }
    $d = $d->negate if $s < 0;
    my ($r1, $f1, $r2, $f2);
    ($r1, $f1) = $self->readRubber(1) if $self->readKeyword('plus');
    ($r2, $f2) = $self->readRubber(1) if $self->readKeyword('minus');
    return MuGlue($d->valueOf, $r1, $f1, $r2, $f2); } }

# Return a muglue value or undef.
sub readInternalMuGlue {
  my ($self) = @_;
  return $self->readRegisterValue('MuGlue'); }

#======================================================================
# See pp 272-275 for lists of the various registers.
# These are implemented in Primitive.pm
#**********************************************************************
1;

__END__

=pod

=head1 NAME

C<LaTeXML::Core::Gullet> - expands expandable tokens and parses common token sequences.

=head1 DESCRIPTION

A C<LaTeXML::Core::Gullet> reads tokens (L<LaTeXML::Core::Token>) from a L<LaTeXML::Core::Mouth>.
It is responsible for expanding macros and expandable control sequences,
if the current definition associated with the token in the L<LaTeXML::Core::State>
is an L<LaTeXML::Core::Definition::Expandable> definition. The C<LaTeXML::Core::Gullet> also provides a
variety of methods for reading various types of input such as arguments, optional arguments,
as well as for parsing L<LaTeXML::Common::Number>, L<LaTeXML::Common::Dimension>, etc, according
to TeX's rules.

It extends L<LaTeXML::Common::Object>.

=head2 Managing Input

=over 4

=item C<< $gullet->openMouth($mouth, $noautoclose); >>

Is this public? Prepares to read tokens from C<$mouth>.
If C<$noautoclose> is true, the Mouth will not be automatically closed
when it is exhausted.

=item C<< $gullet->closeMouth; >>

Is this public? Finishes reading from the current mouth, and
reverts to the one in effect before the last openMouth.

=item C<< $gullet->flush; >>

Is this public? Clears all inputs.

=item C<< $gullet->getLocator; >>

Returns a string describing the current location in the input stream.

=back

=head2 Low-level methods

=over 4

=item C<< $tokens = $gullet->expandTokens($tokens); >>

Return the L<LaTeXML::Core::Tokens> resulting from expanding all the tokens in C<$tokens>.
This is actually only used in a few circumstances where the arguments to
an expandable need explicit expansion; usually expansion happens at the right time.
Note that the change log lists this method as obsolete, and it is no longer
defined in this module.

=item C<< @tokens = $gullet->neutralizeTokens(@tokens); >>

Another unusual method: Used for things like \edef and token registers, to
inhibit further expansion of control sequences. Each token that currently has
a definition is preceded by a not-expanded C<\noexpand> marker token,
and each parameter token is doubled.
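
As a rough illustration of that idea (a standalone sketch, not the LaTeXML
implementation), the following models tokens as C<[string, catcode]> pairs.
The catcode names, the C<%defined> table and the function name C<neutralize>
are assumptions made only for this example.

    use strict;
    use warnings;

    # Hypothetical table of control sequences that currently have definitions.
    my %defined = ('\foo' => 1);

    sub neutralize {
      my (@tokens) = @_;
      my @result = ();
      foreach my $token (@tokens) {
        my ($string, $cc) = @$token;
        if ($cc eq 'PARAM') {          # parameter tokens get doubled
          push(@result, $token); }
        elsif ($defined{$string}) {    # defined tokens get a marker in front
          push(@result, ['\noexpand', 'NOTEXPANDED']); }
        push(@result, $token); }
      return @result; }

    my @out = neutralize(['\foo', 'CS'], ['#', 'PARAM'], ['x', 'LETTER']);
    my $shown = join(' ', map { $$_[0] } @out);    # '\noexpand \foo # # x'

The real method consults the current state for definitions rather than a
fixed table, so which tokens are guarded depends on what is defined at the
time of the call.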

=item C<< $token = $gullet->readToken; >>

Return the next token from the input source, or undef if there is no more input.
Comment tokens are not returned; they are set aside as pending comments.

=item C<< $token = $gullet->readXToken($toplevel,$commentsok); >>

Return the next unexpandable token from the input source, or undef if there is no more input.
If the next token is expandable, it is expanded, and its expansion is reinserted into the input.
If C<$toplevel> is true (the default when it is not supplied), an exhausted input source
that allows automatic closing is closed and reading continues from the next source;
protected definitions are expanded only in this case.
If C<$commentsok>, a comment read or pending will be returned.

=item C<< $gullet->unread(@tokens); >>

Push the C<@tokens> back into the input stream to be re-read.

=back

=head2 Mid-level methods

=over 4

=item C<< $token = $gullet->readNonSpace; >>

Read and return the next non-space token from the input after discarding any spaces.

=item C<< $gullet->skipSpaces; >>

Skip the next spaces from the input.

=item C<< $gullet->skip1Space; >>

Skip the next token from the input if it is a space.

=item C<< $tokens = $gullet->readBalanced; >>

Read a sequence of tokens from the input until the balancing '}' (assuming the '{' has
already been read). Returns a L<LaTeXML::Core::Tokens>; the closing '}' is not included.

=item C<< $boole = $gullet->ifNext($token); >>

Returns true if the next token in the input matches C<$token>;
the possibly matching token remains in the input.

=item C<< $tokens = $gullet->readMatch(@choices); >>

Read and return whichever of C<@choices> matches the input, or undef if none do.
Each of the choices is a L<LaTeXML::Core::Token> or L<LaTeXML::Core::Tokens>.

=item C<< $keyword = $gullet->readKeyword(@keywords); >>

Read and return whichever of C<@keywords> (each a string) matches the input, or undef
if none do. This is similar to readMatch, but case and catcodes are ignored.
Also, leading spaces are skipped, and macros are expanded while matching.

=item C<< $tokens = $gullet->readUntil(@delims); >>

Read and return a (balanced) sequence of L<LaTeXML::Core::Tokens> until matching one of the tokens
in C<@delims>. In a list context, it also returns which of the delimiters ended the sequence.
If the whole sequence is a single braced group, the outer braces are stripped.
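
The brace handling described for C<readBalanced> and C<readUntil> can be
illustrated with a small standalone sketch. It is not LaTeXML code: tokens
are modeled as one-character strings, only a single delimiter is supported,
and the names C<read_balanced> and C<read_until> are hypothetical stand-ins
for the methods documented above. Unlike the real method, the sketch simply
returns what it collected if the input runs out.

    use strict;
    use warnings;

    # Collect tokens up to the '}' matching an already-consumed '{'.
    sub read_balanced {
      my ($input) = @_;
      my ($level, $t, @tokens) = (1);
      while ($level && defined($t = shift @$input)) {
        $level++ if $t eq '{';
        $level-- if $t eq '}';
        push(@tokens, $t) if $level; }
      return @tokens; }

    # Collect tokens until $delim is seen outside of any braces.
    # If the whole argument is one {group}, the outer braces are dropped.
    sub read_until {
      my ($input, $delim) = @_;
      my ($n, $t, @tokens) = (0);
      while (defined($t = shift @$input)) {
        last if $t eq $delim;
        push(@tokens, $t);
        $n++;
        push(@tokens, read_balanced($input), '}') if $t eq '{'; }
      if (($n == 1) && ($tokens[0] eq '{')) {
        shift(@tokens);
        pop(@tokens); }
      return join('', @tokens); }

    my $plain   = read_until([split(//, 'a{b]c}d]rest')], ']');    # 'a{b]c}d'
    my $grouped = read_until([split(//, '{x]y}]rest')],   ']');    # 'x]y'

In both calls the C<]> inside the braces is protected from matching; in the
second call the argument is exactly one group, so its braces are removed.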

=back

=head2 High-level methods

=over 4

=item C<< $tokens = $gullet->readArg; >>

Read and return a TeX argument; the next Token or Tokens (if surrounded by braces).

=item C<< $tokens = $gullet->readOptional($default); >>

Read and return a LaTeX optional argument; returns C<$default> if there is no '[',
otherwise the contents of the [].

=item C<< $thing = $gullet->readValue($type); >>

Reads an argument of a given type: one of 'Number', 'Dimension', 'Glue', 'MuGlue',
'Tokens', 'Token' or 'any'.

=item C<< $value = $gullet->readRegisterValue($type); >>

Read a control sequence token (and possibly its arguments) that names a register,
and return the value. Returns undef if the next token isn't such a register.

=item C<< $number = $gullet->readNumber; >>

Read a L<LaTeXML::Common::Number> according to TeX's rules of the various things that
can be used as a numerical value.

=item C<< $dimension = $gullet->readDimension; >>

Read a L<LaTeXML::Common::Dimension> according to TeX's rules of the various things that
can be used as a dimension value.

=item C<< $mudimension = $gullet->readMuDimension; >>

Read a L<LaTeXML::Core::MuDimension> according to TeX's rules of the various things that
can be used as a mudimension value.

=item C<< $glue = $gullet->readGlue; >>

Read a L<LaTeXML::Common::Glue> according to TeX's rules of the various things that
can be used as a glue value.

=item C<< $muglue = $gullet->readMuGlue; >>

Read a L<LaTeXML::Core::MuGlue> according to TeX's rules of the various things that
can be used as a muglue value.

=back

=head1 AUTHOR

Bruce Miller

=head1 COPYRIGHT

Public domain software, produced as part of work done by the
United States Government & not subject to copyright in the US.

=cut latexml-0.8.1/lib/LaTeXML/Core/KeyVals.pm0000644000175000017500000002351612507513572017674 0ustar norbertnorbert# /=====================================================================\ # # | LaTeXML::Core::KeyVals | # # | Support for key-value pairs for LaTeXML | # # |=====================================================================| # # | Part of LaTeXML: | # # | Public domain software, produced as part of work done by the | # # | United States Government & not subject to copyright in the US. | # # |---------------------------------------------------------------------| # # | Bruce Miller #_# | # # | http://dlmf.nist.gov/LaTeXML/ (o o) | # # \=========================================================ooo==U==ooo=/ # package LaTeXML::Core::KeyVals; use strict; use warnings; use LaTeXML::Global; use LaTeXML::Common::Object; use LaTeXML::Common::Error; use LaTeXML::Core::Token; use LaTeXML::Core::Tokens; use base qw(LaTeXML::Common::Object); #====================================================================== # A KeyVal argument MUST be delimited by either braces or brackets (if optional) # This method reads the keyval pairs INCLUDING the delimiters, (rather than parsing # after the fact), since some values may have special catcode needs. ##my $T_EQ = T_OTHER('='); # [CONSTANT] ##my $T_COMMA = T_OTHER(','); # [CONSTANT] sub readKeyVals { my ($gullet, $keyset, $close) = @_; my $startloc = $gullet->getLocator(); my $open = $gullet->readToken; my $assign = T_OTHER('='); my $punct = T_OTHER(','); $keyset = ($keyset ? ToString($keyset) : '_anonymous_'); my @kv = (); while (1) { $gullet->skipSpaces; # Read the keyword. my ($ktoks, $delim) = $gullet->readUntil($assign, $punct, $close); Error('expected', $close, $gullet, "Fell off end expecting " . Stringify($close) . " while reading KeyVal key", "key started at $startloc") unless $delim; my $key = ToString($ktoks); $key =~ s/\s//g; if ($key) { my $keydef = $STATE->lookupValue('KEYVAL@' . $keyset . '@' . 
$key); my $value; if ($delim->equals($assign)) { # Got =, so read the value # WHOA!!! Secret knowledge!!! my $type = ($keydef && (scalar(@$keydef) == 1) && $$keydef[0]{type}) || 'Plain'; my $typedef = $STATE->lookupMapping('PARAMETER_TYPES', $type); $STATE->beginSemiverbatim() if $typedef && $$typedef{semiverbatim}; ## ($value,$delim)=$gullet->readUntil($punct,$close); # This is the core of $gullet->readUntil, but preserves braces needed by rare key types my ($tok, @toks) = (); while ((!defined($delim = $gullet->readMatch($punct, $close))) && (defined($tok = $gullet->readToken()))) { # Copy next token to args push(@toks, $tok, ($tok->getCatcode == CC_BEGIN ? ($gullet->readBalanced->unlist, T_END) : ())); } $value = Tokens(@toks); if (($type eq 'Plain') || ($typedef && $$typedef{undigested})) { } # Fine as is. elsif ($type eq 'Semiverbatim') { # Needs neutralization $value = $value->neutralize; } else { ($value) = $keydef->reparseArgument($gullet, $value) } $STATE->endSemiverbatim() if $typedef && $$typedef{semiverbatim}; } else { # Else, get default value. $value = $STATE->lookupValue('KEYVAL@' . $keyset . '@' . $key . '@default'); } push(@kv, $key); push(@kv, $value); } Error('expected', $close, $gullet, "Fell off end expecting " . Stringify($close) . " while reading KeyVal value", "key started at $startloc") unless $delim; last if $delim->equals($close); } return LaTeXML::Core::KeyVals->new($keyset, [@kv], open => $open, close => $close, punct => $punct, assign => $assign); } #====================================================================== # The Data object representing the KeyVals #====================================================================== # This defines the KeyVal data object that can appear in the datastream # along with tokens, boxes, etc. # Thus it has to be digestible. # KeyVals: representation of keyval arguments, # Not necessarily a hash, since keys could be repeated and order may # be significant. 
#********************************************************************** # Where does this really belong? # The values can be Tokens, after parsing, or Boxes, after digestion. # (or Numbers, etc. in either case) # But also, it has a non-generic API used above... # If Box-like, it could have a beAbsorbed method; which would do what? # Should it convert to simple text? Or structure? # If latter, there needs to be a key => tag mapping. # Options can be tokens for open, close, punct (between pairs), assign (typically =) sub new { my ($class, $keyset, $pairs, %options) = @_; $keyset = ($keyset ? ToString($keyset) : '_anonymous_'); my %hash = (); my @pp = @$pairs; while (@pp) { my ($k, $v) = (shift(@pp), shift(@pp)); if (!defined $hash{$k}) { $hash{$k} = $v; } # Hmm, accumulate an ARRAY if multiple values for given key. # This is unlikely to be what the caller expects!! But what? elsif (ref $hash{$k} eq 'ARRAY') { push(@{ $hash{$k} }, $v); } else { $hash{$k} = [$hash{$k}, $v]; } } return bless { keyset => $keyset, keyvals => $pairs, hash => {%hash}, open => $options{open}, close => $options{close}, punct => $options{punct}, assign => $options{assign} }, $class; } sub getValue { my ($self, $key) = @_; return $$self{hash}{$key}; } sub setValue { my ($self, $key, $value) = @_; if (defined $value) { $$self{hash}{$key} = $value; } else { delete $$self{hash}{$key}; } return; } sub getPairs { my ($self) = @_; return @{ $$self{keyvals} }; } sub getKeyVals { my ($self) = @_; return $$self{hash}; } sub getHash { my ($self) = @_; return map { ($_ => ToString($$self{hash}{$_})) } keys %{ $$self{hash} }; } sub hasKey { my ($self, $key) = @_; return exists $$self{hash}{$key}; } sub beDigested { my ($self, $stomach) = @_; my $keyset = $$self{keyset}; my @kv = @{ $$self{keyvals} }; my @dkv = (); while (@kv) { my ($key, $value) = (shift(@kv), shift(@kv)); my $keydef = $STATE->lookupValue('KEYVAL@' . $keyset . '@' . 
$key);
    my $dodigest = (ref $value) && (!$keydef || !$$keydef[0]{undigested});
    # Yuck
    my $type = ($keydef && (scalar(@$keydef) == 1) && $$keydef[0]{type}) || 'Plain';
    my $typedef = $STATE->lookupMapping('PARAMETER_TYPES', $type);
    my $semiverb = $dodigest && $typedef && $$typedef{semiverbatim};
    $STATE->beginSemiverbatim() if $semiverb;
    push(@dkv, $key, ($dodigest ? $value->beDigested($stomach) : $value));
    $STATE->endSemiverbatim() if $semiverb;
  }
  return LaTeXML::Core::KeyVals->new($keyset, [@dkv],
    open  => $$self{open},  close  => $$self{close},
    punct => $$self{punct}, assign => $$self{assign});
}

sub revert {
  my ($self) = @_;
  my $keyset = $$self{keyset};
  my @tokens = ();
  my @kv     = @{ $$self{keyvals} };
  while (@kv) {
    my ($key, $value) = (shift(@kv), shift(@kv));
    my $keydef = $STATE->lookupValue('KEYVAL@' . $keyset . '@' . $key);
    push(@tokens, $$self{punct}) if $$self{punct} && @tokens;
    push(@tokens, T_SPACE)       if @tokens;
    push(@tokens, Explode($key));
    push(@tokens, ($$self{assign} || T_SPACE)) if $value;
    push(@tokens, ($keydef ? $keydef->revertArguments($value) : Revert($value))) if $value;
  }
  unshift(@tokens, $$self{open}) if $$self{open};
  push(@tokens, $$self{close}) if $$self{close};
  return @tokens;
}

sub unlist {
  my ($self) = @_;
  return $self;
}    # ????

sub toString {
  my ($self) = @_;
  my $string = '';
  my @kv     = @{ $$self{keyvals} };
  while (@kv) {
    my ($key, $value) = (shift(@kv), shift(@kv));
    $string .= ToString($$self{punct} || '') . ' ' if $string;
    $string .= $key . ToString($$self{assign} || ' ') . ToString($value);
  }
  return $string;
}

#======================================================================
1;

__END__

=pod

=head1 NAME

C<LaTeXML::Core::KeyVals> - support for keyvals

=head1 DESCRIPTION

Provides a parser and representation of keyval pairs.
C<LaTeXML::Core::KeyVals> represents parameters handled by LaTeX's keyval package.
It extends L<LaTeXML::Common::Object>.

=head2 Declarations

=over 4

=item C<< DefKeyVal($keyset,$key,$type); >>

Defines the type of value expected for the key C<$key> when parsed as part
of a KeyVal using C<$keyset>.
C<$type> would be something like 'any' or 'Number', but I'm still working on this.

=back

=head2 Accessors

=over 4

=item C<< GetKeyVal($arg,$key) >>

This is useful within constructors to access the value associated with C<$key> in
the argument C<$arg>.

=item C<< GetKeyVals($arg) >>

This is useful within constructors to extract all key-value pairs to assign all attributes.

=back

=head2 KeyVal Methods

=over 4

=item C<< $value = $keyval->getValue($key); >>

Return the value associated with C<$key> in the C<$keyval>.

=item C<< $hash = $keyval->getKeyVals; >>

Return the hash reference containing the keys and values bound in the C<$keyval>.
Note that if a key was repeated, its entry is an array reference
holding all of the values given for that key.

=item C<< @keyvals = $keyval->getPairs; >>

Return the alternating keys and values bound in the C<$keyval>.
Note that this may contain multiple entries for a given key, if they
were repeated.

=item C<< $keyval->beDigested($stomach); >>

Return a new C<LaTeXML::Core::KeyVals> object with all values digested as appropriate.

=back

=head1 AUTHOR

Bruce Miller

=head1 COPYRIGHT

Public domain software, produced as part of work done by the
United States Government & not subject to copyright in the US.

=cut
latexml-0.8.1/lib/LaTeXML/Core/List.pm0000644000175000017500000001045212507513572017224 0ustar norbertnorbert
# /=====================================================================\ #
# |  LaTeXML::Core::List                                                | #
# |  Digested objects produced in the Stomach                           | #
# |=====================================================================| #
# |  Part of LaTeXML:                                                   | #
# |  Public domain software, produced as part of work done by the       | #
# |  United States Government & not subject to copyright in the US.
| #
# |---------------------------------------------------------------------| #
# | Bruce Miller                                                 #_#    | #
# | http://dlmf.nist.gov/LaTeXML/                               (o o)   | #
# \=========================================================ooo==U==ooo=/ #
package LaTeXML::Core::List;
use strict;
use warnings;
use LaTeXML::Global;
use LaTeXML::Common::Object;
use LaTeXML::Common::Error;
use LaTeXML::Common::Dimension;
use List::Util qw(min max);
use base qw(Exporter LaTeXML::Core::Box);
our @EXPORT = (qw(&List));

# Tricky; don't really want a separate constructor for a Math List,
# but you really have to specify it in the arguments
# BUT you can't infer the mode from the current state ($STATE may have already switched
# back to text) or from the modes of the boxes (may be mixed)
# So, it has to be specified along with the boxes;
# Here we simply allow
#   List($box,.... mode=>'math')
# Also, if there's only 1 box, we just return it!
sub List {
  my (@boxes) = @_;
  my $mode = 'text';
  # Hacky special case!!!
  if ((scalar(@boxes) >= 2) && ($boxes[-2] eq 'mode')
    && (($boxes[-1] eq 'math') || ($boxes[-1] eq 'text'))) {
    $mode = pop(@boxes);
    pop(@boxes);
  }
  @boxes = grep { defined $_ } @boxes;    # strip out undefs
  if (scalar(@boxes) == 1) {
    return $boxes[0];                     # Simplify!
  }
  else {
    my $list = LaTeXML::Core::List->new(@boxes);
    $list->setProperty(mode => $mode) if $mode eq 'math';
    return $list;
  }
}

sub new {
  my ($class, @boxes) = @_;
  my ($bx, $font, $locator);
  my @bxs = @boxes;
  while (defined($bx = shift(@bxs)) && (!defined $locator)) {
    $locator = $bx->getLocator unless defined $locator;
  }
  @bxs = @boxes;
  # Maybe the most representative font for a List is the font of the LAST box (that _has_ a font!) ???
  while (defined($bx = pop(@bxs)) && (!defined $font)) {
    $font = $bx->getFont unless defined $font;
  }
  return bless [[@boxes], $font, $locator || '', undef, {}], $class;
}

sub isMath {
  my ($self) = @_;
  return ($$self[4]{mode} || 'text') eq 'math';
}

sub unlist {
  my ($self) = @_;
  return @{ $$self[0] };
}

sub revert {
  my ($self) = @_;
  return map { Revert($_) } $self->unlist;
}

sub toString {
  my ($self) = @_;
  return join('', grep { defined $_ } map { $_->toString } $self->unlist);
}

# Methods for overloaded operators
sub stringify {
  my ($self) = @_;
  my $type = ref $self;
  $type =~ s/^LaTeXML:://;
  return $type . '[' . join(',', map { Stringify($_) } $self->unlist) . ']';    # Not ideal, but....
}

sub equals {
  my ($a, $b) = @_;
  return 0 unless (defined $b) && ((ref $a) eq (ref $b));
  my @a = $a->unlist;
  my @b = $b->unlist;
  while (@a && @b && ($a[0]->equals($b[0]))) {
    shift(@a);
    shift(@b);
  }
  return !(@a || @b);
}

sub beAbsorbed {
  my ($self, $document) = @_;
  return map { $document->absorb($_) } $self->unlist;
}

sub computeSize {
  my ($self, %options) = @_;
  my $props = $self->getPropertiesRef;
  $options{width}  = $$props{width}  if $$props{width};
  $options{height} = $$props{height} if $$props{height};
  $options{depth}  = $$props{depth}  if $$props{depth};
  my ($w, $h, $d) = ($$self[1] || LaTeXML::Common::Font->textDefault)
    ->computeBoxesSize($$self[0], %options);
  $$props{width}  = $w unless defined $$props{width};
  $$props{height} = $h unless defined $$props{height};
  $$props{depth}  = $d unless defined $$props{depth};
  return;
}

#======================================================================
1;

__END__

=pod

=head1 NAME

C<LaTeXML::Core::List> - represents lists of digested objects;
extends L<LaTeXML::Core::Box>.

=head1 AUTHOR

Bruce Miller

=head1 COPYRIGHT

Public domain software, produced as part of work done by the
United States Government & not subject to copyright in the US.

=cut
latexml-0.8.1/lib/LaTeXML/Core/Mouth.pm0000644000175000017500000003665212507513572017417 0ustar norbertnorbert
# /=====================================================================\ #
# |  LaTeXML::Core::Mouth                                               | #
# |  Analog of TeX's Mouth: Tokenizes strings & files                   | #
# |=====================================================================| #
# |  Part of LaTeXML:                                                   | #
# |  Public domain software, produced as part of work done by the       | #
# |  United States Government & not subject to copyright in the US.     | #
# |---------------------------------------------------------------------| #
# | Bruce Miller                                                 #_#    | #
# | http://dlmf.nist.gov/LaTeXML/                               (o o)   | #
# \=========================================================ooo==U==ooo=/ #
package LaTeXML::Core::Mouth;
use strict;
use warnings;
use LaTeXML::Global;
use LaTeXML::Common::Object;
use LaTeXML::Common::Error;
use LaTeXML::Core::Token;
use LaTeXML::Core::Tokens;
use LaTeXML::Util::Pathname;
use base qw(LaTeXML::Common::Object);

# Factory method;
# Create an appropriate Mouth
# options are
#  quiet,
#  atletter,
#  content
sub create {
  my ($class, $source, %options) = @_;
  if ($options{content}) {    # we've cached the content of this source
    my ($dir, $name, $ext) = pathname_split($source);
    $options{source}      = $source;
    $options{shortsource} = "$name.$ext";
    return $class->new($options{content}, %options);
  }
  elsif (!defined $source) {    # check this before matching against $source
    return $class->new('', %options);
  }
  elsif ($source =~ s/^literal://) {    # we've supplied literal data
    return $class->new($source, %options);
  }
  else {
    my $type     = pathname_protocol($source);
    my $newclass = "LaTeXML::Core::Mouth::$type";
    if (!$newclass->can('new')) {    # not already defined somewhere?
      require "LaTeXML/Core/Mouth/$type.pm";
    }    # Load it!
    return $newclass->new($source, %options);
  }
}

sub new {
  my ($class, $string, %options) = @_;
  $string = q{} unless defined $string;
  $options{source}      = "Anonymous String" unless defined $options{source};
  $options{shortsource} = "String"           unless defined $options{shortsource};
  my $self = bless { source => $options{source},
    shortsource    => $options{shortsource},
    fordefinitions => ($options{fordefinitions} ? 1 : 0),
    notes          => ($options{notes} ? 1 : 0),
  }, $class;
  $self->openString($string);
  $self->initialize;
  return $self;
}

sub openString {
  my ($self, $string) = @_;
  $$self{string} = $string;
  $$self{buffer} = [(defined $string ? splitLines($string) : ())];
  return;
}

sub initialize {
  my ($self) = @_;
  $$self{lineno} = 0;
  $$self{colno}  = 0;
  $$self{chars}  = [];
  $$self{nchars} = 0;
  if ($$self{notes}) {
    $$self{note_message} = "Processing " . ($$self{fordefinitions} ? "definitions" : "content")
      . " " . $$self{source};
    NoteBegin($$self{note_message});
  }
  if ($$self{fordefinitions}) {
    $$self{saved_at_cc}            = $STATE->lookupCatcode('@');
    $$self{SAVED_INCLUDE_COMMENTS} = $STATE->lookupValue('INCLUDE_COMMENTS');
    $STATE->assignCatcode('@' => CC_LETTER);
    $STATE->assignValue(INCLUDE_COMMENTS => 0);
  }
  return;
}

sub finish {
  my ($self) = @_;
  $$self{buffer} = [];
  $$self{lineno} = 0;
  $$self{colno}  = 0;
  $$self{chars}  = [];
  $$self{nchars} = 0;
  if ($$self{fordefinitions}) {
    $STATE->assignCatcode('@' => $$self{saved_at_cc});
    $STATE->assignValue(INCLUDE_COMMENTS => $$self{SAVED_INCLUDE_COMMENTS});
  }
  if ($$self{notes}) {
    NoteEnd($$self{note_message});
  }
  return;
}

# This is (hopefully) a platform independent way of splitting a string
# into "lines" ending with CRLF, CR or LF (DOS, Mac or Unix).
# Note that TeX considers newlines to be \r, ie CR, ie ^^M
sub splitLines {
  my ($string) = @_;
  $string =~ s/(?:\015\012|\015|\012)/\r/sg;    # Normalize remaining
  return split("\r", $string);                  # And split.
}

# This is (hopefully) a correct way to split a line into "chars",
# or what is probably more desired is "Grapheme clusters" (even "extended")
# These are unicode characters that include any following combining chars, accents & such.
# I am thinking that when we deal with unicode this may be the most correct way?
# If it's not the way XeTeX does it, perhaps, it must be that ALL combining chars
# have to be converted to the proper accent control sequences!
sub splitChars {
  my ($line) = @_;
  return $line =~ m/\X/g;
}

sub getNextLine {
  my ($self) = @_;
  return unless scalar(@{ $$self{buffer} });
  my $line = shift(@{ $$self{buffer} });
  return (scalar(@{ $$self{buffer} }) ? $line . "\r" : $line);    # No CR on last line!
}

sub hasMoreInput {
  my ($self) = @_;
  return ($$self{colno} < $$self{nchars}) || scalar(@{ $$self{buffer} });
}

# Get the next character & its catcode from the input,
# handling TeX's "^^" encoding.
# Note that this is the only place where catcode lookup is done,
# and that it is somewhat `inlined'.
sub getNextChar {
  my ($self) = @_;
  if ($$self{colno} < $$self{nchars}) {
    my $ch = $$self{chars}[$$self{colno}++];
    my $cc = $$STATE{catcode}{$ch}[0];    # $STATE->lookupCatcode($ch); OPEN CODED!
    if ((defined $cc) && ($cc == CC_SUPER)    # Possibly convert ^^x
      && ($$self{colno} + 1 < $$self{nchars}) && ($ch eq $$self{chars}[$$self{colno}])) {
      my ($c1, $c2);
      if (($$self{colno} + 2 < $$self{nchars})    # ^^ followed by TWO LOWERCASE Hex digits???
        && (($c1 = $$self{chars}[$$self{colno} + 1]) =~ /^[0-9a-f]$/)
        && (($c2 = $$self{chars}[$$self{colno} + 2]) =~ /^[0-9a-f]$/)) {
        $ch = chr(hex($c1 . $c2));
        splice(@{ $$self{chars} }, $$self{colno} - 1, 4, $ch);
        $$self{nchars} -= 3;
      }
      else {    # OR ^^ followed by a SINGLE Control char type code???
        my $c  = $$self{chars}[$$self{colno} + 1];
        my $cn = ord($c);
        $ch = chr($cn + ($cn > 64 ?
-64 : 64));
        splice(@{ $$self{chars} }, $$self{colno} - 1, 3, $ch);
        $$self{nchars} -= 2;
      }
      $cc = $STATE->lookupCatcode($ch);
    }
    $cc = CC_OTHER unless defined $cc;
    return ($ch, $cc);
  }
  else {
    return (undef, undef);
  }
}

sub stringify {
  my ($self) = @_;
  return "Mouth[\@$$self{lineno}x$$self{colno}]";
}

#**********************************************************************
sub getLocator {
  my ($self, $length) = @_;
  my ($l, $c) = ($$self{lineno}, $$self{colno});
  if ($length && ($length < 0)) {
    return "at $$self{shortsource}; line $l col $c";
  }
  elsif ($length && (defined $l || defined $c)) {
    my $msg   = "at $$self{source}; line $l col $c";
    my $chars = $$self{chars};
    if (my $n = $$self{nchars}) {
      $c = $n - 1 if $c >= $n;
      my $c0 = ($c > 50 ? $c - 40 : 0);
      my $cm = ($c < 1 ? 0 : $c - 1);
      my $cn = ($n - $c > 50 ? $c + 40 : $n - 1);
      my $p1 = ($c0 <= $cm ? join('', @$chars[$c0 .. $cm]) : ''); chomp($p1);
      my $p2 = ($c <= $cn ? join('', @$chars[$c .. $cn]) : ''); chomp($p2);
      $msg .= "\n " . $p1 . "\n " . (' ' x ($c - $c0)) . '^' . ' ' . $p2;
    }
    return $msg;
  }
  else {
    return "at $$self{source}; line $l col $c";
  }
}

sub getSource {
  my ($self) = @_;
  return $$self{source};
}

#**********************************************************************
# See The TeXBook, Chapter 8, The Characters You Type, pp.46--47.
#**********************************************************************

sub handle_escape {    # Read control sequence
  my ($self) = @_;
  # NOTE: We're using control sequences WITH the \ prepended!!!
  my $cs = "\\";    # I need this standardized to be able to lookup tokens (A better way???)
  my ($ch, $cc) = $self->getNextChar;
  # Knuth, p.46 says that Newlines are converted to spaces,
  # But I believe that he does NOT mean within control sequences
  $cs .= $ch;
  if ($cc == CC_LETTER) {    # For letter, read more letters for csname.
    while ((($ch, $cc) = $self->getNextChar) && $ch && ($cc == CC_LETTER)) {
      $cs .= $ch;
    }
    $$self{colno}--;
  }
  if ($cc == CC_SPACE) {    # We'll skip whitespace here.
    while ((($ch, $cc) = $self->getNextChar) && $ch && ($cc == CC_SPACE)) { }
    $$self{colno}-- if ($$self{colno} < $$self{nchars});
  }
  if ($cc == CC_EOL) {    # If we've got an EOL
    # if in \read mode, leave the EOL to be turned into a T_SPACE
    if (($STATE->lookupValue('PRESERVE_NEWLINES') || 0) > 1) { }
    else {                # else skip it.
      $self->getNextChar;
      $$self{colno}-- if ($$self{colno} < $$self{nchars});
    }
  }
  return T_CS($cs);
}

sub handle_EOL {
  my ($self) = @_;
  # Note that newlines should be converted to space (with " " for content)
  # but it makes nicer XML with occasional \n. Hopefully, this is harmless?
  my $token = ($$self{colno} == 1
    ? T_CS('\par')
    : ($STATE->lookupValue('PRESERVE_NEWLINES') ? Token("\n", CC_SPACE) : T_SPACE));
  $$self{colno} = $$self{nchars};    # Ignore any remaining characters after EOL
  return $token;
}

sub handle_space {
  my ($self) = @_;
  my ($ch, $cc);
  # Skip any following spaces!
  while ((($ch, $cc) = $self->getNextChar) && $ch && (($cc == CC_SPACE) || ($cc == CC_EOL))) { }
  $$self{colno}-- if ($$self{colno} < $$self{nchars});
  return T_SPACE;
}

sub handle_comment {
  my ($self) = @_;
  my $n = $$self{colno};
  $$self{colno} = $$self{nchars};
  my $comment = join('', @{ $$self{chars} }[$n .. $$self{nchars} - 1]);
  $comment =~ s/^\s+//;
  $comment =~ s/\s+$//;
  return ($comment && $STATE->lookupValue('INCLUDE_COMMENTS') ? T_COMMENT($comment) : undef);
}

# These cache the (presumably small) set of distinct letters, etc
# converted to Tokens.
# Note that this gets filled during runtime and carries over through Daemon frames.
# However, since the values don't depend on any particular document, bindings, etc,
# they should be safe.
my %LETTER = ();
my %OTHER  = ();
my %ACTIVE = ();

# Dispatch table for catcodes.
my @DISPATCH = (    # [CONSTANT]
  \&handle_escape,    # T_ESCAPE
  T_BEGIN,            # T_BEGIN
  T_END,              # T_END
  T_MATH,             # T_MATH
  T_ALIGN,            # T_ALIGN
  \&handle_EOL,       # T_EOL
  T_PARAM,            # T_PARAM
  T_SUPER,            # T_SUPER
  T_SUB,              # T_SUB
  sub { undef; },     # T_IGNORE (we'll read next token)
  \&handle_space,     # T_SPACE
  sub { $LETTER{ $_[1] } || ($LETTER{ $_[1] } = T_LETTER($_[1])); },    # T_LETTER
  sub { $OTHER{ $_[1] }  || ($OTHER{ $_[1] }  = T_OTHER($_[1])); },     # T_OTHER
  sub { $ACTIVE{ $_[1] } || ($ACTIVE{ $_[1] } = T_ACTIVE($_[1])); },    # T_ACTIVE
  \&handle_comment,   # T_COMMENT
  sub { T_OTHER($_[1]); }    # T_INVALID (we could get unicode!)
);

# Read the next token, or undef if exhausted.
# Note that this also returns COMMENT tokens containing source comments,
# and also locator comments (file, line# info).
# LaTeXML::Core::Gullet intercepts them and passes them on at appropriate times.
sub readToken {
  my ($self) = @_;
  while (1) {    # Iterate till we find a token, or run out. (use return)
    # ===== Get next line, if we need to.
    if ($$self{colno} >= $$self{nchars}) {
      $$self{lineno}++;
      $$self{colno} = 0;
      my $line = $self->getNextLine;
      if (!defined $line) {    # Exhausted the input.
        $$self{chars}  = [];
        $$self{nchars} = 0;
        return;
      }
      # Remove trailing space, but NOT a control space! End with CR (not \n) since this gets tokenized!
      $line =~ s/((\\ )*)\s*$/$1\r/s;
      $$self{chars}  = [splitChars($line)];
      $$self{nchars} = scalar(@{ $$self{chars} });
      while (($$self{colno} < $$self{nchars})
        # DIRECT ACCESS to $STATE's catcode table!!!
        && (($$STATE{catcode}{ $$self{chars}[$$self{colno}] }[0] || CC_OTHER) == CC_SPACE)) {
        $$self{colno}++;
      }
      # Sneak a comment out, every so often.
      if ((($$self{lineno} % 25) == 0) && $STATE->lookupValue('INCLUDE_COMMENTS')) {
        return T_COMMENT("**** $$self{shortsource} Line $$self{lineno} ****");
      }
    }
    # ==== Extract next token from line.
    my ($ch, $cc) = $self->getNextChar;
    my $token = $DISPATCH[$cc];
    $token = &$token($self, $ch) if ref $token eq 'CODE';
    return $token if defined $token;    # Else, repeat till we get something or run out.
  }
  return;
}

#**********************************************************************
# Read all tokens until a token equal to $until (if given), or until exhausted.
# Returns an empty Tokens list, if there is no input
sub readTokens {
  my ($self, $until) = @_;
  my @tokens = ();
  while (defined(my $token = $self->readToken())) {
    last if $until and $token->getString eq $until->getString;
    push(@tokens, $token);
  }
  while (@tokens && $tokens[-1]->getCatcode == CC_SPACE) {    # Remove trailing space
    pop(@tokens);
  }
  return Tokens(@tokens);
}

#**********************************************************************
# Read a raw line; there are so many variants of how it should end,
# that the Mouth API is left as simple as possible.
# Alas: $noread true means NOT to read a new line, but only return
# the remainder of the current line, if any. This is useful when combining
# with previously peeked tokens from the Gullet.
sub readRawLine {
  my ($self, $noread) = @_;
  my $line;
  if ($$self{colno} < $$self{nchars}) {
    $line = join('', @{ $$self{chars} }[$$self{colno} .. $$self{nchars} - 1]);
    # End lines with \n, not CR, since the result will be treated as strings
    $$self{colno} = $$self{nchars};
  }
  elsif ($noread) {
    $line = '';
  }
  else {
    $line = $self->getNextLine;
    if (!defined $line) {
      $$self{chars}  = [];
      $$self{nchars} = 0;
      $$self{colno}  = 0;
    }
    else {
      $$self{lineno}++;
      $$self{chars}  = [splitChars($line)];
      $$self{nchars} = scalar(@{ $$self{chars} });
      $$self{colno}  = $$self{nchars};
    }
  }
  $line =~ s/\s*$//s if defined $line;    # Is this right?
  return $line;
}

#======================================================================
1;

__END__

=pod

=head1 NAME

C<LaTeXML::Core::Mouth> - tokenize the input.

=head1 DESCRIPTION

A C<LaTeXML::Core::Mouth> (and subclasses) is responsible for I<tokenizing>, ie.
converting plain text and strings into L<LaTeXML::Core::Token>s according to the
current category codes (catcodes) stored in the C<LaTeXML::Core::State>.

It extends L<LaTeXML::Common::Object>.

=head2 Creating Mouths

=over 4

=item C<< $mouth = LaTeXML::Core::Mouth->create($source, %options); >>

Creates a new Mouth of the appropriate class for reading from C<$source>.

=item C<< $mouth = LaTeXML::Core::Mouth->new($string, %options); >>

Creates a new Mouth reading from C<$string>.

=back

=head2 Methods

=over 4

=item C<< $token = $mouth->readToken; >>

Returns the next L<LaTeXML::Core::Token> from the source.

=item C<< $boole = $mouth->hasMoreInput; >>

Returns whether there is more data to read.

=item C<< $string = $mouth->getLocator($length); >>

Return a description of the current position in the source, for reporting errors.
A negative C<$length> gives a short form; a positive C<$length> also
includes the surrounding source text.

=item C<< $tokens = $mouth->readTokens($until); >>

Reads tokens until one matches C<$until> (comparing the character, but not catcode).
This is useful for the C<\verb> command.

=item C<< $line = $mouth->readRawLine; >>

Reads a raw (untokenized) line from C<$mouth>, or undef if none is found.

=back

=head1 AUTHOR

Bruce Miller

=head1 COPYRIGHT

Public domain software, produced as part of work done by the
United States Government & not subject to copyright in the US.

=cut
latexml-0.8.1/lib/LaTeXML/Core/Mouth/0000755000175000017500000000000012507513572017045 5ustar norbertnorbert
latexml-0.8.1/lib/LaTeXML/Core/Mouth/Binding.pm0000644000175000017500000000547312507513572020766 0ustar norbertnorbert
# /=====================================================================\ #
# |  LaTeXML::Core::Mouth::Binding                                      | #
# |  Analog of TeX's Mouth: Tokenizes strings & files                   | #
# |=====================================================================| #
# |  Part of LaTeXML:                                                   | #
# |  Public domain software, produced as part of work done by the       | #
# |  United States Government & not subject to copyright in the US.
| #
# |---------------------------------------------------------------------| #
# | Bruce Miller                                                 #_#    | #
# | http://dlmf.nist.gov/LaTeXML/                               (o o)   | #
# \=========================================================ooo==U==ooo=/ #
package LaTeXML::Core::Mouth::Binding;
use strict;
use warnings;
use LaTeXML::Global;
use LaTeXML::Common::Error;
use LaTeXML::Util::Pathname;

# This is a fake mouth, used for processing *.ltxml, *.latexml files
# It exists primarily for the purposes of
#  * getting a locator on anything defined in the file
#  * serving as a placekeeper in the chain of Mouths in a Gullet,
#    when the binding reads in a proper TeX file.

sub new {
  my ($class, $pathname) = @_;
  my ($dir, $name, $ext) = pathname_split($pathname);
  my $self = bless { source => $pathname, shortsource => "$name.$ext" }, $class;
  NoteBegin("Loading $$self{source}");
  return $self;
}

sub finish {
  my ($self) = @_;
  NoteEnd("Loading $$self{source}");
  return;
}

# Evolve to figure out if this gets dynamic location!
sub getLocator {
  my ($self, $length) = @_;
  my $path  = $$self{source};
  my $loc   = ($length && $length < 0 ? $$self{shortsource} : $$self{source});
  my $frame = 2;
  my ($pkg, $file, $line);
  while (($pkg, $file, $line) = caller($frame++)) {
    last if $file eq $path;
  }
  return $loc . ($line ? " line $line" : '');
}

sub getSource {
  my ($self) = @_;
  return $$self{source};
}

sub hasMoreInput {
  return 0;
}

sub readToken {
  return;
}

sub stringify {
  my ($self) = @_;
  return "Mouth::Binding[$$self{source}]";
}

#======================================================================
1;

__END__

=pod

=head1 NAME

C<LaTeXML::Core::Mouth::Binding> - a fake Mouth for processing a Binding file

=head1 DESCRIPTION

This is a fake mouth, used for processing binding files (ie. C<*.ltxml> and C<*.latexml>).
It exists primarily for the purposes of (1) getting a locator on anything defined in the file
and (2) serving as a placekeeper in the chain of Mouths in a Gullet,
when the binding reads in a proper TeX file.

=head1 AUTHOR

Bruce Miller

=head1 COPYRIGHT

Public domain software, produced as part of work done by the
United States Government & not subject to copyright in the US.

=cut
latexml-0.8.1/lib/LaTeXML/Core/Mouth/file.pm0000644000175000017500000000760712507513572020334 0ustar norbertnorbert
# /=====================================================================\ #
# |  LaTeXML::Core::Mouth::file                                         | #
# |  Analog of TeX's Mouth: for reading from files                      | #
# |=====================================================================| #
# |  Part of LaTeXML:                                                   | #
# |  Public domain software, produced as part of work done by the       | #
# |  United States Government & not subject to copyright in the US.     | #
# |---------------------------------------------------------------------| #
# | Bruce Miller                                                 #_#    | #
# | http://dlmf.nist.gov/LaTeXML/                               (o o)   | #
# \=========================================================ooo==U==ooo=/ #
package LaTeXML::Core::Mouth::file;
use strict;
use warnings;
use LaTeXML::Global;
use LaTeXML::Common::Error;
use LaTeXML::Util::Pathname;
use Encode;
use base qw(LaTeXML::Core::Mouth);

sub new {
  my ($class, $pathname, %options) = @_;
  my ($dir, $name, $ext) = pathname_split($pathname);
  my $self = bless { source => $pathname, shortsource => "$name.$ext" }, $class;
  $$self{fordefinitions} = 1 if $options{fordefinitions};
  $$self{notes}          = 1 if $options{notes};
  $self->openFile($pathname);
  $self->initialize;
  return $self;
}

sub openFile {
  my ($self, $pathname) = @_;
  my $IN;
  if (!-r $pathname) {
    Fatal('I/O', $pathname, $self, "File $pathname is not readable.");
  }
  elsif ((!-z $pathname) && (-B $pathname)) {
    Fatal('I/O', $pathname, $self, "Input file $pathname appears to be binary.");
  }
  open($IN, '<', $pathname)
    || Fatal('I/O', $pathname, $self, "Can't open $pathname for reading", $!);
  $$self{IN}     = $IN;
  $$self{buffer} = [];
  return;
}

sub finish {
  my ($self) = @_;
  $self->SUPER::finish;
  if ($$self{IN}) {
    close(\*{ $$self{IN} });
    $$self{IN} = undef;
  }
  return;
}

sub hasMoreInput {
  my ($self) = @_;
  # ($$self{colno} < $$self{nchars}) || $$self{IN}; }
  return ($$self{colno} < $$self{nchars}) || scalar(@{ $$self{buffer} }) || $$self{IN};
}

sub getNextLine {
  my ($self) = @_;
  if (!scalar(@{ $$self{buffer} })) {
    return unless $$self{IN};
    my $fh   = \*{ $$self{IN} };
    my $line = <$fh>;
    if (!defined $line) {
      close($fh);
      $$self{IN} = undef;
      return;
    }
    else {
      push(@{ $$self{buffer} }, LaTeXML::Core::Mouth::splitLines($line));
    }
  }
  my $line = (shift(@{ $$self{buffer} }) || '');
  if ($line) {
    if (my $encoding = $STATE->lookupValue('PERL_INPUT_ENCODING')) {
      # Note that if chars in the input cannot be decoded, they are replaced by \x{FFFD}
      # I _think_ that for TeX's behaviour we actually should turn such un-decodeable chars into spaces(?).
      $line = decode($encoding, $line, Encode::FB_DEFAULT);
      if ($line =~ s/\x{FFFD}/ /g) {    # Replace the replacement chars with spaces, and warn (or Info?)
        Info('misdefined', $encoding, $self, "input isn't valid under encoding $encoding");
      }
    }
  }
  $line .= "\r";    # put line ending back!
  if (!($$self{lineno} % 25)) {
    NoteProgressDetailed("[#$$self{lineno}]");
  }
  return $line;
}

sub stringify {
  my ($self) = @_;
  return "Mouth[$$self{source}\@$$self{lineno}x$$self{colno}]";
}

#======================================================================
1;

__END__

=pod

=head1 NAME

C<LaTeXML::Core::Mouth::file> - tokenize the input from a file

=head1 DESCRIPTION

A C<LaTeXML::Core::Mouth> (and subclasses) is responsible for I<tokenizing>, ie.
converting plain text and strings into L<LaTeXML::Core::Token>s according to the
current category codes (catcodes) stored in the C<LaTeXML::Core::State>.

=head1 AUTHOR

Bruce Miller

=head1 COPYRIGHT

Public domain software, produced as part of work done by the
United States Government & not subject to copyright in the US.

=cut
latexml-0.8.1/lib/LaTeXML/Core/Mouth/http.pm0000644000175000017500000000377712507513572020400 0ustar norbertnorbert
# /=====================================================================\ #
# |  LaTeXML::Core::Mouth::http                                         | #
# |  Analog of TeX's Mouth: for reading from http                       | #
# |=====================================================================| #
# |  Part of LaTeXML:                                                   | #
# |  Public domain software, produced as part of work done by the       | #
# |  United States Government & not subject to copyright in the US.     | #
# |---------------------------------------------------------------------| #
# | Bruce Miller                                                 #_#    | #
# | http://dlmf.nist.gov/LaTeXML/                               (o o)   | #
# \=========================================================ooo==U==ooo=/ #
package LaTeXML::Core::Mouth::http;
use strict;
use warnings;
use base qw(LaTeXML::Core::Mouth);
use LaTeXML::Util::WWW;
use LaTeXML::Global;

sub new {
  my ($class, $url, %options) = @_;
  my ($urlbase, $name, $ext) = url_split($url);
  $STATE->assignValue(URLBASE => $urlbase) if defined $urlbase;
  my $self = bless { source => $url, shortsource => $name }, $class;
  $$self{fordefinitions} = 1 if $options{fordefinitions};
  $$self{notes}          = 1 if $options{notes};
  my $content = auth_get($url);
  $self->openString($content);
  $self->initialize;
  return $self;
}

#======================================================================
1;

__END__

=pod

=head1 NAME

C<LaTeXML::Core::Mouth::http> - tokenize the input from http

=head1 DESCRIPTION

A C<LaTeXML::Core::Mouth> (and subclasses) is responsible for I<tokenizing>, ie.
converting plain text and strings into L<LaTeXML::Core::Token>s according to the
current category codes (catcodes) stored in the C<LaTeXML::Core::State>.

=head1 AUTHOR

Bruce Miller

=head1 COPYRIGHT

Public domain software, produced as part of work done by the
United States Government & not subject to copyright in the US.

=cut
latexml-0.8.1/lib/LaTeXML/Core/Mouth/https.pm0000644000175000017500000000303712507513572020550 0ustar norbertnorbert
# /=====================================================================\ #
# |  LaTeXML::Core::Mouth::https                                        | #
# |  Analog of TeX's Mouth: for reading from https                      | #
# |=====================================================================| #
# |  Part of LaTeXML:                                                   | #
# |  Public domain software, produced as part of work done by the       | #
# |  United States Government & not subject to copyright in the US.     | #
# |---------------------------------------------------------------------| #
# | Bruce Miller                                                 #_#    | #
# | http://dlmf.nist.gov/LaTeXML/                               (o o)   | #
# \=========================================================ooo==U==ooo=/ #
package LaTeXML::Core::Mouth::https;
use strict;
use warnings;
use base qw(LaTeXML::Core::Mouth::http);

#======================================================================
1;

__END__

=pod

=head1 NAME

C<LaTeXML::Core::Mouth::https> - tokenize the input from https

=head1 DESCRIPTION

A C<LaTeXML::Core::Mouth> (and subclasses) is responsible for I<tokenizing>, ie.
converting plain text and strings into L<LaTeXML::Core::Token>s according to the
current category codes (catcodes) stored in the C<LaTeXML::Core::State>.

=head1 AUTHOR

Bruce Miller

=head1 COPYRIGHT

Public domain software, produced as part of work done by the
United States Government & not subject to copyright in the US.

=cut
latexml-0.8.1/lib/LaTeXML/Core/MuDimension.pm0000644000175000017500000000415312507513572020541 0ustar norbertnorbert
# /=====================================================================\ #
# |  LaTeXML::Core::MuDimension                                         | #
# |  Representation of Math Dimensions                                  | #
# |=====================================================================| #
# |  Part of LaTeXML:                                                   | #
# |  Public domain software, produced as part of work done by the       | #
# |  United States Government & not subject to copyright in the US.
| #
# |---------------------------------------------------------------------| #
# | Bruce Miller                                                 #_#    | #
# | http://dlmf.nist.gov/LaTeXML/                               (o o)   | #
# \=========================================================ooo==U==ooo=/ #
package LaTeXML::Core::MuDimension;
use LaTeXML::Global;
use strict;
use warnings;
use base qw(LaTeXML::Common::Dimension);
use base qw(Exporter);
our @EXPORT = (qw(&MuDimension));

#======================================================================
# Exported constructor.

sub MuDimension {
  my ($scaledpoints) = @_;
  return LaTeXML::Core::MuDimension->new($scaledpoints);
}

#======================================================================

# A mu is 1/18th of an em in the current math font.
# Assuming 1em = 10pt: 1 mu = 1em/18 = 10pt/18 = 5/9 pt; 1pt = 9/5mu = 1.8mu
sub toString {
  my ($self) = @_;
  return LaTeXML::Common::Float::floatformat($$self[0] / 65536 * 1.8) . 'mu';
}

sub stringify {
  my ($self) = @_;
  return "MuDimension[" . $$self[0] . "]";
}

#======================================================================
1;

__END__

=pod

=head1 NAME

C<LaTeXML::Core::MuDimension> - representation of math dimensions;
extends L<LaTeXML::Common::Dimension>.

=head2 Exported functions

=over 4

=item C<< $mudimension = MuDimension($dim); >>

Creates a MuDimension object; similar to Dimension.

=back

=head1 AUTHOR

Bruce Miller

=head1 COPYRIGHT

Public domain software, produced as part of work done by the
United States Government & not subject to copyright in the US.

=cut
latexml-0.8.1/lib/LaTeXML/Core/MuGlue.pm0000644000175000017500000000610612507513572017510 0ustar norbertnorbert
# /=====================================================================\ #
# |  LaTeXML::Core::MuGlue                                              | #
# |  Representation of Stretchy Math Dimensions                         | #
# |=====================================================================| #
# |  Part of LaTeXML:                                                   | #
# |  Public domain software, produced as part of work done by the       | #
# |  United States Government & not subject to copyright in the US.
| # # |---------------------------------------------------------------------| # # | Bruce Miller #_# | # # | http://dlmf.nist.gov/LaTeXML/ (o o) | # # \=========================================================ooo==U==ooo=/ # package LaTeXML::Core::MuGlue; use LaTeXML::Global; use strict; use warnings; use LaTeXML::Common::Float; use base qw(LaTeXML::Common::Glue); use base qw(Exporter); our @EXPORT = (qw(&MuGlue)); #====================================================================== # Exported constructor. sub MuGlue { my ($scaledpoints, $plus, $pfill, $minus, $mfill) = @_; return LaTeXML::Core::MuGlue->new($scaledpoints, $plus, $pfill, $minus, $mfill); } #====================================================================== # 1 mu = 1em/18 = 10pt/18 = 5/9 pt; 1pt = 9/5mu = 1.8mu sub toString { my ($self) = @_; my ($sp, $plus, $pfill, $minus, $mfill) = @$self; my $string = LaTeXML::Common::Float::floatformat($sp / 65536 * 1.8) . "mu"; $string .= ' plus ' . ($pfill ? $plus . $LaTeXML::Common::Glue::FILL[$pfill] : LaTeXML::Common::Float::floatformat($plus / 65536 * 1.8) . 'mu') if $plus != 0; $string .= ' minus ' . ($mfill ? $minus . $LaTeXML::Common::Glue::FILL[$mfill] : LaTeXML::Common::Float::floatformat($minus / 65536 * 1.8) . 'mu') if $minus != 0; return $string; } sub toAttribute { my ($self) = @_; my ($sp, $plus, $pfill, $minus, $mfill) = @$self; my $string = LaTeXML::Common::Float::floatformat($sp / 65536 * 1.8) . "mu"; $string .= ' plus ' . ($pfill ? $plus . $LaTeXML::Common::Glue::FILL[$pfill] : LaTeXML::Common::Float::floatformat($plus / 65536 * 1.8) . 'mu') if $plus != 0; $string .= ' minus ' . ($mfill ? $minus . $LaTeXML::Common::Glue::FILL[$mfill] : LaTeXML::Common::Float::floatformat($minus / 65536 * 1.8) . 'mu') if $minus != 0; return $string; } sub stringify { my ($self) = @_; return "MuGlue[" . join(',', @$self) . 
"]"; } #====================================================================== 1; __END__ =pod =head1 NAME C - representation of math glue; extends L. =head2 Exported functions =over 4 =item C<< $glue = MuGlue($gluespec); >> =item C<< $glue = MuGlue($sp,$plus,$pfill,$minus,$mfill); >> Creates a MuGlue object, similar to Glue. =back =head1 AUTHOR Bruce Miller =head1 COPYRIGHT Public domain software, produced as part of work done by the United States Government & not subject to copyright in the US. =cut latexml-0.8.1/lib/LaTeXML/Core/Pair.pm0000644000175000017500000000710012507513572017200 0ustar norbertnorbert# /=====================================================================\ # # | LaTeXML::Core::Pair | # # | Representation of pairs of numbers or dimensions | # # |=====================================================================| # # | Part of LaTeXML: | # # | Public domain software, produced as part of work done by the | # # | United States Government & not subject to copyright in the US. | # # |---------------------------------------------------------------------| # # | Bruce Miller #_# | # # | http://dlmf.nist.gov/LaTeXML/ (o o) | # # \=========================================================ooo==U==ooo=/ # package LaTeXML::Core::Pair; use strict; use warnings; use LaTeXML::Global; use LaTeXML::Common::Object; use LaTeXML::Core::Token; use base qw(LaTeXML::Common::Object); use base qw(Exporter); our @EXPORT = (qw(&Pair)); #====================================================================== # Exported constructor. 
sub Pair {
  my ($x, $y) = @_;
  return LaTeXML::Core::Pair->new($x, $y); }

#======================================================================
# NOTE: This is a candidate to be absorbed into Array (perhaps)
sub new {
  my ($class, $x, $y) = @_;
  return bless [$x, $y], $class; }

sub getX {
  my ($self) = @_;
  return $$self[0]; }

sub getY {
  my ($self) = @_;
  return $$self[1]; }

# multiply by anything; this keeps the same type of elements in the pair
sub multiplyN {
  my ($self, $other, $other2) = @_;
  return (ref $self)->new($$self[0]->multiply($other), $$self[1]->multiply($other2 || $other)); }

# multiply by a dimension or such; this upgrades the elements in the pair to
# the type used in multiplication
sub multiply {
  my ($self, $other, $other2) = @_;
  return $self->multiplyN($other, $other2) if !(ref $other) || ($other2 && !ref $other2);
  return (ref $self)->new($other->multiply($$self[0]), ($other2 || $other)->multiply($$self[1])); }

sub swap {
  my ($self) = @_;
  return (ref $self)->new($$self[1], $$self[0]); }

sub ptValue {
  my ($self, $prec) = @_;
  return $$self[0]->ptValue($prec) . ',' . $$self[1]->ptValue($prec); }

sub pxValue {
  my ($self, $prec) = @_;
  return $$self[0]->pxValue($prec) . ',' . $$self[1]->pxValue($prec); }

sub toString {
  my ($self) = @_;
  return $$self[0]->toString() . ',' . $$self[1]->toString(); }

sub toAttribute {
  my ($self) = @_;
  return $$self[0]->toAttribute() . ',' . $$self[1]->toAttribute(); }

sub stringify {
  my ($self) = @_;
  return "Pair[" . join(',', map { $_->stringify } @$self) . "]"; }

sub revert {
  my ($self) = @_;
  return (T_OTHER('('), Revert($$self[0]), T_OTHER(','), Revert($$self[1]), T_OTHER(')')); }

sub negate {
  my ($self) = @_;
  return $self->multiply(-1); }

#======================================================================
1;

__END__

=pod

=head1 NAME

C<LaTeXML::Core::Pair> - representation of pairs of numerical things

=head1 DESCRIPTION

Represents pairs of numerical things, such as coordinates.
Candidate for removal!

=head2 Exported functions

=over 4

=item C<< $pair = Pair($num1,$num2); >>

Creates an object representing a pair of numbers.
Not a part of TeX, but useful for graphical objects.
The two components can be any numerical object.

=back

=head1 AUTHOR

Bruce Miller

=head1 COPYRIGHT

Public domain software, produced as part of work done by the
United States Government & not subject to copyright in the US.

=cut

latexml-0.8.1/lib/LaTeXML/Core/PairList.pm0000644000175000017500000000546112507513572020044 0ustar norbertnorbert
# /=====================================================================\ #
# |  LaTeXML::Core::PairList                                            | #
# | Representation of lists of pairs of numbers or dimensions           | #
# |=====================================================================| #
# | Part of LaTeXML:                                                    | #
# |  Public domain software, produced as part of work done by the       | #
# |  United States Government & not subject to copyright in the US.     | #
# |---------------------------------------------------------------------| #
# | Bruce Miller                                                #_#     | #
# | http://dlmf.nist.gov/LaTeXML/                              (o o)    | #
# \=========================================================ooo==U==ooo=/ #
package LaTeXML::Core::PairList;
use strict;
use warnings;
use LaTeXML::Global;
use LaTeXML::Common::Object;
use base qw(LaTeXML::Common::Object);
use base qw(Exporter);
our @EXPORT = (qw(&PairList));

#======================================================================
# Exported constructor.

sub PairList {
  my (@pairs) = @_;
  return LaTeXML::Core::PairList->new(@pairs); }

#======================================================================

# Note: This is a candidate to be absorbed into Array perhaps...
sub new { my ($class, @pairs) = @_; return bless [@pairs], $class; } sub getCount { my ($self) = @_; return $#{$self} + 1; } sub getPair { my ($self, $n) = @_; return $$self[$n]; } sub getPairs { my ($self) = @_; return @$self; } sub ptValue { my ($self) = @_; return join(' ', map { $_->ptValue } @$self); } sub pxValue { my ($self) = @_; return join(' ', map { $_->pxValue } @$self); } sub toString { my ($self) = @_; return join(' ', map { $_->toString } @$self); } sub toAttribute { my ($self) = @_; return join(' ', map { $_->toAttribute } @$self); } sub stringify { my ($self) = @_; return "PairList[" . join(',', map { $_->stringify } @$self) . "]"; } sub revert { my ($self) = @_; my @rev = (); map { push(@rev, Revert($_)) } @$self; return @rev; } #====================================================================== 1; __END__ =pod =head1 NAME C - representation of lists of pairs of numerical things =head1 DESCRIPTION represents lists of pairs of numerical things, coordinates or such. Candidate for removal! =head2 Exported functions =over 4 =item C<< $pair = PairList(@pairs); >> Creates an object representing a list of pairs of numbers; Not a part of TeX, but useful for graphical objects. =back =head1 AUTHOR Bruce Miller =head1 COPYRIGHT Public domain software, produced as part of work done by the United States Government & not subject to copyright in the US. =cut latexml-0.8.1/lib/LaTeXML/Core/Parameter.pm0000644000175000017500000001005012507513572020223 0ustar norbertnorbert# /=====================================================================\ # # | LaTeXML::Core::Parameter | # # | Representation of a single Parameter for Control Sequences | # # |=====================================================================| # # | Part of LaTeXML: | # # | Public domain software, produced as part of work done by the | # # | United States Government & not subject to copyright in the US. 
| # # |---------------------------------------------------------------------| # # | Bruce Miller #_# | # # | http://dlmf.nist.gov/LaTeXML/ (o o) | # # \=========================================================ooo==U==ooo=/ # package LaTeXML::Core::Parameter; use strict; use warnings; use LaTeXML::Global; use LaTeXML::Common::Object; use LaTeXML::Common::Error; use base qw(LaTeXML::Common::Object); # sub new { # my ($class, $spec, %options) = @_; # return bless { spec => $spec, %options }, $class; } # Create a parameter reading object for a specific type. # If either a declared entry or a function Read accessible from LaTeXML::Package::Pool # is defined. sub new { my ($class, $type, $spec, %options) = @_; my $descriptor = $STATE->lookupMapping('PARAMETER_TYPES', $type); if (!defined $descriptor) { if ($type =~ /^Optional(.+)$/) { my $basetype = $1; if ($descriptor = $STATE->lookupMapping('PARAMETER_TYPES', $basetype)) { } elsif (my $reader = checkReaderFunction("Read$type") || checkReaderFunction("Read$basetype")) { $descriptor = { reader => $reader }; } $descriptor = { %$descriptor, optional => 1 } if $descriptor; } elsif ($type =~ /^Skip(.+)$/) { my $basetype = $1; if ($descriptor = $STATE->lookupMapping('PARAMETER_TYPES', $basetype)) { } elsif (my $reader = checkReaderFunction($type) || checkReaderFunction("Read$basetype")) { $descriptor = { reader => $reader }; } $descriptor = { %$descriptor, novalue => 1, optional => 1 } if $descriptor; } else { my $reader = checkReaderFunction("Read$type"); $descriptor = { reader => $reader } if $reader; } } Fatal('misdefined', $type, undef, "Unrecognized parameter type in \"$spec\"") unless $descriptor; return bless { spec => $spec, type => $type, %{$descriptor}, %options }, $class; } # Check whether a reader function is accessible within LaTeXML::Package::Pool sub checkReaderFunction { my ($function) = @_; if (defined $LaTeXML::Package::Pool::{$function}) { local *reader = $LaTeXML::Package::Pool::{$function}; if (defined 
&reader) { return \&reader; } } } sub stringify { my ($self) = @_; return $$self{spec}; } sub read { my ($self, $gullet) = @_; # For semiverbatim, I had messed with catcodes, but there are cases # (eg. \caption(...\label{badchars}}) where you really need to # cleanup after the fact! # Hmmm, seem to still need it... if ($$self{semiverbatim}) { # Nasty Hack: If immediately followed by %, should discard the comment # EVEN if semiverbatim makes % into other! if (my $peek = $gullet->readToken) { $gullet->unread($peek); } $STATE->beginSemiverbatim(); } my $value = &{ $$self{reader} }($gullet, @{ $$self{extra} || [] }); $value = $value->neutralize if $$self{semiverbatim} && (ref $value) && $value->can('neutralize'); if ($$self{semiverbatim}) { $STATE->endSemiverbatim(); } return $value; } #====================================================================== 1; __END__ =pod =head1 NAME C - a formal parameter =head1 DESCRIPTION Provides a representation for a single formal parameter of Ls: It extends L. =head1 SEE ALSO L. =head1 AUTHOR Bruce Miller =head1 COPYRIGHT Public domain software, produced as part of work done by the United States Government & not subject to copyright in the US. =cut latexml-0.8.1/lib/LaTeXML/Core/Parameters.pm0000644000175000017500000001313412507513572020414 0ustar norbertnorbert# /=====================================================================\ # # | LaTeXML::Core::Parameters | # # | Representation of Parameters for Control Sequences | # # |=====================================================================| # # | Part of LaTeXML: | # # | Public domain software, produced as part of work done by the | # # | United States Government & not subject to copyright in the US. 
| # # |---------------------------------------------------------------------| # # | Bruce Miller #_# | # # | http://dlmf.nist.gov/LaTeXML/ (o o) | # # \=========================================================ooo==U==ooo=/ # package LaTeXML::Core::Parameters; use strict; use warnings; use LaTeXML::Global; use LaTeXML::Common::Object; use LaTeXML::Common::Error; use LaTeXML::Core::Parameter; use LaTeXML::Core::Tokens; use base qw(LaTeXML::Common::Object); sub new { my ($class, @paramspecs) = @_; return bless [@paramspecs], $class; } sub getParameters { my ($self) = @_; return @$self; } sub stringify { my ($self) = @_; my $string = ''; foreach my $parameter (@$self) { my $s = $parameter->stringify; $string .= ' ' if ($string =~ /\w$/) && ($s =~ /^\w/); $string .= $s; } return $string; } sub equals { my ($self, $other) = @_; return (defined $other) && ((ref $self) eq (ref $other)) && ($self->stringify eq $other->stringify); } sub getNumArgs { my ($self) = @_; my $n = 0; foreach my $parameter (@$self) { $n++ unless $$parameter{novalue}; } return $n; } sub revertArguments { my ($self, @args) = @_; my @tokens = (); foreach my $parameter (@$self) { next if $$parameter{novalue}; my $arg = shift(@args); if (my $retoker = $$parameter{reversion}) { push(@tokens, &$retoker($arg, @{ $$parameter{extra} || [] })); } else { push(@tokens, Revert($arg)) if ref $arg; } } return @tokens; } sub readArguments { my ($self, $gullet, $fordefn) = @_; my @args = (); foreach my $parameter (@$self) { # my $value = &{$$parameter{reader}}($gullet,@{$$parameter{extra}||[]}); my $value = $parameter->read($gullet); if ((!defined $value) && !$$parameter{optional}) { Error('expected', $parameter, $gullet, "Missing argument " . ToString($parameter) . " for " . 
ToString($fordefn), $gullet->showUnexpected); } push(@args, $value) unless $$parameter{novalue}; } return @args; } sub readArgumentsAndDigest { my ($self, $stomach, $fordefn) = @_; my @args = (); my $gullet = $stomach->getGullet; foreach my $parameter (@$self) { my $value = $parameter->read($gullet); if ((!defined $value) && !$$parameter{optional}) { Error('expected', $parameter, $stomach, "Missing argument " . Stringify($parameter) . " for " . Stringify($fordefn), $gullet->showUnexpected); } if (!$$parameter{novalue}) { # If semiverbatim, Expand (before digest), so tokens can be neutralized; BLECH!!!! if ($$parameter{semiverbatim}) { $STATE->beginSemiverbatim(); if ((ref $value eq 'LaTeXML::Core::Token') || (ref $value eq 'LaTeXML::Core::Tokens')) { $gullet->readingFromMouth(LaTeXML::Core::Mouth->new(), sub { my ($igullet) = @_; $igullet->unread($value); my @tokens = (); while (defined(my $token = $igullet->readXToken(1, 1))) { push(@tokens, $token); } $value = Tokens(@tokens); $value = $value->neutralize; }); } } $value = $value->beDigested($stomach) if (ref $value) && !$$parameter{undigested}; $STATE->endSemiverbatim() if $$parameter{semiverbatim}; # Corner case? push(@args, $value); } } return @args; } sub reparseArgument { my ($self, $gullet, $tokens) = @_; if (defined $tokens) { return $gullet->readingFromMouth(LaTeXML::Core::Mouth->new(), sub { # start with empty mouth my ($gulletx) = @_; $gulletx->unread($tokens); # but put back tokens to be read my @values = $self->readArguments($gulletx); $gulletx->skipSpaces; return @values; }); } else { return (); } } #====================================================================== 1; __END__ =pod =head1 NAME C - formal parameters. =head1 DESCRIPTION Provides a representation for the formal parameters of Ls: It extends L. =head2 METHODS =over 4 =item C<< @parameters = $parameters->getParameters; >> Return the list of C contained in C<$parameters>. 
=item C<< @tokens = $parameters->revertArguments(@args); >>

Return a list of L<LaTeXML::Core::Token> that would represent the arguments
such that they can be parsed by the Gullet.

=item C<< @args = $parameters->readArguments($gullet,$fordefn); >>

Read the arguments according to this C<$parameters> from the C<$gullet>.
This takes into account any special forms of arguments, such as optional,
delimited, etc.

=item C<< @args = $parameters->readArgumentsAndDigest($stomach,$fordefn); >>

Reads and digests the arguments according to this C<$parameters>, in sequence.
This method is used by Constructors.

=back

=head1 SEE ALSO

L<LaTeXML::Core::Parameter>.

=head1 AUTHOR

Bruce Miller

=head1 COPYRIGHT

Public domain software, produced as part of work done by the
United States Government & not subject to copyright in the US.

=cut

latexml-0.8.1/lib/LaTeXML/Core/Rewrite.pm0000644000175000017500000004156712507513572017745 0ustar norbertnorbert
# /=====================================================================\ #
# |  LaTeXML::Core::Rewrite                                             | #
# | Rewrite Rules that modify the Constructed Document                  | #
# |=====================================================================| #
# | Part of LaTeXML:                                                    | #
# |  Public domain software, produced as part of work done by the       | #
# |  United States Government & not subject to copyright in the US.     
| # # |---------------------------------------------------------------------| # # | Bruce Miller #_# | # # | http://dlmf.nist.gov/LaTeXML/ (o o) | # # \=========================================================ooo==U==ooo=/ # package LaTeXML::Core::Rewrite; use strict; use warnings; use LaTeXML::Global; use LaTeXML::Common::Object; use LaTeXML::Common::Error; use LaTeXML::Common::XML; sub new { my ($class, $mode, @specs) = @_; my @clauses = (); while (@specs) { my ($op, $pattern) = (shift(@specs), shift(@specs)); push(@clauses, ['uncompiled', $op, $pattern]); } return bless { mode => $mode, math => ($mode eq 'math'), clauses => [@clauses], labels => {} }, $class; } sub clauses { my ($self) = @_; return @{ $$self{clauses} }; } sub rewrite { my ($self, $document, $node) = @_; foreach my $node ($document->findnodes('//*[@labels]')) { my $labels = $node->getAttribute('labels'); if (my $id = $node->getAttribute('xml:id')) { foreach my $label (split(/ /, $labels)) { $$self{labels}{$label} = $id; } } else { Error('malformed', 'label', $node, "Node has labels but no xml:id"); } } $self->applyClause($document, $node, 0, $self->clauses); return; } sub getLabelID { my ($self, $label) = @_; if (my $id = $$self{labels}{ LaTeXML::Package::CleanLabel($label) }) { return $id; } else { Error('misdefined', '', undef, "No id for label $label in Rewrite"); return; } } # Rewrite spec as input # scope => $scope : a scope like "section:1.2.3" or "label:eq.one"; translated to xpath # select => $xpath : selects subtrees based on xpath expression. # match => $code : called on $document and current $node: tests current node, returns $nnodes, if match # match => $string : Treats as TeX, converts Box, then DOM tree, to xpath # (The matching top-level nodes will be replaced, if replace is the next op.) 
# replace=> $code : removes the current $nnodes, calls $code with $document and removed nodes # replace=> $string : removes $nnodes # Treats $string as TeX, converts to Box and inserts to replace # the removed nodes. # attributes=>$hash : adds data from hash as attributes to the current node. # regexp => $string: apply regexp (subst) to all text nodes in/under the current node. # Compiled rewrite spec: # select => $xpath : operate on nodes selected by $xpath. # test => $code : Calls $code on $document and current $node. # Returns number of nodes matched. # replace=> $code : removes the current $nnodes, calls $code on them. # action => $code : invoke $code on current $node, without removing them. # regexp => $string: apply regexp (subst) to all text nodes in/under the current node. sub applyClause { my ($self, $document, $tree, $n_to_replace, $clause, @more_clauses) = @_; if ($$clause[0] eq 'uncompiled') { $self->compileClause($document, $clause); } my ($ignore, $op, $pattern) = @$clause; if ($op eq 'trace') { local $LaTeXML::Core::Rewrite::DEBUG = 1; $self->applyClause($document, $tree, $n_to_replace, @more_clauses); } elsif ($op eq 'ignore') { $self->applyClause($document, $tree, $n_to_replace, @more_clauses); } elsif ($op eq 'select') { my ($xpath, $nnodes) = @$pattern; my @matches = $document->findnodes($xpath, $tree); print STDERR "Rewrite selecting \"$xpath\" => " . scalar(@matches) . " matches\n" if $LaTeXML::Core::Rewrite::DEBUG; foreach my $node (@matches) { next unless $node->ownerDocument->isSameNode($tree->ownerDocument); # If still attached to original document! $self->applyClause($document, $node, $nnodes, @more_clauses); } } elsif ($op eq 'multi_select') { foreach my $subpattern (@$pattern) { my ($xpath, $nnodes) = @$subpattern; my @matches = $document->findnodes($xpath, $tree); print STDERR "Rewrite selecting \"$xpath\" => " . scalar(@matches) . 
" matches\n" if $LaTeXML::Core::Rewrite::DEBUG; foreach my $node (@matches) { next unless $node->ownerDocument->isSameNode($tree->ownerDocument); # If still attached to original document! $self->applyClause($document, $node, $nnodes, @more_clauses); } } } elsif ($op eq 'test') { my $nnodes = &$pattern($document, $tree); print STDERR "Rewrite test at " . $tree->toString . ": " . ($nnodes ? $nnodes . " to replace" : "failed") . "\n" if $LaTeXML::Core::Rewrite::DEBUG; $self->applyClause($document, $tree, $nnodes, @more_clauses) if $nnodes; } elsif ($op eq 'wrap') { if ($n_to_replace > 1) { my $parent = $tree->parentNode; # Remove & separate nodes to be replaced, and sibling nodes following them. my @following = (); # Collect the matching and following nodes while (my $sib = $parent->lastChild) { $parent->removeChild($sib); unshift(@following, $sib); last if $$sib == $$tree; } my @replaced = map { shift(@following) } 1 .. $n_to_replace; # Remove the nodes to be replaced $document->setNode($parent); $tree = $document->openElement('ltx:XMWrap', font => $document->getNodeFont($parent)); print STDERR "Wrapping " . join(' ', map { Stringify($_) } @replaced) . "\n" if $LaTeXML::Core::Rewrite::DEBUG; map { $tree->appendChild($_) } @replaced; # Add matched nodes to XMWrap map { $parent->appendChild($_) } @following; # Add back the following nodes. } $self->applyClause($document, $tree, 1, @more_clauses); } elsif ($op eq 'replace') { print STDERR "Rewrite replace at " . $tree->toString . " using $pattern\n" if $LaTeXML::Core::Rewrite::DEBUG; my $parent = $tree->parentNode; # Remove & separate nodes to be replaced, and sibling nodes following them. my @following = (); # Collect the matching and following nodes while (my $sib = $parent->lastChild) { $parent->removeChild($sib); unshift(@following, $sib); last if $$sib == $$tree; } my @replaced = map { shift(@following) } 1 .. $n_to_replace; # Remove the nodes to be replaced # Carry out the operation, inserting whatever nodes. 
$document->setNode($parent); my $point = $parent->lastChild; &$pattern($document, @replaced); # Carry out the insertion. # Now collect the newly inserted nodes and store in a _Capture_ node. my @inserted = (); # Collect the newly added nodes. if ($point) { while (my $sib = $parent->lastChild) { $parent->removeChild($sib); unshift(@inserted, $sib); last if $$sib == $$point; } } else { @inserted = $parent->childNodes; } my $insertion = $document->openElement('_Capture_', font => $document->getNodeFont($parent)); map { $insertion->appendChild($_) } @inserted; # Now remove the insertion and replace with rewritten nodes and replace the following siblings. @inserted = $insertion->childNodes; $parent->removeChild($insertion); map { $parent->appendChild($_) } @inserted, @following; } elsif ($op eq 'action') { print STDERR "Rewrite action at " . $tree->toString . " using $pattern\n" if $LaTeXML::Core::Rewrite::DEBUG; &$pattern($tree); } elsif ($op eq 'attributes') { map { $tree->setAttribute($_, $$pattern{$_}) } keys %$pattern; print STDERR "Rewrite attributes for " . Stringify($tree) . "\n" if $LaTeXML::Core::Rewrite::DEBUG; } elsif ($op eq 'regexp') { my @matches = $document->findnodes('descendant-or-self::text()', $tree); print STDERR "Rewrite regexp => " . scalar(@matches) . 
" matches\n" if $LaTeXML::Core::Rewrite::DEBUG; foreach my $text (@matches) { my $string = $text->textContent; if (&$pattern($string)) { $text->setData($string); } } } else { Error('misdefined', '', undef, "Unknown directive '$op' in Compiled Rewrite spec"); } return; } #********************************************************************** sub compileClause { my ($self, $document, $clause) = @_; my ($ignore, $op, $pattern) = @$clause; my ($oop, $opattern) = ($op, $pattern); if ($op eq 'label') { if (ref $pattern eq 'ARRAY') { # $op='multi_select'; $pattern = [map(["descendant-or-self::*[\@label='$_']",1], @$pattern)]; } $op = 'multi_select'; $pattern = [map { ["descendant-or-self::*[\@xml:id='$_']", 1] } map { $self->getLabelID($_) } @$pattern]; } else { # $op='select'; $pattern=["descendant-or-self::*[\@label='$pattern']",1]; }} $op = 'select'; $pattern = ["descendant-or-self::*[\@xml:id='" . $self->getLabelID($pattern) . "']", 1]; } } elsif ($op eq 'scope') { $op = 'select'; if ($pattern =~ /^label:(.*)$/) { # $pattern=["descendant-or-self::*[\@label='$1']",1]; } $pattern = ["descendant-or-self::*[\@xml:id='" . $self->getLabelID($1) . "']", 1]; } elsif ($pattern =~ /^id:(.*)$/) { $pattern = ["descendant-or-self::*[\@xml:id='$1']", 1]; } elsif ($pattern =~ /^(.*):(.*)$/) { $pattern = ["descendant-or-self::*[local-name()='$1' and \@refnum='$2']", 1]; } else { Error('misdefined', '', undef, "Unrecognized scope pattern in Rewrite clause: \"$pattern\"; Ignoring it."); $op = 'ignore'; $pattern = []; } } elsif ($op eq 'xpath') { $op = 'select'; $pattern = [$pattern, 1]; } elsif ($op eq 'match') { if (ref $pattern eq 'CODE') { $op = 'test'; } elsif (ref $pattern eq 'ARRAY') { # Multiple patterns! 
$op = 'multi_select'; $pattern = [map { $self->compile_match($document, $_) } @$pattern]; } else { $op = 'select'; $pattern = $self->compile_match($document, $pattern); } } elsif ($op eq 'replace') { if (ref $pattern eq 'CODE') { } else { $pattern = $self->compile_replacement($document, $pattern); } } elsif ($op eq 'regexp') { $pattern = $self->compile_regexp($pattern); } print STDERR "Compiled clause $oop=>" . ToString($opattern) . " ==> $op=>" . ToString($pattern) . "\n" if $LaTeXML::Core::Rewrite::DEBUG; $$clause[0] = 'compiled'; $$clause[1] = $op; $$clause[2] = $pattern; return; } #********************************************************************** sub compile_match { my ($self, $document, $pattern) = @_; ### if (!ref $pattern) { ### return $self->compile_match1($document, ### digest_rewrite(($$self{math} ? '$' . $pattern . '$' : $pattern))); } ### els if ($pattern->isaBox) { return $self->compile_match1($document, $pattern); } elsif (ref $pattern) { # Is tokens???? return $self->compile_match1($document, digest_rewrite($pattern)); } else { Error('misdefined', '', undef, "Don't know what to do with match=>\"" . Stringify($pattern) . "\""); return; } } sub compile_match1 { my ($self, $document, $patternbox) = @_; # Create a temporary document my $capdocument = LaTeXML::Core::Document->new($document->getModel); my $capture = $capdocument->openElement('_Capture_', font => LaTeXML::Common::Font->new()); $capdocument->absorb($patternbox); my @nodes = ($$self{mode} eq 'math' ? $capdocument->findnodes("//ltx:XMath/*", $capture) : $capture->childNodes); my $frag = $capdocument->getDocument->createDocumentFragment; map { $frag->appendChild($_) } @nodes; # Convert the captured nodes to an XPath that would match them. my $xpath = domToXPath($capdocument, $frag); # The branches of an XMDual can contain "decorations", nodes that are ONLY visible # from either presentation or content, but not both. 
  # [See LaTeXML::Core::Document->markXMNodeVisibility]
  # These decorations should NOT have rewrite rules applied
  $xpath .= '[@_pvis and @_cvis]' if $$self{math};

  print STDERR "Converting \"" . ToString($patternbox) . "\"\n => xpath= \"$xpath\"\n"
    if $LaTeXML::Core::Rewrite::DEBUG;
  return [$xpath, scalar(@nodes)]; }

# Reworked to do digestion at replacement time.
sub compile_replacement {
  my ($self, $document, $pattern) = @_;

  if ((ref $pattern) && $pattern->isaBox) {
    $pattern = $pattern->getBody if $$self{math};
    return sub { $_[0]->absorb($pattern); } }
  else {
    ##### $pattern = Tokenize($$self{math} ? '$' . $pattern . '$' : $pattern) unless ref $pattern;
    return sub {
      my $stomach = $STATE->getStomach;
      $stomach->bgroup;
      $STATE->assignValue(font     => LaTeXML::Common::Font->new(), 'local');
      $STATE->assignValue(mathfont => LaTeXML::Common::Font->new(), 'local');
      my $box = $stomach->digest($pattern, 0);
      $stomach->egroup;
      $box = $box->getBody if $$self{math};
      $_[0]->absorb($box); } } }

sub compile_regexp {
  my ($self, $pattern) = @_;
  my $code = "sub { \$_[0] =~ s${pattern}g; }";
  my $fcn  = eval $code;
  # A failed string eval reports its error in $@ (not in $!, which holds the OS errno).
  Error('misdefined', '', undef,
    "Failed to compile regexp pattern \"$pattern\" into \"$code\": $@") if $@;
  return $fcn; }

#**********************************************************************
sub digest_rewrite {
  my ($string) = @_;
  my $stomach = $STATE->getStomach;
  $stomach->bgroup;
  $STATE->assignValue(font     => LaTeXML::Common::Font->new(), 'local');  # Use empty font, so eventual insertion merges.
  $STATE->assignValue(mathfont => LaTeXML::Common::Font->new(), 'local');
  ### my $box = $stomach->digest((ref $string ? $string : Tokenize($string)), 0);
  my $box = $stomach->digest($string, 0);
  $stomach->egroup;
  return $box; }

#**********************************************************************
sub domToXPath {
  my ($document, $node) = @_;
  return "descendant-or-self::" .
domToXPath_rec($document, $node); } # May need some work here; my %EXCLUDED_MATCH_ATTRIBUTES = (scriptpos => 1, 'xml:id' => 1); # [CONSTANT] sub domToXPath_rec { my ($document, $node, @extra_predicates) = @_; my $type = $node->nodeType; if ($type == XML_DOCUMENT_FRAG_NODE) { my @nodes = $node->childNodes; return domToXPath_rec($document, shift(@nodes), domToXPath_seq($document, 'following-sibling', @nodes), @extra_predicates); } elsif ($type == XML_ELEMENT_NODE) { my $qname = $document->getNodeQName($node); return '*[true()]' if $qname eq '_WildCard_'; my @predicates = (); # Order the predicates so as to put most quickly restrictive first. if ($node->hasAttributes) { foreach my $attribute (grep { $_->nodeType == XML_ATTRIBUTE_NODE } $node->attributes) { my $key = $attribute->nodeName; next if ($key =~ /^_/) || $EXCLUDED_MATCH_ATTRIBUTES{$key}; push(@predicates, "\@" . $key . "='" . $attribute->getValue . "'"); } } if ($node->hasChildNodes) { my @children = $node->childNodes; if (!grep { $_->nodeType != XML_TEXT_NODE } @children) { # All are text nodes: push(@predicates, "text()='" . $node->textContent . "'"); } elsif (!grep { $_->nodeType != XML_ELEMENT_NODE } @children) { push(@predicates, domToXPath_seq($document, 'child', @children)); } else { Fatal('misdefined', '', $node, "Can't generate XPath for mixed content"); } } if ($document->canHaveAttribute($qname, 'font')) { if (my $font = $node->getAttribute('_font')) { my $pred = LaTeXML::Common::Font::font_match_xpaths($font); push(@predicates, $pred); } } return $qname . "[" . join(' and ', grep { $_ } @predicates, @extra_predicates) . "]"; } elsif ($type == XML_TEXT_NODE) { ### "text()='".$node->textContent."'"; }} return "*[text()='" . $node->textContent . "']"; } } # $axis would be child or following-sibling sub domToXPath_seq { my ($document, $axis, @nodes) = @_; if (@nodes) { return $axis . "::*[position()=1 and self::" . 
domToXPath_rec($document, shift(@nodes), domToXPath_seq($document, 'following-sibling', @nodes)) . ']'; } else { return (); } } #********************************************************************** 1; __END__ =pod =head1 NAME C - rewrite rules for modifying the XML document. =head1 DESCRIPTION C implements rewrite rules for modifying the XML document. See L for declarations which create the rewrite rules. Further documentation needed. =head1 AUTHOR Bruce Miller =head1 COPYRIGHT Public domain software, produced as part of work done by the United States Government & not subject to copyright in the US. =cut latexml-0.8.1/lib/LaTeXML/Core/State.pm0000644000175000017500000006772412507513572017407 0ustar norbertnorbert# /=====================================================================\ # # | LaTeXML::Core::State | # # | Maintains state: bindings, values, grouping | # # |=====================================================================| # # | Part of LaTeXML: | # # | Public domain software, produced as part of work done by the | # # | United States Government & not subject to copyright in the US. | # # |---------------------------------------------------------------------| # # | Bruce Miller #_# | # # | http://dlmf.nist.gov/LaTeXML/ (o o) | # # \=========================================================ooo==U==ooo=/ # package LaTeXML::Core::State; use strict; use warnings; use LaTeXML::Global; use LaTeXML::Common::Error; use LaTeXML::Core::Token; # To get CatCodes # Naming scheme for keys (such as it is) # binding: : the definition associated with # value: : some data stored under # With TeX Registers/Parameters, the name begins with "\" # internal: : Some internally interesting state. #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # The State efficiently maintain the bindings in a TeX-like fashion. 
# bindings associate data with keys (eg definitions with macro names)
# and respect TeX grouping; that is, an assignment is only in effect
# until the current group (opened by \bgroup) is closed (by \egroup).
#----------------------------------------------------------------------
# The objective is to make the following, most-common, operations FAST:
#   begin & end a group (ie. push/pop a stack frame)
#   lookup & assignment of values
# With the more obvious approach, a "stack of frames", either lookup would involve
# checking a sequence of frames until the current value is found;
# or, starting a new frame would involve copying bindings for all values.
# I never quite studied how Knuth does it;
# The following structures allow these to be constant operations (usually),
# except for endgroup (which is linear in the number of changed values in that frame).

# There are 2 main structures used here.
# For each of several $table's (being "value", "meaning", "catcode" or other space of names),
# each table maintains the bound values, and "undo" defines the stack frames:
#    $$self{$table}{$key} = [$current_value, $previous_value, ...]
#    $$self{undo}[$frame]{$table}{$key} = (undef | $n)
# such that the "current value" associated with $key is the 0th element of the table array;
# the $previous_value's (if any) are values that had been assigned within previous groups.
# The undo list indicates how many values have been assigned for $key in
# the $frame'th frame (usually 0 is the one of interest).
# [Would be simpler to store boolean in undo, but see deactivateScope]
# [All keys of $$self{undo}[$frame] are table names, EXCEPT "_FRAME_LOCK_"!!]
#
# So, in handwaving form, the algorithms are as follows:
# push-frame == bgroup == begingroup:
#    push an empty hash {} onto the undo stack;
# pop-frame == egroup == endgroup:
#    for the $n associated with every key in the topmost hash in the undo stack
#      pop $n values from the table
#    then remove the hash from the undo stack.
# Lookup value:
#    we simply fetch the 0th element from the table
# Assign a value:
#    local scope (the normal way):
#       we push a new value into the table described above,
#       and also increment the associated value in the undo stack
#    global scope:
#       remove any locally scoped values, and undo entries for the key
#       then set the 0th (only remaining) value to the given one.
#    named-scope $scope:
#       push an entry [$table,$key,$value] globally to the 'stash' table's value.
#       And assign locally, if the $scope is active (has non-zero value in stash_active table).
#
# There are tables for
#  catcode: keys are char;
#      Also, "math:$char" =1 when $char is active in math.
#  mathcode, sfcode, lccode, uccode, delcode : are similar to catcode but store
#      additional kinds of codes per char (see TeX)
#  value: keys are anything (typically a string, though) and value is the value associated with it
#  meaning: The definition associated with $key, usually a control-sequence.
#  stash & stash_active: support named scopes
#      (see also activateScope & deactivateScope)
#%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

# options:
#     catcodes => (standard|style|none)
#     stomach  => a Stomach object.
#     model    => a Model object.
sub new {
  my ($class, %options) = @_;
  my $self = bless {    # table => {},
    value   => {}, meaning  => {}, stash  => {}, stash_active => {},
    catcode => {}, mathcode => {}, sfcode => {}, lccode       => {}, uccode => {}, delcode => {},
    undo    => [{ _FRAME_LOCK_ => 1 }], prefixes => {}, status => {},
    stomach => $options{stomach}, model => $options{model} }, $class;
  $$self{value}{VERBOSITY} = [0];
  if ($options{catcodes} =~ /^(standard|style)/) {
    # Setup default catcodes.
my %std = ("\\" => CC_ESCAPE, "{" => CC_BEGIN, "}" => CC_END, "\$" => CC_MATH, "\&" => CC_ALIGN, "\r" => CC_EOL, "#" => CC_PARAM, "^" => CC_SUPER, "_" => CC_SUB, " " => CC_SPACE, "\t" => CC_SPACE, "%" => CC_COMMENT, "~" => CC_ACTIVE, chr(0) => CC_IGNORE); map { $$self{catcode}{$_} = [$std{$_}] } keys %std; for (my $c = ord('A') ; $c <= ord('Z') ; $c++) { $$self{catcode}{ chr($c) } = [CC_LETTER]; $$self{catcode}{ chr($c + ord('a') - ord('A')) } = [CC_LETTER]; } } $$self{value}{SPECIALS} = [['^', '_', '@', '~', '&', '$', '#', '%', "'"]]; if ($options{catcodes} eq 'style') { $$self{catcode}{'@'} = [CC_LETTER]; } $$self{mathcode} = {}; $$self{sfcode} = {}; $$self{lccode} = {}; $$self{uccode} = {}; $$self{delcode} = {}; return $self; } sub assign_internal { my ($self, $table, $key, $value, $scope) = @_; $scope = ($$self{prefixes}{global} ? 'global' : 'local') unless defined $scope; if ($scope eq 'global') { # Remove bindings made in all frames down-to & including the next lower locked frame my $frame; my @frames = @{ $$self{undo} }; while (@frames) { $frame = shift(@frames); if (my $n = $$frame{$table}{$key}) { # Undo the bindings, if $key was bound in this frame map { shift(@{ $$self{$table}{$key} }) } 1 .. $n if $n; delete $$frame{$table}{$key}; } last if $$frame{_FRAME_LOCK_}; } # whatever is left -- if anything -- should be bindings below the locked frame. $$frame{$table}{$key} = 1; # Note that there's only one value in the stack, now unshift(@{ $$self{$table}{$key} }, $value); } elsif ($scope eq 'local') { if ($$self{undo}[0]{$table}{$key}) { # If the value was previously assigned in this frame $$self{$table}{$key}[0] = $value; } # Simply replace the value else { # Otherwise, push new value & set 1 to be undone $$self{undo}[0]{$table}{$key} = 1; unshift(@{ $$self{$table}{$key} }, $value); } } # And push new binding. 
else { # print STDERR "Assigning $key in stash $stash\n"; assign_internal($self, 'stash', $scope, [], 'global') unless $$self{stash}{$scope}[0]; push(@{ $$self{stash}{$scope}[0] }, [$table, $key, $value]); assign_internal($self, $table, $key, $value, 'local') if $$self{stash_active}{$scope}[0]; } return; } #====================================================================== sub getStomach { my ($self) = @_; return $$self{stomach}; } sub getModel { my ($self) = @_; return $$self{model}; } #====================================================================== # Lookup & assign a general Value # [Note that the more direct $$self{value}{$_[1]}[0]; works, but creates entries # this could concievably cause space issues, but timing doesn't show improvements this way] sub lookupValue { my ($self, $key) = @_; my $e = $$self{value}{$key}; return $e && $$e[0]; } sub assignValue { my ($self, $key, $value, $scope) = @_; assign_internal($self, 'value', $key, $value, $scope); return; } # manage a (global) list of values sub pushValue { my ($self, $key, @values) = @_; my $vtable = $$self{value}; assign_internal($self, 'value', $key, [], 'global') unless $$vtable{$key}[0]; push(@{ $$vtable{$key}[0] }, @values); return; } sub popValue { my ($self, $key) = @_; my $vtable = $$self{value}; assign_internal($self, 'value', $key, [], 'global') unless $$vtable{$key}[0]; return pop(@{ $$vtable{$key}[0] }); } sub unshiftValue { my ($self, $key, @values) = @_; my $vtable = $$self{value}; assign_internal($self, 'value', $key, [], 'global') unless $$vtable{$key}[0]; unshift(@{ $$vtable{$key}[0] }, @values); return; } sub shiftValue { my ($self, $key) = @_; my $vtable = $$self{value}; assign_internal($self, 'value', $key, [], 'global') unless $$vtable{$key}[0]; return shift(@{ $$vtable{$key}[0] }); } # manage a (global) hash of values sub lookupMapping { my ($self, $map, $key) = @_; my $vtable = $$self{value}; my $mapping = $$vtable{$map}[0]; return ($mapping ? 
$$mapping{$key} : undef); } sub assignMapping { my ($self, $map, $key, $value) = @_; my $vtable = $$self{value}; assign_internal($self, 'value', $map, {}, 'global') unless $$vtable{$map}[0]; if (!defined $value) { delete $$vtable{$map}[0]{$key}; } else { $$vtable{$map}[0]{$key} = $value; } return; } sub lookupMappingKeys { my ($self, $map) = @_; my $vtable = $$self{value}; my $mapping = $$vtable{$map}[0]; return ($mapping ? sort keys %$mapping : ()); } sub lookupStackedValues { my ($self, $key) = @_; my $stack = $$self{value}{$key}; return ($stack ? @$stack : ()); } #====================================================================== # Was $name bound? If $frame is given, check only whether it is bound in # that frame (0 is the topmost). sub isValueBound { my ($self, $key, $frame) = @_; return (defined $frame ? $$self{undo}[$frame]{value}{$key} : defined $$self{value}{$key}[0]); } sub valueInFrame { my ($self, $key, $frame) = @_; $frame = 0 unless defined $frame; my $p = 0; for (my $f = 0 ; $f < $frame ; $f++) { $p += $$self{undo}[$f]{value}{$key}; } return $$self{value}{$key}[$p]; } #====================================================================== # Lookup & assign a character's Catcode sub lookupCatcode { my ($self, $key) = @_; my $e = $$self{catcode}{$key}; return $e && $$e[0]; } sub assignCatcode { my ($self, $key, $value, $scope) = @_; assign_internal($self, 'catcode', $key, $value, $scope); return; } # The following rarely used. 
sub lookupMathcode { my ($self, $key) = @_; my $e = $$self{mathcode}{$key}; return $e && $$e[0]; } sub assignMathcode { my ($self, $key, $value, $scope) = @_; assign_internal($self, 'mathcode', $key, $value, $scope); return; } sub lookupSFcode { my ($self, $key) = @_; my $e = $$self{sfcode}{$key}; return $e && $$e[0]; } sub assignSFcode { my ($self, $key, $value, $scope) = @_; assign_internal($self, 'sfcode', $key, $value, $scope); return; } sub lookupLCcode { my ($self, $key) = @_; my $e = $$self{lccode}{$key}; return $e && $$e[0]; } sub assignLCcode { my ($self, $key, $value, $scope) = @_; assign_internal($self, 'lccode', $key, $value, $scope); return; } sub lookupUCcode { my ($self, $key) = @_; my $e = $$self{uccode}{$key}; return $e && $$e[0]; } sub assignUCcode { my ($self, $key, $value, $scope) = @_; assign_internal($self, 'uccode', $key, $value, $scope); return; } sub lookupDelcode { my ($self, $key) = @_; my $e = $$self{delcode}{$key}; return $e && $$e[0]; } sub assignDelcode { my ($self, $key, $value, $scope) = @_; assign_internal($self, 'delcode', $key, $value, $scope); return; } #====================================================================== # Specialized versions of lookup & assign for dealing with definitions # Get the `Meaning' of a token. For a control sequence or otherwise active token, # this may give the definition object or a regular token (if it was \let), or undef. # Otherwise, the token itself is returned. 
sub lookupMeaning { my ($self, $token) = @_; if (my $cs = $token && $LaTeXML::Core::Token::executable_catcode[$$token[1]] && ($LaTeXML::Core::Token::PRIMITIVE_NAME[$$token[1]] || $$token[0])) { my $e = $$self{meaning}{$cs}; return $e && $$e[0]; } else { return $token; } } sub lookupMeaning_internal { my ($self, $token) = @_; my $e = $$self{meaning}{ $token->getCSName }; return $e && $$e[0]; } sub assignMeaning { my ($self, $token, $definition, $scope) = @_; assign_internal($self, 'meaning', $token->getCSName => $definition, $scope); return; } sub lookupDefinition { my ($self, $token) = @_; my $x; return ($token && $LaTeXML::Core::Token::executable_catcode[$$token[1]] && ($x = $$self{meaning}{ ($LaTeXML::Core::Token::PRIMITIVE_NAME[$$token[1]] || $$token[0]) }) && ($x = $$x[0]) ### && $x->isaDefinition && $x->isa('LaTeXML::Core::Definition') ? $x : undef); } # And a shorthand for installing definitions sub installDefinition { my ($self, $definition, $scope) = @_; # Locked definitions!!! (or should this test be in assignMeaning?) # Ignore attempts to (re)define $cs from tex sources my $cs = $definition->getCS->getCSName; if ($self->lookupValue("$cs:locked") && !$LaTeXML::Core::State::UNLOCKED) { if (my $s = $self->getStomach->getGullet->getSource) { # report if the redefinition seems to come from document source if ((($s eq "Anonymous String") || ($s =~ /\.(tex|bib)$/)) && ($s !~ /\.code\.tex$/)) { Info('ignore', $cs, $self->getStomach, "Ignoring redefinition of $cs"); } return; } } assign_internal($self, 'meaning', $cs => $definition, $scope); return; } #====================================================================== sub pushFrame { my ($self, $nobox) = @_; # Easy: just push a new undo hash. 
unshift(@{ $$self{undo} }, {}); return; } sub popFrame { my ($self) = @_; if ($$self{undo}[0]{_FRAME_LOCK_}) { Fatal('unexpected', '', $self->getStomach, "Attempt to pop last locked stack frame"); } else { my $undo = shift(@{ $$self{undo} }); foreach my $table (keys %$undo) { my $undotable = $$undo{$table}; foreach my $name (keys %$undotable) { # Typically only 1 value to shift off the table, unless scopes have been activated. map { shift(@{ $$self{$table}{$name} }) } 1 .. $$undotable{$name}; } } } return; } #====================================================================== # This is primarily about catcodes, but a bit more... sub beginSemiverbatim { my ($self) = @_; # Is this a good/safe enough shorthand, or should we really be doing beginMode? $self->pushFrame; $self->assignValue(MODE => 'text'); $self->assignValue(IN_MATH => 0); map { $self->assignCatcode($_ => CC_OTHER, 'local') } @{ $self->lookupValue('SPECIALS') }; $self->assignMathcode('\'' => 0x8000, 'local'); # try to stay as ASCII as possible $self->assignValue(font => $self->lookupValue('font')->merge(encoding => 'ASCII'), 'local'); return; } sub endSemiverbatim { my ($self) = @_; $self->popFrame; return; } #====================================================================== sub pushDaemonFrame { my ($self) = @_; my $frame = {}; unshift(@{ $$self{undo} }, $frame); # Push copys of data for any data that is mutable; # Only the value & stash tables need to be to be checked. # NOTE ??? No... foreach my $table (qw(value stash)) { if (my $hash = $$self{$table}) { foreach my $key (keys %$hash) { my $value = $$hash{$key}[0]; my $type = ref $value; if (($type eq 'HASH') || ($type eq 'ARRAY')) { # Only concerned with mutable perl data? # Local assignment $$frame{$table}{$key} = 1; # Note new value in this frame. unshift(@{ $$hash{$key} }, daemon_copy($value)); } } } } # And push new binding. 
# Record the contents of LaTeXML::Package::Pool as preloaded my $pool_preloaded_hash = { map { $_ => 1 } keys %LaTeXML::Package::Pool:: }; $self->assignValue('_PRELOADED_POOL_', $pool_preloaded_hash, 'global'); # Now mark the top frame as LOCKED!!! $$frame{_FRAME_LOCK_} = 1; return; } sub daemon_copy { my ($ob) = @_; if (ref $ob eq 'HASH') { my %hash = map { ($_ => daemon_copy($$ob{$_})) } keys %$ob; return \%hash; } elsif (ref $ob eq 'ARRAY') { return [map { daemon_copy($_) } @$ob]; } else { return $ob; } } sub popDaemonFrame { my ($self) = @_; while (!$$self{undo}[0]{_FRAME_LOCK_}) { $self->popFrame; } if (scalar(@{ $$self{undo} } > 1)) { delete $$self{undo}[0]{_FRAME_LOCK_}; # Any non-preloaded Pool routines should be wiped away, as we # might want to reuse the Pool namespaces for the next run. my $pool_preloaded_hash = $self->lookupValue('_PRELOADED_POOL_'); $self->assignValue('_PRELOADED_POOL_', undef, 'global'); foreach my $subname (keys %LaTeXML::Package::Pool::) { unless (exists $$pool_preloaded_hash{$subname}) { undef $LaTeXML::Package::Pool::{$subname}; delete $LaTeXML::Package::Pool::{$subname}; } } # Finally, pop the frame $self->popFrame; } else { Fatal('unexpected', '', $self->getStomach, "Daemon Attempt to pop last stack frame"); } return; } #====================================================================== # Set one of the definition prefixes global, etc (only global matters!) sub setPrefix { my ($self, $prefix) = @_; $$self{prefixes}{$prefix} = 1; return; } sub getPrefix { my ($self, $prefix) = @_; return $$self{prefixes}{$prefix}; } sub clearPrefixes { my ($self) = @_; $$self{prefixes} = {}; return; } #====================================================================== sub activateScope { my ($self, $scope) = @_; if (!$$self{stash_active}{$scope}[0]) { assign_internal($self, 'stash_active', $scope, 1, 'local'); if (defined(my $defns = $$self{stash}{$scope}[0])) { # Now make local assignments for all those in the stash. 
my $frame = $$self{undo}[0]; foreach my $entry (@$defns) { # Here we ALWAYS push the stashed values into the table # since they may be popped off by deactivateScope my ($table, $key, $value) = @$entry; $$frame{$table}{$key}++; # Note that this many values must be undone unshift(@{ $$self{$table}{$key} }, $value); } } } # And push new binding. return; } # Probably, in most cases, the assignments made by activateScope # will be undone by egroup or popping frames. # But they can also be undone explicitly sub deactivateScope { my ($self, $scope) = @_; if ($$self{stash_active}{$scope}[0]) { assign_internal($self, 'stash_active', $scope, 0, 'global'); if (defined(my $defns = $$self{stash}{$scope}[0])) { my $frame = $$self{undo}[0]; foreach my $entry (@$defns) { my ($table, $key, $value) = @$entry; if ($$self{$table}{$key}[0] eq $value) { # Here we're popping off the values pushed by activateScope # to (possibly) reveal a local assignment in the same frame, preceding activateScope. shift(@{ $$self{$table}{$key} }); $$frame{$table}{$key}--; } else { Warn('internal', $key, $self->getStomach, "Unassigning wrong value for $key from table $table in deactivateScope", "value is $value but stack is " . join(', ', @{ $$self{$table}{$key} })); } } } } return; } sub getKnownScopes { my ($self) = @_; my @scopes = sort keys %{ $$self{stash} }; return @scopes; } sub getActiveScopes { my ($self) = @_; my @scopes = sort keys %{ $$self{stash_active} }; return @scopes; } #====================================================================== # Units. # Put here since it could concievably evolve to depend on the current font. # Conversion to scaled points my %UNITS = ( # [CONSTANT] pt => 65536, pc => 12 * 65536, in => 72.27 * 65536, bp => 72.27 * 65536 / 72, cm => 72.27 * 65536 / 2.54, mm => 72.27 * 65536 / 2.54 / 10, dd => 1238 * 65536 / 1157, cc => 12 * 1238 * 65536 / 1157, sp => 1); sub convertUnit { my ($self, $unit) = @_; $unit = lc($unit); # Eventually try to track font size? 
if ($unit eq 'em') { return 10.0 * 65536; } elsif ($unit eq 'ex') { return 4.3 * 65536; } elsif ($unit eq 'mu') { return 10.0 * 65536 / 18; } else { my $sp = $UNITS{$unit}; if (!$sp) { Warn('expected', '', undef, "Illegal unit of measure '$unit', assuming pt."); $sp = $UNITS{'pt'}; } return $sp; } } #====================================================================== sub noteStatus { my ($self, $type, @data) = @_; if ($type eq 'undefined') { map { $$self{status}{undefined}{$_}++ } @data; } elsif ($type eq 'missing') { map { $$self{status}{missing}{$_}++ } @data; } else { $$self{status}{$type}++; } return; } sub getStatus { my ($self, $type) = @_; return $$self{status}{$type}; } sub getStatusMessage { my ($self) = @_; my $status = $$self{status}; my @report = (); push(@report, "$$status{warning} warning" . ($$status{warning} > 1 ? 's' : '')) if $$status{warning}; push(@report, "$$status{error} error" . ($$status{error} > 1 ? 's' : '')) if $$status{error}; push(@report, "$$status{fatal} fatal error" . ($$status{fatal} > 1 ? 's' : '')) if $$status{fatal}; my @undef = ($$status{undefined} ? keys %{ $$status{undefined} } : ()); push(@report, scalar(@undef) . " undefined macro" . (@undef > 1 ? 's' : '') . "[" . join(', ', @undef) . "]") if @undef; my @miss = ($$status{missing} ? keys %{ $$status{missing} } : ()); push(@report, scalar(@miss) . " missing file" . (@miss > 1 ? 's' : '') . "[" . join(', ', @miss) . "]") if @miss; return join('; ', @report) || 'No obvious problems'; } sub getStatusCode { my ($self) = @_; my $status = $$self{status}; my $code; if ($$status{fatal} && $$status{fatal} > 0) { $code = 3; } elsif ($$status{error} && $$status{error} > 0) { $code = 2; } elsif ($$status{warning} && $$status{warning} > 0) { $code = 1; } else { $code = 0; } return $code; } #====================================================================== 1; __END__ =pod =head1 NAME C - stores the current state of processing. 

=head1 DESCRIPTION

A C<LaTeXML::Core::State> object stores the current state of processing.
It records catcodes, variable values, definitions and so forth,
and mimics TeX's scoping rules.

=head2 Access to State and Processing

=over 4

=item C<< $STATE->getStomach; >>

Returns the current Stomach used for digestion.

=item C<< $STATE->getModel; >>

Returns the current Model representing the document model.

=back

=head2 Scoping

The assignment methods, described below, generally take a C<$scope> argument, which
determines how the assignment is made.  The allowed values and their implications are:

 global   : global assignment.
 local    : local assignment, within the current grouping.
 undef    : global if \global preceded, else local (default)
 <name>   : stores the assignment in a `scope' which can be loaded later.

If no scoping is specified, then the assignment will be global
if a preceding C<\global> has set the global flag, otherwise
the value will be assigned within the current grouping.

=over 4

=item C<< $STATE->pushFrame; >>

Starts a new level of grouping.
Note that this is lower level than C<\bgroup>; See L<LaTeXML::Core::Stomach>.

=item C<< $STATE->popFrame; >>

Ends the current level of grouping.
Note that this is lower level than C<\egroup>; See L<LaTeXML::Core::Stomach>.

=item C<< $STATE->setPrefix($prefix); >>

Sets a prefix (eg. C<global> for C<\global>, etc) for the next operation, if applicable.

=item C<< $STATE->clearPrefixes; >>

Clears any prefixes.

=back

=head2 Values

=over 4

=item C<< $value = $STATE->lookupValue($name); >>

Lookup the current value associated with the string C<$name>.

=item C<< $STATE->assignValue($name,$value,$scope); >>

Assign C<$value> to be associated with the string C<$name>, according
to the given scoping rule.

Values are also used to specify most configuration parameters (which can
therefore also be scoped).  The recognized configuration parameters are:

 VERBOSITY         : the level of verbosity for debugging
                     output, with 0 being default.
 STRICT            : whether errors (eg. undefined macros)
                     are fatal.
 INCLUDE_COMMENTS  : whether to preserve comments in the
                     source, and to add occasional line
                     number comments. (Default true).
 PRESERVE_NEWLINES : whether newlines in the source should
                     be preserved (not 100% TeX-like).
                     By default this is true.
 SEARCHPATHS       : a list of directories to search for
                     sources, implementations, etc.

=item C<< $STATE->pushValue($name,$value); >>

This is like C<< ->assignValue >>, but pushes a value onto the end of the stored value,
which should be an ARRAY reference.
Scoping is not handled here (yet?); it simply pushes the value
onto the current binding of C<$name>, first creating an empty
global list if C<$name> has no value.

=item C<< $boole = $STATE->isValueBound($name,$frame); >>

Returns whether the value C<$name> is bound. If C<$frame> is given, check
whether it is bound in the C<$frame>-th frame, with 0 being the top frame.

=back

=head2 Category Codes

=over 4

=item C<< $value = $STATE->lookupCatcode($char); >>

Lookup the current catcode associated with the character C<$char>.

=item C<< $STATE->assignCatcode($char,$catcode,$scope); >>

Set C<$char> to have the given C<$catcode>, with the assignment made
according to the given scoping rule.

This method is also used to specify whether a given character is
active in math mode, by using C<math:$char> for the character,
and using a value of 1 to specify that it is active.

=back

=head2 Definitions

=over 4

=item C<< $defn = $STATE->lookupMeaning($token); >>

Get the "meaning" currently associated with C<$token>:
the definition (if it is a control sequence or active character),
or the token itself if it shouldn't be executable.
(See L)

=item C<< $STATE->assignMeaning($token,$defn,$scope); >>

Set the definition associated with C<$token> to C<$defn>,
with the assignment made according to the given scoping rule
(eg. a C<$scope> of C<'global'> makes this the global definition
rather than one bound within the current group).
(See L, and L)

=item C<< $STATE->installDefinition($definition, $scope); >>

Install the definition into the current stack frame under its normal control sequence.

=back

=head2 Named Scopes

Named scopes can be used to set variables or redefine control sequences
within a scope other than the standard TeX grouping. For example,
the LaTeX implementation will automatically activate any definitions
that were defined with a named scope of, say "section:4", during the
portion of the document that has the section counter equal to 4.
Similarly, a scope named "label:foo" will be activated in portions
of the document where C<\label{foo}> is in effect.

=over 4

=item C<< $STATE->activateScope($scope); >>

Installs any definitions that were associated with the named C<$scope>.
Note that these are placed in the current grouping frame and will
disappear when that grouping ends.

=item C<< $STATE->deactivateScope($scope); >>

Removes any definitions that were associated with the named C<$scope>.
Normally not needed, since a scope's definitions are locally bound anyway.

=item C<< $sp = $STATE->convertUnit($unit); >>

Returns the number of scaled points in one C<$unit>, where C<$unit>
is the name of a TeX unit such as C<'pt'> or C<'em'> (case is ignored).
An unrecognized unit produces a warning and is treated as C<pt>.
(Defined here since in principle it could track the size of ems and
so forth, but currently doesn't: C<em>, C<ex> and C<mu> use fixed sizes.)

=back

=head1 AUTHOR

Bruce Miller

=head1 COPYRIGHT

Public domain software, produced as part of work done by the
United States Government & not subject to copyright in the US.

=cut

latexml-0.8.1/lib/LaTeXML/Core/Stomach.pm
# /=====================================================================\ #
# | LaTeXML::Core::Stomach                                              | #
# | Analog of TeX's Stomach: digests tokens, stores state               | #
# |=====================================================================| #
# | Part of LaTeXML:                                                    | #
# | Public domain software, produced as part of work done by the        | #
# | United States Government & not subject to copyright in the US.
| # # |---------------------------------------------------------------------| # # | Bruce Miller #_# | # # | http://dlmf.nist.gov/LaTeXML/ (o o) | # # \=========================================================ooo==U==ooo=/ # package LaTeXML::Core::Stomach; use strict; use warnings; use LaTeXML::Global; use LaTeXML::Common::Object; use LaTeXML::Common::Error; use LaTeXML::Core::Token; use LaTeXML::Core::Gullet; use LaTeXML::Core::Box; use LaTeXML::Core::Comment; use LaTeXML::Core::List; use LaTeXML::Core::Mouth; use LaTeXML::Common::Font; # Silly place to import these....? use LaTeXML::Common::Color; use LaTeXML::Core::Definition; use base qw(LaTeXML::Common::Object); #********************************************************************** sub new { my ($class, %options) = @_; return bless { gullet => LaTeXML::Core::Gullet->new(), boxing => [], token_stack => [] }, $class; } #********************************************************************** # Initialize various parameters, preload, etc. sub initialize { my ($self) = @_; $$self{boxing} = []; $$self{token_stack} = []; $STATE->assignValue(MODE => 'text', 'global'); $STATE->assignValue(IN_MATH => 0, 'global'); $STATE->assignValue(PRESERVE_NEWLINES => 1, 'global'); $STATE->assignValue(afterGroup => [], 'global'); $STATE->assignValue(afterAssignment => undef, 'global'); $STATE->assignValue(groupInitiator => 'Initialization', 'global'); # Setup default fonts. 
$STATE->assignValue(font => LaTeXML::Common::Font->textDefault(), 'global'); $STATE->assignValue(mathfont => LaTeXML::Common::Font->mathDefault(), 'global'); return; } #********************************************************************** sub getGullet { my ($self) = @_; return $$self{gullet}; } sub getLocator { my ($self, @args) = @_; return $$self{gullet}->getLocator(@args); } sub getBoxingLevel { my ($self) = @_; return scalar(@{ $$self{boxing} }); } #********************************************************************** # Digestion #********************************************************************** # NOTE: Worry about whether the $autoflush thing is right? # It puts a lot of cruft in Gullet; Should we just create a new Gullet? sub digestNextBody { my ($self, $terminal) = @_; my $startloc = $self->getLocator; my $initdepth = scalar(@{ $$self{boxing} }); my $token; local @LaTeXML::LIST = (); while (defined($token = $$self{gullet}->readXToken(1, 1))) { # Done if we run out of tokens push(@LaTeXML::LIST, $self->invokeToken($token)); last if $terminal and Equals($token, $terminal); last if $initdepth > scalar(@{ $$self{boxing} }); } # if we've closed the initial mode. Warn('expected', $terminal, $self, "body should have ended with '" . ToString($terminal) . "'", "current body started at " . ToString($startloc)) if $terminal && !Equals($token, $terminal); push(@LaTeXML::LIST, undef) unless $token; # Dummy `trailer' if none explicit. return @LaTeXML::LIST; } # Digest a list of tokens independent from any current Gullet. # Typically used to digest arguments to primitives or constructors. # Returns a List containing the digested material. sub digest { my ($self, $tokens) = @_; return unless defined $tokens; return $$self{gullet}->readingFromMouth(LaTeXML::Core::Mouth->new(), sub { my ($gullet) = @_; $gullet->unread($tokens); $STATE->clearPrefixes; # prefixes shouldn't apply here. 
my $ismath = $STATE->lookupValue('IN_MATH'); my $initdepth = scalar(@{ $$self{boxing} }); my $depth = $initdepth; local @LaTeXML::LIST = (); while (defined(my $token = $$self{gullet}->readXToken(1, 1))) { # Done if we run out of tokens push(@LaTeXML::LIST, $self->invokeToken($token)); last if $initdepth > scalar(@{ $$self{boxing} }); } # if we've closed the initial mode. Fatal('internal', '', $self, "We've fallen off the end, somehow!?!?!", "Last token " . ToString($LaTeXML::CURRENT_TOKEN) . " (Boxing depth was $initdepth, now $depth: Boxing generated by " . join(', ', map { ToString($_) } @{ $$self{boxing} })) if $initdepth < $depth; List(@LaTeXML::LIST, mode => ($ismath ? 'math' : 'text')); }); } # Invoke a token; # If it is a primitive or constructor, the definition will be invoked, # possibly arguments will be parsed from the Gullet. # Otherwise, the token is simply digested: turned into an appropriate box. # Returns a list of boxes/whatsits. my $MAXSTACK = 200; # [CONSTANT] # Overly complex, but want to avoid recursion/stack sub invokeToken { my ($self, $token) = @_; INVOKE: push(@{ $$self{token_stack} }, $token); if (scalar(@{ $$self{token_stack} }) > $MAXSTACK) { Fatal('internal', '', $self, "Excessive recursion(?): ", "Tokens on stack: " . join(', ', map { ToString($_) } @{ $$self{token_stack} })); } local $LaTeXML::CURRENT_TOKEN = $token; my @result = (); my $meaning = ($token->isExecutable || ($STATE->lookupValue('IN_MATH') && (($STATE->lookupMathcode($token->getString) || 0) == 0x8000)) ? $STATE->lookupMeaning_internal($token) : $token); if (!defined $meaning) { # Supposedly executable token, but no definition! @result = $self->invokeToken_undefined($token); } elsif ($meaning->isaToken) { # Common case @result = $self->invokeToken_simple($token, $meaning); } # A math-active character will (typically) be a macro, # but it isn't expanded in the gullet, but later when digesting, in math mode (? 
  # I think)
  elsif ($meaning->isExpandable) {
    my $gullet = $$self{gullet};
    $gullet->unread($meaning->invoke($gullet));
    $token = $gullet->readXToken();    # replace the token by its expansion!!!
    pop(@{ $$self{token_stack} });
    goto INVOKE; }
  elsif ($meaning->isaDefinition) {    # Otherwise, a normal primitive or constructor
    @result = $meaning->invoke($self);
    $STATE->clearPrefixes unless $meaning->isPrefix; }    # Clear prefixes unless we just set one.
  else {
    Fatal('misdefined', $meaning, $self,
      "The object " . Stringify($meaning) . " should never reach Stomach!"); }
  if ((scalar(@result) == 1) && (!defined $result[0])) {
    @result = (); }    # Just paper over the obvious thing.
  Fatal('misdefined', $token, $self,
    "Execution yielded non boxes",
    "Returned " . join(',', map { "'" . Stringify($_) . "'" }
        grep { (!ref $_) || (!$_->isaBox) } @result))
    if grep { (!ref $_) || (!$_->isaBox) } @result;
  pop(@{ $$self{token_stack} });
  return @result; }

sub makeError {
  my ($document, $type, $content) = @_;
  my $savenode = undef;
  $savenode = $document->floatToElement('ltx:ERROR')
    unless $document->isOpenable('ltx:ERROR');
  $document->openElement('ltx:ERROR', class => ToString($type));
  $document->openText_internal(ToString($content));
  $document->closeElement('ltx:ERROR');
  $document->setNode($savenode) if $savenode;
  return; }

my @forbidden_cc = (    # [CONSTANT]
  1, 0, 0, 0,
  0, 0, 1, 0,
  0, 1, 0, 0,
  0, 0, 0, 1,
  0, 1);

sub invokeToken_undefined {
  my ($self, $token) = @_;
  my $cs = $token->getCSName;
  $STATE->noteStatus(undefined => $cs);
  Error('undefined', $token, $self, "The token " . Stringify($token) . " is not defined.");
  # To minimize chatter, go ahead and define it...
  $STATE->installDefinition(LaTeXML::Core::Definition::Constructor->new($token, undef,
      sub { makeError($_[0], 'undefined', $cs); }),
    'global');
  # and then invoke it.
  return $self->invokeToken($token); }

sub invokeToken_simple {
  my ($self, $token, $meaning) = @_;
  my $cc   = $meaning->getCatcode;
  my $font = $STATE->lookupValue('font');
  $STATE->clearPrefixes;    # prefixes shouldn't apply here.
  if ($cc == CC_SPACE) {
    if (($STATE->lookupValue('IN_MATH') || $STATE->lookupValue('inPreamble'))) {
      return (); }
    else {
      return Box($meaning->getString, $font, $$self{gullet}->getLocator, $meaning); } }
  elsif ($cc == CC_COMMENT) {    # Note: Comments need char decoding as well!
    my $comment = LaTeXML::Package::FontDecodeString($meaning->getString, undef, 1);
    # However, spaces normally would have been digested away as positioning...
    my $badspace = pack('U', 0xA0) . "\x{0335}";    # This is at space's pos in OT1
    $comment =~ s/\Q$badspace\E/ /g;
    return LaTeXML::Core::Comment->new($comment); }
  elsif ($forbidden_cc[$cc]) {
    Fatal('misdefined', $token, $self,
      "The token " . Stringify($token) . " should never reach Stomach!");
    return; }
  else {
    return Box(LaTeXML::Package::FontDecodeString($meaning->getString, undef, 1),
      undef, undef, $meaning); } }

# Regurgitate: steal the previously digested boxes from the current level.
sub regurgitate {
  my ($self) = @_;
  my @stuff = @LaTeXML::LIST;
  @LaTeXML::LIST = ();
  return @stuff; }

#**********************************************************************
# Maintaining State.
#**********************************************************************
# State changes that the Stomach needs to moderate and know about (?)

#======================================================================
# Dealing with TeX's bindings & grouping.
# Note that lookups happen more often than bgroup/egroup (which open/close frames).

sub pushStackFrame {
  my ($self, $nobox) = @_;
  $STATE->pushFrame;
  $STATE->assignValue(beforeAfterGroup => [],    'local');    # ALWAYS bind this!
  $STATE->assignValue(afterGroup       => [],    'local');    # ALWAYS bind this!
  $STATE->assignValue(afterAssignment  => undef, 'local');    # ALWAYS bind this!
$STATE->assignValue(groupNonBoxing => $nobox, 'local'); # ALWAYS bind this! $STATE->assignValue(groupInitiator => $LaTeXML::CURRENT_TOKEN, 'local'); $STATE->assignValue(groupInitiatorLocator => $self->getLocator, 'local'); push(@{ $$self{boxing} }, $LaTeXML::CURRENT_TOKEN) unless $nobox; # For begingroup/endgroup return; } sub popStackFrame { my ($self, $nobox) = @_; if (my $beforeafter = $STATE->lookupValue('beforeAfterGroup')) { if (@$beforeafter) { my @result = map { $_->beDigested($self) } @$beforeafter; if (my ($x) = grep { !$_->isaBox } @result) { Fatal('misdefined', $x, $self, "Expected a Box|List|Whatsit, but got '" . Stringify($x) . "'"); } push(@LaTeXML::LIST, @result); } } my $after = $STATE->lookupValue('afterGroup'); $STATE->popFrame; pop(@{ $$self{boxing} }) unless $nobox; # For begingroup/endgroup $$self{gullet}->unread(@$after) if $after; return; } sub currentFrameMessage { my ($self) = @_; return "current frame is " . ($STATE->isValueBound('MODE', 0) # SET mode in CURRENT frame ? ? "mode-switch to " . $STATE->lookupValue('MODE') : ($STATE->lookupValue('groupNonBoxing') # Current frame is a non-boxing group? ? "non-boxing" : "boxing") . " group") . " due to " . Stringify($STATE->lookupValue('groupInitiator')) . " " . ToString($STATE->lookupValue('groupInitiatorLocator')); } #====================================================================== # Grouping pushes a new stack frame for binding definitions, etc. #====================================================================== # if $nobox is true, inhibit incrementing the boxingLevel sub bgroup { my ($self) = @_; pushStackFrame($self, 0); return; } sub egroup { my ($self) = @_; if ($STATE->isValueBound('MODE', 0) # Last stack frame was a mode switch!?!?! 
|| $STATE->lookupValue('groupNonBoxing')) { # or group was opened with \begingroup Error('unexpected', $LaTeXML::CURRENT_TOKEN, $self, "Attempt to close boxing group", $self->currentFrameMessage); } else { # Don't pop if there's an error; maybe we'll recover? popStackFrame($self, 0); } return; } sub begingroup { my ($self) = @_; pushStackFrame($self, 1); return; } sub endgroup { my ($self) = @_; if ($STATE->isValueBound('MODE', 0) # Last stack frame was a mode switch!?!?! || !$STATE->lookupValue('groupNonBoxing')) { # or group was opened with \bgroup Error('unexpected', $LaTeXML::CURRENT_TOKEN, $self, "Attempt to close non-boxing group", $self->currentFrameMessage); } else { # Don't pop if there's an error; maybe we'll recover? popStackFrame($self, 1); } return; } #====================================================================== # Mode (minimal so far; math vs text) # Could (should?) be taken up by Stomach by building horizontal, vertical or math lists ? sub beginMode { my ($self, $mode) = @_; $self->pushStackFrame; # Effectively bgroup my $prevmode = $STATE->lookupValue('MODE'); my $ismath = $mode =~ /math$/; $STATE->assignValue(MODE => $mode, 'local'); $STATE->assignValue(IN_MATH => $ismath, 'local'); my $curfont = $STATE->lookupValue('font'); if ($mode eq $prevmode) { } elsif ($ismath) { # When entering math mode, we set the font to the default math font, # and save the text font for any embedded text. $STATE->assignValue(savedfont => $curfont, 'local'); $STATE->assignValue(font => $STATE->lookupValue('mathfont')->merge(color => $curfont->getColor), 'local'); $STATE->assignValue(mathstyle => ($mode =~ /^display/ ? 'display' : 'text'), 'local'); } else { # When entering text mode, we should set the font to the text font in use before the math (but inherit color!). 
    $STATE->assignValue(font => $STATE->lookupValue('savedfont')->merge(color => $curfont->getColor), 'local'); }
  return; }

sub endMode {
  my ($self, $mode) = @_;
  if ((!$STATE->isValueBound('MODE', 0))    # Last stack frame was NOT a mode switch!?!?!
    || ($STATE->lookupValue('MODE') ne $mode)) {    # Or was a mode switch to a different mode
    Error('unexpected', $LaTeXML::CURRENT_TOKEN, $self, "Attempt to end mode $mode",
      $self->currentFrameMessage); }
  else {    # Don't pop if there's an error; maybe we'll recover?
    $self->popStackFrame; }    # Effectively egroup.
  return; }

#**********************************************************************
1;

__END__

=pod

=head1 NAME

C<LaTeXML::Core::Stomach> - digests tokens into boxes, lists, etc.

=head1 DESCRIPTION

C<LaTeXML::Core::Stomach> digests tokens read from a L<LaTeXML::Core::Gullet>
(they will have already been expanded).

It extends L<LaTeXML::Common::Object>.

There are basically four cases when digesting a L<LaTeXML::Core::Token>:

=over 4

=item A plain character

is simply converted to a L<LaTeXML::Core::Box>
recording the current L<LaTeXML::Common::Font>.

=item A primitive

If a control sequence represents a L<LaTeXML::Core::Definition::Primitive>,
the primitive is invoked, executing its stored subroutine.
This is typically done for side effect (changing the state in the L<LaTeXML::Core::State>),
although primitives may also contribute digested material.
As with macros, any arguments to the primitive are read from the L<LaTeXML::Core::Gullet>.

=item Grouping (or environment bodies)

are collected into a L<LaTeXML::Core::List>.

=item Constructors

A special class of control sequence, called a L<LaTeXML::Core::Definition::Constructor>,
produces a L<LaTeXML::Core::Whatsit> which remembers the control sequence and arguments that
created it, and defines its own translation into C<XML> elements, attributes and data.
Arguments to a constructor are read from the gullet and also digested.

=back

=head2 Digestion

=over 4

=item C<< $list = $stomach->digestNextBody; >>

Return the digested L<LaTeXML::Core::List> after reading and digesting a `body'
from its Gullet.  The body extends until the current
level of boxing or environment is closed.

=item C<< $list = $stomach->digest($tokens); >>

Return the L<LaTeXML::Core::List> resulting from digesting the given tokens.
This is typically used to digest arguments to primitives or
constructors.

=item C<< @boxes = $stomach->invokeToken($token); >>

Invoke the given (expanded) token.  If it corresponds to a
Primitive or Constructor, the definition will be invoked,
reading any needed arguments from the current input source.
Otherwise, the token will be digested.
A list of Boxes, Lists and Whatsits is returned.

=item C<< @boxes = $stomach->regurgitate; >>

Removes and returns a list of the boxes already digested
at the current level.  This peculiar beast is used
by things like \choose (which is a Primitive in TeX, but
a Constructor in LaTeXML).

=back

=head2 Grouping

=over 4

=item C<< $stomach->bgroup; >>

Begin a new level of binding by pushing a new stack frame,
and a new level of boxing the digested output.

=item C<< $stomach->egroup; >>

End a level of binding by popping the last stack frame,
undoing whatever bindings appeared there, and also
decrementing the level of boxing.
An error is signalled, and nothing is popped, if the current frame
was opened by C<begingroup> or by a mode switch.

=item C<< $stomach->begingroup; >>

Begin a new level of binding by pushing a new stack frame.
The level of boxing is not changed.

=item C<< $stomach->endgroup; >>

End a level of binding by popping the last stack frame,
undoing whatever bindings appeared there.
An error is signalled, and nothing is popped, if the current frame
was opened by C<bgroup> or by a mode switch.

=back

=head2 Modes

=over 4

=item C<< $stomach->beginMode($mode); >>

Begin processing in C<$mode>; one of 'text', 'display-math' or 'inline-math'.
This also begins a new level of grouping and switches to a font
appropriate for the mode.

=item C<< $stomach->endMode($mode); >>

End processing in C<$mode>; an error is signalled if C<$stomach> is not
currently in C<$mode>.  This also ends a level of grouping.

=back

=head1 AUTHOR

Bruce Miller

=head1 COPYRIGHT

Public domain software, produced as part of work done by the
United States Government & not subject to copyright in the US.
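=head1 EXAMPLE

The grouping rules described under L</Grouping> can be illustrated without
LaTeXML itself. The following is only a minimal sketch: the package
C<Sketch::Grouping> and its methods C<push_frame>, C<pop_frame> and
C<boxing_level> are hypothetical names invented for this illustration, and the
mode-switch check that the real C<egroup> and C<endgroup> also perform is left out.
It shows a boxing group refusing to be closed while a non-boxing group is still open.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the Stomach's frame handling; NOT LaTeXML code.
package Sketch::Grouping;

sub new {
  my ($class) = @_;
  return bless { frames => [], boxing => 0 }, $class; }

# Push a frame, remembering whether it is a non-boxing (\begingroup) group.
sub push_frame {
  my ($self, $nobox) = @_;
  push(@{ $$self{frames} }, { nonboxing => ($nobox ? 1 : 0) });
  $$self{boxing}++ unless $nobox;
  return; }

# Pop only when the closing kind matches the opening kind.
# Returns 1 on success; returns 0 and leaves the frame in place on a mismatch.
sub pop_frame {
  my ($self, $nobox) = @_;
  my $top = $$self{frames}[-1];
  return 0 unless $top && ($$top{nonboxing} == ($nobox ? 1 : 0));
  pop(@{ $$self{frames} });
  $$self{boxing}-- unless $nobox;
  return 1; }

sub bgroup       { my ($self) = @_; $self->push_frame(0); return; }
sub egroup       { my ($self) = @_; return $self->pop_frame(0); }
sub begingroup   { my ($self) = @_; $self->push_frame(1); return; }
sub endgroup     { my ($self) = @_; return $self->pop_frame(1); }
sub boxing_level { my ($self) = @_; return $$self{boxing}; }

package main;

my $g = Sketch::Grouping->new;
$g->bgroup;        # opens a boxing group
$g->begingroup;    # opens a non-boxing group inside it
print "egroup on a begingroup frame: ", ($g->egroup ? "popped" : "refused"), "\n";
print "endgroup: ", ($g->endgroup ? "popped" : "refused"), "\n";
print "boxing level now ", $g->boxing_level, "\n";
```

Leaving the mismatched frame in place mirrors the "Don't pop if there's an error;
maybe we'll recover?" choice in the code above.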
=cut

latexml-0.8.1/lib/LaTeXML/Core/Token.pm

# /=====================================================================\ #
# | LaTeXML::Core::Token, LaTeXML::Core::Tokens                         | #
# | Representation of Token(s)                                          | #
# |=====================================================================| #
# | Part of LaTeXML:                                                    | #
# | Public domain software, produced as part of work done by the        | #
# | United States Government & not subject to copyright in the US.      | #
# |---------------------------------------------------------------------| #
# | Bruce Miller                                                #_#     | #
# | http://dlmf.nist.gov/LaTeXML/                              (o o)    | #
# \=========================================================ooo==U==ooo=/ #

#**********************************************************************
#   A Token represented as a pair: [string,catcode]
# string is a character or control sequence.
# Yes, a bit inefficient, but code is clearer...
#**********************************************************************
package LaTeXML::Core::Token;
use strict;
use warnings;
use LaTeXML::Global;
use LaTeXML::Common::Object;
use base qw(LaTeXML::Common::Object);
use base qw(Exporter);
our @EXPORT = (
  # Catcode constants
  qw( CC_ESCAPE CC_BEGIN CC_END CC_MATH
    CC_ALIGN CC_EOL CC_PARAM CC_SUPER
    CC_SUB CC_IGNORE CC_SPACE CC_LETTER
    CC_OTHER CC_ACTIVE CC_COMMENT CC_INVALID
    CC_CS CC_NOTEXPANDED CC_MARKER),
  # Token constructors
  qw( T_BEGIN T_END T_MATH T_ALIGN T_PARAM T_SUB T_SUPER T_SPACE
    &T_LETTER &T_OTHER &T_ACTIVE &T_COMMENT &T_CS
    T_CR &T_MARKER
    &Token),
  # String exploders
  qw(&Explode &ExplodeText &UnTeX)
);

#======================================================================
# Constructors.
use constant CC_ESCAPE => 0; use constant CC_BEGIN => 1; use constant CC_END => 2; use constant CC_MATH => 3; use constant CC_ALIGN => 4; use constant CC_EOL => 5; use constant CC_PARAM => 6; use constant CC_SUPER => 7; use constant CC_SUB => 8; use constant CC_IGNORE => 9; use constant CC_SPACE => 10; use constant CC_LETTER => 11; use constant CC_OTHER => 12; use constant CC_ACTIVE => 13; use constant CC_COMMENT => 14; use constant CC_INVALID => 15; # Extended Catcodes for expanded output. use constant CC_CS => 16; use constant CC_NOTEXPANDED => 17; use constant CC_MARKER => 18; # non TeX extension! # [The documentation for constant is a bit confusing about subs, # but these apparently DO generate constants; you always get the same one] # These are immutable use constant T_BEGIN => bless ['{', 1], 'LaTeXML::Core::Token'; use constant T_END => bless ['}', 2], 'LaTeXML::Core::Token'; use constant T_MATH => bless ['$', 3], 'LaTeXML::Core::Token'; use constant T_ALIGN => bless ['&', 4], 'LaTeXML::Core::Token'; use constant T_PARAM => bless ['#', 6], 'LaTeXML::Core::Token'; use constant T_SUPER => bless ['^', 7], 'LaTeXML::Core::Token'; use constant T_SUB => bless ['_', 8], 'LaTeXML::Core::Token'; use constant T_SPACE => bless [' ', 10], 'LaTeXML::Core::Token'; use constant T_CR => bless ["\n", 10], 'LaTeXML::Core::Token'; sub T_LETTER { my ($c) = @_; return bless [$c, 11], 'LaTeXML::Core::Token'; } sub T_OTHER { my ($c) = @_; return bless [$c, 12], 'LaTeXML::Core::Token'; } sub T_ACTIVE { my ($c) = @_; return bless [$c, 13], 'LaTeXML::Core::Token'; } sub T_COMMENT { my ($c) = @_; return bless ['%' . ($c || ''), 14], 'LaTeXML::Core::Token'; } sub T_CS { my ($c) = @_; return bless [$c, 16], 'LaTeXML::Core::Token'; } # Illegal: don't use unless you know... sub T_MARKER { my ($t) = @_; return bless [$t, 18], 'LaTeXML::Core::Token'; } sub Token { my ($string, $cc) = @_; return bless [$string, (defined $cc ? 
$cc : CC_OTHER)], 'LaTeXML::Core::Token'; } # Explode a string into a list of tokens, all w/catcode OTHER (except space). sub Explode { my ($string) = @_; return (defined $string ? map { ($_ eq ' ' ? T_SPACE() : T_OTHER($_)) } split('', $string) : ()); } # Similar to Explode, but convert letters to catcode LETTER and others to OTHER # Hopefully, this is essentially correct WITHOUT resorting to catcode lookup? sub ExplodeText { my ($string) = @_; return (defined $string ? map { ($_ eq ' ' ? T_SPACE() : (/[a-zA-Z]/ ? T_LETTER($_) : T_OTHER($_))) } split('', $string) : ()); } my $UNTEX_LINELENGTH = 78; # [CONSTANT] sub UnTeX { my ($thing) = @_; return unless defined $thing; my @tokens = (ref $thing ? $thing->revert : Explode($thing)); my $string = ''; my $length = 0; my $level = 0; my ($prevs, $prevcc) = ('', CC_COMMENT); while (@tokens) { my $token = shift(@tokens); my $cc = $token->getCatcode; next if $cc == CC_COMMENT; my $s = $token->getString(); if ($cc == CC_LETTER) { # keep "words" together, just for aesthetics while (@tokens && ($tokens[0]->getCatcode == CC_LETTER)) { $s .= shift(@tokens)->getString; } } my $l = length($s); if ($cc == CC_BEGIN) { $level++; } # Seems a reasonable & safe time to line break, for readability, etc. if (($cc == CC_SPACE) && ($s eq "\n")) { # preserve newlines already present if ($length > 0) { $string .= $s; $length = 0; } } # If this token is a letter (or otherwise starts with a letter or digit): space or linebreak elsif ((($cc == CC_LETTER) || (($cc == CC_OTHER) && ($s =~ /^(?:\p{IsAlpha}|\p{IsDigit})/))) && ($prevcc == CC_CS) && ($prevs =~ /(.)$/) && (($STATE->lookupCatcode($1) || CC_COMMENT) == CC_LETTER)) { # Insert a (virtual) space before a letter if previous token was a CS w/letters # This is required for letters, but just aesthetic for digits (to me?) # Of course, use a newline if we're already at end my $space = (($length > 0) && ($length + $l > $UNTEX_LINELENGTH) ? "\n" : ' '); $string .= $space . 
$s; $length += 1 + $l; } elsif (($length > 0) && ($length + $l > $UNTEX_LINELENGTH) # linebreak before this token? && (scalar(@tokens) > 1) # and not at end! ) { # Or even within an arg! $string .= "%\n" . $s; $length = $l; } # with %, so that it "disappears" else { $string .= $s; $length += $l; } if ($cc == CC_END) { $level--; } $prevs = $s; $prevcc = $cc; } # Patch up nesting for valid TeX !!! if ($level > 0) { $string = $string . ('}' x $level); } elsif ($level < 0) { $string = ('{' x -$level) . $string; } return $string; } #====================================================================== # Categories of Category codes. # For Tokens with these catcodes, only the catcode is relevant for comparison. # (if they even make it to a stage where they get compared) our @primitive_catcode = ( # [CONSTANT] 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1); our @executable_catcode = ( # [CONSTANT] 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0); our @standardchar = ( # [CONSTANT] "\\", '{', '}', q{$}, q{&}, "\n", q{#}, q{^}, q{_}, undef, undef, undef, undef, undef, q{%}, undef); our @CC_NAME = #[CONSTANT] qw(Escape Begin End Math Align EOL Parameter Superscript Subscript Ignore Space Letter Other Active Comment Invalid ControlSequence NotExpanded); our @PRIMITIVE_NAME = ( # [CONSTANT] 'Escape', 'Begin', 'End', 'Math', 'Align', 'EOL', 'Parameter', 'Superscript', 'Subscript', undef, 'Space', undef, undef, undef, undef, undef, undef, 'NotExpanded'); our @CC_SHORT_NAME = #[CONSTANT] qw(T_ESCAPE T_BEGIN T_END T_MATH T_ALIGN T_EOL T_PARAM T_SUPER T_SUB T_IGNORE T_SPACE T_LETTER T_OTHER T_ACTIVE T_COMMENT T_INVALID T_CS T_NOTEXPANDED ); #====================================================================== # Accessors. sub isaToken { return 1; } # Get the CS Name of the token. This is the name that definitions will be # stored under; It's the same for various `different' BEGIN tokens, eg. 
sub getCSName {
  my ($token) = @_;
  return $PRIMITIVE_NAME[$$token[1]] || $$token[0]; }

# Get the CSName only if the catcode is executable!
sub getExecutableName {
  my ($self) = @_;
  my ($cs, $cc) = @$self;
  return $executable_catcode[$cc] && ($PRIMITIVE_NAME[$cc] || $cs); }

# Return the string or character part of the token
sub getString {
  my ($self) = @_;
  return $$self[0]; }

# Return the character code of character part of the token, or 256 if it is a control sequence
sub getCharcode {
  my ($self) = @_;
  return ($$self[1] == CC_CS ? 256 : ord($$self[0])); }

# Return the catcode of the token.
sub getCatcode {
  my ($self) = @_;
  return $$self[1]; }

sub isExecutable {
  my ($self) = @_;
  return $executable_catcode[$$self[1]]; }

# Defined so a Token or Tokens can be used interchangeably.
sub unlist {
  my ($self) = @_;
  return ($self); }

my @NEUTRALIZABLE = (    # [CONSTANT]
  0, 0, 0, 1,
  1, 0, 1, 1,
  1, 0, 0, 0,
  0, 1, 0, 0,
  0, 0);

# neutralize really should only retroactively imitate what Semiverbatim would have done.
# So, it needs to neutralize those in SPECIALS
# NOTE that although '%' gets its catcode changed in Semiverbatim,
# I'm pretty sure we do NOT want to neutralize comments (turn them into CC_OTHER)
# here, since if comments do get into the Tokens, that will introduce weird crap into the stream.
sub neutralize {
  my ($self) = @_;
  my ($ch, $cc) = @$self;
  # Compare against each special character; a bare grep { $ch } would match
  # every entry of SPECIALS whenever $ch is a true value.
  return ($NEUTRALIZABLE[$cc] && (grep { $ch eq $_ } @{ $STATE->lookupValue('SPECIALS') })
    ? T_OTHER($ch) : $self); }

#======================================================================
# Note that this converts the string to a more `user readable' form using `standard' chars for catcodes.
# We'll need to be careful about using string instead of reverting for internal purposes where the
# actual character is needed.

# Should revert do something with this???
#  ($standardchar[$$self[1]] || $$self[0]); }

sub revert {
  my ($self) = @_;
  return $self; }

sub toString {
  my ($self) = @_;
  return $$self[0]; }

sub beDigested {
  my ($self, $stomach) = @_;
  return $stomach->digest($self); }

#======================================================================
# Methods for overloaded ops.

# Compare two tokens; They are equal if they both have same catcode,
# and either the catcode is one of the primitive ones, or their strings
# are equal.
# NOTE: Another popular notion of equality checks whether the "meaning" (defn) is the same.
# That is NOT done here; see Equals(x,y).
sub equals {
  my ($a, $b) = @_;
  return
    (defined $b
      && (ref $a) eq (ref $b))
    && ($$a[1] == $$b[1])
    && ($primitive_catcode[$$a[1]] || ($$a[0] eq $$b[0])); }

my @CONTROLNAME = (    #[CONSTANT]
  qw( NUL SOH STX ETX EOT ENQ ACK BEL BS HT LF VT FF CR SO SI
    DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM SUB ESC FS GS RS US));
# Primarily for error reporting.
sub stringify {
  my ($self) = @_;
  my $string = $$self[0];
  # Make the token's char content more printable, since this is for error messages.
  if (length($string) == 1) {
    my $c = ord($string);
    if ($c < 0x020) {
      $string = 'U+' . sprintf("%04x", $c) . '/' . $CONTROLNAME[$c]; } }
  return $CC_SHORT_NAME[$$self[1]] . '[' . $string . ']'; }

#======================================================================

1;

__END__

=pod

=head1 NAME

C<LaTeXML::Core::Token> - representation of a Token:
a pair of character and category code (catcode);
It extends L<LaTeXML::Common::Object>.

=head2 Exported functions

=over 4

=item C<< $catcode = CC_ESCAPE; >>

Constants for the category codes:

  CC_BEGIN, CC_END, CC_MATH, CC_ALIGN, CC_EOL,
  CC_PARAM, CC_SUPER, CC_SUB, CC_IGNORE,
  CC_SPACE, CC_LETTER, CC_OTHER, CC_ACTIVE,
  CC_COMMENT, CC_INVALID, CC_CS, CC_NOTEXPANDED,
  CC_MARKER.

[The last 3 are extensions, with catcodes 16, 17 and 18, respectively;
CC_MARKER is not a TeX catcode].

=item C<< $token = Token($string,$cc); >>

Creates a L<LaTeXML::Core::Token> with the given content and catcode;
the catcode defaults to CC_OTHER when C<$cc> is not given.
The following shorthand versions are also exported for convenience:

  T_BEGIN, T_END, T_MATH, T_ALIGN, T_PARAM,
  T_SUB, T_SUPER, T_SPACE, T_LETTER($letter),
  T_OTHER($char), T_ACTIVE($char),
  T_COMMENT($comment), T_CS($cs)

=item C<< @tokens = Explode($string); >>

Returns a list of the tokens corresponding to the characters in C<$string>.
All tokens have catcode CC_OTHER, except for spaces which have catcode CC_SPACE.

=item C<< @tokens = ExplodeText($string); >>

Returns a list of the tokens corresponding to the characters in C<$string>.
All (roman) letters have catcode CC_LETTER, all others have catcode CC_OTHER,
except for spaces which have catcode CC_SPACE.

=item C<< UnTeX($object); >>

Converts C<$object> to a string containing TeX that created it (or could have).
Note that this is not necessarily the original TeX code;
expansions or other substitutions may have taken place.

=back

=head2 Methods

=over 4

=item C<< @tokens = $object->unlist; >>

Return a list of the tokens making up this C<$object>.

=item C<< $string = $object->toString; >>

Return a string representing C<$object>.

=item C<< $string = $token->getCSName; >>

Return the name under which a definition for C<$token> would be stored:
normally the string or character part of the C<$token>, but for the
primitive category codes the standard name of the catcode
(eg. C<< T_BEGIN->getCSName >> returns "Begin").

=item C<< $string = $token->getString; >>

Return the string or character part of the C<$token>.

=item C<< $code = $token->getCharcode; >>

Return the character code of the character part of the C<$token>,
or 256 if it is a control sequence.

=item C<< $code = $token->getCatcode; >>

Return the catcode of the C<$token>.

=back

=head1 AUTHOR

Bruce Miller

=head1 COPYRIGHT

Public domain software, produced as part of work done by the
United States Government & not subject to copyright in the US.
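=head1 EXAMPLE

The comparison rule used by C<equals> and the behaviour of C<ExplodeText> can be
tried out in isolation. What follows is a minimal sketch, not the module's own code:
the helpers C<token>, C<tokens_equal> and C<explode_text> are hypothetical names,
tokens are plain unblessed pairs, and only a small subset of the primitive catcode
table is assumed. It shows that two tokens with a primitive catcode compare equal
whatever their characters, while letters must also agree on the character.

```perl
use strict;
use warnings;

# Hypothetical miniature of the [string, catcode] pair; NOT the LaTeXML classes.
use constant { CC_BEGIN => 1, CC_SPACE => 10, CC_LETTER => 11, CC_OTHER => 12 };

# Catcodes for which only the catcode matters when comparing
# (an assumed subset of the primitive catcode table, enough for the demo).
my %primitive = (CC_BEGIN() => 1, CC_SPACE() => 1);

# Build a token; the catcode defaults to OTHER, as Token() does.
sub token {
  my ($string, $cc) = @_;
  return [$string, (defined $cc ? $cc : CC_OTHER)]; }

# Equal if the catcodes agree and either the catcode is primitive
# or the strings agree too.
sub tokens_equal {
  my ($x, $y) = @_;
  my $same = ($$x[1] == $$y[1])
    && ($primitive{ $$x[1] } || ($$x[0] eq $$y[0]));
  return ($same ? 1 : 0); }

# Like ExplodeText: letters become LETTER, spaces SPACE, everything else OTHER.
sub explode_text {
  my ($string) = @_;
  return map { ($_ eq ' ' ? token(' ', CC_SPACE)
      : (/[a-zA-Z]/ ? token($_, CC_LETTER) : token($_, CC_OTHER))) }
    split(//, $string); }

print "begin tokens equal: ", tokens_equal(token('{', CC_BEGIN), token('[', CC_BEGIN)), "\n";
print "letters a and b equal: ", tokens_equal(token('a', CC_LETTER), token('b', CC_LETTER)), "\n";
print "catcodes of 'a b!': ", join(',', map { $$_[1] } explode_text('a b!')), "\n";
```

The last line prints the catcodes 11,10,11,12: letter, space, letter, other.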
=cut

latexml-0.8.1/lib/LaTeXML/Core/Tokens.pm

# /=====================================================================\ #
# | LaTeXML::Core::Tokens                                               | #
# | A list of Token(s)                                                  | #
# |=====================================================================| #
# | Part of LaTeXML:                                                    | #
# | Public domain software, produced as part of work done by the        | #
# | United States Government & not subject to copyright in the US.      | #
# |---------------------------------------------------------------------| #
# | Bruce Miller                                                #_#     | #
# | http://dlmf.nist.gov/LaTeXML/                              (o o)    | #
# \=========================================================ooo==U==ooo=/ #
package LaTeXML::Core::Tokens;
use strict;
use warnings;
use LaTeXML::Global;
use LaTeXML::Common::Object;
use LaTeXML::Common::Error;
use base qw(LaTeXML::Common::Object);
use base qw(Exporter);
our @EXPORT = (    # Global STATE; This gets bound by LaTeXML.pm
  qw(&Tokens)
);

#======================================================================
# Token List constructors.

# Return a LaTeXML::Core::Tokens made from the arguments (tokens)
sub Tokens {
  my (@tokens) = @_;
  return LaTeXML::Core::Tokens->new(@tokens); }

#======================================================================

# Form a Tokens list of Token's
# Flatten the arguments Token's and Tokens's into plain Token's
# .... Efficiently! since this seems to be called MANY times.
sub new {
  my ($class, @tokens) = @_;
  my $r;
  return bless [map { (($r = ref $_) eq 'LaTeXML::Core::Token' ? $_
        : ($r eq 'LaTeXML::Core::Tokens' ? @$_
          : Fatal('misdefined', $r, undef, "Expected a Token, got " .
Stringify($_)))) } @tokens], $class; } # Return a list of the tokens making up this Tokens sub unlist { my ($self) = @_; return @$self; } # Return a shallow copy of the Tokens sub clone { my ($self) = @_; return bless [@$self], ref $self; } # Return a string containing the TeX form of the Tokens sub revert { my ($self) = @_; return @$self; } # toString is used often, and for more keyword-like reasons, # NOT for creating valid TeX (use revert or UnTeX for that!) sub toString { my ($self) = @_; return join('', map { $$_[0] } @$self); } # Methods for overloaded ops. sub equals { my ($a, $b) = @_; return 0 unless defined $b && (ref $a) eq (ref $b); my @a = @$a; my @b = @$b; while (@a && @b && ($a[0]->equals($b[0]))) { shift(@a); shift(@b); } return !(@a || @b); } sub stringify { my ($self) = @_; return "Tokens[" . join(',', map { $_->toString } @$self) . "]"; } sub beDigested { my ($self, $stomach) = @_; return $stomach->digest($self); } sub neutralize { my ($self) = @_; return Tokens(map { $_->neutralize } $self->unlist); } #====================================================================== 1; __END__ =pod =head1 NAME C - represents lists of L's; extends L. =head2 Exported functions =over 4 =item C<< $tokens = Tokens(@token); >> Creates a L from a list of L's =back =head2 Tokens methods The following method is specific to C. =over 4 =item C<< $tokenscopy = $tokens->clone; >> Return a shallow copy of the $tokens. This is useful before reading from a C. =back =head1 AUTHOR Bruce Miller =head1 COPYRIGHT Public domain software, produced as part of work done by the United States Government & not subject to copyright in the US. 
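=head1 EXAMPLE

The flattening done by C<new> can be pictured with a small self-contained
sketch. The packages C<Sketch::Token> and C<Sketch::Tokens> below are hypothetical
stand-ins written for this illustration, and a plain C<die> replaces the
C<Fatal> call; they are not the LaTeXML classes. The sketch shows that nesting
one token list inside another still yields a single flat list.

```perl
use strict;
use warnings;

# Hypothetical stand-ins; NOT the LaTeXML classes.
package Sketch::Token;

sub new {
  my ($class, $string, $cc) = @_;
  return bless [$string, $cc], $class; }

package Sketch::Tokens;

# Flatten Token and Tokens arguments into one flat list of Token objects;
# anything else is rejected.
sub new {
  my ($class, @things) = @_;
  my @flat = map {
    my $r = ref $_;
    ($r eq 'Sketch::Token' ? ($_)
      : ($r eq 'Sketch::Tokens' ? (@$_)
        : die("Expected a Token, got " . (defined $_ ? $_ : 'undef') . "\n")))
  } @things;
  return bless [@flat], $class; }

sub unlist   { my ($self) = @_; return @$self; }
sub toString { my ($self) = @_; return join('', map { $$_[0] } @$self); }

package main;

my $ab  = Sketch::Tokens->new(Sketch::Token->new('a', 11), Sketch::Token->new('b', 11));
my $abc = Sketch::Tokens->new($ab, Sketch::Token->new('c', 11));
print scalar($abc->unlist), " tokens: ", $abc->toString, "\n";
```

Because the result is always flat, C<unlist> and C<toString> never need to recurse.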
=cut

latexml-0.8.1/lib/LaTeXML/Core/Whatsit.pm

# /=====================================================================\ #
# | LaTeXML::Core::Whatsit                                              | #
# | Digested objects produced in the Stomach                            | #
# |=====================================================================| #
# | Part of LaTeXML:                                                    | #
# | Public domain software, produced as part of work done by the        | #
# | United States Government & not subject to copyright in the US.      | #
# |---------------------------------------------------------------------| #
# | Bruce Miller                                                #_#     | #
# | http://dlmf.nist.gov/LaTeXML/                              (o o)    | #
# \=========================================================ooo==U==ooo=/ #

#**********************************************************************
# LaTeXML Whatsit.
#  Some arbitrary object, possibly with arguments.
# Particularly as an intermediate representation for invocations of control
# sequences that do NOT get expanded or processed, but are taken to represent
# some semantic something or other.
# These get preserved in the expanded/processed token stream to be
# converted into XML objects in the document.
#**********************************************************************
package LaTeXML::Core::Whatsit;
use strict;
use warnings;
use LaTeXML::Global;
use LaTeXML::Common::Object;
use LaTeXML::Common::Error;
use LaTeXML::Core::Token;
use LaTeXML::Core::Tokens;
use LaTeXML::Common::Dimension;
use List::Util qw(min max);
use LaTeXML::Core::List;
use base qw(LaTeXML::Core::Box);

# Specially recognized (some required?)
properties: # font : The font object # locator : a locator string, where in the source this whatsit was created # isMath : whether this is a math object # id # body # trailer sub new { my ($class, $defn, $args, %properties) = @_; return bless { definition => $defn, args => $args || [], properties => {%properties} }, $class; } sub getDefinition { my ($self) = @_; return $$self{definition}; } sub isMath { my ($self) = @_; return $$self{properties}{isMath}; } sub getFont { my ($self) = @_; return $$self{properties}{font}; } # and if undef ???? sub setFont { my ($self, $font) = @_; $$self{properties}{font} = $font; return; } sub getLocator { my ($self) = @_; return $$self{properties}{locator}; } sub getProperty { my ($self, $key) = @_; return $$self{properties}{$key}; } sub getProperties { my ($self) = @_; return %{ $$self{properties} }; } sub getPropertiesRef { my ($self) = @_; return $$self{properties}; } sub setProperty { my ($self, $key, $value) = @_; $$self{properties}{$key} = $value; return; } sub setProperties { my ($self, %props) = @_; while (my ($key, $value) = each %props) { $$self{properties}{$key} = $value if defined $value; } return; } sub getArg { my ($self, $n) = @_; return $$self{args}[$n - 1]; } sub getArgs { my ($self) = @_; return @{ $$self{args} }; } sub setArgs { my ($self, @args) = @_; $$self{args} = [@args]; return; } sub getBody { my ($self) = @_; return $$self{properties}{body}; } sub setBody { my ($self, @body) = @_; my $trailer = pop(@body); $$self{properties}{body} = List(@body); $$self{properties}{body}->setProperty(mode => 'math') if $self->isMath; $$self{properties}{trailer} = $trailer; # And copy any otherwise undefined properties from the trailer if ($trailer) { my %trailerhash = $trailer->getProperties; foreach my $prop (keys %trailerhash) { $$self{properties}{$prop} = $trailer->getProperty($prop) unless defined $$self{properties}{$prop}; } } return; } sub getTrailer { my ($self) = @_; return $$self{properties}{trailer}; } # So a 
Whatsit can stand in for a List sub unlist { my ($self) = @_; return ($self); } sub revert { my ($self) = @_; # WARNING: Forbidden knowledge? # (1) provide a means to get the RAW, internal markup that can (hopefully) be RE-digested # this is needed for getting the numerator of \over into textstyle! # (2) caching the reversion (which is a big performance boost) if (my $saved = !$LaTeXML::REVERT_RAW && ($LaTeXML::DUAL_BRANCH ? $$self{dual_reversion}{$LaTeXML::DUAL_BRANCH} : $$self{reversion})) { return $saved->unlist; } else { my $defn = $self->getDefinition; my $spec = ($LaTeXML::REVERT_RAW ? undef : $defn->getReversionSpec); my @tokens = (); if ((defined $spec) && (ref $spec eq 'CODE')) { # If handled by CODE, call it @tokens = &$spec($self, $self->getArgs); } else { if (defined $spec) { @tokens = LaTeXML::Core::Definition::Expandable::substituteTokens($spec, map { Tokens(Revert($_)) } $self->getArgs) if $spec ne ''; } else { my $alias = ($LaTeXML::REVERT_RAW ? undef : $defn->getAlias); if (defined $alias) { push(@tokens, T_CS($alias)) if $alias ne ''; } else { push(@tokens, $defn->getCS); } if (my $parameters = $defn->getParameters) { push(@tokens, $parameters->revertArguments($self->getArgs)); } } if (defined(my $body = $self->getBody)) { push(@tokens, Revert($body)); if (defined(my $trailer = $self->getTrailer)) { push(@tokens, Revert($trailer)); } } } # Now cache it, in case it's needed again if ($LaTeXML::REVERT_RAW) { } # don't cache elsif ($LaTeXML::DUAL_BRANCH) { $$self{dual_reversion}{$LaTeXML::DUAL_BRANCH} = Tokens(@tokens); } else { $$self{reversion} = Tokens(@tokens); } return @tokens; } } sub toString { my ($self) = @_; return ToString(Tokens($self->revert)); } # What else?? sub getString { my ($self) = @_; return $self->toString; } # Ditto? # Methods for overloaded operators sub stringify { my ($self) = @_; my $hasbody = defined $$self{properties}{body}; return "Whatsit[" . 
join(',', $self->getDefinition->getCS->getCSName, map { Stringify($_) } $self->getArgs, (defined $$self{properties}{body} ? ($$self{properties}{body}, $$self{properties}{trailer}) : ())) . "]"; } sub equals { my ($a, $b) = @_; return 0 unless (defined $b) && ((ref $a) eq (ref $b)); return 0 unless $$a{definition} eq $$b{definition}; # I think we want IDENTITY here, not ->equals my @a = @{ $$a{args} }; push(@a, $$a{properties}{body}) if $$a{properties}{body}; my @b = @{ $$b{args} }; push(@b, $$b{properties}{body}) if $$b{properties}{body}; while (@a && @b && ($a[0]->equals($b[0]))) { shift(@a); shift(@b); } return !(@a || @b); } sub beAbsorbed { my ($self, $document) = @_; # Significant time is consumed here, and associated with a specific CS, # so we should be profiling as well! # Hopefully the csname is the same that was charged in the digestioned phase! my $defn = $self->getDefinition; my $profiled = $STATE->lookupValue('PROFILING') && $defn->getCS; LaTeXML::Core::Definition::startProfiling($profiled) if $profiled; my @result = $defn->doAbsorbtion($document, $self); LaTeXML::Core::Definition::stopProfiling($profiled) if $profiled; return @result; } sub computeSize { my ($self, %options) = @_; # Use #body, if any, else ALL args !?!?! # Eventually, possibly options like sizeFrom, or computeSize or.... my $defn = $self->getDefinition; my $props = $self->getPropertiesRef; my $sizer = $defn->getSizer; my ($width, $height, $depth); # If sizer is a function, call it if (ref $sizer) { ($width, $height, $depth) = &$sizer($self); } else { my @boxes = (); if (!defined $sizer) { # Nothing specified? use #body if any, else sum all box args @boxes = ($$self{properties}{body} ? ($$self{properties}{body}) : (map { ((ref $_) && ($_->isaBox) ? $_->unlist : ()) } @{ $$self{args} })); } elsif (($sizer eq '0') || ($sizer eq '')) { } # 0 size! 
elsif ($sizer =~ /^(#\w+)*$/) { # Else if of form '#digit' or '#prop', combine sizes while ($sizer =~ s/^#(\w+)//) { my $arg = $1; push(@boxes, ($arg =~ /^\d+$/ ? $self->getArg($arg) : $$props{$arg})); } } else { Warn('unexpected', $sizer, undef, "Expected sizer to be a function, or arg or property specification, not '$sizer'"); } my $font = $$props{font}; $options{width} = $$props{width} if $$props{width}; $options{height} = $$props{height} if $$props{height}; $options{depth} = $$props{depth} if $$props{depth}; $options{vattach} = $$props{vattach} if $$props{vattach}; $options{layout} = $$props{layout} if $$props{layout}; ($width, $height, $depth) = $font->computeBoxesSize([@boxes], %options); } # Now, only set the dimensions that weren't already set. $$props{width} = $width unless defined $$props{width}; $$props{height} = $height unless defined $$props{height}; $$props{depth} = $depth unless defined $$props{depth}; return; } #====================================================================== 1; __END__ =pod =head1 NAME C - Representations of digested objects. =head1 DESCRIPTION represents a digested object that can generate arbitrary elements in the XML Document. It extends L. =head2 METHODS Note that the font is stored in the data properties under 'font'. =over 4 =item C<< $defn = $whatsit->getDefinition; >> Returns the L responsible for creating C<$whatsit>. =item C<< $value = $whatsit->getProperty($key); >> Returns the value associated with C<$key> in the C<$whatsit>'s property list. =item C<< $whatsit->setProperty($key,$value); >> Sets the C<$value> associated with the C<$key> in the C<$whatsit>'s property list. =item C<< $props = $whatsit->getProperties(); >> Returns the hash of properties stored on this Whatsit. (Note that this hash is modifiable). =item C<< $props = $whatsit->setProperties(%keysvalues); >> Sets several properties, like setProperty. 
=item C<< $list = $whatsit->getArg($n); >> Returns the C<$n>-th argument (starting from 1) for this C<$whatsit>. =item C<< @args = $whatsit->getArgs; >> Returns the list of arguments for this C<$whatsit>. =item C<< $whatsit->setArgs(@args); >> Sets the list of arguments for this C<$whatsit> to C<@args> (each arg should be a C). =item C<< $list = $whatsit->getBody; >> Return the body for this C<$whatsit>. This is only defined for environments or top-level math formula. The body is stored in the properties under 'body'. =item C<< $whatsit->setBody(@body); >> Sets the body of the C<$whatsit> to the boxes in C<@body>. The last C<$box> in C<@body> is assumed to represent the `trailer', that is the result of the invocation that closed the environment or math. It is stored separately in the properties under 'trailer'. =item C<< $list = $whatsit->getTrailer; >> Return the trailer for this C<$whatsit>. See C. =back =head1 AUTHOR Bruce Miller =head1 COPYRIGHT Public domain software, produced as part of work done by the United States Government & not subject to copyright in the US. =cut latexml-0.8.1/lib/LaTeXML/Global.pm0000644000175000017500000000442512507513572016624 0ustar norbertnorbert# /=====================================================================\ # # | LaTeXML::Global | # # | Global constants, accessors and constructors | # # |=====================================================================| # # | Part of LaTeXML: | # # | Public domain software, produced as part of work done by the | # # | United States Government & not subject to copyright in the US. 
| #
# |---------------------------------------------------------------------| #
# | Bruce Miller <bruce.miller@nist.gov>                        #_#     | #
# | http://dlmf.nist.gov/LaTeXML/                              (o o)    | #
# \=========================================================ooo==U==ooo=/ #

#======================================================================
# This module collects all the commonly useful constants and constructors
# that other modules and package implementations are likely to need.
# This should be used in a context where presumably all the required
# LaTeXML modules that implement the various classes have already been loaded.
#
# Yes, a lot of stuff is exported, polluting your namespace.
# Thus, you use this module only if you _need_ the functionality!
#======================================================================
package LaTeXML::Global;
use strict;
use warnings;
use base qw(Exporter);
our @EXPORT = (
  # Global STATE; This gets bound by LaTeXML.pm
  qw( *STATE),
);

#local $LaTeXML::STATE;

#**********************************************************************
1;

__END__

=pod

=head1 NAME

C<LaTeXML::Global> - global exports used within LaTeXML, and in Packages.

=head1 SYNOPSIS

  use LaTeXML::Global;

=head1 DESCRIPTION

This module exports the various constants and constructors that are useful
throughout LaTeXML, and in Package implementations.

=head2 Global state

=over 4

=item C<< $STATE; >>

This is bound to the currently active L<LaTeXML::Core::State>
by an instance of L<LaTeXML> during processing.

=back

=head1 AUTHOR

Bruce Miller <bruce.miller@nist.gov>

=head1 COPYRIGHT

Public domain software, produced as part of work done by the
United States Government & not subject to copyright in the US.

=cut latexml-0.8.1/lib/LaTeXML/LaTeXML.catalog0000644000175000017500000000436512507513572017633 0ustar norbertnorbert latexml-0.8.1/lib/LaTeXML/MathGrammar0000644000175000017500000011203612507513572017207 0ustar norbertnorbert# /=====================================================================\ # # | LaTeXML::MathGrammar | # # | LaTeXML's Math Grammar for postprocessing | # # |=====================================================================| # # | Part of LaTeXML: | # # | Public domain software, produced as part of work done by the | # # | United States Government & not subject to copyright in the US. | # # |---------------------------------------------------------------------| # # | Bruce Miller #_# | # # | http://dlmf.nist.gov/LaTeXML/ (o o) | # # \=========================================================ooo==U==ooo=/ # # ================================================================================ # LaTeXML's MathGrammar. # To compile : # perl -MParse::RecDescent - MathGrammar LaTeXML::MathGrammar # ================================================================================ # Startup actions: import the constructors { BEGIN{ use LaTeXML::MathParser qw(:constructors); #### $::RD_TRACE=1; }} # Rules section # ======================================== # Naming Conventions: # UPPERCASE : is for terminals, ie. classes of TeX tokens. # Initial Cap : for non-terminal rules that can possibly be invoked externally. # Initial lowercase : internal rules. # ======================================== # For internal rules # moreFoos[$foo] : Looks for more Foo's w/appropriate punctuation or operators, # whatever is appropriate, and combines it with whatever was passed in # as pattern arg. Typically, the last clause would be simply # | { $arg[0]; } # to return $foo without having found any more foo's. # In such a case, it appears to be advantageous to have the first clause be # : /^\Z/ { $arg[0]; } # which will return immediately if there is no additional input. 
# addFoo[$bar] : Check for a following Foo and add it, as appropriate to # the $bar. # ======================================== # Note that Parse:RecDescent does NOT backtrack within a rule: # If a given production succeeds, the rule succeeds, but even if the ultimate # parse fails, the parser will NOT go back and try another production within # that same rule!!! Of course, if a production fails, it goes on to the next, # and if that rule fails, etc... # # For example ||a|-|b|| won't work (in spite of various attempts to control it) # After seeing the initial || and attempting to parse an Expression, it gets # a * abs( - abs(b)) # without anything to match the initial ||; and it will NOT backtrack to try # a shorter Expression! # #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # Top Level expressions; Just about anything? #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # Note in particular, that many inline formula contain `half' a formula, # with the lead-in text effectively being the LHS. eg. function $=foo$; # similarly you can end up with a missing RHS, $x=$ even. Start : Anything /^\Z/ { $item[1]; } #====================================================================== Anything : Anything : AnythingAny /^\Z/ { $item[1]; } #====================================================================== AnythingAny : Formulae | OPEN Formulae CLOSE { Fence($item[1],$item[2],$item[3]); } | modifierFormulae | OPEN modifierFormula CLOSE { Fence($item[1],$item[2],$item[3]); } | MODIFIER | MODIFIEROP Expression { Apply($item[1],Absent(),$item[2]);} | METARELOP Formula { Apply($item[1],Absent(),$item[2]); } | AnyOp (PUNCT(?) 
AnyOp {[$item[1]->[0]||InvisibleComma(), $item[2]]})(s) { NewList($item[1],map(@$_,@{$item[2]})); } | FLOATSUPERSCRIPT POSTSUBSCRIPT { NewScript(NewScript(Absent(),$item[1]),$item[2]); } | FLOATSUBSCRIPT POSTSUPERSCRIPT { NewScript(NewScript(Absent(),$item[1]),$item[2]); } | FLOATSUPERSCRIPT { NewScript(Absent(),$item[1]); } | FLOATSUBSCRIPT { NewScript(Absent(),$item[1]); } | AnyOp Expression { Apply($item[1],Absent(),$item[2]);} # a top level rule for sub and superscripts that can accept all sorts of junk. Subscript : Subscript : aSubscript (PUNCT(?) aSubscript {[$item[1]->[0] || InvisibleComma(),$item[2]]; })(s?) { NewList($item[1],map(@$_,@{$item[2]})); } Superscript : Superscript : aSuperscript (PUNCT(?) aSuperscript {[$item[1]->[0] || InvisibleComma(),$item[2]]; })(s?) { NewList($item[1],map(@$_,@{$item[2]})); } aSubscript : Formulae | AnyOp Expression { Apply($item[1],Absent(),$item[2]);} | AnyOp aSuperscript : supops | Formulae | AnyOp Expression { Apply($item[1],Absent(),$item[2]);} | AnyOp #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # Formulae (relations or grouping of expressions or relations) #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # This maze attempts to recognize the various meaningful(?) alternations of # Expression(s) separated by punctuation, relational operators or metarelational # operators [Think of $a=b=c$ vs $a=b, c=d$ vs. $a=b,c,d$ .. ] # and group them into Formulae (collections of relations), including relations # which have punctuated collections of Expression(s) on either the LHS or RHS, # as well as `multirelation' like a = b = c, or simply punctuated collections of # Expression(s) Formulae : Formula moreFormulae[$item[1]] # moreFormulae[$formula]; Got a Formula, what can follow? moreFormulae : /^\Z/ { $arg[0];} # short circuit! 
| (endPunct Formula { [$item[1],$item[2]]; })(s) { NewFormulae($arg[0],map(@$_,@{$item[1]})); } | metarelopFormula(s) { NewFormula($arg[0],map(@$_,@{$item[1]})); } | { $arg[0]; } # Punctuation that ends a formula endPunct : PUNCT | PERIOD Formula : Expression extendFormula[$item[1]] # extendFormula[$expression] ; expression might be followed by punct Expression... # or relop Expression... or arrow Expression or nothing. extendFormula : /^\Z/ { $arg[0];} # short circuit! | punctExpr(s) maybeRHS[$arg[0],map(@$_,@{$item[1]})] | relop Expression moreRHS[$arg[0],$item[1],$item[2]] | relop /^\Z/ { NewFormula($arg[0],$item[1], Absent()); } | { $arg[0]; } # maybeRHS[$expr,(punct,$expr)*]; # Could have RELOP Expression (which means the (collected LHS) relation RHS) # or done (just collection) maybeRHS : /^\Z/ { NewList(@arg); } | relopExpr(s) { NewFormula(NewList(@arg),map(@$_,@{$item[1]})); } | { NewList(@arg); } # --- either line could be followed by (>0) # For the latter, does a,b,c (<0) mean c<0 or all of them are <0 ???? # moreRHS[$expr,$relop,$expr]; Could have more (relop Expression) # or (punct Expression)* moreRHS : /^\Z/ { NewFormula($arg[0],$arg[1],$arg[2]); } # short circuit! | PUNCT Expression maybeColRHS[@arg,$item[1],$item[2]] | relopExpr(s?) { NewFormula($arg[0],$arg[1],$arg[2], map(@$_,@{$item[1]})); } # --- 1st line could be preceded by (>0) IF it ends up end of formula # --- 2nd line could be followed by (>0) # maybeColRHS[$expr,$relop,$expr,(punct, $expr)*]; # Could be done, get punct (collection) or rel Expression (another formula) maybeColRHS : /^\Z/ { NewFormula($arg[0],$arg[1],NewList(@arg[2..$#arg])); } | relop Expression moreRHS[$arg[$#arg],$item[1],$item[2]] { NewFormulae(NewFormula($arg[0],$arg[1], NewList(@arg[2..$#arg-2])),$arg[$#arg-1],$item[3]); } | PUNCT Expression maybeColRHS[@arg,$item[1],$item[2]] | { NewFormula($arg[0],$arg[1],NewList(@arg[2..$#arg])); } # --- 1st line handles it through more RHS ??? 
# --- 2nd line could be preceded by (>0) if it ends formula
# --- 3rd line could be followed by (>0)

punctExpr : PUNCT Expression { [$item[1],$item[2]]; }

relopExpr :
    relop Expression { [$item[1],$item[2]]; }
  | relop /^\Z/ { [$item[1], Absent()]; }

metarelopFormula :
    METARELOP Formula { [$item[1],$item[2]]; }
  | METARELOP /^\Z/ { [$item[1], Absent()]; }

#%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
# `Modifier' formula, things like $<0$, that might follow another formula or text.
# Absent() is a placeholder for the missing thing... (?)
# [and also when the LHS is moved away, due to alignment rearrangement]
modifierFormulae : modifierFormula moreFormulae[$item[1]]

modifierFormula : relop Expression moreRHS[Absent(),$item[1],$item[2]]

#%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
# Expressions; sums of terms
# Abstractly, things combined by operators binding tighter than relations
#%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Expressions : Expression punctExpr(s?) { NewList($item[1],map(@$_,@{$item[2]})); }

Expression :
    SignedTerm moreTerms[[],$item[1]] addExpressionModifier[$item[2]]
# # very tentatively allow an operator as a complete expression
# # BUT, this should only succeed if at end, or followed by punctuation!!!!!!!
# # (or CLOSE, or... ?!?!?!?)
  | AnyOp ...anyOpIsolator { $item[1]; }

anyOpIsolator : /^\Z/ | PUNCT | CLOSE

# moreTerms[ [($term,$addop)*], $term]; Check for more addop & term's
moreTerms :
    /^\Z/ { LeftRec(@{$arg[0]},$arg[1]); } # short circuit!
| AddOp moreTerms2[$arg[0],$arg[1],$item[1]] | { LeftRec(@{$arg[0]},$arg[1]); } # moreTerms2[ [($term,$addop)*], $term, $addop]; Check if addop is followed # by another term, or if not, it presumably represents a limiting form # like "a+" (ie a from above) moreTerms2 : Term moreTerms[ [@{$arg[0]},$arg[1],$arg[2]],$item[1] ] | { LeftRec(@{$arg[0]},Apply(New('limit-from'),$arg[1],$arg[2])); } addExpressionModifier : /^\Z/ { $arg[0];} # short circuit! | PUNCT(?) OPEN relop Expression balancedClose[$item[2]] { Apply(New('annotated'),$arg[0], Fence($item[2], Apply($item[3],Absent(),$item[4]),$item[5])); } # An alternative form would have OPEN Expression relop... # but that seems less like a "modifier" and more like a relation as argument! ### | PUNCT(?) OPEN Expression relop Expression ### moreRHS[$item[3],$item[4],$item[5]] balancedClose[$item[2]] ### { Apply(New('annotated'),$arg[0],Fence($item[2],$item[6],$item[7])); } | PUNCT(?) OPEN MODIFIEROP Expression balancedClose[$item[2]] { Apply($item[3],$arg[0],$item[4]); } # Is the punctuation Lost here? | MODIFIER { Apply(New('annotated'),$arg[0],$item[1]); } | MODIFIEROP Expression { Apply($item[1],$arg[0],$item[2]); } | { $arg[0]; } #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # Terms: products of factors # Abstractly, things combined by operators binding tighter than addition #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% SignedTerm : AddOp Term { Apply($item[1],$item[2]); } | Term Term : Factor moreFactors[$item[1]] moreFactors : /^\Z/ { $arg[0];} # short circuit! | MulOp Factor moreFactors[ApplyNary($item[1],$arg[0],$item[2])] # Given an explicit COMPOSEOP, we'll assume the preceding is # an implicit lambda of some sort(?) | COMPOSEOP makeComposition[$arg[0],$item[1]] | { ($forbidEvalAt ? 
undef : 1); } evalAtOp maybeEvalAt[$arg[0],$item[2]] | Factor moreFactors[ApplyNary(InvisibleTimes(),$arg[0],$item[1])] | { $arg[0]; } #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # Factors: function applications, postfix on atoms, etc. # Abstractly, things combined by operators binding tighter than multiplication #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Factor : # These 2nd two are Iffy; hopefully the 1st rule will protect from backtrack? OPEN ARRAY CLOSE addScripts[Fence($item[1],$item[2],$item[3])] # perhaps only when OPEN or CLOSED is { or } ?? # should be explicitly {, and moreover the array should be only 1 or 2 columns! | LBRACE ARRAY { InterpretDelimited(New('cases'),$item[1],$item[2],Absent()); } | ARRAY RBRACE { InterpretDelimited(New('cases'),Absent(),$item[1],$item[2]); } | preScripted['FUNCTION'] addArgs[$item[1]] | preScripted['OPFUNCTION'] addOpFunArgs[$item[1]] | preScripted['TRIGFUNCTION'] addTrigFunArgs[$item[1]] | preScripted['ATOM_OR_ID'] maybeArgs[$item[1]] | preScripted['UNKNOWN'] doubtArgs[$item[1]] | NUMBER addScripts[$item[1]] | SCRIPTOPEN scriptFactorOpen[$item[1]] | OPEN factorOpen[$item[1]] | preScripted['bigop'] addOpArgs[$item[1]] | { ($forbidVertBar ? undef : 1); } SINGLEVERTBAR SINGLEVERTBAR absExpression SINGLEVERTBAR SINGLEVERTBAR # || exp || ==> norm addScripts[Fence(New(undef,'||',role=>'OPEN'),$item[4],New(undef,'||',role=>'CLOSE'))] | { ($forbidVertBar ? undef : 1); } VERTBAR absExpression VERTBAR # | exp | => absolute-value addScripts[Fence($item[2],$item[3],$item[4])] | { ($forbidVertBar ? 
undef : IsNotationAllowed('QM')); } VERTBAR ketExpression RANGLE { SawNotation('QM'); } # | exp > ==> ket addScripts[InterpretDelimited(New('ket'), Annotate($item[2],role=>'OPEN'),$item[3],Annotate($item[4],role=>'CLOSE'))] # ket | { IsNotationAllowed('QM'); } LANGLE ketExpression VERTBAR maybeBra[$item[2],$item[3],$item[4]] | { IsNotationAllowed('QM'); } LANGLE absExpression RANGLE addScripts[Fence(Annotate($item[2],role=>'OPEN'), $item[3], Annotate($item[4],role=>'CLOSE'))] | OPERATOR addScripts[$item[1]] nestOperators[$item[2]] addOpFunArgs[$item[3]] ATOM_OR_ID : ATOM | ID | ARRAY # A restricted sort of Factor for the unparenthesized argument to a function. # Note f g h => f*g*h, but f g h x => f(g(h(x))) Seems like what people mean... # Should there be a special case for trigs? barearg : aBarearg moreBareargs[$item[1]] aBarearg : preScripted['FUNCTION'] addArgs[$item[1]] | preScripted['OPFUNCTION'] addOpFunArgs[$item[1]] | preScripted['TRIGFUNCTION'] addTrigFunArgs[$item[1]] | preScripted['ATOM_OR_ID'] maybeArgs[$item[1]] | preScripted['UNKNOWN'] doubtArgs[$item[1]] | NUMBER addScripts[$item[1]] | VERTBAR Expression VERTBAR addScripts[Fence($item[1],$item[2],$item[3])] moreBareargs : /^\Z/ { $arg[0];} # short circuit! | MulOp aBarearg moreBareargs[ApplyNary($item[1],$arg[0],$item[2])] | aBarearg moreBareargs[ApplyNary(InvisibleTimes(),$arg[0],$item[1])] | { $arg[0]; } # A variation that does not allow a bare trig function trigBarearg : aTrigBarearg moreTrigBareargs[$item[1]] aTrigBarearg : preScripted['FUNCTION'] addArgs[$item[1]] | preScripted['OPFUNCTION'] addOpFunArgs[$item[1]] | preScripted['ATOM_OR_ID'] maybeArgs[$item[1]] | preScripted['UNKNOWN'] doubtArgs[$item[1]] | NUMBER addScripts[$item[1]] | VERTBAR Expression VERTBAR addScripts[Fence($item[1],$item[2],$item[3])] moreTrigBareargs : /^\Z/ { $arg[0];} # short circuit! 
| MulOp aTrigBarearg moreTrigBareargs[ApplyNary($item[1],$arg[0],$item[2])] | aTrigBarearg moreTrigBareargs[ApplyNary(InvisibleTimes(),$arg[0],$item[1])] | { $arg[0]; } # maybeEvalAt[$thing,$at_op] maybeEvalAt : POSTSUBSCRIPT moreEvalAt[$arg[0],$arg[1],Arg($item[1],0)] | POSTSUPERSCRIPT POSTSUBSCRIPT moreFactors[Apply(New('evaluated-at'),$arg[0],Arg($item[2],0),Arg($item[1],0))] # maybeEvalAt[$thing,$atop,$sub] moreEvalAt : POSTSUPERSCRIPT moreFactors[Apply(New('evaluated-at'),$arg[0],$arg[2],Arg($item[1],0))] | moreFactors[Apply(New('evaluated-at'),$arg[0],$arg[2])] #====================================================================== # After < a | we might be done, or get or # <$expr | maybeBra[$langle,$expr,$bar] maybeBra : ketExpression maybeBraket[$arg[0],$arg[1],$arg[2],$item[1]] | { SawNotation('QM'); } addScripts[InterpretDelimited(New('bra'), Annotate($arg[0],role=>'OPEN'),$arg[1],Annotate($arg[2],role=>'CLOSE'))] # <$expr1|$expr2 maybeBraket[$langle,$expr1,$bar,$expr2] maybeBraket : RANGLE { SawNotation('QM'); } addScripts[InterpretDelimited(New('inner-product', undef,role=>'MIDDLE'), Annotate($arg[0],role=>'OPEN'),$arg[1], Annotate($arg[2],role=>'MIDDLE'), $arg[3],Annotate($item[1],role=>'CLOSE'))] | VERTBAR ketExpression RANGLE { SawNotation('QM'); } addScripts[InterpretDelimited(New('quantum-operator-product',undef), # Is this a good representation? Annotate($arg[0],role=>'OPEN'),$arg[1], Annotate($arg[2],role=>'CLOSE'), $arg[3], Annotate($item[1],role=>'OPEN'),$item[2], Annotate($item[3],role=>'CLOSE'))] # bra's and ket's (ie ) can contain a rather wide variety of things # from simple symbols to full (but typically short) formula, and so we # want to use the Formulae production. However, for that to work, # we need to keep |, < and > (which delimit the bra & ket) from being # interpreted as usual, otherwise the parse will walk off the end, or # fail at a level that precludes backtracking. 
ketExpression : ketExpression : ketExpression : Formulae | METARELOP | ARROW | AddOp | MulOp | MODIFIEROP #====================================================================== # absExpression; need to be careful about misinterpreting the next | # since we can't backtrack across productions. # Disable evalAt notation ( |_{x=0} ) and explicitly control abs nesting. absExpression : absExpression : absExpression : { ($MaxAbsDepth >= 0 ? 1 : (SawNotation('AbsFail')&& undef)); } Expression #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # Adding pre|post sub|super scripts to various things. #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # addScripts[$base] ; adds any following sub/super scripts to $base. addScripts : /^\Z/ { $arg[0];} # short circuit! | POSTSUPERSCRIPT addScripts[NewScript($arg[0],$item[1])] | POSTSUBSCRIPT addScripts[NewScript($arg[0],$item[1])] | POSTFIX addScripts[Apply($item[1],$arg[0])] | { $arg[0]; } # ================================================================================ # preScripted['RULE']; match a RULE possibly preceded by sub/super prescripts, # possibly followed by sub/superscripts. 
The initial prescript can only be FLOAT # but the following ones can be either POST (which combine) or FLOAT (which don't) preScripted : FLOATSUPERSCRIPT inpreScripted[$arg[0]] { NewScript($item[2],$item[1], 'pre');} | FLOATSUBSCRIPT inpreScripted[$arg[0]] { NewScript($item[2],$item[1], 'pre');} | addScripts[$item[1]] inpreScripted : POSTSUPERSCRIPT inpreScripted[$arg[0]] { NewScript($item[2],$item[1], 'pre');} | POSTSUBSCRIPT inpreScripted[$arg[0]] { NewScript($item[2],$item[1], 'pre');} | FLOATSUPERSCRIPT inpreScripted[$arg[0]] { NewScript($item[2],$item[1], 'pre');} | FLOATSUBSCRIPT inpreScripted[$arg[0]] { NewScript($item[2],$item[1], 'pre');} | addScripts[$item[1]] #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # Parenthetical: Things wrapped in OPEN .. CLOSE #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # ================================================================================ # Factors that begin with OPEN; grouped expressions and objects like sets, # intervals, etc. # factorOpen[$open] : Dealing with various things that start with an open. factorOpen : AddOp balancedClose[$arg[0]] addScripts[Fence($arg[0],$item[1],$item[2])] # For (-) # Parenthesized Operator possibly w/scripts | preScripted['bigop'] balancedClose[$arg[0]] addScripts[Fence($arg[0],$item[1],$item[2])] Factor { Apply($item[3],$item[4]); } # Parenthesized Operator including a pre-factor | Factor preScripted['bigop'] balancedClose[$arg[0]] addScripts[Fence($arg[0], Apply(InvisibleTimes(),$item[1],$item[2]),$item[3])] Factor { Apply($item[4],$item[5]); } # read expression too? match subcases. | Expression factorOpenExpr[$arg[0],$item[1]] # Empty OPEN CLOSE ? | balancedClose[$arg[0]] addScripts[Fence($arg[0],$item[1])] # Sequence starting with an operator ? | AnyOp factorOpenExpr[$arg[0],$item[1]] # factorOpenExpr[$open,$expr]; Try to recognize various things that start # this way. 
# Need some extra productions for sets (w/possible middle '|' )
# and vectors; all n-ary.
factorOpenExpr :
    # 2nd expression; some kind of pair, interval, set, whatever [Any CLOSE, NOT balancedClose]
    (PUNCT Expression { [$item[1],$item[2]]; })(s) CLOSE
        addScripts[Fence($arg[0],$arg[1],map(@$_,@{$item[1]}),$item[2])]
    # only 2 things and 2nd one is an op?; some kind of group???
  | PUNCT AnyOp balancedClose[$arg[0]]
        addScripts[InterpretDelimited(New('group'),
                     $arg[0],$arg[1],$item[1],$item[2],$item[3])]
    # parenthesized expression.
  | balancedClose[$arg[0]] addScripts[Fence($arg[0],$arg[1],$item[1])]

# ================================================================================
# Sets special cases
# A conditionalized set
scriptFactorOpen :
    Formula suchThatOp Formulae balancedClose[$arg[0]]
        addScripts[InterpretDelimited(New('conditional-set'),
                     $arg[0], $item[1],$item[2], $item[3],$item[4])]
    # Else fall through to normal factorOpen
  | factorOpen[$arg[0]]

# The "such that" that can appear in sets like {a "such that" predicate(a)}
# accept vertical bars, and colon
suchThatOp : MIDDLE | VERTBAR | /METARELOP:colon:\d+/ { Lookup($item[1]); }

# ================================================================================
# Function args, etc.

# maybeArgs[$function] ; Add arguments to an identifier, but only if made explicit.
maybeArgs :
    /^\Z/ { $arg[0];} # short circuit!
  | APPLYOP requireArgs[$arg[0]]
  | { $arg[0]; }

# doubtArgs[$unknown]; Check for apparent arguments following an
# Unknown (unclassified) item. If an explicit APPLYOP follows,
# it seemingly asserts that the preceding _is_ a function,
# otherwise Warn if there seems to be an arglist.
doubtArgs :
    /^\Z/ { $arg[0];} # short circuit!
  | APPLYOP requireArgs[$arg[0]]
  | OPEN forbidArgs[$arg[0],$item[1]]
  | { $arg[0]; }

# forbidArgs[$unknown,$open]; Got a suspicious pattern: an unknown and open.
# If the following seems to be an argument list, warn.
forbidArgs :
    Argument (argPunct Argument)(s) balancedClose[$arg[1]]
        { MaybeFunction($arg[0]); undef; }
    # Term really could be Argument, but that gives a "possible function" warning
    # even for a(b+c) which has a good reason for the parentheses; These patterns FAIL anyway!!
  | Term balancedClose[$arg[1]] { MaybeFunction($arg[0]); undef; }

# requireArgs[$function]; Add arguments following a known function, failing if it
# isn't there! Typically this follows an explicit applyop
requireArgs :
    OPEN Argument (argPunct Argument {[$item[1],$item[2]];})(s?) balancedClose[$item[1]]
        { ApplyDelimited($arg[0],$item[1],$item[2], map(@$_,@{$item[3]}),$item[4]); }
    # Hmm, should only be applicable to _some_ functions ???
  | barearg { Apply($arg[0],$item[1]); }

# addArgs[$function]; We've got a function; Add following arguments to a
# function, if present. Also recognizes composition type ops (something
# combining two functions into a function)
addArgs :
    /^\Z/ { $arg[0];} # short circuit!
  | addEasyArgs[$arg[0]]
    # Accept bare arg (w/o parens) ONLY if an explicit APPLYOP
  | APPLYOP barearg { Apply($arg[0],$item[2]);}
  | { $arg[0]; } # Just return the function itself, then.

# addOpFunArgs[$function]; Same as above but for functions classified as
# OPFUNCTION. Ie operator-like functions such as \sin, that don't
# absolutely require parens around args.
addOpFunArgs :
    /^\Z/ { $arg[0];} # short circuit!
  | addEasyArgs[$arg[0]]
    # Accept bare arg (w/o parens) for this class of functions.
  | APPLYOP(?) barearg { Apply($arg[0],$item[2]);}
  | { $arg[0]; } # Just return the function itself, then.

# addTrigFunArgs[$function]; Yet another variation;
# It differs in that the barearg is restricted to non-trig
addTrigFunArgs :
    /^\Z/ { $arg[0];} # short circuit!
  | addEasyArgs[$arg[0]]
    # Accept bare arg (w/o parens) for this class of functions.
  | APPLYOP(?) trigBarearg { Apply($arg[0],$item[2]);}
  | { $arg[0]; } # Just return the function itself, then.
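# ----------------------------------------------------------------------
# Illustrative summary (an editorial sketch read off the three rules above,
# not a statement of tested behavior; what actually happens also depends on
# the roles the lexer assigns and on whether an APPLYOP token is present):
#
#   token class     rule             unparenthesized (bare) argument
#   FUNCTION        addArgs          taken only after an explicit APPLYOP
#   OPFUNCTION      addOpFunArgs     taken; a preceding APPLYOP is optional
#   TRIGFUNCTION    addTrigFunArgs   taken; a preceding APPLYOP is optional, but
#                                    the argument is parsed by trigBarearg, which
#                                    has no TRIGFUNCTION alternative
#
# All three try addEasyArgs first (an explicit COMPOSEOP, or a parenthesized
# argument list), and all three fall back to returning the function unchanged
# when nothing matches, leaving the following tokens to the calling rule.
# ----------------------------------------------------------------------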
# addEasyArgs[$function]; gets unambiguous compositions or parenthesized arguments # These are the "easy" cases for addArgs and addOpFunArgs. addEasyArgs : COMPOSEOP makeComposition[$arg[0],$item[1]] | APPLYOP(?) OPEN Argument (argPunct Argument {[$item[1],$item[2]];})(s?) balancedClose[$item[2]] { ApplyDelimited($arg[0],$item[2],$item[3], map(@$_,@{$item[4]}),$item[5]); } # A function (or other) argument would normally be a simple expression, # but often relations (esp. Statistics) or arrows appear, so allow those as well. Argument : Expression extendArgument[$item[1]] # recognize some longer form "arguments"; things that may look like relations. extendArgument : /^\Z/ { $arg[0]; } # short circuit | relopExpr(s) extendArgument[NewFormula($arg[0],map(@$_,@{$item[1]}))] | METARELOP Formula extendArgument[Apply($item[1],$arg[0],$item[2])] | { $arg[0]; } # makeComposition[$thing,$comp]; Given something that presumably is a function, # and a composition operator, read another function and possibly args makeComposition : preScripted['FUNCTION'] addArgs[Apply($arg[1],$arg[0],$item[1])] { $item[2]; } | preScripted['OPFUNCTION'] addOpFunArgs[Apply($arg[1],$arg[0],$item[1])] { $item[2]; } | preScripted['TRIGFUNCTION'] addTrigFunArgs[Apply($arg[1],$arg[0],$item[1])] { $item[2]; } # Given an explicit composition operator, the next thing may safely(?) # be assumed to be a function, so treat it as such. | Factor addArgs[Apply($arg[1],$arg[0],$item[1])] { $item[2]; } # addOpArgs[$bigop]; Add following Term to a bigop, if present. addOpArgs : /^\Z/ { $arg[0];} # short circuit! # Is the APPLYOP getting "lost" here? | APPLYOP(?) Factor moreOpArgFactors[$item[2]] { Apply($arg[0],$item[3]);} | { $arg[0]; } # moreOpArgFactors[$factor1] : Similar to moreFactors, # but w/o evalAtOp since that most likely belongs to the operator, not # the factors. moreOpArgFactors : /^\Z/ { $arg[0];} # short circuit! 
        | MulOp Factor moreOpArgFactors[ApplyNary($item[1],$arg[0],$item[2])]
        | Factor moreOpArgFactors[ApplyNary(InvisibleTimes(),$arg[0],$item[1])]
        | { $arg[0]; }

# Punctuation separating function arguments; things marked MIDDLE could
# also separate arguments
# With great trepidation, I'm adding VERTBAR here
argPunct : PUNCT | MIDDLE | VERTBAR

# ================================================================================
# Operator args, etc.

# nestOperators[$operator*]; Nest a possible sequence of operators
nestOperators : /^\Z/ { recApply(@arg); }
        | OPERATOR addScripts[$item[1]] nestOperators[@arg,$item[2]]
        | FUNCTION addScripts[$item[1]] { recApply(@arg,$item[2]); }
        | OPFUNCTION addScripts[$item[1]] { recApply(@arg,$item[2]); }
        | TRIGFUNCTION addScripts[$item[1]] { recApply(@arg,$item[2]); }
        | OPEN Expression balancedClose[$item[1]]
                { recApply(@arg[0..$#arg-1], ApplyDelimited($arg[$#arg],$item[1],$item[2],$item[3])); }
        | { recApply(@arg); }

#%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
# (slightly) structured operators
#%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

# Same as anyop, at the moment.
AnyOp : relop | METARELOP | ARROW | AddOp | MulOp | MODIFIEROP | preScripted['bigop'] | OPERATOR addScripts[$item[1]]

# Sub or superscripts on operators;
# we recognize the structure, not necessarily the meaning
AddOp : BINOP addOpDecoration[$item[1]]
      | ADDOP addOpDecoration[$item[1]]
MulOp : BINOP addOpDecoration[$item[1]]
      | MULOP addOpDecoration[$item[1]]
# (BINOP can never really be satisfactory; it comes from something marked
# as \mathbin; we don't know any more about it)

# Decorations for an operator; Same thing as addScripts, but not allowing POSTFIX
addOpDecoration : /^\Z/ { $arg[0];} # short circuit!
        | POSTSUPERSCRIPT addOpDecoration[DecorateOperator($arg[0],$item[1])]
        | POSTSUBSCRIPT addOpDecoration[DecorateOperator($arg[0],$item[1])]
        | { $arg[0]; }

#%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
# Pseudo-Terminals.
# Useful combinations or subsets of terminals.
#%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

# A generalized relational operator or arrow
# Note we disallow < or > if we're parsing the contents of a bra or ket!
relop : { ($forbidLRAngle ? 1 : undef); } /RELOP:(less|greater)-than:\d+/
      | RELOP addOpDecoration[$item[1]]
      | ARROW addOpDecoration[$item[1]]

# Check out whether diffop should be treated as bigop or operator
# It depends on the binding
bigop : BIGOP | SUMOP | INTOP | LIMITOP | DIFFOP
operator: OPERATOR

# SUPOP is really only \prime(s) (?)
supops : SUPOP(s) { New(undef, join('',map($_->textContent,@{$item[1]})), name=>'prime'.scalar(@{$item[1]})); }

# ================================================================================
# And some special cases...

# Match a CLOSE that `corresponds' to the OPEN
balancedClose : CLOSE { (isMatchingClose($arg[0],$item[1]) ? 1 : undef) } { $item[1]; }

# The "evaluated at" operator, typically a vertical bar followed by a subscript
# equation. But it is often used in \left. \right| pairs!
evalAtOp : VERTBAR | /CLOSE:\|:\d+/ { Lookup($item[1]); }

#%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
# Terminals / Lexer
# These correspond to the TeX tokens.
# The Lexer strings are of the form TYPE:NAME:NUMBER where
#   TYPE   is the grammatical role, or part of speech,
#   NAME   is the specific name (semantic or presentation) of the token
#   NUMBER is the position of the specific token in the current token sequence.
#
# NOTE: RecDescent doesn't clearly distinguish lexing from parsing
# and so it allows us to interpret the same item as several distinct
# terminals; Presumably other parsers would not allow this.
# In a couple of cases, we have symbols that can be used in a few
# different ways:
#   | as vertical bar, open or close, also as a close used for eval-at!
#   : as meta-relation, as such-that
#   <, > can be relop or part of brackets (eg. qm, etc)
# Perhaps these symbols should get a special role reflecting their specialness
# and then have pseudo-terminals that combine (eg. relop == RELOP | langle)
# This nibbles at the edge of the Ambiguity issue; if it turns out that
# a multi-meaning symbol gets used in a particular way, we'd want to ensure
# that its role, meaning, etc, gets changed to reflect the specific usage!
#
# Upon reflection, this implies that OPEN|CLOSE are rather awkward as roles.
# \left< can be an OPEN _or_ RELOP
#%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
ATOM : /ATOM:\S*:\d+/ { Lookup($item[1]); }
UNKNOWN : /UNKNOWN:\S*:\d+/ { Lookup($item[1]); }
ID : /ID:\S*:\d+/ { Lookup($item[1]); }
ARRAY : /ARRAY:\S*:\d+/ { Lookup($item[1]); }
NUMBER : /NUMBER:\S*:\d+/ { Lookup($item[1]); }
PUNCT : /PUNCT:\S*:\d+/ { Lookup($item[1]); }
PERIOD : /PERIOD:\S*:\d+/ { Lookup($item[1]); }
RELOP : /RELOP:\S*:\d+/ { Lookup($item[1]); }
LANGLE : /RELOP:less-than:\d+/ { Lookup($item[1]); }
       | /OPEN:langle:\d+/ { Lookup($item[1]); }
RANGLE : /RELOP:greater-than:\d+/ { Lookup($item[1]); }
       | /CLOSE:rangle:\d+/ { Lookup($item[1]); }
LBRACE : /OPEN:\{:\d+/ { Lookup($item[1]); }
RBRACE : /CLOSE:\}:\d+/ { Lookup($item[1]); }
METARELOP : /METARELOP:\S*:\d+/ { Lookup($item[1]); }
MODIFIEROP : /MODIFIEROP:\S*:\d+/ { Lookup($item[1]); }
MODIFIER : /MODIFIER:\S*:\d+/ { Lookup($item[1]); }
ARROW : /ARROW:\S*:\d+/ { Lookup($item[1]); }
ADDOP : /ADDOP:\S*:\d+/ { Lookup($item[1]); }
MULOP : /MULOP:\S*:\d+/ { Lookup($item[1]); }
BINOP : /BINOP:\S*:\d+/ { Lookup($item[1]); }
POSTFIX : /POSTFIX:\S*:\d+/ { Lookup($item[1]); }
FUNCTION : /FUNCTION:\S*:\d+/ { Lookup($item[1]); }
OPFUNCTION : /OPFUNCTION:\S*:\d+/ { Lookup($item[1]); }
TRIGFUNCTION :
        /TRIGFUNCTION:\S*:\d+/ { Lookup($item[1]); }
APPLYOP : /APPLYOP:\S*:\d+/ { Lookup($item[1]); }
COMPOSEOP : /COMPOSEOP:\S*:\d+/ { Lookup($item[1]); }
SUPOP : /SUPOP:\S*:\d+/ { Lookup($item[1]); }
OPEN : /OPEN:\S*:\d+/ { Lookup($item[1]); }
SCRIPTOPEN : /OPEN:\{:\d+/ { Lookup($item[1]); }
CLOSE : /CLOSE:\S*:\d+/ { Lookup($item[1]); }
MIDDLE : /MIDDLE:\S*:\d+/ { Lookup($item[1]); }
VERTBAR : /VERTBAR:\S*:\d+/ { Lookup($item[1]); }
SINGLEVERTBAR : /VERTBAR:\|:\d+/ { Lookup($item[1]); }
BIGOP : /BIGOP:\S*:\d+/ { Lookup($item[1]); }
SUMOP : /SUMOP:\S*:\d+/ { Lookup($item[1]); }
INTOP : /INTOP:\S*:\d+/ { Lookup($item[1]); }
LIMITOP : /LIMITOP:\S*:\d+/ { Lookup($item[1]); }
DIFFOP : /DIFFOP:\S*:\d+/ { Lookup($item[1]); }
OPERATOR : /OPERATOR:\S*:\d+/ { Lookup($item[1]); }
##DIFF : /DIFF:\S*:\d+/ { Lookup($item[1]); }
POSTSUBSCRIPT : /POSTSUBSCRIPT:\S*:\d+/ { Lookup($item[1]); }
POSTSUPERSCRIPT : /POSTSUPERSCRIPT:\S*:\d+/ { Lookup($item[1]); }
FLOATSUPERSCRIPT : /FLOATSUPERSCRIPT:\S*:\d+/ { Lookup($item[1]); }
FLOATSUBSCRIPT : /FLOATSUBSCRIPT:\S*:\d+/ { Lookup($item[1]); }
#%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

# File: latexml-0.8.1/lib/LaTeXML/MathParser.pm
# /=====================================================================\ #
# |  LaTeXML::MathParser                                                | #
# |  Parse Math                                                         | #
# |=====================================================================| #
# | Part of LaTeXML:                                                    | #
# |  Public domain software, produced as part of work done by the       | #
# |  United States Government & not subject to copyright in the US.
| # # |---------------------------------------------------------------------| # # | Bruce Miller #_# | # # | http://dlmf.nist.gov/LaTeXML/ (o o) | # # \=========================================================ooo==U==ooo=/ # # ================================================================================ # LaTeXML::MathParser Math Parser for LaTeXML using Parse::RecDescent. # Parse the intermediate representation generated by the TeX processor. # ================================================================================ package LaTeXML::MathParser; use strict; use warnings; use Parse::RecDescent; use LaTeXML::Global; use LaTeXML::Common::Object; use LaTeXML::Common::Error; use LaTeXML::Core::Token; use LaTeXML::Common::Font; use LaTeXML::Common::XML; use base (qw(Exporter)); our @EXPORT_OK = (qw(&Lookup &New &Absent &Apply &ApplyNary &recApply &Annotate &InvisibleTimes &InvisibleComma &NewFormulae &NewFormula &NewList &ApplyDelimited &NewScript &DecorateOperator &InterpretDelimited &LeftRec &Arg &MaybeFunction &SawNotation &IsNotationAllowed &isMatchingClose &Fence)); our %EXPORT_TAGS = (constructors => [qw(&Lookup &New &Absent &Apply &ApplyNary &recApply &Annotate &InvisibleTimes &InvisibleComma &NewFormulae &NewFormula &NewList &ApplyDelimited &NewScript &DecorateOperator &InterpretDelimited &LeftRec &Arg &MaybeFunction &SawNotation &IsNotationAllowed &isMatchingClose &Fence)]); # ================================================================================ sub new { my ($class, %options) = @_; require LaTeXML::MathGrammar; my $internalparser = LaTeXML::MathGrammar->new(); Fatal("expected", "MathGrammar", undef, "Compilation of Math Parser grammar failed") unless $internalparser; my $self = bless { internalparser => $internalparser }, $class; return $self; } sub parseMath { my ($self, $document, %options) = @_; local $LaTeXML::MathParser::DOCUMENT = $document; $self->clear; # Not reentrant! 
if (my @math = $document->findnodes('descendant-or-self::ltx:XMath[not(ancestor::ltx:XMath)]')) { NoteBegin("Math Parsing"); NoteProgress(scalar(@math) . " formulae ..."); #### SEGFAULT TEST #### $document->doctest("before parse",1); foreach my $math (@math) { $self->parse($math, $document); } NoteProgress("\nMath parsing succeeded:" . join('', map { "\n $_: " . $$self{passed}{$_} . "/" . ($$self{passed}{$_} + $$self{failed}{$_}) } grep { $$self{passed}{$_} + $$self{failed}{$_} } keys %{ $$self{passed} }) . "\n"); if (my @unk = keys %{ $$self{unknowns} }) { NoteProgress("Symbols assumed as simple identifiers (with # of occurences):\n " . join(', ', map { "'$_' ($$self{unknowns}{$_})" } sort @unk) . "\n"); } if (my @funcs = keys %{ $$self{maybe_functions} }) { NoteProgress("Possibly used as functions?\n " . join(', ', map { "'$_' ($$self{maybe_functions}{$_}/$$self{unknowns}{$_} usages)" } sort @funcs) . "\n"); } #### SEGFAULT TEST #### $document->doctest("IN scope",1); NoteEnd("Math Parsing"); } #### SEGFAULT TEST #### $document->doctest("OUT of scope",1); return $document; } sub getQName { my ($node) = @_; if (ref $node eq 'ARRAY') { return $$node[0]; } else { return $LaTeXML::MathParser::DOCUMENT->getModel->getNodeQName($node); } } sub realizeXMNode { my ($node) = @_; my $doc = $LaTeXML::MathParser::DOCUMENT; my $idref; if (!defined $node) { return; } elsif (ref $node eq 'ARRAY') { $idref = $$node[1]{idref} if $$node[0] eq 'ltx:XMRef'; } elsif (ref $node eq 'XML::LibXML::Element') { $idref = $node->getAttribute('idref') if $doc->getModel->getNodeQName($node) eq 'ltx:XMRef'; } if ($idref) { # Can it happen that $realnode is, itself, an XMRef? Then we should recurse! 
if (my $realnode = $doc->lookupID($idref)) { return $realnode; } else { Error("expected", $idref, undef, "Cannot find a node with xml:id='$idref'"); return ['ltx:ERROR', {}, "Missing XMRef idref=$idref"]; } } else { return $node; } } # ================================================================================ sub clear { my ($self) = @_; $$self{passed} = { 'ltx:XMath' => 0, 'ltx:XMArg' => 0, 'ltx:XMWrap' => 0 }; $$self{failed} = { 'ltx:XMath' => 0, 'ltx:XMArg' => 0, 'ltx:XMWrap' => 0 }; $$self{unknowns} = {}; $$self{maybe_functions} = {}; $$self{n_parsed} = 0; return; } sub token_prettyname { my ($node) = @_; my $name = $node->getAttribute('name'); if (defined $name) { } elsif ($name = $node->textContent) { my $font = $LaTeXML::MathParser::DOCUMENT->getNodeFont($node); my %attr = $font->relativeTo(LaTeXML::Common::Font->textDefault); my $desc = join(' ', map { ToString($attr{$_}{value}) } keys %attr); $name .= "{$desc}" if $desc; } else { $name = Stringify($node); } # what else ???? return $name; } sub note_unknown { my ($self, $node) = @_; my $name = token_prettyname($node); $$self{unknowns}{$name}++; return; } # debugging utility, should be somewhere handy. sub printNode { my ($node) = @_; if (ref $node eq 'ARRAY') { my ($tag, $attr, @children) = @$node; my @keys = sort keys %$attr; return "<$tag" . (@keys ? ' ' . join(' ', map { "$_='$$attr{$_}'" } @keys) : '') . (@children ? ">\n" . join('', map { printNode($_) } @children) . "" : '/>') . "\n"; } else { return ToString($node); } } # ================================================================================ # Some more XML utilities, but math specific (?) # Get the Token's meaning, else name, else content, else role sub getTokenMeaning { my ($node) = @_; my $x; $node = realizeXMNode($node); return (defined($x = $node->getAttribute('meaning')) ? $x : (defined($x = $node->getAttribute('name')) ? $x : (($x = $node->textContent) ne '' ? $x : (defined($x = $node->getAttribute('role')) ? 
$x : undef)))); } sub node_location { my ($node) = @_; my $n = $node; while ($n && (ref $n !~ /^XML::LibXML::Document/) # Sometimes DocuementFragment ??? && !$n->getAttribute('refnum') && !$n->getAttribute('labels')) { $n = $n->parentNode; } if ($n && (ref $n !~ /^XML::LibXML::Document/)) { my ($r, $l) = ($n->getAttribute('refnum'), $n->getAttribute('labels')); return ($r && $l ? "$r ($l)" : $r || $l); } else { return 'Unknown'; } } #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # Parser #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # Top-level per-formula parse. # We do a depth-first traversal of the content of the XMath element, # since various sub-elements (XMArg & XMWrap) act as containers of # nominally complete subexpressions. # We do these first for two reasons. # Firstly, since after parsing, the parent will be rebuilt from the result, # we lose the node "identity"; ie. we can't find the child to replace it! # Secondly, in principle (although this isn't used yet), parsing the # child could reveal something interesting about it; say, it's effective role. # Then, this information could be used when parsing the parent. # In fact, this could work the other way too; parsing the parent could tell # us something about what the child must be.... sub parse { my ($self, $xnode, $document) = @_; local $LaTeXML::MathParser::STRICT = 1; local $LaTeXML::MathParser::WARNED = 0; local $LaTeXML::MathParser::XNODE = $xnode; local $LaTeXML::MathParser::PUNCTUATION = {}; local $LaTeXML::MathParser::LOSTNODES = {}; if (my $result = $self->parse_rec($xnode, 'Anything,', $document)) { # Add text representation to the containing Math element. my $p = $xnode->parentNode; # This is a VERY screwy situation? How can the parent be a document fragment?? # This has got to be a LibXML bug??? 
if ($p->nodeType == XML_DOCUMENT_FRAG_NODE) { my @n = $p->childNodes; if (scalar(@n) == 1) { $p = $n[0]; } else { Fatal('malformed', '', $xnode, "XMath node has DOCUMENT_FRAGMENT for parent!"); } } # HACK: replace XMRef's to stray trailing punctution foreach my $id (keys %$LaTeXML::MathParser::PUNCTUATION) { my $r = $$LaTeXML::MathParser::PUNCTUATION{$id}->cloneNode; $r->removeAttribute('xml:id'); foreach my $n ($document->findnodes("descendant-or-self::ltx:XMRef[\@idref='$id']", $p)) { $document->replaceTree($r, $n); } } foreach my $id (keys %$LaTeXML::MathParser::LOSTNODES) { foreach my $n ($document->findnodes("descendant-or-self::ltx:XMRef[\@idref='$id']", $p)) { $document->setAttribute($n, idref => $$LaTeXML::MathParser::LOSTNODES{$id}); } } $p->setAttribute('text', text_form($result)); } return; } my %TAG_FEEDBACK = ('ltx:XMArg' => 'a', 'ltx:XMWrap' => 'w'); # [CONSTANT] # Recursively parse a node with some internal structure # by first parsing any structured children, then it's content. sub parse_rec { my ($self, $node, $rule, $document) = @_; $self->parse_children($node, $document); # This will only handle 1 layer nesting (successfully?) # Note that this would have been found by the top level xpath, # but we've got to worry about node identity: the parent is being rebuilt foreach my $nested ($document->findnodes('descendant::ltx:XMath', $node)) { $self->parse($nested, $document); } my $tag = getQName($node); if (my $requested_rule = $node->getAttribute('rule')) { $rule = $requested_rule; } if (my $result = $self->parse_single($node, $document, $rule)) { $$self{passed}{$tag}++; if ($tag eq 'ltx:XMath') { # Replace the content of XMath with parsed result NoteProgress('[' . ++$$self{n_parsed} . 
']'); map { $document->unRecordNodeIDs($_) } element_nodes($node); # unbindNode followed by (append|replace)Tree (which removes ID's) should be safe map { $_->unbindNode() } $node->childNodes; $document->appendTree($node, $result); $result = [element_nodes($node)]->[0]; } else { # Replace the whole node for XMArg, XMWrap; preserve some attributes NoteProgressDetailed($TAG_FEEDBACK{$tag} || '.'); # Copy all attributes my $resultid = p_getAttribute($result, 'xml:id'); my %attr = map { (getQName($_) => $_->getValue) } grep { $_->nodeType == XML_ATTRIBUTE_NODE } $node->attributes; # add to result, even allowing modification of xml node, since we're committed. # [Annotate converts node to array which messes up clearing the id!] my $isarr = ref $result eq 'ARRAY'; my $rtag = ($isarr ? $$result[0] : $document->getNodeQName($result)); # Make sure font is "Appropriate", if we're creating a new token (yuck) if ($isarr && $attr{_font} && ($rtag eq 'ltx:XMTok')) { my $content = join('', @$result[2 .. $#$result]); if ((!defined $content) || ($content eq '')) { delete $attr{_font}; } # No font needed elsif (my $font = $document->decodeFont($attr{_font})) { delete $attr{_font}; $attr{font} = $font->specialize($content); } } else { delete $attr{_font}; } foreach my $key (keys %attr) { next unless ($key =~ /^_/) || $document->canHaveAttribute($rtag, $key); my $value = $attr{$key}; if ($key eq 'xml:id') { # Since we're moving the id...bookkeeping $document->unRecordID($value); $node->removeAttribute('xml:id'); } if ($isarr) { $$result[1]{$key} = $value; } else { $document->setAttribute($result, $key => $value); } } $result = $document->replaceTree($result, $node); my $newid = $attr{'xml:id'}; # Danger: the above code replaced the id on the parsed result with the one from XMArg,.. # If there are any references to $resultid, we need to point them to $newid! 
if ($resultid && $newid && ($resultid ne $newid)) { foreach my $ref ($document->findnodes("//*[\@idref='$resultid']")) { $ref->setAttribute(idref => $newid); } } } return $result; } else { $self->parse_kludge($node, $document); if ($tag eq 'ltx:XMath') { NoteProgress('[F' . ++$$self{n_parsed} . ']'); } elsif ($tag eq 'ltx:XMArg') { NoteProgressDetailed('-a'); } $$self{failed}{$tag}++; return; } } # Depth first parsing of XMArg nodes. sub parse_children { my ($self, $node, $document) = @_; foreach my $child (element_nodes($node)) { my $tag = getQName($child); if ($tag eq 'ltx:XMArg') { $self->parse_rec($child, 'Anything', $document); } elsif ($tag eq 'ltx:XMWrap') { local $LaTeXML::MathParser::STRICT = 0; $self->parse_rec($child, 'Anything', $document); } ### A nice evolution would be to use the Kludge parser for ### the presentation form in XMDual ### This would avoid silly "parses" of non-semantic stuff; eg assuming times between tokens! ### However, it needs some experimentation to match DLMF's enhancements #### $self->parse_children($child,$document); #### $self->parse_kludge($child,$document); } elsif ($tag =~ /^ltx:(XMApp|XMArray|XMRow|XMCell)$/) { $self->parse_children($child, $document); } elsif ($tag eq 'ltx:XMDual') { $self->parse_children($child, $document); } } return; } my $HINT_PUNCT_THRESHOLD = 10.0; # \quad or bigger becomes punctuation ? [CONSTANT] sub filter_hints { my ($self, $document, @nodes) = @_; my @prefiltered = (); my $prev = undef; my $pending_comments = ''; my $pending_space = 0.0; # Filter the nodes, watching for XMHint's and Comments. foreach my $node (@nodes) { my $type = $node->nodeType; if ($type == XML_TEXT_NODE) { # text? better be (ignorable) whitespace! my $string = $node->textContent; if ($string =~ /\S/) { Warn('unexpected', 'text', $node, "Unexpected text in Math tree"); } } elsif ($type == XML_COMMENT_NODE) { # Comment! 
my $comment = $node->getData; if ($prev) { # Append to previous element's comments my $c = $prev->getAttribute('_comment'); $prev->setAttribute(_comment => ($c ? $c . "\n" . $comment : $comment)); } else { # Or save for first $pending_comments = ($pending_comments ? $pending_comments . "\n" . $comment : $comment); } } elsif ($type != XML_ELEMENT_NODE) { Warn('unexpected', 'node', $node, "Unexpected item in Math tree"); } elsif (getQName($node) eq 'ltx:XMHint') { # If a Hint node? if (my $width = $node->getAttribute('width')) { if (my $pts = getXMHintSpacing($width)) { if ($prev) { my $s = $prev->getAttribute('_space') || 0.0; $prev->setAttribute(_space => $s + $pts); } else { $pending_space += $pts; } } } } else { # Some other element. if ($pending_comments) { $node->setAttribute(_pre_comment => $pending_comments); $pending_comments = ''; } if ($pending_space) { $node->setAttribute(lpadding => LaTeXML::Common::Dimension::attributeformat($pending_space * 65536)); $pending_space = 0.0; } push(@prefiltered, $node); $prev = $node; } # Keep it. } my @filtered = (); # Filter through the pre-filtered nodes looking for large rpadding. foreach my $node (@prefiltered) { push(@filtered, $node); if (my $s = $node->getAttribute('_space')) { $node->removeAttribute('_space'); if (($s >= $HINT_PUNCT_THRESHOLD) && (($node->getAttribute('role') || '') ne 'PUNCT')) { # Create a new Punctuation node (XMTok) from the wide space # I'm leary that this is a safe way to create an XML node that's NOT in the tree, but... my $p = $node->parentNode; # my $punct = $document->openElementAt($p,'ltx:XMTok',role=>'PUNCT',rpadding=>$s.'pt'); my $punct = $document->openElementAt($p, 'ltx:XMTok', role => 'PUNCT', name => ('q' x int($s / 10)) . 'uad'); $punct->appendText(spacingToString($s)); $p->removeChild($punct); # But don't actually leave it in the tree!!!! 
        push(@filtered, $punct); }
      else {
        $node->setAttribute(rpadding => LaTeXML::Common::Dimension::attributeformat($s * 65536)); } } }
  return @filtered; }

# Given a width attribute on an XMHint, return the pts, if any
sub getXMHintSpacing {
  my ($width) = @_;
  if ($width && ($width =~ /^$LaTeXML::Common::Glue::GLUE_re$/)) {
    return ($2 eq 'mu' ? $1 / 1.8 : $1); }
  else {
    return 0; } }

# We've pretty much built in the assumption that the target XML is "As If" 10 pts,
# so we'll assume that 1em is 10 pts.
my $POINTS_PER_EM = 10.0;    # [CONSTANT]
# Convert spacing, given as a number of points, to a string of appropriate spacing chars
sub spacingToString {
  my ($points) = @_;
  my $spacing = '';
  my $ems = $points / $POINTS_PER_EM;
  my $n = int($ems);
  if ($n > 0) { $spacing .= ("\x{2003}" x $n); $ems -= $n; }       # em spaces
  if ($ems > 0.500) { $spacing .= "\x{2002}"; $ems -= 0.500; }    # en space
  if ($ems > 0.333) { $spacing .= "\x{2004}"; $ems -= 0.333; }    # three-per-em space
  if ($ems > 0.250) { $spacing .= "\x{2005}"; $ems -= 0.250; }    # four-per-em space
  if ($ems > 0.166) { $spacing .= "\x{2006}"; $ems -= 0.166; }    # six-per-em space
  return $spacing; }

#%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
# Low-Level hack parsing when "real" parsing fails;
# Two issues cause generated Presentation MathML to be really bad:
# (1) not having mrow/mfenced structures wrapping OPEN...CLOSE sequences
#     throws off MathML's stretchiness treatment of the fences
#     (they're all the same size; big)
# (2) un-attached sub/superscripts won't position correctly,
#     unless they're attached to something plausible.
# NOTE: we should be able to optionally switch this off.
# Especially, when we want to try alternative parse strategies.
sub parse_kludge {
  my ($self, $mathnode, $document) = @_;
  my @nodes = $self->filter_hints($document, $mathnode->childNodes);
  # the 1st array in stack accumulates the nodes within the current fenced row.
  # When there's only a single array, its single entry will be the complete row.
my @stack = ([], []); my @pairs = map { [$_, $self->getGrammaticalRole($_)] } @nodes; while (scalar(@pairs) || (scalar(@stack) > 1)) { my $pair = shift(@pairs); my $role = ($pair ? $$pair[1] : 'CLOSE'); if ($role eq 'OPEN') { unshift(@stack, [$pair]); } # Start new fenced row; elsif ($role eq 'CLOSE') { # Close the current row my $row = shift(@stack); # get the current list of items push(@$row, $pair) if $pair; # Put the close (if any) into it my @kludged = $self->parse_kludgeScripts_rec(@$row); # handle scripts # wrap, if needed. $row = [(scalar(@kludged) > 1 ? ['ltx:XMWrap', {}, @kludged] : $kludged[0]), 'FENCED']; push(@{ $stack[0] }, $row); } # and put this constructed row at end of containing row. else { push(@{ $stack[0] }, $pair); } } # Otherwise, just put this item into current row. # If we got to here, remove the nodes and replace them by the kludged structure. map { $document->unRecordNodeIDs($_) } element_nodes($mathnode); # unbindNode followed by (append|replace)Tree (which removes ID's) should be safe map { $_->unbindNode() } $mathnode->childNodes; # We're hoping for a single list on the stack, # But extra CLOSEs will leave extra junk behind, so process all the stacked lists. my @replacements = (); foreach my $pair (@{ $stack[0] }) { my $kludge = $$pair[0]; push(@replacements, (ref $kludge eq 'ARRAY') && ($$kludge[0] eq 'ltx:XMWrap') ? @$kludge[2 .. $#$kludge] : ($kludge)); } $document->appendTree($mathnode, @replacements); return; } sub parse_kludgeScripts_rec { my ($self, $a, $b, @more) = @_; if ($$a[1] =~ /^FLOAT(SUB|SUPER)SCRIPT$/) { # a Floating script? maybe pre-script if (!defined $b) { # with nothing behind? => script on nothing return ($$a[0]); } # but just leave it alone. elsif ($$b[1] =~ /^POST(SUB|SUPER)SCRIPT$/) { # followed by another script? if (@more) { # followed by a something? 
=> Combined pre sub & super my ($base, @rest) = $self->parse_kludgeScripts_rec(@more); return (NewScript(NewScript($base, $$b[0], 'pre'), $$a[0], 'pre'), @rest); } else { # else floating sub & super return (NewScript(NewScript(Absent(), $$b[0], 'post'), $$a[0], 'post')); } } else { # else just prescript on whatever follows my ($base, @rest) = $self->parse_kludgeScripts_rec($b, @more); return (NewScript($base, $$a[0], 'pre'), @rest); } } elsif (!defined $b) { # isolated thing? return ($$a[0]); } elsif ($$b[1] =~ /^POST(SUB|SUPER)SCRIPT$/) { # or a postscript is applied to the preceding thing. return $self->parse_kludgeScripts_rec([NewScript($$a[0], $$b[0]), ''], @more); } else { # else skip over a and continue return ($$a[0], $self->parse_kludgeScripts_rec($b, @more)); } } # sub parse_kludge { # my($self,$mathnode,$document)=@_; # my @nodes = $self->filter_hints($document,$mathnode->childNodes); # map { $mathnode->removeChild($_) } @nodes; # my @result=(); # while(@nodes){ # @nodes = $self->parse_kludge_rec(@nodes); # push(@result,shift(@nodes)); } # $document->appendTree($mathnode,@result); } # # Kludge Parse the next thing, and then add any following scripts to it. 
# sub parse_kludge_rec { # my($self,@more)=@_; # my($item,$open,$close,$seps,$arg); # ($item,@more) = $self->parse_kludge_reca(@more); # while(@more && (($self->getGrammaticalRole($more[0])||'') =~ /^POST(SUB|SUPER)SCRIPT$/)){ # $item = NewScript($item,shift(@more)); } # if(@more && (($self->getGrammaticalRole($more[0])||'') eq 'APPLYOP')){ # shift(@more); # if(@more && (($self->getGrammaticalRole($more[0])||'') eq 'OPEN')){ # ($open,$close,$seps,$arg,@more)=$self->parse_kludge_fence(@more); # $item = Apply(Annotate($item,argopen=>$open, argclose=>$close, separators=>$seps),@$arg); } # else { # ($arg,@more)=$self->parse_kludge_rec(@more); # $item = Apply($item,$arg); }} # ($item,@more); } # sub parse_kludge_reca { # my($self,$next,@more)=@_; # my $role = $self->getGrammaticalRole($next); # if($role =~ /^FLOAT(SUB|SUPER)SCRIPT$/){ # my($base,@rest) = $self->parse_kludge_rec(@more); # (NewScript($base,$next),@rest); } # elsif($role eq 'OPEN'){ # my($open,$close,$seps,$list,@more)=$self->parse_kludge_fence($next,@more); # (Apply(Annotate(New(undef,undef,role=>'FENCED'), # argopen=>$open, argclose=>$close, separators=>$seps),@$list), @more); } # else { # ($next,@more); }} # sub parse_kludge_fence { # my($self,$next,@more)=@_; # my($open,$close,$punct,$r,$item,@list)=($next,undef,'',undef); # while(@more){ # my @i=(); # while(@more && (($r=($self->getGrammaticalRole($more[0])||'')) !~ /^(CLOSE|PUNCT)$/)){ # ($item,@more)=$self->parse_kludge_rec(@more); # push(@i,$item); } # push(@list,(scalar(@i > 1) ? ['ltx:XMWrap',{},@i] : $i[0])); # if($r eq 'CLOSE'){ # $close=shift(@more); last; } # else { # $punct .= ($punct ? ' ':''). p_getValue(shift(@more)); }} # Delimited by SINGLE SPACE! 
# ($open,$close,$punct,[@list],@more); } #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # Low-level Parser: parse a single expression #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # Convert to textual form for processing by MathGrammar sub parse_single { my ($self, $mathnode, $document, $rule) = @_; my @nodes = $self->filter_hints($document, $mathnode->childNodes); my ($punct, $result, $unparsed); # Extract trailing punctuation, if rule allows it. if ($rule =~ s/,$//) { my ($x, $r) = ($nodes[-1]); $punct = ($x && ($x = realizeXMNode($x)) && (getQName($x) eq 'ltx:XMTok') && ($r = $x->getAttribute('role')) && (($r eq 'PUNCT') || ($r eq 'PERIOD')) ? pop(@nodes) : undef); # Special case hackery, in case this thing is XMRef'd!!! # We could just stick it on the end of the presentation, # but it doesn't belong in the content at all!?!? if (my $id = $punct && $punct->getAttribute('xml:id')) { $$LaTeXML::MathParser::PUNCTUATION{$id} = $punct; } } if (scalar(@nodes) < 2) { # Too few nodes? What's to parse? $result = $nodes[0] || Absent(); } else { if ($LaTeXML::MathParser::DEBUG) { $::RD_TRACE = 1; # Turn on MathGrammar tracing my $box = $document->getNodeBox($LaTeXML::MathParser::XNODE); print STDERR "\n" . ('=' x 40) . "\nParsing formula \"" . ToString($box) . "\" from " . $box->getLocator . "\n == \"" . join(' ', map { node_string($_, $document) } @nodes) . "\"\n"; } # Now do the actual parse. ($result, $unparsed) = $self->parse_internal($rule, @nodes); } # Failure? No result or uparsed lexemes remain. # NOTE: Should do script hack?? if ((!defined $result) || $unparsed) { $self->failureReport($document, $mathnode, $rule, $unparsed, @nodes); return; } # Success! else { $result = Annotate($result, punctuation => $punct); if ($LaTeXML::MathParser::DEBUG) { print STDERR "\n=>" . ToString($result) . 
"\n"; } return $result; } } sub parse_internal { my ($self, $rule, @nodes) = @_; #------------ # Generate a textual token for each node; The parser operates on this encoded string. local $LaTeXML::MathParser::LEXEMES = {}; my $i = 0; my $lexemes = ''; foreach my $node (@nodes) { my $role = $self->getGrammaticalRole($node); my $text = getTokenMeaning($node); $text = 'Unknown' unless defined $text; my $lexeme = $role . ":" . $text . ":" . ++$i; $lexeme =~ s/\s//g; $$LaTeXML::MathParser::LEXEMES{$lexeme} = $node; $lexemes .= ' ' . $lexeme; } #------------ # apply the parser to the textified sequence. local $LaTeXML::MathParser::PARSER = $self; local %LaTeXML::MathParser::SEEN_NOTATIONS = (); local %LaTeXML::MathParser::DISALLOWED_NOTATIONS = (); local $LaTeXML::MathParser::MAX_ABS_DEPTH = 1; my $unparsed = $lexemes; my $result = $$self{internalparser}->$rule(\$unparsed); if (((!defined $result) || $unparsed) # If parsing Failed && $LaTeXML::MathParser::SEEN_NOTATIONS{QM}) { # & Saw some QM stuff. $LaTeXML::MathParser::DISALLOWED_NOTATIONS{QM} = 1; # Retry w/o QM notations $unparsed = $lexemes; $result = $$self{internalparser}->$rule(\$unparsed); } while (((!defined $result) || $unparsed) # If parsing Failed && ($LaTeXML::MathParser::SEEN_NOTATIONS{AbsFail}) # & Attempted deeper abs nesting? && ($LaTeXML::MathParser::MAX_ABS_DEPTH < 3)) { # & Not ridiculously deep delete $LaTeXML::MathParser::SEEN_NOTATIONS{AbsFail}; ++$LaTeXML::MathParser::MAX_ABS_DEPTH; # Try deeper. $unparsed = $lexemes; $result = $$self{internalparser}->$rule(\$unparsed); } # If still failed, try other strategies? 
return ($result, $unparsed); } sub getGrammaticalRole { my ($self, $node) = @_; $node = realizeXMNode($node); my $role = $node->getAttribute('role'); if (!defined $role) { my $tag = getQName($node); if ($tag eq 'ltx:XMTok') { $role = 'UNKNOWN'; } elsif ($tag eq 'ltx:XMDual') { $role = $LaTeXML::MathParser::DOCUMENT->getFirstChildElement($node)->getAttribute('role'); } $role = 'ATOM' unless defined $role; } $self->note_unknown($node) if ($role eq 'UNKNOWN') && $LaTeXML::MathParser::STRICT; return $role; } # How many tokens before & after the failure point to report in the Warning message. my $FAILURE_PRETOKENS = 3; # [CONSTANT] my $FAILURE_POSTTOKENS = 1; # [CONSTANT] sub failureReport { my ($self, $document, $mathnode, $rule, $unparsed, @nodes) = @_; if ($LaTeXML::MathParser::STRICT || (($STATE->lookupValue('VERBOSITY') || 0) > 1)) { my $loc = ""; # If we haven't already done it for this formula, show the original TeX. if (!$LaTeXML::MathParser::WARNED) { $LaTeXML::MathParser::WARNED = 1; my $box = $document->getNodeBox($LaTeXML::MathParser::XNODE); $loc = "In \"" . UnTeX($box) . "\""; } $unparsed =~ s/^\s*//; my @rest = split(/ /, $unparsed); my $pos = scalar(@nodes) - scalar(@rest); # Break up the input at the point where the parse failed. my $parsed = join(' ', map { node_string($_, $document) } @nodes[0 .. $pos - 1]); my $toparse = join(' ', map { node_string($_, $document) } @nodes[$pos .. $#nodes]); my $parsefail = join('.', map { $self->getGrammaticalRole($_) } @nodes[($pos - $FAILURE_PRETOKENS >= 0 ? $pos - $FAILURE_PRETOKENS : 0) .. $pos - 1]) . ">" . join('.', map { $self->getGrammaticalRole($_) } @nodes[$pos .. ($pos + $FAILURE_POSTTOKENS - 1 < $#nodes ? $pos + $FAILURE_POSTTOKENS - 1 : $#nodes)]); my $lexeme = node_location($nodes[$pos] || $nodes[$pos - 1] || $mathnode); my $indent = length($parsed) - 2; $indent = 8 if $indent > 8; Warn('not_parsed', $parsefail, $mathnode, "MathParser failed to match rule '$rule'", ($loc ? ($loc) : ()), ($parsed ? 
          ($parsed, (' ' x $indent) . "> " . $toparse)
        : ("> " . $toparse)));
  }
  return; }

# used for debugging & failure reporting.
sub node_string {
  my ($node, $document) = @_;
  my $role = $node->getAttribute('role') || 'UNKNOWN';
  my $box  = $document->getNodeBox($node);
  return ($box ? ToString($box) : text_form($node)) . "[[$role]]"; }

#%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
# Conversion to a less ambiguous, mostly-prefix form.
# Mostly for debugging information?
# Note that the nodes are true libXML nodes, already absorbed into the document
#%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
sub text_form {
  my ($node) = @_;
  #  $self->textrec($node,0); }
  # Hmm, Something Weird is broken!!!!
  # With <, I get "unterminated entity reference" !?!?!?
  #  my $text= $self->textrec($node,0);
  my $text = textrec($node, undef);
  $text =~ s/</less/g;
  return $text; }

my %PREFIX_ALIAS = (    # [CONSTANT]
  SUPERSCRIPTOP => '^', SUBSCRIPTOP => '_', times => '*',
  'equals'                 => '=',
  'less-than'              => '<',
  'greater-than'           => '>',
  'less-than-or-equals'    => '<=',
  'greater-than-or-equals' => '>=',
  'much-less-than'         => '<<',
  'much-greater-than'      => '>>',
  'plus'                   => '+',
  'minus'                  => '-',
  'divide'                 => '/');
# Put infix, along with `binding power'
my %IS_INFIX = (METARELOP => 1,    # [CONSTANT]
  RELOP         => 2, ARROW => 2,
  ADDOP         => 10, MULOP => 100,
  SUPERSCRIPTOP => 1000, SUBSCRIPTOP => 1000);

sub textrec {
  my ($node, $outer_bp, $outer_name) = @_;
  $node = realizeXMNode($node);
  my $tag = getQName($node);
  $outer_bp   = 0  unless defined $outer_bp;
  $outer_name = '' unless defined $outer_name;
  if ($tag eq 'ltx:XMApp') {
    my $app_role = $node->getAttribute('role');
    my ($op, @args) = element_nodes($node);
    $op = realizeXMNode($op);
    if ($app_role && $app_role =~ /^FLOAT(SUB|SUPER)SCRIPT$/) {
      return ($1 eq 'SUPER' ? '^' : '_') . textrec($op); }
    else {
      my $name = ((getQName($op) eq 'ltx:XMTok') && getTokenMeaning($op)) || 'unknown';
      my ($bp, $string) = textrec_apply($name, $op, @args);
      return (($bp < $outer_bp) || (($bp == $outer_bp) && ($name ne $outer_name)) ?
'(' . $string . ')' : $string); } } elsif ($tag eq 'ltx:XMDual') { my ($content, $presentation) = element_nodes($node); return textrec($content, $outer_bp, $outer_name); } # Just send out the semantic form. elsif ($tag eq 'ltx:XMTok') { my $name = getTokenMeaning($node); $name = 'Unknown' unless defined $name; return $PREFIX_ALIAS{$name} || $name; } elsif (($tag eq 'ltx:XMWrap') || ($tag eq 'ltx:XMCell')) { # ?? return join('@', map { textrec($_) } element_nodes($node)); } elsif ($tag eq 'ltx:XMArray') { return textrec_array($node); } else { return '[' . (p_getValue($node) || '') . ']'; } } sub textrec_apply { my ($name, $op, @args) = @_; my $role = $op->getAttribute('role') || 'Unknown'; if (($role =~ /^(SUB|SUPER)SCRIPTOP$/) && (($op->getAttribute('scriptpos') || '') =~ /^pre\d+$/)) { # Note that this will likely get parenthesized due to high bp return (5000, textrec($op) . " " . textrec($args[1]) . " " . textrec($args[0])); } elsif (my $bp = $IS_INFIX{$role}) { # Format as infix. return ($bp, (scalar(@args) == 1 # unless a single arg; then prefix. ? textrec($op) . ' ' . textrec($args[0], $bp, $name) : join(' ' . textrec($op) . ' ', map { textrec($_, $bp, $name) } @args))); } elsif ($role eq 'POSTFIX') { return (10000, textrec($args[0], 10000, $name) . textrec($op)); } elsif ($name eq 'multirelation') { return (2, join(' ', map { textrec($_, 2, $name) } @args)); } else { return (500, textrec($op, 10000, $name) . '@(' . join(', ', map { textrec($_) } @args) . ')'); } } sub textrec_array { my ($node) = @_; my $name = $node->getAttribute('meaning') || $node->getAttribute('name') || 'Array'; my @rows = (); foreach my $row (element_nodes($node)) { push(@rows, '[' . join(', ', map { ($_->firstChild ? textrec($_->firstChild) : '') } element_nodes($row)) . ']'); } return $name . '[' . join(', ', @rows) . ']'; } #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% # Cute! 
# Were it NOT for Sub/Superscripts, the whole parsing process only
# builds a new superstructure around the sequence of token nodes in the input.
# Thus, any internal structure is unchanged.
# They get re-parented, but if the parse fails, we've only got to put them
# BACK into the original node, to recover the original arrangement!!!
# Thus, we don't have to clone, and deal with namespace duplication.
# ...
# EXCEPT, as I said, for sub/superscripts!!!!
#

#%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
# Constructors used in grammar
# All the tree construction in the grammar should come through these operations.
# We avoid mucking with the actual XML nodes (both to avoid modifying the original
# tree until we have a successful parse, and to avoid XML::LibXML cloning nightmares)
# We are converting XML nodes to array representation: [$tag, {%attr},@children]
# This means any inspection of nodes has to recognize that
#  * node may be in XML vs ARRAY representation
#  * node may be an XMRef to another node whose properties are the ones we should use.
#
# Also, when we are examining a node's properties (roles, fences, script positioning, etc)
# we should be careful to check for XMRef indirection and examine the properties
# of the node that was referred to.
# HOWEVER, we should construct our parse tree using (a clone of) the XMRef node,
# rather than (a clone of) the referred to node, so as to preserve identity.
#%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
# We're currently keeping the id's on the nodes as they get cloned,
# since they'll (maybe) replace the unparsed nodes.
# However, if we consider multiple parses or preserving both parsed & unparsed,
# we may have to do some adaptation and id shifting.
# ================================================================================ # ================================================================================ # Low-level accessors sub Lookup { my ($lexeme) = @_; return $$LaTeXML::MathParser::LEXEMES{$lexeme}; } # The following accessors work on both the LibXML and ARRAY representations # but they do NOT automatically dereference XMRef! sub p_getValue { my ($node) = @_; if (!defined $node) { return; } elsif (ref $node eq 'XML::LibXML::Element') { my $x; return (($x = $node->textContent) ne '' ? $x # get content, or fall back to name : (defined($x = $node->getAttribute('name')) ? $x : undef)); } elsif (ref $node eq 'ARRAY') { my ($op, $attr, @args) = @$node; if (@args) { return join('', grep { defined $_ } map { p_getValue($_) } @args); } else { return $$node[1]{name}; } } elsif (ref $node eq 'XML::LibXML::Text') { return $node->textContent; } else { # ???? return $node; } } sub p_getTokenMeaning { my ($item) = @_; if (!defined $item) { return; } elsif (ref $item eq 'ARRAY') { my ($op, $attr, @args) = @$item; return $$attr{meaning} || $$attr{name} || $args[0] || $$attr{role}; } elsif (ref $item eq 'XML::LibXML::Element') { return getTokenMeaning($item); } } sub p_getAttribute { my ($item, $key) = @_; if (!defined $item) { return; } elsif (ref $item eq 'ARRAY') { return $$item[1]{$key}; } elsif (ref $item eq 'XML::LibXML::Element') { return $item->getAttribute($key); } } sub p_element_nodes { my ($item) = @_; if (!defined $item) { return (); } elsif (ref $item eq 'ARRAY') { my ($op, $attr, @args) = @$item; return @args; } elsif (ref $item eq 'XML::LibXML::Element') { return element_nodes($item); } } sub p_getQName { my ($item) = @_; if (!defined $item) { return; } elsif (ref $item eq 'ARRAY') { return $$item[0]; } elsif (ref $item eq 'XML::LibXML::Element') { return getQName($item); } } # Make a new Token node with given name, content, and attributes. 
# $content is an array of nodes (which may need to be cloned if still attached) sub New { my ($meaning, $content, %attributes) = @_; my %attr = (); $attr{meaning} = $meaning if $meaning; foreach my $key (sort keys %attributes) { my $value = p_getValue($attributes{$key}); $attr{$key} = $value if defined $value; } if (!$attr{font}) { $attr{font} = ($content && $content =~ /\S/ ? LaTeXML::Common::Font->textDefault->specialize($content) : LaTeXML::Common::Font->new()); } return ['ltx:XMTok', {%attr}, ($content ? ($content) : ())]; } # Some handy shorthands. sub Absent { return New('absent'); } sub InvisibleTimes { return New('times', "\x{2062}", role => 'MULOP', font => LaTeXML::Common::Font->new()); } sub InvisibleComma { return New(undef, "\x{2063}", role => 'PUNCT', font => LaTeXML::Common::Font->new()); } # Get n-th arg of an XMApp. # However, this is really only used to get the script out of a sub/super script sub Arg { my ($node, $n) = @_; if (ref $node eq 'ARRAY') { return $$node[$n + 2]; } else { my @args = element_nodes($node); return $args[$n]; } } # will get cloned if/when needed. # Add more attributes to a node. # Values can be strings or nodes whose text content is used. # Note that it avoids changing the underlying XML, so it may return a new object! sub Annotate { my ($node, %attributes) = @_; my %attrib = (); # first scan & convert any attributes, to make sure there really are any values. foreach my $attr (keys %attributes) { my $value = p_getValue($attributes{$attr}); $attrib{$attr} = $value if defined $value; } # Hmm.... maybe we need to merge some things, open close? if (keys %attrib) { # Any attributes to assign? # If we've gotten a real XML node, convert to array representation # We do NOT want to modify any of the original XML!!!! 
if (ref $node ne 'ARRAY') { $node = [getQName($node), { map { (getQName($_) => $_->getValue) } grep { $_->nodeType == XML_ATTRIBUTE_NODE } $node->attributes }, $node->childNodes]; } my $qname = $$node[0]; # Remove any attributes that aren't allowed!!! foreach my $k (keys %attrib) { delete $attrib{$k} unless $k =~ /^_/ || $LaTeXML::MathParser::DOCUMENT->canHaveAttribute($qname, $k); } # Special treatment for some attributes: # Combine opens & closes foreach my $k (qw(open argopen)) { $attrib{$k} = $attrib{$k} . $$node[1]{$k} if $attrib{$k} && $$node[1]{$k}; } foreach my $k (qw(close argclose)) { $attrib{$k} = $$node[1]{$k} . $attrib{$k} if $attrib{$k} && $$node[1]{$k}; } # Make sure font is "Appropriate", if we're creating a new token if ($attrib{_font} && ($qname eq 'ltx:XMTok')) { my $content = join('', @$node[2 .. $#$node]); if ((!defined $content) || ($content eq '')) { delete $attrib{_font}; } # No font needed elsif (my $font = $LaTeXML::MathParser::DOCUMENT->decodeFont($attrib{_font})) { delete $attrib{_font}; $attrib{font} = $font->specialize($content); } } map { $$node[1]{$_} = $attrib{$_} } keys %attrib; } # Now add them. return $node; } # ================================================================================ # Mid-level constructors # Apply $op to the list of arguments # args may be array-rep or lexemes (or nodes?) sub Apply { my ($op, @args) = @_; return ['ltx:XMApp', {}, $op, @args]; } # Apply $op to a `delimited' list of arguments of the form # open, expr (punct expr)* close # after extracting the opening and closing delimiters, and the separating punctuation # Generate an XMDual, so that any styling of delimiters & punctuation is preserved. sub ApplyDelimited { my ($op, @stuff) = @_; my $open = $stuff[0]; my $close = $stuff[-1]; my ($seps, @args) = extract_separators(@stuff[1 .. 
$#stuff - 1]); return ['ltx:XMDual', {}, Apply(LaTeXML::Package::createXMRefs($LaTeXML::MathParser::DOCUMENT, $op), LaTeXML::Package::createXMRefs($LaTeXML::MathParser::DOCUMENT, @args)), Apply($op, ['ltx:XMWrap', {}, @stuff])]; } # This is similar, but "interprets" a delimited list as being the # application of some operator to the items in the list. sub InterpretDelimited { my ($op, @stuff) = @_; my $open = $stuff[0]; my $close = $stuff[-1]; my ($seps, @args) = extract_separators(@stuff[1 .. $#stuff - 1]); return ['ltx:XMDual', {}, Apply($op, LaTeXML::Package::createXMRefs($LaTeXML::MathParser::DOCUMENT, @args)), ['ltx:XMWrap', {}, @stuff]]; } # Given a sequence of operators, form the nested application op(op(...(arg))) sub recApply { my (@ops) = @_; return (scalar(@ops) > 1 ? Apply(shift(@ops), recApply(@ops)) : $ops[0]); } # Given alternating expressions & separators (punctuation,...) # extract the separators as a concatenated string, # returning (separators, args...) sub extract_separators { my (@stuff) = @_; my ($punct, @args); if (@stuff) { push(@args, shift(@stuff)); # Grab 1st expression while (@stuff) { # Expecting pairs of punct, expression my $p = realizeXMNode(shift(@stuff)); $punct .= ($punct ? ' ' : '') # Delimited by SINGLE SPACE! . spacingToString(getXMHintSpacing(p_getAttribute($p, 'lpadding'))) . p_getValue($p) . spacingToString(getXMHintSpacing(p_getAttribute($p, 'rpadding'))); push(@args, shift(@stuff)); } } # Collect the next expression. return ($punct, @args); } # ================================================================================ # Some special cases # This specifies the "meaning" of things within a pair # of open/close delimiters, depending on the number of things. # Really should be customizable? # Note that these are all Best Guesses, but really can have # alternate interpretations depending on context, field, etc. # Question: Is there enough context to guess better? # For example, whether (a,b) is an interval or list? 
# (both could reasonably be preceded by \in )
my %balanced = (    # [CONSTANT]
  '(' => ')', '[' => ']', '{' => '}',
  '|' => '|', '||' => '||',
  "\x{230A}" => "\x{230B}",    # lfloor, rfloor
  "\x{2308}" => "\x{2309}",    # lceil, rceil
  "\x{2329}" => "\x{232A}",    # angle brackets; NOT mathematical, but balance in case they show up.
  "\x{27E8}" => "\x{27E9}",    # angle brackets (preferred)
  "\x{2225}" => "\x{2225}",    # lVert, rVert
);
# For enclosing a single object
# Note that the default here is just to put open/closed attributes on the single object
my %enclose1 = (    # [CONSTANT]
  '{@}'   => 'set',    # alternatively, just variant parentheses
  '|@|'   => 'absolute-value',
  '||@||' => 'norm', "\x{2225}@\x{2225}" => 'norm',
  "\x{230A}@\x{230B}" => 'floor',
  "\x{2308}@\x{2309}" => 'ceiling',
  '<@>' => 'expectation',    # or just average?
  '<@|' => 'bra', '|@>' => 'ket');
# For enclosing exactly 2 objects; the punctuation is significant too
my %enclose2 = (    # [CONSTANT]
  '(@,@)' => 'open-interval',    # alternatively, just a list
  '[@,@]' => 'closed-interval',
  '(@,@]' => 'open-closed-interval', '[@,@)' => 'closed-open-interval',
  '{@,@}' => 'set',    # alternatively, just a list ?
);
# For enclosing more than 2 objects.
# assume 1st punct? or should we check all are same?
my %encloseN = (    # [CONSTANT]
  '(@,@)' => 'vector', '{@,@}' => 'set',);

sub isMatchingClose {
  my ($open, $close) = @_;
  my $oname  = p_getValue(realizeXMNode($open));
  my $cname  = p_getValue(realizeXMNode($close));
  my $expect = $balanced{$oname};
  return (defined $expect) && ($expect eq $cname); }

# Given a delimited sequence: open expr (punct expr)* close
# (OR, an empty sequence open close)
# Convert it into the appropriate thing, depending on the specific open & close used.
# Generate an XMDual to preserve any styling of delimiters and punctuation.
sub Fence {
  my (@stuff) = @_;
  # Peek at delimiters to guess what kind of construct this is.
my $nargs = scalar(@stuff); Error("expected", "arguments", undef, "Even number of arguments to Fence(); should be of form open,expr,(punct,expr)*,close", "got " . join(' ', map { ToString($_) } @stuff)) if ($nargs != 2) && (($nargs % 2) == 0); # either empty or odd number my ($open, $close) = (realizeXMNode($stuff[0]), realizeXMNode($stuff[-1])); my $o = p_getValue($open); my $c = p_getValue($close); my $n = int(($nargs - 2 + 1) / 2); my @p = map { p_getValue(realizeXMNode(@stuff[2 * $_])) } 1 .. $n - 1; my $op = ($n == 0 ? 'list' # ? : ($n == 1 ? $enclose1{ $o . '@' . $c } : ($n == 2 ? ($enclose2{ $o . '@' . $p[0] . '@' . $c } || 'list') : ($encloseN{ $o . '@' . $p[0] . '@' . $c } || 'list')))); $op = 'delimited-' . $o . $c unless defined $op; if (($n == 1) && ($op eq 'delimited-()')) { # Hopefully, can just ignore the parens? return ['ltx:XMDual', {}, LaTeXML::Package::createXMRefs($LaTeXML::MathParser::DOCUMENT, $stuff[1]), ['ltx:XMWrap', {}, @stuff]]; } else { return InterpretDelimited(New($op), @stuff); } } # NOTE: It might be best to separate the multiple Formulae into separate XMath's??? # but only at the top level! sub NewFormulae { my (@stuff) = @_; if (scalar(@stuff) == 1) { return $stuff[0]; } else { my ($seps, @formula) = extract_separators(@stuff); return ['ltx:XMDual', {}, Apply(New('formulae'), LaTeXML::Package::createXMRefs($LaTeXML::MathParser::DOCUMENT, @formula)), ['ltx:XMWrap', {}, @stuff]]; } } # A Formula is an alternation of expr (relationalop expr)* # It presumably would be equivalent to (expr1 relop1 expr2) AND (expr2 relop2 expr3) ... # But, I haven't figured out the ideal prefix form that can easily be converted to presentation. 
sub NewFormula {
  my (@args) = @_;
  my $n = scalar(@args);
  if ($n == 1) {
    return $args[0]; }
  elsif ($n == 3) {
    return Apply($args[1], $args[0], $args[2]); }
  else {
    return Apply(New('multirelation'), @args); } }

sub NewList {
  my (@stuff) = @_;
  if (@stuff == 1) {
    return $stuff[0]; }
  else {
    my ($seps, @items) = extract_separators(@stuff);
    return ['ltx:XMDual', {},
      Apply(New('list'), LaTeXML::Package::createXMRefs($LaTeXML::MathParser::DOCUMENT, @items)),
      ['ltx:XMWrap', {}, @stuff]]; } }

# Given alternation of expr (addop expr)*, compose the tree (left recursive),
# flattening portions that have the same operator
# ie. a + b + c - d => (- (+ a b c) d)
sub LeftRec {
  my ($arg1, @more) = @_;
  if (@more) {
    my $op     = shift(@more);
    my $opname = p_getTokenMeaning(realizeXMNode($op));
    my @args   = ($arg1, shift(@more));
    while (@more && ($opname eq p_getTokenMeaning(realizeXMNode($more[0])))) {
      ReplacedBy($more[0], $op);
      shift(@more);
      push(@args, shift(@more)); }
    return LeftRec(Apply($op, @args), @more); }
  else {
    return $arg1; } }

# Like Apply($op,$arg1,$arg2), but if $op is 'same' as the operator in $arg1,
# then combine as an nary apply of $op to $arg1's arguments and $arg2.
sub ApplyNary {
  my ($op, $arg1, $arg2) = @_;
  my $rop       = realizeXMNode($op);
  my $opname    = p_getTokenMeaning($rop) || '__undef_meaning__';
  my $opcontent = p_getValue($rop) || '__undef_content__';
  my @args      = ();
  if (p_getQName($arg1) eq 'ltx:XMApp') {
    my ($op1, @args1) = p_element_nodes($arg1);
    my $rop1 = realizeXMNode($op1);
    if (((p_getTokenMeaning($rop1) || '__undef_meaning__') eq $opname)    # Same operator?
      && ((p_getValue($rop1) || '__undef_content__') eq $opcontent)
      # Check that ops are used in same way.
      && !(grep { (p_getAttribute($rop, $_) || '') ne (p_getAttribute($rop1, $_) || '') }
        qw(mathstyle))    # Check ops are used in similar way
      # Check that arg1 isn't wrapped, fenced or enclosed in some restrictive way
      && !(grep { p_getAttribute(realizeXMNode($arg1), $_) } qw(open close enclose))) {
      # Note that $op1 GOES AWAY!!!
      ReplacedBy($op1, $rop);
      push(@args, @args1); }
    else {
      push(@args, $arg1); } }
  else {
    push(@args, $arg1); }
  return Apply($op, @args, $arg2); }

# There are several cases where parsing a formula will rearrange nodes
# such that some nodes will no longer be used. For example, when
# converting a nested set of infix + into a single n-ary sum.
# In effect, all those excess +'s are subsumed by the single first one.
# It may be, however, that those lost nodes are referenced (XMRef) from the
# other branch of an XMDual, and those references should be updated to refer
# to the single node replacing the lost ones.
# This function records that replacement, and the top-level parser fixes up the tree.
# NOTE: There may be cases (in the Grammar, eg) where punctuation & ApplyOp's
# get lost completely? Watch out for this!
sub ReplacedBy {
  my ($lostnode, $keepnode) = @_;
  if (my $lostid = p_getAttribute($lostnode, 'xml:id')) {
    if (my $keepid = p_getAttribute($keepnode, 'xml:id')) {
      #      print STDERR "LOST $lostid use instead $keepid\n";
      $$LaTeXML::MathParser::LOSTNODES{$lostid} = $keepid; } }
  return; }

# ================================================================================
# Construct an appropriate application of sub/superscripts
# This accounts for script positioning:
#   Whether it precedes (float), is over/under (if base requests),
#   or follows (normal case), along with whether sub/super.
#   the alignment of multiple sub/superscripts derived from the binding level when created.
# scriptpos = (pre|mid|post) number; where number is the binding-level.
# If $pos is given (pre|mid|post), it overrides the position implied by the script sub NewScript { my ($base, $script, $pos) = @_; my $role; my $rbase = realizeXMNode($base); my $rscript = realizeXMNode($script); my ($bx, $bl) = (p_getAttribute($rbase, 'scriptpos') || 'post') =~ /^(pre|mid|post)?(\d+)?$/; my ($sx, $sl) = (p_getAttribute($rscript, 'scriptpos') || 'post') =~ /^(pre|mid|post)?(\d+)?$/; my ($mode, $y) = p_getAttribute($rscript, 'role') =~ /^(FLOAT|POST)?(SUB|SUPER)SCRIPT$/; my $x = ($pos ? $pos : ($mode eq 'FLOAT' ? 'pre' : $bx || 'post')); my $lpad = ($x eq 'pre') && p_getAttribute($rscript, 'lpadding'); my $rpad = ($x ne 'pre') && p_getAttribute($rscript, 'rpadding'); my $t; my $l = $sl || $bl || (($t = $LaTeXML::MathParser::DOCUMENT->getNodeBox($script)) && ($t->getProperty('level'))) || 0; # If the INNER script was a floating script (ie. {}^{x}) # we'll NOT want this one to stack over it so bump the level. my $bumped; if (p_getAttribute($rbase, '_wasfloat')) { $l++; $bumped = 1 } elsif (my $innerl = p_getAttribute($rbase, '_bumplevel')) { $l = $innerl; } my $app = Apply(New(undef, undef, role => $y . 'SCRIPTOP', scriptpos => "$x$l"), $base, Arg($script, 0)); # Record whether this script was a floating one $$app[1]{_wasfloat} = 1 if $mode eq 'FLOAT'; $$app[1]{_bumplevel} = $l if $bumped; $$app[1]{scriptpos} = $bx if $bx ne 'post'; $$app[1]{lpadding} = $lpad if $lpad && !$$app[1]{lpadding}; # better to add? $$app[1]{rpadding} = $rpad if $rpad && !$$app[1]{rpadding}; # better to add? return $app; } # Basically, like NewScript, but decorates an operator with sub/superscripts # (with vague unknown implications for meaning?) # but which will preserve the role (& meaning?) 
sub DecorateOperator {
  my ($op, $script) = @_;
  my $decop   = NewScript($op, $script);
  my $role    = p_getAttribute($op, 'role');
  my $meaning = p_getAttribute($op, 'meaning');
  return Annotate($decop, role => $role, meaning => $meaning); }

# ================================================================================
# A "notation" is a language construct or set thereof.

# Called from the grammar to record the fact that a notation was seen.
sub SawNotation {
  my ($notation, $node) = @_;
  $LaTeXML::MathParser::SEEN_NOTATIONS{$notation} = 1;
  return 1; }

# Called by the grammar to determine whether we should try productions
# which involve the given notation.
sub IsNotationAllowed {
  my ($notation) = @_;
  return ($LaTeXML::MathParser::DISALLOWED_NOTATIONS{$notation} ? undef : 1); }

# ================================================================================
# Note that an UNKNOWN token may have been used as a function.
# For simplicity in the grammar, we accept a token that has sub|super scripts applied.
sub MaybeFunction {
  my ($token) = @_;
  $token = realizeXMNode($token);
  my $self = $LaTeXML::MathParser::PARSER;
  while (p_getQName($token) eq 'ltx:XMApp') {
    $token = Arg($token, 1); }
  my $name = token_prettyname($token);
  # DANGER!!
  # We want to be using Annotate here, but we're screwed up by the
  # potential "embellishing" of the function token.
  # (ie. the descent above past all XMApp's)
  $token->setAttribute('possibleFunction', 'yes');
  $$self{maybe_functions}{$name}++
    if $LaTeXML::MathParser::STRICT && !$$self{suspicious_tokens}{$token};
  $$self{suspicious_tokens}{$token} = 1;
  return; }

# ================================================================================
1;

__END__

=pod

=head1 NAME

C<LaTeXML::MathParser> - parses mathematics content

=head1 DESCRIPTION

C<LaTeXML::MathParser> parses the mathematical content of a document.
It uses L<Parse::RecDescent> and a grammar C<MathGrammar>.

=head2 Math Representation

Needs description.

=head2 Possible Customizations

Needs description.
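
One candidate for customization is the set of delimiter tables consulted by
C<Fence>. The following is only a sketch of that lookup, not the module's own
code: it copies a few table entries and the key scheme from C<Fence>, while
C<guess_fence_meaning> is a hypothetical helper, invented for illustration,
that takes plain strings instead of nodes.

  use strict;
  use warnings;

  # Subsets of the tables used by Fence; each key joins the open delimiter,
  # the punctuation (if any) and the close delimiter with '@'.
  my %enclose1 = ('{@}' => 'set', '|@|' => 'absolute-value', '||@||' => 'norm');
  my %enclose2 = ('(@,@)' => 'open-interval', '[@,@]' => 'closed-interval');
  my %encloseN = ('(@,@)' => 'vector', '{@,@}' => 'set');

  # Hypothetical helper (an assumption, not part of LaTeXML):
  # $open and $close are delimiter strings, $punct is the first separator
  # (unused when fewer than 2 items), $n is the number of enclosed items.
  sub guess_fence_meaning {
    my ($open, $close, $punct, $n) = @_;
    my $op = ($n == 0 ? 'list'
      : ($n == 1 ? $enclose1{ $open . '@' . $close }
        : ($n == 2 ? ($enclose2{ $open . '@' . $punct . '@' . $close } || 'list')
          : ($encloseN{ $open . '@' . $punct . '@' . $close } || 'list'))));
    # Nothing recognized for a single item: name the operator after the delimiters.
    return (defined $op ? $op : 'delimited-' . $open . $close); }

  print guess_fence_meaning('|', '|', undef, 1), "\n";    # absolute-value
  print guess_fence_meaning('(', ')', ',',   2), "\n";    # open-interval
  print guess_fence_meaning('(', ')', ',',   3), "\n";    # vector
  print guess_fence_meaning('(', ')', undef, 1), "\n";    # delimited-()

In C<Fence> itself the last case, a single item in plain parentheses, is not
turned into an application; the parentheses are kept only in the presentation
branch of the resulting C<XMDual>.
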
=head2 Convenience functions

The following functions are exported for convenience in writing the
grammar productions.

=over 4

=item C<< $node = New($name,$content,%attributes); >>

Creates a new C<XMTok> node with given C<$name> (a string or undef),
and C<$content> (a string or undef) (but at least one of name or content should be provided),
and attributes.

=item C<< $node = Arg($node,$n); >>

Returns the C<$n>-th argument of an C<XMApp> node;
0 is the operator node.

=item C<< Annotate($node,%attributes); >>

Add attributes to C<$node>.

=item C<< $node = Apply($op,@args); >>

Create a new C<XMApp> node representing the application of the node
C<$op> to the nodes C<@args>.

=item C<< $node = ApplyDelimited($op,@stuff); >>

Create a new node representing the application of the node
C<$op> to the arguments found in C<@stuff>. C<@stuff> are
delimited arguments in the sense that the leading and trailing nodes
should represent open and close delimiters and the arguments are
separated by punctuation nodes. The result is an C<XMDual>, so that
any styling of the delimiters and punctuation is preserved.

=item C<< $node = InterpretDelimited($op,@stuff); >>

Similar to C<ApplyDelimited>, this interprets a sequence of
delimited, punctuated items as being the application of C<$op> to those items.

=item C<< $node = recApply(@ops,$arg); >>

Given a sequence of operators and an argument, forms the nested
application C<op(op(...(arg)))>.

=item C<< $node = InvisibleTimes; >>

Creates an invisible times operator.

=item C<< $boole = isMatchingClose($open,$close); >>

Checks whether C<$open> and C<$close> form a `normal' pair of
delimiters.

=item C<< $node = Fence(@stuff); >>

Given a delimited sequence of nodes, starting and ending with open/close delimiters,
and with intermediate nodes separated by punctuation or such, attempt to guess what
type of thing is represented such as a set, absolute value, interval, and so on.
If nothing specific is recognized, two or more items are treated as a C<list>,
while a single item gets an operator named after its delimiters
(a single item in plain parentheses is simply returned as is).
This would be a good candidate for customization!

=item C<< $node = NewFormulae(@stuff); >>

Given a set of formulas, construct a C<formulae> application, if there is more than one,
else just return the first.

=item C<< $node = NewList(@stuff); >>

Given a set of expressions, construct a C<list> application, if there is more than one,
else just return the first.

=item C<< $node = LeftRec($arg1,@more); >>

Given an expr followed by repeated (op expr), compose the left recursive tree.
For example C<a + b + c - d> would give C<(- (+ a b c) d)>.

=item C<< MaybeFunction($token); >>

Note the possible use of C<$token> as a function, which may cause incorrect parsing.
This is used to generate warning messages.

=back

=head1 AUTHOR

Bruce Miller

=head1 COPYRIGHT

Public domain software, produced as part of work done by the
United States Government & not subject to copyright in the US.

=cut

latexml-0.8.1/lib/LaTeXML/Package.pm0000644000175000017500000047061312507513572016765 0ustar norbertnorbert
# /=====================================================================\ #
# | LaTeXML::Package | #
# | Exports of Defining forms for Package writers | #
# |=====================================================================| #
# | Part of LaTeXML: | #
# | Public domain software, produced as part of work done by the | #
# | United States Government & not subject to copyright in the US.
| #
# |---------------------------------------------------------------------| #
# | Bruce Miller #_# | #
# | http://dlmf.nist.gov/LaTeXML/ (o o) | #
# \=========================================================ooo==U==ooo=/ #
package LaTeXML::Package;
use strict;
use warnings;
use Exporter;
use LaTeXML::Global;
use LaTeXML::Common::Object;
use LaTeXML::Common::Error;
use LaTeXML::Core::Token;
use LaTeXML::Core::Tokens;
use LaTeXML::Core::Box;
use LaTeXML::Core::List;
use LaTeXML::Core::Mouth::Binding;
use LaTeXML::Core::Definition;
use LaTeXML::Core::Parameters;
use LaTeXML::Common::Number;
use LaTeXML::Common::Float;
use LaTeXML::Common::Dimension;
use LaTeXML::Common::Glue;
use LaTeXML::Core::MuDimension;
use LaTeXML::Core::MuGlue;
# Extra objects typically used in Bindings
use LaTeXML::Core::Alignment;
use LaTeXML::Core::Array;
use LaTeXML::Core::KeyVals;
use LaTeXML::Core::Pair;
use LaTeXML::Core::PairList;
use LaTeXML::Common::Color;
# Utilities
use LaTeXML::Util::Pathname;
use LaTeXML::Util::WWW;
use LaTeXML::Common::XML;
use LaTeXML::Core::Rewrite;
use LaTeXML::Util::Radix;
use File::Which;
use Unicode::Normalize;
use Text::Balanced;
use base qw(Exporter);
our @EXPORT = (qw(&DefExpandable
    &DefMacro &DefMacroI
    &DefConditional &DefConditionalI &IfCondition
    &DefPrimitive &DefPrimitiveI
    &DefRegister &DefRegisterI
    &DefConstructor &DefConstructorI
    &dualize_arglist &createXMRefs
    &DefMath &DefMathI &DefEnvironment &DefEnvironmentI
    &convertLaTeXArgs),

  # Class, Package and File loading.
qw(&Input &InputContent &InputDefinitions &RequirePackage &LoadClass &LoadPool &FindFile &DeclareOption &PassOptions &ProcessOptions &ExecuteOptions &AddToMacro &AtBeginDocument &AtEndDocument), # Counter support qw(&NewCounter &CounterValue &SetCounter &AddToCounter &StepCounter &RefStepCounter &RefStepID &ResetCounter &GenerateID &AfterAssignment), # Document Model qw(&Tag &DocType &RelaxNGSchema &RegisterNamespace &RegisterDocumentNamespace), # Document Rewriting qw(&DefRewrite &DefMathRewrite &DefLigature &DefMathLigature), # Mid-level support for writing definitions. qw(&Expand &Invocation &Digest &DigestText &DigestIf &DigestLiteral &RawTeX &Let &StartSemiverbatim &EndSemiverbatim &Tokenize &TokenizeInternal), # Font encoding qw(&DeclareFontMap &FontDecode &FontDecodeString &LoadFontMap), # Color qw(&DefColor &DefColorModel &LookupColor), # Support for structured/argument readers qw(&ReadParameters &DefParameterType &DefColumnType &DefKeyVal &GetKeyVal &GetKeyVals), # Access to State qw(&LookupValue &AssignValue &PushValue &PopValue &UnshiftValue &ShiftValue &LookupMapping &AssignMapping &LookupMappingKeys &LookupCatcode &AssignCatcode &LookupMeaning &LookupDefinition &InstallDefinition &XEquals &LookupMathcode &AssignMathcode &LookupSFcode &AssignSFcode &LookupLCcode &AssignLCcode &LookupUCcode &AssignUCcode &LookupDelcode &AssignDelcode ), # Random low-level token or string operations. qw(&CleanID &CleanLabel &CleanIndexKey &CleanBibKey &NormalizeBibKey &CleanURL &UTF &roman &Roman), # Math & font state. qw(&MergeFont), qw(&CheckOptions), # Resources qw(&RequireResource &ProcessPendingResources), @LaTeXML::Global::EXPORT, # And export those things exported by these Core & Common packages. 
  @LaTeXML::Common::Object::EXPORT,
  @LaTeXML::Common::Error::EXPORT,
  @LaTeXML::Core::Token::EXPORT,
  @LaTeXML::Core::Tokens::EXPORT,
  @LaTeXML::Core::Box::EXPORT,
  @LaTeXML::Core::List::EXPORT,
  @LaTeXML::Common::Number::EXPORT,
  @LaTeXML::Common::Float::EXPORT,
  @LaTeXML::Common::Dimension::EXPORT,
  @LaTeXML::Common::Glue::EXPORT,
  @LaTeXML::Core::MuDimension::EXPORT,
  @LaTeXML::Core::MuGlue::EXPORT,
  @LaTeXML::Core::Pair::EXPORT,
  @LaTeXML::Core::PairList::EXPORT,
  @LaTeXML::Common::Color::EXPORT,
  @LaTeXML::Core::Alignment::EXPORT,
  @LaTeXML::Common::XML::EXPORT,
  @LaTeXML::Util::Radix::EXPORT,
);

#**********************************************************************
#   Initially, I thought LaTeXML Packages should try to be like perl modules:
# once loaded, you didn't need to re-load them, only `initialize' them to
# install their definitions into the current stomach.  I tried to achieve
# that through various package tricks.
#    But ultimately, most of a package _is_ installing defns in the stomach,
# and it's probably better to allow a more TeX-like evaluation of definitions
# in order, so \let and such work as expected.
#    So, it got simpler!
# Still, it would be nice if there were `compiled' forms of .ltxml files!
#**********************************************************************

sub UTF {
  my ($code) = @_;
  return pack('U', $code); }

sub coerceCS {
  my ($cs) = @_;
  $cs = T_CS($cs) unless ref $cs;
  $cs = T_CS(ToString($cs)) unless ref $cs eq 'LaTeXML::Core::Token';
  return $cs; }

sub parsePrototype {
  my ($proto) = @_;
  my $oproto = $proto;
  my $cs;
  if ($proto =~ s/^\\csname\s+(.*)\\endcsname//) {
    $cs = T_CS('\\' . $1); }
  elsif ($proto =~ s/^(\\[a-zA-Z@]+)//) {    # Match a cs
    $cs = T_CS($1); }
  elsif ($proto =~ s/^(\\.)//) {             # Match a single char cs, env name,...
    $cs = T_CS($1); }
  elsif ($proto =~ s/^(.)//) {               # Match an active char
    ($cs) = TokenizeInternal($1)->unlist; }
  else {
    Fatal('misdefined', $proto, $STATE->getStomach,
      "Definition prototype doesn't have proper control sequence: \"$proto\""); }
  $proto =~ s/^\s*//;
  ####  return ($cs, parseParameters($proto, $cs)); }
  return ($cs, $proto); }

# If a ReadFoo function exists (accessible from LaTeXML::Package::Pool),
# then the parameter spec:
#     Foo         : will invoke it and use the result for the corresponding argument.
#                   it will complain if ReadFoo returns undef.
#     SkipFoo     : will invoke SkipFoo, if it is defined, else ReadFoo,
#                   but in either case, will ignore the result
#     OptionalFoo : will invoke ReadOptionalFoo if defined, else ReadFoo
#                   but will not complain if the reader returns undef.
# In all cases, there is the provision to supply an additional parameter to the reader:
#    "Foo:stuff"   effectively invokes ReadFoo(Tokenize('stuff'))
# similarly for the other variants. What the 'stuff' means depends on the type.
sub parseParameters {
  my ($proto, $for) = @_;
  my $p      = $proto;
  my @params = ();
  while ($p) {
    # Handle possibly nested cases, such as {Number}
    if ($p =~ s/^(\{([^\}]*)\})\s*//) {
      my ($spec, $inner_spec) = ($1, $2);
      my $inner = ($inner_spec ?
        parseParameters($inner_spec, $for) : undef);
      push(@params, LaTeXML::Core::Parameter->new('Plain', $spec, extra => [$inner])); }
    elsif ($p =~ s/^(\[([^\]]*)\])\s*//) {    # Ditto for Optional
      my ($spec, $inner_spec) = ($1, $2);
      if ($inner_spec =~ /^Default:(.*)$/) {
        push(@params, LaTeXML::Core::Parameter->new('Optional', $spec,
            extra => [TokenizeInternal($1), undef])); }
      elsif ($inner_spec) {
        push(@params, LaTeXML::Core::Parameter->new('Optional', $spec,
            extra => [undef, parseParameters($inner_spec, $for)])); }
      else {
        push(@params, LaTeXML::Core::Parameter->new('Optional', $spec)); } }
    elsif ($p =~ s/^((\w*)(:([^\s\{\[]*))?)\s*//) {
      my ($spec, $type, $extra) = ($1, $2, $4);
      my @extra = map { TokenizeInternal($_) } split('\|', $extra || '');
      push(@params, LaTeXML::Core::Parameter->new($type, $spec, extra => [@extra])); }
    else {
      Fatal('misdefined', $for, undef, "Unrecognized parameter specification at \"$proto\""); } }
  return (@params ? LaTeXML::Core::Parameters->new(@params) : undef); }

# Convert a LaTeX-style argument spec to our Package form.
# Ie. given $nargs and $optional, being the two optional arguments to
# something like \newcommand, convert it to the form we use
sub convertLaTeXArgs {
  my ($nargs, $optional) = @_;
  $nargs = $nargs->toString if ref $nargs;
  $nargs = 0 unless $nargs;
  my @params = ();
  if ($optional) {
    push(@params, LaTeXML::Core::Parameter->new('Optional',
        "[Default:" . UnTeX($optional) . "]",
        extra => [$optional, undef]));
    $nargs--; }
  push(@params, map { LaTeXML::Core::Parameter->new('Plain', '{}') } 1 .. $nargs);
  return (@params ? LaTeXML::Core::Parameters->new(@params) : undef); }

#======================================================================
# Convenience functions for writing definitions.
#======================================================================

sub LookupValue {
  my ($name) = @_;
  return $STATE->lookupValue($name); }

sub AssignValue {
  my ($name, $value, $scope) = @_;
  $STATE->assignValue($name, $value, $scope);
  return; }

sub PushValue {
  my ($name, @values) = @_;
  $STATE->pushValue($name, @values);
  return; }

sub PopValue {
  my ($name) = @_;
  return $STATE->popValue($name); }

sub UnshiftValue {
  my ($name, @values) = @_;
  $STATE->unshiftValue($name, @values);
  return; }

sub ShiftValue {
  my ($name) = @_;
  return $STATE->shiftValue($name); }

sub LookupMapping {
  my ($map, $key) = @_;
  return $STATE->lookupMapping($map, $key); }

sub AssignMapping {
  my ($map, $key, $value) = @_;
  return $STATE->assignMapping($map, $key, $value); }

sub LookupMappingKeys {
  my ($map) = @_;
  return $STATE->lookupMappingKeys($map); }

sub LookupCatcode {
  my ($char) = @_;
  return $STATE->lookupCatcode($char); }

sub AssignCatcode {
  my ($char, $catcode, $scope) = @_;
  $STATE->assignCatcode($char, $catcode, $scope);
  return; }

sub LookupMeaning {
  my ($name) = @_;
  return $STATE->lookupMeaning($name); }

sub LookupDefinition {
  my ($name) = @_;
  return $STATE->lookupDefinition($name); }

sub InstallDefinition {
  my ($name, $definition, $scope) = @_;
  $STATE->installDefinition($name, $definition, $scope);
  return }

sub XEquals {
  my ($token1, $token2) = @_;
  my $def1 = LookupMeaning($token1);
  my $def2 = LookupMeaning($token2);
  if (defined $def1 != defined $def2) {    # False, if they don't both have defs or both not have defs
    return; }
  elsif (!defined $def1 && !defined $def2) {    # If neither have defs, then must have same catcode & chars
    return ($token1->getCatcode == $token2->getCatcode)
      && ($token1->getCharcode == $token2->getCharcode); }
  elsif ($def1->equals($def2)) {                # If both have defns, must be same defn!
    return 1; }
  return; }

sub LookupMathcode {
  my ($char) = @_;
  return $STATE->lookupMathcode($char); }

sub AssignMathcode {
  my ($char, $mathcode, $scope) = @_;
  $STATE->assignMathcode($char, $mathcode, $scope);
  return; }

sub LookupSFcode {
  my ($char) = @_;
  return $STATE->lookupSFcode($char); }

sub AssignSFcode {
  my ($char, $sfcode, $scope) = @_;
  $STATE->assignSFcode($char, $sfcode, $scope);
  return; }

sub LookupLCcode {
  my ($char) = @_;
  return $STATE->lookupLCcode($char); }

sub AssignLCcode {
  my ($char, $lccode, $scope) = @_;
  $STATE->assignLCcode($char, $lccode, $scope);
  return; }

sub LookupUCcode {
  my ($char) = @_;
  return $STATE->lookupUCcode($char); }

sub AssignUCcode {
  my ($char, $uccode, $scope) = @_;
  $STATE->assignUCcode($char, $uccode, $scope);
  return; }

sub LookupDelcode {
  my ($char) = @_;
  return $STATE->lookupDelcode($char); }

sub AssignDelcode {
  my ($char, $delcode, $scope) = @_;
  $STATE->assignDelcode($char, $delcode, $scope);
  return; }

sub Let {
  my ($token1, $token2, $scope) = @_;
  # If strings are given, assume CS tokens (most common case)
  $token1 = T_CS($token1) unless ref $token1;
  $token2 = T_CS($token2) unless ref $token2;
  $STATE->assignMeaning($token1, $STATE->lookupMeaning($token2), $scope);
  AfterAssignment();
  return; }

sub Digest {
  my (@stuff) = @_;
  return $STATE->getStomach->digest(Tokens(map { (ref $_ ? $_ : TokenizeInternal($_)) } @stuff)); }

sub DigestText {
  my (@stuff) = @_;
  my $stomach = $STATE->getStomach;
  $stomach->beginMode('text');
  my $result = $stomach->digest(Tokens(map { (ref $_ ? $_ : TokenizeInternal($_)) } @stuff));
  $stomach->endMode('text');
  return $result; }

# probably need to export this, as well?
sub DigestLiteral {
  my (@stuff) = @_;
  # Perhaps should do StartSemiverbatim, but is it safe to push a frame? (we might cover over valid changes of state!)
  my $stomach = $STATE->getStomach;
  $stomach->beginMode('text');
  my $font = LookupValue('font');
  AssignValue(font => $font->merge(encoding => 'ASCII'), 'local');    # try to stay as ASCII as possible
  my $value = $STATE->getStomach->digest(Tokens(map { (ref $_ ? $_ : Tokenize($_)) } @stuff));
  AssignValue(font => $font);
  $stomach->endMode('text');
  return $value; }

sub DigestIf {
  my ($token) = @_;
  $token = T_CS($token) unless ref $token;
  if (my $defn = LookupDefinition($token)) {
    return $STATE->getStomach->digest($token); }
  else {
    return; } }

sub ReadParameters {
  my ($gullet, $spec) = @_;
  my $for  = T_OTHER("Anonymous");
  my $parm = parseParameters($spec, $for);
  return ($parm ? $parm->readArguments($gullet, $for) : ()); }

# This new declaration allows you to define the type associated with
# the value for specific keys.
sub DefKeyVal {
  my ($keyset, $key, $type, $default) = @_;
  my $paramlist = LaTeXML::Package::parseParameters($type, "KeyVal $key in set $keyset");
  AssignValue('KEYVAL@' . $keyset . '@' . $key => $paramlist);
  AssignValue('KEYVAL@' . $keyset . '@' . $key . '@default' => Tokenize($default))
    if defined $default;
  return; }

# These functions allow convenient access to KeyVal objects within constructors.
# Access the value associated with a given key.
# Can be used in a constructor.
sub GetKeyVal {
  my ($keyval, $key) = @_;
  return (defined $keyval) && $keyval->getValue($key); }

# Access the entire hash.
# Can be used in a constructor.
sub GetKeyVals {
  my ($keyval) = @_;
  return (defined $keyval ? $keyval->getKeyVals : {}); }

# Merge the current font with the style specifications
sub MergeFont {
  my (@kv) = @_;
  AssignValue(font => LookupValue('font')->merge(@kv), 'local');
  return; }

# Dumb place for this, but where else...
# The TeX way! (bah!! hint: try a large number)
my @rmletters = ('i', 'v', 'x', 'l', 'c', 'd', 'm');    # [CONSTANT]

sub roman_aux {
  my ($n) = @_;
  my $div = 1000;
  # Use >= so that exactly 1000 yields 'm' (with > it produced an empty string).
  my $s = ($n >= $div ?
    ('m' x int($n / $div)) : '');
  my $p = 4;
  while ($n %= $div) {
    $div /= 10;
    my $d = int($n / $div);
    if ($d % 5 == 4) { $s .= $rmletters[$p]; $d++; }
    if ($d > 4) { $s .= $rmletters[$p + int($d / 5)]; $d %= 5; }
    if ($d) { $s .= $rmletters[$p] x $d; }
    $p -= 2; }
  return $s; }

# Convert the number to lower case roman numerals, returning a list of LaTeXML::Core::Token
sub roman {
  my (@stuff) = @_;
  return ExplodeText(roman_aux(@stuff)); }

# Convert the number to upper case roman numerals, returning a list of LaTeXML::Core::Token
sub Roman {
  my (@stuff) = @_;
  return ExplodeText(uc(roman_aux(@stuff))); }

#======================================================================
# Cleaners
#======================================================================

sub CleanID {
  my ($key) = @_;
  $key = ToString($key);
  $key =~ s/^\s+//s; $key =~ s/\s+$//s;    # Trim leading/trailing, in any case
  $key =~ s/\s//sg;
  # Remove common idiom:
  $key =~ s/\$\{\}\^\{(.*?)\}\$/$1/g;
  # transform some forbidden chars
  $key =~ s/:/../g;                        # No colons!
  $key =~ s/@/-at-/g;
  $key =~ s/\*/-star-/g;
  $key =~ s/\$/-dollar-/g;
  $key =~ s/,/-comma-/g;
  $key =~ s/%/-pct-/g;
  $key =~ s/&/-amp-/g;
  $key =~ s/[^\w\_\-.]//g;                 # remove everything else.
  return $key; }

sub CleanLabel {
  my ($label, $prefix) = @_;
  my $key = ToString($label);
  $key =~ s/^\s+//s; $key =~ s/\s+$//s;    # Trim leading/trailing, in any case
  $key =~ s/\s+/_/sg;
  return ($prefix || "LABEL") . ":" . $key; }

sub CleanIndexKey {
  my ($key) = @_;
  $key = ToString($key);
  $key =~ s/^\s+//s; $key =~ s/\s+$//s;    # Trim leading/trailing, in any case
  # We don't want accented chars (do we?) but we need to decompose the accents!
  $key = NFD($key);
  $key =~ s/[^a-zA-Z0-9]//g;
  $key = NFC($key);                        # just to be safe(?)
  ## Shouldn't be case insensitive?
  ##  $key =~ tr|A-Z|a-z|;
  return $key; }

sub CleanBibKey {
  my ($key) = @_;
  $key = ToString($key);    # Originally lc() here, but let's preserve case till Postproc.
  $key =~ s/^\s+//s; $key =~ s/\s+$//s;    # Trim leading/trailing, in any case
  $key =~ s/\s//sg;
  return $key; }

# Return the bibkey in a form to ACTUALLY lookup.
# Usually use CleanBibKey to preserve key in the original form (case)
sub NormalizeBibKey {
  my ($key) = @_;
  return ($key ? lc(CleanBibKey($key)) : undef); }

sub CleanURL {
  my ($url) = @_;
  $url = ToString($url);
  $url =~ s/^\s+//s; $url =~ s/\s+$//s;    # Trim leading/trailing, in any case
  $url =~ s/\\~\{\}/~/g;
  return $url; }

#======================================================================
# Defining new Control-sequence Parameter types.
#======================================================================

my $parameter_options = {    # [CONSTANT]
  nargs => 1, reversion => 1, optional => 1, novalue => 1,
  semiverbatim => 1, undigested => 1 };

sub DefParameterType {
  my ($type, $reader, %options) = @_;
  CheckOptions("DefParameterType $type", $parameter_options, %options);
  AssignMapping('PARAMETER_TYPES', $type, { reader => $reader, %options });
  return; }

sub DefColumnType {
  my ($proto, $expansion) = @_;
  if ($proto =~ s/^(.)//) {
    my $char = $1;
    $proto =~ s/^\s*//;
    # Defer
    #    $proto = parseParameters($proto, $char);
    #    $expansion = TokenizeInternal($expansion) unless ref $expansion;
    DefMacroI(T_CS('\NC@rewrite@' . $char), $proto, $expansion); }
  else {
    Warn('expected', 'character', undef, "Expected Column specifier"); }
  return; }

#======================================================================
# Counters
#======================================================================
# This is modelled on LaTeX's counter mechanisms, but since it also
# provides support for ID's, even where there is no visible reference number,
# it is defined in general.
# These id's should be both unique, and parallel the visible reference numbers
# (as much as possible).  Also, for consistency, we add id's to unnumbered
# document elements (eg from \section*); this requires an additional counter
# (eg.
# UNsection) and mechanisms to track it.

# Defines a new counter named $ctr.
# If $within is defined, $ctr will be reset whenever $within is incremented.
# Keywords:
#  idprefix : specifies a prefix to be used in formatting ID's for document structure elements
#           counted by this counter.  Ie. subsection 3 in section 2 might get: id="S2.SS3"
#  idwithin : specifies that the ID is composed from $idwithin's ID, even though
#           the counter isn't numbered within it.  (mainly to avoid duplicated ids)
#   nested : a list of counters that correspond to scopes which are "inside" this one.
#           Whenever any definitions scoped to this counter are deactivated,
#           the inner counter's scopes are also deactivated.
#           NOTE: I'm not sure this is even a sensible implementation,
#           or why inner should be different than the counters reset by incrementing this counter.
sub NewCounter {
  my ($ctr, $within, %options) = @_;
  my $unctr = "UN$ctr";    # UNctr is counter for generating ID's for UN-numbered items.
  DefRegisterI(T_CS("\\c\@$ctr"), undef, Number(0));
  AssignValue("\\c\@$ctr" => Number(0), 'global');
  AfterAssignment();
  AssignValue("\\cl\@$ctr" => Tokens(), 'global') unless LookupValue("\\cl\@$ctr");
  DefRegisterI(T_CS("\\c\@$unctr"), undef, Number(0));
  AssignValue("\\c\@$unctr" => Number(0), 'global');
  AssignValue("\\cl\@$unctr" => Tokens(), 'global') unless LookupValue("\\cl\@$unctr");
  my $x;
  AssignValue("\\cl\@$within" =>
      Tokens(T_CS($ctr), T_CS($unctr), (($x = LookupValue("\\cl\@$within")) ? $x->unlist : ())),
    'global') if $within;
  AssignValue("\\cl\@UN$within" =>
      Tokens(T_CS($unctr), (($x = LookupValue("\\cl\@UN$within")) ? $x->unlist : ())),
    'global') if $within;
  AssignValue('nested_counters_' . $ctr => $options{nested}, 'global') if $options{nested};
  # default is equivalent to \arabic{ctr}, but w/o using the LaTeX macro!
  DefMacroI(T_CS("\\the$ctr"), undef, sub {
      ExplodeText(CounterValue($ctr)->valueOf); },
    scope => 'global');
  my $prefix = $options{idprefix};
  AssignValue('@ID@prefix@' .
      $ctr => $prefix, 'global') if $prefix;
  $prefix = LookupValue('@ID@prefix@' . $ctr) || CleanID($ctr) unless $prefix;
  if (defined $prefix) {
    if (my $idwithin = $options{idwithin} || $within) {
      DefMacroI(T_CS("\\the$ctr\@ID"), undef,
        "\\expandafter\\ifx\\csname the$idwithin\@ID\\endcsname\\\@empty"
          . "\\else\\csname the$idwithin\@ID\\endcsname.\\fi"
          . " $prefix\\csname \@$ctr\@ID\\endcsname",
        scope => 'global'); }
    else {
      DefMacroI(T_CS("\\the$ctr\@ID"), undef, "$prefix\\csname \@$ctr\@ID\\endcsname",
        scope => 'global'); }
    DefMacroI(T_CS("\\\@$ctr\@ID"), undef, "0", scope => 'global'); }
  return; }

sub CounterValue {
  my ($ctr) = @_;
  $ctr = ToString($ctr) if ref $ctr;
  my $value = LookupValue('\c@' . $ctr);
  if (!$value) {
    Warn('undefined', $ctr, $STATE->getStomach,
      "Counter '$ctr' was not defined; assuming 0");
    $value = Number(0); }
  return $value; }

sub AfterAssignment {
  if (my $after = $STATE->lookupValue('afterAssignment')) {
    $STATE->assignValue(afterAssignment => undef, 'global');
    $STATE->getStomach->getGullet->unread($after); }    # primitive returns boxes, so these need to be digested!
  return; }

sub SetCounter {
  my ($ctr, $value) = @_;
  $ctr = ToString($ctr) if ref $ctr;
  AssignValue('\c@' . $ctr => $value, 'global');
  AfterAssignment();
  DefMacroI(T_CS("\\\@$ctr\@ID"), undef, Tokens(Explode($value->valueOf)), scope => 'global');
  return; }

sub AddToCounter {
  my ($ctr, $value) = @_;
  $ctr = ToString($ctr) if ref $ctr;
  my $v = CounterValue($ctr)->add($value);
  AssignValue('\c@' . $ctr => $v, 'global');
  AfterAssignment();
  DefMacroI(T_CS("\\\@$ctr\@ID"), undef, Tokens(Explode($v->valueOf)), scope => 'global');
  return; }

sub StepCounter {
  my ($ctr) = @_;
  my $value = CounterValue($ctr);
  AssignValue("\\c\@$ctr" => $value->add(Number(1)), 'global');
  AfterAssignment();
  DefMacroI(T_CS("\\\@$ctr\@ID"), undef, Tokens(Explode(LookupValue('\c@' . $ctr)->valueOf)),
    scope => 'global');
  # and reset any within counters!
  if (my $nested = LookupValue("\\cl\@$ctr")) {
    foreach my $c ($nested->unlist) {
      ResetCounter(ToString($c)); } }
  DigestIf(T_CS("\\the$ctr"));
  return; }

# HOW can we retract this?
sub RefStepCounter {
  my ($ctr) = @_;
  StepCounter($ctr);
  my $iddef = LookupDefinition(T_CS("\\the$ctr\@ID"));
  my $has_id = $iddef && ((!defined $iddef->getParameters) || ($iddef->getParameters->getNumArgs == 0));

  DefMacroI(T_CS('\@currentlabel'), undef, T_CS("\\the$ctr"), scope => 'global');
  DefMacroI(T_CS('\@currentID'), undef, T_CS("\\the$ctr\@ID"), scope => 'global') if $has_id;

  ###  my $id = $has_id && ToString(Digest($idtokens));
  #  my $id = $has_id && ToString(DigestLiteral($idtokens));
  my $id = $has_id && ToString(DigestLiteral(T_CS("\\the$ctr\@ID")));

  #  my $refnum = ToString(Digest(T_CS("\\the$ctr")));
  #  my $frefnum = ToString(Digest(Invocation(T_CS('\lx@fnum@@'),$ctr)));
  #  my $rrefnum = ToString(Digest(Invocation(T_CS('\lx@refnum@@'),$ctr)));
  my $refnum    = DigestText(T_CS("\\the$ctr"));
  my $frefnum   = DigestText(Invocation(T_CS('\lx@fnum@@'), $ctr));
  my $rrefnum   = DigestText(Invocation(T_CS('\lx@refnum@@'), $ctr));
  my $s_refnum  = ToString($refnum);
  my $s_frefnum = ToString($frefnum);
  my $s_rrefnum = ToString($rrefnum);
  # Any scopes activated for previous value of this counter (& any nested counters) must be removed.
  # This may also include scopes activated for \label
  deactivateCounterScope($ctr);
  # And install the scope (if any) for this reference number.
  AssignValue(current_counter => $ctr, 'local');
  AssignValue('scopes_for_counter:' . $ctr => [$ctr . ':' . $s_refnum], 'local');
  $STATE->activateScope($ctr . ':' . $s_refnum);
  return (refnum => $refnum,
    ($frefnum && (!$refnum || ($s_frefnum ne $s_refnum)) ? (frefnum => $frefnum) : ()),
    ($rrefnum && ($frefnum ? ($s_rrefnum ne $s_frefnum) : (!$refnum || ($s_rrefnum ne $s_refnum))) ? (rrefnum => $rrefnum) : ()),
    ($has_id ?
      (id => $id) : ())); }

sub deactivateCounterScope {
  my ($ctr) = @_;
  #  print STDERR "Unusing scopes for $ctr\n";
  if (my $scopes = LookupValue('scopes_for_counter:' . $ctr)) {
    map { $STATE->deactivateScope($_) } @$scopes; }
  foreach my $inner_ctr (@{ LookupValue('nested_counters_' . $ctr) || [] }) {
    deactivateCounterScope($inner_ctr); }
  return; }

# For UN-numbered units
sub RefStepID {
  my ($ctr) = @_;
  my $unctr = "UN$ctr";
  StepCounter($unctr);
  DefMacroI(T_CS("\\\@$ctr\@ID"), undef,
    Tokens(T_OTHER('x'), Explode(LookupValue('\c@' . $unctr)->valueOf)),
    scope => 'global');
  DefMacroI(T_CS('\@currentID'), undef, T_CS("\\the$ctr\@ID"));
  return (id => ToString(DigestLiteral(T_CS("\\the$ctr\@ID")))); }

sub ResetCounter {
  my ($ctr) = @_;
  AssignValue('\c@' . $ctr => Number(0), 'global');
  # and reset any within counters!
  if (my $nested = LookupValue("\\cl\@$ctr")) {
    foreach my $c ($nested->unlist) {
      ResetCounter(ToString($c)); } }
  return; }

#**********************************************************************
# This function computes an xml:id for a node, if it hasn't already got one.
# It is suitable for use in Tag afterOpen as
#  Tag('ltx:para',afterOpen=>sub { GenerateID(@_,'p'); });
# It generates an id of the form <parentid>.<prefix><number>
# (eg. the third ltx:para within a node whose xml:id is S2 gets xml:id S2.p3).
# The parent node (the one with ID=<parentid>) also maintains a counter
# stored in an attribute _ID_counter_<prefix>_ recording the last used <number>
# for <prefix> amongst its descendants.
sub GenerateID {
  my ($document, $node, $whatsit, $prefix) = @_;
  # If node doesn't already have an id, and can
  if (!$node->hasAttribute('xml:id') && $document->canHaveAttribute($node, 'xml:id')
    # but isn't a _Capture_ node (which ultimately should disappear)
    && ($document->getNodeQName($node) ne 'ltx:_Capture_')) {
    my $ancestor = $document->findnode('ancestor::*[@xml:id][1]', $node)
      || $document->getDocument->documentElement;
    ## Old versions don't like $ancestor->getAttribute('xml:id');
    my $ancestor_id = $ancestor && $ancestor->getAttributeNS("http://www.w3.org/XML/1998/namespace", 'id');
    # If we've got no $ancestor_id, then we've got no $ancestor (no document yet!),
    # or $ancestor IS the root element (but without an id);
    # If we also have no $prefix, we'll end up with an illegal id (just digits)!!!
    # We'll use "id" for an id prefix; this will work whether or not we have an $ancestor.
    $prefix = 'id' unless $prefix || $ancestor_id;

    my $ctrkey = '_ID_counter_' . (defined $prefix ? $prefix . '_' : '');
    my $ctr = ($ancestor && $ancestor->getAttribute($ctrkey)) || 0;

    my $id = ($ancestor_id ? $ancestor_id . "." : '') . (defined $prefix ? $prefix : '') . (++$ctr);
    $ancestor->setAttribute($ctrkey => $ctr) if $ancestor;
    $document->setAttribute($node, 'xml:id' => $id); }
  return; }

#======================================================================
#
#======================================================================

# Return $tokens with all tokens expanded
sub Expand {
  my (@tokens) = @_;
  return () unless @tokens;
  return $STATE->getStomach->getGullet->readingFromMouth(LaTeXML::Core::Mouth->new(), sub {
      my ($gullet) = @_;
      $gullet->unread(@tokens);
      my @expanded = ();
      while (defined(my $t = $gullet->readXToken(0))) {
        push(@expanded, $t); }
      return Tokens(@expanded); }); }

sub Invocation {
  my ($token, @args) = @_;
  if (my $defn = LookupDefinition((ref $token ?
        $token : T_CS($token)))) {
    return Tokens($defn->invocation(@args)); }
  else {
    Fatal('undefined', $token, undef,
      "Can't invoke " . Stringify($token) . "; it is undefined");
    return Tokens(); } }

sub RawTeX {
  my ($text) = @_;
  # It could be as simple as this, except if catcodes get changed, it's too late!!!
  #  Digest(TokenizeInternal($text));
  my $stomach = $STATE->getStomach;
  my $savedcc = $STATE->lookupCatcode('@');
  $STATE->assignCatcode('@' => CC_LETTER);

  $stomach->getGullet->readingFromMouth(LaTeXML::Core::Mouth->new($text), sub {
      my ($gullet) = @_;
      my $token;
      while ($token = $gullet->readXToken(0)) {
        next if $token->equals(T_SPACE);
        $stomach->invokeToken($token); } });

  $STATE->assignCatcode('@' => $savedcc);
  return; }

sub StartSemiverbatim {
  $STATE->beginSemiverbatim;
  return; }

sub EndSemiverbatim {
  $STATE->endSemiverbatim;
  return; }

# WARNING: These two utilities bind $STATE to simple State objects with known fixed catcodes.
# The State normally contains ALL the bindings, etc and links to other important objects.
# We CAN do that here, since we are ONLY tokenizing from a new Mouth, bypassing stomach & gullet.
# However, be careful with any changes.

our $STD_CATTABLE;
our $STY_CATTABLE;

# Tokenize($string); Tokenizes the string using the standard cattable, returning a LaTeXML::Core::Tokens
sub Tokenize {
  my ($string) = @_;
  $STD_CATTABLE = LaTeXML::Core::State->new(catcodes => 'standard') unless $STD_CATTABLE;
  local $STATE = $STD_CATTABLE;
  return LaTeXML::Core::Mouth->new($string)->readTokens; }

# TokenizeInternal($string); Tokenizes the string using the internal cattable, returning a LaTeXML::Core::Tokens
sub TokenizeInternal {
  my ($string) = @_;
  $STY_CATTABLE = LaTeXML::Core::State->new(catcodes => 'style') unless $STY_CATTABLE;
  local $STATE = $STY_CATTABLE;
  return LaTeXML::Core::Mouth->new($string)->readTokens; }

#======================================================================
# Non-exported support for defining forms.
#======================================================================
sub CheckOptions {
  my ($operation, $allowed, %options) = @_;
  my @badops = grep { !$$allowed{$_} } keys %options;
  Error('misdefined', $operation, $STATE->getStomach,
    "$operation does not accept options:" . join(', ', @badops)) if @badops;
  return; }

sub requireMath {
  my ($cs) = @_;
  $cs = ToString($cs);
  Warn('unexpected', $cs, $STATE->getStomach,
    "$cs should only appear in math mode") unless LookupValue('IN_MATH');
  return; }

sub forbidMath {
  my ($cs) = @_;
  $cs = ToString($cs);
  Warn('unexpected', $cs, $STATE->getStomach,
    "$cs should not appear in math mode") if LookupValue('IN_MATH');
  return; }

#**********************************************************************
# Definitions
#**********************************************************************

#======================================================================
# Defining Expandable Control Sequences.
#======================================================================
# Define an expandable control sequence. It will be expanded in the Gullet.
# The $replacement should be a LaTeXML::Core::Tokens (the arguments will be
# substituted for any #1,...), or a sub which returns a list of tokens (or just return;).
# Those tokens, if any, will be reinserted into the input.
# There are no options to these definitions.
my $expandable_options = {    # [CONSTANT]
  scope => 1, locked => 1 };

sub DefExpandable {
  my ($proto, $expansion, %options) = @_;
  Warn('deprecated', 'DefExpandable', $STATE->getStomach,
    "DefExpandable ($proto) is deprecated; use DefMacro");
  DefMacro($proto, $expansion, %options);
  return; }

# Define a Macro: Essentially an alias for DefExpandable
# For convenience, the $expansion can be a string which will be tokenized.
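# A usage sketch, assuming hypothetical macro names (\myproject, \swapargs, \mysecnum)
# and an existing 'section' counter, as they might appear in a binding (.ltxml) file;
# these illustrate the calling forms and are not definitions made by this module:
#   DefMacro('\myproject', 'LaTeXML');     # a string expansion is tokenized when needed
#   DefMacro('\swapargs{}{}', '#2#1');     # the two plain arguments replace #1 and #2
#   DefMacro('\mysecnum', sub { ExplodeText(CounterValue('section')->valueOf); });    # a sub returns tokens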
my $macro_options = {    # [CONSTANT]
  scope => 1, locked => 1, mathactive => 1 };

sub DefMacro {
  my ($proto, $expansion, %options) = @_;
  CheckOptions("DefMacro ($proto)", $macro_options, %options);
  DefMacroI(parsePrototype($proto), $expansion, %options);
  return; }

sub DefMacroI {
  my ($cs, $paramlist, $expansion, %options) = @_;
  if (!defined $expansion) {
    $expansion = Tokens(); }
  # Optimization: Defer till macro actually used
  #  elsif (!ref $expansion) { $expansion = TokenizeInternal($expansion); }
  if ((length($cs) == 1) && $options{mathactive}) {
    $STATE->assignMathcode($cs => 0x8000, $options{scope}); }
  $cs = coerceCS($cs);
  ###  $paramlist = parseParameters($paramlist, $cs) if defined $paramlist && !ref $paramlist;
  $STATE->installDefinition(LaTeXML::Core::Definition::Expandable->new($cs, $paramlist, $expansion, %options),
    $options{scope});
  AssignValue(ToString($cs) . ":locked" => 1, 'global') if $options{locked};
  return; }

#======================================================================
# Defining Conditional Control Sequences.
#======================================================================
# Define a conditional control sequence. Its processing takes place in
# the Gullet.  The test is applied to the arguments (if any),
# which determines which branch is executed.
# If the test is undefined, the conditional is a "user defined" one;
# Two additional primitives are defined \footrue and \foofalse;
# the test is then determined by the most recently called of those.
# If you supply a skipper instead of a test, it is also applied to the arguments
# and should skip to the right place in the following \or, \else, \fi.
# This is ONLY used for \ifcase.
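# A usage sketch, assuming hypothetical conditional names (\ifmydraft, \ifmyodd);
# the test sub receives the gullet followed by the parsed arguments:
#   DefConditional('\ifmydraft');    # no test: also defines \mydrafttrue and \mydraftfalse; initially false
#   DefConditional('\ifmyodd Number', sub { my ($gullet, $n) = @_; $n->valueOf % 2; });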
my $conditional_options = {    # [CONSTANT]
  scope => 1, locked => 1, skipper => 1 };

sub DefConditional {
  my ($proto, $test, %options) = @_;
  CheckOptions("DefConditional ($proto)", $conditional_options, %options);
  DefConditionalI(parsePrototype($proto), $test, %options);
  return; }

sub DefConditionalI {
  my ($cs, $paramlist, $test, %options) = @_;
  $cs = coerceCS($cs);
  my $csname = ToString($cs);
  # Special cases...
  if ($csname eq '\fi') {
    $STATE->installDefinition(LaTeXML::Core::Definition::Conditional->new(
        $cs, undef, undef, conditional_type => 'fi', %options),
      $options{scope}); }
  elsif ($csname eq '\else') {
    $STATE->installDefinition(LaTeXML::Core::Definition::Conditional->new(
        $cs, undef, undef, conditional_type => 'else', %options),
      $options{scope}); }
  elsif ($csname eq '\or') {
    $STATE->installDefinition(LaTeXML::Core::Definition::Conditional->new(
        $cs, undef, undef, conditional_type => 'or', %options),
      $options{scope}); }
  elsif ($csname =~ /^\\(?:if(.*)|unless)$/) {
    my $name = $1;
    if ((defined $name) && ($name ne 'case')
      && (!defined $test)) {    # user-defined conditional, like with \newif
      DefMacroI(T_CS('\\' . $name . 'true'), undef, Tokens(T_CS('\let'), $cs, T_CS('\iftrue')));
      DefMacroI(T_CS('\\' . $name . 'false'), undef, Tokens(T_CS('\let'), $cs, T_CS('\iffalse')));
      Let($cs, T_CS('\iffalse')); }
    else {
      # For \ifcase, the parameter list better be a single Number !!
      $STATE->installDefinition(LaTeXML::Core::Definition::Conditional->new($cs, $paramlist, $test,
          conditional_type => 'if', %options),
        $options{scope}); } }
  else {
    Error('misdefined', $cs, $STATE->getStomach,
      "The conditional " . Stringify($cs) . " is being defined but doesn't start with \\if"); }
  AssignValue(ToString($cs) . ":locked" => 1) if $options{locked};
  return; }

sub IfCondition {
  my ($if, @args) = @_;
  my $gullet = $STATE->getStomach->getGullet;
  $if = coerceCS($if);
  my ($defn, $test);
  if (($defn = $STATE->lookupMeaning($if))
    && (($$defn{conditional_type} || '') eq 'if')
    && ($test = $defn->getTest)) {
    return &$test($gullet, @args); }
  else {
    Error('expected', 'conditional', $gullet,
      "Expected a conditional, got '" . ToString($if) . "'");
    return; } }

#======================================================================
# Define a primitive control sequence.
#======================================================================
# Primitives are executed in the Stomach.
# The $replacement should be a sub which returns nothing, or a list of Box's or Whatsit's.
# The options are:
#    isPrefix     : 1 for things like \global, \long, etc.
#    registerType : for parameters (but needs to be worked into DefParameter, below).

my $primitive_options = {    # [CONSTANT]
  isPrefix => 1, scope => 1, mode => 1, font => 1,
  requireMath => 1, forbidMath => 1,
  beforeDigest => 1, afterDigest => 1,
  bounded => 1, locked => 1, alias => 1 };

sub DefPrimitive {
  my ($proto, $replacement, %options) = @_;
  CheckOptions("DefPrimitive ($proto)", $primitive_options, %options);
  DefPrimitiveI(parsePrototype($proto), $replacement, %options);
  return; }

sub DefPrimitiveI {
  my ($cs, $paramlist, $replacement, %options) = @_;
  #####  $replacement = sub { (); } unless defined $replacement;
  my $string = $replacement;
  $replacement = sub { Box($string, undef, undef, Invocation($options{alias} || $cs, @_[1 .. $#_])); }
    unless ref $replacement;
  $cs = coerceCS($cs);
  ###  $paramlist = parseParameters($paramlist, $cs) if defined $paramlist && !ref $paramlist;
  my $mode    = $options{mode};
  my $bounded = $options{bounded};
  $STATE->installDefinition(LaTeXML::Core::Definition::Primitive
      ->new($cs, $paramlist, $replacement,
      beforeDigest => flatten(($options{requireMath} ? (sub { requireMath($cs); }) : ()),
        ($options{forbidMath} ?
(sub { forbidMath($cs); }) : ()), ($mode ? (sub { $_[0]->beginMode($mode); }) : ($bounded ? (sub { $_[0]->bgroup; }) : ())), ($options{font} ? (sub { MergeFont(%{ $options{font} }); }) : ()), $options{beforeDigest}), afterDigest => flatten($options{afterDigest}, ($mode ? (sub { $_[0]->endMode($mode) }) : ($bounded ? (sub { $_[0]->egroup; }) : ()))), isPrefix => $options{isPrefix}), $options{scope}); AssignValue(ToString($cs) . ":locked" => 1) if $options{locked}; return; } my $register_options = { # [CONSTANT] readonly => 1, getter => 1, setter => 1 }; my %register_types = ( # [CONSTANT] 'LaTeXML::Common::Number' => 'Number', 'LaTeXML::Common::Dimension' => 'Dimension', 'LaTeXML::Common::Glue' => 'Glue', 'LaTeXML::Core::MuGlue' => 'MuGlue', 'LaTeXML::Core::Tokens' => 'Tokens', 'LaTeXML::Core::Token' => 'Token', ); sub DefRegister { my ($proto, $value, %options) = @_; CheckOptions("DefRegister ($proto)", $register_options, %options); DefRegisterI(parsePrototype($proto), $value, %options); return; } sub DefRegisterI { my ($cs, $paramlist, $value, %options) = @_; $cs = coerceCS($cs); ### $paramlist = parseParameters($paramlist, $cs) if defined $paramlist && !ref $paramlist; my $type = $register_types{ ref $value }; my $name = ToString($cs); my $getter = $options{getter} || sub { LookupValue(join('', $name, map { ToString($_) } @_)) || $value; }; my $setter = $options{setter} || ($options{readonly} ? sub { my ($v, @args) = @_; Error('unexpected', $name, $STATE->getStomach, "Can't assign to register $name"); return; } : sub { my ($v, @args) = @_; AssignValue(join('', $name, map { ToString($_) } @args) => $v); }); # Not really right to set the value! AssignValue(ToString($cs) => $value) if defined $value; $STATE->installDefinition(LaTeXML::Core::Definition::Register->new($cs, $paramlist, registerType => $type, getter => $getter, setter => $setter, readonly => $options{readonly}), 'global'); return; } sub flatten { my (@stuff) = @_; return [map { (defined $_ ? 
    (ref $_ eq 'ARRAY' ? @$_ : ($_)) : ()) } @stuff]; }

#======================================================================
# Define a constructor control sequence.
#======================================================================
# The arguments, if any, will be collected and processed in the Stomach, and
# a Whatsit will be constructed.
# It is the Whatsit that will be processed in the Document: It is responsible
# for constructing XML Nodes.  The $replacement should be a sub which inserts nodes,
# or a string specifying a constructor pattern (See somewhere).
#
# Options are:
#   bounded      : any side effects of before/after daemons are bounded; they are
#                  automatically enclosed by a bgroup/egroup pair.
#   mode         : causes a switch into the given mode during the Whatsit building in the stomach.
#   reversion    : a string representing the preferred TeX form of the invocation.
#   beforeDigest : code to be executed (in the stomach) before parsing & constructing the Whatsit.
#                  Can be used for changing modes, beginning groups, etc.
#   afterDigest  : code to be executed (in the stomach) after parsing & constructing the Whatsit.
#                  Useful for setting Whatsit properties.
#   properties   : a hashref listing default values of properties to assign to the Whatsit.
#                  These properties can be used in the constructor.
my $constructor_options = {    # [CONSTANT]
  mode => 1, requireMath => 1, forbidMath => 1, font => 1,
  alias => 1, reversion => 1, sizer => 1, properties => 1,
  nargs => 1,
  beforeDigest => 1, afterDigest => 1, beforeConstruct => 1, afterConstruct => 1,
  captureBody => 1, scope => 1, bounded => 1, locked => 1 };

sub inferSizer {
  my ($sizer, $reversion) = @_;
  return (defined $sizer ? $sizer
    : ((defined $reversion) && (!ref $reversion) && ($reversion =~ /^(?:#\w+)*$/)
      ?
$reversion : undef)); } sub DefConstructor { my ($proto, $replacement, %options) = @_; CheckOptions("DefConstructor ($proto)", $constructor_options, %options); DefConstructorI(parsePrototype($proto), $replacement, %options); return; } sub DefConstructorI { my ($cs, $paramlist, $replacement, %options) = @_; $cs = coerceCS($cs); ### $paramlist = parseParameters($paramlist, $cs) if defined $paramlist && !ref $paramlist; my $mode = $options{mode}; my $bounded = $options{bounded}; $STATE->installDefinition(LaTeXML::Core::Definition::Constructor ->new($cs, $paramlist, $replacement, beforeDigest => flatten(($options{requireMath} ? (sub { requireMath($cs); }) : ()), ($options{forbidMath} ? (sub { forbidMath($cs); }) : ()), ($mode ? (sub { $_[0]->beginMode($mode); }) : ($bounded ? (sub { $_[0]->bgroup; }) : ())), ($options{font} ? (sub { MergeFont(%{ $options{font} }); }) : ()), $options{beforeDigest}), afterDigest => flatten($options{afterDigest}, ($mode ? (sub { $_[0]->endMode($mode) }) : ($bounded ? (sub { $_[0]->egroup; }) : ()))), beforeConstruct => flatten($options{beforeConstruct}), afterConstruct => flatten($options{afterConstruct}), nargs => $options{nargs}, alias => $options{alias}, reversion => $options{reversion}, sizer => inferSizer($options{sizer}, $options{reversion}), captureBody => $options{captureBody}, properties => $options{properties} || {}), $options{scope}); AssignValue(ToString($cs) . ":locked" => 1) if $options{locked}; return; } #====================================================================== # Support for XMDual # Perhaps it would be better to use a label(-like) indirection here, # so all ID's can stay in the desired format? 
sub getXMArgID {
  StepCounter('@XMARG');
  DefMacroI(T_CS('\@@XMARG@ID'), undef, Tokens(Explode(LookupValue('\c@@XMARG')->valueOf)),
    scope => 'global');
  return Expand(T_CS('\the@XMARG@ID')); }

# Given a list of Tokens (to be expanded into mathematical objects)
# return two lists:
#   (1) the Tokens wrapped in an XMArg, with an ID added
#   (2) a corresponding list of Tokens creating XMRef's to those IDs
# Ah, but there are complications!!!
# On the one hand, arguments may be hidden, never appearing on the presentation side
# (all will be passed to the content side); This argues for putting the XMArg's on the content side.
# OTOH, they ought to be on the presentation side, so that they can be expanded & digested in
# the proper context they will be presented, and pick up all the styling (font size, displaystyle..)
# I don't know how to work around the latter, so we'll put args on the presentation side,
# UNLESS they are hidden, in which case they'll be on the content side.
# So, how do we know if they're hidden? We'll scan the presentation for #\d, that's how!
sub dualize_arglist {
  my ($presentation, @args) = @_;
  my %used = ();
  $presentation = ToString($presentation);
  $presentation =~ s/#(\d)/{ $used{$1}++; }/ge;    # Get the args that were actually used!
  my (@cargs, @pargs);
  my $i = 0;
  foreach my $arg (@args) {
    $i++;
    if (!(defined $arg) || !$arg->unlist) {    # undefined or empty args, just pass through
      push(@pargs, $arg);
      push(@cargs, $arg); }
    elsif ($used{$i}) {                        # used in presentation?
      my $id = getXMArgID();
      push(@pargs, Invocation(T_CS('\@XMArg'), $id, $arg));    # put XMArg in presentation
      push(@cargs, Invocation(T_CS('\@XMRef'), $id)); }
    else {                                     # Hidden arg, put XMArg in content.
my $id = getXMArgID(); push(@cargs, Invocation(T_CS('\@XMArg'), $id, $arg)); push(@pargs, Invocation(T_CS('\@XMRef'), $id)); } } return ([@cargs], [@pargs]); } # Given a list of XML nodes (either libxml nodes, or array representations) # ensure each has an ID, and return a list of ltx:XMRef's to those nodes. # Note that ltx:XMHint nodes are ephemeral and shouldn't be ref'd! # likewise, we avoid creating XMRefs to XMRefs sub createXMRefs { my ($document, @args) = @_; my @refs = (); foreach my $arg (@args) { my $isarray = (ref $arg eq 'ARRAY'); my $qname = ($isarray ? $$arg[0] : $document->getNodeQName($arg)); my $box = ($isarray ? $$arg[1]{_box} : $document->getNodeBox($arg)); # XMHint's are ephemeral, they may disappear; so just clone it w/o id if ($qname eq 'ltx:XMHint') { my %attr = ($isarray ? %{ $$arg[1] } : (map { $_->nodeName => $_->getValue } $arg->attributes)); delete $attr{'xml:id'}; push(@refs, [$qname, {%attr}]); } # Likewise, clone an XMRef (w/o any attributes or id ?) rather than create an XMRef to an XMRef. elsif ($qname eq 'ltx:XMRef') { push(@refs, [$qname, { idref => $arg->getAttribute('idref'), _box => $box }]); } else { my $id = ($isarray ? $$arg[1]{'xml:id'} : $arg->getAttribute('xml:id')); if (!$id) { $id = ToString(getXMArgID()); if ($isarray) { $$arg[1]{'xml:id'} = $id; } else { $document->setAttribute($arg, 'xml:id' => $id); } } push(@refs, ['ltx:XMRef', { 'idref' => $id, _box => $box }]); } } return @refs; } # DefMath Define a Mathematical symbol or function. # There are two sets of cases: # (1) If the presentation appears to be TeX code, we create an XMDual, # since the presentation may end up with structure, etc. # (2) But if the presentation is a simple string, or unicode, # it is just the content of the symbol; even if the function takes arguments. # ALSO # arrange that the operator token gets cs="$cs" # ALSO # Possibly some trick with SUMOP/INTOP affecting limits ? # Well, not exactly, but.... # HMM.... Still fishy. 
# When to make a dual ? # If the $presentation seems to be TeX (ie. it involves #1... but not ONLY!) my $math_options = { # [CONSTANT] name => 1, meaning => 1, omcd => 1, reversion => 1, sizer => 1, alias => 1, role => 1, operator_role => 1, reorder => 1, dual => 1, mathstyle => 1, font => 1, scriptpos => 1, operator_scriptpos => 1, stretchy => 1, operator_stretchy => 1, beforeDigest => 1, afterDigest => 1, scope => 1, nogroup => 1, locked => 1 }; my $simpletoken_options = { # [CONSTANT] name => 1, meaning => 1, omcd => 1, role => 1, mathstyle => 1, font => 1, scriptpos => 1, scope => 1, locked => 1 }; sub DefMath { my ($proto, $presentation, %options) = @_; CheckOptions("DefMath ($proto)", $math_options, %options); DefMathI(parsePrototype($proto), $presentation, %options); return; } sub DefMathI { my ($cs, $paramlist, $presentation, %options) = @_; $cs = coerceCS($cs); # Can't defer parsing parameters since we need to know number of args! $paramlist = parseParameters($paramlist, $cs) if defined $paramlist && !ref $paramlist; my $nargs = ($paramlist ? scalar($paramlist->getParameters) : 0); my $csname = $cs->getString; my $meaning = $options{meaning}; my $name = $options{alias} || $csname; $name =~ s/^\\//; $name = $options{name} if defined $options{name}; $name = undef if (defined $name) && (($name eq $presentation) || ($name eq '') || ((defined $meaning) && ($meaning eq $name))); $options{name} = $name; $options{role} = 'UNKNOWN' if ($nargs == 0) && !defined $options{role}; $options{operator_role} = 'UNKNOWN' if ($nargs > 0) && !defined $options{operator_role}; # Store some data for introspection defmath_introspective($cs, $paramlist, $presentation, %options); # If single character, handle with a rewrite rule if (length($csname) == 1) { defmath_rewrite($cs, %options); } # If the presentation is complex, and involves arguments, # we will create an XMDual to separate content & presentation. 
elsif ((ref $presentation eq 'CODE') || ((ref $presentation) && grep { $_->equals(T_PARAM) } $presentation->unlist) || (!(ref $presentation) && ($presentation =~ /\#\d|\\./)) || ((ref $presentation) && (grep { $_->isExecutable } $presentation->unlist))) { defmath_dual($cs, $paramlist, $presentation, %options); } # EXPERIMENT: Introduce an intermediate case for simple symbols # Define a primitive that will create a Box with the appropriate set of XMTok attributes. elsif (($nargs == 0) && !grep { !$$simpletoken_options{$_} } keys %options) { defmath_prim($cs, $paramlist, $presentation, %options); } else { defmath_cons($cs, $paramlist, $presentation, %options); } AssignValue($csname . ":locked" => 1) if $options{locked}; return; } sub defmath_introspective { my ($cs, $paramlist, $presentation, %options) = @_; # Store some data for introspection [should be optional?] my $nargs = ($paramlist ? scalar($paramlist->getParameters) : 0); AssignValue(join("##", "math_definition", $cs->getString, $nargs, $options{role} || $options{operator_role} || '', $options{name} || '', (defined $options{meaning} ? $options{meaning} : ''), $STATE->getStomach->getGullet->getMouth->getSource, (ref $presentation ? '' : $presentation)) => 1, global => 1); return; } sub defmath_rewrite { my ($cs, %options) = @_; my $csname = $cs->getString; #### $STATE->assignMathcode($csname=>0x8000, $options{scope}); } # No, do NOT make mathactive; screws up things like babel french, or... ? # EXPERIMENT: store XMTok attributes for if this char ends up a Math Token. # But only some DefMath options make sense! my $rw_options = { name => 1, meaning => 1, omcd => 1, role => 1, mathstyle => 1, stretchy => 1 }; # (well, mathstyle?) CheckOptions("DefMath reimplemented as DefRewrite ($csname)", $rw_options, %options); AssignValue('math_token_attributes_' . 
$csname => {%options}, 'global'); return; } sub defmath_common_constructor_options { my ($cs, $presentation, %options) = @_; my $sizer = inferSizer($options{sizer}, $options{reversion}); return ( alias => $options{alias} || $cs->getString, (defined $options{reversion} ? (reversion => $options{reversion}) : ()), (defined $sizer ? (sizer => $sizer) : ()), beforeDigest => flatten(sub { requireMath($cs->getString); }, ($options{nogroup} ? () : (sub { $_[0]->bgroup; })), ($options{font} ? (sub { MergeFont(%{ $options{font} }); }) : ()), $options{beforeDigest}), afterDigest => flatten($options{afterDigest}, ($options{nogroup} ? () : (sub { $_[0]->egroup; }))), beforeConstruct => flatten($options{beforeConstruct}), afterConstruct => flatten($options{afterConstruct}), properties => { name => $options{name}, meaning => $options{meaning}, omcd => $options{omcd}, role => $options{role}, operator_role => $options{operator_role}, mathstyle => $options{mathstyle}, scriptpos => $options{scriptpos}, operator_scriptpos => $options{operator_scriptpos}, stretchy => $options{stretchy}, operator_stretchy => $options{operator_stretchy}, font => sub { LookupValue('font')->specialize($presentation); } }, scope => $options{scope}); } # If the presentation is complex, and involves arguments, # we will create an XMDual to separate content & presentation. # This involves creating 3 control sequences: # \cs macro that expands into \DUAL{pres}{content} # \cs@content constructor creates the content branch # \cs@presentation macro that expands into code in the presentation branch. # OK, this is getting a bit out-of-hand; I can't, myself, predict whether XMDual gets involved! # The basic distinction seems to be whether the arguments are explicitly involved # in the presentation form; # This excludes (at least?) (OVER|UNDER)ACCENT's sub defmath_dual { my ($cs, $paramlist, $presentation, %options) = @_; my $csname = $cs->getString; my $cont_cs = T_CS($csname . 
"\@content"); my $pres_cs = T_CS($csname . "\@presentation"); # Make the original CS expand into a DUAL invoking a presentation macro and content constructor $STATE->installDefinition(LaTeXML::Core::Definition::Expandable->new($cs, $paramlist, sub { my ($self, @args) = @_; my ($cargs, $pargs) = dualize_arglist($presentation, @args); Invocation(T_CS('\DUAL'), ($options{role} ? T_OTHER($options{role}) : undef), Invocation($cont_cs, @$cargs), Invocation($pres_cs, @$pargs))->unlist; }), $options{scope}); # Make the presentation macro. $presentation = TokenizeInternal($presentation) unless ref $presentation; $presentation = Invocation(T_CS('\@ASSERT@MEANING'), T_OTHER($options{meaning}), $presentation) if $options{meaning}; $STATE->installDefinition(LaTeXML::Core::Definition::Expandable->new($pres_cs, $paramlist, $presentation), $options{scope}); my $nargs = ($paramlist ? scalar($paramlist->getParameters) : 0); my $cons_attr = "name='#name' meaning='#meaning' omcd='#omcd' mathstyle='#mathstyle'"; $STATE->installDefinition(LaTeXML::Core::Definition::Constructor->new($cont_cs, $paramlist, ($nargs == 0 ? "" : "" . "" . join('', map { "#$_" } ($options{reorder} ? @{ $options{reorder} } : (1 .. $nargs))) . 
""), defmath_common_constructor_options($cs, $presentation, %options)), $options{scope}); return; } sub defmath_prim { my ($cs, $paramlist, $presentation, %options) = @_; my $string = ToString($presentation); my $reqfont = $options{font} || {}; delete $options{locked}; delete $options{font}; $STATE->installDefinition(LaTeXML::Core::Definition::Primitive->new($cs, undef, sub { my ($stomach) = @_; my $locator = $stomach->getGullet->getLocator; my %properties = %options; my $font = LookupValue('font')->merge(%$reqfont)->specialize($string); foreach my $key (keys %properties) { my $value = $properties{$key}; if (ref $value eq 'CODE') { $properties{$key} = &$value(); } } LaTeXML::Core::Box->new($string, $font, $locator, $cs, mode => 'math', %properties); })); return; } sub defmath_cons { my ($cs, $paramlist, $presentation, %options) = @_; # do we need to do anything about digesting the presentation? my $end_tok = (defined $presentation ? '>' . ToString($presentation) . '' : "/>"); my $cons_attr = "name='#name' meaning='#meaning' omcd='#omcd' mathstyle='#mathstyle'"; my $nargs = ($paramlist ? scalar($paramlist->getParameters) : 0); $STATE->installDefinition(LaTeXML::Core::Definition::Constructor->new($cs, $paramlist, ($nargs == 0 # If trivial presentation, allow it in Text ? ($presentation !~ /(?:\(|\)|\\)/ ? "?#isMath(" . "#$_" } 1 .. $nargs) . ""), defmath_common_constructor_options($cs, $presentation, sizer => sub { # my $font = $_[1]->getFont || LaTeXML::Common::Font->mathDefault; my $font = LaTeXML::Common::Font->mathDefault; $font->computeStringSize($presentation); }, %options)), $options{scope}); return; } #====================================================================== # Define a LaTeX environment # Note that the body of the environment is treated is the 'body' parameter in the constructor. 
my $environment_options = { # [CONSTANT] mode => 1, requireMath => 1, forbidMath => 1, properties => 1, nargs => 1, font => 1, beforeDigest => 1, afterDigest => 1, afterDigestBegin => 1, beforeDigestEnd => 1, afterDigestBody => 1, beforeConstruct => 1, afterConstruct => 1, reversion => 1, sizer => 1, scope => 1, locked => 1 }; sub DefEnvironment { my ($proto, $replacement, %options) = @_; CheckOptions("DefEnvironment ($proto)", $environment_options, %options); ## $proto =~ s/^\{([^\}]+)\}\s*//; # Pull off the environment name as {name} ## my $paramlist=parseParameters($proto,"Environment $name"); ## my $name = $1; my ($name, $paramlist) = Text::Balanced::extract_bracketed($proto, '{}'); $name =~ s/[\{\}]//g; $paramlist =~ s/^\s*//; ## $paramlist = parseParameters($paramlist, "Environment $name"); DefEnvironmentI($name, $paramlist, $replacement, %options); return; } sub DefEnvironmentI { my ($name, $paramlist, $replacement, %options) = @_; my $mode = $options{mode}; $name = ToString($name) if ref $name; ## $paramlist = parseParameters($paramlist, $name) if defined $paramlist && !ref $paramlist; # This is for the common case where the environment is opened by \begin{env} my $sizer = inferSizer($options{sizer}, $options{reversion}); $STATE->installDefinition(LaTeXML::Core::Definition::Constructor ->new(T_CS("\\begin{$name}"), $paramlist, $replacement, beforeDigest => flatten(($options{requireMath} ? (sub { requireMath($name); }) : ()), ($options{forbidMath} ? (sub { forbidMath($name); }) : ()), ($mode ? (sub { $_[0]->beginMode($mode); }) : (sub { $_[0]->bgroup; })), sub { AssignValue(current_environment => $name); DefMacroI('\@currenvir', undef, $name); }, ($options{font} ? 
(sub { MergeFont(%{ $options{font} }); }) : ()), $options{beforeDigest}), afterDigest => flatten($options{afterDigestBegin}), afterDigestBody => flatten($options{afterDigestBody}), beforeConstruct => flatten(sub { $STATE->pushFrame; }, $options{beforeConstruct}), # Curiously, it's the \begin whose afterConstruct gets called. afterConstruct => flatten($options{afterConstruct}, sub { $STATE->popFrame; }), nargs => $options{nargs}, captureBody => 1, properties => $options{properties} || {}, (defined $options{reversion} ? (reversion => $options{reversion}) : ()), (defined $sizer ? (sizer => $sizer) : ()), ), $options{scope}); $STATE->installDefinition(LaTeXML::Core::Definition::Constructor ->new(T_CS("\\end{$name}"), "", "", beforeDigest => flatten($options{beforeDigestEnd}), afterDigest => flatten($options{afterDigest}, sub { my $env = LookupValue('current_environment'); Error('unexpected', "\\end{$name}", $_[0], "Can't close environment $name", "Current are " . join(', ', $STATE->lookupStackedValues('current_environment'))) unless $env && $name eq $env; return; }, ($mode ? (sub { $_[0]->endMode($mode); }) : (sub { $_[0]->egroup; }))), ), $options{scope}); # For the uncommon case opened by \csname env\endcsname $STATE->installDefinition(LaTeXML::Core::Definition::Constructor ->new(T_CS("\\$name"), $paramlist, $replacement, beforeDigest => flatten(($options{requireMath} ? (sub { requireMath($name); }) : ()), ($options{forbidMath} ? (sub { forbidMath($name); }) : ()), ($mode ? (sub { $_[0]->beginMode($mode); }) : ()), ($options{font} ? (sub { MergeFont(%{ $options{font} }); }) : ()), $options{beforeDigest}), afterDigest => flatten($options{afterDigestBegin}), afterDigestBody => flatten($options{afterDigestBody}), beforeConstruct => flatten(sub { $STATE->pushFrame; }, $options{beforeConstruct}), # Curiously, it's the \begin whose afterConstruct gets called. 
afterConstruct => flatten($options{afterConstruct}, sub { $STATE->popFrame; }), nargs => $options{nargs}, captureBody => T_CS("\\end$name"), # Required to capture!! properties => $options{properties} || {}, (defined $options{reversion} ? (reversion => $options{reversion}) : ()), (defined $sizer ? (sizer => $sizer) : ()), ), $options{scope}); $STATE->installDefinition(LaTeXML::Core::Definition::Constructor ->new(T_CS("\\end$name"), "", "", beforeDigest => flatten($options{beforeDigestEnd}), afterDigest => flatten($options{afterDigest}, ($mode ? (sub { $_[0]->endMode($mode); }) : ())), ), $options{scope}); if ($options{locked}) { AssignValue("\\begin{$name}:locked" => 1); AssignValue("\\end{$name}:locked" => 1); AssignValue("\\$name:locked" => 1); AssignValue("\\end$name:locked" => 1); } return; } #====================================================================== # Declaring and Adjusting the Document Model. #====================================================================== # Specify the properties of a Node tag. my $tag_options = { # [CONSTANT] autoOpen => 1, autoClose => 1, afterOpen => 1, afterClose => 1, 'afterOpen:early' => 1, 'afterClose:early' => 1, 'afterOpen:late' => 1, 'afterClose:late' => 1 }; my $tag_prepend_options = { # [CONSTANT] 'afterOpen:early' => 1, 'afterClose:early' => 1 }; my $tag_append_options = { # [CONSTANT] 'afterOpen' => 1, 'afterClose' => 1, 'afterOpen:late' => 1, 'afterClose:late' => 1 }; sub Tag { my ($tag, %properties) = @_; CheckOptions("Tag ($tag)", $tag_options, %properties); my $model = $STATE->getModel; AssignMapping('TAG_PROPERTIES', $tag => {}) unless LookupMapping('TAG_PROPERTIES', $tag); my $props = LookupMapping('TAG_PROPERTIES', $tag); foreach my $key (keys %properties) { my $new = $properties{$key}; my $old = $$props{$key}; # These keys accumulate information which should not carry over daemon frames. 
if ($$tag_prepend_options{$key}) { $new = flatten($new, $old); } elsif ($$tag_append_options{$key}) { $new = flatten($old, $new); } $$props{$key} = $new; } return; } sub DocType { my ($rootelement, $pubid, $sysid, %namespaces) = @_; my $model = $STATE->getModel; $model->setDocType($rootelement, $pubid, $sysid); foreach my $prefix (keys %namespaces) { $model->registerDocumentNamespace($prefix => $namespaces{$prefix}); } return; } # What verb here? Set, Choose,... sub RelaxNGSchema { my ($schema, %namespaces) = @_; my $model = $STATE->getModel; $model->setRelaxNGSchema($schema,); foreach my $prefix (keys %namespaces) { $model->registerDocumentNamespace($prefix => $namespaces{$prefix}); } return; } sub RegisterNamespace { my ($prefix, $namespace) = @_; $STATE->getModel->registerNamespace($prefix => $namespace); return; } sub RegisterDocumentNamespace { my ($prefix, $namespace) = @_; $STATE->getModel->registerDocumentNamespace($prefix => $namespace); return; } #====================================================================== # Package, Class and File Loading #====================================================================== # Does this test even make sense (or can it?) # Shouldn't this more likely be dependent on the context? # Ah, but what about \InputFileIfExists type stuff... # should we assume a raw type can be processed if being read from within a raw type???? # yeah, that sounds about right... 
my %definition_name = ( # [CONSTANT] sty => 'package', cls => 'class', clo => 'class options', 'cnf' => 'configuration', 'cfg' => 'configuration', 'ldf' => 'language definitions', 'def' => 'definitions', 'dfu' => 'definitions'); sub pathname_is_raw { my ($pathname) = @_; return ($pathname =~ /\.(tex|pool|sty|cls|clo|cnf|cfg|ldf|def|dfu)$/); } my $findfile_options = { # [CONSTANT] type => 1, notex => 1, noltxml => 1 }; sub FindFile { my ($file, %options) = @_; $file = ToString($file); if ($options{raw}) { delete $options{raw}; Warn('deprecated', 'raw', $STATE->getStomach->getGullet, "FindFile option raw is deprecated; it is not needed"); } CheckOptions("FindFile ($file)", $findfile_options, %options); if (pathname_is_literaldata($file)) { # If literal protocol return immediately (unless notex!) return ($options{notex} ? undef : $file); } # If a known special protocol return immediately elsif (pathname_is_literaldata($file) || pathname_is_url($file)) { return $file; } # Otherwise, it's some kind of "real" file, and we might have to search for it if ($options{type}) { # Specific type requested? Search for it. # Add the extension, if it isn't already there. $file = $file . "." . $options{type} unless $file =~ /\.\Q$options{type}\E$/; return FindFile_aux($file, %options); } # If no type given, we MAY expect .tex, or maybe NOT!! elsif ($file =~ /\.tex$/) { # No requested type, then .tex; Of course, it may already have it! return FindFile_aux($file, %options); } else { return FindFile_aux("$file.tex", %options) || FindFile_aux($file, %options); } } sub FindFile_aux { my ($file, %options) = @_; my $path; # If cached, return simple path (it's a key into the cache) if (defined LookupValue($file . '_contents')) { return $file; } if (pathname_is_absolute($file)) { # And if we've got an absolute path, return $file if -f $file; # No need to search, just check if it exists. return; } # otherwise we're never going to find it. 
  elsif (pathname_is_nasty($file)) {    # If it is a nasty filename, we won't touch it.
    return; }                           # we DO NOT want to pass this to kpathsea or such!

  # Note that the strategy is complicated by the fact that
  # (1) we prefer .ltxml bindings, if present
  # (2) those MAY be present in kpsewhich's DB (although our searchpaths take precedence!)
  # (3) BUT we want to avoid kpsewhich if we can, since it's slower
  # (4) depending on switches we may EXCLUDE .ltxml OR raw tex OR allow both.
  my $paths       = LookupValue('SEARCHPATHS');
  my $urlbase     = LookupValue('URLBASE');
  my $nopaths     = LookupValue('REMOTE_REQUEST');
  my $ltxml_paths = $nopaths ? [] : $paths;
  # If we're looking for ltxml, look within our paths & installation first (faster than kpse)
  if (!$options{noltxml}
    && ($path = pathname_find("$file.ltxml", paths => $ltxml_paths, installation_subdir => 'Package'))) {
    return $path; }
  # If we're looking for TeX, look within our paths & installation first (faster than kpse)
  if (!$options{notex}
    && ($path = pathname_find($file, paths => $paths))) {
    return $path; }
  # Otherwise, pass on to kpsewhich
  # Depending on flags, maybe search for ltxml in texmf or for plain tex in ours!
  # The main point, though, is that we make only ONE (more) call.
  return if grep { pathname_is_nasty($_) } @$paths;    # SECURITY! No nasty paths in cmdline
  # Do we need to sanitize these environment variables?
  my $kpsewhich = which($ENV{LATEXML_KPSEWHICH} || 'kpsewhich');
  local $ENV{TEXINPUTS} = join($Config::Config{'path_sep'},
    @$paths, $ENV{TEXINPUTS} || $Config::Config{'path_sep'});
  my $candidates = join(' ',
    ((!$options{noltxml} && !$nopaths) ? ("$file.ltxml") : ()),
    (!$options{notex} ?
      ($file) : ()));
  if ($kpsewhich && (my $result = `"$kpsewhich" $candidates`)) {
    if ($result =~ /^\s*(.+?)\s*\n/s) {
      return $1; } }
  if ($urlbase && ($path = url_find($file, urlbase => $urlbase))) {
    return $path; }
  return; }

sub pathname_is_nasty {
  my ($pathname) = @_;
  return $pathname =~ /[^\w\-_\+\=\/\\\.~\:]/; }

sub maybeReportSearchPaths {
  if (LookupValue('SEARCHPATHS_REPORTED')) {
    return (); }
  else {
    AssignValue('SEARCHPATHS_REPORTED' => 1, 'global');
    return ("search paths are " . join(', ', @{ LookupValue('SEARCHPATHS') })); } }

my $inputcontent_options = {    # [CONSTANT]
  noerror => 1, type => 1 };

sub InputContent {
  my ($request, %options) = @_;
  CheckOptions("InputContent ($request)", $inputcontent_options, %options);
  if (my $path = FindFile($request, type => $options{type}, noltxml => 1)) {
    loadTeXContent($path); }
  elsif (!$options{noerror}) {
    Error('missing_file', $request, $STATE->getStomach->getGullet,
      "Can't find TeX file $request", maybeReportSearchPaths()); }
  return; }

# This is essentially the \input equivalent;
# we are most likely expecting to get actual content,
# (possibly with definitions included, as well)
# but might actually be getting pure definitions,
# (like a proper style file)
# in which case we may really want to load a latexml binding.
# Note that generic style files (non-latex) often have a .tex extension.
# But we may have implemented a .sty.ltxml, so we override the .tex.
# Is this actually safe, or should we be explicitly providing .tex.ltxml ?
my $input_options = {};    # [CONSTANT]

sub Input {
  my ($request, %options) = @_;
  $request = ToString($request);
  CheckOptions("Input ($request)", $input_options, %options);
  # HEURISTIC! First check if equivalent style file, but only under very specific circumstances
  if (pathname_is_literaldata($request)) {
    my ($dir, $name, $type) = pathname_split($request);
    my $file = $name;
    $file .= '.' .
      $type if $type;
    my $path;
    # Firstly, check if we are going to OVERRIDE the requested raw .tex file
    # with a latexml binding to a style file.
    if ((!$dir) && (!$type || ($type eq 'tex'))    # No SPECIFIC directory, but a raw tex file.
          # AND, in preamble; SHOULD be style file, OR also if we can't find the raw file.
      && (LookupValue('inPreamble') || !FindFile($file))
      && ($path = FindFile($name, type => 'sty', notex => 1))) {    # AND there IS such a style file
      Info('ignore', $request, $STATE->getStomach->getGullet,
        "Ignoring input of tex $request, using package $name instead");
      RequirePackage($name);    # Then override, assuming we'll find $name as a package file!
      return; } }
  # Next special case: If we were currently reading a "known" style or binding file,
  # then this file, even if .tex, must also be definitions rather than content.!!(?)
  if (LookupValue('INTERPRETING_DEFINITIONS')) {
    InputDefinitions($request); }
  elsif (my $path = FindFile($request)) {    # Found something plausible..
    my $type = (pathname_is_literaldata($path) ? 'tex' : pathname_type($path));

    # Should we be doing anything about options in the next 2 cases?..... I kinda think not, but?
    if ($type eq 'ltxml') {                  # it's a LaTeXML binding.
      loadLTXML($request, $path); }
    # Else some sort of "known" definitions type file, but not simply 'tex'
    elsif (($type ne 'tex') && (pathname_is_raw($path))) {
      loadTeXDefinitions($request, $path); }
    else {
      loadTeXContent($path); } }
  else {                                     # Couldn't find anything?
    $STATE->noteStatus(missing => $request);
    Error('missing_file', $request, $STATE->getStomach->getGullet,
      "Can't find TeX file $request", maybeReportSearchPaths()); }
  return; }

# Pass in the "requested path" to the next two, since that's what gets
# recorded as having been loaded (by \@ifpackageloaded, eg).
sub loadLTXML {
  my ($request, $pathname) = @_;
  # Note: $type will typically be ltxml and $name will include the .sty, .cls or whatever.
  # Note: we're NOT expecting (allowing?) either literal or remote data objects here.
if (my $p = pathname_is_literaldata($pathname) || pathname_is_url($pathname)) { Error('misdefined', 'loadLTXML', $STATE->getStomach->getGullet, "You can't load LaTeXML binding using protocol $p"); return; } my ($dir, $name, $type) = pathname_split($pathname); # Don't load if the requested path was loaded (with or without the .ltxml) # We want to check against the original request, but WITH the type $request .= '.' . $type unless $request =~ /\Q.$type\E$/; # make sure the .ltxml is added here my $trequest = $request; $trequest =~ s/\.ltxml$//; # and NOT added here! return if LookupValue($request . '_loaded') || LookupValue($trequest . '_loaded'); # Note (only!) that the ltxml version of this was loaded; still could load raw tex! AssignValue($request . '_loaded' => 1, 'global'); $STATE->getStomach->getGullet->readingFromMouth(LaTeXML::Core::Mouth::Binding->new($pathname), sub { do $pathname; Fatal('die', $pathname, $STATE->getStomach->getGullet, "File $pathname had an error:\n $@") if $@; # If we've opened anything, we should read it in completely. # But we'll assume that anything opened has already been processed by loadTeXDefinitions. }); return; } sub loadTeXDefinitions { my ($request, $pathname) = @_; if (!pathname_is_literaldata($pathname)) { # We can't analyze literal data's pathnames! my ($dir, $name, $type) = pathname_split($pathname); # Don't load if we've already loaded it before. # Note that we'll still load it if we've already loaded only the ltxml version # since someone's presumably asking _explicitly_ for the raw TeX version. # It's probably even the ltxml version is asking for it!! # Of course, now it will be marked and wont get reloaded! return if LookupValue($request . '_loaded'); AssignValue($request . 
      '_loaded' => 1, 'global'); }
  my $stomach = $STATE->getStomach;
  # Note that we are reading definitions (and recursive input is assumed to be definitions, too)
  my $was_interpreting = LookupValue('INTERPRETING_DEFINITIONS');
  # And that if we're interpreting this TeX file of definitions,
  # we probably should interpret any TeX files IT loads.
  my $was_including_styles = LookupValue('INCLUDE_STYLES');
  AssignValue('INTERPRETING_DEFINITIONS' => 1);
  # If we're reading in these definitions, we probably will accept included ones?
  # (but not forbid ltxml ?)
  AssignValue('INCLUDE_STYLES' => 1);
  # When set, this variable allows redefinitions of locked defns.
  # It is set in before/after methods to allow local rebinding of commands
  # but loading of sources & bindings is typically done in before/after methods of constructors!
  # This re-locks defns during reading of TeX packages.
  local $LaTeXML::Core::State::UNLOCKED = 0;
  $stomach->getGullet->readingFromMouth(
    LaTeXML::Core::Mouth->create($pathname,
      fordefinitions => 1, notes => 1,
      content        => LookupValue($pathname . '_contents')),
    sub {
      my ($gullet) = @_;
      my $token;
      while ($token = $gullet->readXToken(0)) {
        next if $token->equals(T_SPACE);
        $stomach->invokeToken($token); } });
  AssignValue('INTERPRETING_DEFINITIONS' => $was_interpreting);
  AssignValue('INCLUDE_STYLES'           => $was_including_styles);
  return; }

sub loadTeXContent {
  my ($pathname) = @_;
  my $gullet = $STATE->getStomach->getGullet;
  # If there is a file-specific declaration file (name.latexml), load it first!
  my $file = $pathname;
  $file =~ s/\.tex//;
  if (my $conf = !pathname_is_literaldata($pathname)
    && pathname_find("$file.latexml", paths => LookupValue('SEARCHPATHS'))) {
    loadLTXML($conf, $conf); }
  $gullet->openMouth(LaTeXML::Core::Mouth->create($pathname, notes => 1,
      content => LookupValue($pathname .
'_contents')), 0); return; } #====================================================================== # Option Handling for Packages and Classes # Declare an option for the current package or class # If $option is undef, it is the default. # $code can be a sub (as a primitive), or a string to be expanded. # (effectively a macro) sub DeclareOption { my ($option, $code) = @_; $option = ToString($option) if ref $option; PushValue('@declaredoptions', $option) if $option; my $cs = ($option ? '\ds@' . $option : '\default@ds'); # print STDERR "Declaring option: ".($option ? $option : '')."\n"; if ((!defined $code) || (ref $code eq 'CODE')) { DefPrimitiveI($cs, undef, $code); } else { DefMacroI($cs, undef, $code); } return; } # Pass the sequence of @options to the package $name (if $ext is 'sty'), # or class $name (if $ext is 'cls'). sub PassOptions { my ($name, $ext, @options) = @_; PushValue('opt@' . $name . '.' . $ext, map { ToString($_) } @options); # print STDERR "Passing to $name.$ext options: " . join(', ', @options) . "\n"; return; } # Process the options passed to the currently loading package or class. # If inorder=>1, they are processed in the order given (like \ProcessOptions*), # otherwise, they are processed in the order declared. # Unless noundefine=>1 (like for \ExecuteOptions), all option definitions # undefined after execution. my $processoptions_options = { # [CONSTANT] inorder => 1 }; sub ProcessOptions { my (%options) = @_; CheckOptions("ProcessOptions", $processoptions_options, %options); my $name = LookupDefinition(T_CS('\@currname')) && ToString(Digest(T_CS('\@currname'))); my $ext = LookupDefinition(T_CS('\@currext')) && ToString(Digest(T_CS('\@currext'))); my @declaredoptions = @{ LookupValue('@declaredoptions') }; my @curroptions = @{ (defined($name) && defined($ext) && LookupValue('opt@' . $name . '.' . $ext)) || [] }; my @classoptions = @{ LookupValue('class_options') || [] }; # print STDERR "\nProcessOptions for $name.$ext\n" # . 
" declared: " . join(',', @declaredoptions) . "\n" # . " provided: " . join(',', @curroptions) . "\n" # . " class: " . join(',', @classoptions) . "\n"; my $defaultcs = T_CS('\default@ds'); # Execute options in declared order (unless \ProcessOptions*) if ($options{inorder}) { # Execute options in the order passed in (eg. \ProcessOptions*) foreach my $option (@classoptions) { # process global options, but no error if (executeOption_internal($option)) { } elsif (executeDefaultOption_internal($option)) { } } foreach my $option (@curroptions) { if (executeOption_internal($option)) { } elsif (executeDefaultOption_internal($option)) { } } } else { # Execute options in declared order (eg. \ProcessOptions) foreach my $option (@declaredoptions) { if (grep { $option eq $_ } @curroptions, @classoptions) { @curroptions = grep { $option ne $_ } @curroptions; # Remove it, since it's been handled. executeOption_internal($option); } } # Now handle any remaining options (eg. default options), in the given order. foreach my $option (@curroptions) { executeDefaultOption_internal($option); } } # Now, undefine the handlers? foreach my $option (@declaredoptions) { Let('\ds@' . $option, '\relax'); } return; } sub executeOption_internal { my ($option) = @_; my $cs = T_CS('\ds@' . $option); if (LookupDefinition($cs)) { # print STDERR "\nPROCESS OPTION $option\n"; DefMacroI('\CurrentOption', undef, $option); AssignValue('@unusedoptionlist', [grep { $_ ne $option } @{ LookupValue('@unusedoptionlist') || [] }]); Digest($cs); return 1; } else { return; } } sub executeDefaultOption_internal { my ($option) = @_; # print STDERR "\nPROCESS DEFAULT OPTION $option\n"; # presumably should NOT remove from @unusedoptionlist ? 
DefMacroI('\CurrentOption', undef, $option); Digest(T_CS('\default@ds')); return 1; } sub ExecuteOptions { my (@options) = @_; my %unhandled = (); foreach my $option (@options) { if (executeOption_internal($option)) { } else { $unhandled{$option} = 1; } } foreach my $option (keys %unhandled) { Info('unexpected', $option, $STATE->getStomach->getGullet, "Unexpected options passed to ExecuteOptions '$option'"); } return; } sub resetOptions { AssignValue('@declaredoptions', []); Let('\default@ds', (ToString(Digest(T_CS('\@currext'))) eq 'cls' ? '\OptionNotUsed' : '\@unknownoptionerror')); return; } sub AddToMacro { my ($cs, @tokens) = @_; $cs = T_CS($cs) unless ref $cs; @tokens = map { (ref $_ ? $_ : TokenizeInternal($_)) } @tokens; # Needs error checking! my $defn = LookupDefinition($cs); if (!defined $defn || !$defn->isExpandable) { Error('unexpected', $cs, $STATE->getStomach->getGullet, ToString($cs) . " is not an expandable control sequence"); } else { DefMacroI($cs, undef, Tokens($defn->getExpansion->unlist, map { $_->unlist } map { (ref $_ ? $_ : TokenizeInternal($_)) } @tokens), scope => 'global'); } return; } #====================================================================== my $inputdefinitions_options = { # [CONSTANT] options => 1, withoptions => 1, handleoptions => 1, type => 1, as_class => 1, noltxml => 1, notex => 1, noerror => 1, after => 1 }; # options=>[options...] # withoptions=>boolean : pass options from calling class/package # after=>code or tokens or string as $name.$type-hook macro. (executed after the package is loaded) # Returns the path that was loaded, or undef, if none found. 
sub InputDefinitions { my ($name, %options) = @_; $name = ToString($name) if ref $name; $name =~ s/^\s*//; $name =~ s/\s*$//; CheckOptions("InputDefinitions ($name)", $inputdefinitions_options, %options); my $prevname = $options{handleoptions} && LookupDefinition(T_CS('\@currname')) && ToString(Digest(T_CS('\@currname'))); my $prevext = $options{handleoptions} && LookupDefinition(T_CS('\@currext')) && ToString(Digest(T_CS('\@currext'))); # This file will be treated somewhat as if it were a class # IF as_class is true # OR if it is loaded by such a class, and has withoptions true!!! (yikes) $options{as_class} = 1 if $options{handleoptions} && $options{withoptions} && grep { $prevname eq $_ } @{ LookupValue('@masquerading@as@class') || [] }; $options{raw} = 1 if $options{noltxml}; # so it will be read as raw by Gullet.!L! my $astype = ($options{as_class} ? 'cls' : $options{type}); my $filename = $name; $filename .= '.' . $options{type} if $options{type}; if (my $file = FindFile($filename, type => $options{type}, notex => $options{notex}, noltxml => $options{noltxml})) { if ($options{handleoptions}) { # For \RequirePackageWithOptions, pass the options from the outer class/style to the inner one. if (my $passoptions = $options{withoptions} && $prevname && LookupValue('opt@' . $prevname . "." . $prevext)) { # Only pass those class options that are declared by the package! my @declaredoptions = @{ LookupValue('@declaredoptions') }; my @topass = (); foreach my $op (@$passoptions) { push(@topass, $op) if grep { $op eq $_ } @declaredoptions; } PassOptions($name, $astype, @topass) if @topass; } DefMacroI('\@currname', undef, Tokens(Explode($name))); DefMacroI('\@currext', undef, Tokens(Explode($astype))); # reset options (Note reset & pass were in opposite order in LoadClass ????) resetOptions(); PassOptions($name, $astype, @{ $options{options} || [] }); # passed explicit options. # Note which packages are pretending to be classes. 
PushValue('@masquerading@as@class', $name) if $options{as_class}; DefMacroI(T_CS("\\$name.$astype-hook"), undef, $options{after} || ''); DefMacroI(T_CS('\opt@' . $name . '.' . $astype), undef, Tokens(Explode(join(',', @{ LookupValue('opt@' . $name . "." . $astype) })))); } my ($fdir, $fname, $ftype) = pathname_split($file); if ($ftype eq 'ltxml') { loadLTXML($filename, $file); } # Perl module. else { loadTeXDefinitions($filename, $file); } if ($options{handleoptions}) { Digest(T_CS("\\$name.$astype-hook")); DefMacroI('\@currname', undef, Tokens(Explode($prevname))) if $prevname; DefMacroI('\@currext', undef, Tokens(Explode($prevext))) if $prevext; # Add an appropriately faked entry into \@filelist my ($d, $n, $e) = ($fdir, $fname, $ftype); # If ftype is ltxml, reparse to get sty/cls! ($d, $n, $e) = pathname_split(pathname_concat($d, $n)) if $e eq 'ltxml'; # Fake it??? my @p = (LookupDefinition(T_CS('\@filelist')) ? Expand(T_CS('\@filelist'))->unlist : ()); my @n = Explode($e ? $n . '.' . $e : $n); DefMacroI('\@filelist', undef, (@p ? Tokens(@p, T_OTHER(','), @n) : Tokens(@n))); resetOptions(); } # And reset options afterwards, too. return $file; } elsif (!$options{noerror}) { $STATE->noteStatus(missing => $name . ($options{type} ? '.' . $options{type} : '')); Error('missing_file', $name, $STATE->getStomach->getGullet, "Can't find " . ($options{notex} ? "binding for " : "") . (($options{type} && $definition_name{ $options{type} }) || 'definitions') . ' ' . $name, maybeReportSearchPaths()); } return; } my $require_options = { # [CONSTANT] options => 1, withoptions => 1, type => 1, as_class => 1, noltxml => 1, notex => 1, raw => 1, after => 1 }; # This (& FindFile) needs to evolve a bit to support reading raw .sty (.def, etc) files from # the standard texmf directories. Maybe even use kpsewhich itself (INSTEAD of pathname_find ???) 
# Another potentially useful option might be that if we are reading a raw file, # perhaps it should just get digested immediately, since it shouldn't contribute any boxes. sub RequirePackage { my ($package, %options) = @_; $package = ToString($package) if ref $package; if ($options{raw}) { delete $options{raw}; $options{notex} = 0; Warn('deprecated', 'raw', $STATE->getStomach->getGullet, "RequirePackage option raw is obsolete; it is not needed"); } CheckOptions("RequirePackage ($package)", $require_options, %options); # We'll usually disallow raw TeX, unless the option explicitly given, or globally set. $options{notex} = 1 if !defined $options{notex} && !LookupValue('INCLUDE_STYLES') && !$options{noltxml}; InputDefinitions($package, type => $options{type} || 'sty', handleoptions => 1, # Pass classes options if we have NONE! withoptions => !($options{options} && @{ $options{options} }), %options); return; } my $loadclass_options = { # [CONSTANT] options => 1, withoptions => 1, after => 1 }; sub LoadClass { my ($class, %options) = @_; $class = ToString($class) if ref $class; CheckOptions("LoadClass ($class)", $loadclass_options, %options); # AssignValue(class_options => [$options{options} ? @{ $options{options} } : ()]); PushValue(class_options => ($options{options} ? @{ $options{options} } : ())); # Note that we'll handle errors specifically for this case. if (my $success = InputDefinitions($class, type => 'cls', notex => 1, handleoptions => 1, noerror => 1, %options)) { return $success; } else { $STATE->noteStatus(missing => $class . '.cls'); my $alternate = 'OmniBus'; # was 'article' Warn('missing_file', $class, $STATE->getStomach->getGullet, "Can't find binding for class $class (using $alternate)", maybeReportSearchPaths()); if (my $success = InputDefinitions($alternate, type => 'cls', noerror => 1, handleoptions => 1, %options)) { return $success; } else { Fatal('missing_file', $alternate . 
'.cls.ltxml', $STATE->getStomach->getGullet, "Can't find binding for class $alternate (installation error)"); return; } } } sub LoadPool { my ($pool) = @_; $pool = ToString($pool) if ref $pool; if (my $success = InputDefinitions($pool, type => 'pool', notex => 1, noerror => 1)) { return $success; } else { Fatal('missing_file', "$pool.pool.ltxml", $STATE->getStomach->getGullet, "Can't find binding for pool $pool (installation error)", maybeReportSearchPaths()); return; } } sub AtBeginDocument { my (@operations) = @_; AssignValue('@at@begin@document', []) unless LookupValue('@at@begin@document'); foreach my $op (@operations) { next unless $op; my $t = ref $op; if (!$t) { # Presumably String? $op = TokenizeInternal($op); } elsif ($t eq 'CODE') { my $tn = T_CS(ToString($op)); DefMacroI($tn, undef, $op); $op = $tn; } PushValue('@at@begin@document', $op->unlist); } return; } sub AtEndDocument { my (@operations) = @_; AssignValue('@at@end@document', []) unless LookupValue('@at@end@document'); foreach my $op (@operations) { next unless $op; my $t = ref $op; if (!$t) { # Presumably String? $op = TokenizeInternal($op); } elsif ($t eq 'CODE') { my $tn = T_CS(ToString($op)); DefMacroI($tn, undef, $op); $op = $tn; } PushValue('@at@end@document', $op->unlist); } return; } #====================================================================== # my $fontmap_options = { # [CONSTANT] family => 1 }; sub DeclareFontMap { my ($name, $map, %options) = @_; CheckOptions("DeclareFontMap", $fontmap_options, %options); my $mapname = ToString($name) . ($options{family} ? '_' . $options{family} : '') . '_fontmap'; AssignValue($mapname => $map, 'global'); return; } # Decode a codepoint using the fontmap for a given font and/or fontencoding. # If $encoding not provided, then lookup according to the current font's # encoding; the font family may also be used to choose the fontmap (think tt fonts!). 
# When $implicit is false, we are "explicitly" asking for a decoding, such as # with \char, \mathchar, \symbol, DeclareTextSymbol and such cases. # In such cases, only codepoints specifically within the map are covered; the rest are undef. # If $implicit is true, we'll decode token content that has made it to the stomach: # We're going to assume that SOME sort of handling of input encoding is taking place, # so that if anything above 128 comes in, it must already be Unicode!. # The lower half plane still needs to go through decoding, though, to deal # with TeX's rearrangement of ASCII... sub FontDecode { my ($code, $encoding, $implicit) = @_; return if !defined $code || ($code < 0); my ($map, $font); if (!$encoding) { $font = LookupValue('font'); $encoding = $font->getEncoding; } if ($encoding && ($map = LoadFontMap($encoding))) { # OK got some map. my ($family, $fmap); if ($font && ($family = $font->getFamily) && ($fmap = LookupValue($encoding . '_' . $family . '_fontmap'))) { $map = $fmap; } } # Use the family specific map, if any. if ($implicit) { if ($map && ($code < 128)) { return $$map[$code]; } else { return pack('U', $code); } } else { return ($map ? $$map[$code] : undef); } } sub FontDecodeString { my ($string, $encoding, $implicit) = @_; return if !defined $string; my ($map, $font); if (!$encoding) { $font = LookupValue('font'); $encoding = $font->getEncoding; } if ($encoding && ($map = LoadFontMap($encoding))) { # OK got some map. my ($family, $fmap); if ($font && ($family = $font->getFamily) && ($fmap = LookupValue($encoding . '_' . $family . '_fontmap'))) { $map = $fmap; } } # Use the family specific map, if any. return join('', grep { defined $_ } map { ($implicit ? (($map && ($_ < 128)) ? $$map[$_] : pack('U', $_)) : ($map ? $$map[$_] : undef)) } map { ord($_) } split(//, $string)); } sub LoadFontMap { my ($encoding) = @_; my $map = LookupValue($encoding . '_fontmap'); if (!$map && !LookupValue($encoding . 
'_fontmap_failed_to_load')) { AssignValue($encoding . '_fontmap_failed_to_load' => 1); # Stop recursion? RequirePackage(lc($encoding), type => 'fontmap'); if ($map = LookupValue($encoding . '_fontmap')) { # Got map? AssignValue($encoding . '_fontmap_failed_to_load' => 0); } else { AssignValue($encoding . '_fontmap_failed_to_load' => 1, 'global'); } } return $map; } #====================================================================== # Color sub LookupColor { my ($name) = @_; if (my $color = LookupValue('color_' . $name)) { return $color; } else { Error('undefined', $name, $STATE->getStomach, "color '$name' is undefined..."); return Black; } } sub DefColor { my ($name, $color, $scope) = @_; #print STDERR "DEFINE ".ToString($name)." => ".join(',',@$color)."\n"; my ($model, @spec) = @$color; $scope = 'global' if LookupDefinition(T_CS('\ifglobalcolors')) && IfCondition(T_CS('\ifglobalcolors')); AssignValue('color_' . $name => $color, $scope); # We could store these pieces separately,or in a list for above, # so that extract could use them more reasonably? # This is perhaps too xcolor specific? DefMacroI('\\\\color@' . $name, undef, '\relax\relax{' . join(' ', $model, @spec) . '}{' . $model . '}{' . join(',', @spec) . '}', scope => $scope); return; } # Need 3 things for Derived Models: # derivedfrom : the core model that this model is "derived from" # convertto : code to convert to the (a) core model # convertfrom : code to convert from the core model sub DefColorModel { my ($model, $coremodel, $tocore, $fromcore) = @_; AssignValue('derived_color_model_' . 
    $model => [$coremodel, $tocore, $fromcore], 'global');
  return; }

#======================================================================
# Defining Rewrite rules that act on the DOM
# These are applied after the document is completely constructed
my $rewrite_options = {    # [CONSTANT]
  label      => 1, scope   => 1, xpath  => 1, match  => 1,
  attributes => 1, replace => 1, regexp => 1, select => 1 };

sub DefRewrite {
  my (@specs) = @_;
  CheckOptions("DefRewrite", $rewrite_options, @specs);
  PushValue('DOCUMENT_REWRITE_RULES',
    LaTeXML::Core::Rewrite->new('text', processRewriteSpecs(0, @specs)));
  return; }

sub DefMathRewrite {
  my (@specs) = @_;
  CheckOptions("DefMathRewrite", $rewrite_options, @specs);
  PushValue('DOCUMENT_REWRITE_RULES',
    LaTeXML::Core::Rewrite->new('math', processRewriteSpecs(1, @specs)));
  return; }

sub processRewriteSpecs {
  my ($math, @specs) = @_;
  my @procspecs = ();
  my $delimiter = ($math ? '$' : '');
  while (@specs) {
    my $k = shift(@specs);
    my $v = shift(@specs);
    # Make sure match & replace are (at least) tokenized
    if (($k eq 'match') || ($k eq 'replace')) {
      if (ref $v eq 'ARRAY') {
        $v = [map { (ref $_ ? $_ : Tokenize($delimiter . $_ . $delimiter)) } @$v]; }
      elsif (!ref $v) {
        $v = Tokenize($delimiter . $v . $delimiter); } }
    push(@procspecs, $k, $v); }
  return @procspecs; }

#======================================================================
# Defining "Ligatures" rules that act on the DOM
# These are actually a sort of rewrite that is applied while the DOM
# is being constructed, in particular as each node is closed.
my $ligature_options = { # [CONSTANT] fontTest => 1 }; sub DefLigature { my ($regexp, $replacement, %options) = @_; CheckOptions("DefLigature", $ligature_options, %options); UnshiftValue('TEXT_LIGATURES', { regexp => $regexp, code => sub { $_[0] =~ s/$regexp/$replacement/g; $_[0]; }, %options }); return; } my $math_ligature_options = {}; # [CONSTANT] sub DefMathLigature { my ($matcher, %options) = @_; CheckOptions("DefMathLigature", $math_ligature_options, %options); UnshiftValue('MATH_LIGATURES', { matcher => $matcher, %options }); return; } #====================================================================== # Support for requiring "Resources", ie CSS, Javascript, whatever my $resource_options = { # [CONSTANT] type => 1, media => 1, content => 1 }; my $resource_types = { # [CONSTANT] css => 'text/css', js => 'text/javascript' }; sub RequireResource { my ($resource, %options) = @_; CheckOptions("RequireResource", $resource_options, %options); if (!$options{content} && !$resource) { Warn('expected', 'resource', undef, "Resource must have a resource pathname or content; skipping"); return; } if (!$options{type}) { my $ext = $resource && pathname_type($resource); $options{type} = $ext && $$resource_types{$ext}; } if (!$options{type}) { my $ext = $resource && pathname_type($resource); my $t = $ext && $$resource_types{$ext}; Warn('expected', 'type', undef, "Resource must have a mime-type; skipping"); return; } if ($LaTeXML::DOCUMENT) { # If we've got a document, go ahead & put the resource in. addResource($LaTeXML::DOCUMENT, $resource, %options); } else { AssignValue(PENDING_RESOURCES => [], 'global') unless LookupValue('PENDING_RESOURCES'); PushValue(PENDING_RESOURCES => [$resource, %options]); } return; } # No checking... 
sub addResource { my ($document, $resource, %options) = @_; my $savenode = $document->floatToElement('ltx:resource'); $document->insertElement('ltx:resource', $options{content}, src => $resource, type => $options{type}, media => $options{media}); $document->setNode($savenode) if $savenode; return; } sub ProcessPendingResources { my ($document) = @_; if (my $req = LookupValue('PENDING_RESOURCES')) { map { addResource($document, @$_) } @$req; AssignValue(PENDING_RESOURCES => [], 'global'); } return; } #********************************************************************** 1; __END__ =pod =head1 NAME C - Support for package implementations and document customization. =head1 SYNOPSIS This package defines and exports most of the procedures users will need to customize or extend LaTeXML. The LaTeXML implementation of some package might look something like the following, but see the installed C directory for realistic examples. package LaTeXML::Package::pool; # to put new subs & variables in common pool use LaTeXML::Package; # to load these definitions use strict; # good style use warnings; # # Load "anotherpackage" RequirePackage('anotherpackage'); # # A simple macro, just like in TeX DefMacro('\thesection', '\thechapter.\roman{section}'); # # A constructor defines how a control sequence generates XML: DefConstructor('\thanks{}', "#1"); # # And a simple environment ... DefEnvironment('{abstract}','#body'); # # A math symbol \Real to stand for the Reals: DefMath('\Real', "\x{211D}", role=>'ID'); # # Or a semantic floor: DefMath('\floor{}','\left\lfloor#1\right\rfloor'); # # More esoteric ... # Use a RelaxNG schema RelaxNGSchema("MySchema"); # Or use a special DocType if you have to: # DocType("rootelement", # "-//Your Site//Your DocType",'your.dtd', # prefix=>"http://whatever/"); # # Allow sometag elements to be automatically closed if needed Tag('prefix:sometag', autoClose=>1); # # Don't forget this, so perl knows the package loaded. 
    1;

=head1 DESCRIPTION

This module provides a large set of utilities and declarations that are useful
for writing `bindings': LaTeXML-specific implementations of a set of control
sequences such as would be defined in a LaTeX style or class file. They are also
useful for controlling and customizing LaTeXML's processing.
See the L section, below, for additional lower-level modules imported & re-exported.

To a limited extent (and currently only when explicitly enabled), LaTeXML can
process the raw TeX code found in style files. However, to preserve document
structure and semantics, as well as for efficiency, it is usually necessary to
supply a LaTeXML-specific `binding' for style and class files. For example, a
binding C would encode LaTeXML-specific implementations of all the control
sequences in C so that C<\usepackage{mypackage}> would work. Similarly for C.
Additionally, document-specific bindings can be supplied: before processing a
TeX source file, eg C, LaTeXML will automatically include the definitions and
settings in C. These C<.ltxml> and C<.latexml> files should be placed in
LaTeXML's searchpaths, where it will find them: either in the current directory,
in a directory given to the --path option, or in a directory added to the
variable SEARCHPATHS.

Since LaTeXML mimics TeX, a familiarity with TeX's processing model is critical.
LaTeXML models: catcodes and tokens (See L, L) which are extracted from the
plain source text characters by the L; L, which are expanded within the L;
and L, which are digested within the L to produce L, L.
A key additional feature is the L: when digested they generate a L which,
upon absorption by L, inserts text or XML fragments in the final document tree.

I Many of the following forms take code references as arguments or options.
That is, either a reference to a defined sub, eg. C<\&somesub>, or an anonymous
function C. To document these cases, and the arguments that are passed in each
case, we'll use a notation like C($stomach,...)>.
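The two kinds of code reference mentioned above can be illustrated with a small,
self-contained Perl sketch. It does not use LaTeXML at all: C<invoke_handler> is
a hypothetical stand-in for a defining form that accepts a code reference, and
the string C<'stomach'> merely plays the role of the object that would be passed
as the first argument.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a defining form that accepts a code reference.
# It is NOT part of LaTeXML; it only shows the calling convention.
sub invoke_handler {
  my ($code, @args) = @_;
  die "expected a CODE reference\n" unless ref $code eq 'CODE';
  return $code->(@args); }

# A named sub, to be passed by reference as \&somesub.
sub somesub {
  my ($stomach, @args) = @_;
  return "$stomach got " . scalar(@args) . " argument(s)"; }

# Passing a reference to the named sub.
my $named = invoke_handler(\&somesub, 'stomach', 'a', 'b');

# Passing an anonymous sub written inline.
my $anonymous = invoke_handler(
  sub { my ($stomach, @args) = @_; return join('+', @args); },
  'stomach', 'a', 'b');

print "$named\n";        # stomach got 2 argument(s)
print "$anonymous\n";    # a+b
```

Either form works wherever a code reference is expected; the anonymous form is
handy for short, one-off handlers, while a named sub can be shared between
several definitions.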
=head2 Control Sequences

Many of the following forms define the behaviour of control sequences.
While in TeX you'll typically only define macros, LaTeXML is effectively
redefining TeX itself, so we define L as well as L, L, L and L.
These define the behaviour of these control sequences when processed during
the various phases of LaTeXML's imitation of TeX's digestive tract.

=head3 Prototypes

LaTeXML uses a more convenient method of specifying parameter patterns for
control sequences. The first argument to each of these defining forms
(C, C, etc) is a I consisting of the control sequence being defined along
with the specification of parameters required by the control sequence.
Each parameter describes how to parse tokens following the control sequence
into arguments or how to delimit them. To simplify coding and capture common
idioms in TeX/LaTeX programming, LaTeXML's parameter specifications are more
expressive than TeX's C<\def> or LaTeX's C<\newcommand>. Examples of the
prototypes for familiar TeX or LaTeX control sequences are:

  DefConstructor('\usepackage[]{}',...
  DefPrimitive('\multiply Variable SkipKeyword:by Number',..
  DefPrimitive('\newcommand OptionalMatch:* DefToken[]{}', ...

The general syntax for parameter specification is

=over 4

=item C<{I}>

reads a regular TeX argument.
I can be omitted (ie. C<{}>).
Otherwise I is itself a parameter specification and
the argument is reparsed accordingly.
(C<{}> is a shorthand for C.)

=item C<[I]>

reads a LaTeX-style optional argument.
I can be omitted (ie. C<[]>).
Otherwise, if I is of the form Default:stuff, then stuff
would be the default value.
Otherwise I is itself a parameter specification
and the argument, if supplied, is reparsed according to that specification.
(C<[]> is a shorthand for C.)

=item I

Reads an argument of the given type, where either
Type has been declared, or there exists a ReadType
function accessible from LaTeXML::Package::pool.
See the available types, below.
=item C:I | I:I:I...>

These forms invoke the parser for I but
pass additional Tokens to the reader function.
Typically this would supply defaults or parameters to a match.

=item C>

Similar to I, but it is not considered
an error if the reader returns undef.

=item C>

Similar to CI, but the value returned
from the reader is ignored, and does not occupy a
position in the arguments list.

=back

The predefined argument Is are as follows.

=over 4

=item C

Reads a standard TeX argument being either the next token, or if the
next token is an {, the balanced token list. In the case of C,
many catcodes are disabled, which is handy for URLs,
labels and similar.

=item C

Reads a single TeX Token. For C, if the next token is expandable,
it is repeatedly expanded until an unexpandable token remains, which is returned.

=item C

Reads an Object corresponding to Number, Dimension, Glue or MuGlue,
using TeX's rules for parsing these objects.

=item C | XUntil:>I>

Reads tokens until a match to the tokens I is found, returning
the tokens preceding the match. This corresponds to TeX delimited arguments.
For C, tokens are expanded as they are matched and accumulated.

=item C

Reads tokens until the next open brace C<{>.
This corresponds to the peculiar TeX construct C<\def\foo#{...>.

=item C | Keyword:>I>

Reads tokens expecting a match to one of the token lists I,
returning the one that matches, or undef.
For C, case and catcode of the I are ignored.
Additionally, any leading spaces are skipped.

=item C

Reads tokens until a closing }, but respecting nested {} pairs.

=item C

Reads parenthesis-delimited tokens, but does I balance any nested parentheses.

=item C>

These types alter the usual sequence of tokenization and digestion in separate stages (like TeX).
A C parameter inhibits digestion completely and remains in token form.
A C parameter gets digested until the (required) opening { is balanced; this is
useful when the content would usually need to have been protected in order to
correctly deal with catcodes. C digests tokens until a token matching I is found.

=item C

Reads a token, expanding if necessary, and expects a control sequence naming
a writable register. If such is found, it returns an array of the corresponding
definition object, and any arguments required by that definition.

=item C

Skips one, or any number of, space tokens, if present, but contributes nothing
to the argument list.

=back

=head3 Common Options

=over

=item C'local' | 'global' | I>

Most defining commands accept an option to control how the definition is stored,
for global or local definitions, or using a named I<scope>.
A named scope saves a set of definitions and values that can be activated at a later time.

Particularly interesting forms of scope are those that get automatically activated
upon changes of counter and label. For example, definitions that have
C'section:1.1'> will be activated when the section number is "1.1",
and will be deactivated when that section ends.

=item CI>

This option controls whether this definition is locked from further
changes in the TeX sources; this keeps local 'customizations' by an author
from overriding important LaTeXML definitions and breaking the conversion.

=back

=head3 Macros

=over 4

=item C, I, I<%options>);>

Defines the macro expansion for I; a macro control sequence that is
expanded during macro expansion time in the L.
The I should be one of I | I | I($gullet,@args)>:
a I will be tokenized upon first usage.
Any macro arguments will be substituted for parameter indicators (eg #1)
in the I or tokenized I and the result is used as the expansion
of the control sequence. If I is used, it is called at expansion time
and should return a list of tokens as its result.

DefMacro options are

=over 4

=item CI>,

=item CI>

See L.
=item CI> specifies a definition that will only be expanded in math mode; the control sequence must be a single character. =back Examples: DefMacro('\thefootnote','\arabic{footnote}'); DefMacro('\today',sub { ExplodeText(today()); }); =item C, I, I, I<%options>);> X Internal form of C where the control sequence and parameter list have already been separated; useful for definitions from within code. Also, slightly more efficient for macros with no arguments (use C for I), and useful for obscure cases like defining C<\begin{something*}> as a Macro. =back =head3 Conditionals =over 4 =item C, I, I<%options>);> X Defines a conditional for I; a control sequence that is processed during macro expansion time (in the L). A conditional corresponds to a TeX C<\if>. If the I is C, a C<\newif> type of conditional is defined, which is controlled with control sequences like C<\footrue> and C<\foofalse>. Otherwise the I should be C($gullet,@args)> (with the control sequence's arguments) that is called at expand time to determine the condition. Depending on whether that evaluation returns a true or false value (in the usual Perl sense), the result of the expansion is either the first or else code following, in the usual TeX sense. DefConditional options are =over 4 =item CI>, =item CI> See L. =item CI($gullet)> This option is I used to define C<\ifcase>. =back Example: DefConditional('\ifmmode',sub { LookupValue('IN_MATH'); }); =item C, I, I, I<%options>);> X Internal form of C where the control sequence and parameter list have already been parsed; useful for definitions from within code. Also, slightly more efficient for conditionals with no arguments (use C for C). =item C,I<@args>)> X C allows you to test a conditional from within Perl. Thus something like C might be equivalent to TeX's C<\ifmmode domath \else dotext \fi>.
=back =head3 Primitives =over 4 =item C, I, I<%options>);> X Defines a primitive control sequence; a primitive is processed during digestion (in the L), after macro expansion but before Construction time. Primitive control sequences generate Boxes or Lists, generally containing basic Unicode content, rather than structured XML. Primitive control sequences are also executed for side effect during digestion, effecting changes to the L. The I can be a string used as the text content of a Box to be created (using the current font). Alternatively I can be C($stomach,@args)> (with the control sequence's arguments) which is invoked at digestion time, probably for side-effect, but returning Boxes or Lists or nothing. I may also be undef, which contributes nothing to the document, but does record the TeX code that created it. DefPrimitive options are =over 4 =item CI>, =item CI> See L. =item C ('text' | 'display_math' | 'inline_math')> Changes to this mode during digestion. =item C{I<%fontspec>}> Specifies the font to use (see L). If the font change is to only apply to material generated within this command, you would also use C<1>>; otherwise, the font will remain in effect afterwards as for a font switching command. =item CI> If true, TeX grouping (ie. C<{}>) is enforced around this invocation. =item CI>, =item CI> specifies whether the given constructor can I appear, or I appear, in math mode. =item CI($stomach)> supplies a hook to execute during digestion just before the main part of the primitive is executed (and before any arguments have been read). The I should either return nothing (return;) or a list of digested items (Box's,List,Whatsit). It can thus change the State and/or add to the digested output. =item CI($stomach)> supplies a hook to execute during digestion just after the main part of the primitive is executed. It should either return nothing (return;) or digested items. It can thus change the State and/or add to the digested output.
=item CI> indicates whether this is a prefix type of command; this is only used for the special TeX assignment prefixes, like C<\global>. =back Example: DefPrimitive('\begingroup',sub { $_[0]->begingroup; }); =item C, I, I($stomach,@args), I<%options>);> X Internal form of C where the control sequence and parameter list have already been separated; useful for definitions from within code. =back =head3 Registers =over =item C, I, I<%options>);> X Defines a register with I as the initial value (a Number, Dimension, Glue, MuGlue or Tokens --- I haven't handled Box's yet). Usually, the I is just the control sequence, but registers are also handled by prototypes like C<\count{Number}>. C arranges that the register value can be accessed when a numeric, dimension, ... value is being read, and also defines the control sequence for assignment. Options are =over 4 =item CI> specifies whether changing this value is disallowed. =item CI(@args)>, =item CI($value,@args)> By default I is stored in the State's Value table under a name concatenating the control sequence and argument values. These options allow other means of fetching and storing the value. =back Example: DefRegister('\pretolerance',Number(100)); =item C, I, I, I<%options>);> X Internal form of C where the control sequence and parameter list have already been parsed; useful for definitions from within code. =back =head3 Constructors =over 4 =item C, I<$replacement>, I<%options>);> X The Constructor is where LaTeXML really starts getting interesting; invoking the control sequence will generate an arbitrary XML fragment in the document tree. More specifically: during digestion, the arguments will be read and digested, creating a L to represent the object. During absorption by the L, the C will generate the XML fragment according to I. The I can be C($document,@args,%properties)> which is called during document absorption to create the appropriate XML (see the methods of L).
More conveniently, I can be a pattern: simply a bit of XML as a string with certain substitutions to be made. The substitutions are of the following forms: =over 4 =item C<#1, #2 ... #name> These are replaced by the corresponding argument (for #1) or property (for #name) stored with the Whatsit. Each is turned into a string when it appears in an attribute position, or recursively processed when it appears as content. =item C<&I(@args)> Another form of substituted value is prefixed with C<&> which invokes a function. For example, C< &func(#1) > would invoke the function C on the first argument to the control sequence; what it returns will be inserted into the document. =item C(I)> or C(I)(I)> Patterns can be conditionalized using this form. The I is any of the above expressions (eg. C<#1>), considered true if the result is non-empty. Thus C<< ?#1() >> would add the empty element C if the first argument were given. =item C<^> If the constructor I with C<^>, the XML fragment is allowed to I to a parent node that is allowed to contain it, according to the Document Type. =back The Whatsit property C is defined by default. Additional properties C and C are defined when C is true, or for environments. By using C<< $whatsit->setProperty(key=>$value); >> within C, or by using the C option, other properties can be added. DefConstructor options are =over 4 =item CI>, =item CI> See L. =item CI>, =item C{I<%fontspec>}>, =item CI>, =item CI>, =item CI> These options are the same as for L =item CI | I($whatsit,#1,#2,...)> specifies the reversion of the invocation back into TeX tokens (if the default reversion is not appropriate). The I string can include C<#1>, C<#2>... The I is called with the C<$whatsit> and digested arguments and must return a list of Token's. =item CI> provides a control sequence to be used in the C instead of the one defined in the C.
This is a convenient alternative for reversion when a 'public' command conditionally expands into an internal one, but the reversion should be for the public command. =item CI | I($whatsit)> specifies how to compute (approximate) the displayed size of the object, if that size is ever needed (typically needed for graphics generation). If a string is given, it should contain only a sequence of C<#1> or C<#name> to access arguments and properties of the Whatsit: the size is computed from these items laid out side-by-side. If I is given, it should return the three Dimensions (width, height and depth). If neither is given, and the C specification is of suitable format, it will be used for the sizer. =item C{I<%properties>} | I($stomach,#1,#2...)> supplies additional properties to be set on the generated Whatsit. In the first form, the values can be of any type, but if a value is a code reference, it takes the same args ($stomach,#1,#2,...) and should return the value; it is executed before creating the Whatsit. In the second form, the code should return a hash of properties. =item CI($stomach)> supplies a hook to execute during digestion just before the Whatsit is created. The I should either return nothing (return;) or a list of digested items (Box's,List,Whatsit). It can thus change the State and/or add to the digested output. =item CI($stomach,$whatsit)> supplies a hook to execute during digestion just after the Whatsit is created (and so the Whatsit already has its arguments and properties). It should either return nothing (return;) or digested items. It can thus change the State, modify the Whatsit, and/or add to the digested output. =item CI($document,$whatsit)> supplies a hook to execute before constructing the XML (generated by I). =item CI($document,$whatsit)> Supplies I to execute after constructing the XML.
=item CI | I> if true, arbitrary following material will be accumulated into a `body' until the current grouping level is reverted, or until the C is encountered if the option is a C. This body is available as the C property of the Whatsit. This is used by environments and math. =item CI> This gives a number of args for cases where it can't be inferred directly from the I (eg. when more args are explicitly read by hooks). =back =item C, I, I, I<%options>);> X Internal form of C where the control sequence and parameter list have already been separated; useful for definitions from within code. =item C, I, I<%options>);> X A common shorthand constructor; it defines a control sequence that creates a mathematical object, such as a symbol, function or operator application. The options given can effectively create semantic macros that contribute to the eventual parsing of mathematical content. In particular, it generates an XMDual using the replacement I for the presentation. The content information is drawn from the name and options C accepts the options: =over 4 =item CI>, =item CI> See L. =item C{I<%fontspec>}>, =item CI>, =item CI>, =item CI>, =item CI>, =item CI($stomach)>, =item CI($stomach,$whatsit)>, These options are the same as for L =item CI> gives a name attribute for the object =item CI> gives the OpenMath content dictionary that name is from. =item CI> adds a grammatical role attribute to the object; this specifies the grammatical role that the object plays in surrounding expressions. This direly needs documentation! =item C('display' | 'text' | 'inline')> Controls whether this object will be presented in a specific mathstyle, or according to the current setting of C. =item C('mid' | 'post')> Controls the positioning of any sub and super-scripts relative to this object; whether they be stacked over or under it, or whether they will appear in the usual position.
TeX.pool defines a function C which is useful for operators like C<\sum> in that it sets to C position when in displaystyle, otherwise C. =item CI> Whether or not the object is stretchy when displayed. =item CI>, =item CI>, =item CI> These three are similar to C, C and C, but are used in unusual cases. These apply to the given attributes to the operator token in the content branch. =item CI> Normally, these commands are digested with an implicit grouping around them, localizing changes to fonts, etc; C<< noggroup=>1 >> inhibits this. =back Example: DefMath('\infty',"\x{221E}", role=>'ID', meaning=>'infinity'); =item C, I, I, I<%options>);> X Internal form of C where the control sequence and parameter list have already been separated; useful for definitions from within code. =back =head3 Environments =over =item C, I, I<%options>);> X Defines an Environment that generates a specific XML fragment. C is of the same form as for DefConstructor, but will generally include reference to the C<#body> property. Upon encountering a C<\begin{env}>: the mode is switched, if needed, else a new group is opened; then the environment name is noted; the beforeDigest hook is run. Then the Whatsit representing the begin command (but ultimately the whole environment) is created and the afterDigestBegin hook is run. Next, the body will be digested and collected until the balancing C<\end{env}>. Then, any afterDigest hook is run, the environment is ended, finally the mode is ended or the group is closed. The body and C<\end{env}> whatsit are added to the C<\begin{env}>'s whatsit as body and trailer, respectively. C takes the following options: =over 4 =item CI>, =item CI> See L. =item CI>, =item C{I<%fontspec>}> =item CI>, =item CI>, These options are the same as for L =item CI>, =item CI>, =item CI>, =item CI>, =item CI> These options are the same as for L =item CI($stomach)> This hook is similar to that for C, but it applies to the C<\begin{environment}> control sequence. 
=item CI($stomach,$whatsit)> This hook is similar to C's C but it applies to the C<\begin{environment}> control sequence. The Whatsit is the one for the beginning control sequence, but represents the environment as a whole. Note that although the arguments and properties are present in the Whatsit, the body of the environment is I yet available! =item CI($stomach)> This hook is similar to C's C but it applies to the C<\end{environment}> control sequence. =item CI($stomach,$whatsit)> This hook is similar to C's C but it applies to the C<\end{environment}> control sequence. Note, however that the Whatsit is only for the ending control sequence, I the Whatsit for the environment as a whole. =item CI($stomach,$whatsit)> This option supplies a hook to be executed during digestion after the ending control sequence has been digested (and all 4 other digestion hooks have executed) and after the body of the environment has been obtained. The Whatsit is the (useful) one representing the whole environment, and it now does have the body and trailer available, stored as properties. =back Example: DefConstructor('\emph{}', "#1'text'); =item C, I, I, I<%options>);> X Internal form of C where the control sequence and parameter list have already been separated; useful for definitions from within code. =back =head2 Inputting Content and Definitions =over 4 =item C, I<%options>);> X Find an appropriate file with the given I in the current directories in C. If a file ending with C<.ltxml> is found, it will be preferred. Note that if the C starts with a recognized I (currently one of C<(literal|http|https|ftp)>) followed by a colon, the name is returned, as is, and no search for files is carried out. The options are: =over 4 =item CI> specifies the file type. If not set, it will search for both C.tex> and I. =item C1> inhibits searching for a LaTeXML binding (C.I.ltxml>) to use instead of the file itself. =item C1> inhibits searching for raw tex version of the file.
That is, it will I search for the LaTeXML binding. =back =item C, I<%options>);> X C is used for cases when the file (or data) is plain TeX material that is expected to contribute content to the document (as opposed to pure definitions). A Mouth is opened onto the file, and subsequent reading and/or digestion will pull Tokens from that Mouth until it is exhausted, or closed. In some circumstances it may be useful to provide a string containing the TeX material explicitly, rather than referencing a file. In this case, the C pseudo-protocol may be used: InputContent('literal:\textit{Hey}'); If a file named C<$request.latexml> exists, it will be read in as if it were a latexml binding file, before processing. This can be used for ad hoc customization of the conversion of specific files, without modifying the source, or creating more elaborate bindings. The only option to C is: =over 4 =item CI> Inhibits signalling an error if no appropriate file is found. =back =item C);> X C is analogous to LaTeX's C<\input>, and is used in cases where it isn't completely clear whether content or definitions are expected. Once a file is found, the approach specified by C or C is used, depending on which type of file is found. =item C, I<%options>);> X C is used for loading I, ie. various macros, settings, etc, rather than document content; it can be used to load LaTeXML's binding files, or for reading in raw TeX definitions or style files. It reads and processes the material completely before returning, even in the case of TeX definitions. This procedure optionally supports the conventions used for standard LaTeX packages and classes (see C and C). Options for C are: =over =item CI> the file type to search for. =item CI> inhibits searching for a LaTeXML binding; only raw TeX files will be sought and loaded. =item CI> inhibits searching for raw TeX files, only a LaTeXML binding will be sought and loaded. =item CI> inhibits reporting an error if no appropriate file is found.
=back The following options are primarily useful when C is supporting standard LaTeX package and class loading. =over =item CI> indicates whether to pass in any options from the calling class or package. =item CI> indicates whether options processing should be handled. =item C[...]> specifies a list of options (in the 'package options' sense) to be passed (possibly in addition to any provided by the calling class or package). =item CI | I($gullet)> provides I or I to be processed by a C.I-hook> macro. =item CI> fishy option that indicates that this definitions file should be treated as if it were defining a class; typically shows up in latex compatibility mode, or AMSTeX. =back A handy method to use most of the TeX distribution's raw TeX definitions for a package, but override only a few with LaTeXML bindings is by defining a binding file, say C, to contain InputDefinitions('tikz', type => 'sty', noltxml => 1); which would find and read in C, and then follow it by a couple of strategic LaTeXML definitions, C, etc. =back =head2 Class and Packages =over =item C, I<%options>);> X Finds and loads a package implementation (usually C.sty.ltxml>, unless C is specified)for the requested I. It returns the pathname of the loaded package. The options are: =over =item CI> specifies the file type (default C. =item C[...]> specifies a list of package options. =item CI> inhibits searching for the LaTeXML binding for the file (ie. C.I.ltxml> =item C1> inhibits searching for raw tex version of the file. That is, it will I search for the LaTeXML binding. =back =item C, I<%options>);> X Finds and loads a class definition (usually C.cls.ltxml>). It returns the pathname of the loaded class. The only option is =over =item C[...]> specifies a list of class options. =back =item C, I<%options>);> X Loads a I file (usually C.pool.ltxml>), one of the top-level definition files, such as TeX, LaTeX or AMSTeX. It returns the pathname of the loaded file. 
=item C, I | I | I($stomach));> X Declares an option for the current package or class. The 2nd argument can be a I (which will be tokenized and expanded) or I (which will be macro expanded), to provide the value for the option, or it can be a code reference which is treated as a primitive for side-effect. If a package or class wants to accommodate options, it should start with one or more C, followed by C. =item C, I, I<@options>); >> X Causes the given I<@options> (strings) to be passed to the package (if I is C) or class (if I is C) named by I. =item C);> X Processes the options that have been passed to the current package or class in a fashion similar to LaTeX. The only option (to C is CI> indicating whether the (package) options are processed in the order they were used, like C. =item C);> X Process the options given explicitly in I<@options>. =item C); >> X Arranges for I<@stuff> to be carried out after the preamble, at the beginning of the document. I<@stuff> should typically be macro-level stuff, but carried out for side effect; it should be tokens, tokens lists, strings (which will be tokenized), or C($gullet)> which would yield tokens to be expanded. This operation is useful for style files loaded with C<--preload> or document specific customization files (ie. ending with C<.latexml>); normally the contents would be executed before LaTeX and other style files are loaded and thus can be overridden by them. By deferring the evaluation to begin-document time, these contents can override those style files. This is likely to only be meaningful for LaTeX documents. =item C)> Arranges for I<@stuff> to be carried out just before C<\end{document}>. These tokens can be used for side effect, or any content they generate will appear as the last children of the document. =back =head2 Counters and IDs =over 4 =item C, I, I<%options>);> X Defines a new counter, like LaTeX's \newcounter, but extended.
It defines a counter that can be used to generate reference numbers, and defines C<\theI>, etc. It also defines an "uncounter" which can be used to generate ID's (xml:id) for unnumbered objects. I is the name of the counter. If defined, I is the name of another counter which, when incremented, will cause this counter to be reset. The options are =over =item CI> Specifies a prefix to be used to generate ID's when using this counter =item C Not sure that this is even sane. =back =item C<< $num = CounterValue($ctr); >> X Fetches the value associated with the counter C<$ctr>. =item C<< $tokens = StepCounter($ctr); >> X Analog of C<\stepcounter>, steps the counter and returns the expansion of C<\the$ctr>. Usually you should use C instead. =item C<< $keys = RefStepCounter($ctr); >> X Analog of C<\refstepcounter>, steps the counter and returns a hash containing the keys C$refnum, id=>$id>. This makes it suitable for use in a C option to constructors. The C is generated in parallel with the reference number to assist debugging. =item C<< $keys = RefStepID($ctr); >> X Like to C, but only steps the "uncounter", and returns only the id; This is useful for unnumbered cases of objects that normally get both a refnum and id. =item C<< ResetCounter($ctr); >> X Resets the counter C<$ctr> to zero. =item C<< GenerateID($document,$node,$whatsit,$prefix); >> X Generates an ID for nodes during the construction phase, useful for cases where the counter based scheme is inappropriate. The calling pattern makes it appropriate for use in Tag, as in Tag('ltx:para',afterClose=>sub { GenerateID(@_,'p'); }) If C<$node> doesn't already have an xml:id set, it computes an appropriate id by concatenating the xml:id of the closest ancestor with an id (if any), the prefix (if any) and a unique counter. =back =head2 Document Model Constructors define how TeX markup will generate XML fragments, but the Document Model is used to control exactly how those fragments are assembled. 
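As a rough illustration of that division of labour, the sketch below shows a small binding file in which one declaration produces an XML fragment and another adjusts how the document model assembles such fragments. The file name C<mynotes.sty.ltxml> and the control sequence C<\mynote> are hypothetical, invented only for this example; the elements C<ltx:note> and C<ltx:para> are assumed to be available in the schema in use.

```perl
# mynotes.sty.ltxml -- hypothetical binding file, for illustration only
use strict;
use warnings;
use LaTeXML::Package;

# Constructor: states WHAT fragment the (invented) command \mynote{...}
# generates; #1 is replaced by the digested argument.
DefConstructor('\mynote{}', "<ltx:note>#1</ltx:note>");

# Document model: states HOW fragments are assembled. Here an open
# ltx:para may be closed automatically when that is needed to close an
# ancestor node or to insert an element into an ancestor.
Tag('ltx:para', autoClose => 1);

1;
```

Because properties declared this way are recorded in the Model rather than in TeX's grouping, the declaration stays in effect for the rest of the document unless it is changed again.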
=over =item C, I<%properties>);> X Declares properties of elements with the name I. Note that C can set or add properties to any element from any binding file, unlike the properties set on control sequences by C, C, etc. And, since the properties are recorded in the current Model, they are not subject to TeX grouping; once set, they remain in effect until changed or the end of the document. The I can be specified in one of three forms: prefix:name matches specific name in specific namespace prefix:* matches any tag in the specific namespace; * matches any tag in any namespace. There are two kinds of properties: =over =item Scalar properties For scalar properties, only a single value is returned for a given element. When the property is looked up, each of the above forms is considered (the specific element name, the namespace, and all elements); the first defined value is returned. The recognized scalar properties are: =over =item CI> Specifies whether I can be automatically opened if needed to insert an element that can only be contained by I. This property can help match the more SGML-like LaTeX to XML. =item CI> Specifies whether this I can be automatically closed if needed to close an ancestor node, or insert an element into an ancestor. This property can help match the more SGML-like LaTeX to XML. =back =item Code properties These properties provide a bit of code to be run at the times of certain events associated with an element. I the code bits that match a given element will be run, and since they can be added by any binding file, and be specified in a random order, a little bit of extra control is desirable. Firstly, any I codes are run (eg C), then any normal codes (without modifier) are run, and finally any I codes are run (eg. C). Within I of those groups, the codes assigned for an element's specific name are run first, then those assigned for its package and finally the generic one (C<*>); that is, the most specific codes are run first.
When code properties are accumulated by C for normal or late events, the code is appended to the end of the current list (if there were any previous codes added); for early events, the code is prepended. The recognized code properties are: =over =item CI($document,$box)> Provides I to be run whenever a node with this I is opened. It is called with the document being constructed, and the initiating digested object as arguments. It is called after the node has been created, and after any initial attributes due to the constructor (passed to openElement) are added. C or C can be used in place of C; these will be run as a group before, or after (respectively) the unmodified blocks. =item CI($document,$box)> Provides I to be run whenever a node with this I is closed. It is called with the document being constructed, and the initiating digested object as arguments. C or C can be used in place of C; these will be run as a group before, or after (respectively) the unmodified blocks. =back =back =item C);> X Specifies the schema to use for determining document model. You can leave off the extension; it will look for C.rng> (and maybe eventually, C<.rnc> if that is ever implemented). =item C, I);> X Declares the I to be associated with the given I. These prefixes may be used in ltxml files, particularly for constructors, xpath expressions, etc. They are not necessarily the same as the prefixes that will be used in the generated document. Use the prefix C<#default> for the default, non-prefixed, namespace. (See RegisterDocumentNamespace, as well as DocType or RelaxNGSchema). =item C, I);> X Declares the I to be associated with the given I used within the generated XML. They are not necessarily the same as the prefixes used in code (RegisterNamespace). This function is rarely needed, as the namespace declarations are generally obtained from the DTD or Schema themselves. Use the prefix C<#default> for the default, non-prefixed, namespace. (See DocType or RelaxNGSchema).
=item C, I, I, I<%namespaces>);> X Declares the expected I, the public and system ID's of the document type to be used in the final document. The hash I<%namespaces> specifies the namespace prefixes that are expected to be found in the DTD, along with each associated namespace URI. Use the prefix C<#default> for the default namespace (ie. the namespace of non-prefixed elements in the DTD). The prefixes defined for the DTD may be different from the prefixes used in implementation CODE (eg. in ltxml files; see RegisterNamespace). The generated document will use the namespaces and prefixes defined for the DTD. =back =head2 Document Rewriting During document construction, as each node gets closed, the text content gets simplified. We'll call it I, for lack of a better name. =over =item C, I<%options>);> X Apply the regular expression (given as a string: "/fa/fa/" since it will be converted internally to a true regexp), to the text content. The only option is CI($font)>; if given, then the substitution is applied only when C returns true. Predefined Ligatures combine sequences of "." or single-quotes into appropriate Unicode characters. =item C($document,@nodes));> X I is called on each sequence of math nodes at a given level. If they should be replaced, return a list of C<($n,$string,%attributes)> to replace the text content of the first node with C<$string> content and add the given attributes. The next C<$n-1> nodes are removed. If no replacement is called for, CODE should return undef. Predefined Math Ligatures combine letter or digit Math Tokens (XMTok) into multicharacter symbols or numbers, depending on the font (non math italic). =back After document construction, various rewriting and augmenting of the document can take place. =over =item C);> =item C);> XX These two declarations define document rewrite rules that are applied to the document tree after it has been constructed, but before math parsing, or any other postprocessing, is done.
The I<%specification> consists of a sequence of key/value pairs with the initial specs successively narrowing the selection of document nodes, and the remaining specs indicating how to modify or replace the selected nodes. The following select portions of the document: =over =item CI