XML-DOM-1.44/0000755000076400007640000000000010271306205012741 5ustar tjmathertjmatherXML-DOM-1.44/lib/0000755000076400007640000000000010271306205013507 5ustar tjmathertjmatherXML-DOM-1.44/lib/XML/0000755000076400007640000000000010271306205014147 5ustar tjmathertjmatherXML-DOM-1.44/lib/XML/DOM/0000755000076400007640000000000010271306205014566 5ustar tjmathertjmatherXML-DOM-1.44/lib/XML/DOM/DocumentType.pod0000644000076400007640000001176607431101737017734 0ustar tjmathertjmather=head1 NAME XML::DOM::DocumentType - An XML document type (DTD) in XML::DOM =head1 DESCRIPTION XML::DOM::DocumentType extends L. Each Document has a doctype attribute whose value is either null or a DocumentType object. The DocumentType interface in the DOM Level 1 Core provides an interface to the list of entities that are defined for the document, and little else because the effect of namespaces and the various XML scheme efforts on DTD representation are not clearly understood as of this writing. The DOM Level 1 doesn't support editing DocumentType nodes. B: This implementation has added a lot of extra functionality to the DOM Level 1 interface. To allow editing of the DocumentType nodes, see XML::DOM::ignoreReadOnly. =head2 METHODS =over 4 =item getName Returns the name of the DTD, i.e. the name immediately following the DOCTYPE keyword. =item getEntities A NamedNodeMap containing the general entities, both external and internal, declared in the DTD. Duplicates are discarded. For example in: ]> the interface provides access to foo and bar but not baz. Every node in this map also implements the Entity interface. The DOM Level 1 does not support editing entities, therefore entities cannot be altered in any way. B: See XML::DOM::ignoreReadOnly to edit the DocumentType etc. =item getNotations A NamedNodeMap containing the notations declared in the DTD. Duplicates are discarded. Every node in this map also implements the Notation interface. 
The DOM Level 1 does not support editing notations, therefore notations cannot be altered in any way.
B<Not In DOM Spec>: See XML::DOM::ignoreReadOnly to edit the DocumentType etc.

=head2 Additional methods not in the DOM Spec

=item Creating and setting the DocumentType

A new DocumentType can be created with:

 $doctype = $doc->createDocumentType ($name, $sysId, $pubId, $internal);

To set (or replace) the DocumentType for a particular document, use:

 $doc->setDoctype ($doctype);

=item getSysId and setSysId (sysId)

Returns or sets the system id.

=item getPubId and setPubId (pubId)

Returns or sets the public id.

=item setName (name)

Sets the name of the DTD, i.e. the name immediately following the DOCTYPE keyword. Note that this should always be the same as the element tag name of the root element.

=item getAttlistDecl (elemName)

Returns the AttlistDecl for the Element with the specified name, or undef.

=item getElementDecl (elemName)

Returns the ElementDecl for the Element with the specified name, or undef.

=item getEntity (entityName)

Returns the Entity with the specified name, or undef.

=item addAttlistDecl (elemName)

Adds a new AttlistDecl node with the specified elemName if one doesn't exist yet. Returns the AttlistDecl (new or existing) node.

=item addElementDecl (elemName, model)

Adds a new ElementDecl node with the specified elemName and model if one doesn't exist yet. Returns the ElementDecl (new or existing) node. The model is ignored if one already existed.

=item addEntity (notationName, value, sysId, pubId, ndata, parameter)

Adds a new Entity node. Don't use createEntity and appendChild, because the Entity should be added to the internal NamedNodeMap containing the entities.

Parameters:
 I<notationName> the entity name.
 I<value> the entity value.
 I<sysId> the system id (if any.)
 I<pubId> the public id (if any.)
 I<ndata> the NDATA declaration (if any, for general unparsed entities.)
 I<parameter> whether it is a parameter entity (%ent;) or not (&ent;).

SysId, pubId and ndata may be undefined.
DOMExceptions: =over 4 =item * INVALID_CHARACTER_ERR Raised if the notationName does not conform to the XML spec. =back =item addNotation (name, base, sysId, pubId) Adds a new Notation object. Parameters: I the notation name. I the base to be used for resolving a relative URI. I the system id. I the public id. Base, sysId, and pubId may all be undefined. (These parameters are passed by the XML::Parser Notation handler.) DOMExceptions: =over 4 =item * INVALID_CHARACTER_ERR Raised if the notationName does not conform to the XML spec. =back =item addAttDef (elemName, attrName, type, default, fixed) Adds a new attribute definition. It will add the AttDef node to the AttlistDecl if it exists. If an AttDef with the specified attrName already exists for the given elemName, this function only generates a warning. See XML::DOM::AttDef::new for the other parameters. =item getDefaultAttrValue (elem, attr) Returns the default attribute value as a string or undef, if none is available. Parameters: I The element tagName. I The attribute name. =item expandEntity (entity [, parameter]) Expands the specified entity or parameter entity (if parameter=1) and returns its value as a string, or undef if the entity does not exist. (The entity name should not contain the '%', '&' or ';' delimiters.) =back XML-DOM-1.44/lib/XML/DOM/DocumentFragment.pod0000644000076400007640000000403307045337007020545 0ustar tjmathertjmather=head1 NAME XML::DOM::DocumentFragment - Facilitates cut & paste in XML::DOM documents =head1 DESCRIPTION XML::DOM::DocumentFragment extends L DocumentFragment is a "lightweight" or "minimal" Document object. It is very common to want to be able to extract a portion of a document's tree or to create a new fragment of a document. Imagine implementing a user command like cut or rearranging a document by moving fragments around. It is desirable to have an object which can hold such fragments and it is quite natural to use a Node for this purpose. 
While it is true that a Document object could fulfil this role, a Document object can potentially be a heavyweight object, depending on the underlying implementation. What is really needed for this is a very lightweight object. DocumentFragment is such an object.

Furthermore, various operations -- such as inserting nodes as children of another Node -- may take DocumentFragment objects as arguments; this results in all the child nodes of the DocumentFragment being moved to the child list of this node.

The children of a DocumentFragment node are zero or more nodes representing the tops of any sub-trees defining the structure of the document. DocumentFragment nodes do not need to be well-formed XML documents (although they do need to follow the rules imposed upon well-formed XML parsed entities, which can have multiple top nodes). For example, a DocumentFragment might have only one child and that child node could be a Text node. Such a structure model represents neither an HTML document nor a well-formed XML document.

When a DocumentFragment is inserted into a Document (or indeed any other Node that may take children) the children of the DocumentFragment and not the DocumentFragment itself are inserted into the Node. This makes the DocumentFragment very useful when the user wishes to create nodes that are siblings; the DocumentFragment acts as the parent of these nodes so that the user can use the standard methods from the Node interface, such as insertBefore() and appendChild().

XML-DOM-1.44/lib/XML/DOM/AttlistDecl.pod0000644000076400007640000000224607045337006017522 0ustar tjmathertjmather
=head1 NAME

XML::DOM::AttlistDecl - An XML ATTLIST declaration in XML::DOM

=head1 DESCRIPTION

XML::DOM::AttlistDecl extends L<XML::DOM::Node> but is not part of the DOM Level 1 specification.

This node represents an ATTLIST declaration, e.g.

Each attribute definition is stored in a separate AttDef node. The AttDef nodes can be retrieved with getAttDef and added with addAttDef.
(The AttDef nodes are stored in a NamedNodeMap internally.)

=head2 METHODS

=over 4

=item getName

Returns the Element tagName.

=item getAttDef (attrName)

Returns the AttDef node for the attribute with the specified name.

=item addAttDef (attrName, type, default, [ fixed ])

Adds an AttDef node for the attribute with the specified name.

Parameters:
 I<attrName> the attribute name.
 I<type> the attribute type (e.g. "CDATA" or "(male|female)".)
 I<default> the default value enclosed in quotes (!), the string #IMPLIED or the string #REQUIRED.
 I<fixed> whether the attribute is '#FIXED' (default is 0.)

=back

XML-DOM-1.44/lib/XML/DOM/Notation.pod0000644000076400007640000000155207045337011017074 0ustar tjmathertjmather
=head1 NAME

XML::DOM::Notation - An XML NOTATION in XML::DOM

=head1 DESCRIPTION

XML::DOM::Notation extends L<XML::DOM::Node>.

This node represents a Notation, e.g.

=head2 METHODS

=over 4

=item getName and setName (name)

Returns (or sets) the Notation name, which is the first token after the NOTATION keyword.

=item getSysId and setSysId (sysId)

Returns (or sets) the system ID, which is the token after the optional SYSTEM keyword.

=item getPubId and setPubId (pubId)

Returns (or sets) the public ID, which is the token after the optional PUBLIC keyword.

=item getBase

Returns the base that XML::Parser passes to the Notation handler. XML::DOM::DocumentType::addNotation describes it as the base to be used for resolving a relative URI.

=item getNodeName

Returns the same as getName.

=back

XML-DOM-1.44/lib/XML/DOM/Attr.pod0000644000076400007640000000520507045337006016216 0ustar tjmathertjmather
=head1 NAME

XML::DOM::Attr - An XML attribute in XML::DOM

=head1 DESCRIPTION

XML::DOM::Attr extends L<XML::DOM::Node>.

The Attr nodes built by the XML::DOM::Parser always have one child node which is a Text node containing the expanded string value (i.e. EntityReferences are always expanded.) EntityReferences may be added when modifying or creating a new Document.

The Attr interface represents an attribute in an Element object. Typically the allowable values for the attribute are defined in a document type definition.

Attr objects inherit the Node interface, but since they are not actually child nodes of the element they describe, the DOM does not consider them part of the document tree. Thus, the Node attributes parentNode, previousSibling, and nextSibling have an undef value for Attr objects. The DOM takes the view that attributes are properties of elements rather than having a separate identity from the elements they are associated with; this should make it more efficient to implement such features as default attributes associated with all elements of a given type. Furthermore, Attr nodes may not be immediate children of a DocumentFragment. However, they can be associated with Element nodes contained within a DocumentFragment. In short, users and implementors of the DOM need to be aware that Attr nodes have some things in common with other objects inheriting the Node interface, but they also are quite distinct.

The attribute's effective value is determined as follows: if this attribute has been explicitly assigned any value, that value is the attribute's effective value; otherwise, if there is a declaration for this attribute, and that declaration includes a default value, then that default value is the attribute's effective value; otherwise, the attribute does not exist on this element in the structure model until it has been explicitly added. Note that the nodeValue attribute on the Attr instance can also be used to retrieve the string version of the attribute's value(s).

In XML, where the value of an attribute can contain entity references, the child nodes of the Attr node provide a representation in which entity references are not expanded. These child nodes may be either Text or EntityReference nodes. Because the attribute type may be unknown, there are no tokenized attribute values.

=head2 METHODS

=over 4

=item getValue

On retrieval, the value of the attribute is returned as a string. Character and general entity references are replaced with their values.
=item setValue (str) DOM Spec: On setting, this creates a Text node with the unparsed contents of the string. =item getName Returns the name of this attribute. =back XML-DOM-1.44/lib/XML/DOM/ProcessingInstruction.pod0000644000076400007640000000144207045337011021655 0ustar tjmathertjmather=head1 NAME XML::DOM::ProcessingInstruction - An XML processing instruction in XML::DOM =head1 DESCRIPTION XML::DOM::ProcessingInstruction extends L. It represents a "processing instruction", used in XML as a way to keep processor-specific information in the text of the document. An example: Here, "PI" is the target and "processing instruction" is the data. =head2 METHODS =over 4 =item getTarget The target of this processing instruction. XML defines this as being the first token following the markup that begins the processing instruction. =item getData and setData (data) The content of this processing instruction. This is from the first non white space character after the target to the character immediately preceding the ?>. =back XML-DOM-1.44/lib/XML/DOM/Entity.pod0000644000076400007640000000211407045337010016547 0ustar tjmathertjmather=head1 NAME XML::DOM::Entity - An XML ENTITY in XML::DOM =head1 DESCRIPTION XML::DOM::Entity extends L. This node represents an Entity declaration, e.g. The first one is called a parameter entity and is referenced like this: %draft; The 2nd is a (regular) entity and is referenced like this: &hatch-pic; =head2 METHODS =over 4 =item getNotationName Returns the name of the notation for the entity. I The DOM Spec says: For unparsed entities, the name of the notation for the entity. For parsed entities, this is null. (This implementation does not support unparsed entities.) =item getSysId Returns the system id, or undef. =item getPubId Returns the public id, or undef. =back =head2 Additional methods not in the DOM Spec =over 4 =item isParameterEntity Whether it is a parameter entity (%ent;) or not (&ent;) =item getValue Returns the entity value. 
=item getNdata Returns the NDATA declaration (for general unparsed entities), or undef. =back XML-DOM-1.44/lib/XML/DOM/Document.pod0000644000076400007640000001302307342202726017057 0ustar tjmathertjmather=head1 NAME XML::DOM::Document - An XML document node in XML::DOM =head1 DESCRIPTION XML::DOM::Document extends L. It is the main root of the XML document structure as returned by XML::DOM::Parser::parse and XML::DOM::Parser::parsefile. Since elements, text nodes, comments, processing instructions, etc. cannot exist outside the context of a Document, the Document interface also contains the factory methods needed to create these objects. The Node objects created have a getOwnerDocument method which associates them with the Document within whose context they were created. =head2 METHODS =over 4 =item getDocumentElement This is a convenience method that allows direct access to the child node that is the root Element of the document. =item getDoctype The Document Type Declaration (see DocumentType) associated with this document. For HTML documents as well as XML documents without a document type declaration this returns undef. The DOM Level 1 does not support editing the Document Type Declaration. B: This implementation allows editing the doctype. See I for details. =item getImplementation The DOMImplementation object that handles this document. A DOM application may use objects from multiple implementations. =item createElement (tagName) Creates an element of the type specified. Note that the instance returned implements the Element interface, so attributes can be specified directly on the returned object. DOMExceptions: =over 4 =item * INVALID_CHARACTER_ERR Raised if the tagName does not conform to the XML spec. =back =item createTextNode (data) Creates a Text node given the specified string. =item createComment (data) Creates a Comment node given the specified string. =item createCDATASection (data) Creates a CDATASection node given the specified string. 
=item createAttribute (name [, value [, specified ]]) Creates an Attr of the given name. Note that the Attr instance can then be set on an Element using the setAttribute method. B: The DOM Spec does not allow passing the value or the specified property in this method. In this implementation they are optional. Parameters: I The attribute's value. See Attr::setValue for details. If the value is not supplied, the specified property is set to 0. I Whether the attribute value was specified or whether the default value was used. If not supplied, it's assumed to be 1. DOMExceptions: =over 4 =item * INVALID_CHARACTER_ERR Raised if the name does not conform to the XML spec. =back =item createProcessingInstruction (target, data) Creates a ProcessingInstruction node given the specified name and data strings. Parameters: I The target part of the processing instruction. I The data for the node. DOMExceptions: =over 4 =item * INVALID_CHARACTER_ERR Raised if the target does not conform to the XML spec. =back =item createDocumentFragment Creates an empty DocumentFragment object. =item createEntityReference (name) Creates an EntityReference object. =back =head2 Additional methods not in the DOM Spec =over 4 =item getXMLDecl and setXMLDecl (xmlDecl) Returns the XMLDecl for this Document or undef if none was specified. Note that XMLDecl is not part of the list of child nodes. =item setDoctype (doctype) Sets or replaces the DocumentType. B: Don't use appendChild or insertBefore to set the DocumentType. Even though doctype will be part of the list of child nodes, it is handled specially. =item getDefaultAttrValue (elem, attr) Returns the default attribute value as a string or undef, if none is available. Parameters: I The element tagName. I The attribute name. =item getEntity (name) Returns the Entity with the specified name. =item createXMLDecl (version, encoding, standalone) Creates an XMLDecl object. All parameters may be undefined. 
=item createDocumentType (name, sysId, pubId) Creates a DocumentType object. SysId and pubId may be undefined. =item createNotation (name, base, sysId, pubId) Creates a new Notation object. Consider using XML::DOM::DocumentType::addNotation! =item createEntity (parameter, notationName, value, sysId, pubId, ndata) Creates an Entity object. Consider using XML::DOM::DocumentType::addEntity! =item createElementDecl (name, model) Creates an ElementDecl object. DOMExceptions: =over 4 =item * INVALID_CHARACTER_ERR Raised if the element name (tagName) does not conform to the XML spec. =back =item createAttlistDecl (name) Creates an AttlistDecl object. DOMExceptions: =over 4 =item * INVALID_CHARACTER_ERR Raised if the element name (tagName) does not conform to the XML spec. =back =item expandEntity (entity [, parameter]) Expands the specified entity or parameter entity (if parameter=1) and returns its value as a string, or undef if the entity does not exist. (The entity name should not contain the '%', '&' or ';' delimiters.) =item check ( [$checker] ) Uses the specified L to validate the document. If no XML::Checker is supplied, a new XML::Checker is created. See L for details. =item check_sax ( [$checker] ) Similar to check() except it uses the SAX interface to XML::Checker instead of the expat interface. This method may disappear or replace check() at some time. =item createChecker () Creates an XML::Checker based on the document's DTD. The $checker can be reused to check any elements within the document. Create a new L whenever the DOCTYPE section of the document is altered! 
=back XML-DOM-1.44/lib/XML/DOM/Parser.pod0000644000076400007640000000523407522072657016552 0ustar tjmathertjmather=head1 NAME XML::DOM::Parser - An XML::Parser that builds XML::DOM document structures =head1 SYNOPSIS use XML::DOM; my $parser = new XML::DOM::Parser; my $doc = $parser->parsefile ("file.xml"); $doc->dispose; # Avoid memory leaks - cleanup circular references =head1 DESCRIPTION XML::DOM::Parser extends L The XML::Parser module was written by Clark Cooper and is built on top of XML::Parser::Expat, which is a lower level interface to James Clark's expat library. XML::DOM::Parser parses XML strings or files and builds a data structure that conforms to the API of the Document Object Model as described at L. See the L manpage for other additional properties of the XML::DOM::Parser class. Note that the 'Style' property should not be used (it is set internally.) The XML::Parser B option is more or less supported, in that it will generate EntityReference objects whenever an entity reference is encountered in character data. I'm not sure how useful this is. Any comments are welcome. As described in the synopsis, when you create an XML::DOM::Parser object, the parse and parsefile methods create an L object from the specified input. This Document object can then be examined, modified and written back out to a file or converted to a string. When using XML::DOM with XML::Parser version 2.19 and up, setting the XML::DOM::Parser option B to 1 will store CDATASections in CDATASection nodes, instead of converting them to Text nodes. Subsequent CDATASection nodes will be merged into one. Let me know if this is a problem. =head1 Using LWP to parse URLs The parsefile() method now also supports URLs, e.g. I. It uses LWP to download the file and then calls parse() on the resulting string. 
By default it will use a L<LWP::UserAgent> that is created as follows:

 use LWP::UserAgent;
 $LWP_USER_AGENT = LWP::UserAgent->new;
 $LWP_USER_AGENT->env_proxy;

Note that env_proxy reads proxy settings from environment variables, which is what I need to do to get thru our firewall. If you want to use a different LWP::UserAgent, you can either set it globally with:

 XML::DOM::Parser::set_LWP_UserAgent ($my_agent);

or, you can specify it for a specific XML::DOM::Parser by passing it to the constructor:

 my $parser = new XML::DOM::Parser (LWP_UserAgent => $my_agent);

Currently, LWP is used when the filename (passed to parsefile) starts with one of the following URL schemes: http, https, ftp, wais, gopher, or file (followed by a colon.) If I missed one, please let me know.

The LWP modules are part of libwww-perl which is available at CPAN.

XML-DOM-1.44/lib/XML/DOM/NodeList.pm0000644000076400007640000000126707045337010016656 0ustar tjmathertjmather
######################################################################
package XML::DOM::NodeList;
######################################################################

use vars qw ( $EMPTY );

# Empty NodeList
$EMPTY = new XML::DOM::NodeList;

sub new
{
    bless [], $_[0];
}

sub item
{
    $_[0]->[$_[1]];
}

sub getLength
{
    int (@{$_[0]});
}

#------------------------------------------------------------
# Extra method implementations

sub dispose
{
    my $self = shift;
    for my $kid (@{$self})
    {
        $kid->dispose;
    }
}

sub setOwnerDocument
{
    my ($self, $doc) = @_;
    for my $kid (@{$self})
    {
        $kid->setOwnerDocument ($doc);
    }
}

1; # package return code

XML-DOM-1.44/lib/XML/DOM/Node.pod0000644000076400007640000003142707052622574016203 0ustar tjmathertjmather
=head1 NAME

XML::DOM::Node - Super class of all nodes in XML::DOM

=head1 DESCRIPTION

XML::DOM::Node is the super class of all nodes in an XML::DOM document. This means that all nodes that subclass XML::DOM::Node also inherit all the methods that XML::DOM::Node implements.
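
As a minimal sketch of what this inheritance means in practice, the same Node methods can be called on whatever kinds of node a parsed document happens to contain. The sample XML string and the variable names below are assumptions made for illustration only; the methods used (parse, getDocumentElement, getChildNodes, getNodeTypeName, getNodeName, dispose) are the ones described in these manual pages.

```perl
use strict;
use warnings;
use XML::DOM;

# Hypothetical sample input, chosen only for illustration.
my $parser = XML::DOM::Parser->new;
my $doc    = $parser->parse('<greeting lang="en">hello<!-- note --></greeting>');

my $root = $doc->getDocumentElement;

# In list context getChildNodes returns a plain perl list.
# Each child, whatever its class, answers the methods it
# inherits from XML::DOM::Node.
my @seen;
for my $kid ($root->getChildNodes) {
    push @seen, $kid->getNodeTypeName . ' ' . $kid->getNodeName;
}
print "$_\n" for @seen;

# Break circular references, as the XML::DOM::Parser synopsis advises.
$doc->dispose;
```

The loop never needs to know which subclass it is dealing with, which is the point of having the accessors defined on the super class.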
=head2 GLOBAL VARIABLES =over 4 =item @NodeNames The variable @XML::DOM::Node::NodeNames maps the node type constants to strings. It is used by XML::DOM::Node::getNodeTypeName. =back =head2 METHODS =over 4 =item getNodeType Return an integer indicating the node type. See XML::DOM constants. =item getNodeName Return a property or a hardcoded string, depending on the node type. Here are the corresponding functions or values: Attr getName AttDef getName AttlistDecl getName CDATASection "#cdata-section" Comment "#comment" Document "#document" DocumentType getNodeName DocumentFragment "#document-fragment" Element getTagName ElementDecl getName EntityReference getEntityName Entity getNotationName Notation getName ProcessingInstruction getTarget Text "#text" XMLDecl "#xml-declaration" B: AttDef, AttlistDecl, ElementDecl and XMLDecl were added for completeness. =item getNodeValue and setNodeValue (value) Returns a string or undef, depending on the node type. This method is provided for completeness. In other languages it saves the programmer an upcast. The value is either available thru some other method defined in the subclass, or else undef is returned. Here are the corresponding methods: Attr::getValue, Text::getData, CDATASection::getData, Comment::getData, ProcessingInstruction::getData. =item getParentNode and setParentNode (parentNode) The parent of this node. All nodes, except Document, DocumentFragment, and Attr may have a parent. However, if a node has just been created and not yet added to the tree, or if it has been removed from the tree, this is undef. =item getChildNodes A NodeList that contains all children of this node. If there are no children, this is a NodeList containing no nodes. 
The content of the returned NodeList is "live" in the sense that, for instance, changes to the children of the node object that it was created from are immediately reflected in the nodes returned by the NodeList accessors; it is not a static snapshot of the content of the node. This is true for every NodeList, including the ones returned by the getElementsByTagName method. NOTE: this implementation does not return a "live" NodeList for getElementsByTagName. See L. When this method is called in a list context, it returns a regular perl list containing the child nodes. Note that this list is not "live". E.g. @list = $node->getChildNodes; # returns a perl list $nodelist = $node->getChildNodes; # returns a NodeList (object reference) for my $kid ($node->getChildNodes) # iterate over the children of $node =item getFirstChild The first child of this node. If there is no such node, this returns undef. =item getLastChild The last child of this node. If there is no such node, this returns undef. =item getPreviousSibling The node immediately preceding this node. If there is no such node, this returns undef. =item getNextSibling The node immediately following this node. If there is no such node, this returns undef. =item getAttributes A NamedNodeMap containing the attributes (Attr nodes) of this node (if it is an Element) or undef otherwise. Note that adding/removing attributes from the returned object, also adds/removes attributes from the Element node that the NamedNodeMap came from. =item getOwnerDocument The Document object associated with this node. This is also the Document object used to create new nodes. When this node is a Document this is undef. =item insertBefore (newChild, refChild) Inserts the node newChild before the existing child node refChild. If refChild is undef, insert newChild at the end of the list of children. If newChild is a DocumentFragment object, all of its children are inserted, in the same order, before refChild. 
If the newChild is already in the tree, it is first removed.

Return Value: The node being inserted.

DOMExceptions:

=over 4

=item * HIERARCHY_REQUEST_ERR

Raised if this node is of a type that does not allow children of the type of the newChild node, or if the node to insert is one of this node's ancestors.

=item * WRONG_DOCUMENT_ERR

Raised if newChild was created from a different document than the one that created this node.

=item * NO_MODIFICATION_ALLOWED_ERR

Raised if this node is readonly.

=item * NOT_FOUND_ERR

Raised if refChild is not a child of this node.

=back

=item replaceChild (newChild, oldChild)

Replaces the child node oldChild with newChild in the list of children, and returns the oldChild node. If the newChild is already in the tree, it is first removed.

Return Value: The node replaced.

DOMExceptions:

=over 4

=item * HIERARCHY_REQUEST_ERR

Raised if this node is of a type that does not allow children of the type of the newChild node, or if the node to put in is one of this node's ancestors.

=item * WRONG_DOCUMENT_ERR

Raised if newChild was created from a different document than the one that created this node.

=item * NO_MODIFICATION_ALLOWED_ERR

Raised if this node is readonly.

=item * NOT_FOUND_ERR

Raised if oldChild is not a child of this node.

=back

=item removeChild (oldChild)

Removes the child node indicated by oldChild from the list of children, and returns it.

Return Value: The node removed.

DOMExceptions:

=over 4

=item * NO_MODIFICATION_ALLOWED_ERR

Raised if this node is readonly.

=item * NOT_FOUND_ERR

Raised if oldChild is not a child of this node.

=back

=item appendChild (newChild)

Adds the node newChild to the end of the list of children of this node. If the newChild is already in the tree, it is first removed. If it is a DocumentFragment object, the entire contents of the document fragment are moved into the child list of this node.

Return Value: The node added.
DOMExceptions: =over 4 =item * HIERARCHY_REQUEST_ERR Raised if this node is of a type that does not allow children of the type of the newChild node, or if the node to append is one of this node's ancestors. =item * WRONG_DOCUMENT_ERR Raised if newChild was created from a different document than the one that created this node. =item * NO_MODIFICATION_ALLOWED_ERR Raised if this node is readonly. =back =item hasChildNodes This is a convenience method to allow easy determination of whether a node has any children. Return Value: 1 if the node has any children, 0 otherwise. =item cloneNode (deep) Returns a duplicate of this node, i.e., serves as a generic copy constructor for nodes. The duplicate node has no parent (parentNode returns undef.). Cloning an Element copies all attributes and their values, including those generated by the XML processor to represent defaulted attributes, but this method does not copy any text it contains unless it is a deep clone, since the text is contained in a child Text node. Cloning any other type of node simply returns a copy of this node. Parameters: I If true, recursively clone the subtree under the specified node. If false, clone only the node itself (and its attributes, if it is an Element). Return Value: The duplicate node. =item normalize Puts all Text nodes in the full depth of the sub-tree underneath this Element into a "normal" form where only markup (e.g., tags, comments, processing instructions, CDATA sections, and entity references) separates Text nodes, i.e., there are no adjacent Text nodes. This can be used to ensure that the DOM view of a document is the same as if it were saved and re-loaded, and is useful when operations (such as XPointer lookups) that depend on a particular document tree structure are to be used. B: In the DOM Spec this method is defined in the Element and Document class interfaces only, but it doesn't hurt to have it here... 
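
The effect of normalize can be illustrated with a short sketch. It assumes a tiny hypothetical document and uses appendChild (rather than addText, which would extend the existing Text node) so that two adjacent Text nodes exist before the call. The variable names are illustrative, not part of the module.

```perl
use strict;
use warnings;
use XML::DOM;

# Hypothetical input: one element holding a single Text node.
my $doc = XML::DOM::Parser->new->parse('<p>one</p>');
my $p   = $doc->getDocumentElement;

# Append a second Text node, leaving two adjacent Text children.
$p->appendChild($doc->createTextNode(' two'));
my $kids_before = $p->getChildNodes;        # scalar context: a NodeList
my $before      = $kids_before->getLength;

# Merge adjacent Text nodes in the subtree under $p.
$p->normalize;
my $kids_after = $p->getChildNodes;
my $after      = $kids_after->getLength;
my $text       = $p->getFirstChild->getData;

print "children before: $before, after: $after, text: '$text'\n";

$doc->dispose;    # release circular references when done
```

Calling normalize before saving or comparing trees is one way to make the in-memory structure match what a fresh parse of the serialized document would produce.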

=item getElementsByTagName (name [, recurse])

Returns a NodeList of all descendant elements with a given tag name, in the order in which they would be encountered in a preorder traversal of the Element tree.

Parameters:
 I<name> The name of the tag to match on. The special value "*" matches all tags.
 I<recurse> Whether it should return only direct child nodes (0) or any descendant that matches the tag name (1). This argument is optional and defaults to 1. It is not part of the DOM spec.

Return Value: A list of matching Element nodes.

NOTE: this implementation does not return a "live" NodeList for getElementsByTagName. See L.

When this method is called in a list context, it returns a regular perl list containing the result nodes. E.g.

 @list = $node->getElementsByTagName("tag");       # returns a perl list
 $nodelist = $node->getElementsByTagName("tag");   # returns a NodeList (object ref.)
 for my $elem ($node->getElementsByTagName("tag")) # iterate over the result nodes

=back

=head2 Additional methods not in the DOM Spec

=over 4

=item getNodeTypeName

Returns the string describing the node type. E.g. returns "ELEMENT_NODE" if getNodeType returns ELEMENT_NODE. It uses @XML::DOM::Node::NodeNames.

=item toString

Returns the entire subtree as a string.

=item printToFile (filename)

Prints the entire subtree to the file with the specified filename.

Croaks: if the file could not be opened for writing.

=item printToFileHandle (handle)

Prints the entire subtree to the file handle. E.g. to print to STDOUT:

 $node->printToFileHandle (\*STDOUT);

=item print (obj)

Prints the entire subtree using the object's print method. E.g. to print to a FileHandle object:

 $f = new FileHandle ("file.out", "w");
 $node->print ($f);

=item getChildIndex (child)

Returns the index of the child node in the list returned by getChildNodes.

Return Value: the index or -1 if the node is not found.

=item getChildAtIndex (index)

Returns the child node at the specified index or undef.
=item addText (text)

Appends the specified string to the last child if it is a Text node, or else appends a new Text node (with the specified text.)

Return Value: the last child if it was a Text node or else the new Text node.

=item dispose

Removes all circular references in this node and its descendants so the objects can be claimed for garbage collection. The objects should not be used afterwards.

=item setOwnerDocument (doc)

Sets the ownerDocument property of this node and all its children (and attributes etc.) to the specified document. This allows the user to cut and paste document subtrees between different XML::DOM::Documents. The node should be removed from the original document first, before calling setOwnerDocument.

This method does nothing when called on a Document node.

=item isAncestor (parent)

Returns 1 if parent is an ancestor of this node or if it is this node itself.

=item expandEntityRefs (str)

Expands all the entity references in the string and returns the result. The entity references can be character references (e.g. "&#123;" or "&#x1fc2;"), default entity references ("&quot;", "&gt;", "&lt;", "&apos;" and "&amp;") or entity references defined in Entity objects as part of the DocumentType of the owning Document. Character references are expanded into UTF-8. Parameter entity references (e.g. %ent;) are not expanded.

=item to_sax ( %HANDLERS )

E.g.

 $node->to_sax (DocumentHandler => $my_handler, Handler => $handler2 );

%HANDLERS may contain the following handlers:

=over 4

=item * DocumentHandler

=item * DTDHandler

=item * EntityResolver

=item * Handler

Default handler when one of the above is not specified

=back

Each XML::DOM::Node generates the appropriate SAX callbacks (for the appropriate SAX handler.) Different SAX handlers can be plugged in to accomplish different things, e.g.
L would check the node (currently only Document and Element nodes are supported), L would create a new DOM subtree (thereby, in essence, copying the Node) and in the near future, XML::Writer could print the node. All Perl SAX related work is still in flux, so this interface may change a little. See PerlSAX for the description of the SAX interface. =item check ( [$checker] ) See descriptions for check() in L and L. =item xql ( @XQL_OPTIONS ) To use the xql method, you must first I L and L. This method is basically a shortcut for: $query = new XML::XQL::Query ( @XQL_OPTIONS ); return $query->solve ($node); If the first parameter in @XQL_OPTIONS is the XQL expression, you can leave off the 'Expr' keyword, so: $node->xql ("doc//elem1[@attr]", @other_options); is identical to: $node->xql (Expr => "doc//elem1[@attr]", @other_options); See L for other available XQL_OPTIONS. See L and L for more info. =item isHidden () Whether the node is hidden. See L for details. =back XML-DOM-1.44/lib/XML/DOM/ElementDecl.pod0000644000076400007640000000102107045337007017456 0ustar tjmathertjmather=head1 NAME XML::DOM::ElementDecl - An XML ELEMENT declaration in XML::DOM =head1 DESCRIPTION XML::DOM::ElementDecl extends L but is not part of the DOM Level 1 specification. This node represents an Element declaration, e.g. =head2 METHODS =over 4 =item getName Returns the Element tagName. =item getModel and setModel (model) Returns and sets the model as a string, e.g. "(street+, city, state, zip, country?)" in the above example. =back XML-DOM-1.44/lib/XML/DOM/EntityReference.pod0000644000076400007640000000243107045337010020370 0ustar tjmathertjmather=head1 NAME XML::DOM::EntityReference - An XML ENTITY reference in XML::DOM =head1 DESCRIPTION XML::DOM::EntityReference extends L. EntityReference objects may be inserted into the structure model when an entity reference is in the source document, or when the user wishes to insert an entity reference. 
Note that character references and references to predefined entities are considered to be expanded by the HTML or XML processor so that characters are represented by their Unicode equivalent rather than by an entity reference. Moreover, the XML processor may completely expand references to entities while building the structure model, instead of providing EntityReference objects. If it does provide such objects, then for a given EntityReference node, it may be that there is no Entity node representing the referenced entity; but if such an Entity exists, then the child list of the EntityReference node is the same as that of the Entity node. As with the Entity node, all descendants of the EntityReference are readonly.

The resolution of the children of the EntityReference (the replacement value of the referenced Entity) may be lazily evaluated; actions by the user (such as calling the childNodes method on the EntityReference node) are assumed to trigger the evaluation.

XML-DOM-1.44/lib/XML/DOM/NodeList.pod

=head1 NAME

XML::DOM::NodeList - A node list as used by XML::DOM

=head1 DESCRIPTION

The NodeList interface provides the abstraction of an ordered collection of nodes, without defining or constraining how this collection is implemented.

The items in the NodeList are accessible via an integral index, starting from 0.

Although the DOM spec states that all NodeLists are "live" in that they always reflect changes to the DOM tree, the NodeList returned by getElementsByTagName is not live in this implementation. See L for details.

=head2 METHODS

=over 4

=item item (index)

Returns the indexth item in the collection. If index is greater than or equal to the number of nodes in the list, this returns undef.

=item getLength

The number of nodes in the list. The range of valid child node indices is 0 to length-1 inclusive.
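Index-based iteration with item and getLength can be sketched as follows. This is a minimal example assuming XML::DOM is installed; the sample document and variable names are made up for illustration.

```perl
use strict;
use warnings;
use XML::DOM;

# Sample document, invented for this sketch.
my $doc = XML::DOM::Parser->new->parse(
    '<colors><c>red</c><c>green</c><c>blue</c></colors>');

# In scalar context getChildNodes returns a NodeList object.
my $nodes = $doc->getDocumentElement->getChildNodes;

# Valid indices run from 0 to getLength - 1.
my @names;
for my $i (0 .. $nodes->getLength - 1) {
    my $elem = $nodes->item($i);
    push @names, $elem->getFirstChild->getData;
}
print join(",", @names), "\n";

# An index past the end of the list yields undef rather than an exception.
my $past_end = $nodes->item($nodes->getLength);
print "past the end: ", (defined $past_end ? "defined" : "undef"), "\n";
```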
=back =head2 Additional methods not in the DOM Spec =over 4 =item dispose Removes all circular references in this NodeList and its descendants so the objects can be claimed for garbage collection. The objects should not be used afterwards. =back XML-DOM-1.44/lib/XML/DOM/CharacterData.pod0000644000076400007640000000537407045337006020001 0ustar tjmathertjmather=head1 NAME XML::DOM::CharacterData - Common interface for Text, CDATASections and Comments =head1 DESCRIPTION XML::DOM::CharacterData extends L The CharacterData interface extends Node with a set of attributes and methods for accessing character data in the DOM. For clarity this set is defined here rather than on each object that uses these attributes and methods. No DOM objects correspond directly to CharacterData, though Text, Comment and CDATASection do inherit the interface from it. All offsets in this interface start from 0. =head2 METHODS =over 4 =item getData and setData (data) The character data of the node that implements this interface. The DOM implementation may not put arbitrary limits on the amount of data that may be stored in a CharacterData node. However, implementation limits may mean that the entirety of a node's data may not fit into a single DOMString. In such cases, the user may call substringData to retrieve the data in appropriately sized pieces. =item getLength The number of characters that are available through data and the substringData method below. This may have the value zero, i.e., CharacterData nodes may be empty. =item substringData (offset, count) Extracts a range of data from the node. Parameters: I Start offset of substring to extract. I The number of characters to extract. Return Value: The specified substring. If the sum of offset and count exceeds the length, then all characters to the end of the data are returned. =item appendData (str) Appends the string to the end of the character data of the node. 
Upon success, data provides access to the concatenation of data and the DOMString specified. =item insertData (offset, arg) Inserts a string at the specified character offset. Parameters: I The character offset at which to insert. I The DOMString to insert. =item deleteData (offset, count) Removes a range of characters from the node. Upon success, data and length reflect the change. If the sum of offset and count exceeds length then all characters from offset to the end of the data are deleted. Parameters: I The offset from which to remove characters. I The number of characters to delete. =item replaceData (offset, count, arg) Replaces the characters starting at the specified character offset with the specified string. Parameters: I The offset from which to start replacing. I The number of characters to replace. I The DOMString with which the range must be replaced. If the sum of offset and count exceeds length, then all characters to the end of the data are replaced (i.e., the effect is the same as a remove method call with the same range, followed by an append method invocation). =back XML-DOM-1.44/lib/XML/DOM/DOMException.pm0000644000076400007640000000354407045337007017441 0ustar tjmathertjmather###################################################################### package XML::DOM::DOMException; ###################################################################### use Exporter; use overload '""' => \&stringify; use vars qw ( @ISA @EXPORT @ErrorNames ); BEGIN { @ISA = qw( Exporter ); @EXPORT = qw( INDEX_SIZE_ERR DOMSTRING_SIZE_ERR HIERARCHY_REQUEST_ERR WRONG_DOCUMENT_ERR INVALID_CHARACTER_ERR NO_DATA_ALLOWED_ERR NO_MODIFICATION_ALLOWED_ERR NOT_FOUND_ERR NOT_SUPPORTED_ERR INUSE_ATTRIBUTE_ERR ); } sub UNKNOWN_ERR () {0;} # not in the DOM Spec! 
sub INDEX_SIZE_ERR () {1;} sub DOMSTRING_SIZE_ERR () {2;} sub HIERARCHY_REQUEST_ERR () {3;} sub WRONG_DOCUMENT_ERR () {4;} sub INVALID_CHARACTER_ERR () {5;} sub NO_DATA_ALLOWED_ERR () {6;} sub NO_MODIFICATION_ALLOWED_ERR () {7;} sub NOT_FOUND_ERR () {8;} sub NOT_SUPPORTED_ERR () {9;} sub INUSE_ATTRIBUTE_ERR () {10;} @ErrorNames = ( "UNKNOWN_ERR", "INDEX_SIZE_ERR", "DOMSTRING_SIZE_ERR", "HIERARCHY_REQUEST_ERR", "WRONG_DOCUMENT_ERR", "INVALID_CHARACTER_ERR", "NO_DATA_ALLOWED_ERR", "NO_MODIFICATION_ALLOWED_ERR", "NOT_FOUND_ERR", "NOT_SUPPORTED_ERR", "INUSE_ATTRIBUTE_ERR" ); sub new { my ($type, $code, $msg) = @_; my $self = bless {Code => $code}, $type; $self->{Message} = $msg if defined $msg; # print "=> Exception: " . $self->stringify . "\n"; $self; } sub getCode { $_[0]->{Code}; } #------------------------------------------------------------ # Extra method implementations sub getName { $ErrorNames[$_[0]->{Code}]; } sub getMessage { $_[0]->{Message}; } sub stringify { my $self = shift; "XML::DOM::DOMException(Code=" . $self->getCode . ", Name=" . $self->getName . ", Message=" . $self->getMessage . ")"; } 1; # package return code XML-DOM-1.44/lib/XML/DOM/PerlSAX.pm0000644000076400007640000000165407555402770016427 0ustar tjmathertjmatherpackage XML::DOM::PerlSAX; use strict; BEGIN { if ($^W) { warn "XML::DOM::PerlSAX has been renamed to XML::Handler::BuildDOM, please modify your code accordingly."; } } use XML::Handler::BuildDOM; use vars qw{ @ISA }; @ISA = qw{ XML::Handler::BuildDOM }; 1; # package return code __END__ =head1 NAME XML::DOM::PerlSAX - Old name of L =head1 SYNOPSIS See L =head1 DESCRIPTION XML::DOM::PerlSAX was renamed to L to comply with naming conventions for PerlSAX filters/handlers. For backward compatibility, this package will remain in existence (it simply includes XML::Handler::BuildDOM), but it will print a warning when running with I<'perl -w'>. =head1 AUTHOR Enno Derksen is the original author. 
Send bug reports, hints, tips, suggestions to T.J Mather at >. =head1 SEE ALSO L, L XML-DOM-1.44/lib/XML/DOM/NamedNodeMap.pm0000644000076400007640000001227707045337010017430 0ustar tjmathertjmather###################################################################### package XML::DOM::NamedNodeMap; ###################################################################### use strict; use Carp; use XML::DOM::DOMException; use XML::DOM::NodeList; use vars qw( $Special ); # Constant definition: # Note: a real Name should have at least 1 char, so nobody else should use this $Special = ""; sub new { my ($class, %args) = @_; $args{Values} = new XML::DOM::NodeList; # Store all NamedNodeMap properties in element $Special bless { $Special => \%args}, $class; } sub getNamedItem { # Don't return the $Special item! ($_[1] eq $Special) ? undef : $_[0]->{$_[1]}; } sub setNamedItem { my ($self, $node) = @_; my $prop = $self->{$Special}; my $name = $node->getNodeName; if ($XML::DOM::SafeMode) { croak new XML::DOM::DOMException (NO_MODIFICATION_ALLOWED_ERR) if $self->isReadOnly; croak new XML::DOM::DOMException (WRONG_DOCUMENT_ERR) if $node->[XML::DOM::Node::_Doc] != $prop->{Doc}; croak new XML::DOM::DOMException (INUSE_ATTRIBUTE_ERR) if defined ($node->[XML::DOM::Node::_UsedIn]); croak new XML::DOM::DOMException (INVALID_CHARACTER_ERR, "can't add name with NodeName [$name] to NamedNodeMap") if $name eq $Special; } my $values = $prop->{Values}; my $index = -1; my $prev = $self->{$name}; if (defined $prev) { # decouple previous node $prev->decoupleUsedIn; # find index of $prev $index = 0; for my $val (@{$values}) { last if ($val == $prev); $index++; } } $self->{$name} = $node; $node->[XML::DOM::Node::_UsedIn] = $self; if ($index == -1) { push (@{$values}, $node); } else # replace previous node with new node { splice (@{$values}, $index, 1, $node); } $prev; } sub removeNamedItem { my ($self, $name) = @_; # Be careful that user doesn't delete $Special node! 
croak new XML::DOM::DOMException (NOT_FOUND_ERR) if $name eq $Special; my $node = $self->{$name}; croak new XML::DOM::DOMException (NOT_FOUND_ERR) unless defined $node; # The DOM Spec doesn't mention this Exception - I think it's an oversight croak new XML::DOM::DOMException (NO_MODIFICATION_ALLOWED_ERR) if $self->isReadOnly; $node->decoupleUsedIn; delete $self->{$name}; # remove node from Values list my $values = $self->getValues; my $index = 0; for my $val (@{$values}) { if ($val == $node) { splice (@{$values}, $index, 1, ()); last; } $index++; } $node; } # The following 2 are really bogus. DOM should use an iterator instead (Clark) sub item { my ($self, $item) = @_; $self->{$Special}->{Values}->[$item]; } sub getLength { my ($self) = @_; my $vals = $self->{$Special}->{Values}; int (@$vals); } #------------------------------------------------------------ # Extra method implementations sub isReadOnly { return 0 if $XML::DOM::IgnoreReadOnly; my $used = $_[0]->{$Special}->{UsedIn}; defined $used ? $used->isReadOnly : 0; } sub cloneNode { my ($self, $deep) = @_; my $prop = $self->{$Special}; my $map = new XML::DOM::NamedNodeMap (Doc => $prop->{Doc}); # Not copying Parent property on purpose! local $XML::DOM::IgnoreReadOnly = 1; # temporarily... for my $val (@{$prop->{Values}}) { my $key = $val->getNodeName; my $newNode = $val->cloneNode ($deep); $newNode->[XML::DOM::Node::_UsedIn] = $map; $map->{$key} = $newNode; push (@{$map->{$Special}->{Values}}, $newNode); } $map; } sub setOwnerDocument { my ($self, $doc) = @_; my $special = $self->{$Special}; $special->{Doc} = $doc; for my $kid (@{$special->{Values}}) { $kid->setOwnerDocument ($doc); } } sub getChildIndex { my ($self, $attr) = @_; my $i = 0; for my $kid (@{$self->{$Special}->{Values}}) { return $i if $kid == $attr; $i++; } -1; # not found } sub getValues { wantarray ? @{ $_[0]->{$Special}->{Values} } : $_[0]->{$Special}->{Values}; } # Remove circular dependencies. 
The NamedNodeMap and its values should # not be used afterwards. sub dispose { my $self = shift; for my $kid (@{$self->getValues}) { undef $kid->[XML::DOM::Node::_UsedIn]; # was delete $kid->dispose; } delete $self->{$Special}->{Doc}; delete $self->{$Special}->{Parent}; delete $self->{$Special}->{Values}; for my $key (keys %$self) { delete $self->{$key}; } } sub setParentNode { $_[0]->{$Special}->{Parent} = $_[1]; } sub getProperty { $_[0]->{$Special}->{$_[1]}; } #?? remove after debugging sub toString { my ($self) = @_; my $str = "NamedNodeMap["; while (my ($key, $val) = each %$self) { if ($key eq $Special) { $str .= "##Special ("; while (my ($k, $v) = each %$val) { if ($k eq "Values") { $str .= $k . " => ["; for my $a (@$v) { # $str .= $a->getNodeName . "=" . $a . ","; $str .= $a->toString . ","; } $str .= "], "; } else { $str .= $k . " => " . $v . ", "; } } $str .= "), "; } else { $str .= $key . " => " . $val . ", "; } } $str . "]"; } 1; # package return code XML-DOM-1.44/lib/XML/DOM/NamedNodeMap.pod0000644000076400007640000000707207045337010017573 0ustar tjmathertjmather=head1 NAME XML::DOM::NamedNodeMap - A hash table interface for XML::DOM =head1 DESCRIPTION Objects implementing the NamedNodeMap interface are used to represent collections of nodes that can be accessed by name. Note that NamedNodeMap does not inherit from NodeList; NamedNodeMaps are not maintained in any particular order. Objects contained in an object implementing NamedNodeMap may also be accessed by an ordinal index, but this is simply to allow convenient enumeration of the contents of a NamedNodeMap, and does not imply that the DOM specifies an order to these Nodes. Note that in this implementation, the objects added to a NamedNodeMap are kept in order. =head2 METHODS =over 4 =item getNamedItem (name) Retrieves a node specified by name. Return Value: A Node (of any type) with the specified name, or undef if the specified name did not identify any node in the map. 
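Name-based lookup in a NamedNodeMap can be sketched with an element's attribute map. This is a minimal example assuming XML::DOM is installed; the sample element and variable names are hypothetical.

```perl
use strict;
use warnings;
use XML::DOM;

# Sample element, invented for this sketch.
my $doc = XML::DOM::Parser->new->parse('<img src="a.png" alt="logo"/>');

# getAttributes on an Element returns a NamedNodeMap of Attr nodes.
my $attrs = $doc->getDocumentElement->getAttributes;

# A name that is present yields the node stored under it ...
my $src = $attrs->getNamedItem("src");
print "src = ", $src->getValue, "\n";

# ... and a name that is absent yields undef.
my $missing = $attrs->getNamedItem("width");
print "width is ", (defined $missing ? "set" : "not set"), "\n";
```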
=item setNamedItem (arg) Adds a node using its nodeName attribute. As the nodeName attribute is used to derive the name which the node must be stored under, multiple nodes of certain types (those that have a "special" string value) cannot be stored as the names would clash. This is seen as preferable to allowing nodes to be aliased. Parameters: I A node to store in a named node map. The node will later be accessible using the value of the nodeName attribute of the node. If a node with that name is already present in the map, it is replaced by the new one. Return Value: If the new Node replaces an existing node with the same name the previously existing Node is returned, otherwise undef is returned. DOMExceptions: =over 4 =item * WRONG_DOCUMENT_ERR Raised if arg was created from a different document than the one that created the NamedNodeMap. =item * NO_MODIFICATION_ALLOWED_ERR Raised if this NamedNodeMap is readonly. =item * INUSE_ATTRIBUTE_ERR Raised if arg is an Attr that is already an attribute of another Element object. The DOM user must explicitly clone Attr nodes to re-use them in other elements. =back =item removeNamedItem (name) Removes a node specified by name. If the removed node is an Attr with a default value it is immediately replaced. Return Value: The node removed from the map or undef if no node with such a name exists. DOMException: =over 4 =item * NOT_FOUND_ERR Raised if there is no node named name in the map. =back =item item (index) Returns the indexth item in the map. If index is greater than or equal to the number of nodes in the map, this returns undef. Return Value: The node at the indexth position in the NamedNodeMap, or undef if that is not a valid index. =item getLength Returns the number of nodes in the map. The range of valid child node indices is 0 to length-1 inclusive. =back =head2 Additional methods not in the DOM Spec =over 4 =item getValues Returns a NodeList with the nodes contained in the NamedNodeMap. 
The NodeList is "live", in that it reflects changes made to the NamedNodeMap. When this method is called in a list context, it returns a regular perl list containing the values. Note that this list is not "live". E.g. @list = $map->getValues; # returns a perl list $nodelist = $map->getValues; # returns a NodeList (object ref.) for my $val ($map->getValues) # iterate over the values =item getChildIndex (node) Returns the index of the node in the NodeList as returned by getValues, or -1 if the node is not in the NamedNodeMap. =item dispose Removes all circular references in this NamedNodeMap and its descendants so the objects can be claimed for garbage collection. The objects should not be used afterwards. =back XML-DOM-1.44/lib/XML/DOM/Comment.pod0000644000076400007640000000066607045337007016715 0ustar tjmathertjmather=head1 NAME XML::DOM::Comment - An XML comment in XML::DOM =head1 DESCRIPTION XML::DOM::Comment extends L which extends L. This node represents the content of a comment, i.e., all the characters between the starting ''. Note that this is the definition of a comment in XML, and, in practice, HTML, although some HTML tools may implement the full SGML comment structure. XML-DOM-1.44/lib/XML/DOM/Element.pod0000644000076400007640000001152507045337007016700 0ustar tjmathertjmather=head1 NAME XML::DOM::Element - An XML element node in XML::DOM =head1 DESCRIPTION XML::DOM::Element extends L. By far the vast majority of objects (apart from text) that authors encounter when traversing a document are Element nodes. Assume the following XML document: When represented using DOM, the top node is an Element node for "elementExample", which contains two child Element nodes, one for "subelement1" and one for "subelement2". "subelement1" contains no child nodes. 
Elements may have attributes associated with them; since the Element interface inherits from Node, the generic Node interface method getAttributes may be used to retrieve the set of all attributes for an element. There are methods on the Element interface to retrieve either an Attr object by name or an attribute value by name. In XML, where an attribute value may contain entity references, an Attr object should be retrieved to examine the possibly fairly complex sub-tree representing the attribute value. On the other hand, in HTML, where all attributes have simple string values, methods to directly access an attribute value can safely be used as a convenience. =head2 METHODS =over 4 =item getTagName The name of the element. For example, in: ... tagName has the value "elementExample". Note that this is case-preserving in XML, as are all of the operations of the DOM. =item getAttribute (name) Retrieves an attribute value by name. Return Value: The Attr value as a string, or the empty string if that attribute does not have a specified or default value. =item setAttribute (name, value) Adds a new attribute. If an attribute with that name is already present in the element, its value is changed to be that of the value parameter. This value is a simple string, it is not parsed as it is being set. So any markup (such as syntax to be recognized as an entity reference) is treated as literal text, and needs to be appropriately escaped by the implementation when it is written out. In order to assign an attribute value that contains entity references, the user must create an Attr node plus any Text and EntityReference nodes, build the appropriate subtree, and use setAttributeNode to assign it as the value of an attribute. DOMExceptions: =over 4 =item * INVALID_CHARACTER_ERR Raised if the specified name contains an invalid character. =item * NO_MODIFICATION_ALLOWED_ERR Raised if this node is readonly. =back =item removeAttribute (name) Removes an attribute by name. 
If the removed attribute has a default value it is immediately replaced. DOMExceptions: =over 4 =item * NO_MODIFICATION_ALLOWED_ERR Raised if this node is readonly. =back =item getAttributeNode Retrieves an Attr node by name. Return Value: The Attr node with the specified attribute name or undef if there is no such attribute. =item setAttributeNode (attr) Adds a new attribute. If an attribute with that name is already present in the element, it is replaced by the new one. Return Value: If the newAttr attribute replaces an existing attribute with the same name, the previously existing Attr node is returned, otherwise undef is returned. DOMExceptions: =over 4 =item * WRONG_DOCUMENT_ERR Raised if newAttr was created from a different document than the one that created the element. =item * NO_MODIFICATION_ALLOWED_ERR Raised if this node is readonly. =item * INUSE_ATTRIBUTE_ERR Raised if newAttr is already an attribute of another Element object. The DOM user must explicitly clone Attr nodes to re-use them in other elements. =back =item removeAttributeNode (oldAttr) Removes the specified attribute. If the removed Attr has a default value it is immediately replaced. If the Attr already is the default value, nothing happens and nothing is returned. Parameters: I The Attr node to remove from the attribute list. Return Value: The Attr node that was removed. DOMExceptions: =over 4 =item * NO_MODIFICATION_ALLOWED_ERR Raised if this node is readonly. =item * NOT_FOUND_ERR Raised if oldAttr is not an attribute of the element. =back =head2 Additional methods not in the DOM Spec =over 4 =item setTagName (newTagName) Sets the tag name of the Element. Note that this method is not portable between DOM implementations. DOMExceptions: =over 4 =item * INVALID_CHARACTER_ERR Raised if the specified name contains an invalid character. =back =item check ($checker) Uses the specified L to validate the document. NOTE: an XML::Checker must be supplied. 
The checker can be created in different ways, e.g. when parsing a document with XML::DOM::ValParser, or with XML::DOM::Document::createChecker(). See L for more info. =back XML-DOM-1.44/lib/XML/DOM/XMLDecl.pod0000644000076400007640000000125607045337011016532 0ustar tjmathertjmather=head1 NAME XML::DOM::XMLDecl - XML declaration in XML::DOM =head1 DESCRIPTION XML::DOM::XMLDecl extends L, but is not part of the DOM Level 1 specification. It contains the XML declaration, e.g. See also XML::DOM::Document::getXMLDecl. =head2 METHODS =over 4 =item getVersion and setVersion (version) Returns and sets the XML version. At the time of this writing the version should always be "1.0" =item getEncoding and setEncoding (encoding) undef may be specified for the encoding value. =item getStandalone and setStandalone (standalone) undef may be specified for the standalone value. =back XML-DOM-1.44/lib/XML/DOM/CDATASection.pod0000644000076400007640000000235107045337006017444 0ustar tjmathertjmather=head1 NAME XML::DOM::CDATASection - Escaping XML text blocks in XML::DOM =head1 DESCRIPTION XML::DOM::CDATASection extends L which extends L. CDATA sections are used to escape blocks of text containing characters that would otherwise be regarded as markup. The only delimiter that is recognized in a CDATA section is the "]]>" string that ends the CDATA section. CDATA sections can not be nested. The primary purpose is for including material such as XML fragments, without needing to escape all the delimiters. The DOMString attribute of the Text node holds the text that is contained by the CDATA section. Note that this may contain characters that need to be escaped outside of CDATA sections and that, depending on the character encoding ("charset") chosen for serialization, it may be impossible to write out some characters as part of a CDATA section. The CDATASection interface inherits the CharacterData interface through the Text interface. 
Adjacent CDATASections nodes are not merged by use of the Element.normalize() method. B XML::DOM::Parser and XML::DOM::ValParser convert all CDATASections to regular text by default. To preserve CDATASections, set the parser option KeepCDATA to 1. XML-DOM-1.44/lib/XML/DOM/DOMImplementation.pod0000644000076400007640000000123107045337007020625 0ustar tjmathertjmather=head1 NAME XML::DOM::DOMImplementation - Information about XML::DOM implementation =head1 DESCRIPTION The DOMImplementation interface provides a number of methods for performing operations that are independent of any particular instance of the document object model. The DOM Level 1 does not specify a way of creating a document instance, and hence document creation is an operation specific to an implementation. Future Levels of the DOM specification are expected to provide methods for creating documents directly. =head2 METHODS =over 4 =item hasFeature (feature, version) Returns 1 if and only if feature equals "XML" and version equals "1.0". =back XML-DOM-1.44/lib/XML/DOM/Text.pod0000644000076400007640000000353607045337011016231 0ustar tjmathertjmather=head1 NAME XML::DOM::Text - A piece of XML text in XML::DOM =head1 DESCRIPTION XML::DOM::Text extends L, which extends L. The Text interface represents the textual content (termed character data in XML) of an Element or Attr. If there is no markup inside an element's content, the text is contained in a single object implementing the Text interface that is the only child of the element. If there is markup, it is parsed into a list of elements and Text nodes that form the list of children of the element. When a document is first made available via the DOM, there is only one Text node for each block of text. 
Users may create adjacent Text nodes that represent the contents of a given element without any intervening markup, but should be aware that there is no way to represent the separations between these nodes in XML or HTML, so they will not (in general) persist between DOM editing sessions. The normalize() method on Element merges any such adjacent Text objects into a single node for each block of text; this is recommended before employing operations that depend on a particular document structure, such as navigation with XPointers. =head2 METHODS =over 4 =item splitText (offset) Breaks this Text node into two Text nodes at the specified offset, keeping both in the tree as siblings. This node then only contains all the content up to the offset point. And a new Text node, which is inserted as the next sibling of this node, contains all the content at and after the offset point. Parameters: I The offset at which to split, starting from 0. Return Value: The new Text node. DOMExceptions: =over 4 =item * INDEX_SIZE_ERR Raised if the specified offset is negative or greater than the number of characters in data. =item * NO_MODIFICATION_ALLOWED_ERR Raised if this node is readonly. =back =back XML-DOM-1.44/lib/XML/DOM/AttDef.pod0000644000076400007640000000122607045337006016452 0ustar tjmathertjmather=head1 NAME XML::DOM::AttDef - A single XML attribute definition in an ATTLIST in XML::DOM =head1 DESCRIPTION XML::DOM::AttDef extends L, but is not part of the DOM Level 1 specification. Each object of this class represents one attribute definition in an AttlistDecl. =head2 METHODS =over 4 =item getName Returns the attribute name. =item getDefault Returns the default value, or undef. =item isFixed Whether the attribute value is fixed (see #FIXED keyword.) =item isRequired Whether the attribute value is required (see #REQUIRED keyword.) =item isImplied Whether the attribute value is implied (see #IMPLIED keyword.) 
=back XML-DOM-1.44/lib/XML/DOM.pm0000644000076400007640000033012210271306023015123 0ustar tjmathertjmather################################################################################ # # Perl module: XML::DOM # # By Enno Derksen # ################################################################################ # # To do: # # * optimize Attr if it only contains 1 Text node to hold the value # * fix setDocType! # # * BUG: setOwnerDocument - does not process default attr values correctly, # they still point to the old doc. # * change Exception mechanism # * maybe: more checking of sysId etc. # * NoExpand mode (don't know what else is useful) # * various odds and ends: see comments starting with "??" # * normalize(1) could also expand CDataSections and EntityReferences # * parse a DocumentFragment? # * encoding support # ###################################################################### ###################################################################### package XML::DOM; ###################################################################### use strict; use vars qw( $VERSION @ISA @EXPORT $IgnoreReadOnly $SafeMode $TagStyle %DefaultEntities %DecodeDefaultEntity ); use Carp; use XML::RegExp; BEGIN { require XML::Parser; $VERSION = '1.44'; my $needVersion = '2.28'; die "need at least XML::Parser version $needVersion (current=${XML::Parser::VERSION})" unless $XML::Parser::VERSION >= $needVersion; @ISA = qw( Exporter ); # Constants for XML::DOM Node types @EXPORT = qw( UNKNOWN_NODE ELEMENT_NODE ATTRIBUTE_NODE TEXT_NODE CDATA_SECTION_NODE ENTITY_REFERENCE_NODE ENTITY_NODE PROCESSING_INSTRUCTION_NODE COMMENT_NODE DOCUMENT_NODE DOCUMENT_TYPE_NODE DOCUMENT_FRAGMENT_NODE NOTATION_NODE ELEMENT_DECL_NODE ATT_DEF_NODE XML_DECL_NODE ATTLIST_DECL_NODE ); } #---- Constant definitions # Node types sub UNKNOWN_NODE () { 0 } # not in the DOM Spec sub ELEMENT_NODE () { 1 } sub ATTRIBUTE_NODE () { 2 } sub TEXT_NODE () { 3 } sub CDATA_SECTION_NODE () { 4 } sub 
ENTITY_REFERENCE_NODE () { 5 }
sub ENTITY_NODE () { 6 }
sub PROCESSING_INSTRUCTION_NODE () { 7 }
sub COMMENT_NODE () { 8 }
sub DOCUMENT_NODE () { 9 }
sub DOCUMENT_TYPE_NODE () { 10}
sub DOCUMENT_FRAGMENT_NODE () { 11}
sub NOTATION_NODE () { 12}
sub ELEMENT_DECL_NODE () { 13 } # not in the DOM Spec
sub ATT_DEF_NODE () { 14 } # not in the DOM Spec
sub XML_DECL_NODE () { 15 } # not in the DOM Spec
sub ATTLIST_DECL_NODE () { 16 } # not in the DOM Spec

%DefaultEntities =
(
    "quot" => '"',
    "gt"   => ">",
    "lt"   => "<",
    "apos" => "'",
    "amp"  => "&"
);

%DecodeDefaultEntity =
(
    '"' => "&quot;",
    ">" => "&gt;",
    "<" => "&lt;",
    "'" => "&apos;",
    "&" => "&amp;"
);

#
# If you don't want DOM warnings to use 'warn', override this method like this:
#
# { # start block scope
#     local *XML::DOM::warning = \&my_warn;
#     ... your code here ...
# } # end block scope (old XML::DOM::warning takes effect again)
#
sub warning # static
{
    warn @_;
}

#
# This method defines several things in the caller's package, so you can use named constants to
# access the array that holds the member data, e.g. $self->[_Data]. It assumes the caller's package
# defines a class that is implemented as a blessed array reference.
# Note that this is very similar to using 'use fields' and 'use base'.
#
# E.g. if $fields eq "Name Model", $parent eq "XML::DOM::Node" and
# XML::DOM::Node had "A B C" as fields and it was called from package "XML::DOM::ElementDecl",
# then this code would basically do the following:
#
# package XML::DOM::ElementDecl;
#
# sub _Name () { 3 } # Note that parent class had three fields
# sub _Model () { 4 }
#
# # Maps constant names (without '_') to constant (int) value
# %HFIELDS = ( %XML::DOM::Node::HFIELDS, Name => _Name, Model => _Model );
#
# # Define XML::DOM::ElementDecl as a subclass of XML::DOM::Node
# @ISA = qw{ XML::DOM::Node };
#
# # The following function names can be exported into the user's namespace.
# @EXPORT_OK = qw{ _Name _Model };
#
# # The following function names can be exported into the user's namespace
# # with: import XML::DOM::ElementDecl qw( :Fields );
# %EXPORT_TAGS = ( Fields => [ qw{ _Name _Model } ] );
#
sub def_fields # static
{
    my ($fields, $parent) = @_;

    my ($pkg) = caller;

    no strict 'refs';

    my @f = split (/\s+/, $fields);
    my $n = 0;

    my %hfields;
    if (defined $parent)
    {
        my %pf = %{"$parent\::HFIELDS"};
        %hfields = %pf;

        $n = scalar (keys %pf);
        @{"$pkg\::ISA"} = ( $parent );
    }

    my $i = $n;
    for (@f)
    {
        eval "sub $pkg\::_$_ () { $i }";
        $hfields{$_} = $i;
        $i++;
    }
    %{"$pkg\::HFIELDS"} = %hfields;
    @{"$pkg\::EXPORT_OK"} = map { "_$_" } @f;

    ${"$pkg\::EXPORT_TAGS"}{Fields} = [ map { "_$_" } @f ];
}

# sub blesh
# {
#     my $hashref = shift;
#     my $class = shift;
#     no strict 'refs';
#     my $self = bless [\%{"$class\::FIELDS"}], $class;
#     if (defined $hashref)
#     {
#         for (keys %$hashref)
#         {
#             $self->{$_} = $hashref->{$_};
#         }
#     }
#     $self;
# }

# sub blesh2
# {
#     my $hashref = shift;
#     my $class = shift;
#     no strict 'refs';
#     my $self = bless [\%{"$class\::FIELDS"}], $class;
#     if (defined $hashref)
#     {
#         for (keys %$hashref)
#         {
#             eval { $self->{$_} = $hashref->{$_}; };
#             croak "ERROR in field [$_] $@" if $@;
#         }
#     }
#     $self;
#}

#
# CDATA section may not contain "]]>"
#
sub encodeCDATA
{
    my ($str) = shift;
    $str =~ s/]]>/]]&gt;/go;
    $str;
}

#
# PI may not contain "?>"
#
sub encodeProcessingInstruction
{
    my ($str) = shift;
    $str =~ s/\?>/?&gt;/go;
    $str;
}

#
#?? Not sure if this is right - must prevent double minus somehow...
#
sub encodeComment
{
    my ($str) = shift;
    return undef unless defined $str;

    $str =~ s/--/&#45;&#45;/go;
    $str;
}

#
# For debugging
#
sub toHex
{
    my $str = shift;
    my $len = length($str);
    my @a = unpack ("C$len", $str);
    my $s = "";
    for (@a)
    {
        $s .= sprintf ("%02x", $_);
    }
    $s;
}

#
# 2nd parameter $default: list of Default Entity characters that need to be
# converted (e.g. "&<" for conversion to "&amp;" and "&lt;" resp.)
#
sub encodeText
{
    my ($str, $default) = @_;
    return undef unless defined $str;

    if ($] >= 5.006) {
        $str =~ s/([$default])|(]]>)/
            defined ($1) ? $DecodeDefaultEntity{$1} : "]]&gt;" /egs;
    }
    else {
        $str =~ s/([\xC0-\xDF].|[\xE0-\xEF]..|[\xF0-\xFF]...)|([$default])|(]]>)/
            defined($1) ? XmlUtf8Decode ($1) :
            defined ($2) ? $DecodeDefaultEntity{$2} : "]]&gt;" /egs;
    }

#?? could there be references that should not be expanded?
# e.g. should not replace &#nn; &#xAF; and &abc;
#    $str =~ s/&(?!($ReName|#[0-9]+|#x[0-9a-fA-F]+);)/&amp;/go;

    $str;
}

#
# Used by AttDef - default value
#
sub encodeAttrValue
{
    encodeText (shift, '"&<>');
}

#
# Converts an integer (Unicode - ISO/IEC 10646) to a UTF-8 encoded character
# sequence.
# Used when converting e.g. &#123; or &#x3ff; to a string value.
#
# Algorithm borrowed from expat/xmltok.c/XmlUtf8Encode()
#
# not checking for bad characters: < 0, x00-x08, x0B-x0C, x0E-x1F, xFFFE-xFFFF
#
sub XmlUtf8Encode
{
    my $n = shift;
    if ($n < 0x80)
    {
        return chr ($n);
    }
    elsif ($n < 0x800)
    {
        return pack ("CC", (($n >> 6) | 0xc0), (($n & 0x3f) | 0x80));
    }
    elsif ($n < 0x10000)
    {
        return pack ("CCC", (($n >> 12) | 0xe0), ((($n >> 6) & 0x3f) | 0x80),
                     (($n & 0x3f) | 0x80));
    }
    elsif ($n < 0x110000)
    {
        return pack ("CCCC", (($n >> 18) | 0xf0), ((($n >> 12) & 0x3f) | 0x80),
                     ((($n >> 6) & 0x3f) | 0x80), (($n & 0x3f) | 0x80));
    }
    croak "number is too large for Unicode [$n] in &XmlUtf8Encode";
}

#
# Opposite of XmlUtf8Encode plus it adds prefix "&#" or "&#x" and suffix ";"
# The 2nd parameter ($hex) indicates whether the result is hex encoded or not.
# sub XmlUtf8Decode { my ($str, $hex) = @_; my $len = length ($str); my $n; if ($len == 2) { my @n = unpack "C2", $str; $n = (($n[0] & 0x3f) << 6) + ($n[1] & 0x3f); } elsif ($len == 3) { my @n = unpack "C3", $str; $n = (($n[0] & 0x1f) << 12) + (($n[1] & 0x3f) << 6) + ($n[2] & 0x3f); } elsif ($len == 4) { my @n = unpack "C4", $str; $n = (($n[0] & 0x0f) << 18) + (($n[1] & 0x3f) << 12) + (($n[2] & 0x3f) << 6) + ($n[3] & 0x3f); } elsif ($len == 1) # just to be complete... { $n = ord ($str); } else { croak "bad value [$str] for XmlUtf8Decode"; } $hex ? sprintf ("&#x%x;", $n) : "&#$n;"; } $IgnoreReadOnly = 0; $SafeMode = 1; sub getIgnoreReadOnly { $IgnoreReadOnly; } # # The global flag $IgnoreReadOnly is set to the specified value and the old # value of $IgnoreReadOnly is returned. # # To temporarily disable read-only related exceptions (i.e. when parsing # XML or temporarily), do the following: # # my $oldIgnore = XML::DOM::ignoreReadOnly (1); # ... do whatever you want ... # XML::DOM::ignoreReadOnly ($oldIgnore); # sub ignoreReadOnly { my $i = $IgnoreReadOnly; $IgnoreReadOnly = $_[0]; return $i; } # # XML spec seems to break its own rules... (see ENTITY xmlpio) # sub forgiving_isValidName { use bytes; # XML::RegExp expressed in terms encoded UTF8 $_[0] =~ /^$XML::RegExp::Name$/o; } # # Don't allow names starting with xml (either case) # sub picky_isValidName { use bytes; # XML::RegExp expressed in terms encoded UTF8 $_[0] =~ /^$XML::RegExp::Name$/o and $_[0] !~ /^xml/i; } # Be forgiving by default, *isValidName = \&forgiving_isValidName; sub allowReservedNames # static { *isValidName = ($_[0] ? \&forgiving_isValidName : \&picky_isValidName); } sub getAllowReservedNames # static { *isValidName == \&forgiving_isValidName; } # # Always compress empty tags by default # This is used by Element::print. 
# $TagStyle = sub { 0 }; sub setTagCompression { $TagStyle = shift; } ###################################################################### package XML::DOM::PrintToFileHandle; ###################################################################### # # Used by XML::DOM::Node::printToFileHandle # sub new { my($class, $fn) = @_; bless $fn, $class; } sub print { my ($self, $str) = @_; print $self $str; } ###################################################################### package XML::DOM::PrintToString; ###################################################################### use vars qw{ $Singleton }; # # Used by XML::DOM::Node::toString to concatenate strings # sub new { my($class) = @_; my $str = ""; bless \$str, $class; } sub print { my ($self, $str) = @_; $$self .= $str; } sub toString { my $self = shift; $$self; } sub reset { ${$_[0]} = ""; } $Singleton = new XML::DOM::PrintToString; ###################################################################### package XML::DOM::DOMImplementation; ###################################################################### $XML::DOM::DOMImplementation::Singleton = bless \$XML::DOM::DOMImplementation::Singleton, 'XML::DOM::DOMImplementation'; sub hasFeature { my ($self, $feature, $version) = @_; uc($feature) eq 'XML' and ($version eq '1.0' || $version eq ''); } ###################################################################### package XML::XQL::Node; # forward declaration ###################################################################### ###################################################################### package XML::DOM::Node; ###################################################################### use vars qw( @NodeNames @EXPORT @ISA %HFIELDS @EXPORT_OK @EXPORT_TAGS ); BEGIN { use XML::DOM::DOMException; import Carp; require FileHandle; @ISA = qw( Exporter XML::XQL::Node ); # NOTE: SortKey is used in XML::XQL::Node. # UserData is reserved for users (Hang your data here!) 
XML::DOM::def_fields ("C A Doc Parent ReadOnly UsedIn Hidden SortKey UserData"); push (@EXPORT, qw( UNKNOWN_NODE ELEMENT_NODE ATTRIBUTE_NODE TEXT_NODE CDATA_SECTION_NODE ENTITY_REFERENCE_NODE ENTITY_NODE PROCESSING_INSTRUCTION_NODE COMMENT_NODE DOCUMENT_NODE DOCUMENT_TYPE_NODE DOCUMENT_FRAGMENT_NODE NOTATION_NODE ELEMENT_DECL_NODE ATT_DEF_NODE XML_DECL_NODE ATTLIST_DECL_NODE )); } #---- Constant definitions # Node types sub UNKNOWN_NODE () {0;} # not in the DOM Spec sub ELEMENT_NODE () {1;} sub ATTRIBUTE_NODE () {2;} sub TEXT_NODE () {3;} sub CDATA_SECTION_NODE () {4;} sub ENTITY_REFERENCE_NODE () {5;} sub ENTITY_NODE () {6;} sub PROCESSING_INSTRUCTION_NODE () {7;} sub COMMENT_NODE () {8;} sub DOCUMENT_NODE () {9;} sub DOCUMENT_TYPE_NODE () {10;} sub DOCUMENT_FRAGMENT_NODE () {11;} sub NOTATION_NODE () {12;} sub ELEMENT_DECL_NODE () {13;} # not in the DOM Spec sub ATT_DEF_NODE () {14;} # not in the DOM Spec sub XML_DECL_NODE () {15;} # not in the DOM Spec sub ATTLIST_DECL_NODE () {16;} # not in the DOM Spec @NodeNames = ( "UNKNOWN_NODE", # not in the DOM Spec! 
"ELEMENT_NODE", "ATTRIBUTE_NODE", "TEXT_NODE", "CDATA_SECTION_NODE", "ENTITY_REFERENCE_NODE", "ENTITY_NODE", "PROCESSING_INSTRUCTION_NODE", "COMMENT_NODE", "DOCUMENT_NODE", "DOCUMENT_TYPE_NODE", "DOCUMENT_FRAGMENT_NODE", "NOTATION_NODE", "ELEMENT_DECL_NODE", "ATT_DEF_NODE", "XML_DECL_NODE", "ATTLIST_DECL_NODE" ); sub decoupleUsedIn { my $self = shift; undef $self->[_UsedIn]; # was delete } sub getParentNode { $_[0]->[_Parent]; } sub appendChild { my ($self, $node) = @_; # REC 7473 if ($XML::DOM::SafeMode) { croak new XML::DOM::DOMException (NO_MODIFICATION_ALLOWED_ERR, "node is ReadOnly") if $self->isReadOnly; } my $doc = $self->[_Doc]; if ($node->isDocumentFragmentNode) { if ($XML::DOM::SafeMode) { for my $n (@{$node->[_C]}) { croak new XML::DOM::DOMException (WRONG_DOCUMENT_ERR, "nodes belong to different documents") if $doc != $n->[_Doc]; croak new XML::DOM::DOMException (HIERARCHY_REQUEST_ERR, "node is ancestor of parent node") if $n->isAncestor ($self); croak new XML::DOM::DOMException (HIERARCHY_REQUEST_ERR, "bad node type") if $self->rejectChild ($n); } } my @list = @{$node->[_C]}; # don't try to compress this for my $n (@list) { $n->setParentNode ($self); } push @{$self->[_C]}, @list; } else { if ($XML::DOM::SafeMode) { croak new XML::DOM::DOMException (WRONG_DOCUMENT_ERR, "nodes belong to different documents") if $doc != $node->[_Doc]; croak new XML::DOM::DOMException (HIERARCHY_REQUEST_ERR, "node is ancestor of parent node") if $node->isAncestor ($self); croak new XML::DOM::DOMException (HIERARCHY_REQUEST_ERR, "bad node type") if $self->rejectChild ($node); } $node->setParentNode ($self); push @{$self->[_C]}, $node; } $node; } sub getChildNodes { # NOTE: if node can't have children, $self->[_C] is undef. my $kids = $_[0]->[_C]; # Return a list if called in list context. wantarray ? (defined ($kids) ? @{ $kids } : ()) : (defined ($kids) ? 
$kids : $XML::DOM::NodeList::EMPTY); } sub hasChildNodes { my $kids = $_[0]->[_C]; defined ($kids) && @$kids > 0; } # This method is overriden in Document sub getOwnerDocument { $_[0]->[_Doc]; } sub getFirstChild { my $kids = $_[0]->[_C]; defined $kids ? $kids->[0] : undef; } sub getLastChild { my $kids = $_[0]->[_C]; defined $kids ? $kids->[-1] : undef; } sub getPreviousSibling { my $self = shift; my $pa = $self->[_Parent]; return undef unless $pa; my $index = $pa->getChildIndex ($self); return undef unless $index; $pa->getChildAtIndex ($index - 1); } sub getNextSibling { my $self = shift; my $pa = $self->[_Parent]; return undef unless $pa; $pa->getChildAtIndex ($pa->getChildIndex ($self) + 1); } sub insertBefore { my ($self, $node, $refNode) = @_; return $self->appendChild ($node) unless $refNode; # append at the end croak new XML::DOM::DOMException (NO_MODIFICATION_ALLOWED_ERR, "node is ReadOnly") if $self->isReadOnly; my @nodes = ($node); @nodes = @{$node->[_C]} if $node->getNodeType == DOCUMENT_FRAGMENT_NODE; my $doc = $self->[_Doc]; for my $n (@nodes) { croak new XML::DOM::DOMException (WRONG_DOCUMENT_ERR, "nodes belong to different documents") if $doc != $n->[_Doc]; croak new XML::DOM::DOMException (HIERARCHY_REQUEST_ERR, "node is ancestor of parent node") if $n->isAncestor ($self); croak new XML::DOM::DOMException (HIERARCHY_REQUEST_ERR, "bad node type") if $self->rejectChild ($n); } my $index = $self->getChildIndex ($refNode); croak new XML::DOM::DOMException (NOT_FOUND_ERR, "reference node not found") if $index == -1; for my $n (@nodes) { $n->setParentNode ($self); } splice (@{$self->[_C]}, $index, 0, @nodes); $node; } sub replaceChild { my ($self, $node, $refNode) = @_; croak new XML::DOM::DOMException (NO_MODIFICATION_ALLOWED_ERR, "node is ReadOnly") if $self->isReadOnly; my @nodes = ($node); @nodes = @{$node->[_C]} if $node->getNodeType == DOCUMENT_FRAGMENT_NODE; for my $n (@nodes) { croak new XML::DOM::DOMException (WRONG_DOCUMENT_ERR, "nodes belong 
to different documents") if $self->[_Doc] != $n->[_Doc]; croak new XML::DOM::DOMException (HIERARCHY_REQUEST_ERR, "node is ancestor of parent node") if $n->isAncestor ($self); croak new XML::DOM::DOMException (HIERARCHY_REQUEST_ERR, "bad node type") if $self->rejectChild ($n); } my $index = $self->getChildIndex ($refNode); croak new XML::DOM::DOMException (NOT_FOUND_ERR, "reference node not found") if $index == -1; for my $n (@nodes) { $n->setParentNode ($self); } splice (@{$self->[_C]}, $index, 1, @nodes); $refNode->removeChildHoodMemories; $refNode; } sub removeChild { my ($self, $node) = @_; croak new XML::DOM::DOMException (NO_MODIFICATION_ALLOWED_ERR, "node is ReadOnly") if $self->isReadOnly; my $index = $self->getChildIndex ($node); croak new XML::DOM::DOMException (NOT_FOUND_ERR, "reference node not found") if $index == -1; splice (@{$self->[_C]}, $index, 1, ()); $node->removeChildHoodMemories; $node; } # Merge all subsequent Text nodes in this subtree sub normalize { my ($self) = shift; my $prev = undef; # previous Text node return unless defined $self->[_C]; my @nodes = @{$self->[_C]}; my $i = 0; my $n = @nodes; while ($i < $n) { my $node = $self->getChildAtIndex($i); my $type = $node->getNodeType; if (defined $prev) { # It should not merge CDATASections. Dom Spec says: # Adjacent CDATASections nodes are not merged by use # of the Element.normalize() method. if ($type == TEXT_NODE) { $prev->appendData ($node->getData); $self->removeChild ($node); $i--; $n--; } else { $prev = undef; if ($type == ELEMENT_NODE) { $node->normalize; if (defined $node->[_A]) { for my $attr (@{$node->[_A]->getValues}) { $attr->normalize; } } } } } else { if ($type == TEXT_NODE) { $prev = $node; } elsif ($type == ELEMENT_NODE) { $node->normalize; if (defined $node->[_A]) { for my $attr (@{$node->[_A]->getValues}) { $attr->normalize; } } } } $i++; } } # # Return all Element nodes in the subtree that have the specified tagName. # If tagName is "*", all Element nodes are returned. 
# NOTE: the DOM Spec does not specify a 3rd or 4th parameter # sub getElementsByTagName { my ($self, $tagName, $recurse, $list) = @_; $recurse = 1 unless defined $recurse; $list = (wantarray ? [] : new XML::DOM::NodeList) unless defined $list; return unless defined $self->[_C]; # preorder traversal: check parent node first for my $kid (@{$self->[_C]}) { if ($kid->isElementNode) { if ($tagName eq "*" || $tagName eq $kid->getTagName) { push @{$list}, $kid; } $kid->getElementsByTagName ($tagName, $recurse, $list) if $recurse; } } wantarray ? @{ $list } : $list; } sub getNodeValue { undef; } sub setNodeValue { # no-op } # # Redefined by XML::DOM::Element # sub getAttributes { undef; } #------------------------------------------------------------ # Extra method implementations sub setOwnerDocument { my ($self, $doc) = @_; $self->[_Doc] = $doc; return unless defined $self->[_C]; for my $kid (@{$self->[_C]}) { $kid->setOwnerDocument ($doc); } } sub cloneChildren { my ($self, $node, $deep) = @_; return unless $deep; return unless defined $self->[_C]; local $XML::DOM::IgnoreReadOnly = 1; for my $kid (@{$node->[_C]}) { my $newNode = $kid->cloneNode ($deep); push @{$self->[_C]}, $newNode; $newNode->setParentNode ($self); } } # # For internal use only! # sub removeChildHoodMemories { my ($self) = @_; undef $self->[_Parent]; # was delete } # # Remove circular dependencies. The Node and its children should # not be used afterwards. # sub dispose { my $self = shift; $self->removeChildHoodMemories; if (defined $self->[_C]) { $self->[_C]->dispose; undef $self->[_C]; # was delete } undef $self->[_Doc]; # was delete } # # For internal use only! # sub setParentNode { my ($self, $parent) = @_; # REC 7473 my $oldParent = $self->[_Parent]; if (defined $oldParent) { # remove from current parent my $index = $oldParent->getChildIndex ($self); # NOTE: we don't have to check if [_C] is defined, # because were removing a child here! 
splice (@{$oldParent->[_C]}, $index, 1, ()); $self->removeChildHoodMemories; } $self->[_Parent] = $parent; } # # This function can return 3 values: # 1: always readOnly # 0: never readOnly # undef: depends on parent node # # Returns 1 for DocumentType, Notation, Entity, EntityReference, Attlist, # ElementDecl, AttDef. # The first 4 are readOnly according to the DOM Spec, the others are always # children of DocumentType. (Naturally, children of a readOnly node have to be # readOnly as well...) # These nodes are always readOnly regardless of who their ancestors are. # Other nodes, e.g. Comment, are readOnly only if their parent is readOnly, # which basically means that one of its ancestors has to be one of the # aforementioned node types. # Document and DocumentFragment return 0 for obvious reasons. # Attr, Element, CDATASection, Text return 0. The DOM spec says that they can # be children of an Entity, but I don't think that that's possible # with the current XML::Parser. # Attr uses a {ReadOnly} property, which is only set if it's part of a AttDef. # Always returns 0 if ignoreReadOnly is set. # sub isReadOnly { # default implementation for Nodes that are always readOnly ! $XML::DOM::IgnoreReadOnly; } sub rejectChild { 1; } sub getNodeTypeName { $NodeNames[$_[0]->getNodeType]; } sub getChildIndex { my ($self, $node) = @_; my $i = 0; return -1 unless defined $self->[_C]; for my $kid (@{$self->[_C]}) { return $i if $kid == $node; $i++; } -1; } sub getChildAtIndex { my $kids = $_[0]->[_C]; defined ($kids) ? $kids->[$_[1]] : undef; } sub isAncestor { my ($self, $node) = @_; do { return 1 if $self == $node; $node = $node->[_Parent]; } while (defined $node); 0; } # # Added for optimization. Overriden in XML::DOM::Text # sub isTextNode { 0; } # # Added for optimization. Overriden in XML::DOM::DocumentFragment # sub isDocumentFragmentNode { 0; } # # Added for optimization. 
Overriden in XML::DOM::Element # sub isElementNode { 0; } # # Add a Text node with the specified value or append the text to the # previous Node if it is a Text node. # sub addText { # REC 9456 (if it was called) my ($self, $str) = @_; my $node = ${$self->[_C]}[-1]; # $self->getLastChild if (defined ($node) && $node->isTextNode) { # REC 5475 (if it was called) $node->appendData ($str); } else { $node = $self->[_Doc]->createTextNode ($str); $self->appendChild ($node); } $node; } # # Add a CDATASection node with the specified value or append the text to the # previous Node if it is a CDATASection node. # sub addCDATA { my ($self, $str) = @_; my $node = ${$self->[_C]}[-1]; # $self->getLastChild if (defined ($node) && $node->getNodeType == CDATA_SECTION_NODE) { $node->appendData ($str); } else { $node = $self->[_Doc]->createCDATASection ($str); $self->appendChild ($node); } } sub removeChildNodes { my $self = shift; my $cref = $self->[_C]; return unless defined $cref; my $kid; while ($kid = pop @{$cref}) { undef $kid->[_Parent]; # was delete } } sub toString { my $self = shift; my $pr = $XML::DOM::PrintToString::Singleton; $pr->reset; $self->print ($pr); $pr->toString; } sub to_sax { my $self = shift; unshift @_, 'Handler' if (@_ == 1); my %h = @_; my $doch = exists ($h{DocumentHandler}) ? $h{DocumentHandler} : $h{Handler}; my $dtdh = exists ($h{DTDHandler}) ? $h{DTDHandler} : $h{Handler}; my $enth = exists ($h{EntityResolver}) ? 
$h{EntityResolver} : $h{Handler}; $self->_to_sax ($doch, $dtdh, $enth); } sub printToFile { my ($self, $fileName) = @_; my $fh = new FileHandle ($fileName, "w") || croak "printToFile - can't open output file $fileName"; $self->print ($fh); $fh->close; } # # Use print to print to a FileHandle object (see printToFile code) # sub printToFileHandle { my ($self, $FH) = @_; my $pr = new XML::DOM::PrintToFileHandle ($FH); $self->print ($pr); } # # Used by AttDef::setDefault to convert unexpanded default attribute value # sub expandEntityRefs { my ($self, $str) = @_; my $doctype = $self->[_Doc]->getDoctype; use bytes; # XML::RegExp expressed in terms encoded UTF8 $str =~ s/&($XML::RegExp::Name|(#([0-9]+)|#x([0-9a-fA-F]+)));/ defined($2) ? XML::DOM::XmlUtf8Encode ($3 || hex ($4)) : expandEntityRef ($1, $doctype)/ego; $str; } sub expandEntityRef { my ($entity, $doctype) = @_; my $expanded = $XML::DOM::DefaultEntities{$entity}; return $expanded if defined $expanded; $expanded = $doctype->getEntity ($entity); return $expanded->getValue if (defined $expanded); #?? is this an error? 
croak "Could not expand entity reference of [$entity]\n"; # return "&$entity;"; # entity not found } sub isHidden { $_[0]->[_Hidden]; } ###################################################################### package XML::DOM::Attr; ###################################################################### use vars qw{ @ISA @EXPORT_OK %EXPORT_TAGS %HFIELDS }; BEGIN { import XML::DOM::Node qw( :DEFAULT :Fields ); XML::DOM::def_fields ("Name Specified", "XML::DOM::Node"); } use XML::DOM::DOMException; use Carp; sub new { my ($class, $doc, $name, $value, $specified) = @_; if ($XML::DOM::SafeMode) { croak new XML::DOM::DOMException (INVALID_CHARACTER_ERR, "bad Attr name [$name]") unless XML::DOM::isValidName ($name); } my $self = bless [], $class; $self->[_Doc] = $doc; $self->[_C] = new XML::DOM::NodeList; $self->[_Name] = $name; if (defined $value) { $self->setValue ($value); $self->[_Specified] = (defined $specified) ? $specified : 1; } else { $self->[_Specified] = 0; } $self; } sub getNodeType { ATTRIBUTE_NODE; } sub isSpecified { $_[0]->[_Specified]; } sub getName { $_[0]->[_Name]; } sub getValue { my $self = shift; my $value = ""; for my $kid (@{$self->[_C]}) { $value .= $kid->getData if defined $kid->getData; } $value; } sub setValue { my ($self, $value) = @_; # REC 1147 $self->removeChildNodes; $self->appendChild ($self->[_Doc]->createTextNode ($value)); $self->[_Specified] = 1; } sub getNodeName { $_[0]->getName; } sub getNodeValue { $_[0]->getValue; } sub setNodeValue { $_[0]->setValue ($_[1]); } sub cloneNode { my ($self) = @_; # parameter deep is ignored my $node = $self->[_Doc]->createAttribute ($self->getName); $node->[_Specified] = $self->[_Specified]; $node->[_ReadOnly] = 1 if $self->[_ReadOnly]; $node->cloneChildren ($self, 1); $node; } #------------------------------------------------------------ # Extra method implementations # sub isReadOnly { # ReadOnly property is set if it's part of a AttDef ! 
$XML::DOM::IgnoreReadOnly && defined ($_[0]->[_ReadOnly]); } sub print { my ($self, $FILE) = @_; my $name = $self->[_Name]; $FILE->print ("$name=\""); for my $kid (@{$self->[_C]}) { if ($kid->getNodeType == TEXT_NODE) { $FILE->print (XML::DOM::encodeAttrValue ($kid->getData)); } else # ENTITY_REFERENCE_NODE { $kid->print ($FILE); } } $FILE->print ("\""); } sub rejectChild { my $t = $_[1]->getNodeType; $t != TEXT_NODE && $t != ENTITY_REFERENCE_NODE; } ###################################################################### package XML::DOM::ProcessingInstruction; ###################################################################### use vars qw{ @ISA @EXPORT_OK %EXPORT_TAGS %HFIELDS }; BEGIN { import XML::DOM::Node qw( :DEFAULT :Fields ); XML::DOM::def_fields ("Target Data", "XML::DOM::Node"); } use XML::DOM::DOMException; use Carp; sub new { my ($class, $doc, $target, $data, $hidden) = @_; croak new XML::DOM::DOMException (INVALID_CHARACTER_ERR, "bad ProcessingInstruction Target [$target]") unless (XML::DOM::isValidName ($target) && $target !~ /^xml$/io); my $self = bless [], $class; $self->[_Doc] = $doc; $self->[_Target] = $target; $self->[_Data] = $data; $self->[_Hidden] = $hidden; $self; } sub getNodeType { PROCESSING_INSTRUCTION_NODE; } sub getTarget { $_[0]->[_Target]; } sub getData { $_[0]->[_Data]; } sub setData { my ($self, $data) = @_; croak new XML::DOM::DOMException (NO_MODIFICATION_ALLOWED_ERR, "node is ReadOnly") if $self->isReadOnly; $self->[_Data] = $data; } sub getNodeName { $_[0]->[_Target]; } # # Same as getData # sub getNodeValue { $_[0]->[_Data]; } sub setNodeValue { $_[0]->setData ($_[1]); } sub cloneNode { my $self = shift; $self->[_Doc]->createProcessingInstruction ($self->getTarget, $self->getData, $self->isHidden); } #------------------------------------------------------------ # Extra method implementations sub isReadOnly { return 0 if $XML::DOM::IgnoreReadOnly; my $pa = $_[0]->[_Parent]; defined ($pa) ? 
$pa->isReadOnly : 0;
}

sub print
{
    my ($self, $FILE) = @_;

    $FILE->print ("<?");
    $FILE->print ($self->[_Target]);
    $FILE->print (" ");
    $FILE->print (XML::DOM::encodeProcessingInstruction ($self->[_Data]));
    $FILE->print ("?>");
}

sub _to_sax
{
    my ($self, $doch) = @_;
    $doch->processing_instruction({Target => $self->getTarget,
                                   Data => $self->getData});
}

######################################################################
package XML::DOM::Notation;
######################################################################
use vars qw{ @ISA @EXPORT_OK %EXPORT_TAGS %HFIELDS };

BEGIN
{
    import XML::DOM::Node qw( :DEFAULT :Fields );
    XML::DOM::def_fields ("Name Base SysId PubId", "XML::DOM::Node");
}

use XML::DOM::DOMException;
use Carp;

sub new
{
    my ($class, $doc, $name, $base, $sysId, $pubId, $hidden) = @_;

    croak new XML::DOM::DOMException (INVALID_CHARACTER_ERR,
                                      "bad Notation Name [$name]")
        unless XML::DOM::isValidName ($name);

    my $self = bless [], $class;

    $self->[_Doc] = $doc;
    $self->[_Name] = $name;
    $self->[_Base] = $base;
    $self->[_SysId] = $sysId;
    $self->[_PubId] = $pubId;
    $self->[_Hidden] = $hidden;
    $self;
}

sub getNodeType
{
    NOTATION_NODE;
}

sub getPubId
{
    $_[0]->[_PubId];
}

sub setPubId
{
    $_[0]->[_PubId] = $_[1];
}

sub getSysId
{
    $_[0]->[_SysId];
}

sub setSysId
{
    $_[0]->[_SysId] = $_[1];
}

sub getName
{
    $_[0]->[_Name];
}

sub setName
{
    $_[0]->[_Name] = $_[1];
}

sub getBase
{
    $_[0]->[_Base];
}

sub getNodeName
{
    $_[0]->[_Name];
}

sub print
{
    my ($self, $FILE) = @_;

    my $name = $self->[_Name];
    my $sysId = $self->[_SysId];
    my $pubId = $self->[_PubId];

    $FILE->print ("<!NOTATION $name");

    if (defined $pubId)
    {
        $FILE->print (" PUBLIC \"$pubId\"");
    }
    if (defined $sysId)
    {
        $FILE->print (" SYSTEM \"$sysId\"");
    }
    $FILE->print (">");
}

sub cloneNode
{
    my ($self) = @_;
    $self->[_Doc]->createNotation ($self->[_Name], $self->[_Base],
                                   $self->[_SysId], $self->[_PubId],
                                   $self->[_Hidden]);
}

sub to_expat
{
    my ($self, $iter) = @_;
    $iter->Notation ($self->getName, $self->getBase,
                     $self->getSysId, $self->getPubId);
}

sub _to_sax
{
    my ($self, $doch,
$dtdh, $enth) = @_; $dtdh->notation_decl ( { Name => $self->getName, Base => $self->getBase, SystemId => $self->getSysId, PublicId => $self->getPubId }); } ###################################################################### package XML::DOM::Entity; ###################################################################### use vars qw{ @ISA @EXPORT_OK %EXPORT_TAGS %HFIELDS }; BEGIN { import XML::DOM::Node qw( :DEFAULT :Fields ); XML::DOM::def_fields ("NotationName Parameter Value Ndata SysId PubId", "XML::DOM::Node"); } use XML::DOM::DOMException; use Carp; sub new { my ($class, $doc, $notationName, $value, $sysId, $pubId, $ndata, $isParam, $hidden) = @_; croak new XML::DOM::DOMException (INVALID_CHARACTER_ERR, "bad Entity Name [$notationName]") unless XML::DOM::isValidName ($notationName); my $self = bless [], $class; $self->[_Doc] = $doc; $self->[_NotationName] = $notationName; $self->[_Parameter] = $isParam; $self->[_Value] = $value; $self->[_Ndata] = $ndata; $self->[_SysId] = $sysId; $self->[_PubId] = $pubId; $self->[_Hidden] = $hidden; $self; #?? maybe Value should be a Text node } sub getNodeType { ENTITY_NODE; } sub getPubId { $_[0]->[_PubId]; } sub getSysId { $_[0]->[_SysId]; } # Dom Spec says: # For unparsed entities, the name of the notation for the # entity. For parsed entities, this is null. #?? do we have unparsed entities? sub getNotationName { $_[0]->[_NotationName]; } sub getNodeName { $_[0]->[_NotationName]; } sub cloneNode { my $self = shift; $self->[_Doc]->createEntity ($self->[_NotationName], $self->[_Value], $self->[_SysId], $self->[_PubId], $self->[_Ndata], $self->[_Parameter], $self->[_Hidden]); } sub rejectChild { return 1; #?? 
# if value is split over subnodes, recode this section
# also add: C => new XML::DOM::NodeList,

    my $t = $_[1];

    return $t == TEXT_NODE
        || $t == ENTITY_REFERENCE_NODE
        || $t == PROCESSING_INSTRUCTION_NODE
        || $t == COMMENT_NODE
        || $t == CDATA_SECTION_NODE
        || $t == ELEMENT_NODE;
}

sub getValue
{
    $_[0]->[_Value];
}

sub isParameterEntity
{
    $_[0]->[_Parameter];
}

sub getNdata
{
    $_[0]->[_Ndata];
}

sub print
{
    my ($self, $FILE) = @_;

    my $name = $self->[_NotationName];

    my $par = $self->isParameterEntity ? "% " : "";

    $FILE->print ("<!ENTITY $par$name");

    my $value = $self->[_Value];
    my $sysId = $self->[_SysId];
    my $pubId = $self->[_PubId];
    my $ndata = $self->[_Ndata];

    if (defined $value)
    {
#?? Not sure what to do if it contains both single and double quote
        $value = ($value =~ /\"/) ? "'$value'" : "\"$value\"";
        $FILE->print (" $value");
    }
    if (defined $pubId)
    {
        $FILE->print (" PUBLIC \"$pubId\"");
    }
    elsif (defined $sysId)
    {
        $FILE->print (" SYSTEM");
    }

    if (defined $sysId)
    {
        $FILE->print (" \"$sysId\"");
    }
    $FILE->print (" NDATA $ndata") if defined $ndata;
    $FILE->print (">");
}

sub to_expat
{
    my ($self, $iter) = @_;
    my $name = ($self->isParameterEntity ? '%' : "") . $self->getNotationName;
    $iter->Entity ($name,
                   $self->getValue, $self->getSysId, $self->getPubId,
                   $self->getNdata);
}

sub _to_sax
{
    my ($self, $doch, $dtdh, $enth) = @_;
    my $name = ($self->isParameterEntity ? '%' : "") .
$self->getNotationName; $dtdh->entity_decl ( { Name => $name, Value => $self->getValue, SystemId => $self->getSysId, PublicId => $self->getPubId, Notation => $self->getNdata } ); } ###################################################################### package XML::DOM::EntityReference; ###################################################################### use vars qw{ @ISA @EXPORT_OK %EXPORT_TAGS %HFIELDS }; BEGIN { import XML::DOM::Node qw( :DEFAULT :Fields ); XML::DOM::def_fields ("EntityName Parameter NoExpand", "XML::DOM::Node"); } use XML::DOM::DOMException; use Carp; sub new { my ($class, $doc, $name, $parameter, $noExpand) = @_; croak new XML::DOM::DOMException (INVALID_CHARACTER_ERR, "bad Entity Name [$name] in EntityReference") unless XML::DOM::isValidName ($name); my $self = bless [], $class; $self->[_Doc] = $doc; $self->[_EntityName] = $name; $self->[_Parameter] = ($parameter || 0); $self->[_NoExpand] = ($noExpand || 0); $self; } sub getNodeType { ENTITY_REFERENCE_NODE; } sub getNodeName { $_[0]->[_EntityName]; } #------------------------------------------------------------ # Extra method implementations sub getEntityName { $_[0]->[_EntityName]; } sub isParameterEntity { $_[0]->[_Parameter]; } sub getData { my $self = shift; my $name = $self->[_EntityName]; my $parameter = $self->[_Parameter]; my $data; if ($self->[_NoExpand]) { $data = "&$name;" if $name; } else { $data = $self->[_Doc]->expandEntity ($name, $parameter); } unless (defined $data) { #?? this is probably an error, but perhaps requires check to NoExpand # will fix it? my $pc = $parameter ? "%" : "&"; $data = "$pc$name;"; } $data; } sub print { my ($self, $FILE) = @_; my $name = $self->[_EntityName]; #?? or do we expand the entities? my $pc = $self->[_Parameter] ? "%" : "&"; $FILE->print ("$pc$name;"); } # Dom Spec says: # [...] but if such an Entity exists, then # the child list of the EntityReference node is the same as that of the # Entity node. 
#
# The resolution of the children of the EntityReference (the replacement
# value of the referenced Entity) may be lazily evaluated; actions by the
# user (such as calling the childNodes method on the EntityReference
# node) are assumed to trigger the evaluation.
sub getChildNodes
{
    my $self = shift;
    my $entity = $self->[_Doc]->getEntity ($self->[_EntityName]);
    defined ($entity) ? $entity->getChildNodes : new XML::DOM::NodeList;
}

sub cloneNode
{
    my $self = shift;
    $self->[_Doc]->createEntityReference ($self->[_EntityName],
                                         $self->[_Parameter],
                                         $self->[_NoExpand],
                                        );
}

sub to_expat
{
    my ($self, $iter) = @_;
    $iter->EntityRef ($self->getEntityName, $self->isParameterEntity);
}

sub _to_sax
{
    my ($self, $doch, $dtdh, $enth) = @_;
    my @par = $self->isParameterEntity ? (Parameter => 1) : ();
#?? not supported by PerlSAX: $self->isParameterEntity

    $doch->entity_reference ( { Name => $self->getEntityName, @par } );
}

# NOTE: an EntityReference can't really have children, so rejectChild
# is not reimplemented (i.e. it always returns 0.)
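
=pod

The fallback order used by EntityReference::getData can be sketched in
isolation. This is a minimal illustration, not part of XML::DOM: the
C<%entity_value> table and the C<reference_text> helper are hypothetical
stand-ins for the document's entity map and for the method itself, and the
sketch skips the check that the entity found is of the requested
(parameter or general) kind.

```perl
use strict;
use warnings;

# Hypothetical entity table standing in for the DocumentType's entity map.
my %entity_value = (copy => "(c) 2005");

# Mirrors the branch order of getData: NoExpand first, then a lookup,
# then the textual fallback when no value was found.
sub reference_text
{
    my ($name, $is_parameter, $no_expand) = @_;

    my $data;
    if ($no_expand)
    {
        $data = "&$name;" if $name;
    }
    else
    {
        $data = $entity_value{$name};
    }

    unless (defined $data)
    {
        my $pc = $is_parameter ? "%" : "&";
        $data = "$pc$name;";
    }
    $data;
}

# Expanded value, NoExpand reference, unknown parameter entity.
for my $case (["copy", 0, 0], ["copy", 0, 1], ["missing", 1, 0])
{
    my $text = reference_text (@$case);
    print "$text\n";
}
```

An unknown entity is therefore written back as a reference rather than
raising an error, which matches the "#??" remark in getData above.

=cut
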
###################################################################### package XML::DOM::AttDef; ###################################################################### use vars qw{ @ISA @EXPORT_OK %EXPORT_TAGS %HFIELDS }; BEGIN { import XML::DOM::Node qw( :DEFAULT :Fields ); XML::DOM::def_fields ("Name Type Fixed Default Required Implied Quote", "XML::DOM::Node"); } use XML::DOM::DOMException; use Carp; #------------------------------------------------------------ # Extra method implementations # AttDef is not part of DOM Spec sub new { my ($class, $doc, $name, $attrType, $default, $fixed, $hidden) = @_; croak new XML::DOM::DOMException (INVALID_CHARACTER_ERR, "bad Attr name in AttDef [$name]") unless XML::DOM::isValidName ($name); my $self = bless [], $class; $self->[_Doc] = $doc; $self->[_Name] = $name; $self->[_Type] = $attrType; if (defined $default) { if ($default eq "#REQUIRED") { $self->[_Required] = 1; } elsif ($default eq "#IMPLIED") { $self->[_Implied] = 1; } else { # strip off quotes - see Attlist handler in XML::Parser # this regexp doesn't work with 5.8.0 unicode # $default =~ m#^(["'])(.*)['"]$#; # $self->[_Quote] = $1; # keep track of the quote character # $self->[_Default] = $self->setDefault ($2); # workaround for 5.8.0 unicode $default =~ s!^(["'])!!; $self->[_Quote] = $1; $default =~ s!(["'])$!!; $self->[_Default] = $self->setDefault ($default); #?? should default value be decoded - what if it contains e.g. "&" } } $self->[_Fixed] = $fixed if defined $fixed; $self->[_Hidden] = $hidden if defined $hidden; $self; } sub getNodeType { ATT_DEF_NODE; } sub getName { $_[0]->[_Name]; } # So it can be added to a NamedNodeMap sub getNodeName { $_[0]->[_Name]; } sub getType { $_[0]->[_Type]; } sub setType { $_[0]->[_Type] = $_[1]; } sub getDefault { $_[0]->[_Default]; } sub setDefault { my ($self, $value) = @_; # specified=0, it's the default ! my $attr = $self->[_Doc]->createAttribute ($self->[_Name], undef, 0); $attr->[_ReadOnly] = 1; #?? 
this should be split over Text and EntityReference nodes, just like other # Attr nodes - just expand the text for now $value = $self->expandEntityRefs ($value); $attr->addText ($value); #?? reimplement in NoExpand mode! $attr; } sub isFixed { $_[0]->[_Fixed] || 0; } sub isRequired { $_[0]->[_Required] || 0; } sub isImplied { $_[0]->[_Implied] || 0; } sub print { my ($self, $FILE) = @_; my $name = $self->[_Name]; my $type = $self->[_Type]; my $fixed = $self->[_Fixed]; my $default = $self->[_Default]; # $FILE->print ("$name $type"); # replaced line above with the two lines below # seems to be a bug in perl 5.6.0 that causes # test 3 of dom_jp_attr.t to fail? $FILE->print ($name); $FILE->print (" $type"); $FILE->print (" #FIXED") if defined $fixed; if ($self->[_Required]) { $FILE->print (" #REQUIRED"); } elsif ($self->[_Implied]) { $FILE->print (" #IMPLIED"); } elsif (defined ($default)) { my $quote = $self->[_Quote]; $FILE->print (" $quote"); for my $kid (@{$default->[_C]}) { $kid->print ($FILE); } $FILE->print ($quote); } } sub getDefaultString { my $self = shift; my $default; if ($self->[_Required]) { return "#REQUIRED"; } elsif ($self->[_Implied]) { return "#IMPLIED"; } elsif (defined ($default = $self->[_Default])) { my $quote = $self->[_Quote]; $default = $default->toString; return "$quote$default$quote"; } undef; } sub cloneNode { my $self = shift; my $node = new XML::DOM::AttDef ($self->[_Doc], $self->[_Name], $self->[_Type], undef, $self->[_Fixed]); $node->[_Required] = 1 if $self->[_Required]; $node->[_Implied] = 1 if $self->[_Implied]; $node->[_Fixed] = $self->[_Fixed] if defined $self->[_Fixed]; $node->[_Hidden] = $self->[_Hidden] if defined $self->[_Hidden]; if (defined $self->[_Default]) { $node->[_Default] = $self->[_Default]->cloneNode(1); } $node->[_Quote] = $self->[_Quote]; $node; } sub setOwnerDocument { my ($self, $doc) = @_; $self->SUPER::setOwnerDocument ($doc); if (defined $self->[_Default]) { $self->[_Default]->setOwnerDocument ($doc); } } 
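
=pod

How AttDef takes a raw default apart and how getDefaultString puts it back
together can be sketched without the DOM classes. This is an illustrative
approximation only: C<split_default> and C<default_string> are hypothetical
helpers, a plain hash replaces the AttDef node, and the quote is assumed to
fall back to an empty string when the raw value carries no quotes.

```perl
use strict;
use warnings;

# Split a raw attribute default into its parts, as AttDef::new does.
sub split_default
{
    my ($raw) = @_;
    return { required => 1 } if $raw eq "#REQUIRED";
    return { implied  => 1 } if $raw eq "#IMPLIED";

    # Strip the quotes in two steps, like the 5.8.0 workaround above,
    # and remember the opening quote character.
    my $quote = "";
    $quote = $1 if $raw =~ s!^(["'])!!;
    $raw =~ s!(["'])$!!;
    return { quote => $quote, value => $raw };
}

# Rebuild the declaration text, as getDefaultString does.
sub default_string
{
    my ($d) = @_;
    return "#REQUIRED" if $d->{required};
    return "#IMPLIED"  if $d->{implied};
    "$d->{quote}$d->{value}$d->{quote}";
}

for my $raw ("'draft'", "#IMPLIED")
{
    my $text = default_string (split_default ($raw));
    print "$text\n";
}
```

Keeping the original quote character lets a default that itself contains
a double quote survive a parse and print round trip.

=cut
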
###################################################################### package XML::DOM::AttlistDecl; ###################################################################### use vars qw{ @ISA @EXPORT_OK %EXPORT_TAGS %HFIELDS }; BEGIN { import XML::DOM::Node qw( :DEFAULT :Fields ); import XML::DOM::AttDef qw{ :Fields }; XML::DOM::def_fields ("ElementName", "XML::DOM::Node"); } use XML::DOM::DOMException; use Carp; #------------------------------------------------------------ # Extra method implementations # AttlistDecl is not part of the DOM Spec sub new { my ($class, $doc, $name) = @_; croak new XML::DOM::DOMException (INVALID_CHARACTER_ERR, "bad Element TagName [$name] in AttlistDecl") unless XML::DOM::isValidName ($name); my $self = bless [], $class; $self->[_Doc] = $doc; $self->[_C] = new XML::DOM::NodeList; $self->[_ReadOnly] = 1; $self->[_ElementName] = $name; $self->[_A] = new XML::DOM::NamedNodeMap (Doc => $doc, ReadOnly => 1, Parent => $self); $self; } sub getNodeType { ATTLIST_DECL_NODE; } sub getName { $_[0]->[_ElementName]; } sub getNodeName { $_[0]->[_ElementName]; } sub getAttDef { my ($self, $attrName) = @_; $self->[_A]->getNamedItem ($attrName); } sub addAttDef { my ($self, $attrName, $type, $default, $fixed, $hidden) = @_; my $node = $self->getAttDef ($attrName); if (defined $node) { # data will be ignored if already defined my $elemName = $self->getName; XML::DOM::warning ("multiple definitions of attribute $attrName for element $elemName, only first one is recognized"); } else { $node = new XML::DOM::AttDef ($self->[_Doc], $attrName, $type, $default, $fixed, $hidden); $self->[_A]->setNamedItem ($node); } $node; } sub getDefaultAttrValue { my ($self, $attr) = @_; my $attrNode = $self->getAttDef ($attr); (defined $attrNode) ? 
        $attrNode->getDefault : undef;
}

sub cloneNode
{
    my ($self, $deep) = @_;
    my $node = $self->[_Doc]->createAttlistDecl ($self->[_ElementName]);

    $node->[_A] = $self->[_A]->cloneNode ($deep);
    $node;
}

sub setOwnerDocument
{
    my ($self, $doc) = @_;
    $self->SUPER::setOwnerDocument ($doc);

    $self->[_A]->setOwnerDocument ($doc);
}

sub print
{
    my ($self, $FILE) = @_;

    my $name = $self->getName;
    my @attlist = @{$self->[_A]->getValues};

    my $hidden = 1;
    for my $att (@attlist)
    {
        unless ($att->[_Hidden])
        {
            $hidden = 0;
            last;
        }
    }

    unless ($hidden)
    {
        $FILE->print ("<!ATTLIST $name");

        if (@attlist == 1)
        {
            $FILE->print (" ");
            $attlist[0]->print ($FILE);
        }
        else
        {
            for my $attr (@attlist)
            {
                next if $attr->[_Hidden];

                $FILE->print ("\x0A ");
                $attr->print ($FILE);
            }
        }
        $FILE->print (">");
    }
}

sub to_expat
{
    my ($self, $iter) = @_;
    my $tag = $self->getName;
    for my $a ($self->[_A]->getValues)
    {
        my $default = $a->isImplied ? '#IMPLIED' :
            ($a->isRequired ? '#REQUIRED' :
             ($a->[_Quote] . $a->getDefault->getValue . $a->[_Quote]));

        $iter->Attlist ($tag, $a->getName, $a->getType, $default, $a->isFixed);
    }
}

sub _to_sax
{
    my ($self, $doch, $dtdh, $enth) = @_;
    my $tag = $self->getName;
    for my $a ($self->[_A]->getValues)
    {
        my $default = $a->isImplied ? '#IMPLIED' :
            ($a->isRequired ? '#REQUIRED' :
             ($a->[_Quote] . $a->getDefault->getValue .
              $a->[_Quote]));

        $dtdh->attlist_decl ({ ElementName => $tag,
                               AttributeName => $a->getName,
                               Type => $a->[_Type],
                               Default => $default,
                               Fixed => $a->isFixed });
    }
}

######################################################################
package XML::DOM::ElementDecl;
######################################################################
use vars qw{ @ISA @EXPORT_OK %EXPORT_TAGS %HFIELDS };

BEGIN
{
    import XML::DOM::Node qw( :DEFAULT :Fields );
    XML::DOM::def_fields ("Name Model", "XML::DOM::Node");
}

use XML::DOM::DOMException;
use Carp;

#------------------------------------------------------------
# Extra method implementations

# ElementDecl is not part of the DOM Spec
sub new
{
    my ($class, $doc, $name, $model, $hidden) = @_;

    croak new XML::DOM::DOMException (INVALID_CHARACTER_ERR,
                                      "bad Element TagName [$name] in ElementDecl")
        unless XML::DOM::isValidName ($name);

    my $self = bless [], $class;

    $self->[_Doc] = $doc;
    $self->[_Name] = $name;
    $self->[_ReadOnly] = 1;
    $self->[_Model] = $model;
    $self->[_Hidden] = $hidden;
    $self;
}

sub getNodeType
{
    ELEMENT_DECL_NODE;
}

sub getName
{
    $_[0]->[_Name];
}

sub getNodeName
{
    $_[0]->[_Name];
}

sub getModel
{
    $_[0]->[_Model];
}

sub setModel
{
    my ($self, $model) = @_;

    $self->[_Model] = $model;
}

sub print
{
    my ($self, $FILE) = @_;

    my $name = $self->[_Name];
    my $model = $self->[_Model];

    $FILE->print ("<!ELEMENT $name $model>")
        unless $self->[_Hidden];
}

sub cloneNode
{
    my $self = shift;
    $self->[_Doc]->createElementDecl ($self->[_Name], $self->[_Model],
                                      $self->[_Hidden]);
}

sub to_expat
{
#?? add support for Hidden?? (allover, also in _to_sax!!)
my ($self, $iter) = @_; $iter->Element ($self->getName, $self->getModel); } sub _to_sax { my ($self, $doch, $dtdh, $enth) = @_; $dtdh->element_decl ( { Name => $self->getName, Model => $self->getModel } ); } ###################################################################### package XML::DOM::Element; ###################################################################### use vars qw{ @ISA @EXPORT_OK %EXPORT_TAGS %HFIELDS }; BEGIN { import XML::DOM::Node qw( :DEFAULT :Fields ); XML::DOM::def_fields ("TagName", "XML::DOM::Node"); } use XML::DOM::DOMException; use XML::DOM::NamedNodeMap; use Carp; sub new { my ($class, $doc, $tagName) = @_; if ($XML::DOM::SafeMode) { croak new XML::DOM::DOMException (INVALID_CHARACTER_ERR, "bad Element TagName [$tagName]") unless XML::DOM::isValidName ($tagName); } my $self = bless [], $class; $self->[_Doc] = $doc; $self->[_C] = new XML::DOM::NodeList; $self->[_TagName] = $tagName; # Now we're creating the NamedNodeMap only when needed (REC 2313 => 1147) # $self->[_A] = new XML::DOM::NamedNodeMap (Doc => $doc, # Parent => $self); $self; } sub getNodeType { ELEMENT_NODE; } sub getTagName { $_[0]->[_TagName]; } sub getNodeName { $_[0]->[_TagName]; } sub getAttributeNode { my ($self, $name) = @_; return undef unless defined $self->[_A]; $self->getAttributes->{$name}; } sub getAttribute { my ($self, $name) = @_; my $attr = $self->getAttributeNode ($name); (defined $attr) ? 
$attr->getValue : ""; } sub setAttribute { my ($self, $name, $val) = @_; croak new XML::DOM::DOMException (INVALID_CHARACTER_ERR, "bad Attr Name [$name]") unless XML::DOM::isValidName ($name); croak new XML::DOM::DOMException (NO_MODIFICATION_ALLOWED_ERR, "node is ReadOnly") if $self->isReadOnly; my $node = $self->getAttributes->{$name}; if (defined $node) { $node->setValue ($val); } else { $node = $self->[_Doc]->createAttribute ($name, $val); $self->[_A]->setNamedItem ($node); } } sub setAttributeNode { my ($self, $node) = @_; my $attr = $self->getAttributes; my $name = $node->getNodeName; # REC 1147 if ($XML::DOM::SafeMode) { croak new XML::DOM::DOMException (WRONG_DOCUMENT_ERR, "nodes belong to different documents") if $self->[_Doc] != $node->[_Doc]; croak new XML::DOM::DOMException (NO_MODIFICATION_ALLOWED_ERR, "node is ReadOnly") if $self->isReadOnly; my $attrParent = $node->[_UsedIn]; croak new XML::DOM::DOMException (INUSE_ATTRIBUTE_ERR, "Attr is already used by another Element") if (defined ($attrParent) && $attrParent != $attr); } my $other = $attr->{$name}; $attr->removeNamedItem ($name) if defined $other; $attr->setNamedItem ($node); $other; } sub removeAttributeNode { my ($self, $node) = @_; croak new XML::DOM::DOMException (NO_MODIFICATION_ALLOWED_ERR, "node is ReadOnly") if $self->isReadOnly; my $attr = $self->[_A]; unless (defined $attr) { croak new XML::DOM::DOMException (NOT_FOUND_ERR); return undef; } my $name = $node->getNodeName; my $attrNode = $attr->getNamedItem ($name); #?? should it croak if it's the default value? 
croak new XML::DOM::DOMException (NOT_FOUND_ERR) unless $node == $attrNode; # Not removing anything if it's the default value already return undef unless $node->isSpecified; $attr->removeNamedItem ($name); # Substitute with default value if it's defined my $default = $self->getDefaultAttrValue ($name); if (defined $default) { local $XML::DOM::IgnoreReadOnly = 1; $default = $default->cloneNode (1); $attr->setNamedItem ($default); } $node; } sub removeAttribute { my ($self, $name) = @_; my $attr = $self->[_A]; unless (defined $attr) { croak new XML::DOM::DOMException (NOT_FOUND_ERR); return; } my $node = $attr->getNamedItem ($name); if (defined $node) { #?? could use dispose() to remove circular references for gc, but what if #?? somebody is referencing it? $self->removeAttributeNode ($node); } } sub cloneNode { my ($self, $deep) = @_; my $node = $self->[_Doc]->createElement ($self->getTagName); # Always clone the Attr nodes, even if $deep == 0 if (defined $self->[_A]) { $node->[_A] = $self->[_A]->cloneNode (1); # deep=1 $node->[_A]->setParentNode ($node); } $node->cloneChildren ($self, $deep); $node; } sub getAttributes { $_[0]->[_A] ||= XML::DOM::NamedNodeMap->new (Doc => $_[0]->[_Doc], Parent => $_[0]); } #------------------------------------------------------------ # Extra method implementations # Added for convenience sub setTagName { my ($self, $tagName) = @_; croak new XML::DOM::DOMException (INVALID_CHARACTER_ERR, "bad Element TagName [$tagName]") unless XML::DOM::isValidName ($tagName); $self->[_TagName] = $tagName; } sub isReadOnly { 0; } # Added for optimization. 
sub isElementNode
{
    1;
}

sub rejectChild
{
    my $t = $_[1]->getNodeType;

    $t != TEXT_NODE
    && $t != ENTITY_REFERENCE_NODE
    && $t != PROCESSING_INSTRUCTION_NODE
    && $t != COMMENT_NODE
    && $t != CDATA_SECTION_NODE
    && $t != ELEMENT_NODE;
}

sub getDefaultAttrValue
{
    my ($self, $attr) = @_;
    $self->[_Doc]->getDefaultAttrValue ($self->[_TagName], $attr);
}

sub dispose
{
    my $self = shift;

    $self->[_A]->dispose if defined $self->[_A];
    $self->SUPER::dispose;
}

sub setOwnerDocument
{
    my ($self, $doc) = @_;
    $self->SUPER::setOwnerDocument ($doc);

    $self->[_A]->setOwnerDocument ($doc) if defined $self->[_A];
}

sub print
{
    my ($self, $FILE) = @_;

    my $name = $self->[_TagName];

    $FILE->print ("<$name");

    if (defined $self->[_A])
    {
        for my $att (@{$self->[_A]->getValues})
        {
            # skip un-specified (default) Attr nodes
            if ($att->isSpecified)
            {
                $FILE->print (" ");
                $att->print ($FILE);
            }
        }
    }

    my @kids = @{$self->[_C]};
    if (@kids > 0)
    {
        $FILE->print (">");
        for my $kid (@kids)
        {
            $kid->print ($FILE);
        }
        $FILE->print ("</$name>");
    }
    else
    {
        my $style = &$XML::DOM::TagStyle ($name, $self);
        if ($style == 0)
        {
            $FILE->print ("/>");
        }
        elsif ($style == 1)
        {
            $FILE->print ("></$name>");
        }
        else
        {
            $FILE->print (" />");
        }
    }
}

sub check
{
    my ($self, $checker) = @_;
    die "Usage: \$xml_dom_elem->check (\$checker)" unless $checker;

    $checker->InitDomElem;
    $self->to_expat ($checker);
    $checker->FinalDomElem;
}

sub to_expat
{
    my ($self, $iter) = @_;

    my $tag = $self->getTagName;
    $iter->Start ($tag);

    if (defined $self->[_A])
    {
        for my $attr ($self->[_A]->getValues)
        {
            $iter->Attr ($tag, $attr->getName, $attr->getValue, $attr->isSpecified);
        }
    }

    $iter->EndAttr;

    for my $kid ($self->getChildNodes)
    {
        $kid->to_expat ($iter);
    }

    $iter->End;
}

sub _to_sax
{
    my ($self, $doch, $dtdh, $enth) = @_;

    my $tag = $self->getTagName;

    my @attr = ();
    my $attrOrder;
    my $attrDefaulted;

    if (defined $self->[_A])
    {
        my @spec = ();      # names of specified attributes
        my @unspec = ();    # names of defaulted attributes

        for my $attr ($self->[_A]->getValues)
        {
            my $attrName =
$attr->getName; push @attr, $attrName, $attr->getValue; if ($attr->isSpecified) { push @spec, $attrName; } else { push @unspec, $attrName; } } $attrOrder = [ @spec, @unspec ]; $attrDefaulted = @spec; } $doch->start_element (defined $attrOrder ? { Name => $tag, Attributes => { @attr }, AttributeOrder => $attrOrder, Defaulted => $attrDefaulted } : { Name => $tag, Attributes => { @attr } } ); for my $kid ($self->getChildNodes) { $kid->_to_sax ($doch, $dtdh, $enth); } $doch->end_element ( { Name => $tag } ); } ###################################################################### package XML::DOM::CharacterData; ###################################################################### use vars qw{ @ISA @EXPORT_OK %EXPORT_TAGS %HFIELDS }; BEGIN { import XML::DOM::Node qw( :DEFAULT :Fields ); XML::DOM::def_fields ("Data", "XML::DOM::Node"); } use XML::DOM::DOMException; use Carp; # # CharacterData nodes should never be created directly, only subclassed! # sub new { my ($class, $doc, $data) = @_; my $self = bless [], $class; $self->[_Doc] = $doc; $self->[_Data] = $data; $self; } sub appendData { my ($self, $data) = @_; if ($XML::DOM::SafeMode) { croak new XML::DOM::DOMException (NO_MODIFICATION_ALLOWED_ERR, "node is ReadOnly") if $self->isReadOnly; } $self->[_Data] .= $data; } sub deleteData { my ($self, $offset, $count) = @_; croak new XML::DOM::DOMException (INDEX_SIZE_ERR, "bad offset [$offset]") if ($offset < 0 || $offset >= length ($self->[_Data])); #?? DOM Spec says >, but >= makes more sense! 
croak new XML::DOM::DOMException (INDEX_SIZE_ERR, "negative count [$count]") if $count < 0; croak new XML::DOM::DOMException (NO_MODIFICATION_ALLOWED_ERR, "node is ReadOnly") if $self->isReadOnly; substr ($self->[_Data], $offset, $count) = ""; } sub getData { $_[0]->[_Data]; } sub getLength { length $_[0]->[_Data]; } sub insertData { my ($self, $offset, $data) = @_; croak new XML::DOM::DOMException (INDEX_SIZE_ERR, "bad offset [$offset]") if ($offset < 0 || $offset >= length ($self->[_Data])); #?? DOM Spec says >, but >= makes more sense! croak new XML::DOM::DOMException (NO_MODIFICATION_ALLOWED_ERR, "node is ReadOnly") if $self->isReadOnly; substr ($self->[_Data], $offset, 0) = $data; } sub replaceData { my ($self, $offset, $count, $data) = @_; croak new XML::DOM::DOMException (INDEX_SIZE_ERR, "bad offset [$offset]") if ($offset < 0 || $offset >= length ($self->[_Data])); #?? DOM Spec says >, but >= makes more sense! croak new XML::DOM::DOMException (INDEX_SIZE_ERR, "negative count [$count]") if $count < 0; croak new XML::DOM::DOMException (NO_MODIFICATION_ALLOWED_ERR, "node is ReadOnly") if $self->isReadOnly; substr ($self->[_Data], $offset, $count) = $data; } sub setData { my ($self, $data) = @_; croak new XML::DOM::DOMException (NO_MODIFICATION_ALLOWED_ERR, "node is ReadOnly") if $self->isReadOnly; $self->[_Data] = $data; } sub substringData { my ($self, $offset, $count) = @_; my $data = $self->[_Data]; croak new XML::DOM::DOMException (INDEX_SIZE_ERR, "bad offset [$offset]") if ($offset < 0 || $offset >= length ($data)); #?? DOM Spec says >, but >= makes more sense! 
    croak new XML::DOM::DOMException (INDEX_SIZE_ERR,
                                      "negative count [$count]")
        if $count < 0;

    substr ($data, $offset, $count);
}

sub getNodeValue
{
    $_[0]->getData;
}

sub setNodeValue
{
    $_[0]->setData ($_[1]);
}

######################################################################
package XML::DOM::CDATASection;
######################################################################
use vars qw{ @ISA @EXPORT_OK %EXPORT_TAGS %HFIELDS };

BEGIN
{
    import XML::DOM::CharacterData qw( :DEFAULT :Fields );
    import XML::DOM::Node qw( :DEFAULT :Fields );
    XML::DOM::def_fields ("", "XML::DOM::CharacterData");
}

use XML::DOM::DOMException;

sub getNodeName
{
    "#cdata-section";
}

sub getNodeType
{
    CDATA_SECTION_NODE;
}

sub cloneNode
{
    my $self = shift;
    $self->[_Doc]->createCDATASection ($self->getData);
}

#------------------------------------------------------------
# Extra method implementations

sub isReadOnly
{
    0;
}

sub print
{
    my ($self, $FILE) = @_;
    $FILE->print ("<![CDATA[");
    $FILE->print (XML::DOM::encodeCDATA ($self->getData));
    $FILE->print ("]]>");
}

sub to_expat
{
    my ($self, $iter) = @_;
    $iter->CData ($self->getData);
}

sub _to_sax
{
    my ($self, $doch, $dtdh, $enth) = @_;
    $doch->start_cdata;
    $doch->characters ( { Data => $self->getData } );
    $doch->end_cdata;
}

######################################################################
package XML::DOM::Comment;
######################################################################
use vars qw{ @ISA @EXPORT_OK %EXPORT_TAGS %HFIELDS };

BEGIN
{
    import XML::DOM::CharacterData qw( :DEFAULT :Fields );
    import XML::DOM::Node qw( :DEFAULT :Fields );
    XML::DOM::def_fields ("", "XML::DOM::CharacterData");
}

use XML::DOM::DOMException;
use Carp;

#??
# setData - could check comment for double minus

sub getNodeType
{
    COMMENT_NODE;
}

sub getNodeName
{
    "#comment";
}

sub cloneNode
{
    my $self = shift;
    $self->[_Doc]->createComment ($self->getData);
}

#------------------------------------------------------------
# Extra method implementations

sub isReadOnly
{
    return 0 if $XML::DOM::IgnoreReadOnly;

    my $pa = $_[0]->[_Parent];
    defined ($pa) ? $pa->isReadOnly : 0;
}

sub print
{
    my ($self, $FILE) = @_;
    my $comment = XML::DOM::encodeComment ($self->[_Data]);

    $FILE->print ("<!--$comment-->");
}

sub to_expat
{
    my ($self, $iter) = @_;
    $iter->Comment ($self->getData);
}

sub _to_sax
{
    my ($self, $doch, $dtdh, $enth) = @_;
    $doch->comment ( { Data => $self->getData });
}

######################################################################
package XML::DOM::Text;
######################################################################
use vars qw{ @ISA @EXPORT_OK %EXPORT_TAGS %HFIELDS };

BEGIN
{
    import XML::DOM::CharacterData qw( :DEFAULT :Fields );
    import XML::DOM::Node qw( :DEFAULT :Fields );
    XML::DOM::def_fields ("", "XML::DOM::CharacterData");
}

use XML::DOM::DOMException;
use Carp;

sub getNodeType
{
    TEXT_NODE;
}

sub getNodeName
{
    "#text";
}

sub splitText
{
    my ($self, $offset) = @_;

    my $data = $self->getData;
    croak new XML::DOM::DOMException (INDEX_SIZE_ERR,
                                      "bad offset [$offset]")
        if ($offset < 0 || $offset >= length ($data));
#?? DOM Spec says >, but >= makes more sense!
croak new XML::DOM::DOMException (NO_MODIFICATION_ALLOWED_ERR, "node is ReadOnly") if $self->isReadOnly; my $rest = substr ($data, $offset); $self->setData (substr ($data, 0, $offset)); my $node = $self->[_Doc]->createTextNode ($rest); # insert new node after this node $self->[_Parent]->insertBefore ($node, $self->getNextSibling); $node; } sub cloneNode { my $self = shift; $self->[_Doc]->createTextNode ($self->getData); } #------------------------------------------------------------ # Extra method implementations sub isReadOnly { 0; } sub print { my ($self, $FILE) = @_; $FILE->print (XML::DOM::encodeText ($self->getData, '<&>"')); } sub isTextNode { 1; } sub to_expat { my ($self, $iter) = @_; $iter->Char ($self->getData); } sub _to_sax { my ($self, $doch, $dtdh, $enth) = @_; $doch->characters ( { Data => $self->getData } ); } ###################################################################### package XML::DOM::XMLDecl; ###################################################################### use vars qw{ @ISA @EXPORT_OK %EXPORT_TAGS %HFIELDS }; BEGIN { import XML::DOM::Node qw( :DEFAULT :Fields ); XML::DOM::def_fields ("Version Encoding Standalone", "XML::DOM::Node"); } use XML::DOM::DOMException; #------------------------------------------------------------ # Extra method implementations # XMLDecl is not part of the DOM Spec sub new { my ($class, $doc, $version, $encoding, $standalone) = @_; my $self = bless [], $class; $self->[_Doc] = $doc; $self->[_Version] = $version if defined $version; $self->[_Encoding] = $encoding if defined $encoding; $self->[_Standalone] = $standalone if defined $standalone; $self; } sub setVersion { if (defined $_[1]) { $_[0]->[_Version] = $_[1]; } else { undef $_[0]->[_Version]; # was delete } } sub getVersion { $_[0]->[_Version]; } sub setEncoding { if (defined $_[1]) { $_[0]->[_Encoding] = $_[1]; } else { undef $_[0]->[_Encoding]; # was delete } } sub getEncoding { $_[0]->[_Encoding]; } sub setStandalone { if (defined $_[1]) { 
        $_[0]->[_Standalone] = $_[1];
    }
    else
    {
        undef $_[0]->[_Standalone]; # was delete
    }
}

sub getStandalone
{
    $_[0]->[_Standalone];
}

sub getNodeType
{
    XML_DECL_NODE;
}

sub cloneNode
{
    my $self = shift;

    new XML::DOM::XMLDecl ($self->[_Doc], $self->[_Version],
                           $self->[_Encoding], $self->[_Standalone]);
}

sub print
{
    my ($self, $FILE) = @_;

    my $version = $self->[_Version];
    my $encoding = $self->[_Encoding];
    my $standalone = $self->[_Standalone];
    $standalone = ($standalone ? "yes" : "no") if defined $standalone;

    $FILE->print ("<?xml");
    $FILE->print (" version=\"$version\"") if defined $version;
    $FILE->print (" encoding=\"$encoding\"") if defined $encoding;
    $FILE->print (" standalone=\"$standalone\"") if defined $standalone;
    $FILE->print ("?>");
}

sub to_expat
{
    my ($self, $iter) = @_;
    $iter->XMLDecl ($self->getVersion, $self->getEncoding, $self->getStandalone);
}

sub _to_sax
{
    my ($self, $doch, $dtdh, $enth) = @_;
    $dtdh->xml_decl ( { Version => $self->getVersion,
                        Encoding => $self->getEncoding,
                        Standalone => $self->getStandalone } );
}

######################################################################
package XML::DOM::DocumentFragment;
######################################################################
use vars qw{ @ISA @EXPORT_OK %EXPORT_TAGS %HFIELDS };

BEGIN
{
    import XML::DOM::Node qw( :DEFAULT :Fields );
    XML::DOM::def_fields ("", "XML::DOM::Node");
}

use XML::DOM::DOMException;

sub new
{
    my ($class, $doc) = @_;
    my $self = bless [], $class;

    $self->[_Doc] = $doc;
    $self->[_C] = new XML::DOM::NodeList;
    $self;
}

sub getNodeType
{
    DOCUMENT_FRAGMENT_NODE;
}

sub getNodeName
{
    "#document-fragment";
}

sub cloneNode
{
    my ($self, $deep) = @_;
    my $node = $self->[_Doc]->createDocumentFragment;

    $node->cloneChildren ($self, $deep);
    $node;
}

#------------------------------------------------------------
# Extra method implementations

sub isReadOnly
{
    0;
}

sub print
{
    my ($self, $FILE) = @_;

    for my $node (@{$self->[_C]})
    {
        $node->print ($FILE);
    }
}

sub rejectChild
{
    my $t = $_[1]->getNodeType;

    $t
!= TEXT_NODE && $t != ENTITY_REFERENCE_NODE && $t != PROCESSING_INSTRUCTION_NODE && $t != COMMENT_NODE && $t != CDATA_SECTION_NODE && $t != ELEMENT_NODE; } sub isDocumentFragmentNode { 1; } ###################################################################### package XML::DOM::DocumentType; # forward declaration ###################################################################### ###################################################################### package XML::DOM::Document; ###################################################################### use vars qw{ @ISA @EXPORT_OK %EXPORT_TAGS %HFIELDS }; BEGIN { import XML::DOM::Node qw( :DEFAULT :Fields ); XML::DOM::def_fields ("Doctype XmlDecl", "XML::DOM::Node"); } use Carp; use XML::DOM::NodeList; use XML::DOM::DOMException; sub new { my ($class) = @_; my $self = bless [], $class; # keep Doc pointer, even though getOwnerDocument returns undef $self->[_Doc] = $self; $self->[_C] = new XML::DOM::NodeList; $self; } sub getNodeType { DOCUMENT_NODE; } sub getNodeName { "#document"; } #?? not sure about keeping a fixed order of these nodes.... 
sub getDoctype
{
    $_[0]->[_Doctype];
}

sub getDocumentElement
{
    my ($self) = @_;
    for my $kid (@{$self->[_C]})
    {
        return $kid if $kid->isElementNode;
    }
    undef;
}

sub getOwnerDocument
{
    undef;
}

sub getImplementation
{
    $XML::DOM::DOMImplementation::Singleton;
}

#
# Added extra parameters ($val, $specified) that are passed straight to the
# Attr constructor
#
sub createAttribute
{
    new XML::DOM::Attr (@_);
}

sub createCDATASection
{
    new XML::DOM::CDATASection (@_);
}

sub createComment
{
    new XML::DOM::Comment (@_);
}

sub createElement
{
    new XML::DOM::Element (@_);
}

sub createTextNode
{
    new XML::DOM::Text (@_);
}

sub createProcessingInstruction
{
    new XML::DOM::ProcessingInstruction (@_);
}

sub createEntityReference
{
    new XML::DOM::EntityReference (@_);
}

sub createDocumentFragment
{
    new XML::DOM::DocumentFragment (@_);
}

sub createDocumentType
{
    new XML::DOM::DocumentType (@_);
}

sub cloneNode
{
    my ($self, $deep) = @_;
    my $node = new XML::DOM::Document;

    $node->cloneChildren ($self, $deep);

    my $xmlDecl = $self->[_XmlDecl];
    $node->[_XmlDecl] = $xmlDecl->cloneNode ($deep) if defined $xmlDecl;

    $node;
}

sub appendChild
{
    my ($self, $node) = @_;

    # Extra check: make sure we don't end up with more than one Element.
    # Don't worry about multiple DocType nodes, because DocumentFragment
    # can't contain DocType nodes.

    my @nodes = ($node);
    @nodes = @{$node->[_C]}
        if $node->getNodeType == DOCUMENT_FRAGMENT_NODE;

    my $elem = 0;
    for my $n (@nodes)
    {
        $elem++ if $n->isElementNode;
    }

    if ($elem > 0 && defined ($self->getDocumentElement))
    {
        croak new XML::DOM::DOMException (HIERARCHY_REQUEST_ERR,
                                          "document can have only one Element");
    }
    $self->SUPER::appendChild ($node);
}

sub insertBefore
{
    my ($self, $node, $refNode) = @_;

    # Extra check: make sure we don't end up with more than one Element.
    # Don't worry about multiple DocType nodes, because DocumentFragment
    # can't contain DocType nodes.
    my @nodes = ($node);
    @nodes = @{$node->[_C]}
        if $node->getNodeType == DOCUMENT_FRAGMENT_NODE;

    my $elem = 0;
    for my $n (@nodes)
    {
        $elem++ if $n->isElementNode;
    }

    if ($elem > 0 && defined ($self->getDocumentElement))
    {
        croak new XML::DOM::DOMException (HIERARCHY_REQUEST_ERR,
                                          "document can have only one Element");
    }
    $self->SUPER::insertBefore ($node, $refNode);
}

sub replaceChild
{
    my ($self, $node, $refNode) = @_;

    # Extra check: make sure we don't end up with more than one Element.
    # Don't worry about multiple DocType nodes, because DocumentFragment
    # can't contain DocType nodes.

    my @nodes = ($node);
    @nodes = @{$node->[_C]}
        if $node->getNodeType == DOCUMENT_FRAGMENT_NODE;

    my $elem = 0;
    $elem-- if $refNode->isElementNode;

    for my $n (@nodes)
    {
        $elem++ if $n->isElementNode;
    }

    if ($elem > 0 && defined ($self->getDocumentElement))
    {
        croak new XML::DOM::DOMException (HIERARCHY_REQUEST_ERR,
                                          "document can have only one Element");
    }
    $self->SUPER::replaceChild ($node, $refNode);
}

#------------------------------------------------------------
# Extra method implementations

sub isReadOnly
{
    0;
}

sub print
{
    my ($self, $FILE) = @_;

    my $xmlDecl = $self->getXMLDecl;
    if (defined $xmlDecl)
    {
        $xmlDecl->print ($FILE);
        $FILE->print ("\x0A");
    }

    for my $node (@{$self->[_C]})
    {
        $node->print ($FILE);
        $FILE->print ("\x0A");
    }
}

sub setDoctype
{
    my ($self, $doctype) = @_;
    my $oldDoctype = $self->[_Doctype];
    if (defined $oldDoctype)
    {
        $self->replaceChild ($doctype, $oldDoctype);
    }
    else
    {
#?? before root element, but after XmlDecl !
        $self->appendChild ($doctype);
    }
    $_[0]->[_Doctype] = $_[1];
}

sub removeDoctype
{
    my $self = shift;
    my $doctype = $self->removeChild ($self->[_Doctype]);

    undef $self->[_Doctype]; # was delete
    $doctype;
}

sub rejectChild
{
    my $t = $_[1]->getNodeType;
    $t != ELEMENT_NODE
    && $t != PROCESSING_INSTRUCTION_NODE
    && $t != COMMENT_NODE
    && $t != DOCUMENT_TYPE_NODE;
}

sub expandEntity
{
    my ($self, $ent, $param) = @_;
    my $doctype = $self->getDoctype;

    (defined $doctype) ?
$doctype->expandEntity ($ent, $param) : undef; } sub getDefaultAttrValue { my ($self, $elem, $attr) = @_; my $doctype = $self->getDoctype; (defined $doctype) ? $doctype->getDefaultAttrValue ($elem, $attr) : undef; } sub getEntity { my ($self, $entity) = @_; my $doctype = $self->getDoctype; (defined $doctype) ? $doctype->getEntity ($entity) : undef; } sub dispose { my $self = shift; $self->[_XmlDecl]->dispose if defined $self->[_XmlDecl]; undef $self->[_XmlDecl]; # was delete undef $self->[_Doctype]; # was delete $self->SUPER::dispose; } sub setOwnerDocument { # Do nothing, you can't change the owner document! #?? could throw exception... } sub getXMLDecl { $_[0]->[_XmlDecl]; } sub setXMLDecl { $_[0]->[_XmlDecl] = $_[1]; } sub createXMLDecl { new XML::DOM::XMLDecl (@_); } sub createNotation { new XML::DOM::Notation (@_); } sub createElementDecl { new XML::DOM::ElementDecl (@_); } sub createAttlistDecl { new XML::DOM::AttlistDecl (@_); } sub createEntity { new XML::DOM::Entity (@_); } sub createChecker { my $self = shift; my $checker = XML::Checker->new; $checker->Init; my $doctype = $self->getDoctype; $doctype->to_expat ($checker) if $doctype; $checker->Final; $checker; } sub check { my ($self, $checker) = @_; $checker ||= XML::Checker->new; $self->to_expat ($checker); } sub to_expat { my ($self, $iter) = @_; $iter->Init; for my $kid ($self->getChildNodes) { $kid->to_expat ($iter); } $iter->Final; } sub check_sax { my ($self, $checker) = @_; $checker ||= XML::Checker->new; $self->to_sax (Handler => $checker); } sub _to_sax { my ($self, $doch, $dtdh, $enth) = @_; $doch->start_document; for my $kid ($self->getChildNodes) { $kid->_to_sax ($doch, $dtdh, $enth); } $doch->end_document; } ###################################################################### package XML::DOM::DocumentType; ###################################################################### use vars qw{ @ISA @EXPORT_OK %EXPORT_TAGS %HFIELDS }; BEGIN { import XML::DOM::Node qw( :DEFAULT :Fields ); import 
XML::DOM::Document qw( :Fields ); XML::DOM::def_fields ("Entities Notations Name SysId PubId Internal", "XML::DOM::Node"); } use XML::DOM::DOMException; use XML::DOM::NamedNodeMap; sub new { my $class = shift; my $doc = shift; my $self = bless [], $class; $self->[_Doc] = $doc; $self->[_ReadOnly] = 1; $self->[_C] = new XML::DOM::NodeList; $self->[_Entities] = new XML::DOM::NamedNodeMap (Doc => $doc, Parent => $self, ReadOnly => 1); $self->[_Notations] = new XML::DOM::NamedNodeMap (Doc => $doc, Parent => $self, ReadOnly => 1); $self->setParams (@_); $self; } sub getNodeType { DOCUMENT_TYPE_NODE; } sub getNodeName { $_[0]->[_Name]; } sub getName { $_[0]->[_Name]; } sub getEntities { $_[0]->[_Entities]; } sub getNotations { $_[0]->[_Notations]; } sub setParentNode { my ($self, $parent) = @_; $self->SUPER::setParentNode ($parent); $parent->[_Doctype] = $self if $parent->getNodeType == DOCUMENT_NODE; } sub cloneNode { my ($self, $deep) = @_; my $node = new XML::DOM::DocumentType ($self->[_Doc], $self->[_Name], $self->[_SysId], $self->[_PubId], $self->[_Internal]); #?? does it make sense to make a shallow copy? 
# clone the NamedNodeMaps $node->[_Entities] = $self->[_Entities]->cloneNode ($deep); $node->[_Notations] = $self->[_Notations]->cloneNode ($deep); $node->cloneChildren ($self, $deep); $node; } #------------------------------------------------------------ # Extra method implementations sub getSysId { $_[0]->[_SysId]; } sub getPubId { $_[0]->[_PubId]; } sub getInternal { $_[0]->[_Internal]; } sub setSysId { $_[0]->[_SysId] = $_[1]; } sub setPubId { $_[0]->[_PubId] = $_[1]; } sub setInternal { $_[0]->[_Internal] = $_[1]; } sub setName { $_[0]->[_Name] = $_[1]; } sub removeChildHoodMemories { my ($self, $dontWipeReadOnly) = @_; my $parent = $self->[_Parent]; if (defined $parent && $parent->getNodeType == DOCUMENT_NODE) { undef $parent->[_Doctype]; # was delete } $self->SUPER::removeChildHoodMemories; } sub dispose { my $self = shift; $self->[_Entities]->dispose; $self->[_Notations]->dispose; $self->SUPER::dispose; } sub setOwnerDocument { my ($self, $doc) = @_; $self->SUPER::setOwnerDocument ($doc); $self->[_Entities]->setOwnerDocument ($doc); $self->[_Notations]->setOwnerDocument ($doc); } sub expandEntity { my ($self, $ent, $param) = @_; my $kid = $self->[_Entities]->getNamedItem ($ent); return $kid->getValue if (defined ($kid) && $param == $kid->isParameterEntity); undef; # entity not found } sub getAttlistDecl { my ($self, $elemName) = @_; for my $kid (@{$_[0]->[_C]}) { return $kid if ($kid->getNodeType == ATTLIST_DECL_NODE && $kid->getName eq $elemName); } undef; # not found } sub getElementDecl { my ($self, $elemName) = @_; for my $kid (@{$_[0]->[_C]}) { return $kid if ($kid->getNodeType == ELEMENT_DECL_NODE && $kid->getName eq $elemName); } undef; # not found } sub addElementDecl { my ($self, $name, $model, $hidden) = @_; my $node = $self->getElementDecl ($name); #?? 
could warn unless (defined $node) { $node = $self->[_Doc]->createElementDecl ($name, $model, $hidden); $self->appendChild ($node); } $node; } sub addAttlistDecl { my ($self, $name) = @_; my $node = $self->getAttlistDecl ($name); unless (defined $node) { $node = $self->[_Doc]->createAttlistDecl ($name); $self->appendChild ($node); } $node; } sub addNotation { my $self = shift; my $node = $self->[_Doc]->createNotation (@_); $self->[_Notations]->setNamedItem ($node); $node; } sub addEntity { my $self = shift; my $node = $self->[_Doc]->createEntity (@_); $self->[_Entities]->setNamedItem ($node); $node; } # All AttDefs for a certain Element are merged into a single ATTLIST sub addAttDef { my $self = shift; my $elemName = shift; # create the AttlistDecl if it doesn't exist yet my $attListDecl = $self->addAttlistDecl ($elemName); $attListDecl->addAttDef (@_); } sub getDefaultAttrValue { my ($self, $elem, $attr) = @_; my $elemNode = $self->getAttlistDecl ($elem); (defined $elemNode) ? $elemNode->getDefaultAttrValue ($attr) : undef; } sub getEntity { my ($self, $entity) = @_; $self->[_Entities]->getNamedItem ($entity); } sub setParams { my ($self, $name, $sysid, $pubid, $internal) = @_; $self->[_Name] = $name; #?? not sure if we need to hold on to these... 
$self->[_SysId] = $sysid if defined $sysid; $self->[_PubId] = $pubid if defined $pubid; $self->[_Internal] = $internal if defined $internal; $self; } sub rejectChild { # DOM Spec says: DocumentType -- no children not $XML::DOM::IgnoreReadOnly; } sub print { my ($self, $FILE) = @_; my $name = $self->[_Name]; my $sysId = $self->[_SysId]; my $pubId = $self->[_PubId]; $FILE->print ("print (" PUBLIC \"$pubId\" \"$sysId\""); } elsif (defined $sysId) { $FILE->print (" SYSTEM \"$sysId\""); } my @entities = @{$self->[_Entities]->getValues}; my @notations = @{$self->[_Notations]->getValues}; my @kids = @{$self->[_C]}; if (@entities || @notations || @kids) { $FILE->print (" [\x0A"); for my $kid (@entities) { next if $kid->[_Hidden]; $FILE->print (" "); $kid->print ($FILE); $FILE->print ("\x0A"); } for my $kid (@notations) { next if $kid->[_Hidden]; $FILE->print (" "); $kid->print ($FILE); $FILE->print ("\x0A"); } for my $kid (@kids) { next if $kid->[_Hidden]; $FILE->print (" "); $kid->print ($FILE); $FILE->print ("\x0A"); } $FILE->print ("]"); } $FILE->print (">"); } sub to_expat { my ($self, $iter) = @_; $iter->Doctype ($self->getName, $self->getSysId, $self->getPubId, $self->getInternal); for my $ent ($self->getEntities->getValues) { next if $ent->[_Hidden]; $ent->to_expat ($iter); } for my $nota ($self->getNotations->getValues) { next if $nota->[_Hidden]; $nota->to_expat ($iter); } for my $kid ($self->getChildNodes) { next if $kid->[_Hidden]; $kid->to_expat ($iter); } } sub _to_sax { my ($self, $doch, $dtdh, $enth) = @_; $dtdh->doctype_decl ( { Name => $self->getName, SystemId => $self->getSysId, PublicId => $self->getPubId, Internal => $self->getInternal }); for my $ent ($self->getEntities->getValues) { next if $ent->[_Hidden]; $ent->_to_sax ($doch, $dtdh, $enth); } for my $nota ($self->getNotations->getValues) { next if $nota->[_Hidden]; $nota->_to_sax ($doch, $dtdh, $enth); } for my $kid ($self->getChildNodes) { next if $kid->[_Hidden]; $kid->_to_sax ($doch, $dtdh, 
$enth); } } ###################################################################### package XML::DOM::Parser; ###################################################################### use vars qw ( @ISA ); @ISA = qw( XML::Parser ); sub new { my ($class, %args) = @_; $args{Style} = 'XML::Parser::Dom'; $class->SUPER::new (%args); } # This method needed to be overriden so we can restore some global # variables when an exception is thrown sub parse { my $self = shift; local $XML::Parser::Dom::_DP_doc; local $XML::Parser::Dom::_DP_elem; local $XML::Parser::Dom::_DP_doctype; local $XML::Parser::Dom::_DP_in_prolog; local $XML::Parser::Dom::_DP_end_doc; local $XML::Parser::Dom::_DP_saw_doctype; local $XML::Parser::Dom::_DP_in_CDATA; local $XML::Parser::Dom::_DP_keep_CDATA; local $XML::Parser::Dom::_DP_last_text; # Temporarily disable checks that Expat already does (for performance) local $XML::DOM::SafeMode = 0; # Temporarily disable ReadOnly checks local $XML::DOM::IgnoreReadOnly = 1; my $ret; eval { $ret = $self->SUPER::parse (@_); }; my $err = $@; if ($err) { my $doc = $XML::Parser::Dom::_DP_doc; if ($doc) { $doc->dispose; } die $err; } $ret; } my $LWP_USER_AGENT; sub set_LWP_UserAgent { $LWP_USER_AGENT = shift; } sub parsefile { my $self = shift; my $url = shift; # Any other URL schemes? if ($url =~ /^(https?|ftp|wais|gopher|file):/) { # Read the file from the web with LWP. # # Note that we read in the entire file, which may not be ideal # for large files. LWP::UserAgent also provides a callback style # request, which we could convert to a stream with a fork()... my $result; eval { use LWP::UserAgent; my $ua = $self->{LWP_UserAgent}; unless (defined $ua) { unless (defined $LWP_USER_AGENT) { $LWP_USER_AGENT = LWP::UserAgent->new; # Load proxy settings from environment variables, i.e.: # http_proxy, ftp_proxy, no_proxy etc. (see LWP::UserAgent(3)) # You need these to go thru firewalls. 
$LWP_USER_AGENT->env_proxy; } $ua = $LWP_USER_AGENT; } my $req = new HTTP::Request 'GET', $url; my $response = $ua->request ($req); # Parse the result of the HTTP request $result = $self->parse ($response->content, @_); }; if ($@) { die "Couldn't parsefile [$url] with LWP: $@"; } return $result; } else { return $self->SUPER::parsefile ($url, @_); } } ###################################################################### package XML::Parser::Dom; ###################################################################### BEGIN { import XML::DOM::Node qw( :Fields ); import XML::DOM::CharacterData qw( :Fields ); } use vars qw( $_DP_doc $_DP_elem $_DP_doctype $_DP_in_prolog $_DP_end_doc $_DP_saw_doctype $_DP_in_CDATA $_DP_keep_CDATA $_DP_last_text $_DP_level $_DP_expand_pent ); # This adds a new Style to the XML::Parser class. # From now on you can say: $parser = new XML::Parser ('Style' => 'Dom' ); # but that is *NOT* how a regular user should use it! $XML::Parser::Built_In_Styles{Dom} = 1; sub Init { $_DP_elem = $_DP_doc = new XML::DOM::Document(); $_DP_doctype = new XML::DOM::DocumentType ($_DP_doc); $_DP_doc->setDoctype ($_DP_doctype); $_DP_keep_CDATA = $_[0]->{KeepCDATA}; # Prepare for document prolog $_DP_in_prolog = 1; # We haven't passed the root element yet $_DP_end_doc = 0; # Expand parameter entities in the DTD by default $_DP_expand_pent = defined $_[0]->{ExpandParamEnt} ? 
$_[0]->{ExpandParamEnt} : 1; if ($_DP_expand_pent) { $_[0]->{DOM_Entity} = {}; } $_DP_level = 0; undef $_DP_last_text; } sub Final { unless ($_DP_saw_doctype) { my $doctype = $_DP_doc->removeDoctype; $doctype->dispose; } $_DP_doc; } sub Char { my $str = $_[1]; if ($_DP_in_CDATA && $_DP_keep_CDATA) { undef $_DP_last_text; # Merge text with previous node if possible $_DP_elem->addCDATA ($str); } else { # Merge text with previous node if possible # Used to be: $expat->{DOM_Element}->addText ($str); if ($_DP_last_text) { $_DP_last_text->[_Data] .= $str; } else { $_DP_last_text = $_DP_doc->createTextNode ($str); $_DP_last_text->[_Parent] = $_DP_elem; push @{$_DP_elem->[_C]}, $_DP_last_text; } } } sub Start { my ($expat, $elem, @attr) = @_; my $parent = $_DP_elem; my $doc = $_DP_doc; if ($parent == $doc) { # End of document prolog, i.e. start of first Element $_DP_in_prolog = 0; } undef $_DP_last_text; my $node = $doc->createElement ($elem); $_DP_elem = $node; $parent->appendChild ($node); my $n = @attr; return unless $n; # Add attributes my $first_default = $expat->specified_attr; my $i = 0; while ($i < $n) { my $specified = $i < $first_default; my $name = $attr[$i++]; undef $_DP_last_text; my $attr = $doc->createAttribute ($name, $attr[$i++], $specified); $node->setAttributeNode ($attr); } } sub End { $_DP_elem = $_DP_elem->[_Parent]; undef $_DP_last_text; # Check for end of root element $_DP_end_doc = 1 if ($_DP_elem == $_DP_doc); } # Called at end of file, i.e. whitespace following last closing tag # Also for Entity references # May also be called at other times... sub Default { my ($expat, $str) = @_; # shift; deb ("Default", @_); if ($_DP_in_prolog) # still processing Document prolog... { #?? could try to store this text later #?? I've only seen whitespace here so far } elsif (!$_DP_end_doc) # ignore whitespace at end of Document { # if ($expat->{NoExpand}) # { # Got a TextDecl () from an external entity here once # create non-parameter entity reference, correct? 
return unless $str =~ s!^&!!; return unless $str =~ s!;$!!; $_DP_elem->appendChild ( $_DP_doc->createEntityReference ($str,0,$expat->{NoExpand})); undef $_DP_last_text; # } # else # { # $expat->{DOM_Element}->addText ($str); # } } } # XML::Parser 2.19 added support for CdataStart and CdataEnd handlers # If they are not defined, the Default handler is called instead # with the text "createComment ($_[1]); $_DP_elem->appendChild ($comment); } } sub deb { # return; my $name = shift; print "$name (" . join(",", map {defined($_)?$_ : "(undef)"} @_) . ")\n"; } sub Doctype { my $expat = shift; # deb ("Doctype", @_); $_DP_doctype->setParams (@_); $_DP_saw_doctype = 1; } sub Attlist { my $expat = shift; # deb ("Attlist", @_); $_[5] = "Hidden" unless $_DP_expand_pent || $_DP_level == 0; $_DP_doctype->addAttDef (@_); } sub XMLDecl { my $expat = shift; # deb ("XMLDecl", @_); undef $_DP_last_text; $_DP_doc->setXMLDecl (new XML::DOM::XMLDecl ($_DP_doc, @_)); } sub Entity { my $expat = shift; # deb ("Entity", @_); # check to see if Parameter Entity if ($_[5]) { if (defined $_[2]) # was sysid specified? { # Store the Entity mapping for use in ExternEnt if (exists $expat->{DOM_Entity}->{$_[2]}) { # If this ever happens, the name of entity may be the wrong one # when writing out the Document. XML::DOM::warning ("Entity $_[2] is known as %$_[0] and %" . $expat->{DOM_Entity}->{$_[2]}); } else { $expat->{DOM_Entity}->{$_[2]} = $_[0]; } #?? 
remove this block when XML::Parser has better support } } # no value on things with sysId if (defined $_[2] && defined $_[1]) { # print STDERR "XML::DOM Warning $_[0] had both value($_[1]) And SYSId ($_[2]), removing value.\n"; $_[1] = undef; } undef $_DP_last_text; $_[6] = "Hidden" unless $_DP_expand_pent || $_DP_level == 0; $_DP_doctype->addEntity (@_); } # # Unparsed is called when it encounters e.g: # # # sub Unparsed { Entity (@_); # same as regular ENTITY, as far as DOM is concerned } sub Element { shift; # deb ("Element", @_); # put in to convert XML::Parser::ContentModel object to string # ($_[1] used to be a string in XML::Parser 2.27 and # dom_attr.t fails if we don't stringify here) $_[1] = "$_[1]"; undef $_DP_last_text; push @_, "Hidden" unless $_DP_expand_pent || $_DP_level == 0; $_DP_doctype->addElementDecl (@_); } sub Notation { shift; # deb ("Notation", @_); undef $_DP_last_text; $_[4] = "Hidden" unless $_DP_expand_pent || $_DP_level == 0; $_DP_doctype->addNotation (@_); } sub Proc { shift; # deb ("Proc", @_); undef $_DP_last_text; push @_, "Hidden" unless $_DP_expand_pent || $_DP_level == 0; $_DP_elem->appendChild ($_DP_doc->createProcessingInstruction (@_)); } # # ExternEnt is called when an external entity, such as: # # # # is referenced in the document, e.g. with: &externalEntity; # If ExternEnt is not specified, the entity reference is passed to the Default # handler as e.g. "&externalEntity;", where an EntityReference object is added. # # Also for %externalEntity; references in the DTD itself. 
# # It can also be called when XML::Parser parses the DOCTYPE header # (just before calling the DocType handler), when it contains a # reference like "docbook.dtd" below: # # 2.27 since file_ext_ent_handler # now returns a IO::File object instead of a content string # Invoke XML::Parser's default ExternEnt handler my $content; if ($XML::Parser::have_LWP) { $content = XML::Parser::lwp_ext_ent_handler (@_); } else { $content = XML::Parser::file_ext_ent_handler (@_); } if ($_DP_expand_pent) { return $content; } else { my $entname = $expat->{DOM_Entity}->{$sysid}; if (defined $entname) { $_DP_doctype->appendChild ($_DP_doc->createEntityReference ($entname, 1, $expat->{NoExpand})); # Wrap the contents in special comments, so we know when we reach the # end of parsing the entity. This way we can omit the contents from # the DTD, when ExpandParamEnt is set to 0. return "" . $content . ""; } else { # We either read the entity ref'd by the system id in the # header, or the entity was undefined. # In either case, don't bother with maintaining the entity # reference, just expand the contents. return "" . $content . ""; } } } 1; # module return code __END__ =head1 NAME XML::DOM - A perl module for building DOM Level 1 compliant document structures =head1 SYNOPSIS use XML::DOM; my $parser = new XML::DOM::Parser; my $doc = $parser->parsefile ("file.xml"); # print all HREF attributes of all CODEBASE elements my $nodes = $doc->getElementsByTagName ("CODEBASE"); my $n = $nodes->getLength; for (my $i = 0; $i < $n; $i++) { my $node = $nodes->item ($i); my $href = $node->getAttributeNode ("HREF"); print $href->getValue . "\n"; } # Print doc file $doc->printToFile ("out.xml"); # Print to string print $doc->toString; # Avoid memory leaks - cleanup circular references for garbage collection $doc->dispose; =head1 DESCRIPTION This module extends the XML::Parser module by Clark Cooper. 
The XML::Parser module is built on top of XML::Parser::Expat, which is a lower level interface to James Clark's expat library.

XML::DOM::Parser is derived from XML::Parser. It parses XML strings or files and builds a data structure that conforms to the API of the Document Object Model as described at http://www.w3.org/TR/REC-DOM-Level-1. See the XML::Parser manpage for other available features of the XML::DOM::Parser class. Note that the 'Style' property should not be used (it is set internally.)

The XML::Parser I<NoExpand> option is more or less supported, in that it will generate EntityReference objects whenever an entity reference is encountered in character data. I'm not sure how useful this is. Any comments are welcome.

As described in the synopsis, when you create an XML::DOM::Parser object, the parse and parsefile methods create an I<XML::DOM::Document> object from the specified input. This Document object can then be examined, modified and written back out to a file or converted to a string.

When using XML::DOM with XML::Parser version 2.19 and up, setting the XML::DOM::Parser option I<KeepCDATA> to 1 will store CDATASections in CDATASection nodes, instead of converting them to Text nodes. Consecutive CDATASection nodes will be merged into one. Let me know if this is a problem.

When using XML::Parser 2.27 and above, you can suppress expansion of parameter entity references (e.g. %pent;) in the DTD, by setting I<ParseParamEnt> to 1 and I<ExpandParamEnt> to 0. See L</Hidden Nodes> for details.

A Document has a tree structure consisting of I<Node> objects. A Node may contain other nodes, depending on its type. A Document may have Element, Text, Comment, and CDATASection nodes. Element nodes may have Attr, Element, Text, Comment, and CDATASection nodes. The other nodes may not have any child nodes.

This module adds several node types that are not part of the DOM spec (yet.) These are: ElementDecl (for <!ELEMENT ...> declarations), AttlistDecl (for <!ATTLIST ...> declarations), XMLDecl (for <?xml ...?> declarations) and AttDef (for attribute definitions in an AttlistDecl.)
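
The parse, examine, modify and serialize cycle described in this section can be sketched as follows. This is only a minimal illustration, assuming XML::DOM is installed; the sample document, the C<checked> attribute and the example.org URLs are made up for the example and are not part of the module.

```perl
use strict;
use warnings;
use XML::DOM;

# Hypothetical sample input; element names, attribute names and URLs
# are illustrative only.
my $xml = <<'XML';
<?xml version="1.0"?>
<root>
  <CODEBASE HREF="http://example.org/a"/>
  <CODEBASE HREF="http://example.org/b"/>
</root>
XML

my $parser = XML::DOM::Parser->new;
my $doc    = $parser->parse ($xml);      # parse a string instead of a file

# Examine: collect the HREF attribute of every CODEBASE element.
my @hrefs;
my $nodes = $doc->getElementsByTagName ("CODEBASE");
for my $i (0 .. $nodes->getLength - 1)
{
    my $node = $nodes->item ($i);
    push @hrefs, $node->getAttribute ("HREF");
}
print "$_\n" for @hrefs;

# Modify: add an attribute to the root element.
$doc->getDocumentElement->setAttribute ("checked", "yes");

# Serialize: convert the modified tree back to a string.
my $out = $doc->toString;

# Break circular references so the tree can be garbage collected.
$doc->dispose;
```

Parsing from a string keeps the sketch self-contained; parsefile works the same way for a file name.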
=head1 XML::DOM Classes

The XML::DOM module stores XML documents in a tree structure with a root node of type XML::DOM::Document. Different nodes in the tree represent different parts of the XML file.

The DOM Level 1 Specification defines the following node types:

=over 4

=item * L<XML::DOM::Node> - Super class of all node types

=item * L<XML::DOM::Document> - The root of the XML document

=item * L<XML::DOM::DocumentType> - Describes the document structure: <!DOCTYPE root [ ... ]>

=item * L<XML::DOM::Element> - An XML element: <elem attr="val"> ... </elem>

=item * L<XML::DOM::Attr> - An XML element attribute: name="value"

=item * L<XML::DOM::CharacterData> - Super class of Text, Comment and CDATASection

=item * L<XML::DOM::Text> - Text in an XML element

=item * L<XML::DOM::CDATASection> - Escaped block of text: <![CDATA[ text ]]>

=item * L<XML::DOM::Comment> - An XML comment: <!-- comment -->

=item * L<XML::DOM::EntityReference> - Refers to an ENTITY: &ent; or %ent;

=item * L<XML::DOM::Entity> - An ENTITY definition: <!ENTITY ...>

=item * L<XML::DOM::ProcessingInstruction> - A processing instruction: <?target data?>

=item * L<XML::DOM::DocumentFragment> - Lightweight node for cut & paste

=item * L<XML::DOM::Notation> - A NOTATION definition: <!NOTATION ...>

=back

In addition, the XML::DOM module contains the following nodes that are not part of the DOM Level 1 Specification:

=over 4

=item * L<XML::DOM::ElementDecl> - Defines an element: <!ELEMENT ...>

=item * L<XML::DOM::AttlistDecl> - Defines one or more attributes in an <!ATTLIST ...>

=item * L<XML::DOM::AttDef> - Defines one attribute in an <!ATTLIST ...>

=item * L<XML::DOM::XMLDecl> - An XML declaration: <?xml version="1.0" ...?>

=back

Other classes that are part of the DOM Level 1 Spec:

=over 4

=item * L<XML::DOM::DOMImplementation> - Provides information about this implementation. Currently it doesn't do much.

=item * L<XML::DOM::NodeList> - Used internally to store a node's child nodes. Also returned by getElementsByTagName.

=item * L<XML::DOM::NamedNodeMap> - Used internally to store an element's attributes.

=back

Other classes that are not part of the DOM Level 1 Spec:

=over 4

=item * L<XML::DOM::Parser> - A non-validating XML parser that creates XML::DOM::Documents

=item * L<XML::DOM::ValParser> - A validating XML parser that creates XML::DOM::Documents. It uses L<XML::Checker> to check against the DocumentType (DTD)

=item * L<XML::Handler::BuildDOM> - A PerlSAX handler that creates XML::DOM::Documents.

=back

=head1 XML::DOM package

=over 4

=item Constant definitions

The following predefined constants indicate which type of node it is.

=back

UNKNOWN_NODE (0) The node type is unknown (not part of DOM) ELEMENT_NODE (1) The node is an Element.
ATTRIBUTE_NODE (2) The node is an Attr. TEXT_NODE (3) The node is a Text node. CDATA_SECTION_NODE (4) The node is a CDATASection. ENTITY_REFERENCE_NODE (5) The node is an EntityReference. ENTITY_NODE (6) The node is an Entity. PROCESSING_INSTRUCTION_NODE (7) The node is a ProcessingInstruction. COMMENT_NODE (8) The node is a Comment. DOCUMENT_NODE (9) The node is a Document. DOCUMENT_TYPE_NODE (10) The node is a DocumentType. DOCUMENT_FRAGMENT_NODE (11) The node is a DocumentFragment. NOTATION_NODE (12) The node is a Notation. ELEMENT_DECL_NODE (13) The node is an ElementDecl (not part of DOM) ATT_DEF_NODE (14) The node is an AttDef (not part of DOM) XML_DECL_NODE (15) The node is an XMLDecl (not part of DOM) ATTLIST_DECL_NODE (16) The node is an AttlistDecl (not part of DOM) Usage: if ($node->getNodeType == ELEMENT_NODE) { print "It's an Element"; } B: The DOM Spec does not mention UNKNOWN_NODE and, quite frankly, you should never encounter it. The last 4 node types were added to support the 4 added node classes. =head2 Global Variables =over 4 =item $VERSION The variable $XML::DOM::VERSION contains the version number of this implementation, e.g. "1.43". =back =head2 METHODS These methods are not part of the DOM Level 1 Specification. =over 4 =item getIgnoreReadOnly and ignoreReadOnly (readOnly) The DOM Level 1 Spec does not allow you to edit certain sections of the document, e.g. the DocumentType, so by default this implementation throws DOMExceptions (i.e. NO_MODIFICATION_ALLOWED_ERR) when you try to edit a readonly node. These readonly checks can be disabled by (temporarily) setting the global IgnoreReadOnly flag. The ignoreReadOnly method sets the global IgnoreReadOnly flag and returns its previous value. The getIgnoreReadOnly method simply returns its current value. my $oldIgnore = XML::DOM::ignoreReadOnly (1); eval { ... do whatever you want, catching any other exceptions ... 
 };
 XML::DOM::ignoreReadOnly ($oldIgnore);     # restore previous value

Another way to do it, using a local variable:

 { # start new scope
    local $XML::DOM::IgnoreReadOnly = 1;
    ... do whatever you want, don't worry about exceptions ...
 } # end of scope ($IgnoreReadOnly is set back to its previous value)

=item isValidName (name)

Returns whether the specified name is a valid "Name" as specified in the XML spec. Characters with Unicode values > 127 are now also supported.

=item getAllowReservedNames and allowReservedNames (boolean)

The first method returns whether reserved names are allowed. The second takes a boolean argument and sets whether reserved names are allowed. The initial value is 1 (i.e. allow reserved names.)

The XML spec states that "Names" starting with (X|x)(M|m)(L|l) are reserved for future use. (Amusingly enough, the XML version of the XML spec (REC-xml-19980210.xml) breaks that very rule by defining an ENTITY with the name 'xmlpio'.) A "Name" in this context means the Name token as found in the BNF rules in the XML spec.

XML::DOM only checks for errors when you modify the DOM tree, not when the DOM tree is built by the XML::DOM::Parser.

=item setTagCompression (funcref)

There are 3 possible styles for printing empty Element tags:

=over 4

=item Style 0

 <empty/> or <empty attr="val"/>

XML::DOM uses this style by default for all Elements.

=item Style 1

 <empty></empty> or <empty attr="val"></empty>

=item Style 2

 <empty /> or <empty attr="val" />

This style is sometimes desired when using XHTML. (Note the extra space before the slash "/") See the XHTML 1.0 specification, Appendix C, for more details.

=back

By default XML::DOM compresses all empty Element tags (style 0.) You can control which style is used for a particular Element by calling XML::DOM::setTagCompression with a reference to a function that takes 2 arguments. The first is the tag name of the Element, the second is the XML::DOM::Element that is being printed. The function should return 0, 1 or 2 to indicate which style should be used to print the empty tag. E.g.

 XML::DOM::setTagCompression (\&my_tag_compression);

 sub my_tag_compression
 {
    my ($tag, $elem) = @_;

    # Print empty br, hr and img tags like this: <br />
    return 2 if $tag =~ /^(br|hr|img)$/;

    # Print other empty tags like this: <empty></empty>
    return 1;
 }

=back

=head1 IMPLEMENTATION DETAILS

=over 4

=item * Perl Mappings

The value undef is used wherever the DOM Spec says null.

The DOM Spec says: Applications must encode DOMString using UTF-16 (defined in Appendix C.3 of [UNICODE] and Amendment 1 of [ISO-10646]). In this implementation we use plain old Perl strings encoded in UTF-8 instead of UTF-16.

=item * Text and CDATASection nodes

The Expat parser expands EntityReferences and CDATASection sections to raw strings and does not indicate where they were found. This implementation therefore converts both to Text nodes at parse time. CDATASection and EntityReference nodes that are added to an existing Document (by the user) will be preserved.

Also, consecutive Text nodes are always merged at parse time. Text nodes that are added later can be merged with the normalize method. Consider using the addText method when adding Text nodes.

=item * Printing and toString

When printing (and converting an XML Document to a string) the strings have to be encoded differently depending on where they occur. E.g. in a CDATASection all substrings are allowed except for "]]>". In regular text, certain characters are not allowed, e.g. ">" has to be converted to "&gt;". These routines should be verified by someone who knows the details.

=item * Quotes

Certain sections in XML are quoted, like attribute values in an Element. XML::Parser strips these quotes and the print methods in this implementation always use double quotes, so when parsing and printing a document, single quotes may be converted to double quotes. The default value of an attribute definition (AttDef) in an AttlistDecl, however, will maintain its quotes.

=item * AttlistDecl

Attribute declarations for a certain Element are always merged into a single AttlistDecl object.

=item * Comments

Comments in the DOCTYPE section are not kept in the right place. They will become child nodes of the Document.
=item * Hidden Nodes Previous versions of XML::DOM would expand parameter entity references (like B<%pent;>), so when printing the DTD, it would print the contents of the external entity, instead of the parameter entity reference. With this release (1.27), you can prevent this by setting the XML::DOM::Parser options ParseParamEnt => 1 and ExpandParamEnt => 0. When it is parsing the contents of the external entities, it *DOES* still add the nodes to the DocumentType, but it marks these nodes by setting the 'Hidden' property. In addition, it adds an EntityReference node to the DocumentType node. When printing the DocumentType node (or when using to_expat() or to_sax()), the 'Hidden' nodes are suppressed, so you will see the parameter entity reference instead of the contents of the external entities. See test case t/dom_extent.t for an example. The reason for adding the 'Hidden' nodes to the DocumentType node, is that the nodes may contain definitions that are referenced further in the document. (Simply not adding the nodes to the DocumentType could cause such entity references to be expanded incorrectly.) Note that you need XML::Parser 2.27 or higher for this to work correctly. =back =head1 SEE ALSO L The Japanese version of this document by Takanori Kawai (Hippo2000) at L The DOM Level 1 specification at L The XML spec (Extensible Markup Language 1.0) at L The L and L manual pages. L also provides a DOM Parser, and is significantly faster than XML::DOM, and is under active development. It requires that you download the Gnome libxml library. L will provide the DOM Level 2 Core API, and should be as fast as XML::LibXML, but more robust, since it uses the memory management functions of libgdome. For more details see L =head1 CAVEATS The method getElementsByTagName() does not return a "live" NodeList. Whether this is an actual caveat is debatable, but a few people on the www-dom mailing list seemed to think so. I haven't decided yet. 
It's a pain to implement, it slows things down and the benefits seem marginal. Let me know what you think.

=head1 AUTHOR

Enno Derksen is the original author.

Send patches to T.J. Mather at >.

Paid support is available directly from the maintainers of this package. Please see L for more details.

Thanks to Clark Cooper for his help with the initial version.

=cut

package XML::Handler::BuildDOM;

use strict;
use XML::DOM;

#
# TODO:
# - add support for parameter entity references
# - expand API: insert Elements in the tree or stuff into DocType etc.

sub new
{
    my ($class, %args) = @_;
    bless \%args, $class;
}

#-------- PerlSAX Handler methods ------------------------------

sub start_document # was Init
{
    my $self = shift;

    # Define Document if it's not set & not obtainable from Element or DocType
    $self->{Document} ||=
        (defined $self->{Element} ? $self->{Element}->getOwnerDocument : undef)
     || (defined $self->{DocType} ? $self->{DocType}->getOwnerDocument : undef)
     || new XML::DOM::Document();

    $self->{Element} ||= $self->{Document};

    unless (defined $self->{DocType})
    {
        $self->{DocType} = $self->{Document}->getDoctype
            if defined $self->{Document};

        # Only create a new DocumentType if the Document did not have one.
        # (The key is 'DocType'; the original test used 'Doctype', which
        # was never set, so an existing doctype was always replaced.)
        unless (defined $self->{DocType})
        {
            #?? should be $doc->createDocType for extensibility!
            $self->{DocType} = new XML::DOM::DocumentType ($self->{Document});
            $self->{Document}->setDoctype ($self->{DocType});
        }
    }

    # Prepare for document prolog
    $self->{InProlog} = 1;

    # We haven't passed the root element yet
    $self->{EndDoc} = 0;

    undef $self->{LastText};
}

sub end_document # was Final
{
    my $self = shift;
    unless ($self->{SawDocType})
    {
        my $doctype = $self->{Document}->removeDoctype;
        $doctype->dispose;
        #?? do we always want to destroy the Doctype?
    }
    $self->{Document};
}

sub characters # was Char
{
    my $self = $_[0];
    my $str = $_[1]->{Data};

    if ($self->{InCDATA} && $self->{KeepCDATA})
    {
        undef $self->{LastText};
        # Merge text with previous node if possible
        $self->{Element}->addCDATA ($str);
    }
    else
    {
        # Merge text with previous node if possible
        # Used to be: $expat->{DOM_Element}->addText ($str);
        if ($self->{LastText})
        {
            $self->{LastText}->appendData ($str);
        }
        else
        {
            $self->{LastText} = $self->{Document}->createTextNode ($str);
            $self->{Element}->appendChild ($self->{LastText});
        }
    }
}

sub start_element # was Start
{
    my ($self, $hash) = @_;
    my $elem = $hash->{Name};
    my $attr = $hash->{Attributes};

    my $parent = $self->{Element};
    my $doc = $self->{Document};

    if ($parent == $doc)
    {
        # End of document prolog, i.e. start of first Element
        $self->{InProlog} = 0;
    }

    undef $self->{LastText};
    my $node = $doc->createElement ($elem);
    $self->{Element} = $node;
    $parent->appendChild ($node);

    my $i = 0;
    my $n = scalar keys %$attr;
    return unless $n;

    if (exists $hash->{AttributeOrder})
    {
        my $defaulted = $hash->{Defaulted};
        my @order = @{ $hash->{AttributeOrder} };

        # Specified attributes
        for (my $i = 0; $i < $defaulted; $i++)
        {
            my $a = $order[$i];
            my $att = $doc->createAttribute ($a, $attr->{$a}, 1);
            $node->setAttributeNode ($att);
        }

        # Defaulted attributes
        for (my $i = $defaulted; $i < @order; $i++)
        {
            my $a = $order[$i];
            # Use the attribute name here, not the element name
            my $att = $doc->createAttribute ($a, $attr->{$a}, 0);
            $node->setAttributeNode ($att);
        }
    }
    else
    {
        # We're assuming that all attributes were specified (1)
        for my $a (keys %$attr)
        {
            my $att = $doc->createAttribute ($a, $attr->{$a}, 1);
            $node->setAttributeNode ($att);
        }
    }
}

sub end_element
{
    my $self = shift;
    $self->{Element} = $self->{Element}->getParentNode;
    undef $self->{LastText};

    # Check for end of root element
    $self->{EndDoc} = 1 if ($self->{Element} == $self->{Document});
}

sub entity_reference # was Default
{
    my $self = $_[0];
    my $name = $_[1]->{Name};

    $self->{Element}->appendChild (
                            $self->{Document}->createEntityReference ($name));
    undef $self->{LastText};
}

sub start_cdata
{
    my $self = shift;
    $self->{InCDATA} = 1;
}

sub end_cdata
{
    my $self = shift;
    $self->{InCDATA} = 0;
}

sub comment
{
    my $self = $_[0];

    local $XML::DOM::IgnoreReadOnly = 1;

    undef $self->{LastText};
    my $comment = $self->{Document}->createComment ($_[1]->{Data});
    $self->{Element}->appendChild ($comment);
}

sub doctype_decl
{
    my ($self, $hash) = @_;

    $self->{DocType}->setParams ($hash->{Name}, $hash->{SystemId},
                                 $hash->{PublicId}, $hash->{Internal});
    $self->{SawDocType} = 1;
}

sub attlist_decl
{
    my ($self, $hash) = @_;

    local $XML::DOM::IgnoreReadOnly = 1;

    $self->{DocType}->addAttDef ($hash->{ElementName},
                                 $hash->{AttributeName},
                                 $hash->{Type},
                                 $hash->{Default},
                                 $hash->{Fixed});
}

sub xml_decl
{
    my ($self, $hash) = @_;

    local $XML::DOM::IgnoreReadOnly = 1;

    undef $self->{LastText};
    $self->{Document}->setXMLDecl (new XML::DOM::XMLDecl ($self->{Document},
                                                          $hash->{Version},
                                                          $hash->{Encoding},
                                                          $hash->{Standalone}));
}

sub entity_decl
{
    my ($self, $hash) = @_;

    local $XML::DOM::IgnoreReadOnly = 1;

    # Parameter Entities names are passed starting with '%'
    my $parameter = 0;

#?? parameter entities currently not supported by PerlSAX!
    undef $self->{LastText};
    $self->{DocType}->addEntity ($parameter, $hash->{Name}, $hash->{Value},
                                 $hash->{SystemId}, $hash->{PublicId},
                                 $hash->{Notation});
}

# Unparsed is called when it encounters e.g:
#
#
#
sub unparsed_decl
{
    my ($self, $hash) = @_;

    local $XML::DOM::IgnoreReadOnly = 1;

    # same as regular ENTITY, as far as DOM is concerned
    $self->entity_decl ($hash);
}

sub element_decl
{
    my ($self, $hash) = @_;

    local $XML::DOM::IgnoreReadOnly = 1;

    undef $self->{LastText};
    $self->{DocType}->addElementDecl ($hash->{Name}, $hash->{Model});
}

sub notation_decl
{
    my ($self, $hash) = @_;

    local $XML::DOM::IgnoreReadOnly = 1;

    undef $self->{LastText};
    $self->{DocType}->addNotation ($hash->{Name}, $hash->{Base},
                                   $hash->{SystemId}, $hash->{PublicId});
}

sub processing_instruction
{
    my ($self, $hash) = @_;

    local $XML::DOM::IgnoreReadOnly = 1;

    undef $self->{LastText};
    $self->{Element}->appendChild (new XML::DOM::ProcessingInstruction
                            ($self->{Document}, $hash->{Target}, $hash->{Data}));
}

return 1;

__END__

=head1 NAME

XML::Handler::BuildDOM - PerlSAX handler that creates XML::DOM document structures

=head1 SYNOPSIS

 use XML::Handler::BuildDOM;
 use XML::Parser::PerlSAX;

 my $handler = new XML::Handler::BuildDOM (KeepCDATA => 1);
 my $parser = new XML::Parser::PerlSAX (Handler => $handler);

 my $doc = $parser->parsefile ("file.xml");

=head1 DESCRIPTION

XML::Handler::BuildDOM creates L document structures (i.e. L)
from PerlSAX events.

This class used to be called L prior to libxml-enno 1.0.1.

=head2 CONSTRUCTOR OPTIONS

The XML::Handler::BuildDOM constructor supports the following options:

=over 4

=item * KeepCDATA => 1

If set to 0 (default), CDATASections will be converted to regular text.

=item * Document => $doc

If undefined, start_document will extract it from Element or DocType (if set),
otherwise it will create a new XML::DOM::Document.

=item * Element => $elem

If undefined, it is set to Document.
This will be the insertion point (or parent) for the nodes defined by the
following callbacks.

=item * DocType => $doctype

If undefined, start_document will extract it from Document (if possible).
Otherwise it adds a new XML::DOM::DocumentType to the Document.

=back
XML-DOM-1.44/t/0000755000076400007640000000000010271306205013204 5ustar tjmathertjmather
XML-DOM-1.44/t/dom_jp_minus.t0000644000076400007640000000166207554171436016101 0ustar tjmathertjmather
BEGIN {print "1..2\n";}
END {print "not ok 1\n" unless $loaded;}
use XML::DOM;
use utf8;
$loaded = 1;
print "ok 1\n";

my $test = 1;
sub assert_ok
{
    my $ok = shift;
    print "not " unless $ok;
    ++$test;
    print "ok $test\n";
    $ok;
}

#Test 2

my $str = < Hello children Hello Chef Whoowhoo whoo Shut up you loser Cartman, you fat ass
END

# This example has attribute names with "-" (non alphanumerics)
# A previous bug caused attributes with non-alphanumeric names to always
# be interpreted as default attribute values. When printing out the document
# they would not be printed, because default attributes aren't printed.

my $parser = new XML::DOM::Parser;
my $doc = $parser->parse ($str);
my $out = $doc->toString;
$out =~ tr/\012/\n/;
assert_ok ($out eq $str);
XML-DOM-1.44/t/dom_noexpand.t0000644000076400007640000000316007457156363016070 0ustar tjmathertjmather
use strict;
use Test;

# check the behavior of accessing a text node that contains
# an entity. The value should be the entity name when
# NoExpand => 1.
my $loaded;
BEGIN { $| = 1; plan tests => 3; }
END { ok(0) unless $loaded; }
require XML::DOM;
$loaded = 1;
ok(1);

# set up
my $parser = getParser();
my $xml_string = < ]> some regular text data &myEntityWithValue;
XML

## TEST ##
# parse
my $doc = $parser->parse($xml_string);
my $root = $doc->getDocumentElement();
my @testNodes = getElementsByTagName($root, 'test1');
my $i = 0;
my @expected = ('some regular text data','&myEntityWithValue;');
foreach my $testNode (@testNodes) {
    # print it out
    $testNode->normalize;
    foreach my $child ($testNode->getChildNodes) {
        ok($child->getData, $expected[$i++]);
        # print STDERR "Test1 ==> text child of
        # NODE:",$testNode->getAttribute('name')," has value: ", $child->getData, "\n";
    }
}

exit 0;

sub getParser {
    my ($my_string) = @_;
    my %options = (
        NoExpand => 1,
        ParseParamEnt => 0,
    );
    my $parser = new XML::DOM::Parser(%options);
}

# convenience method to return a list rather than nodeList
sub getElementsByTagName {
    my ($node, $tag) = @_;
    my @list;
    my $nodes = $node->getElementsByTagName($tag);
    my $numOfNodes= $nodes->getLength();
    for (my $i=0; $i< $numOfNodes; $i++) {
        push @list, $nodes->item($i);
    }
    return @list;
}
XML-DOM-1.44/t/dom_template.t0000644000076400007640000000040606777462737016100 0ustar tjmathertjmather
BEGIN {print "1..2\n";}
END {print "not ok 1\n" unless $loaded;}
use XML::DOM;
$loaded = 1;
print "ok 1\n";

my $test = 1;
sub assert_ok
{
    my $ok = shift;
    print "not " unless $ok;
    ++$test;
    print "ok $test\n";
    $ok;
}

#Test 2
print "ok 2\n";
XML-DOM-1.44/t/dom_jp_cdata.t0000644000076400007640000000204107554171630016006 0ustar tjmathertjmather
BEGIN {print "1..3\n";}
END {print "not ok 1\n" unless $loaded;}
use XML::DOM;
use CheckAncestors;
use CmpDOM;
use utf8;
$loaded = 1;
print "ok 1\n";

my $test = 1;
sub assert_ok
{
    my $ok = shift;
    print "not " unless $ok;
    ++$test;
    print "ok $test\n";
    $ok;
}

#Test 2

my $str = < <テキスト> <タグ>
END

my $oldStr = < <テキスト> <タグ> タグの認識対象としたくないテキストデータ
END

# Keep CDATASections
intact. Without this option set (default), it will convert # CDATASections to Text nodes. The KeepCDATA option is only supported # with XML::Parser versions 2.19 and up. my $parser = new XML::DOM::Parser (KeepCDATA => 1); my $doc = $parser->parse ($str); assert_ok (not $@); my $out = $doc->toString; $out =~ tr/\012/\n/; my $result = ($XML::Parser::VERSION >= 2.19) ? $str : $oldStr; assert_ok ($out eq $result); XML-DOM-1.44/t/dom_jp_modify.t0000644000076400007640000001072607554177247016244 0ustar tjmathertjmatherBEGIN {print "1..16\n";} END {print "not ok 1\n" unless $loaded;} use XML::DOM; use CheckAncestors; use utf8; $loaded = 1; print "ok 1\n"; my $test = 1; sub assert_ok { my $ok = shift; print "not " unless $ok; ++$test; print "ok $test\n"; $ok; } sub charRef2U8{ my $charRef = shift; my $u8; $charRef = pack("H*",sprintf("%x",$charRef)); for (my $iLen = 0;$charRef ne "";$charRef = substr($charRef,$iLen)){ if($charRef =~ /^\x00([\x00-\x7F])/){ $iLen = 2; $u8 .= $1; }elsif($charRef =~ /^\x00([\x80-\xFF])/){ $iLen = 2; $u8 .= pack("v@", (ord("\xC0")| ((ord($1) & 192) >> 6))); $u8 .= pack("v@",(ord("\x80")| (ord($2) & 63))); }elsif($charRef =~ /^([\x01-\x07])([\x00-\xFF])/){ $iLen = 2; $u8 .= pack("v@", (ord("\xC0")| ((ord($1) & 7) << 2) | ((ord($2) & 192) >> 6))); $u8 .= pack("v@",(ord("\x80")| (ord($2) & 63))); }elsif($charRef =~ /^([\x08-\xD7])([\x00-\xFF])/){ $iLen = 2; $u8 .= pack("v@",(ord("\xE0") | ((ord($1) & 240) >> 4))); $u8 .= pack("v@",(ord("\x80") | ((ord($1) & 15) << 2) | ((ord($2) & 192) >> 6))); $u8 .= pack("v@",(ord("\x80")| (ord($2) & 63))); }elsif($charRef =~ /^([\xD8-\xDB])([\x00-\xFF])([\xDC-\xDF])([\x00-\xFF])/){ $iLen = 4; $u8 .= pack("v@",(ord("\xF4") |ord($1) & 3)); $u8 .= pack("v@",(ord("\x80") |((ord($2) & 252)>> 2))); $u8 .= pack("v@",(ord("\x80") | ((ord($2) & 3) << 4) | ((ord($3) & 3) << 2) | ((ord($4) & 192) >> 6))); $u8 .= pack("v@",(ord("\x80") | (ord($4) & 63))); }elsif($charRef =~ /^([\xE0-\xFF])([\x00-\xFF])/){ $iLen = 2; $u8 .= 
pack("v@",(ord("\xE0") | ((ord($1) & 240) >> 4))); $u8 .= pack("v@",(ord("\x80") | ((ord($1) & 15) << 2) | ((ord($2) & 192) >> 6))); $u8 .= pack("v@",(ord("\x80")| (ord($2) & 63))); }else{ die "can\'t convert!\n"; } } return $u8; } #Test 2 my $str = < <シェフ> ãŠãŠã£ã™ã€ã¿ã‚“㪠<å­ä¾›é”> ã“ã‚“ã¡ã¯ シェフ <ケニー> ウォワォワー <カートマン> ã ã¾ã‚Œè² ã‘犬 <カイル> カートマンã€ã§ã‹ã‘ã¤ã… END my $parser = new XML::DOM::Parser; my $doc = $parser->parse ($str); my $chef = $doc->getElementsByTagName ("シェフ")->item(0); my $kenny = $doc->getElementsByTagName ("ケニー")->item(0); my $children = $doc->getElementsByTagName ("å­ä¾›é”")->item(0); my $stan = $doc->createElement ("スタン"); $children->appendChild ($stan); my $snap1 =$doc->toString; my $stanlist = $doc->getElementsByTagName ("スタン"); assert_ok ($stanlist->getLength == 1); $children->appendChild ($stan); $stanlist = $doc->getElementsByTagName ("スタン"); assert_ok ($stanlist->getLength == 1); my $snap2 = $doc->toString; assert_ok ($snap1 eq $snap2); # can't add Attr node directly to Element my $attr = $doc->createAttribute ("ãŠã„", "ã¦ã‚ãˆ"); eval { $kenny->appendChild ($attr); }; assert_ok ($@); $kenny->appendChild ($stan); assert_ok ($kenny == $stan->getParentNode); # force hierarchy exception eval { $stan->appendChild ($kenny); }; assert_ok ($@); # force hierarchy exception eval { $stan->appendChild ($stan); }; assert_ok ($@); my $frag = $doc->createDocumentFragment; $frag->appendChild ($stan); $frag->appendChild ($kenny); $chef->appendChild ($frag); assert_ok ($frag->getElementsByTagName ("*")->getLength == 0); assert_ok (not defined $frag->getParentNode); my $kenny2 = $chef->removeChild ($kenny); assert_ok ($kenny == $kenny2); assert_ok (!defined $kenny->getParentNode); # force exception - can't have 2 element nodes in a document eval { $doc->appendChild ($kenny); }; assert_ok ($@); $doc->getDocumentElement->appendChild ($kenny); $kenny2 = $doc->getDocumentElement->replaceChild ($stan, $kenny); assert_ok ($kenny == $kenny2); 
$doc->getDocumentElement->appendChild ($kenny); assert_ok (CheckAncestors::doit ($doc)); $str = $doc->toString; $str =~ tr/\012/\n/; $str =~ s/(\&\#(\d+);)/sprintf("%s",charRef2U8($2))/eg; my $end = < <シェフ> ãŠãŠã£ã™ã€ã¿ã‚“㪠<å­ä¾›é”> ã“ã‚“ã¡ã¯ シェフ <カートマン> ã ã¾ã‚Œè² ã‘犬 <カイル> カートマンã€ã§ã‹ã‘ã¤ã… <スタン/><ケニー> ウォワォワー END assert_ok ($str eq $end); XML-DOM-1.44/t/dom_text.t0000644000076400007640000000135707415471274015241 0ustar tjmathertjmatheruse strict; use Test; my $loaded; BEGIN { $| = 1; plan tests => 5; } END { ok(0) unless $loaded; } require XML::DOM; $loaded = 1; ok(1); my $str = qq[This is a simple test for XML::DOM::Text.]; # test 1 -- check for correct parsing of input string my $parser = new XML::DOM::Parser; my $doc = eval { $parser->parse($str); }; ok((not $@) && defined $doc); # test 2 -- check for working splitText function # eval it because in splitText was a bug which kills perl my $text = $doc->getDocumentElement()->getFirstChild(); my $new_node = $text->splitText(10); ok($text->getNodeValue, 'This is a '); ok($new_node->getNodeValue, 'simple test for XML::DOM::Text.'); ok($text->getNextSibling, $new_node); XML-DOM-1.44/t/dom_cdata.t0000644000076400007640000000171506777462734015342 0ustar tjmathertjmatherBEGIN {print "1..3\n";} END {print "not ok 1\n" unless $loaded;} use XML::DOM; use CheckAncestors; use CmpDOM; $loaded = 1; print "ok 1\n"; my $test = 1; sub assert_ok { my $ok = shift; print "not " unless $ok; ++$test; print "ok $test\n"; $ok; } #Test 2 my $str = < END my $oldStr = < Trenton Literary Review Honorable Mention END # Keep CDATASections intact. Without this option set (default), it will convert # CDATASections to Text nodes. The KeepCDATA option is only supported # with XML::Parser versions 2.19 and up. my $parser = new XML::DOM::Parser (KeepCDATA => 1); my $doc = $parser->parse ($str); assert_ok (not $@); my $out = $doc->toString; $out =~ tr/\012/\n/; my $result = ($XML::Parser::VERSION >= 2.19) ? 
$str : $oldStr; assert_ok ($out eq $result); XML-DOM-1.44/t/dom_jp_attr.t0000644000076400007640000001105107554171566015715 0ustar tjmathertjmatherBEGIN {print "1..23\n";} END {print "not ok 1\n" unless $loaded;} use XML::DOM; use CheckAncestors; use CmpDOM; use utf8; $loaded = 1; print "ok 1\n"; my $test = 1; sub assert_ok { my $ok = shift; print "not " unless $ok; ++$test; print "ok $test\n"; $ok; } sub charRef2U8{ my $charRef = shift; my $u8; $charRef = pack("H*",sprintf("%x",$charRef)); for (my $iLen = 0;$charRef ne "";$charRef = substr($charRef,$iLen)){ if($charRef =~ /^\x00([\x00-\x7F])/){ $iLen = 2; $u8 .= $1; }elsif($charRef =~ /^\x00([\x80-\xFF])/){ $iLen = 2; $u8 .= pack("v@", (ord("\xC0")| ((ord($1) & 192) >> 6))); $u8 .= pack("v@",(ord("\x80")| (ord($2) & 63))); }elsif($charRef =~ /^([\x01-\x07])([\x00-\xFF])/){ $iLen = 2; $u8 .= pack("v@", (ord("\xC0")| ((ord($1) & 7) << 2) | ((ord($2) & 192) >> 6))); $u8 .= pack("v@",(ord("\x80")| (ord($2) & 63))); }elsif($charRef =~ /^([\x08-\xD7])([\x00-\xFF])/){ $iLen = 2; $u8 .= pack("v@",(ord("\xE0") | ((ord($1) & 240) >> 4))); $u8 .= pack("v@",(ord("\x80") | ((ord($1) & 15) << 2) | ((ord($2) & 192) >> 6))); $u8 .= pack("v@",(ord("\x80")| (ord($2) & 63))); }elsif($charRef =~ /^([\xD8-\xDB])([\x00-\xFF])([\xDC-\xDF])([\x00-\xFF])/){ $iLen = 4; $u8 .= pack("v@",(ord("\xF4") |ord($1) & 3)); $u8 .= pack("v@",(ord("\x80") |((ord($2) & 252)>> 2))); $u8 .= pack("v@",(ord("\x80") | ((ord($2) & 3) << 4) | ((ord($3) & 3) << 2) | ((ord($4) & 192) >> 6))); $u8 .= pack("v@",(ord("\x80") | (ord($4) & 63))); }elsif($charRef =~ /^([\xE0-\xFF])([\x00-\xFF])/){ $iLen = 2; $u8 .= pack("v@",(ord("\xE0") | ((ord($1) & 240) >> 4))); $u8 .= pack("v@",(ord("\x80") | ((ord($1) & 15) << 2) | ((ord($2) & 192) >> 6))); $u8 .= pack("v@",(ord("\x80")| (ord($2) & 63))); }else{ die "can\'t convert!\n"; } } return $u8; } #Test 2 my $str = < ]> <シンプソンズ> <人物 åå‰="ホーマー" 髪="ãªã—" 性別="男性"/> <人物 åå‰="マージ" 髪="é’色" 性別="女性"/> <人物 åå‰="ãƒãƒ¼ãƒˆ" 性別="ã¾ã 
æ°—ã«ã—ãªã„"/> <人物 åå‰="リサ" 性別="全然気ã«ã—ãªã„"/> END my $parser = new XML::DOM::Parser; my $doc = $parser->parse ($str); assert_ok (not $@); my $out = $doc->toString; $out =~ tr/\012/\n/; $out =~ s/(\&\#(\d+);)/sprintf("%s",charRef2U8($2))/eg; assert_ok ($out eq $str); my $root = $doc->getDocumentElement; my $bart = $root->getElementsByTagName("人物")->item(2); assert_ok (defined $bart); my $lisa = $root->getElementsByTagName("人物")->item(3); assert_ok (defined $lisa); my $battr = $bart->getAttributes; assert_ok ($battr->getLength == 3); my $lattr = $lisa->getAttributes; assert_ok ($lattr->getLength == 3); # Use getValues in list context my @attrList = $lattr->getValues; assert_ok (@attrList == 3); my $hair = $battr->getNamedItem ("髪"); assert_ok ($hair->getValue eq "黄色"); assert_ok (not $hair->isSpecified); my $hair2 = $bart->removeAttributeNode ($hair); # we're not returning default attribute nodes assert_ok (not defined $hair2); # check if hair is still defaulted $hair2 = $battr->getNamedItem ("髪"); assert_ok ($hair2->getValue eq "黄色"); assert_ok (not $hair2->isSpecified); # replace default hair with pointy hair $battr->setNamedItem ($doc->createAttribute ("髪", "ã¤ã‚“ã¤ã‚“")); assert_ok ($bart->getAttribute("髪") eq "ã¤ã‚“ã¤ã‚“"); $hair2 = $battr->getNamedItem ("髪"); assert_ok ($hair2->isSpecified); # exception - can't share Attr nodes eval { $lisa->setAttributeNode ($hair2); }; assert_ok ($@); # add it again - it replaces itself $bart->setAttributeNode ($hair2); assert_ok ($battr->getLength == 3); # (cloned) hair transplant from bart to lisa $lisa->setAttributeNode ($hair2->cloneNode); $hair = $lattr->getNamedItem ("髪"); assert_ok ($hair->isSpecified); assert_ok ($hair->getValue eq "ã¤ã‚“ã¤ã‚“"); my $doc2 = $doc->cloneNode(1); my $cmp = new CmpDOM; unless (assert_ok ($doc->equals ($doc2, $cmp))) { # This shouldn't happen print "Context: ", $cmp->context, "\n"; } assert_ok ($hair->getNodeTypeName eq "ATTRIBUTE_NODE"); $bart->removeAttribute ("髪"); # check if hair is 
still defaulted $hair2 = $battr->getNamedItem ("髪"); assert_ok ($hair2->getValue eq "黄色"); assert_ok (not $hair2->isSpecified); XML-DOM-1.44/t/dom_extent.t0000644000076400007640000000212607222751404015547 0ustar tjmathertjmatherBEGIN {print "1..1\n";} END {print "not ok 1\n" unless $loaded;} use XML::DOM; $loaded = 1; print "ok 1\n"; # this test is temporary disabled because # i think i have found a bug in expat that # calls the ExternEnt handler instead of Entity for # external parameter entities exit; my $test = 1; sub assert_ok { my $ok = shift; print "not " unless $ok; ++$test; print "ok $test\n"; $ok; } # my $xml =< %globalInfo; ]> EOF # # Tell XML::Parser to parse the external entities (ParseParamEnt => 1) # Tell XML::DOM::Parser to 'hide' the contents of the external entities # so you see '%globalInfo;' when printing. my $parser = new XML::DOM::Parser( ParseParamEnt => 1, ExpandParamEnt => 0, ErrorContext => 5); my $dom = $parser->parse ($xml); my $domstr = $dom->toString; # Compare output with original file assert_ok ($domstr eq $xml); print "$domstr\n$xml\n"; XML-DOM-1.44/t/dom_minus.t0000644000076400007640000000165406777462735015424 0ustar tjmathertjmatherBEGIN {print "1..2\n";} END {print "not ok 1\n" unless $loaded;} use XML::DOM; $loaded = 1; print "ok 1\n"; my $test = 1; sub assert_ok { my $ok = shift; print "not " unless $ok; ++$test; print "ok $test\n"; $ok; } #Test 2 my $str = < Hello children Hello Chef Whoowhoo whoo Shut up you loser Cartman, you fat ass END # This example has attribute names with "-" (non alphanumerics) # A previous bug caused attributes with non-alphanumeric names to always # be interpreted as default attribute values. When printing out the document # they would not be printed, because default attributes aren't printed. 
my $parser = new XML::DOM::Parser; my $doc = $parser->parse ($str); my $out = $doc->toString; $out =~ tr/\012/\n/; assert_ok ($out eq $str); XML-DOM-1.44/t/dom_attr.t0000644000076400007640000000546707222676105015231 0ustar tjmathertjmatherBEGIN {print "1..23\n";} END {print "not ok 1\n" unless $loaded;} use XML::DOM; use CheckAncestors; use CmpDOM; $loaded = 1; print "ok 1\n"; my $test = 1; sub assert_ok { my $ok = shift; print "not " unless $ok; ++$test; print "ok $test\n"; $ok; } #Test 2 my $str = < ]> END my $parser = new XML::DOM::Parser; my $doc = $parser->parse ($str); assert_ok (not $@); my $out = $doc->toString; $out =~ tr/\012/\n/; assert_ok ($out eq $str); my $root = $doc->getDocumentElement; my $bart = $root->getElementsByTagName("person")->item(2); assert_ok (defined $bart); my $lisa = $root->getElementsByTagName("person")->item(3); assert_ok (defined $lisa); my $battr = $bart->getAttributes; assert_ok ($battr->getLength == 3); my $lattr = $lisa->getAttributes; assert_ok ($lattr->getLength == 3); # Use getValues in list context my @attrList = $lattr->getValues; assert_ok (@attrList == 3); my $hair = $battr->getNamedItem ("hair"); assert_ok ($hair->getValue eq "yellow"); assert_ok (not $hair->isSpecified); my $hair2 = $bart->removeAttributeNode ($hair); # we're not returning default attribute nodes assert_ok (not defined $hair2); # check if hair is still defaulted $hair2 = $battr->getNamedItem ("hair"); assert_ok ($hair2->getValue eq "yellow"); assert_ok (not $hair2->isSpecified); # replace default hair with pointy hair $battr->setNamedItem ($doc->createAttribute ("hair", "pointy")); assert_ok ($bart->getAttribute("hair") eq "pointy"); $hair2 = $battr->getNamedItem ("hair"); assert_ok ($hair2->isSpecified); # exception - can't share Attr nodes eval { $lisa->setAttributeNode ($hair2); }; assert_ok ($@); # add it again - it replaces itself $bart->setAttributeNode ($hair2); assert_ok ($battr->getLength == 3); # (cloned) hair transplant from bart to lisa 
$lisa->setAttributeNode ($hair2->cloneNode); $hair = $lattr->getNamedItem ("hair"); assert_ok ($hair->isSpecified); assert_ok ($hair->getValue eq "pointy"); my $doc2 = $doc->cloneNode(1); my $cmp = new CmpDOM; # (tjmather) there were problems here until I patched # XML::Parser::Dom::Element to convert Model arg to string # from XML::Parser::ContentModel unless (assert_ok ($doc->equals ($doc2, $cmp))) { # This shouldn't happen print "Context: ", $cmp->context, "\n"; } assert_ok ($hair->getNodeTypeName eq "ATTRIBUTE_NODE"); $bart->removeAttribute ("hair"); # check if hair is still defaulted $hair2 = $battr->getNamedItem ("hair"); assert_ok ($hair2->getValue eq "yellow"); assert_ok (not $hair2->isSpecified); XML-DOM-1.44/t/dom_encode.t0000644000076400007640000000111007573671010015470 0ustar tjmathertjmatherBEGIN {print "1..3\n";} END {print "not ok 1\n" unless $loaded;} use XML::DOM; use CheckAncestors; use CmpDOM; $loaded = 1; print "ok 1\n"; my $test = 1; sub assert_ok { my $ok = shift; print "not " unless $ok; ++$test; print "ok $test\n"; $ok; } #Test 2 my $str = < END my $expected = < END my $parser = new XML::DOM::Parser; my $doc = $parser->parse ($str); assert_ok (not $@); my $out = $doc->toString; assert_ok ($out eq $expected); XML-DOM-1.44/t/dom_jp_example.t0000644000076400007640000000253407554171661016400 0ustar tjmathertjmatherBEGIN {print "1..5\n";} END {print "not ok 1\n" unless $loaded;} use XML::DOM; use utf8; $loaded = 1; print "ok 1\n"; my $test = 1; sub assert_ok { my $ok = shift; print "not " unless $ok; ++$test; print "ok $test\n"; $ok; } #Test 2 my $str = < <商å“> <商å“番å·>P001 <ジャンル> <生産国>米国 <国内連絡先> <使‰€>隣ã®ã‚¢ãƒ‘ート <商å“> <商å“番å·>0002 <ジャンル> <生産国>米国 <国内連絡先> <使‰€>横須賀市 å…‰ã®ä¸˜ END my $parser = new XML::DOM::Parser; my $doc = $parser->parse ($str); assert_ok (not $@); my $error = 0; my $ckls = $doc->getElementsByTagName ("商å“"); assert_ok ($ckls->getLength == 2); for my $ckl (@$ckls) { my $cklids = $ckl->getElementsByTagName ("商å“番å·"); my $cklid = 
$cklids->[0]->getFirstChild->getData; $error++ if ($cklid ne "P001" && $cklid ne "0002"); my $countries = $ckl->getElementsByTagName ("生産国"); my $country = $countries->[0]->getFirstChild->getData; $error++ if ($country ne "米国"); } assert_ok ($error == 0); # Use getElementsByTagName in list context my @ckls = $doc->getElementsByTagName ("商å“"); assert_ok (@ckls == 2); XML-DOM-1.44/t/dom_extent.ent0000644000076400007640000000007107052615622016071 0ustar tjmathertjmather XML-DOM-1.44/t/dom_jp_print.t0000644000076400007640000000561007554312132016065 0ustar tjmathertjmatherBEGIN {print "1..3\n";} END {print "not ok 1\n" unless $loaded;} use XML::DOM; use utf8; $loaded = 1; print "ok 1\n"; #Test 2 sub charRef2U8{ my $charRef = shift; my $u8; $charRef = pack("H*",sprintf("%x",$charRef)); for (my $iLen = 0;$charRef ne "";$charRef = substr($charRef,$iLen)){ if($charRef =~ /^\x00([\x00-\x7F])/){ $iLen = 2; $u8 .= $1; }elsif($charRef =~ /^\x00([\x80-\xFF])/){ $iLen = 2; $u8 .= pack("v@", (ord("\xC0")| ((ord($1) & 192) >> 6))); $u8 .= pack("v@",(ord("\x80")| (ord($2) & 63))); }elsif($charRef =~ /^([\x01-\x07])([\x00-\xFF])/){ $iLen = 2; $u8 .= pack("v@", (ord("\xC0")| ((ord($1) & 7) << 2) | ((ord($2) & 192) >> 6))); $u8 .= pack("v@",(ord("\x80")| (ord($2) & 63))); }elsif($charRef =~ /^([\x08-\xD7])([\x00-\xFF])/){ $iLen = 2; $u8 .= pack("v@",(ord("\xE0") | ((ord($1) & 240) >> 4))); $u8 .= pack("v@",(ord("\x80") | ((ord($1) & 15) << 2) | ((ord($2) & 192) >> 6))); $u8 .= pack("v@",(ord("\x80")| (ord($2) & 63))); }elsif($charRef =~ /^([\xD8-\xDB])([\x00-\xFF])([\xDC-\xDF])([\x00-\xFF])/){ $iLen = 4; $u8 .= pack("v@",(ord("\xF4") |ord($1) & 3)); $u8 .= pack("v@",(ord("\x80") |((ord($2) & 252)>> 2))); $u8 .= pack("v@",(ord("\x80") | ((ord($2) & 3) << 4) | ((ord($3) & 3) << 2) | ((ord($4) & 192) >> 6))); $u8 .= pack("v@",(ord("\x80") | (ord($4) & 63))); }elsif($charRef =~ /^([\xE0-\xFF])([\x00-\xFF])/){ $iLen = 2; $u8 .= pack("v@",(ord("\xE0") | ((ord($1) & 240) >> 4))); $u8 .= 
pack("v@",(ord("\x80") | ((ord($1) & 15) << 2) | ((ord($2) & 192) >> 6))); $u8 .= pack("v@",(ord("\x80")| (ord($2) & 63))); }else{ die "can\'t convert!\n"; } } return $u8; } my $str = < ]> <文書> <ビーãƒã‚¹> ãŠã„ã€ãƒãƒƒãƒˆãƒ˜ãƒƒãƒ‰ï¼ <ãƒãƒƒãƒˆãƒ˜ãƒƒãƒ‰> ãªã‚“ã ã„ã€ãƒ“ーãƒã‚¹ <ビーãƒã‚¹> ãŠã¾ãˆå±ã“ã„ãŸã ã‚ &ã¯ã£; <ãƒãƒƒãƒˆãƒ˜ãƒƒãƒ‰> &ã¯ã£; ãã®ã¨ã‰ã‚Š &ã¯ã£; END my $parser = new XML::DOM::Parser (NoExpand => 1); my $doc = $parser->parse ($str); my $out = $doc->toString; $out =~ tr/\012/\n/; $out =~ s/(\&\#(\d+);)/sprintf("%s",charRef2U8($2))/eg; if ($out ne $str) { print "not "; } print "ok 2\n"; $str = $doc->getElementsByTagName("ãƒãƒƒãƒˆãƒ˜ãƒƒãƒ‰")->item(0)->toString; $str =~ tr/\012/\n/; $str =~ s/(\&\#(\d+);)/sprintf("%s",charRef2U8($2))/eg; if ($str ne "<ãƒãƒƒãƒˆãƒ˜ãƒƒãƒ‰>\nãªã‚“ã ã„ã€ãƒ“ーãƒã‚¹\n ") { print "not "; } print "ok 3\n"; XML-DOM-1.44/t/dom_documenttype.t0000644000076400007640000000044507277764523017002 0ustar tjmathertjmatherBEGIN {print "1..1\n";} END {print "not ok 1\n" unless $loaded;} use XML::DOM; $loaded = 1; my $xml = new XML::DOM::Document; $xml->setDoctype($xml->createDocumentType('Sample', 'Sample.dtd')); print "not " unless $xml->toString eq qq{\n}; print "ok 1\n"; XML-DOM-1.44/t/dom_extent.dtd0000644000076400007640000000002507222706227016056 0ustar tjmathertjmather XML-DOM-1.44/t/dom_jp_astress.t0000644000076400007640000000210107014636325016411 0ustar tjmathertjmatherBEGIN {print "1..4\n";} END {print "not ok 1\n" unless $loaded;} use XML::DOM; use CmpDOM; $loaded = 1; print "ok 1\n"; my $test = 1; sub assert_ok { my $ok = shift; print "not " unless $ok; ++$test; print "ok $test\n"; $ok; } sub filename { my $name = shift; if ((defined $^O and $^O =~ /MSWin32/i || $^O =~ /Windows_95/i || $^O =~ /Windows_NT/i) || (defined $ENV{OS} and $ENV{OS} =~ /MSWin32/i || $ENV{OS} =~ /Windows_95/i || $ENV{OS} =~ /Windows_NT/i)) { $name =~ s!/!\\!g; } elsif ((defined $^O and $^O =~ /MacOS/i) || (defined $ENV{OS} and $ENV{OS} =~ /MacOS/i)) { $name =~ s!/!:!g; 
$name = ":$name"; } $name; } # Test 2 my $parser = new XML::DOM::Parser; unless (assert_ok ($parser)) { exit; } my $doc; eval { $doc = $parser->parsefile (filename ('samples/minutes.xml')); }; assert_ok (not $@); my $doc2 = $doc->cloneNode (1); my $cmp = new CmpDOM; unless (assert_ok ($doc->equals ($doc2, $cmp))) { print $cmp->context . "\n"; } XML-DOM-1.44/t/build_dom.t0000644000076400007640000000332207221177213015335 0ustar tjmathertjmatherBEGIN {print "1..2\n";} END {print "not ok 1\n" unless $loaded;} use XML::DOM; use XML::Parser::PerlSAX; use XML::Handler::BuildDOM; #use XML::Filter::SAXT; #use XML::Handler::PrintEvents; $loaded = 1; print "ok 1\n"; my $test = 1; sub assert_ok { my $ok = shift; print "not " unless $ok; ++$test; print "ok $test\n"; $ok; } #Test 2 my $str = < ]> END my $build_dom = new XML::Handler::BuildDOM; my $parser = new XML::Parser::PerlSAX (UseAttributeOrder => 1, Handler => $build_dom); # # This commented code is for debugging. It inserts a PrintEvents handler, # so you can see what events are coming thru. # #my $build_dom = new XML::Handler::BuildDOM; #my $pr_evt = new XML::Handler::PrintEvents; #my $saxt = new XML::Filter::SAXT ({ Handler => $pr_evt }, # { Handler => $build_dom }); #my $parser = new XML::Parser::PerlSAX (UseAttributeOrder => 1, # Handler => $saxt); my $doc = $parser->parse ($str); # It throws an exception with XML::Parser 2.27: # # Can't use string ("toString; $out =~ tr/\012/\n/; print "out: $out --end\n\nstr: $str --end\n"; assert_ok ($out eq $str); XML-DOM-1.44/t/dom_print.t0000644000076400007640000000155107222677265015411 0ustar tjmathertjmatherBEGIN {print "1..3\n";} END {print "not ok 1\n" unless $loaded;} use XML::DOM; $loaded = 1; print "ok 1\n"; #Test 2 my $str = < ]> Hey Butthead! Yes, Beavis. You farted. 
&huh; &huh; Yeah &huh; END my $parser = new XML::DOM::Parser (NoExpand => 1); my $doc = $parser->parse ($str); my $out = $doc->toString; $out =~ tr/\012/\n/; if ($out ne $str) { print "not "; } print "ok 2\n"; $str = $doc->getElementsByTagName("butthead")->item(0)->toString; $str =~ tr/\012/\n/; if ($str ne "\nYes, Beavis.\n ") { print "not "; } print "ok 3\n"; XML-DOM-1.44/t/dom_astress.t0000644000076400007640000000317307222664665015743 0ustar tjmathertjmather# Before `make install' is performed this script should be runnable with # `make test'. After `make install' it should work as `perl test.pl' ######################### We start with some black magic to print on failure. # Change 1..1 below to 1..last_test_to_print . # (It may become useful if the test is moved to ./t subdirectory.) BEGIN {print "1..4\n";} END {print "not ok 1\n" unless $loaded;} use XML::DOM; use CmpDOM; $loaded = 1; print "ok 1\n"; my $test = 1; sub assert_ok { my $ok = shift; print "not " unless $ok; ++$test; print "ok $test\n"; $ok; } # Replaces the filepath separator if necessary (i.e for Macs and Windows/DOS) sub filename { my $name = shift; if ((defined $^O and $^O =~ /MSWin32/i || $^O =~ /Windows_95/i || $^O =~ /Windows_NT/i) || (defined $ENV{OS} and $ENV{OS} =~ /MSWin32/i || $ENV{OS} =~ /Windows_95/i || $ENV{OS} =~ /Windows_NT/i)) { $name =~ s!/!\\!g; } elsif ((defined $^O and $^O =~ /MacOS/i) || (defined $ENV{OS} and $ENV{OS} =~ /MacOS/i)) { $name =~ s!/!:!g; $name = ":$name"; } $name; } ######################### End of black magic. 
# Insert your test code below (better if it prints "ok 13" # (correspondingly "not ok 13") depending on the success of chunk 13 # of the test code): # Test 2 my $parser = new XML::DOM::Parser; unless (assert_ok ($parser)) { exit; } my $doc; eval { $doc = $parser->parsefile (filename ('samples/REC-xml-19980210.xml')); }; print $@; assert_ok (not $@); my $doc2 = $doc->cloneNode (1); my $cmp = new CmpDOM; unless (assert_ok ($doc->equals ($doc2, $cmp))) { print $cmp->context . "\n"; } XML-DOM-1.44/t/dom_modify.t0000644000076400007640000000472206777462736015560 0ustar tjmathertjmatherBEGIN {print "1..16\n";} END {print "not ok 1\n" unless $loaded;} use XML::DOM; use CheckAncestors; $loaded = 1; print "ok 1\n"; my $test = 1; sub assert_ok { my $ok = shift; print "not " unless $ok; ++$test; print "ok $test\n"; $ok; } #Test 2 my $str = < Hello children Hello Chef Whoowhoo whoo Shut up you loser Cartman, you fat ass END my $parser = new XML::DOM::Parser; my $doc = $parser->parse ($str); my $chef = $doc->getElementsByTagName ("chef")->item(0); my $kenny = $doc->getElementsByTagName ("kenny")->item(0); my $children = $doc->getElementsByTagName ("children")->item(0); my $stan = $doc->createElement ("stan"); $children->appendChild ($stan); my $snap1 =$doc->toString; my $stanlist = $doc->getElementsByTagName ("stan"); assert_ok ($stanlist->getLength == 1); $children->appendChild ($stan); $stanlist = $doc->getElementsByTagName ("stan"); assert_ok ($stanlist->getLength == 1); my $snap2 = $doc->toString; assert_ok ($snap1 eq $snap2); # can't add Attr node directly to Element my $attr = $doc->createAttribute ("hey", "you"); eval { $kenny->appendChild ($attr); }; assert_ok ($@); $kenny->appendChild ($stan); assert_ok ($kenny == $stan->getParentNode); # force hierarchy exception eval { $stan->appendChild ($kenny); }; assert_ok ($@); # force hierarchy exception eval { $stan->appendChild ($stan); }; assert_ok ($@); my $frag = $doc->createDocumentFragment; $frag->appendChild ($stan); 
$frag->appendChild ($kenny); $chef->appendChild ($frag); assert_ok ($frag->getElementsByTagName ("*")->getLength == 0); assert_ok (not defined $frag->getParentNode); my $kenny2 = $chef->removeChild ($kenny); assert_ok ($kenny == $kenny2); assert_ok (!defined $kenny->getParentNode); # force exception - can't have 2 element nodes in a document eval { $doc->appendChild ($kenny); }; assert_ok ($@); $doc->getDocumentElement->appendChild ($kenny); $kenny2 = $doc->getDocumentElement->replaceChild ($stan, $kenny); assert_ok ($kenny == $kenny2); $doc->getDocumentElement->appendChild ($kenny); assert_ok (CheckAncestors::doit ($doc)); $str = $doc->toString; $str =~ tr/\012/\n/; my $end = < Hello children Hello Chef Shut up you loser Cartman, you fat ass Whoowhoo whoo END assert_ok ($str eq $end); XML-DOM-1.44/t/dom_example.t0000644000076400007640000000232506777462735015720 0ustar tjmathertjmatherBEGIN {print "1..5\n";} END {print "not ok 1\n" unless $loaded;} use XML::DOM; $loaded = 1; print "ok 1\n"; my $test = 1; sub assert_ok { my $ok = shift; print "not " unless $ok; ++$test; print "ok $test\n"; $ok; } #Test 2 my $str = < P001 USA
HNLLHIWP
0002 USA
45 HOLOMOA STREET
END my $parser = new XML::DOM::Parser; my $doc = $parser->parse ($str); assert_ok (not $@); my $error = 0; my $ckls = $doc->getElementsByTagName ("CKL"); assert_ok ($ckls->getLength == 2); for my $ckl (@$ckls) { my $cklids = $ckl->getElementsByTagName ("CKLID"); my $cklid = $cklids->[0]->getFirstChild->getData; $error++ if ($cklid ne "P001" && $cklid ne "0002"); my $countries = $ckl->getElementsByTagName ("COUNTRY"); my $country = $countries->[0]->getFirstChild->getData; $error++ if ($country ne "USA"); } assert_ok ($error == 0); # Use getElementsByTagName in list context my @ckls = $doc->getElementsByTagName ("CKL"); assert_ok (@ckls == 2); XML-DOM-1.44/FAQ.xml0000644000076400007640000000677607342214320014113 0ustar tjmathertjmather This document contains answers to common questions. I need to add a lot more stuff and make sure it is valid XML. (The format may change, but since it's XML, it's easy to transform...) libxml-enno FAQ
Known Build/Make Problems

'unrecognized pod directive in ...: head3' warnings in 'make install'

These are caused by pod2man. (A bug in pod2man, IMHO.) It doesn't seem to recognize '=head3' in pod files. Ignore the warnings.

t/out/dc_attr3.err missing from distribution

Create the file. It should be empty (i.e. size 0). WinZIP sometimes doesn't add or extract a file when it has size 0. I've only seen this on Windows.
Getting Started
Element node has too many children.

NAME
--------------------------------------------------
and the perl code:
--------------------------------------------------
my $parser = new XML::DOM::Parser;
my $doc = $parser->parsefile("test.xml");
my $root = $doc->getDocumentElement();
my $i = 0;
for my $kids ($root->getChildNodes()) {
    print STDERR " Child $i is $kids\n";
    print STDERR " name ", $kids->getNodeName(), "\n";
    print STDERR " type ", $kids->getNodeType(), "\n";
    print STDERR " value ", $kids->getNodeValue(), "\n";
    $i++;
}
--------------------------------------------------
And I found that my root node has 3 children, where I thought it should have one:
1) a text node whose value is a carriage return and two spaces
2) an element node named Image with no value (in fact the value is in a text child of that element)
3) another text node whose value is a carriage return
I thought the XML root would have only one child (Image) with the value 'NAME'.

That's what an XML processor is supposed to do: all characters outside of markup, including whitespace, are reported to the application. Your DOM script should then decide what to do with the unnecessary whitespace. You can use a PerlSAX filter (like XML::Filter::DetectWS) to filter out the whitespace at parse time, before it reaches the DOM document.

How do I move (not copy) parts of one XML::DOM::Document (or Element) to another?

You can use cloneNode() to copy a subtree, or use removeChild() to cut the subtree out of document A. Then call setOwnerDocument($docB) on the subtree and use insertBefore() or appendChild() to add it to document B. (Note that setOwnerDocument is not part of the DOM Level 1 specification, so your code won't be portable to other DOM implementations.)

One problem: if you have attributes in the subtree with defaulted values (i.e. they were not specified in document A, but XML::Parser (expat) generated them because an ATTLIST declaration specified a default value for that attribute), the attributes will still point to the default values in document A. I haven't found a good solution for this problem yet, because users may want different things in different situations. Any thoughts are welcome.
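As a rough illustration of the move described in the answer above, here is a minimal sketch that cuts an element out of one document and attaches it to another using removeChild(), setOwnerDocument() and appendChild(). The two sample documents and all variable names are invented for this example; it is one possible way to do it, not the only one, and it does not address the defaulted-attribute problem mentioned above.

```perl
use strict;
use warnings;
use XML::DOM;

# Hypothetical sample documents, made up for this sketch.
my $parser = XML::DOM::Parser->new;
my $docA   = $parser->parse('<a><item id="1">move me</item><item id="2">stay</item></a>');
my $docB   = $parser->parse('<b/>');

# Cut the first <item> element out of document A.
my $item = $docA->getElementsByTagName('item')->item(0);
$item->getParentNode->removeChild($item);

# Re-home the detached subtree (XML::DOM extension, not DOM Level 1),
# then attach it under the root element of document B.
$item->setOwnerDocument($docB);
$docB->getDocumentElement->appendChild($item);

# Record the outcome before the documents are disposed.
my $left_in_a = $docA->getElementsByTagName('item')->getLength;
my $now_in_b  = $docB->getElementsByTagName('item')->getLength;
my $rehomed   = ($item->getOwnerDocument == $docB) ? 1 : 0;

print $docA->toString, "\n";
print $docB->toString, "\n";

# Break circular references so the trees can be freed.
$docA->dispose;
$docB->dispose;
```

Calling setOwnerDocument() before appendChild() matters: without it, appendChild() is expected to reject the node because it still belongs to document A.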
XML-DOM-1.44/Changes0000644000076400007640000005542210271301112014234 0ustar tjmathertjmatherChange History for XML-DOM: 1.44 (tjmather) 07/25/2005 - Only use 'use bytes' where needed (by XML::RegExp) (Gisle Aas) 1.43 (tjmather) 07/28/2003 - Fixed bug that manifests itself with XML::Parser 2.32 and greater, specify external style as 'XML::Parser::Dom' rather than just 'Dom'. (Matt Sergeant) 1.42 (tjmather) 12/05/2002 - Fixed bug where XML::DOM doesn't encode some characters in attribute values under Perl 5.8.0 (Steve Hay) - Added t/dom_encode.t test to check encoding on attribute values - Fixed warning message and use in XML::DOM::PerlSAX (Mike Castle) 1.41 (tjmather) 10/19/2002 - included XML-Parser-2.31.patch, required for XML::Parser to work with 5.8.0 unicode - use utf8 in unicode test scripts, fixes 5.8.0 test failures NOTE - you should use the utf8 pragma in your programs if you are passing utf8 to XML::DOM. - only use encodeText for Perl < 5.6.0 - replace match w/ substitution in AttDef::new, workaround for 5.8.0 unicode - replace match w/ substitution in Default handler for non-paramter entity reference, workaround for 5.8.0 unicode 1.40 (tjmather) 10/13/2002 - Fixed problem when defining user LWP object (Willems Luc) - Autodetect whether to 'use bytes' (Ed Avis) - Added dispose method to XML::DOM::Parser Synopsis (Ruben Diez) - Fixed warning message in Attr.getValue method (Christian Lizell) 1.39 (tjmather) 04/16/2002 - Deletes value if both System ID and value are defined (Brian Thomas) - Fixed bug, now TextNode->getData doesn't expand entities when NoExpand => 1, added t/dom_noexpand.t test script (Brian Thomas) 1.38 (tjmather) 04/05/2002 - Removed bin/pretty.pl, it is now in XML-Filter-Reindent - Removed return from addCDATA function to fix memory leak (Steve Hay) - Added missing _to_sax method to ProcessingInstruction class (Patrick Whittle) - Removed extranous debugging statement from ExternEnt subroutine (Jon Grov) 1.37 (tjmather) 02/15/2002 - 
parameter should be last argument of DocumentType::addEntity (Patrick Whittle) 1.36 (tjmather) 01/04/2002 - Replaced 'our' with 'my' in t/dom_text.t, to work with perl < 5.6.0 1.35 (tjmather) 10/26/2001 - Fixed bug with XML::DOM::Comment::_to_sax (Mark Pundsack) - Added test for XML::DOM::Text::splitText (Michael Guennewig) 1.34 (tjmather) 10/07/2001 - Fixed bug with XML::DOM::Text::splitText (Michael Guennewig) - The '>' character is now encoded in attribute values (Stephen Crowley) - hasFeature now is case-insensitve for name of feature and the version defaults to 1.0, in accordance with the DOM 1.0 standard. (Wolfgang Mauerer) 1.33 (tjmather) 8/29/2001 - Added use bytes pragma to XML::DOM to fix unicode problems. 1.32 (tjmather) 8/25/2001 - Separated out XML::UM, XML::Filter::* and XML::Builder::* modules into separate distributions (Idea of Matt Sergeant, as discussed on perl-xml@listserver.activestate.com) - Removed dependency on Parse::Yapp - shouldn't have been there in the first place. 1.31 (tjmather) 6/26/2001 - Added dependency check for XML::RegExp in Makefile.PL 1.30 (tjmather) 6/20/2001 - XML::RegExp, XML::XQL, and XML::Checker separated out from libxml-enno, and libxml-enno renamed to XML-DOM. libxml-enno-1.05 (tjmather) 5/14/2001 - DOM: Fixed XML/DOM.pm to include forward declaration for XML::DOM::DocumentType (Oleh Khoma and Wolfgang Gassner) libxml-enno-1.04 (tjmather) 3/20/2001 - DOM: Fixed XML::DOM::DocumentType::replaceChild to call SUPER::replaceChild instead of SUPER::appendChild (John Salmon) - DOM: Fixed XML::DOM::Text::splitText to use substr instead of (non-existant) substring and insertBefore instead of (non-existant) insertAfter (Duncan Cameron) - DOM: Fixed XML::DOM::Text::print to encode '>' and '"' (John Cope) - DOM: Added code to convert Model argument of XML::Parser::Dom::Element from XML::Parser::ContentModel to string. 
XML::Parser >= 2.28 passes a XML::Parser::ContentModel object for the model arg of the Element handler while earlier versions passed a string. Fixed cannot find equals method in XML::Parser::ContentModel in dom_extent.t. - DOM: Updated XML::DOM::Entity and XML::Parser::Dom::Entity to reflect new Entity handler API in XML::Parser >= 2.28. There is a new isParam parameter and the name no longer starts with '%' if it is a parameter entity. - DOM: Fixed errors in test cases t/build_dom.t t/dom_attr.t by changing hair (none | blue | yellow) "yellow" to hair (none|blue|yellow) 'yellow' Also fixed t/dom_jp_attr by changing equivalent japanese text. - DOM: Fixed errors in test cases t/dom_print.t and t/dom_jp_print.t by changing to - DOM: Fixed error in test 3 of t/dom_jp_attr.t under Perl 5.6.0 by changing $FILE->print("$name $type") in XML::DOM::AttDef::print. libxml-enno-1.02 (enno) 3/2/2000 - This release fixes some installation related stuff. - Changed =head3 pod directives to =head2 in XML/Checker.pm This used to cause warnings when generating the man pages with pod2man. - Changed dependency of XML::Parser::PerlSAX to require version 0.07. libxml-perl 0.06 had a bad version number, causing a warning when doing 'make'. - Removed the libxml-enno.ppd file from the distribution. As Matt Sergeant pointed out, these PPD files are platform dependant and you can generate them yourselves with 'make ppd'. If you still need one, try Simon Oliver's website (see below.) libxml-enno-1.01 (enno) 2/17/2000 - This release contains XML::DOM 1.27, XML::XQL 0.63 and XML::Checker 0.09. - Added FAQ.xml (Needs more stuff.) - Added dependencies in Makefile.PL for LWP::UserAgent and XML::Parser::PerlSAX. See Makefile.PL for details. - Fixed XML::Filter::SAXT (a PerlSAX that works like Unix' tee command.) - Renamed XML::DOM::PerlSAX to XML::Handler::BuildDOM. A warning will be issued with -w if your code uses XML::DOM::PerlSAX. 
The reason for this change is that the new name is more consistent with how other PerlSAX related classes are named. Also added a test case for it in t/build_dom.t. - Added XML::Filter::DetectWS, a first stab at a PerlSAX filter that detects ignorable whitespace. Needs more testing! - Added XML::Filter::Reindent, a first stab at a PerlSAX filter that removes and inserts whitespace into the PerlSAX event stream to reindent the XML document for pretty printing functionality. Needs more testing! - Added XML::Handler::Composer. Yet another XML printer/writer that has several features missing in other implementations. See docs for details. Needs more testing! - Added bin/pretty.pl, an XML pretty printer that uses the previous 3 classes. - Added XML::UM for encoding support when printing XML. Needs more testing! - Added XML::Handler::PrintEvents for debugging PerlSAX filters/producers. - Added a PPM description called: libxml-enno.ppd I have no idea whether or how it works, so let me know! (Thanks to Simon Oliver , who has more package files at http://www.bi.umist.ac.uk/packages) - DOM: Reimplemented all Node types as a blessed array reference instead of a blessed hash reference. This should speed things up and consume less memory. Downside is that the code is harder to read and it's harder to extend the Node classes. - DOM: In XML::DOM::Element, attributes are stored in a NamedNodeMap with the hash key 'A' (i.e. _A). Previously, the NamedNodeMap object was created even if there were no attributes. For speed and memory reasons, we now create the NamedNodeMap objects only when needed. - DOM: The parsefile() method of XML::DOM::Parser now supports URLs. It uses LWP to download the remote file. See XML::DOM::Parser documentation for more info. This probably belongs in XML::Parser. - DOM: Added new test cases in t/dom_jp_*.t and a Japanese XML file in samples/minutes.xml. (Thanks to OKABE, Keiichi ) - DOM: Added support for parameter entity references (e.g. 
%pent;) in the DTD. If the reference points to a SYSTEM entity and XML::Parser read and expanded it (ParseParamEnt=1) and XML::DOM::Parser option ExpandParamEnt=0, then it will still add the contents of the entity to the DTD, but the nodes are flagged as 'Hidden'. In this case, it will also add an EntityReference node to the DTD. The Hidden nodes are skipped when printing, so this way you can suppress the expansion of external parameter entity references. Note that we still want to add these hidden nodes to the DTD, because they might contain e.g. ENTITY declarations that can be referenced further in the document. See new testcase t/dom_extent.t. (Thanks to Guillaume Rouchy ) libxml-enno-1.00 (enno) 10/26/1999 - This is the first version of libxml-enno. It contains XML::DOM 1.26, XML::XQL 0.62 and XML::Checker 0.08. See Changes.DOM, Changes.XQL and Changes.Checker for the change history prior to libxml-enno. - I redid the html documentation. Lots of cross links, more info. Check it out! - Added XML::DOM::PerlSAX. It's a PerlSAX handler that builds DOM trees. - Added XML::Filter::SAXT. It's a PerlSAX handler that forwards the callbacks to 2 or more PerlSAX handlers, kind of like the Unix 'tee' command. - Added XML::RegExp. It contains regular expressions for several XML tokens, as defined in the XML spec. - DOM: XML::DOM warnings now go thru XML::DOM::warning() (which uses warn by default) You can redefine XML::DOM::warning() to change this behavior. Currently, warning() is called only in one place: in XML::DOM::AttListDecl::addAttDef when multiple attribute definitions exist for the same attribute. - DOM: I added the xql() method to XML::DOM::Node as yet another shortcut to perform XQL queries on nodes. Make sure you 'use' XML::XQL and XML::XQL::DOM. 
1.25 (enno) 8/24/1999 - Removed $`, $' and $& from code to speed up pattern matching in general - Fixed replaceChild() to process a DocumentFragment correctly (Thanks to Michael Stillwel ) - Fixed appendChild, replaceChild and insertBefore for Document nodes, so you can't add multiple Element nodes. - The XmlDecl field was called XMLDecl in certain place. (Thanks to Matt Sergeant ) - Fixed the non-recursive getElementsByTagName (Thanks to Geert Josten ) 1.24 (enno) 8/2/1999 - Processing Instructions inside an Element node were accidentally added to the Document node. - Added DOM.gif to the distribution and to the XML::DOM home page (http://www.erols.com/enno/dom) which shows a logical view of the DOM interfaces. (Thanks to Vance Christiaanse ) - Added recurse option (2nd parameter) to getElementsByTagName. When set to 0, it only returns direct child nodes. When not specified, it defaults to 1, so it should not break existing code. Note that using 0 is not portable to other DOM implementations. - Fixed the regular expressions for XML tokens to include Unicode values > 127 - Removed XML::DOM::UTF8 (it is no longer needed due to previous fix) - Fixed encodeText(). In certain cases special characters (like ", < and &) would not be converted to " etc. when writing attribute values. (Thanks to Alon Salant ) - When writing XML, single quotes were converted to &apos instead of ' (Thanks to Galactic Taco ) 1.23 (enno) 6/4/1999 - Added XML::DOM::setTagCompression to give you control over how empty element tags are printed. See XML::DOM documentation for details. - Fixed CAVEAT section in XML::DOM documentation to refer to the www-dom mailing list (as opposed to xml-dom.) 1.22 (enno) 5/28/1999 - The XML::DOM documentation was translated into Japanese by Takanori Kawai (aka Hippo2000) at http://member.nifty.ne.jp/hippo2000/perltips/xml/dom.htm - Fixed documentation of XML::DOM::Node::removeChild() It used to list the exceptions HIERARCHY_REQUEST_ERR, WRONG_DOCUMENT_ERR. 
(Thanks again, Takanori) - XML::DOM::Entity::print was putting double quotes around the notation name after the NDATA keyword. - Added Unparsed handler that calls the Entity handler. - Changed implementation of XML::Parser::Dom to use local variables for slight performance improvement and easier exception handling. - Removed support for old XML::Parser versions (for detecting whether attributes were specified or defaulted.) People should move to latest XML::Parser (currently version 2.23) - If an ENTITY value was e.g. '"', it would be printed as """ (Thanks to Raimund Jacob ) 1.21 (enno) 4/27/1999 - Fixed Start handler to work with new method specified_attr() in XML::Parser 2.23 1.20 (enno) 4/16/1999 - Fixed getPreviousSibling(). If the node was the first child, it would return the last child of its parent instead of undef. (Thanks to Christoph StueckJuergen ) 1.19 (enno) 4/7/1999 - Fixed memory leak: Final handler did not call dispose on a temporarily created DocumentType object that was no longer needed. (Thanks to Loic Dachary ) - Fixed DocumentType::removeChildhoodMemories (which is called by dispose) to work correctly when the DocumentType node is already decoupled from the document. 1.18 (enno) 3/15/1999 - Fixed typo "DOM::XML::XmlUtf8Encode" in expandEntityRefs() to XML::DOM::XmlUtf8Encode. (Thanks to Manessinger Andreas ) - XML::Parser 2.20 added the is_defaulted method, which improves performance a bit when used. Benchmark (see below) went from 6.50s to 6.07s (7%) You don't have to upgrade to 2.20, this change is backwards compatible. - Copied node constants (e.g. ELEMENT_NODE) from XML::DOM::Node to XML::DOM, so you can use ELEMENT_NODE instead of XML::DOM::ELEMENT_NODE. The old style will still work. - Fixed XmlUtf8Decode to add 'x' when printing hex entities (not used by XML::DOM module, but other people might want to use it at some point) - Fixed typo: DocumentType::getSysid should have been getSysId. 
(Thanks to Bruce Kaskel ) - Added DocumentType::setName, setSysId, setPubId - Added Document::createDocumentType - DocumentType::print no longer prints the square brackets if it has no entities, notations or other children. (Thanks again, Bruce) - The MacOS related bugs in the testcases etc. should all be fixed. (Thanks to Arved Sandstrom and Chris Nandor ) - Added code to ignore Text Declarations from external parsed entities, i.e. They were causing exceptions like "XML::DOM::DOMException(Code=5, Name=INVALID_CHARACTER_ERR, Message=bad Entity Name [] in EntityReference)" (Thanks to Marcin Jagodzinski ) 1.17 (enno) 2/26/1999 (This release was never deployed to CPAN) - Added XML::DOM::UTF8 module which exploits Perl's new utf8 capabilities (version 5.005_55 recommended.) If you don't use/require this module, XML::DOM will work as it did before. If you do use/require it, it allows Unicode characters with character codes > 127 in element and attibute names etc. See XML::DOM::UTF8 man page for details. Note that this module hasn't been tested thoroughly yet. - Fixed Makefile.PL, it would accidentally install CheckAncestors.pm and CmpDOM.pm which were only meant for the test cases. - Added allowReservedNames, setAllowReservedNames to support checking for reserved XML Names (element/attribute/entity/etc. names starting with "xml") - Changed some print methods (in the DOCTYPE section) to use "\xA" as an end-of-line delimiter instead of "\n". Since XML::Parser (expat) converts all end-of-line sequences to "\xA", it makes sense that the print routines are consistent with that. - Fixed the testcases to convert "\n" to "\xA" before comparing test results with expected results, so that they also work on Mac OS. 
1.16 (enno) 2/23/1999 - Added XML::DOM::Element::setTagName - Methods returning NodeList objects will now return regular perl lists when called in list context, e.g: @list = $elem->getChildNodes; # returns a list $nodelist = $elem->getChildNodes; # return a NodeList (object reference) Note that a NodeList is 'live' (except the one returned by getElementsByTagName) and that a list is not 'live'. - Fixed getElementsByTagName. - It would return the node that it was called on (if the tagName matched) - It would return the nodes in the wrong order (they should be in document order) 1.15 (enno) 2/12/1999 - 28% Performance improvements. Benchmark was the following program: use XML::DOM; $dom = XML::DOM::Parser->new; my $doc = $dom->parsefile ("samples/REC-xml-19980210.xml"); Running it 20 times on a Sun Ultra-1, using the ksh function 'time', the average time was 9.02s (real time.) XML::Parser 2.19, Perl 5.005_02. As a comparison, XML-Grove-0.05 takes 2.17s running: use XML::Parser; use XML::Parser::Grove; use XML::Grove; $parser = XML::Parser->new(Style => 'Grove'); $grove = $parser->parsefile ("samples/REC-xml-19980210.xml"); And XML::Parser 2.19 takes 0.71s running (i.e. 
building nothing): use XML::Parser; $parser = XML::Parser->new; $parser->parsefile ("samples/REC-xml-19980210.xml"); XML-Grove-0.40alpha takes 4.62s running the following script: use XML::Grove::Builder; use XML::Parser::SAXPerl; $grove_builder = XML::Grove::Builder->new; $parser = XML::Parser::SAXPerl->new ( Handler => $grove_builder ); $document = $parser->parse ( Source => { SystemId => "samples/REC-xml-19980210.xml" } ); Each following improvement knocked a few tenths of a second off: - Reimplemented the ReadOnly mechanism, because it was spending a lot of time in setReadOnly when parsing large documents (new time: 8.00s) - Hacked appendChild to squeeze out a little speed (new time: 7.70s) - Eliminated calls to addText in the Start handler which had to figure out every time wether it should add a piece of text to a previous text node. Now I keep track of whether the previous node was a text node in the XML::DOM::Parser code and take care of adding the text and creating a new Text node right there, without the overhead of several function calls (new time: 6.45s) 1.14 (enno) 15/1/1999 - Bug in Document::dispose - it tried to call dispose on XMLDecl even if it didn't exist - Bug with XML::Parser 2.19 (and up): XML::Parser 2.19 added support for CdataStart and CdataEnd handlers which will call the Default handler instead if those handlers aren't defined. This caused the exception "XML::DOM::DOMException(Code=5, Name=INVALID_CHARACTER_ERR, Message=bad Entity Name [] in EntityReference)" whenever it encountered a CDATASection. (Thanks to Roger Espinosa ) - Added a new XML::DOM::Parser option 'KeepCDATA' which will store CDATASections as CDATASection nodes instead of converting them to Text nodes (which is the default/old behavior) - Fixed bug in CDATASection print routine. It printed ") - removeChildNodes was using $_, which was somehow messing up the global $_. (Thanks again, Francois) 1.11 (enno) 12/16/1998 - Fixed checking of XML::Parser version number. 
Newer versions should be allowed as well. Current version works with XML::Parser 2.17. (Thanks to Martin Kolbuszewski)
- Fixed typo in SYNOPSIS: "print $node->getValue ..." should have been "print $href->getValue ..." (Thanks again Martin)
- Fixed typo in documentation: 'getItem' method should have been 'item' (in 2 places.) (Thanks again Martin)

1.10 (enno) 12/8/1998
- Attributes with non-alphanumeric characters in the attribute name (like "-") were mistaken for default attribute values. (Bug in checkUnspecAttr regexp.) Default attribute values aren't printed out, so it appeared those attributes just vanished. (Thanks to Aravind Subramanian)

1.09 (enno) 12/3/1998
- Changed NamedNodeMap {Values} to a NodeList instead of []. This way getValues can return a (live) NodeList.
- Added NodeList and NamedNodeMap to documentation
- Fixed documentation section near list of node type constants. I accidentally pasted some text in between and messed up that whole section.
- getNodeTypeName() always returned empty strings and the documentation said @XML::DOM::NodeNames, which should have been @XML::DOM::Node::NodeNames (Thanks to Andrew Fitzhugh)
- Added dispose to NodeList
- Added setOwnerDocument to all Nodes, NodeList and NamedNodeMap, to allow cut and paste between XML::DOM::Documents. It does nothing when called on a Document node.

1.08 (enno) 12/1/1998
- No changes - I messed up uploading to PAUSE and had to up the version number.

1.07 (enno) 12/1/1998
- added Node::isElementNode for optimization
- added NamedNodeMap::getChildIndex
- fixed documentation regarding getNodeValue. It said it should return getTagName for Element nodes, but it should return undef.
(Thanks to Aravind Subramanian ) - added CAVEATS in documentation (getElementsByTagName does not return "live" NodeLists) - added section about Notation node in documentation 1.06 (enno) 11/16/1998 - fixed example in the SYNOPSIS of the man page (Thanks to Aravind Subramanian ) - added test case t/example.t (it's also a simple example) 1.05 (enno) 11/11/1998 - added use strict, use vars etc. - fixed replaceData - changed $str to $data - merged getElementsByTagName and getElementsByTagName2 - added parsing of attributes (CheckUnspecAttr) to support Attr::isSpecified - added XML::DOM::Parser class to perform proper cleanup when an exception is thrown - more performance improvements, e.g. SafeMode, removed SUPER::new - added frequency comments for performance optimization: e.g. "REC 7473" means that that code is hit 7473 times when parsing REC-xml-19980210.xml - updated POD documentation - fixed problem in perl 5.004 (can't seems to use references to strings, e.g. *str = \ "constant";) 1.04 (enno) 10/21/1998 - Removed internal calls to getOwnerDocument, getParentNode - fixed isAncestor: $node->isAncestor($node) should return true - Fixed ReadOnly mechanism. Added isAlwaysReadOnly. - DocumentType::getDefaultAttrValue was using getElementDecl instead of getAttlistDecl - Attr::cloneNode cloneChildren was missing 2nd parameter=1 (deep) - NamedNodeMap::cloneNode forgot to copy {Values} list - Element::setAttributeNode was comparing {UsedIn} against $self instead of {A} - fixed AttDef::cloneNode, Value was copied wrong XML-DOM-1.44/MANIFEST0000644000076400007640000000265610271306205014103 0ustar tjmathertjmatherBUGS Changes CheckAncestors.pm Used by test cases in t/ CmpDOM.pm Used by test cases in t/ FAQ.xml MANIFEST This file. 
Makefile.PL README lib/XML/DOM.pm lib/XML/DOM/AttDef.pod lib/XML/DOM/AttlistDecl.pod lib/XML/DOM/Attr.pod lib/XML/DOM/CDATASection.pod lib/XML/DOM/CharacterData.pod lib/XML/DOM/Comment.pod lib/XML/DOM/DOMException.pm lib/XML/DOM/DOMImplementation.pod lib/XML/DOM/Document.pod lib/XML/DOM/DocumentFragment.pod lib/XML/DOM/DocumentType.pod lib/XML/DOM/Element.pod lib/XML/DOM/ElementDecl.pod lib/XML/DOM/Entity.pod lib/XML/DOM/EntityReference.pod lib/XML/DOM/NamedNodeMap.pm lib/XML/DOM/NamedNodeMap.pod lib/XML/DOM/Node.pod lib/XML/DOM/NodeList.pm lib/XML/DOM/NodeList.pod lib/XML/DOM/Notation.pod lib/XML/DOM/Parser.pod lib/XML/DOM/PerlSAX.pm lib/XML/DOM/ProcessingInstruction.pod lib/XML/DOM/Text.pod lib/XML/DOM/XMLDecl.pod lib/XML/Handler/BuildDOM.pm samples/REC-xml-19980210.xml Sample XML files samples/minutes.xml t/build_dom.t t/dom_astress.t dom_*.t are test cases for XML::DOM t/dom_attr.t t/dom_cdata.t t/dom_documenttype.t t/dom_encode.t t/dom_example.t t/dom_extent.dtd t/dom_extent.ent t/dom_extent.t t/dom_jp_astress.t t/dom_jp_attr.t t/dom_jp_cdata.t t/dom_jp_example.t t/dom_jp_minus.t t/dom_jp_modify.t t/dom_jp_print.t t/dom_minus.t t/dom_modify.t t/dom_noexpand.t t/dom_print.t t/dom_template.t t/dom_text.t XML-Parser-2.31.patch META.yml Module meta-data (added by MakeMaker) XML-DOM-1.44/META.yml0000644000076400007640000000070410271306205014213 0ustar tjmathertjmather# http://module-build.sourceforge.net/META-spec.html #XXXXXXX This is a prototype!!! It will change in the future!!! XXXXX# name: XML-DOM version: 1.44 version_from: lib/XML/DOM.pm installdirs: site requires: LWP::UserAgent: 0 XML::Parser: 2.30 XML::Parser::PerlSAX: 0.07 XML::RegExp: 0 distribution_type: module generated_by: ExtUtils::MakeMaker version 6.17 XML-DOM-1.44/CmpDOM.pm0000644000076400007640000001123407222666673014403 0ustar tjmathertjmather# # Used by test scripts to compare 2 DOM subtrees. # # Usage: # # my $cmp = new CmpDOM; # $node1->equals ($node2, $cmp) or # print "Difference found! 
Context:" . $cmp->context . "\n"; # use strict; package CmpDOM; use XML::DOM; use Carp; sub new { my %args = (SkipReadOnly => 0, Context => []); bless \%args, $_[0]; } sub pushContext { my ($self, $str) = @_; push @{$self->{Context}}, $str; #print ":: " . $self->context . "\n"; } sub popContext { pop @{$_[0]->{Context}}; } sub skipReadOnly { my $self = shift; my $prev = $self->{SkipReadOnly}; if (@_ > 0) { $self->{SkipReadOnly} = shift; } $prev; } sub sameType { my ($self, $x, $y) = @_; return 1 if (ref ($x) eq ref ($y)); $self->fail ("wrong type " . ref($x) . " != " . ref($y)); } sub sameReadOnly { my ($self, $x, $y) = @_; return 1 if $self->{SkipReadOnly}; my $result = 1; if (not defined $x) { $result = 0 if defined $y; } else { if (not defined $y) { $result = 0; } elsif ($x != $y) { $result = 0; } } return 1 if ($result == 1); $self->fail ("ReadOnly $x != $y"); } sub fail { my ($self, $str) = @_; $self->pushContext ($str); 0; } sub context { my $self = shift; join (", ", @{$self->{Context}}); } package XML::DOM::NamedNodeMap; sub equals { my ($self, $other, $cmp) = @_; return 0 unless $cmp->sameType ($self, $other); # sanity checks my $n1 = int (keys %$self); my $n2 = int (keys %$other); return $cmp->fail("same keys length") unless $n1 == $n2; return $cmp->fail("#1 value length") unless ($n1-1 == $self->getLength); return $cmp->fail("#2 value length") unless ($n2-1 == $other->getLength); my $i = 0; my $ov = $other->getValues; for my $n (@{$self->getValues}) { $cmp->pushContext ($n->getNodeName); return 0 unless $n->equals ($ov->[$i], $cmp); $i++; $cmp->popContext; } return 0 unless $cmp->sameReadOnly ($self->isReadOnly, $other->isReadOnly); 1; } package XML::DOM::NodeList; sub equals { my ($self, $other, $cmp) = @_; return 0 unless $cmp->sameType ($self, $other); return $cmp->fail("wrong length") unless $self->getLength == $other->getLength; my $i = 0; for my $n (@$self) { $cmp->pushContext ("[$i]"); return 0 unless $n->equals ($other->[$i], $cmp); $i++; 
$cmp->popContext; } 1; } package XML::DOM::Node; sub get_prop_byname { my ($self, $propname) = @_; my $pkg = ref ($self); no strict 'refs'; my $hfields = \ %{"$pkg\::HFIELDS"}; $self->[$hfields->{$propname}]; } sub equals { my ($self, $other, $cmp) = @_; return 0 unless $cmp->sameType ($self, $other); my $hasKids = $self->hasChildNodes; return $cmp->fail("hasChildNodes") unless $hasKids == $other->hasChildNodes; if ($hasKids) { $cmp->pushContext ("C"); return 0 unless $self->[_C]->equals ($other->[_C], $cmp); $cmp->popContext; } return 0 unless $cmp->sameReadOnly ($self->isReadOnly, $other->isReadOnly); for my $prop (@{$self->getCmpProps}) { $cmp->pushContext ($prop); my $p1 = $self->get_prop_byname ($prop); my $p2 = $other->get_prop_byname ($prop); if (ref ($p1)) { return 0 unless $p1->equals ($p2, $cmp); } elsif (! defined ($p1)) { return 0 if defined $p2; } else { return $cmp->fail("$p1 != $p2") unless $p1 eq $p2; } $cmp->popContext; } 1; } sub getCmpProps { return []; } package XML::DOM::Attr; sub getCmpProps { ['Name', 'Specified']; } package XML::DOM::ProcessingInstruction; sub getCmpProps { ['Target', 'Data']; } package XML::DOM::Notation; sub getCmpProps { return ['Name', 'Base', 'SysId', 'PubId']; } package XML::DOM::Entity; sub getCmpProps { return ['NotationName', 'Parameter', 'Value', 'SysId', 'PubId']; } package XML::DOM::EntityReference; sub getCmpProps { return ['EntityName', 'Parameter']; } package XML::DOM::AttDef; sub getCmpProps { return ['Name', 'Type', 'Required', 'Implied', 'Quote', 'Default', 'Fixed']; } package XML::DOM::AttlistDecl; sub getCmpProps { return ['ElementName', 'A']; } package XML::DOM::ElementDecl; sub getCmpProps { return ['Name', 'Model']; } package XML::DOM::Element; sub getCmpProps { return ['TagName', 'A']; } package XML::DOM::CharacterData; sub getCmpProps { return ['Data']; } package XML::DOM::XMLDecl; sub getCmpProps { return ['Version', 'Encoding', 'Standalone']; } package XML::DOM::DocumentType; sub getCmpProps { 
return ['Entities', 'Notations', 'Name', 'SysId', 'PubId', 'Internal']; } package XML::DOM::Document; sub getCmpProps { return ['XmlDecl', 'Doctype']; } 1; XML-DOM-1.44/samples/0000755000076400007640000000000010271306205014405 5ustar tjmathertjmatherXML-DOM-1.44/samples/minutes.xml0000644000076400007640000001022707014636022016621 0ustar tjmathertjmather ]> <議事録> <題å>å–締役会議事録 <本体> <剿–‡> <開催日 å¹´="1999" 月="2" æ—¥="28" 時="11" 分="00" > å¹³æˆ11年2月28日åˆå‰ï¼‘1時ï¼ï¼åˆ†ã‚ˆã‚Š <開催場所>当会社本店会議室ã«ãŠã„ã¦å–締役会を開催ã—ãŸã€‚ <ç·æ•° 人数="3">å–ç· å½¹ç·æ•°ï¼“å <出席者数 人数="3">出席å–締役数3å <開会宣言> 以上ã®ã¨ãŠã‚Š <æˆç«‹è¦ä»¶>å–締役全員ã®å‡ºå¸­ ãŒã‚ã£ãŸã®ã§ã€æœ¬å–締役会ã¯é©æ³•ã«æˆç«‹ã—ãŸã€‚よã£ã¦å–ç· å½¹ä½éƒ·å¹¸æ²»ã¯ 議長席ã«ç€ã開会を宣ã—ã€ç›´ã¡ã«è­°æ¡ˆã®å¯©è­°ã«å…¥ã£ãŸã€‚ <議案> <議題>第1å·è­°æ¡ˆ 代表å–ç· å½¹é¸ä»»ã®ä»¶ <ç†ç”±>è­°é•·ã¯ã€ä»£è¡¨å–ç· å½¹ä½éƒ·å¹¸æ²»ãŒå–ç· å½¹ã®ä»»æœŸæº€äº†ã«ã‚ˆã‚Šæœ¬æ—¥ä»˜ã‚’ã‚‚ã£ã¦ <退任事由>ãã®è³‡æ ¼ã‚’喪失ã—退任 ã—ãŸã®ã§ã€å¾Œä»»è€…ã‚’é¸ä»»ã™ã‚‹å¿…è¦ãŒã‚る旨を述ã¹ã€ <é¸ä»»æ–¹æ³•> ãã®é¸ä»»æ–¹æ³•ã‚’è­°å ´ã«è«®ã£ãŸã¨ã“ã‚出席å–締役中よりä½éƒ·å¹¸æ²»ã®å†é¸é‡ä»»ã‚’ 望む旨ã®ç™ºè¨€ãŒã‚ã£ãŸã€‚ <採決> è­°é•·ã¯ã€æ›´ã«ãã®å¯å¦ã‚’è­°å ´ã«è«®ã£ãŸã¨ã“ã‚全員ã“れã«è³›æˆã—ãŸã®ã§ã€ 本案ã¯å¯æ±ºç¢ºå®šã—ãŸã€‚ <就任承諾>ãªãŠã€è¢«é¸ä»»è€…ã¯å³æ™‚就任を承諾ã—ãŸã€‚ <決定事項> <役員>代表å–ç· å½¹ <æ°å>ä½éƒ·å¹¸æ²» <就任種別>é‡ä»» <閉会宣言> 以上ã«ã‚ˆã‚Šæœ¬æ—¥ã®è­°æ¡ˆã®å¯©è­°ã‚’å…¨ã¦çµ‚了ã—ã€è­°é•·ã¯é–‰ä¼šã‚’宣ã—ã€æ•£ä¼šã—ãŸã€‚ 時㫠<閉会時間 時="11" 分="30" >åˆå‰ï¼‘1時3ï¼åˆ† ã§ã‚ã£ãŸã€‚å‰è¨˜ã®è­°äº‹ã®çµŒéŽä¸¦ã³ã«æ±ºè­°ã®å†…容を明確ã«ã™ã‚‹ãŸã‚本議事録 を作æˆã—ã€è­°é•·ä¸¦ã³ã«å‡ºå¸­å–ç· å½¹ã“れã«è¨˜å押å°ã™ã‚‹ã€‚ <ç½²å> <ç½²åæ—¥ å¹´="1999" 月="2" æ—¥="28">å¹³æˆ11年2月28日 <商å·>大ä½éƒ·å•†äº‹æ ªå¼ä¼šç¤¾ <è­°é•·><役員>代表å–ç· å½¹<æ°å>ä½éƒ·å¹¸æ²» <出席役員><役員>å–ç· å½¹<æ°å>ä½éƒ·ç”±ç¾Ž <出席役員><役員>å–ç· å½¹<æ°å>ä½éƒ·å¹¸æ³• XML-DOM-1.44/samples/REC-xml-19980210.xml0000644000076400007640000046717507314125757017314 0ustar tjmathertjmather "> '"> amp, lt, gt, apos, quot"> ]>
Extensible Markup Language (XML) 1.0
REC-xml-19980210
W3C Recommendation 10 February 1998

This version:
    http://www.w3.org/TR/1998/REC-xml-19980210
    http://www.w3.org/TR/1998/REC-xml-19980210.xml
    http://www.w3.org/TR/1998/REC-xml-19980210.html
    http://www.w3.org/TR/1998/REC-xml-19980210.pdf
    http://www.w3.org/TR/1998/REC-xml-19980210.ps
Latest version:
    http://www.w3.org/TR/REC-xml
Previous version:
    http://www.w3.org/TR/PR-xml-971208
Editors:
    Tim Bray, Textuality and Netscape, tbray@textuality.com
    Jean Paoli, Microsoft, jeanpa@microsoft.com
    C. M. Sperberg-McQueen, University of Illinois at Chicago, cmsmcq@uic.edu

The Extensible Markup Language (XML) is a subset of SGML that is completely described in this document. Its goal is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML. XML has been designed for ease of implementation and for interoperability with both SGML and HTML.

This document has been reviewed by W3C Members and other interested parties and has been endorsed by the Director as a W3C Recommendation. It is a stable document and may be used as reference material or cited as a normative reference from another document. W3C's role in making the Recommendation is to draw attention to the specification and to promote its widespread deployment. This enhances the functionality and interoperability of the Web.

This document specifies a syntax created by subsetting an existing, widely used international text processing standard (Standard Generalized Markup Language, ISO 8879:1986(E) as amended and corrected) for use on the World Wide Web. It is a product of the W3C XML Activity, details of which can be found at http://www.w3.org/XML. A list of current W3C Recommendations and other technical documents can be found at http://www.w3.org/TR.

This specification uses the term URI, which is defined by [Berners-Lee et al.], a work in progress expected to update [IETF RFC1738] and [IETF RFC1808].

The list of known errors in this specification is available at http://www.w3.org/XML/xml-19980210-errata.

Please report errors in this document to xml-editor@w3.org.

Chicago, Vancouver, Mountain View, et al.: World-Wide Web Consortium, XML Working Group, 1996, 1997.

Created in electronic form.

English
Extended Backus-Naur Form (formal grammar)

1997-12-03 : CMSMcQ : yet further changes
1997-12-02 : TB : further changes (see TB to XML WG, 2 December 1997)
1997-12-02 : CMSMcQ : deal with as many corrections and comments from the proofreaders as possible: entify hard-coded document date in pubdate element, change expansion of entity WebSGML, update status description as per Dan Connolly (am not sure about reference to Berners-Lee et al.), add 'The' to abstract as per WG decision, move Relationship to Existing Standards to back matter and combine with References, re-order back matter so normative appendices come first, re-tag back matter so informative appendices are tagged informdiv1, remove XXX XXX from list of 'normative' specs in prose, move some references from Other References to Normative References, add RFC 1738, 1808, and 2141 to Other References (they are not normative since we do not require the processor to enforce any rules based on them), add reference to 'Fielding draft' (Berners-Lee et al.), move notation section to end of body, drop URIchar non-terminal and use SkipLit instead, lose stray reference to defunct nonterminal 'markupdecls', move reference to Aho et al. into appendix (Tim's right), add prose note saying that hash marks and fragment identifiers are NOT part of the URI formally speaking, and are NOT legal in system identifiers (processor 'may' signal an error). Work through: Tim Bray reacting to James Clark, Tim Bray on his own, Eve Maler, NOT DONE YET: change binary / text to unparsed / parsed. handle James's suggestion about < in attribute values uppercase hex characters, namechar list,
1997-12-01 : JB : add some column-width parameters
1997-12-01 : CMSMcQ : begin round of changes to incorporate recent WG decisions and other corrections: binding sources of character encoding info (27 Aug / 3 Sept), correct wording of Faust quotation (restore dropped line), drop SDD from EncodingDecl, change text at version number 1.0, drop misleading (wrong!) sentence about ignorables and extenders, modify definition of PCData to make bar on msc grammatical, change grammar's handling of internal subset (drop non-terminal markupdecls), change definition of includeSect to allow conditional sections, add integral-declaration constraint on internal subset, drop misleading / dangerous sentence about relationship of entities with system storage objects, change table body tag to htbody as per EM change to DTD, add rule about space normalization in public identifiers, add description of how to generate our name-space rules from Unicode character database (needs further work!).
1997-10-08 : TB : Removed %-constructs again, new rules for PE appearance.
1997-10-01 : TB : Case-sensitive markup; cleaned up element-type defs, lotsa little edits for style
1997-09-25 : TB : Change to elm's new DTD, with substantial detail cleanup as a side-effect
1997-07-24 : CMSMcQ : correct error (lost *) in definition of ignoreSectContents (thanks to Makoto Murata). Allow all empty elements to have end-tags, consistent with SGML TC (as per JJC).
1997-07-23 : CMSMcQ : pre-emptive strike on pending corrections: introduce the term 'empty-element tag', note that all empty elements may use it, and elements declared EMPTY must use it. Add WFC requiring encoding decl to come first in an entity. Redefine notations to point to PIs as well as binary entities. Change autodetection table by removing bytes 3 and 4 from examples with Byte Order Mark. Add content model as a term and clarify that it applies to both mixed and element content.
1997-06-30 : CMSMcQ : change date, some cosmetic changes, changes to productions for choice, seq, Mixed, NotationType, Enumeration. Follow James Clark's suggestion and prohibit conditional sections in internal subset. TO DO: simplify production for ignored sections as a result, since we don't need to worry about parsers which don't expand PErefs finding a conditional section.
1997-06-29 : TB : various edits
1997-06-29 : CMSMcQ : further changes: Suppress old FINAL EDIT comments and some dead material. Revise occurrences of % in grammar to exploit Henry Thompson's pun, especially markupdecl and attdef. Remove RMD requirement relating to element content (?).
1997-06-28 : CMSMcQ : Various changes for 1 July draft: Add text for draconian error handling (introduce the term Fatal Error). RE deleta est (changing wording from original announcement to restrict the requirement to validating parsers). Tag definition of validating processor and link to it. Add colon as name character. Change def of %operator. Change standard definitions of lt, gt, amp. Strip leading zeros from #x00nn forms.
1997-04-02 : CMSMcQ : final corrections of editorial errors found in last night's proofreading. Reverse course once more on well-formed: Webster's Second hyphenates it, and that's enough for me.
1997-04-01 : CMSMcQ : corrections from JJC, EM, HT, and self
1997-03-31 : Tim Bray : many changes
1997-03-29 : CMSMcQ : some Henry Thompson (on entity handling), some Charles Goldfarb, some ERB decisions (PE handling in miscellaneous declarations). Changed Ident element to accept def attribute. Allow normalization of Unicode characters. Move def of systemliteral into section on literals.
1997-03-28 : CMSMcQ : make as many corrections as possible, from Terry Allen, Norbert Mikula, James Clark, Jon Bosak, Henry Thompson, Paul Grosso, and self. Among other things: give in on "well formed" (Terry is right), tentatively rename QuotedCData as AttValue and Literal as EntityValue to be more informative, since attribute values are the only place QuotedCData was used, and vice versa for entity text and Literal. (I'd call it Entity Text, but 8879 uses that name for both internal and external entities.)
1997-03-26 : CMSMcQ : resynch the two forks of this draft, reapply my changes dated 03-20 and 03-21. Normalize old 'may not' to 'must not' except in the one case where it meant 'may or may not'.
1997-03-21 : TB : massive changes on plane flight from Chicago to Vancouver
1997-03-21 : CMSMcQ : correct as many reported errors as possible.
1997-03-20 : CMSMcQ : correct typos listed in CMSMcQ hand copy of spec.
1997-03-20 : CMSMcQ : cosmetic changes preparatory to revision for WWW conference April 1997: restore some of the internal entity references (e.g. to docdate, etc.), change character xA0 to &nbsp; and define nbsp as &#160;, and refill a lot of paragraphs for legibility.
1996-11-12 : CMSMcQ : revise using Tim's edits: Add list type of NUMBERED and change most lists either to BULLETS or to NUMBERED. Suppress QuotedNames, Names (not used). Correct trivial-grammar doc type decl. Rename 'marked section' as 'CDATA section' passim. Also edits from James Clark: Define the set of characters from which [^abc] subtracts. Charref should use just [0-9] not Digit. Location info needs cleaner treatment: remove? (ERB question). One example of a PI has wrong pic. Clarify discussion of encoding names. Encoding failure should lead to unspecified results; don't prescribe error recovery. Don't require exposure of entity boundaries. Ignore white space in element content. Reserve entity names of the form u-NNNN. Clarify relative URLs. And some of my own: Correct productions for content model: model cannot consist of a name, so "elements ::= cp" is no good.
1996-11-11 : CMSMcQ : revise for style. Add new rhs to entity declaration, for parameter entities.
1996-11-10 : CMSMcQ : revise for style. Fix / complete section on names, characters. Add sections on parameter entities, conditional sections. Still to do: Add compatibility note on deterministic content models. Finish stylistic revision.
1996-10-31 : TB : Add Entity Handling section
1996-10-30 : TB : Clean up term & termdef. Slip in ERB decision re EMPTY.
1996-10-28 : TB : Change DTD. Implement some of Michael's suggestions. Change comments back to //. Introduce language for XML namespace reservation. Add section on white-space handling. Lots more cleanup.
1996-10-24 : CMSMcQ : quick tweaks, implement some ERB decisions. Characters are not integers. Comments are /* */ not //. Add bibliographic refs to 10646, HyTime, Unicode. Rename old Cdata as MsData since it's only seen in marked sections. Call them attribute-value pairs not name-value pairs, except once. Internal subset is optional, needs '?'. Implied attributes should be signaled to the app, not have values supplied by processor.
1996-10-16 : TB : track down & excise all DSD references; introduce some EBNF for entity declarations.
1996-10-?? : TB : consistency check, fix up scraps so they all parse, get formatter working, correct a few productions.
1996-10-10/11 : CMSMcQ : various maintenance, stylistic, and organizational changes: Replace a few literals with xmlpio and pic entities, to make them consistent and ensure we can change pic reliably when the ERB votes. Drop paragraph on recognizers from notation section. Add match, exact match to terminology. Move old 2.2 XML Processors and Apps into intro. Mention comments, PIs, and marked sections in discussion of delimiter escaping. Streamline discussion of doctype decl syntax. Drop old section of 'PI syntax' for doctype decl, and add section on partial-DTD summary PIs to end of Logical Structures section. Revise DSD syntax section to use Tim's subset-in-a-PI mechanism.
1996-10-10 : TB : eliminate name recognizers (and more?)
1996-10-09 : CMSMcQ : revise for style, consistency through 2.3 (Characters)
1996-10-09 : CMSMcQ : re-unite everything for convenience, at least temporarily, and revise quickly
1996-10-08 : TB : first major homogenization pass
1996-10-08 : TB : turn "current" attribute on div type into CDATA
1996-10-02 : TB : remould into skeleton + entities
1996-09-30 : CMSMcQ : add a few more sections prior to exchange with Tim.
1996-09-20 : CMSMcQ : finish transcribing notes.
1996-09-19 : CMSMcQ : begin transcribing notes for draft.
1996-09-13 : CMSMcQ : made outline from notes of 09-06, do some housekeeping

Introduction

Extensible Markup Language, abbreviated XML, describes a class of data objects called XML documents and partially describes the behavior of computer programs which process them. XML is an application profile or restricted form of SGML, the Standard Generalized Markup Language [ISO 8879]. By construction, XML documents are conforming SGML documents.

XML documents are made up of storage units called entities, which contain either parsed or unparsed data. Parsed data is made up of characters, some of which form character data, and some of which form markup. Markup encodes a description of the document's storage layout and logical structure. XML provides a mechanism to impose constraints on the storage layout and logical structure.

A software module called an XML processor is used to read XML documents and provide access to their content and structure. It is assumed that an XML processor is doing its work on behalf of another module, called the application. This specification describes the required behavior of an XML processor in terms of how it must read XML data and the information it must provide to the application.

Origin and Goals

XML was developed by an XML Working Group (originally known as the SGML Editorial Review Board) formed under the auspices of the World Wide Web Consortium (W3C) in 1996. It was chaired by Jon Bosak of Sun Microsystems with the active participation of an XML Special Interest Group (previously known as the SGML Working Group) also organized by the W3C. The membership of the XML Working Group is given in an appendix. Dan Connolly served as the WG's contact with the W3C.

The design goals for XML are:

XML shall be straightforwardly usable over the Internet.

XML shall support a wide variety of applications.

XML shall be compatible with SGML.

It shall be easy to write programs which process XML documents.

The number of optional features in XML is to be kept to the absolute minimum, ideally zero.

XML documents should be human-legible and reasonably clear.

The XML design should be prepared quickly.

The design of XML shall be formal and concise.

XML documents shall be easy to create.

Terseness in XML markup is of minimal importance.

This specification, together with associated standards (Unicode and ISO/IEC 10646 for characters, Internet RFC 1766 for language identification tags, ISO 639 for language name codes, and ISO 3166 for country name codes), provides all the information necessary to understand XML Version 1.0 and construct computer programs to process it.

This version of the XML specification may be distributed freely, as long as all text and legal notices remain intact.

Terminology

The terminology used to describe XML documents is defined in the body of this specification. The terms defined in the following list are used in building those definitions and in describing the actions of an XML processor:

may: Conforming documents and XML processors are permitted to but need not behave as described.

must: Conforming documents and XML processors are required to behave as described; otherwise they are in error.

error: A violation of the rules of this specification; results are undefined. Conforming software may detect and report an error and may recover from it.

fatal error: An error which a conforming XML processor must detect and report to the application. After encountering a fatal error, the processor may continue processing the data to search for further errors and may report such errors to the application. In order to support correction of errors, the processor may make unprocessed data from the document (with intermingled character data and markup) available to the application. Once a fatal error is detected, however, the processor must not continue normal processing (i.e., it must not continue to pass character data and information about the document's logical structure to the application in the normal way).

at user option: Conforming software may or must (depending on the modal verb in the sentence) behave as described; if it does, it must provide users a means to enable or disable the behavior described.

validity constraint: A rule which applies to all valid XML documents. Violations of validity constraints are errors; they must, at user option, be reported by validating XML processors.

well-formedness constraint: A rule which applies to all well-formed XML documents. Violations of well-formedness constraints are fatal errors.

match: (Of strings or names:) Two strings or names being compared must be identical. Characters with multiple possible representations in ISO/IEC 10646 (e.g. characters with both precomposed and base+diacritic forms) match only if they have the same representation in both strings. At user option, processors may normalize such characters to some canonical form. No case folding is performed. (Of strings and rules in the grammar:) A string matches a grammatical production if it belongs to the language generated by that production. (Of content and content models:) An element matches its declaration when it conforms in the fashion described in the constraint "Element Valid".

for compatibility: A feature of XML included solely to ensure that XML remains compatible with SGML.

for interoperability: A non-binding recommendation included to increase the chances that XML documents can be processed by the existing installed base of SGML processors which predate the WebSGML Adaptations Annex to ISO 8879.

Documents

A data object is an XML document if it is well-formed, as defined in this specification. A well-formed XML document may in addition be valid if it meets certain further constraints.

Each XML document has both a logical and a physical structure. Physically, the document is composed of units called entities. An entity may refer to other entities to cause their inclusion in the document. A document begins in a "root" or document entity. Logically, the document is composed of declarations, elements, comments, character references, and processing instructions, all of which are indicated in the document by explicit markup. The logical and physical structures must nest properly, as described in "Well-Formed Parsed Entities".

Well-Formed XML Documents

A textual object is a well-formed XML document if:

Taken as a whole, it matches the production labeled document.

It meets all the well-formedness constraints given in this specification.

Each of the parsed entities which is referenced directly or indirectly within the document is well-formed.

Document

    document ::= prolog element Misc*

Matching the document production implies that:

It contains one or more elements.

There is exactly one element, called the root, or document element, no part of which appears in the content of any other element. For all other elements, if the start-tag is in the content of another element, the end-tag is in the content of the same element. More simply stated, the elements, delimited by start- and end-tags, nest properly within each other.

As a consequence of this, for each non-root element C in the document, there is one other element P in the document such that C is in the content of P, but is not in the content of any other element that is in the content of P. P is referred to as the parent of C, and C as a child of P.
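The parent and child relationship described above can be observed with any conforming parser. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` module; the helper names `parent_map` and `is_well_formed` and the sample document are illustrative assumptions, not part of the specification.

```python
import xml.etree.ElementTree as ET


def parent_map(root):
    """Map each non-root element to the single element that directly contains it."""
    return {child: parent for parent in root.iter() for child in parent}


def is_well_formed(text):
    """Return True if the parser accepts the text, False on a parse error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


root = ET.fromstring("<a><b><c/></b><d/></a>")
child_to_parent = {c.tag: p.tag for c, p in parent_map(root).items()}
print(child_to_parent)                   # the root <a> has no entry: it has no parent
print(is_well_formed("<a><b></a></b>"))  # overlapping tags do not nest properly
```

Because each element appears in the content of exactly one parent, a plain dictionary is enough to hold the relation.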

Characters

A parsed entity contains text, a sequence of characters, which may represent markup or character data. A character is an atomic unit of text as specified by ISO/IEC 10646 [ISO/IEC 10646]. Legal characters are tab, carriage return, line feed, and the legal graphic characters of Unicode and ISO/IEC 10646. The use of "compatibility characters", as defined in section 6.8 of [Unicode], is discouraged.

Character Range

    Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
             /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */

The mechanism for encoding character code points into bit patterns may vary from entity to entity. All XML processors must accept the UTF-8 and UTF-16 encodings of 10646; the mechanisms for signaling which of the two is in use, or for bringing other encodings into play, are discussed later, in "Character Encoding in Entities".
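The Char production is a set of code point ranges, so it translates directly into a range test. A minimal sketch, assuming a hypothetical helper name `is_xml_char` and a single-character Python string as input:

```python
def is_xml_char(ch: str) -> bool:
    """Return True if the single character ch is in the Char production's ranges."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)          # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF        # up to, but excluding, the surrogate blocks
        or 0xE000 <= cp <= 0xFFFD      # excludes FFFE and FFFF
        or 0x10000 <= cp <= 0x10FFFF   # characters beyond the Basic Multilingual Plane
    )


def all_xml_chars(text: str) -> bool:
    """Return True if every character of text is a legal XML character."""
    return all(is_xml_char(ch) for ch in text)
```

A check like this is useful before serializing arbitrary strings, since control characters such as #x0 cannot appear in a document even in escaped form.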

Common Syntactic Constructs

This section defines some symbols used widely in the grammar.

S (white space) consists of one or more space (#x20) characters, carriage returns, line feeds, or tabs.

White Space

    S ::= (#x20 | #x9 | #xD | #xA)+

Characters are classified for convenience as letters, digits, or other characters. Letters consist of an alphabetic or syllabic base character possibly followed by one or more combining characters, or of an ideographic character. Full definitions of the specific characters in each class are given in the appendix "Character Classes".
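The S production admits only four characters, which is narrower than what many languages treat as white space. A minimal sketch with Python's `re` module; the names `S` and `is_white_space` are chosen here for illustration:

```python
import re

# S ::= (#x20 | #x9 | #xD | #xA)+
S = re.compile(r"[\x20\x09\x0D\x0A]+")


def is_white_space(text: str) -> bool:
    """Return True if text consists of one or more XML white space characters."""
    return S.fullmatch(text) is not None
```

Note that Python's own `str.isspace()` also accepts characters such as the no-break space (#xA0) and vertical tab, which the production does not.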

A Name is a token beginning with a letter or one of a few punctuation characters, and continuing with letters, digits, hyphens, underscores, colons, or full stops, together known as name characters. Names beginning with the string "xml", or any string which would match (('X'|'x') ('M'|'m') ('L'|'l')), are reserved for standardization in this or future versions of this specification.

The colon character within XML names is reserved for experimentation with name spaces. Its meaning is expected to be standardized at some future point, at which point those documents using the colon for experimental purposes may need to be updated. (There is no guarantee that any name-space mechanism adopted for XML will in fact use the colon as a name-space delimiter.) In practice, this means that authors should not use the colon in XML names except as part of name-space experiments, but that XML processors should accept the colon as a name character.

An Nmtoken (name token) is any mixture of name characters.

Names and Tokens

    NameChar ::= Letter | Digit | '.' | '-' | '_' | ':' | CombiningChar | Extender
    Name     ::= (Letter | '_' | ':') (NameChar)*
    Names    ::= Name (S Name)*
    Nmtoken  ::= (NameChar)+
    Nmtokens ::= Nmtoken (S Nmtoken)*

Literal data is any quoted string not containing the quotation mark used as a delimiter for that string. Literals are used for specifying the content of internal entities (EntityValue), the values of attributes (AttValue), and external identifiers (SystemLiteral). Note that a SystemLiteral can be parsed without scanning for markup.

Literals

    EntityValue   ::= '"' ([^%&"] | PEReference | Reference)* '"'
                    | "'" ([^%&'] | PEReference | Reference)* "'"
    AttValue      ::= '"' ([^<&"] | Reference)* '"'
                    | "'" ([^<&'] | Reference)* "'"
    SystemLiteral ::= ('"' [^"]* '"') | ("'" [^']* "'")
    PubidLiteral  ::= '"' PubidChar* '"' | "'" (PubidChar - "'")* "'"
    PubidChar     ::= #x20 | #xD | #xA | [a-zA-Z0-9] | [-'()+,./:=?;!*#@$_%]
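The difference between Name and Nmtoken is only the restriction on the first character. The sketch below illustrates this under a loudly stated simplification: Letter and Digit are reduced to ASCII, and CombiningChar and Extender are omitted, so it rejects many names that the full production accepts. The function names are hypothetical.

```python
import re

# ASCII-only approximation (an assumption of this sketch, not the full production):
# Letter -> [A-Za-z], Digit -> [0-9]; CombiningChar and Extender are left out.
NAME_CHAR = r"[A-Za-z0-9._:\-]"
NAME = re.compile(r"[A-Za-z_:]" + NAME_CHAR + "*")
NMTOKEN = re.compile(NAME_CHAR + "+")


def is_name(text: str) -> bool:
    """Name ::= (Letter | '_' | ':') (NameChar)*"""
    return NAME.fullmatch(text) is not None


def is_nmtoken(text: str) -> bool:
    """Nmtoken ::= (NameChar)+"""
    return NMTOKEN.fullmatch(text) is not None


def is_reserved_name(text: str) -> bool:
    """Names beginning with 'xml' in any mix of case are reserved."""
    return text[:3].lower() == "xml"
```

Every Name is also an Nmtoken, but a token such as `1abc` or `-a` is an Nmtoken only.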

Character Data and Markup

Text consists of intermingled character data and markup. Markup takes the form of start-tags, end-tags, empty-element tags, entity references, character references, comments, CDATA section delimiters, document type declarations, and processing instructions.

All text that is not markup constitutes the character data of the document.

The ampersand character (&) and the left angle bracket (<) may appear in their literal form only when used as markup delimiters, or within a comment, a processing instruction, or a CDATA section. They are also legal within the literal entity value of an internal entity declaration; see "Well-Formed Parsed Entities". If they are needed elsewhere, they must be escaped using either numeric character references or the strings "&amp;" and "&lt;" respectively. The right angle bracket (>) may be represented using the string "&gt;", and must, for compatibility, be escaped using "&gt;" or a character reference when it appears in the string "]]>" in content, when that string is not marking the end of a CDATA section.

In the content of elements, character data is any string of characters which does not contain the start-delimiter of any markup. In a CDATA section, character data is any string of characters not including the CDATA-section-close delimiter, "]]>".

To allow attribute values to contain both single and double quotes, the apostrophe or single-quote character (') may be represented as "&apos;", and the double-quote character (") as "&quot;".

Character Data

    CharData ::= [^<&]* - ([^<&]* ']]>' [^<&]*)
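The escaping rules above are what a serializer has to apply when writing character data and attribute values. A minimal sketch using `escape` and `unescape` from Python's standard `xml.sax.saxutils`; the wrapper names `escape_text` and `escape_attr` are illustrative assumptions:

```python
from xml.sax.saxutils import escape, unescape


def escape_text(text: str) -> str:
    """Escape &, < and > so the text can appear as element content."""
    return escape(text)


def escape_attr(text: str) -> str:
    """Escape content characters plus both quote characters for an attribute value."""
    return escape(text, {"'": "&apos;", '"': "&quot;"})


print(escape_text("a < b & c"))
print(escape_text("x]]>y"))       # '>' is always escaped, so "]]>" cannot survive
print(escape_attr("\"it's\""))
```

Escaping every ">" is stricter than the text requires, which only mandates it inside "]]>", but it keeps the serializer simple.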

Comments

Comments may appear anywhere in a document outside other markup; in addition, they may appear within the document type declaration at places allowed by the grammar. They are not part of the document's character data; an XML processor may, but need not, make it possible for an application to retrieve the text of comments. For compatibility, the string "--" (double-hyphen) must not occur within comments.

Comments

    Comment ::= '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'

An example of a comment:

    <!-- declarations for <head> & <body> -->
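The Comment production can be read as a regular expression: each step consumes either a non-hyphen or a hyphen followed by a non-hyphen, which is what rules out "--" inside the comment and a hyphen directly before the closing delimiter. A minimal sketch, assuming the names `COMMENT` and `is_comment`, and ignoring the Char restriction on the characters themselves:

```python
import re

# Comment ::= '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'
# Simplification of this sketch: any character other than '-' stands in for (Char - '-').
COMMENT = re.compile(r"<!--(?:[^-]|-[^-])*-->")


def is_comment(text: str) -> bool:
    """Return True if text is a single comment with no double hyphen inside."""
    return COMMENT.fullmatch(text) is not None
```

For instance, `is_comment("<!-- a -- b -->")` is False, because the double hyphen cannot be consumed by either alternative.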

Processing Instructions

Processing instructions (PIs) allow documents to contain instructions for applications.

Processing Instructions

    PI       ::= '<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>'
    PITarget ::= Name - (('X' | 'x') ('M' | 'm') ('L' | 'l'))

PIs are not part of the document's character data, but must be passed through to the application. The PI begins with a target (PITarget) used to identify the application to which the instruction is directed. The target names "XML", "xml", and so on are reserved for standardization in this or future versions of this specification. The XML Notation mechanism may be used for formal declaration of PI targets.
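To see how a PI reaches the application, the sketch below parses a small document with Python's standard `xml.dom.minidom` and reads the target and data of the resulting node. The target name `render`, its data, and the helper `is_pi_target` are made up for illustration; the name check is an ASCII-only approximation of Name.

```python
import re
from xml.dom import minidom

# ASCII-only approximation of Name (an assumption of this sketch).
NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9._:\-]*")


def is_pi_target(name: str) -> bool:
    """PITarget ::= Name - (('X' | 'x') ('M' | 'm') ('L' | 'l'))"""
    return NAME.fullmatch(name) is not None and name.lower() != "xml"


doc = minidom.parseString('<?render mode="draft"?><doc/>')
pi = doc.childNodes[0]
print(pi.nodeType == pi.PROCESSING_INSTRUCTION_NODE)
print(pi.target)  # the PITarget
print(pi.data)    # everything after the white space that follows the target
```

Only the exact three-letter target is excluded by the production; a longer name that merely starts with "xml" still matches PITarget, although such names are reserved.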

CDATA Sections

CDATA sections may occur anywhere character data may occur; they are used to escape blocks of text containing characters which would otherwise be recognized as markup. CDATA sections begin with the string "<![CDATA[" and end with the string "]]>":

CDATA Sections

    CDSect  ::= CDStart CData CDEnd
    CDStart ::= '<![CDATA['
    CData   ::= (Char* - (Char* ']]>' Char*))
    CDEnd   ::= ']]>'

Within a CDATA section, only the CDEnd string is recognized as markup, so that left angle brackets and ampersands may occur in their literal form; they need not (and cannot) be escaped using "&lt;" and "&amp;". CDATA sections cannot nest.

An example of a CDATA section, in which "<greeting>" and "</greeting>" are recognized as character data, not markup: <![CDATA[<greeting>Hello, world!</greeting>]]>
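The effect of a CDATA section is easiest to see from the application's side: the parser delivers the enclosed text as plain character data. A minimal sketch with Python's `xml.etree.ElementTree`; the wrapper element `doc` and the helper `to_cdata` are assumptions of this example. Because CDATA sections cannot nest and cannot contain "]]>", the helper splits that string across two adjacent sections.

```python
import xml.etree.ElementTree as ET


def to_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting any ']]>' so it never appears inside one section."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


elem = ET.fromstring("<doc><![CDATA[<greeting>Hello, world!</greeting>]]></doc>")
print(elem.text)  # the tags arrive as character data
print(len(elem))  # no child elements were created

round_trip = ET.fromstring("<doc>" + to_cdata("a]]>b") + "</doc>").text
print(round_trip)
```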

Prolog and Document Type Declaration

XML documents may, and should, begin with an XML declaration which specifies the version of XML being used. For example, the following is a complete XML document, well-formed but not valid:

    <?xml version="1.0"?>
    <greeting>Hello, world!</greeting>

and so is this:

    <greeting>Hello, world!</greeting>

The version number "1.0" should be used to indicate conformance to this version of this specification; it is an error for a document to use the value "1.0" if it does not conform to this version of this specification. It is the intent of the XML working group to give later versions of this specification numbers other than "1.0", but this intent does not indicate a commitment to produce any future versions of XML, nor if any are produced, to use any particular numbering scheme. Since future versions are not ruled out, this construct is provided as a means to allow the possibility of automatic version recognition, should it become necessary. Processors may signal an error if they receive documents labeled with versions they do not support.

The function of the markup in an XML document is to describe its storage and logical structure and to associate attribute-value pairs with its logical structures. XML provides a mechanism, the document type declaration, to define constraints on the logical structure and to support the use of predefined storage units. An XML document is valid if it has an associated document type declaration and if the document complies with the constraints expressed in it.

The document type declaration must appear before the first element in the document.

Prolog

    prolog      ::= XMLDecl? Misc* (doctypedecl Misc*)?
    XMLDecl     ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
    VersionInfo ::= S 'version' Eq ("'" VersionNum "'" | '"' VersionNum '"')
    Eq          ::= S? '=' S?
    VersionNum  ::= ([a-zA-Z0-9_.:] | '-')+
    Misc        ::= Comment | PI | S
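The VersionInfo production, read with the version number enclosed in a matching pair of single or double quotes, can be sketched as a regular expression built from the S, Eq, and VersionNum pieces. The function name `version_num` is a hypothetical helper; it expects exactly the VersionInfo fragment, including its leading white space.

```python
import re

S = r"[ \t\r\n]"               # one white space character
VNUM = r"[a-zA-Z0-9_.:\-]+"    # VersionNum ::= ([a-zA-Z0-9_.:] | '-')+

# VersionInfo ::= S 'version' Eq ("'" VersionNum "'" | '"' VersionNum '"')
VERSION_INFO = re.compile(
    S + "+version" + S + "*=" + S + "*(?:'(" + VNUM + ")'|\"(" + VNUM + ")\")"
)


def version_num(version_info: str):
    """Return the VersionNum inside a VersionInfo fragment, or None if it does not match."""
    m = VERSION_INFO.fullmatch(version_info)
    if m is None:
        return None
    return m.group(1) or m.group(2)
```

The two alternatives keep the quotes paired, so a fragment that opens with a double quote and closes with a single quote is rejected.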

The XML document type declaration contains or points to markup declarations that provide a grammar for a class of documents. This grammar is known as a document type definition, or DTD. The document type declaration can point to an external subset (a special kind of external entity) containing markup declarations, or can contain the markup declarations directly in an internal subset, or can do both. The DTD for a document consists of both subsets taken together.

A markup declaration is an element type declaration, an attribute-list declaration, an entity declaration, or a notation declaration. These declarations may be contained in whole or in part within parameter entities, as described in the well-formedness and validity constraints below. For fuller information, see "Physical Structures".

Document Type Definition

    doctypedecl ::= '<!DOCTYPE' S Name (S ExternalID)? S? ('[' (markupdecl | PEReference | S)* ']' S?)? '>'
    markupdecl  ::= elementdecl | AttlistDecl | EntityDecl | NotationDecl | PI | Comment

The markup declarations may be made up in whole or in part of the replacement text of parameter entities. The productions later in this specification for individual nonterminals (elementdecl, AttlistDecl, and so on) describe the declarations after all the parameter entities have been included.

Validity Constraint: Root Element Type

The Name in the document type declaration must match the element type of the root element.
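A non-validating parser does not enforce this constraint, but an application can check it from the parsed document. The following minimal sketch uses Python's `xml.dom.minidom`; the function name `root_matches_doctype` and the sample documents are illustrative assumptions.

```python
from xml.dom import minidom


def root_matches_doctype(xml_text: str):
    """Compare the DOCTYPE name with the root element type.

    Returns None when the document has no document type declaration.
    """
    doc = minidom.parseString(xml_text)
    if doc.doctype is None:
        return None
    return doc.doctype.name == doc.documentElement.tagName


good = "<!DOCTYPE greeting [<!ELEMENT greeting (#PCDATA)>]><greeting>Hello, world!</greeting>"
bad = "<!DOCTYPE greeting [<!ELEMENT greeting (#PCDATA)>]><salute>Hello, world!</salute>"
print(root_matches_doctype(good))
print(root_matches_doctype(bad))
```

The second document is still well-formed, which is why the parser accepts it; it simply cannot be valid.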

Validity Constraint: Proper Declaration/PE Nesting

Parameter-entity replacement text must be properly nested with markup declarations. That is to say, if either the first character or the last character of a markup declaration (markupdecl above) is contained in the replacement text for a parameter-entity reference, both must be contained in the same replacement text.

Well-Formedness Constraint: PEs in Internal Subset

In the internal DTD subset, parameter-entity references can occur only where markup declarations can occur, not within markup declarations. (This does not apply to references that occur in external parameter entities or to the external subset.)

Like the internal subset, the external subset and any external parameter entities referred to in the DTD must consist of a series of complete markup declarations of the types allowed by the non-terminal symbol markupdecl, interspersed with white space or parameter-entity references. However, portions of the contents of the external subset or of external parameter entities may conditionally be ignored by using the conditional section construct; this is not allowed in the internal subset.

External Subset

    extSubset     ::= TextDecl? extSubsetDecl
    extSubsetDecl ::= ( markupdecl | conditionalSect | PEReference | S )*

The external subset and external parameter entities also differ from the internal subset in that in them, parameter-entity references are permitted within markup declarations, not only between markup declarations.

An example of an XML document with a document type declaration:

    <?xml version="1.0"?>
    <!DOCTYPE greeting SYSTEM "hello.dtd">
    <greeting>Hello, world!</greeting>

The system identifier "hello.dtd" gives the URI of a DTD for the document.

The declarations can also be given locally, as in this example:

    <?xml version="1.0" encoding="UTF-8" ?>
    <!DOCTYPE greeting [
      <!ELEMENT greeting (#PCDATA)>
    ]>
    <greeting>Hello, world!</greeting>

If both the external and internal subsets are used, the internal subset is considered to occur before the external subset. This has the effect that entity and attribute-list declarations in the internal subset take precedence over those in the external subset.
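The precedence of the internal subset follows from a rule stated elsewhere in the specification: when an entity is declared more than once, the first declaration encountered is binding. The sketch below shows that rule inside a single internal subset, using Python's `xml.etree.ElementTree` (which relies on the non-validating expat parser); the entity name `who` and its values are invented for the example, and no external subset is involved.

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE greeting [
  <!ELEMENT greeting (#PCDATA)>
  <!ENTITY who "world">
  <!ENTITY who "moon">
]>
<greeting>Hello, &who;!</greeting>"""

greeting = ET.fromstring(DOC)
print(greeting.text)  # the first declaration of 'who' is the one that is used
```

Since the internal subset is read first, a declaration there plays the role of the first `who` above relative to any declaration of the same entity in the external subset.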

Standalone Document Declaration

Markup declarations can affect the content of the document, as passed from an XML processor to an application; examples are attribute defaults and entity declarations. The standalone document declaration, which may appear as a component of the XML declaration, signals whether or not there are such declarations which appear external to the document entity.

Standalone Document Declaration

    SDDecl ::= S 'standalone' Eq (("'" ('yes' | 'no') "'") | ('"' ('yes' | 'no') '"'))

In a standalone document declaration, the value "yes" indicates that there are no markup declarations external to the document entity (either in the DTD external subset, or in an external parameter entity referenced from the internal subset) which affect the information passed from the XML processor to the application. The value "no" indicates that there are or may be such external markup declarations. Note that the standalone document declaration only denotes the presence of external declarations; the presence, in a document, of references to external entities, when those entities are internally declared, does not change its standalone status.

If there are no external markup declarations, the standalone document declaration has no meaning. If there are external markup declarations but there is no standalone document declaration, the value "no" is assumed.

Any XML document for which standalone="no" holds can be converted algorithmically to a standalone document, which may be desirable for some network delivery applications.

Validity Constraint: Standalone Document Declaration

The standalone document declaration must have the value "no" if any external markup declarations contain declarations of:

attributes with default values, if elements to which these attributes apply appear in the document without specifications of values for these attributes, or

entities (other than amp, lt, gt, apos, quot), if references to those entities appear in the document, or

attributes with values subject to normalization, where the attribute appears in the document with a value which will change as a result of normalization, or

element types with element content, if white space occurs directly within any instance of those types.

An example XML declaration with a standalone document declaration:

    <?xml version="1.0" standalone='yes'?>
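As an illustration, the value of the standalone document declaration can be read back with a parser. The sketch below uses Python's bundled expat binding; the helper name read_standalone is hypothetical, and the mapping to True, False, and None is a choice made for this example only.

```python
import xml.parsers.expat


def read_standalone(document: str):
    """Return True for 'yes', False for 'no', None if not declared."""
    seen = {}

    def on_xml_decl(version, encoding, standalone):
        # pyexpat reports standalone as 1 ("yes"), 0 ("no"), or -1 (omitted).
        seen["standalone"] = {1: True, 0: False}.get(standalone)

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_xml_decl
    parser.Parse(document, True)
    return seen.get("standalone")
```

A document with no XML declaration at all never triggers the handler, so the helper also returns None in that case.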

White Space Handling

In editing XML documents, it is often convenient to use "white space" (spaces, tabs, and blank lines, denoted by the nonterminal S in this specification) to set apart the markup for greater readability. Such white space is typically not intended for inclusion in the delivered version of the document. On the other hand, "significant" white space that should be preserved in the delivered version is common, for example in poetry and source code.

An XML processor must always pass all characters in a document that are not markup through to the application. A validating XML processor must also inform the application which of these characters constitute white space appearing in element content.

A special attribute named xml:space may be attached to an element to signal an intention that in that element, white space should be preserved by applications. In valid documents, this attribute, like any other, must be declared if it is used. When declared, it must be given as an enumerated type whose only possible values are "default" and "preserve". For example:

    <!ATTLIST poem   xml:space (default|preserve) 'preserve'>

The value "default" signals that applications' default white-space processing modes are acceptable for this element; the value "preserve" indicates the intent that applications preserve all the white space. This declared intent is considered to apply to all elements within the content of the element where it is specified, unless overridden with another instance of the xml:space attribute.

The root element of any document is considered to have signaled no intentions as regards application space handling, unless it provides a value for this attribute or the attribute is declared with a default value.

End-of-Line Handling

XML parsed entities are often stored in computer files which, for editing convenience, are organized into lines. These lines are typically separated by some combination of the characters carriage-return (#xD) and line-feed (#xA).

To simplify the tasks of applications, wherever an external parsed entity or the literal entity value of an internal parsed entity contains either the literal two-character sequence "#xD#xA" or a standalone literal #xD, an XML processor must pass to the application the single character #xA. (This behavior can conveniently be produced by normalizing all line breaks to #xA on input, before parsing.)
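The normalization suggested in the parenthesis above can be sketched in a few lines. This is a minimal illustration, assuming the entity text has already been decoded into a string; the function name normalize_line_ends is hypothetical.

```python
def normalize_line_ends(text: str) -> str:
    """Map each CR LF pair and each lone CR to a single LF."""
    # Replace the two-character sequence first, so that a CR LF pair
    # becomes one LF rather than two.
    return text.replace("\r\n", "\n").replace("\r", "\n")
```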

Language Identification

In document processing, it is often useful to identify the natural or formal language in which the content is written. A special attribute named xml:lang may be inserted in documents to specify the language used in the contents and attribute values of any element in an XML document. In valid documents, this attribute, like any other, must be declared if it is used. The values of the attribute are language identifiers as defined by IETF RFC 1766, "Tags for the Identification of Languages":

Language Identification

    LanguageID ::= Langcode ('-' Subcode)*
    Langcode   ::= ISO639Code | IanaCode | UserCode
    ISO639Code ::= ([a-z] | [A-Z]) ([a-z] | [A-Z])
    IanaCode   ::= ('i' | 'I') '-' ([a-z] | [A-Z])+
    UserCode   ::= ('x' | 'X') '-' ([a-z] | [A-Z])+
    Subcode    ::= ([a-z] | [A-Z])+

The Langcode may be any of the following:

a two-letter language code as defined by ISO 639, "Codes for the representation of names of languages"

a language identifier registered with the Internet Assigned Numbers Authority (IANA); these begin with the prefix "i-" (or "I-")

a language identifier assigned by the user, or agreed on between parties in private use; these must begin with the prefix "x-" or "X-" in order to ensure that they do not conflict with names later standardized or registered with IANA

There may be any number of Subcode segments; if the first subcode segment exists and the Subcode consists of two letters, then it must be a country code from ISO 3166, "Codes for the representation of names of countries." If the first subcode consists of more than two letters, it must be a subcode for the language in question registered with IANA, unless the Langcode begins with the prefix "x-" or "X-".

It is customary to give the language code in lower case, and the country code (if any) in upper case. Note that these values, unlike other names in XML documents, are case insensitive.
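The LanguageID grammar translates directly into a regular expression. The sketch below checks syntax only; it does not consult the ISO 639, ISO 3166, or IANA registries, so a syntactically valid but unregistered code is accepted. The names LANGUAGE_ID and is_language_id are hypothetical.

```python
import re

LANGUAGE_ID = re.compile(
    r"(?:[A-Za-z]{2}|[iI]-[A-Za-z]+|[xX]-[A-Za-z]+)"  # Langcode
    r"(?:-[A-Za-z]+)*"                               # ('-' Subcode)*
)


def is_language_id(value: str) -> bool:
    """Return True if value matches the LanguageID production."""
    return LANGUAGE_ID.fullmatch(value) is not None
```

Because the character classes cover both cases, the check is case insensitive, in line with the note above.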

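For example:

    <p xml:lang="en">The quick brown fox jumps over the lazy dog.</p>
    <p xml:lang="en-GB">What colour is it?</p>
    <p xml:lang="en-US">What color is it?</p>
    <sp who="Faust" desc='leise' xml:lang="de">
      <l>Habe nun, ach! Philosophie,</l>
      <l>Juristerei, und Medizin</l>
      <l>und leider auch Theologie</l>
      <l>durchaus studiert mit heißem Bemüh'n.</l>
    </sp>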

The intent declared with xml:lang is considered to apply to all attributes and content of the element where it is specified, unless overridden with an instance of xml:lang on another element within that content.
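The inheritance rule amounts to a walk from an element towards the root until an xml:lang attribute is found. The sketch below shows this with Python's xml.dom.minidom; the function name effective_lang is hypothetical, and it covers element nodes only.

```python
from xml.dom import Node, minidom


def effective_lang(element):
    """Return the xml:lang value in force for element, or None."""
    node = element
    # Stop when the walk leaves the element tree (the parent of the
    # root element is the Document node).
    while node is not None and node.nodeType == Node.ELEMENT_NODE:
        if node.hasAttribute("xml:lang"):
            return node.getAttribute("xml:lang")
        node = node.parentNode
    return None
```

The same walk applies to xml:space, whose declared intent is inherited in the same way.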

A simple declaration for xml:lang might take the form

    xml:lang  NMTOKEN  #IMPLIED

but specific default values may also be given, if appropriate. In a collection of French poems for English students, with glosses and notes in English, the xml:lang attribute might be declared this way:

    <!ATTLIST poem   xml:lang NMTOKEN 'fr'>
    <!ATTLIST gloss  xml:lang NMTOKEN 'en'>
    <!ATTLIST note   xml:lang NMTOKEN 'en'>

Logical Structures

Each XML document contains one or more elements, the boundaries of which are either delimited by start-tags and end-tags, or, for empty elements, by an empty-element tag. Each element has a type, identified by name, sometimes called its "generic identifier" (GI), and may have a set of attribute specifications. Each attribute specification has a name and a value.

Element

    element ::= EmptyElemTag | STag content ETag

This specification does not constrain the semantics, use, or (beyond syntax) names of the element types and attributes, except that names beginning with a match to (('X'|'x')('M'|'m')('L'|'l')) are reserved for standardization in this or future versions of this specification.

Element Type Match

The Name in an element's end-tag must match the element type in the start-tag.

Element Valid

An element is valid if there is a declaration matching elementdecl where the Name matches the element type, and one of the following holds:

The declaration matches EMPTY and the element has no content.

The declaration matches children and the sequence of child elements belongs to the language generated by the regular expression in the content model, with optional white space (characters matching the nonterminal S) between each pair of child elements.

The declaration matches Mixed and the content consists of character data and child elements whose types match names in the content model.

The declaration matches ANY, and the types of any child elements have been declared.

Start-Tags, End-Tags, and Empty-Element Tags

The beginning of every non-empty XML element is marked by a start-tag.

Start-tag

    STag      ::= '<' Name (S Attribute)* S? '>'
    Attribute ::= Name Eq AttValue

The Name in the start- and end-tags gives the element's type. The Name-AttValue pairs are referred to as the attribute specifications of the element, with the Name in each pair referred to as the attribute name and the content of the AttValue (the text between the ' or " delimiters) as the attribute value.

Unique Att Spec

No attribute name may appear more than once in the same start-tag or empty-element tag.

Attribute Value Type

The attribute must have been declared; the value must be of the type declared for it. (For attribute types, see Attribute-List Declarations.)

No External Entity References

Attribute values cannot contain direct or indirect entity references to external entities.

No < in Attribute Values

The replacement text of any entity referred to directly or indirectly in an attribute value (other than "&lt;") must not contain a <.

An example of a start-tag: <termdef id="dt-dog" term="dog">

The end of every element that begins with a start-tag must be marked by an end-tag containing a name that echoes the element's type as given in the start-tag:

End-tag

    ETag ::= '</' Name S? '>'

An example of an end-tag: </termdef>
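The well-formedness constraints on tags (Element Type Match, Unique Att Spec, No < in Attribute Values) can be observed with any conforming parser. The sketch below uses Python's bundled expat binding; the helper name well_formedness_error is hypothetical, and the exact message text is whatever expat reports.

```python
import xml.parsers.expat


def well_formedness_error(document: str):
    """Return expat's error message, or None if the document parses."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(document, True)
    except xml.parsers.expat.ExpatError as exc:
        # Translate the numeric error code into expat's message string.
        return xml.parsers.expat.ErrorString(exc.code)
    return None
```

For instance, the matched pair `<termdef id="dt-dog" term="dog"></termdef>` parses, while an end-tag with a different name, a repeated attribute name, or a literal < inside an attribute value each produce an error.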

The text between the start-tag and end-tag is called the element's content:

Content of Elements

    content ::= (element | CharData | Reference | CDSect | PI | Comment)*

If an element is empty, it must be represented either by a start-tag immediately followed by an end-tag or by an empty-element tag. An empty-element tag takes a special form:

Tags for Empty Elements

    EmptyElemTag ::= '<' Name (S Attribute)* S? '/>'

Empty-element tags may be used for any element which has no content, whether or not it is declared using the keyword EMPTY. For interoperability, the empty-element tag must be used, and can only be used, for elements which are declared EMPTY.

Examples of empty elements:

    <IMG align="left" src="http://www.w3.org/Icons/WWW/w3c_home" />
    <br></br>
    <br/>

Element Type Declarations

The element structure of an XML document may, for validation purposes, be constrained using element type and attribute-list declarations. An element type declaration constrains the element's content.

Element type declarations often constrain which element types can appear as children of the element. At user option, an XML processor may issue a warning when a declaration mentions an element type for which no declaration is provided, but this is not an error.

An element type declaration takes the form:

Element Type Declaration

    elementdecl ::= '<!ELEMENT' S Name S contentspec S? '>'
    contentspec ::= 'EMPTY' | 'ANY' | Mixed | children

where the Name gives the element type being declared.

Unique Element Type Declaration

No element type may be declared more than once.

Examples of element type declarations:

    <!ELEMENT br EMPTY>
    <!ELEMENT p (#PCDATA|emph)* >
    <!ELEMENT %name.para; %content.para; >
    <!ELEMENT container ANY>

Element Content

An element type has element content when elements of that type must contain only child elements (no character data), optionally separated by white space (characters matching the nonterminal S). In this case, the constraint includes a content model, a simple grammar governing the allowed types of the child elements and the order in which they are allowed to appear. The grammar is built on content particles (cps), which consist of names, choice lists of content particles, or sequence lists of content particles:

Element-content Models

    children ::= (choice | seq) ('?' | '*' | '+')?
    cp       ::= (Name | choice | seq) ('?' | '*' | '+')?
    choice   ::= '(' S? cp ( S? '|' S? cp )* S? ')'
    seq      ::= '(' S? cp ( S? ',' S? cp )* S? ')'

where each Name is the type of an element which may appear as a child. Any content particle in a choice list may appear in the element content at the location where the choice list appears in the grammar; content particles occurring in a sequence list must each appear in the element content in the order given in the list. The optional character following a name or list governs whether the element or the content particles in the list may occur one or more (+), zero or more (*), or zero or one times (?). The absence of such an operator means that the element or content particle must appear exactly once. This syntax and meaning are identical to those used in the productions in this specification.

The content of an element matches a content model if and only if it is possible to trace out a path through the content model, obeying the sequence, choice, and repetition operators and matching each element in the content against an element type in the content model. For compatibility, it is an error if an element in the document can match more than one occurrence of an element type in the content model. For more information, see the appendix on deterministic content models.
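Because a content model is a regular expression over element type names, matching can be sketched by translating the model into an ordinary regular expression and applying it to the sequence of child names. The code below is an illustration under several assumptions: names are limited to ASCII, parameter-entity references are not expanded, and the compatibility rule about deterministic models is not checked. The names content_model_to_regex and matches are hypothetical.

```python
import re

# Simplified Name pattern (ASCII only), an assumption of this sketch.
NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9._:-]*")


def content_model_to_regex(model: str):
    """Translate an element-content model into a compiled regex."""
    compact = "".join(model.split())  # drop the optional white space
    # Each Name becomes a group that consumes "Name " (name plus separator).
    pattern = NAME.sub(lambda m: "(?:" + re.escape(m.group()) + " )", compact)
    # A sequence list is plain concatenation, so the commas disappear;
    # '|', '?', '*', '+' and parentheses already mean the same in a regex.
    return re.compile(pattern.replace(",", ""))


def matches(model: str, children) -> bool:
    """Return True if the list of child element names fits the model."""
    text = "".join(name + " " for name in children)
    return content_model_to_regex(model).fullmatch(text) is not None
```

With the model `(front, body, back?)`, the child sequence front, body is accepted and body, front is rejected.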

Proper Group/PE Nesting

Parameter-entity replacement text must be properly nested with parenthesized groups. That is to say, if either of the opening or closing parentheses in a choice, seq, or Mixed construct is contained in the replacement text for a parameter entity, both must be contained in the same replacement text.

For interoperability, if a parameter-entity reference appears in a choice, seq, or Mixed construct, its replacement text should not be empty, and neither the first nor last non-blank character of the replacement text should be a connector (| or ,).

Examples of element-content models:

    <!ELEMENT spec (front, body, back?)>
    <!ELEMENT div1 (head, (p | list | note)*, div2*)>
    <!ELEMENT dictionary-body (%div.mix; | %dict.mix;)*>

Mixed Content

An element type has mixed content when elements of that type may contain character data, optionally interspersed with child elements. In this case, the types of the child elements may be constrained, but not their order or their number of occurrences:

Mixed-content Declaration

    Mixed ::= '(' S? '#PCDATA' (S? '|' S? Name)* S? ')*' | '(' S? '#PCDATA' S? ')'

where the Names give the types of elements that may appear as children.

No Duplicate Types

The same name must not appear more than once in a single mixed-content declaration.

Examples of mixed content declarations:

    <!ELEMENT p (#PCDATA|a|ul|b|i|em)*>
    <!ELEMENT p (#PCDATA | %font; | %phrase; | %special; | %form;)* >
    <!ELEMENT b (#PCDATA)>

Attribute-List Declarations

Attributes are used to associate name-value pairs with elements. Attribute specifications may appear only within start-tags and empty-element tags; thus, the productions used to recognize them appear in Start-Tags, End-Tags, and Empty-Element Tags. Attribute-list declarations may be used:

To define the set of attributes pertaining to a given element type.

To establish type constraints for these attributes.

To provide default values for attributes.

Attribute-list declarations specify the name, data type, and default value (if any) of each attribute associated with a given element type:

Attribute-list Declaration

    AttlistDecl ::= '<!ATTLIST' S Name AttDef* S? '>'
    AttDef      ::= S Name S AttType S DefaultDecl

The Name in the AttlistDecl rule is the type of an element. At user option, an XML processor may issue a warning if attributes are declared for an element type not itself declared, but this is not an error. The Name in the AttDef rule is the name of the attribute.

When more than one AttlistDecl is provided for a given element type, the contents of all those provided are merged. When more than one definition is provided for the same attribute of a given element type, the first declaration is binding and later declarations are ignored. For interoperability, writers of DTDs may choose to provide at most one attribute-list declaration for a given element type, at most one attribute definition for a given attribute name, and at least one attribute definition in each attribute-list declaration. For interoperability, an XML processor may at user option issue a warning when more than one attribute-list declaration is provided for a given element type, or more than one attribute definition is provided for a given attribute, but this is not an error.

Attribute Types

XML attribute types are of three kinds: a string type, a set of tokenized types, and enumerated types. The string type may take any literal string as a value; the tokenized types have varying lexical and semantic constraints, as noted:

Attribute Types

    AttType       ::= StringType | TokenizedType | EnumeratedType
    StringType    ::= 'CDATA'
    TokenizedType ::= 'ID' | 'IDREF' | 'IDREFS' | 'ENTITY' | 'ENTITIES' | 'NMTOKEN' | 'NMTOKENS'

ID

Values of type ID must match the Name production. A name must not appear more than once in an XML document as a value of this type; i.e., ID values must uniquely identify the elements which bear them.

One ID per Element Type

No element type may have more than one ID attribute specified.

ID Attribute Default

An ID attribute must have a declared default of #IMPLIED or #REQUIRED.

IDREF

Values of type IDREF must match the Name production, and values of type IDREFS must match Names; each Name must match the value of an ID attribute on some element in the XML document; i.e. IDREF values must match the value of some ID attribute.

Entity Name

Values of type ENTITY must match the Name production, values of type ENTITIES must match Names; each Name must match the name of an unparsed entity declared in the DTD.

Name Token

Values of type NMTOKEN must match the Nmtoken production; values of type NMTOKENS must match Nmtokens.

Enumerated attributes can take one of a list of values provided in the declaration. There are two kinds of enumerated types:

Enumerated Attribute Types

    EnumeratedType ::= NotationType | Enumeration
    NotationType   ::= 'NOTATION' S '(' S? Name (S? '|' S? Name)* S? ')'
    Enumeration    ::= '(' S? Nmtoken (S? '|' S? Nmtoken)* S? ')'

A NOTATION attribute identifies a notation, declared in the DTD with associated system and/or public identifiers, to be used in interpreting the element to which the attribute is attached.

Notation Attributes

Values of this type must match one of the notation names included in the declaration; all notation names in the declaration must be declared.

Enumeration

Values of this type must match one of the Nmtoken tokens in the declaration.

For interoperability, the same Nmtoken should not occur more than once in the enumerated attribute types of a single element type.

Attribute Defaults

An attribute declaration provides information on whether the attribute's presence is required, and if not, how an XML processor should react if a declared attribute is absent in a document.

Attribute Defaults

    DefaultDecl ::= '#REQUIRED' | '#IMPLIED' | (('#FIXED' S)? AttValue)

In an attribute declaration, #REQUIRED means that the attribute must always be provided, #IMPLIED that no default value is provided. If the declaration is neither #REQUIRED nor #IMPLIED, then the AttValue value contains the declared default value; the #FIXED keyword states that the attribute must always have the default value. If a default value is declared, when an XML processor encounters an omitted attribute, it is to behave as though the attribute were present with the declared default value.

Required Attribute

If the default declaration is the keyword #REQUIRED, then the attribute must be specified for all elements of the type in the attribute-list declaration.

Attribute Default Legal

The declared default value must meet the lexical constraints of the declared attribute type.

Fixed Attribute Default

If an attribute has a default value declared with the #FIXED keyword, instances of that attribute must match the default value.

Examples of attribute-list declarations:

    <!ATTLIST termdef
              id      ID      #REQUIRED
              name    CDATA   #IMPLIED>
    <!ATTLIST list
              type    (bullets|ordered|glossary)  "ordered">
    <!ATTLIST form
              method  CDATA   #FIXED "POST">
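The effect of a declared default can be observed by parsing a document whose internal subset carries one of the declarations above. The sketch below relies on Python's bundled expat binding, which by default reports defaulted attributes together with specified ones; the helper name first_element_attributes and the sample document are made up for this illustration.

```python
import xml.parsers.expat


def first_element_attributes(document: str) -> dict:
    """Return the attributes reported for the first start-tag."""
    found = []

    def on_start(name, attrs):
        found.append(dict(attrs))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.Parse(document, True)
    return found[0]


# The list element omits "type", so the declared default should apply.
SAMPLE = """<!DOCTYPE list [
  <!ATTLIST list type (bullets|ordered|glossary) "ordered">
]>
<list/>"""
```

Calling first_element_attributes(SAMPLE) is expected to report the type attribute with the value ordered, as though it had been written in the start-tag.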

Attribute-Value Normalization

Before the value of an attribute is passed to the application or checked for validity, the XML processor must normalize it as follows:

a character reference is processed by appending the referenced character to the attribute value

an entity reference is processed by recursively processing the replacement text of the entity

a whitespace character (#x20, #xD, #xA, #x9) is processed by appending #x20 to the normalized value, except that only a single #x20 is appended for a "#xD#xA" sequence that is part of an external parsed entity or the literal entity value of an internal parsed entity

other characters are processed by appending them to the normalized value

If the declared value is not CDATA, then the XML processor must further process the normalized attribute value by discarding any leading and trailing space (#x20) characters, and by replacing sequences of space (#x20) characters by a single space (#x20) character.
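The white-space part of this procedure can be sketched as follows. The example assumes that the literal contains no character or entity references (those need separate handling, since a referenced character is appended as-is), and the function name normalize_attribute is hypothetical.

```python
import re


def normalize_attribute(literal: str, is_cdata: bool = True) -> str:
    """Normalize white space in an attribute value literal."""
    # A CR LF pair yields one space; any other white space character
    # (#x20, #xD, #xA, #x9) yields one space each.
    value = re.sub(r"\r\n|[ \t\r\n]", " ", literal)
    if not is_cdata:
        # Non-CDATA types: trim leading and trailing spaces, then
        # collapse each run of spaces to a single space.
        value = re.sub(r" +", " ", value.strip(" "))
    return value
```

For a CDATA attribute, interior runs of spaces are kept; for a tokenized or enumerated attribute they collapse, which is what lets a value such as an IDREFS list be split on single spaces afterwards.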

All attributes for which no declaration has been read should be treated by a non-validating parser as if declared CDATA.

Conditional Sections

Conditional sections are portions of the document type declaration external subset which are included in, or excluded from, the logical structure of the DTD based on the keyword which governs them.

Conditional Section

    conditionalSect    ::= includeSect | ignoreSect
    includeSect        ::= '<![' S? 'INCLUDE' S? '[' extSubsetDecl ']]>'
    ignoreSect         ::= '<![' S? 'IGNORE' S? '[' ignoreSectContents* ']]>'
    ignoreSectContents ::= Ignore ('<![' ignoreSectContents ']]>' Ignore)*
    Ignore             ::= Char* - (Char* ('<![' | ']]>') Char*)

Like the internal and external DTD subsets, a conditional section may contain one or more complete declarations, comments, processing instructions, or nested conditional sections, intermingled with white space.

If the keyword of the conditional section is INCLUDE, then the contents of the conditional section are part of the DTD. If the keyword of the conditional section is IGNORE, then the contents of the conditional section are not logically part of the DTD. Note that for reliable parsing, the contents of even ignored conditional sections must be read in order to detect nested conditional sections and ensure that the end of the outermost (ignored) conditional section is properly detected. If a conditional section with a keyword of INCLUDE occurs within a larger conditional section with a keyword of IGNORE, both the outer and the inner conditional sections are ignored.

If the keyword of the conditional section is a parameter-entity reference, the parameter entity must be replaced by its content before the processor decides whether to include or ignore the conditional section.

An example:

    <!ENTITY % draft 'INCLUDE' >
    <!ENTITY % final 'IGNORE' >

    <![%draft;[
    <!ELEMENT book (comments*, title, body, supplements?)>
    ]]>
    <![%final;[
    <!ELEMENT book (title, body, supplements?)>
    ]]>

Physical Structures

An XML document may consist of one or many storage units. These are called entities; they all have content and are all (except for the document entity, see below, and the external DTD subset) identified by name. Each XML document has one entity called the document entity, which serves as the starting point for the XML processor and may contain the whole document.

Entities may be either parsed or unparsed. A parsed entity's contents are referred to as its replacement text; this text is considered an integral part of the document.

An unparsed entity is a resource whose contents may or may not be text, and if text, may not be XML. Each unparsed entity has an associated notation, identified by name. Beyond a requirement that an XML processor make the identifiers for the entity and notation available to the application, XML places no constraints on the contents of unparsed entities.

Parsed entities are invoked by name using entity references; unparsed entities by name, given in the value of ENTITY or ENTITIES attributes.

General entities are entities for use within the document content. In this specification, general entities are sometimes referred to with the unqualified term entity when this leads to no ambiguity. Parameter entities are parsed entities for use within the DTD. These two types of entities use different forms of reference and are recognized in different contexts. Furthermore, they occupy different namespaces; a parameter entity and a general entity with the same name are two distinct entities.

Character and Entity References

A character reference refers to a specific character in the ISO/IEC 10646 character set, for example one not directly accessible from available input devices.

Character Reference

    CharRef ::= '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';'

Legal Character

Characters referred to using character references must match the production for Char.

If the character reference begins with "&#x", the digits and letters up to the terminating ; provide a hexadecimal representation of the character's code point in ISO/IEC 10646. If it begins just with "&#", the digits up to the terminating ; provide a decimal representation of the character's code point.
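The two forms of character reference can be expanded with a short substitution. This sketch does not check the Legal Character constraint (that the result matches Char); the names CHAR_REF and expand_char_refs are hypothetical.

```python
import re

# Group 1 captures hexadecimal digits, group 2 decimal digits.
CHAR_REF = re.compile(r"&#(?:x([0-9a-fA-F]+)|([0-9]+));")


def expand_char_refs(text: str) -> str:
    """Replace decimal and hexadecimal character references."""

    def replace(match):
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits, 10)
        return chr(code_point)

    return CHAR_REF.sub(replace, text)
```

Entity references such as &amp; are left untouched, since they are resolved by a different mechanism.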

An entity reference refers to the content of a named entity. References to parsed general entities use ampersand (&) and semicolon (;) as delimiters. Parameter-entity references use percent-sign (%) and semicolon (;) as delimiters.

Entity Reference

    Reference   ::= EntityRef | CharRef
    EntityRef   ::= '&' Name ';'
    PEReference ::= '%' Name ';'

Well-Formedness Constraint: Entity Declared

In a document without any DTD, a document with only an internal DTD subset which contains no parameter entity references, or a document with "standalone='yes'", the Name given in the entity reference must match that in an entity declaration, except that well-formed documents need not declare any of the following entities: amp, lt, gt, apos, quot. The declaration of a parameter entity must precede any reference to it. Similarly, the declaration of a general entity must precede any reference to it which appears in a default value in an attribute-list declaration.

Note that if entities are declared in the external subset or in external parameter entities, a non-validating processor is not obligated to read and process their declarations; for such documents, the rule that an entity must be declared is a well-formedness constraint only if standalone='yes'.

Validity Constraint: Entity Declared

In a document with an external subset or external parameter entities with "standalone='no'", the Name given in the entity reference must match that in an entity declaration. For interoperability, valid documents should declare the entities amp, lt, gt, apos, quot, in the form specified in the section on predefined entities. The declaration of a parameter entity must precede any reference to it. Similarly, the declaration of a general entity must precede any reference to it which appears in a default value in an attribute-list declaration.

Parsed Entity

An entity reference must not contain the name of an unparsed entity. Unparsed entities may be referred to only in attribute values declared to be of type ENTITY or ENTITIES.

No Recursion

A parsed entity must not contain a recursive reference to itself, either directly or indirectly.

In DTD

Parameter-entity references may only appear in the DTD.

Examples of character and entity references:

    Type <key>less-than</key> (&#x3C;) to save options.
    This document was prepared on &docdate; and
    is classified &security-level;.

Example of a parameter-entity reference: %ISOLat2;

Entity Declarations

Entities are declared thus:

Entity Declaration

    EntityDecl ::= GEDecl | PEDecl
    GEDecl     ::= '<!ENTITY' S Name S EntityDef S? '>'
    PEDecl     ::= '<!ENTITY' S '%' S Name S PEDef S? '>'
    EntityDef  ::= EntityValue | (ExternalID NDataDecl?)
    PEDef      ::= EntityValue | ExternalID

The Name identifies the entity in an entity reference or, in the case of an unparsed entity, in the value of an ENTITY or ENTITIES attribute. If the same entity is declared more than once, the first declaration encountered is binding; at user option, an XML processor may issue a warning if entities are declared multiple times.

Internal Entities

If the entity definition is an EntityValue, the defined entity is called an internal entity. There is no separate physical storage object, and the content of the entity is given in the declaration. Note that some processing of entity and character references in the literal entity value may be required to produce the correct replacement text: see the section on the construction of internal entity replacement text.

An internal entity is a parsed entity.

Example of an internal entity declaration: <!ENTITY Pub-Status "This is a pre-release of the specification.">

External Entities

If the entity is not internal, it is an external entity, declared as follows:

External Entity Declaration

    ExternalID ::= 'SYSTEM' S SystemLiteral | 'PUBLIC' S PubidLiteral S SystemLiteral
    NDataDecl  ::= S 'NDATA' S Name

If the NDataDecl is present, this is a general unparsed entity; otherwise it is a parsed entity.

Notation Declared

The Name must match the declared name of a notation.

The SystemLiteral is called the entity's system identifier. It is a URI, which may be used to retrieve the entity. Note that the hash mark (#) and fragment identifier frequently used with URIs are not, formally, part of the URI itself; an XML processor may signal an error if a fragment identifier is given as part of a system identifier. Unless otherwise provided by information outside the scope of this specification (e.g. a special XML element type defined by a particular DTD, or a processing instruction defined by a particular application specification), relative URIs are relative to the location of the resource within which the entity declaration occurs. A URI might thus be relative to the document entity, to the entity containing the external DTD subset, or to some other external parameter entity.

An XML processor should handle a non-ASCII character in a URI by representing the character in UTF-8 as one or more bytes, and then escaping these bytes with the URI escaping mechanism (i.e., by converting each byte to %HH, where HH is the hexadecimal notation of the byte value).
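The escaping rule above can be sketched directly. The function below touches only non-ASCII characters and leaves every ASCII character, including any existing %HH escapes, as it is; the name escape_non_ascii is hypothetical.

```python
def escape_non_ascii(uri: str) -> str:
    """Escape each non-ASCII character as the %HH bytes of its UTF-8 form."""
    parts = []
    for ch in uri:
        if ord(ch) < 0x80:
            parts.append(ch)
        else:
            # One %HH triplet per UTF-8 byte of the character.
            parts.extend("%{:02X}".format(byte) for byte in ch.encode("utf-8"))
    return "".join(parts)
```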

In addition to a system identifier, an external identifier may include a public identifier. An XML processor attempting to retrieve the entity's content may use the public identifier to try to generate an alternative URI. If the processor is unable to do so, it must use the URI specified in the system literal. Before a match is attempted, all strings of white space in the public identifier must be normalized to single space characters (#x20), and leading and trailing white space must be removed.
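The normalization applied to a public identifier before matching is small enough to show in full. This sketch uses Python's str.split, which treats a wider set of characters as white space than the public identifier syntax permits; that is harmless here but is an assumption of the example. The name normalize_public_id is hypothetical.

```python
def normalize_public_id(public_id: str) -> str:
    """Collapse white space runs to one space and trim both ends."""
    # split() with no argument drops leading and trailing white space
    # and treats each run of white space as a single separator.
    return " ".join(public_id.split())
```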

Examples of external entity declarations:

    <!ENTITY open-hatch
             SYSTEM "http://www.textuality.com/boilerplate/OpenHatch.xml">
    <!ENTITY open-hatch
             PUBLIC "-//Textuality//TEXT Standard open-hatch boilerplate//EN"
             "http://www.textuality.com/boilerplate/OpenHatch.xml">
    <!ENTITY hatch-pic
             SYSTEM "../grafix/OpenHatch.gif"
             NDATA gif >

Parsed Entities

The Text Declaration

External parsed entities may each begin with a text declaration.

Text Declaration

    TextDecl ::= '<?xml' VersionInfo? EncodingDecl S? '?>'

The text declaration must be provided literally, not by reference to a parsed entity. No text declaration may appear at any position other than the beginning of an external parsed entity.

Well-Formed Parsed Entities

The document entity is well-formed if it matches the production labeled document. An external general parsed entity is well-formed if it matches the production labeled extParsedEnt. An external parameter entity is well-formed if it matches the production labeled extPE.

Well-Formed External Parsed Entity

    extParsedEnt ::= TextDecl? content
    extPE        ::= TextDecl? extSubsetDecl

An internal general parsed entity is well-formed if its replacement text matches the production labeled content. All internal parameter entities are well-formed by definition.

A consequence of well-formedness in entities is that the logical and physical structures in an XML document are properly nested; no start-tag, end-tag, empty-element tag, element, comment, processing instruction, character reference, or entity reference can begin in one entity and end in another.

Character Encoding in Entities

Each external parsed entity in an XML document may use a different encoding for its characters. All XML processors must be able to read entities in either UTF-8 or UTF-16.

Entities encoded in UTF-16 must begin with the Byte Order Mark described by ISO/IEC 10646 Annex E and Unicode Appendix B (the ZERO WIDTH NO-BREAK SPACE character, #xFEFF). This is an encoding signature, not part of either the markup or the character data of the XML document. XML processors must be able to use this character to differentiate between UTF-8 and UTF-16 encoded documents.

Although an XML processor is required to read only entities in the UTF-8 and UTF-16 encodings, it is recognized that other encodings are used around the world, and it may be desired for XML processors to read entities that use them. Parsed entities which are stored in an encoding other than UTF-8 or UTF-16 must begin with a text declaration containing an encoding declaration:

Encoding Declaration

    EncodingDecl ::= S 'encoding' Eq ('"' EncName '"' | "'" EncName "'" )
    EncName      ::= [A-Za-z] ([A-Za-z0-9._] | '-')*    /* Encoding name contains only Latin characters */

In the document entity, the encoding declaration is part of the XML declaration. The EncName is the name of the encoding used.

In an encoding declaration, the values "UTF-8", "UTF-16", "ISO-10646-UCS-2", and "ISO-10646-UCS-4" should be used for the various encodings and transformations of Unicode / ISO/IEC 10646, the values "ISO-8859-1", "ISO-8859-2", ... "ISO-8859-9" should be used for the parts of ISO 8859, and the values "ISO-2022-JP", "Shift_JIS", and "EUC-JP" should be used for the various encoded forms of JIS X-0208-1997. XML processors may recognize other encodings; it is recommended that character encodings registered (as charsets) with the Internet Assigned Numbers Authority, other than those just listed, should be referred to using their registered names. Note that these registered names are defined to be case-insensitive, so processors wishing to match against them should do so in a case-insensitive way.

In the absence of information provided by an external transport protocol (e.g. HTTP or MIME), it is an error for an entity including an encoding declaration to be presented to the XML processor in an encoding other than that named in the declaration, for an encoding declaration to occur other than at the beginning of an external entity, or for an entity which begins with neither a Byte Order Mark nor an encoding declaration to use an encoding other than UTF-8. Note that since ASCII is a subset of UTF-8, ordinary ASCII entities do not strictly need an encoding declaration.

It is a fatal error when an XML processor encounters an entity with an encoding that it is unable to process.

Examples of encoding declarations:

    <?xml encoding='UTF-8'?>
    <?xml encoding='EUC-JP'?>
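As an illustration of the EncName production and of the case-insensitive matching recommended above, the following minimal Python sketch checks whether a string is a syntactically valid encoding name and compares a declared name against a supported one. The function names are hypothetical and are not part of any XML processor's API.

```python
import re

# EncName ::= [A-Za-z] ([A-Za-z0-9._] | '-')*
ENC_NAME = re.compile(r"[A-Za-z][A-Za-z0-9._-]*")


def is_enc_name(name: str) -> bool:
    """True if the whole string matches the EncName production."""
    return ENC_NAME.fullmatch(name) is not None


def same_encoding(declared: str, supported: str) -> bool:
    """Compare two encoding names without regard to case."""
    return declared.lower() == supported.lower()


if __name__ == "__main__":
    for candidate in ("UTF-8", "EUC-JP", "Shift_JIS", "8859-1"):
        print(candidate, is_enc_name(candidate))
```

The check is purely syntactic: a name can match EncName and still denote an encoding the processor cannot read.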

XML Processor Treatment of Entities and References

The table below summarizes the contexts in which character references, entity references, and invocations of unparsed entities might appear and the required behavior of an XML processor in each case. The labels in the leftmost column describe the recognition context:

Reference in Content: as a reference anywhere after the start-tag and before the end-tag of an element; corresponds to the nonterminal content.

Reference in Attribute Value: as a reference within either the value of an attribute in a start-tag, or a default value in an attribute declaration; corresponds to the nonterminal AttValue.

Occurs as Attribute Value: as a Name, not a reference, appearing either as the value of an attribute which has been declared as type ENTITY, or as one of the space-separated tokens in the value of an attribute which has been declared as type ENTITIES.

Reference in EntityValue: as a reference within a parameter or internal entity's literal entity value in the entity's declaration; corresponds to the nonterminal EntityValue.

Reference in DTD: as a reference within either the internal or external subsets of the DTD, but outside of an EntityValue or AttValue.

                              Entity Type
                              Parameter            Internal General     External Parsed General  Unparsed   Character
Reference in Content          Not recognized       Included             Included if validating   Forbidden  Included
Reference in Attribute Value  Not recognized       Included in literal  Forbidden                Forbidden  Included
Occurs as Attribute Value     Not recognized       Forbidden            Forbidden                Notify     Not recognized
Reference in EntityValue      Included in literal  Bypassed             Bypassed                 Forbidden  Included
Reference in DTD              Included as PE       Forbidden            Forbidden                Forbidden  Forbidden

Outside the DTD, the % character has no special significance; thus, what would be parameter entity references in the DTD are not recognized as markup in content. Similarly, the names of unparsed entities are not recognized except when they appear in the value of an appropriately declared attribute.

Included

An entity is included when its replacement text is retrieved and processed, in place of the reference itself, as though it were part of the document at the location the reference was recognized. The replacement text may contain both character data and (except for parameter entities) markup, which must be recognized in the usual way, except that the replacement text of entities used to escape markup delimiters (the entities amp, lt, gt, apos, quot) is always treated as data. (The string "AT&amp;T;" expands to "AT&T;" and the remaining ampersand is not recognized as an entity-reference delimiter.) A character reference is included when the indicated character is processed in place of the reference itself.

Included If Validating

When an XML processor recognizes a reference to a parsed entity, in order to validate the document, the processor must include its replacement text. If the entity is external, and the processor is not attempting to validate the XML document, the processor may, but need not, include the entity's replacement text. If a non-validating parser does not include the replacement text, it must inform the application that it recognized, but did not read, the entity.

This rule is based on the recognition that the automatic inclusion provided by the SGML and XML entity mechanism, primarily designed to support modularity in authoring, is not necessarily appropriate for other applications, in particular document browsing. Browsers, for example, when encountering an external parsed entity reference, might choose to provide a visual indication of the entity's presence and retrieve it for display only on demand.

Forbidden

The following are forbidden, and constitute fatal errors:

the appearance of a reference to an unparsed entity.

the appearance of any character or general-entity reference in the DTD except within an EntityValue or AttValue.

a reference to an external entity in an attribute value.

Included in Literal

When an entity reference appears in an attribute value, or a parameter entity reference appears in a literal entity value, its replacement text is processed in place of the reference itself as though it were part of the document at the location the reference was recognized, except that a single or double quote character in the replacement text is always treated as a normal data character and will not terminate the literal. For example, this is well-formed:

    <!ENTITY % YN '"Yes"' >
    <!ENTITY WhatHeSaid "He said %YN;" >

while this is not:

    <!ENTITY EndAttr "27'" >
    <element attribute='a-&EndAttr;>

Notify

When the name of an unparsed entity appears as a token in the value of an attribute of declared type ENTITY or ENTITIES, a validating processor must inform the application of the system and public (if any) identifiers for both the entity and its associated notation.

Bypassed

When a general entity reference appears in the EntityValue in an entity declaration, it is bypassed and left as is.

Included as PE

Just as with external parsed entities, parameter entities need only be included if validating. When a parameter-entity reference is recognized in the DTD and included, its replacement text is enlarged by the attachment of one leading and one following space (#x20) character; the intent is to constrain the replacement text of parameter entities to contain an integral number of grammatical tokens in the DTD.

Construction of Internal Entity Replacement Text

In discussing the treatment of internal entities, it is useful to distinguish two forms of the entity's value. The literal entity value is the quoted string actually present in the entity declaration, corresponding to the non-terminal EntityValue. The replacement text is the content of the entity, after replacement of character references and parameter-entity references.

The literal entity value as given in an internal entity declaration (EntityValue) may contain character, parameter-entity, and general-entity references. Such references must be contained entirely within the literal entity value. The actual replacement text that is included as described above must contain the replacement text of any parameter entities referred to, and must contain the character referred to, in place of any character references in the literal entity value; however, general-entity references must be left as-is, unexpanded. For example, given the following declarations:

    <!ENTITY % pub    "&#xc9;ditions Gallimard" >
    <!ENTITY   rights "All rights reserved" >
    <!ENTITY   book   "La Peste: Albert Camus, &#xA9; 1947 %pub;. &rights;" >

then the replacement text for the entity "book" is:

    La Peste: Albert Camus, © 1947 Éditions Gallimard. &rights;

The general-entity reference "&rights;" would be expanded should the reference "&book;" appear in the document's content or an attribute value.
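The construction described above can be sketched in a few lines of Python. This is an illustrative sketch only, not how any particular processor is implemented: the function name and the dictionary of parameter entities are hypothetical, names are matched with an ASCII-only simplification of the Name production, and the literal values are chosen to reproduce the replacement text quoted above.

```python
import re

# Alternatives: hexadecimal character reference, decimal character reference,
# parameter-entity reference. General-entity references such as "&rights;"
# match none of these and are therefore left untouched (bypassed).
_REF = re.compile(
    r"&#x([0-9A-Fa-f]+);|&#([0-9]+);|%([A-Za-z_:][-A-Za-z0-9._:]*);"
)


def replacement_text(literal: str, parameter_entities: dict) -> str:
    """Build the replacement text from a literal entity value."""
    def expand(match):
        if match.group(1) is not None:
            return chr(int(match.group(1), 16))
        if match.group(2) is not None:
            return chr(int(match.group(2)))
        # The stored value of a parameter entity is already replacement text.
        return parameter_entities[match.group(3)]
    return _REF.sub(expand, literal)


pub = replacement_text("&#xc9;ditions Gallimard", {})
book = replacement_text(
    "La Peste: Albert Camus, &#xA9; 1947 %pub;. &rights;", {"pub": pub}
)
print(book)
```

A reference to an undeclared parameter entity raises KeyError in this sketch; a real processor would report an error according to the applicable constraints.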

These simple rules may have complex interactions; for a detailed discussion of a difficult example, see the appendix "Expansion of Entity and Character References".

Predefined Entities

Entity and character references can both be used to escape the left angle bracket, ampersand, and other delimiters. A set of general entities (amp, lt, gt, apos, quot) is specified for this purpose. Numeric character references may also be used; they are expanded immediately when recognized and must be treated as character data, so the numeric character references "&#60;" and "&#38;" may be used to escape < and & when they occur in character data.

All XML processors must recognize these entities whether they are declared or not. For interoperability, valid XML documents should declare these entities, like any others, before using them. If the entities in question are declared, they must be declared as internal entities whose replacement text is the single character being escaped or a character reference to that character, as shown below.

    <!ENTITY lt     "&#38;#60;">
    <!ENTITY gt     "&#62;">
    <!ENTITY amp    "&#38;#38;">
    <!ENTITY apos   "&#39;">
    <!ENTITY quot   "&#34;">

Note that the < and & characters in the declarations of "lt" and "amp" are doubly escaped to meet the requirement that entity replacement be well-formed.
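As a small illustration of both escaping mechanisms, the Python sketch below escapes character data with the predefined entities (using the standard library's xml.sax.saxutils) and with numeric character references, then checks that a parser recovers the same text either way. The sample string is an arbitrary choice for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape

text = "AT&T <b>"

# Escaping with the predefined entities amp, lt and gt.
with_entities = escape(text)

# Escaping with numeric character references instead.
with_char_refs = text.replace("&", "&#38;").replace("<", "&#60;")

# Both forms parse back to the original character data.
from_entities = ET.fromstring("<p>" + with_entities + "</p>").text
from_char_refs = ET.fromstring("<p>" + with_char_refs + "</p>").text

print(with_entities)
print(with_char_refs)
print(from_entities == text, from_char_refs == text)
```

Note that the ampersand must be replaced first in the numeric variant; otherwise the "&" introduced by "&#60;" would itself be escaped.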

Notation Declarations

Notations identify by name the format of unparsed entities, the format of elements which bear a notation attribute, or the application to which a processing instruction is addressed.

Notation declarations provide a name for the notation, for use in entity and attribute-list declarations and in attribute specifications, and an external identifier for the notation which may allow an XML processor or its client application to locate a helper application capable of processing data in the given notation.

Notation Declarations

    NotationDecl ::= '<!NOTATION' S Name S (ExternalID | PublicID) S? '>'
    PublicID     ::= 'PUBLIC' S PubidLiteral

XML processors must provide applications with the name and external identifier(s) of any notation declared and referred to in an attribute value, attribute definition, or entity declaration. They may additionally resolve the external identifier into the system identifier, file name, or other information needed to allow the application to call a processor for data in the notation described. (It is not an error, however, for XML documents to declare and refer to notations for which notation-specific applications are not available on the system where the XML processor or application is running.)

Document Entity

The document entity serves as the root of the entity tree and a starting-point for an XML processor. This specification does not specify how the document entity is to be located by an XML processor; unlike other entities, the document entity has no name and might well appear on a processor input stream without any identification at all.

Conformance

Validating and Non-Validating Processors

Conforming XML processors fall into two classes: validating and non-validating.

Validating and non-validating processors alike must report violations of this specification's well-formedness constraints in the content of the document entity and any other parsed entities that they read.

Validating processors must report violations of the constraints expressed by the declarations in the DTD, and failures to fulfill the validity constraints given in this specification. To accomplish this, validating XML processors must read and process the entire DTD and all external parsed entities referenced in the document.

Non-validating processors are required to check only the document entity, including the entire internal DTD subset, for well-formedness. While they are not required to check the document for validity, they are required to process all the declarations they read in the internal DTD subset and in any parameter entity that they read, up to the first reference to a parameter entity that they do not read; that is to say, they must use the information in those declarations to normalize attribute values, include the replacement text of internal entities, and supply default attribute values. They must not process entity declarations or attribute-list declarations encountered after a reference to a parameter entity that is not read, since the entity may have contained overriding declarations.

Using XML Processors

The behavior of a validating XML processor is highly predictable; it must read every piece of a document and report all well-formedness and validity violations. Less is required of a non-validating processor; it need not read any part of the document other than the document entity. This has two effects that may be important to users of XML processors:

Certain well-formedness errors, specifically those that require reading external entities, may not be detected by a non-validating processor. Examples include the constraints entitled Entity Declared, Parsed Entity, and No Recursion, as well as some of the cases described as forbidden in "XML Processor Treatment of Entities and References".

The information passed from the processor to the application may vary, depending on whether the processor reads parameter and external entities. For example, a non-validating processor may not normalize attribute values, include the replacement text of internal entities, or supply default attribute values, where doing so depends on having read declarations in external or parameter entities.

For maximum reliability in interoperating between different XML processors, applications which use non-validating processors should not rely on any behaviors not required of such processors. Applications which require facilities such as the use of default attributes or internal entities which are declared in external entities should use validating XML processors.

Notation

The formal grammar of XML is given in this specification using a simple Extended Backus-Naur Form (EBNF) notation. Each rule in the grammar defines one symbol, in the form

    symbol ::= expression

Symbols are written with an initial capital letter if they are defined by a regular expression, or with an initial lower case letter otherwise. Literal strings are quoted.

Within the expression on the right-hand side of a rule, the following expressions are used to match strings of one or more characters:

#xN: where N is a hexadecimal integer, the expression matches the character in ISO/IEC 10646 whose canonical (UCS-4) code value, when interpreted as an unsigned binary number, has the value indicated. The number of leading zeros in the #xN form is insignificant; the number of leading zeros in the corresponding code value is governed by the character encoding in use and is not significant for XML.

[a-zA-Z], [#xN-#xN]: matches any character with a value in the range(s) indicated (inclusive).

[^a-z], [^#xN-#xN]: matches any character with a value outside the range indicated.

[^abc], [^#xN#xN#xN]: matches any character with a value not among the characters given.

"string": matches a literal string matching that given inside the double quotes.

'string': matches a literal string matching that given inside the single quotes.

These symbols may be combined to match more complex patterns as follows, where A and B represent simple expressions:

(expression): expression is treated as a unit and may be combined as described in this list.

A?: matches A or nothing; optional A.

A B: matches A followed by B.

A | B: matches A or B but not both.

A - B: matches any string that matches A but does not match B.

A+: matches one or more occurrences of A.

A*: matches zero or more occurrences of A.

Other notations used in the productions are:

/* ... */: comment.

[ wfc: ... ]: well-formedness constraint; this identifies by name a constraint on well-formed documents associated with a production.

[ vc: ... ]: validity constraint; this identifies by name a constraint on valid documents associated with a production.

References

Normative References

IANA (Internet Assigned Numbers Authority). Official Names for Character Sets, ed. Keld Simonsen et al. See ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets.

IETF (Internet Engineering Task Force). RFC 1766: Tags for the Identification of Languages, ed. H. Alvestrand. 1995.

ISO (International Organization for Standardization). ISO 639:1988 (E). Code for the representation of names of languages. [Geneva]: International Organization for Standardization, 1988.

ISO (International Organization for Standardization). ISO 3166-1:1997 (E). Codes for the representation of names of countries and their subdivisions - Part 1: Country codes. [Geneva]: International Organization for Standardization, 1997.

ISO (International Organization for Standardization). ISO/IEC 10646-1993 (E). Information technology - Universal Multiple-Octet Coded Character Set (UCS) - Part 1: Architecture and Basic Multilingual Plane. [Geneva]: International Organization for Standardization, 1993 (plus amendments AM 1 through AM 7).

The Unicode Consortium. The Unicode Standard, Version 2.0. Reading, Mass.: Addison-Wesley Developers Press, 1996.

Other References

Aho, Alfred V., Ravi Sethi, and Jeffrey D. Ullman. Compilers: Principles, Techniques, and Tools. Reading: Addison-Wesley, 1986, rpt. corr. 1988.

Berners-Lee, T., R. Fielding, and L. Masinter. Uniform Resource Identifiers (URI): Generic Syntax and Semantics. 1997. (Work in progress; see updates to RFC1738.)

Brüggemann-Klein, Anne. Regular Expressions into Finite Automata. Extended abstract in I. Simon, Hrsg., LATIN 1992, S. 97-98. Springer-Verlag, Berlin 1992. Full Version in Theoretical Computer Science 120: 197-213, 1993.

Brüggemann-Klein, Anne, and Derick Wood. Deterministic Regular Languages. Universität Freiburg, Institut für Informatik, Bericht 38, Oktober 1991.

Clark, James. Comparison of SGML and XML. See http://www.w3.org/TR/NOTE-sgml-xml-971215.

IETF (Internet Engineering Task Force). RFC 1738: Uniform Resource Locators (URL), ed. T. Berners-Lee, L. Masinter, M. McCahill. 1994.

IETF (Internet Engineering Task Force). RFC 1808: Relative Uniform Resource Locators, ed. R. Fielding. 1995.

IETF (Internet Engineering Task Force). RFC 2141: URN Syntax, ed. R. Moats. 1997.

ISO (International Organization for Standardization). ISO 8879:1986(E). Information processing - Text and Office Systems - Standard Generalized Markup Language (SGML). First edition - 1986-10-15. [Geneva]: International Organization for Standardization, 1986.

ISO (International Organization for Standardization). ISO/IEC 10744-1992 (E). Information technology - Hypermedia/Time-based Structuring Language (HyTime). [Geneva]: International Organization for Standardization, 1992. Extended Facilities Annexe. [Geneva]: International Organization for Standardization, 1996.

Character Classes

Following the characteristics defined in the Unicode standard, characters are classed as base characters (among others, these contain the alphabetic characters of the Latin alphabet, without diacritics), ideographic characters, and combining characters (among others, this class contains most diacritics); these classes combine to form the class of letters. Digits and extenders are also distinguished. Characters Letter BaseChar | Ideographic BaseChar [#x0041-#x005A] | [#x0061-#x007A] | [#x00C0-#x00D6] | [#x00D8-#x00F6] | [#x00F8-#x00FF] | [#x0100-#x0131] | [#x0134-#x013E] | [#x0141-#x0148] | [#x014A-#x017E] | [#x0180-#x01C3] | [#x01CD-#x01F0] | [#x01F4-#x01F5] | [#x01FA-#x0217] | [#x0250-#x02A8] | [#x02BB-#x02C1] | #x0386 | [#x0388-#x038A] | #x038C | [#x038E-#x03A1] | [#x03A3-#x03CE] | [#x03D0-#x03D6] | #x03DA | #x03DC | #x03DE | #x03E0 | [#x03E2-#x03F3] | [#x0401-#x040C] | [#x040E-#x044F] | [#x0451-#x045C] | [#x045E-#x0481] | [#x0490-#x04C4] | [#x04C7-#x04C8] | [#x04CB-#x04CC] | [#x04D0-#x04EB] | [#x04EE-#x04F5] | [#x04F8-#x04F9] | [#x0531-#x0556] | #x0559 | [#x0561-#x0586] | [#x05D0-#x05EA] | [#x05F0-#x05F2] | [#x0621-#x063A] | [#x0641-#x064A] | [#x0671-#x06B7] | [#x06BA-#x06BE] | [#x06C0-#x06CE] | [#x06D0-#x06D3] | #x06D5 | [#x06E5-#x06E6] | [#x0905-#x0939] | #x093D | [#x0958-#x0961] | [#x0985-#x098C] | [#x098F-#x0990] | [#x0993-#x09A8] | [#x09AA-#x09B0] | #x09B2 | [#x09B6-#x09B9] | [#x09DC-#x09DD] | [#x09DF-#x09E1] | [#x09F0-#x09F1] | [#x0A05-#x0A0A] | [#x0A0F-#x0A10] | [#x0A13-#x0A28] | [#x0A2A-#x0A30] | [#x0A32-#x0A33] | [#x0A35-#x0A36] | [#x0A38-#x0A39] | [#x0A59-#x0A5C] | #x0A5E | [#x0A72-#x0A74] | [#x0A85-#x0A8B] | #x0A8D | [#x0A8F-#x0A91] | [#x0A93-#x0AA8] | [#x0AAA-#x0AB0] | [#x0AB2-#x0AB3] | [#x0AB5-#x0AB9] | #x0ABD | #x0AE0 | [#x0B05-#x0B0C] | [#x0B0F-#x0B10] | [#x0B13-#x0B28] | [#x0B2A-#x0B30] | [#x0B32-#x0B33] | [#x0B36-#x0B39] | #x0B3D | [#x0B5C-#x0B5D] | [#x0B5F-#x0B61] | [#x0B85-#x0B8A] | [#x0B8E-#x0B90] | [#x0B92-#x0B95] | [#x0B99-#x0B9A] | #x0B9C 
| [#x0B9E-#x0B9F] | [#x0BA3-#x0BA4] | [#x0BA8-#x0BAA] | [#x0BAE-#x0BB5] | [#x0BB7-#x0BB9] | [#x0C05-#x0C0C] | [#x0C0E-#x0C10] | [#x0C12-#x0C28] | [#x0C2A-#x0C33] | [#x0C35-#x0C39] | [#x0C60-#x0C61] | [#x0C85-#x0C8C] | [#x0C8E-#x0C90] | [#x0C92-#x0CA8] | [#x0CAA-#x0CB3] | [#x0CB5-#x0CB9] | #x0CDE | [#x0CE0-#x0CE1] | [#x0D05-#x0D0C] | [#x0D0E-#x0D10] | [#x0D12-#x0D28] | [#x0D2A-#x0D39] | [#x0D60-#x0D61] | [#x0E01-#x0E2E] | #x0E30 | [#x0E32-#x0E33] | [#x0E40-#x0E45] | [#x0E81-#x0E82] | #x0E84 | [#x0E87-#x0E88] | #x0E8A | #x0E8D | [#x0E94-#x0E97] | [#x0E99-#x0E9F] | [#x0EA1-#x0EA3] | #x0EA5 | #x0EA7 | [#x0EAA-#x0EAB] | [#x0EAD-#x0EAE] | #x0EB0 | [#x0EB2-#x0EB3] | #x0EBD | [#x0EC0-#x0EC4] | [#x0F40-#x0F47] | [#x0F49-#x0F69] | [#x10A0-#x10C5] | [#x10D0-#x10F6] | #x1100 | [#x1102-#x1103] | [#x1105-#x1107] | #x1109 | [#x110B-#x110C] | [#x110E-#x1112] | #x113C | #x113E | #x1140 | #x114C | #x114E | #x1150 | [#x1154-#x1155] | #x1159 | [#x115F-#x1161] | #x1163 | #x1165 | #x1167 | #x1169 | [#x116D-#x116E] | [#x1172-#x1173] | #x1175 | #x119E | #x11A8 | #x11AB | [#x11AE-#x11AF] | [#x11B7-#x11B8] | #x11BA | [#x11BC-#x11C2] | #x11EB | #x11F0 | #x11F9 | [#x1E00-#x1E9B] | [#x1EA0-#x1EF9] | [#x1F00-#x1F15] | [#x1F18-#x1F1D] | [#x1F20-#x1F45] | [#x1F48-#x1F4D] | [#x1F50-#x1F57] | #x1F59 | #x1F5B | #x1F5D | [#x1F5F-#x1F7D] | [#x1F80-#x1FB4] | [#x1FB6-#x1FBC] | #x1FBE | [#x1FC2-#x1FC4] | [#x1FC6-#x1FCC] | [#x1FD0-#x1FD3] | [#x1FD6-#x1FDB] | [#x1FE0-#x1FEC] | [#x1FF2-#x1FF4] | [#x1FF6-#x1FFC] | #x2126 | [#x212A-#x212B] | #x212E | [#x2180-#x2182] | [#x3041-#x3094] | [#x30A1-#x30FA] | [#x3105-#x312C] | [#xAC00-#xD7A3] Ideographic [#x4E00-#x9FA5] | #x3007 | [#x3021-#x3029] CombiningChar [#x0300-#x0345] | [#x0360-#x0361] | [#x0483-#x0486] | [#x0591-#x05A1] | [#x05A3-#x05B9] | [#x05BB-#x05BD] | #x05BF | [#x05C1-#x05C2] | #x05C4 | [#x064B-#x0652] | #x0670 | [#x06D6-#x06DC] | [#x06DD-#x06DF] | [#x06E0-#x06E4] | [#x06E7-#x06E8] | [#x06EA-#x06ED] | [#x0901-#x0903] | #x093C | [#x093E-#x094C] | 
#x094D | [#x0951-#x0954] | [#x0962-#x0963] | [#x0981-#x0983] | #x09BC | #x09BE | #x09BF | [#x09C0-#x09C4] | [#x09C7-#x09C8] | [#x09CB-#x09CD] | #x09D7 | [#x09E2-#x09E3] | #x0A02 | #x0A3C | #x0A3E | #x0A3F | [#x0A40-#x0A42] | [#x0A47-#x0A48] | [#x0A4B-#x0A4D] | [#x0A70-#x0A71] | [#x0A81-#x0A83] | #x0ABC | [#x0ABE-#x0AC5] | [#x0AC7-#x0AC9] | [#x0ACB-#x0ACD] | [#x0B01-#x0B03] | #x0B3C | [#x0B3E-#x0B43] | [#x0B47-#x0B48] | [#x0B4B-#x0B4D] | [#x0B56-#x0B57] | [#x0B82-#x0B83] | [#x0BBE-#x0BC2] | [#x0BC6-#x0BC8] | [#x0BCA-#x0BCD] | #x0BD7 | [#x0C01-#x0C03] | [#x0C3E-#x0C44] | [#x0C46-#x0C48] | [#x0C4A-#x0C4D] | [#x0C55-#x0C56] | [#x0C82-#x0C83] | [#x0CBE-#x0CC4] | [#x0CC6-#x0CC8] | [#x0CCA-#x0CCD] | [#x0CD5-#x0CD6] | [#x0D02-#x0D03] | [#x0D3E-#x0D43] | [#x0D46-#x0D48] | [#x0D4A-#x0D4D] | #x0D57 | #x0E31 | [#x0E34-#x0E3A] | [#x0E47-#x0E4E] | #x0EB1 | [#x0EB4-#x0EB9] | [#x0EBB-#x0EBC] | [#x0EC8-#x0ECD] | [#x0F18-#x0F19] | #x0F35 | #x0F37 | #x0F39 | #x0F3E | #x0F3F | [#x0F71-#x0F84] | [#x0F86-#x0F8B] | [#x0F90-#x0F95] | #x0F97 | [#x0F99-#x0FAD] | [#x0FB1-#x0FB7] | #x0FB9 | [#x20D0-#x20DC] | #x20E1 | [#x302A-#x302F] | #x3099 | #x309A Digit [#x0030-#x0039] | [#x0660-#x0669] | [#x06F0-#x06F9] | [#x0966-#x096F] | [#x09E6-#x09EF] | [#x0A66-#x0A6F] | [#x0AE6-#x0AEF] | [#x0B66-#x0B6F] | [#x0BE7-#x0BEF] | [#x0C66-#x0C6F] | [#x0CE6-#x0CEF] | [#x0D66-#x0D6F] | [#x0E50-#x0E59] | [#x0ED0-#x0ED9] | [#x0F20-#x0F29] Extender #x00B7 | #x02D0 | #x02D1 | #x0387 | #x0640 | #x0E46 | #x0EC6 | #x3005 | [#x3031-#x3035] | [#x309D-#x309E] | [#x30FC-#x30FE]
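To show how productions of this kind can be applied, the following Python sketch tests membership in the two smallest classes, Digit and Extender, using the code-point ranges listed above. The table layout and the function name are illustrative assumptions; the larger classes could be handled the same way.

```python
# Inclusive (low, high) code-point ranges copied from the Digit production.
DIGIT = [
    (0x0030, 0x0039), (0x0660, 0x0669), (0x06F0, 0x06F9), (0x0966, 0x096F),
    (0x09E6, 0x09EF), (0x0A66, 0x0A6F), (0x0AE6, 0x0AEF), (0x0B66, 0x0B6F),
    (0x0BE7, 0x0BEF), (0x0C66, 0x0C6F), (0x0CE6, 0x0CEF), (0x0D66, 0x0D6F),
    (0x0E50, 0x0E59), (0x0ED0, 0x0ED9), (0x0F20, 0x0F29),
]

# Single characters are written as ranges whose ends coincide.
EXTENDER = [
    (0x00B7, 0x00B7), (0x02D0, 0x02D0), (0x02D1, 0x02D1), (0x0387, 0x0387),
    (0x0640, 0x0640), (0x0E46, 0x0E46), (0x0EC6, 0x0EC6), (0x3005, 0x3005),
    (0x3031, 0x3035), (0x309D, 0x309E), (0x30FC, 0x30FE),
]


def in_class(ch: str, ranges) -> bool:
    """True if the single character ch falls in any of the given ranges."""
    code_point = ord(ch)
    return any(low <= code_point <= high for low, high in ranges)


if __name__ == "__main__":
    print(in_class("7", DIGIT), in_class("a", DIGIT))
```

A linear scan is adequate for tables this small; for BaseChar a sorted table with binary search would be a natural alternative.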

The character classes defined here can be derived from the Unicode character database as follows:

Name start characters must have one of the categories Ll, Lu, Lo, Lt, Nl.

Name characters other than Name-start characters must have one of the categories Mc, Me, Mn, Lm, or Nd.

Characters in the compatibility area (i.e. with character code greater than #xF900 and less than #xFFFE) are not allowed in XML names.

Characters which have a font or compatibility decomposition (i.e. those with a "compatibility formatting tag" in field 5 of the database -- marked by field 5 beginning with a "<") are not allowed.

The following characters are treated as name-start characters rather than name characters, because the property file classifies them as Alphabetic: [#x02BB-#x02C1], #x0559, #x06E5, #x06E6.

Characters #x20DD-#x20E0 are excluded (in accordance with Unicode, section 5.14).

Character #x00B7 is classified as an extender, because the property list so identifies it.

Character #x0387 is added as a name character, because #x00B7 is its canonical equivalent.

Characters ':' and '_' are allowed as name-start characters.

Characters '-' and '.' are allowed as name characters.

XML and SGML

XML is designed to be a subset of SGML, in that every valid XML document should also be a conformant SGML document. For a detailed comparison of the additional restrictions that XML places on documents beyond those of SGML, see Clark, Comparison of SGML and XML.

Expansion of Entity and Character References

This appendix contains some examples illustrating the sequence of entity- and character-reference recognition and expansion, as specified in "XML Processor Treatment of Entities and References".

If the DTD contains the declaration

    <!ENTITY example "<p>An ampersand (&#38;#38;) may be escaped
    numerically (&#38;#38;#38;) or with a general entity
    (&amp;amp;).</p>" >

then the XML processor will recognize the character references when it parses the entity declaration, and resolve them before storing the following string as the value of the entity "example":

    <p>An ampersand (&#38;) may be escaped
    numerically (&#38;#38;) or with a general entity
    (&amp;amp;).</p>

A reference in the document to "&example;" will cause the text to be reparsed, at which time the start- and end-tags of the "p" element will be recognized and the three references will be recognized and expanded, resulting in a "p" element with the following content (all data, no delimiters or markup):

    An ampersand (&) may be escaped
    numerically (&#38;) or with a general entity
    (&amp;).

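A more complex example will illustrate the rules and their effects fully. In the following example, the line numbers are solely for reference.

    1 <?xml version='1.0'?>
    2 <!DOCTYPE test [
    3 <!ELEMENT test (#PCDATA) >
    4 <!ENTITY % xx '&#37;zz;'>
    5 <!ENTITY % zz '&#60;!ENTITY tricky "error-prone" >' >
    6 %xx;
    7 ]>
    8 <test>This sample shows a &tricky; method.</test>

This produces the following: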

in line 4, the reference to character 37 is expanded immediately, and the parameter entity "xx" is stored in the symbol table with the value "%zz;". Since the replacement text is not rescanned, the reference to parameter entity "zz" is not recognized. (And it would be an error if it were, since "zz" is not yet declared.)

in line 5, the character reference "&#60;" is expanded immediately and the parameter entity "zz" is stored with the replacement text "<!ENTITY tricky "error-prone" >", which is a well-formed entity declaration.

in line 6, the reference to "xx" is recognized, and the replacement text of "xx" (namely "%zz;") is parsed. The reference to "zz" is recognized in its turn, and its replacement text ("<!ENTITY tricky "error-prone" >") is parsed. The general entity "tricky" has now been declared, with the replacement text "error-prone".

in line 8, the reference to the general entity "tricky" is recognized, and it is expanded, so the full content of the "test" element is the self-describing (and ungrammatical) string This sample shows a error-prone method.

Deterministic Content Models

For compatibility, it is required that content models in element type declarations be deterministic.

SGML requires deterministic content models (it calls them "unambiguous"); XML processors built using SGML systems may flag non-deterministic content models as errors.

For example, the content model ((b, c) | (b, d)) is non-deterministic, because given an initial b the parser cannot know which b in the model is being matched without looking ahead to see which element follows the b. In this case, the two references to b can be collapsed into a single reference, making the model read (b, (c | d)). An initial b now clearly matches only a single name in the content model. The parser doesn't need to look ahead to see what follows; either c or d would be accepted.

More formally: a finite state automaton may be constructed from the content model using the standard algorithms, e.g. algorithm 3.5 in section 3.9 of Aho, Sethi, and Ullman. In many such algorithms, a follow set is constructed for each position in the regular expression (i.e., each leaf node in the syntax tree for the regular expression); if any position has a follow set in which more than one following position is labeled with the same element type name, then the content model is in error and may be reported as an error.
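The follow-set test can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the algorithm from the cited textbook: content models are encoded as nested tuples (a hypothetical representation chosen for this example), positions are numbered as leaves are visited, and the set of possible first positions is checked in the same way as the follow sets so that a clash at the very start of the model, as in ((b, c) | (b, d)), is also caught.

```python
from collections import defaultdict


def is_deterministic(model):
    """True if no choice point offers two positions with the same name."""
    labels = []                 # position index -> element type name
    follow = defaultdict(set)   # position -> positions that may come next

    def walk(node):
        # Returns (nullable, first, last); records follow sets on the way.
        kind = node[0]
        if kind == "name":
            labels.append(node[1])
            pos = len(labels) - 1
            return False, {pos}, {pos}
        if kind == "seq":
            nullable, first, last = True, set(), set()
            for child in node[1]:
                c_null, c_first, c_last = walk(child)
                for p in last:
                    follow[p] |= c_first
                if nullable:
                    first = first | c_first
                last = (last | c_last) if c_null else set(c_last)
                nullable = nullable and c_null
            return nullable, first, last
        if kind == "choice":
            nullable, first, last = False, set(), set()
            for child in node[1]:
                c_null, c_first, c_last = walk(child)
                nullable = nullable or c_null
                first = first | c_first
                last = last | c_last
            return nullable, first, last
        if kind in ("opt", "star", "plus"):
            c_null, c_first, c_last = walk(node[1])
            if kind != "opt":           # repetition loops back to the start
                for p in c_last:
                    follow[p] |= c_first
            return (c_null or kind != "plus"), c_first, c_last
        raise ValueError("unknown node kind: %r" % (kind,))

    def clashes(positions):
        names = [labels[p] for p in positions]
        return len(names) != len(set(names))

    _, first, _ = walk(model)
    return not clashes(first) and not any(clashes(f) for f in follow.values())


def name(n):
    return ("name", n)


# ((b, c) | (b, d)) and its deterministic rewrite (b, (c | d))
ambiguous = ("choice", [("seq", [name("b"), name("c")]),
                        ("seq", [name("b"), name("d")])])
rewritten = ("seq", [name("b"), ("choice", [name("c"), name("d")])])
print(is_deterministic(ambiguous), is_deterministic(rewritten))
```

The sketch only reports whether a clash exists; a processor that wanted to produce a useful diagnostic would also record which positions collide.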

Algorithms exist which allow many but not all non-deterministic content models to be reduced automatically to equivalent deterministic models; see Brüggemann-Klein 1991.

Autodetection of Character Encodings

The XML encoding declaration functions as an internal label on each entity, indicating which character encoding is in use. Before an XML processor can read the internal label, however, it apparently has to know what character encoding is in use—which is what the internal label is trying to indicate. In the general case, this is a hopeless situation. It is not entirely hopeless in XML, however, because XML limits the general case in two ways: each implementation is assumed to support only a finite set of character encodings, and the XML encoding declaration is restricted in position and content in order to make it feasible to autodetect the character encoding in use in each entity in normal cases. Also, in many cases other sources of information are available in addition to the XML data stream itself. Two cases may be distinguished, depending on whether the XML entity is presented to the processor without, or with, any accompanying (external) information. We consider the first case first.

Because each XML entity not in UTF-8 or UTF-16 format must begin with an XML encoding declaration, in which the first characters must be '<?xml', any conforming processor can detect, after two to four octets of input, which of the following cases apply. In reading this list, it may help to know that in UCS-4, '<' is "#x0000003C" and '?' is "#x0000003F", and the Byte Order Mark required of UTF-16 data streams is "#xFEFF".

00 00 00 3C: UCS-4, big-endian machine (1234 order)

3C 00 00 00: UCS-4, little-endian machine (4321 order)

00 00 3C 00: UCS-4, unusual octet order (2143)

00 3C 00 00: UCS-4, unusual octet order (3412)

FE FF: UTF-16, big-endian

FF FE: UTF-16, little-endian

00 3C 00 3F: UTF-16, big-endian, no Byte Order Mark (and thus, strictly speaking, in error)

3C 00 3F 00: UTF-16, little-endian, no Byte Order Mark (and thus, strictly speaking, in error)

3C 3F 78 6D: UTF-8, ISO 646, ASCII, some part of ISO 8859, Shift-JIS, EUC, or any other 7-bit, 8-bit, or mixed-width encoding which ensures that the characters of ASCII have their normal positions, width, and values; the actual encoding declaration must be read to detect which of these applies, but since all of these encodings use the same bit patterns for the ASCII characters, the encoding declaration itself may be read reliably

4C 6F A7 94: EBCDIC (in some flavor; the full encoding declaration must be read to tell which code page is in use)

other: UTF-8 without an encoding declaration, or else the data stream is corrupt, fragmentary, or enclosed in a wrapper of some kind
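The table above can be sketched as a simple prefix lookup. This is only an illustration: the function name `detect_encoding_family` and the family labels it returns are made up for this example and are not part of XML::DOM or XML::Parser.

```perl
use strict;
use warnings;

# Octet signatures from the list above, paired with a label for the
# encoding family. The labels are illustrative, not standard names.
my @SIGNATURES = (
    [ "\x00\x00\x00\x3C", 'UCS-4 big-endian (1234)'      ],
    [ "\x3C\x00\x00\x00", 'UCS-4 little-endian (4321)'   ],
    [ "\x00\x00\x3C\x00", 'UCS-4 unusual order (2143)'   ],
    [ "\x00\x3C\x00\x00", 'UCS-4 unusual order (3412)'   ],
    [ "\xFE\xFF",         'UTF-16 big-endian'            ],
    [ "\xFF\xFE",         'UTF-16 little-endian'         ],
    [ "\x00\x3C\x00\x3F", 'UTF-16 big-endian, no BOM'    ],
    [ "\x3C\x00\x3F\x00", 'UTF-16 little-endian, no BOM' ],
    [ "\x3C\x3F\x78\x6D", 'ASCII-compatible'             ],
    [ "\x4C\x6F\xA7\x94", 'EBCDIC'                       ],
);

# Takes the first octets of an entity (as a byte string) and returns the
# label of the first signature that is a prefix of it, or 'other'.
sub detect_encoding_family {
    my ($octets) = @_;
    for my $sig (@SIGNATURES) {
        my ($prefix, $family) = @$sig;
        return $family if index($octets, $prefix) == 0;
    }
    return 'other';
}
```

A caller would pass in the first four octets read from the entity in binary mode. A result of 'other' means either UTF-8 without a declaration or unusable input, exactly as the last row of the list says.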

This level of autodetection is enough to read the XML encoding declaration and parse the character-encoding identifier, which is still necessary to distinguish the individual members of each family of encodings (e.g. to tell UTF-8 from 8859, and the parts of 8859 from each other, or to distinguish the specific EBCDIC code page in use, and so on).

Because the contents of the encoding declaration are restricted to ASCII characters, a processor can reliably read the entire encoding declaration as soon as it has detected which family of encodings is in use. Since in practice, all widely used character encodings fall into one of the categories above, the XML encoding declaration allows reasonably reliable in-band labeling of character encodings, even when external sources of information at the operating-system or transport-protocol level are unreliable.
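For the ASCII-compatible family, reading the declaration can be sketched with two regular expressions. The helper name `declared_encoding` is hypothetical, and the sketch assumes the octets are already known to use ASCII bit patterns. Input from the UTF-16, UCS-4 or EBCDIC families would need to be transcoded first.

```perl
use strict;
use warnings;

# Returns the encoding name given in the XML declaration at the very
# start of $octets, or undef if there is no declaration or no
# encoding pseudo-attribute.
sub declared_encoding {
    my ($octets) = @_;

    # The declaration must be the first thing in the entity.
    return undef unless $octets =~ /\A<\?xml(\s[^>]*)\?>/;
    my $decl = $1;

    # Encoding names start with a letter and continue with letters,
    # digits, '.', '_' or '-'; either quote style is allowed.
    return $decl =~ /\bencoding\s*=\s*(["'])([A-Za-z][A-Za-z0-9._-]*)\1/
        ? $2
        : undef;
}
```

For example, `declared_encoding('<?xml version="1.0" encoding="ISO-8859-1"?><doc/>')` yields `ISO-8859-1`, while an entity that starts directly with `<doc/>` yields undef.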

Once the processor has detected the character encoding in use, it can act appropriately, whether by invoking a separate input routine for each case, or by calling the proper conversion function on each character of input.
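As one possible form of "calling the proper conversion function", the sketch below uses Perl's core Encode module (shipped with Perl 5.8 and later) to turn octets into a character string once the encoding name is known. The wrapper name `octets_to_text` is an assumption made for this example.

```perl
use strict;
use warnings;
use Encode qw(decode find_encoding);

# Converts a byte string to a Perl character string using the named
# encoding. Dies if Encode does not know the encoding.
sub octets_to_text {
    my ($octets, $encoding_name) = @_;
    die "unsupported encoding: $encoding_name\n"
        unless find_encoding($encoding_name);
    return decode($encoding_name, $octets);
}
```

Checking `find_encoding` first reflects the point made earlier that an implementation supports only a finite set of encodings: an unknown name is reported instead of being guessed at.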

Like any self-labeling system, the XML encoding declaration will not work if any software changes the entity's character set or encoding without updating the encoding declaration. Implementors of character-encoding routines should be careful to ensure the accuracy of the internal and external information used to label the entity.

The second possible case occurs when the XML entity is accompanied by encoding information, as in some file systems and some network protocols. When multiple sources of information are available, their relative priority and the preferred method of handling conflict should be specified as part of the higher-level protocol used to deliver XML. Rules for the relative priority of the internal label and the MIME-type label in an external header, for example, should be part of the RFC document defining the text/xml and application/xml MIME types. In the interests of interoperability, however, the following rules are recommended.

If an XML entity is in a file, the Byte-Order Mark and encoding-declaration PI are used (if present) to determine the character encoding. All other heuristics and sources of information are solely for error recovery.

If an XML entity is delivered with a MIME type of text/xml, then the charset parameter on the MIME type determines the character encoding method; all other heuristics and sources of information are solely for error recovery.

If an XML entity is delivered with a MIME type of application/xml, then the Byte-Order Mark and encoding-declaration PI are used (if present) to determine the character encoding. All other heuristics and sources of information are solely for error recovery.

These rules apply only in the absence of protocol-level documentation; in particular, when the MIME types text/xml and application/xml are defined, the recommendations of the relevant RFC will supersede these rules.
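The three recommended rules can be condensed into a small decision function. Everything about its interface is hypothetical: the name `choose_encoding` and the hash keys `source`, `mime_type`, `charset` and `in_band` are invented for this sketch. Treating MIME types other than text/xml like application/xml is also an assumption, since the rules above do not cover them.

```perl
use strict;
use warnings;

# $ctx is a hash reference describing how the entity arrived:
#   source    => 'file' or 'mime'
#   mime_type => e.g. 'text/xml' (only meaningful when source is 'mime')
#   charset   => charset parameter of the MIME type, if any
#   in_band   => encoding taken from the BOM / encoding declaration, if any
# Returns the encoding name to use, or undef when the preferred source
# gave no answer and the caller must fall back to error recovery.
sub choose_encoding {
    my ($ctx) = @_;
    my $mime = defined $ctx->{mime_type} ? lc($ctx->{mime_type}) : '';

    # text/xml: the charset parameter decides.
    if ($ctx->{source} eq 'mime' && $mime eq 'text/xml') {
        return $ctx->{charset};
    }

    # Files and application/xml: the BOM and encoding declaration decide.
    return $ctx->{in_band};
}
```

Returning undef instead of silently trying the other source keeps the "solely for error recovery" distinction visible to the caller.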

W3C XML Working Group

This specification was prepared and approved for publication by the W3C XML Working Group (WG). WG approval of this specification does not necessarily imply that all WG members voted for its approval. The current and former members of the XML WG are:

Jon Bosak, Sun (Chair)
James Clark (Technical Lead)
Tim Bray, Textuality and Netscape (XML Co-editor)
Jean Paoli, Microsoft (XML Co-editor)
C. M. Sperberg-McQueen, U. of Ill. (XML Co-editor)
Dan Connolly, W3C (W3C Liaison)
Paula Angerstein, Texcel
Steve DeRose, INSO
Dave Hollander, HP
Eliot Kimber, ISOGEN
Eve Maler, ArborText
Tom Magliery, NCSA
Murray Maloney, Muzmo and Grif
Makoto Murata, Fuji Xerox Information Systems
Joel Nava, Adobe
Conleth O'Connell, Vignette
Peter Sharpe, SoftQuad
John Tigue, DataChannel
XML-DOM-1.44/BUGS

There have been reports of XML::DOM core dumping when the
'use diagnostics' pragma is set. Patches welcome!

t/dom_jp_print fails when using a locally compiled Perl 5.8.0:

t/dom_jp_print........FAILED test 2
        Failed 1/3 tests, 66.67% okay

The test passes with the Mandrake Perl 5.8.0 RPM - perhaps it is
a bug in Perl that Mandrake fixed in their release of Perl 5.8.0?

XML-DOM-1.44/CheckAncestors.pm

#
# Perl module for testing the XML::DOM module.
# Used by the test cases in the 't' directory.
# Recursively walks the node tree and checks parent/child and document links.
#

use strict;

package CheckAncestors;

use XML::DOM;
use Carp;

BEGIN
{
    # import the constants for accessing member fields, e.g. _Doc
    import XML::DOM::Node qw{ :Fields };
    import XML::DOM::DocumentType qw{ :Fields };
}

sub new
{
    my %args = (Mark => {});
    bless \%args, $_[0];
}

sub check
{
    my ($self, $node) = @_;

    # check if node was already seen
    croak "found Node twice [$node]" if ($self->{Mark}->{$node});

    $self->{Mark}->{$node} = $node;

    # check if document is correct
    my $doc = $self->{Doc};
    if (defined $doc)
    {
        my $doc2 = $node->[_Doc];
        croak "wrong Doc [$doc] [$doc2]" if $doc != $doc2;
    }
    else
    {
        # first node seen: remember its document for later comparisons
        # (assigning the still-undefined $doc here would never set it)
        $self->{Doc} = $node->[_Doc];
    }

    # check if node's children know their parent
    # and, recursively, check each kid
    my $nodes = $node->getChildNodes;
    if ($nodes)
    {
        for my $kid (@$nodes)
        {
            my $parent = $kid->getParentNode;
            croak "wrong parent node=[$node] parent=[$parent]"
                if ($parent != $node);
            $self->check ($kid);
        }
    }

    # check NamedNodeMaps
    my $type = $node->getNodeType;
    if ($type == XML::DOM::Node::ELEMENT_NODE ||
        $type == XML::DOM::Node::ATTLIST_DECL_NODE)
    {
        $self->checkAttr ($node, $node->[_A]);
    }
    elsif ($type == XML::DOM::Node::DOCUMENT_TYPE_NODE)
    {
        $self->checkAttr ($node, $node->[_Entities]);
        $self->checkAttr ($node, $node->[_Notations]);
    }
}

# (This should have been called checkNamedNodeMap)
sub checkAttr
{
    my ($self, $node, $attr) = @_;
    return unless defined $attr;

    # check if NamedNodeMap was already seen
    croak "found NamedNodeMap twice [$attr]" if ($self->{Mark}->{$attr});

    $self->{Mark}->{$attr} = $attr;

    # check if document is correct
    my $doc = $self->{Doc};
    if (defined $doc)
    {
        my $doc2 = $attr->getProperty ("Doc");
        croak "wrong Doc [$doc] [$doc2]" if $doc != $doc2;
    }
    else
    {
        $self->{Doc} = $attr->getProperty ("Doc");
    }

    # check if NamedNodeMap knows its parent node
    my $parent = $attr->getProperty ("Parent");
    croak "wrong parent node=[$node] parent=[$parent]"
        unless $node == $parent;

    # check if NamedNodeMap's children know their parent
    # and, recursively, check the child nodes
    my $nodes = $attr->getValues;
    if ($nodes)
    {
        for my $kid (@$nodes)
        {
            my $parent = $kid->{InUse};
            croak "wrong InUse attr=[$attr] parent=[$parent]"
                if ($parent != $attr);
            $self->check ($kid);
        }
    }
}

sub doit
{
    my $node = shift;
    my $check = new CheckAncestors;
    eval {
        $check->check ($node);
    };
    if ($@)
    {
        print "checkAncestors failed:\n$@\n";
        return 0;
    }
    return 1;
}

1;

XML-DOM-1.44/XML-Parser-2.31.patch

--- XML-Parser-2.31/Expat/Expat.pm 2002-04-02 12:35:54.000000000 -0500
+++ XML-Parser-2.32/Expat/Expat.pm 2002-10-19 12:51:00.000000000 -0400
@@ -568,7 +568,8 @@
     }
     else {
       my $sep = $self->{Type} == CHOICE ? '|' : ',';
-      $ret = '(' . join($sep, @{$self->{Children}}) . ')';
+      my @children_str = map { $_->asString } @{$self->{Children}};
+      $ret = '(' . join($sep, @children_str) . ')';
     }
 
     $ret .= $self->{Quant} if $self->{Quant};
Only in XML-Parser-2.32/Expat: Expat.pm~
diff -ur XML-Parser-2.31/Expat/Expat.xs XML-Parser-2.32/Expat/Expat.xs
--- XML-Parser-2.31/Expat/Expat.xs 2002-04-02 12:35:54.000000000 -0500
+++ XML-Parser-2.32/Expat/Expat.xs 2002-10-19 12:20:12.000000000 -0400
@@ -259,7 +259,7 @@
 
   switch(model->type) {
   case XML_CTYPE_NAME:
-    hv_store(hash, "Tag", 3, newSVpv((char *)model->name, 0), 0);
+    hv_store(hash, "Tag", 3, newUTF8SVpv((char *)model->name, 0), 0);
     break;
 
   case XML_CTYPE_MIXED:

XML-DOM-1.44/Makefile.PL

use ExtUtils::MakeMaker;

sub MY::libscan
{
    package MY;

    my ($self, $file) = @_;

    # Don't install these PM files (or Emacs or other backups: *~ *.bak)
    # Also don't install XML/Parser.pod and XML/Parser/Expat.pod because I copied
    # those from the XML::Parser distribution.
    return undef if $file =~ /(XML.Parser\.pod|Expat\.pod|CmpDOM|CheckAncestors|~$|\.bak$)/;

    return $self->SUPER::libscan ($file);
}

# See lib/ExtUtils/MakeMaker.pm for details of how to influence
# the contents of the Makefile that is written.
WriteMakefile(
    NAME         => 'XML-DOM',
    VERSION_FROM => 'lib/XML/DOM.pm',
    # XML::Parser 2.28 and above work, but make test
    # doesn't pass because of different ways
    # errors are reported
    PREREQ_PM => {
        'XML::Parser' => '2.30',
        # LWP::UserAgent is used when parsing XML from URLs
        # It's part of libwww-perl, and you don't strictly need it
        # (some test cases may fail)
        'LWP::UserAgent' => '0',
        # XML::Parser::PerlSAX is part of libxml-perl.
        # It's used by some test cases in t/chk_batch.t and you
        # don't strictly need it. Version 0.05 causes errors in the
        # test cases in t/chk_batch.t.
        'XML::Parser::PerlSAX' => '0.07',
        'XML::RegExp' => 0,
    },
    dist => {'COMPRESS' => 'gzip', 'SUFFIX' => '.gz'},
);

XML-DOM-1.44/README

Perl module: XML-DOM

Copyright (c) 1999,2000 Enno Derksen
All rights reserved.
This program is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.

The XML::DOM code is fairly stable and has been used quite a bit.
However, there is a new DOM module, XML::GDOME, which is under active
development and significantly faster than XML::DOM, since it is based
on the libgdome C library. It provides Level 2 of the DOM Core API.

For more details see http://tjmather.com/xml-gdome/

Patches welcome! Send them to tjmather@maxmind.com

Paid support is available directly from the maintainers of this package.
Please see http://www.maxmind.com/app/opensourceservices for more details.

========= DEPENDENCIES =========================================================

You need the following modules (all available at CPAN):

- Perl 5.6.0 or higher (can run under earlier versions, simply remove
  use bytes; from lib/XML/DOM.pm)
- XML::RegExp
- XML::Parser (At least version 2.28, 2.30 recommended)
  If you are using XML::Parser 2.27, then you should download
  libxml-enno-1.02 from your local CPAN mirror.
  If you are using Perl 5.8.0 or greater and XML::Parser 2.32 or lesser,
  you must apply the included XML-Parser-2.31.patch.
- LWP::UserAgent (It's part of libwww-perl. If you don't have it,
  some test cases may fail and you can't read files from URLs.)
- XML::Parser::PerlSAX (It's part of libxml-perl. You need at least
  version 0.06. If you don't have it, some test cases may fail.)

========= INSTALLATION =========================================================

To configure this module, cd to the directory that contains this README
file and type the following:

        perl Makefile.PL

Alternatively, if you plan to install this module somewhere other than
your system's perl library directory, you can type something like this:

        perl Makefile.PL PREFIX=/home/me/perl INSTALLDIRS=perl

Then, to build, run make:

        make

You can then test the module by typing:

        make test

If you have write access to the perl library directories, you may then
install by typing:

        make install

============= XML::DOM =========================================================

This is a Perl extension to XML::Parser. It adds a new 'Style' to XML::Parser,
called 'Dom', that allows XML::Parser to build an object-oriented data structure
with a DOM Level 1 compliant interface.
For a description of the DOM (Document Object Model), see http://www.w3.org/DOM/

XML::Parser is a Perl extension interface to James Clark's XML parser, expat.
It requires at least version 5.004 of perl and can be found on CPAN.

This is a beta version and although there will not be any major API changes,
minor changes may occur as we get feedback from the people on the perl-xml
mailing list. [You can subscribe to this list by sending a message to
subscribe-perl-xml@listserv.activestate.com.]