Fixed bug in BeanElement which prevented proper execution of the bean samples (contributed by Wonne Keysers).
STAXEventWriter now uses XMLEventConsumer instead of XMLEventWriter (contributed by Christian Niles).
Fixed bug in SAXReader that caused problems parsing files in OSX (reported by Paul Libbrecht).
Fixed bug in XMLWriter that caused whitespace to be added between successive calls of the characters(...) method (reported by Paul Libbrecht).
Improved performance of NamespaceCache in multithreaded environments (contributed by Brett Finnell).
Added flag to OutputFormat that suppresses newline after XML declaration.
Upgraded dependencies to their latest version on ibiblio.
Added method to DocumentHelper that allows user to specify encoding when parsing an xml String (contributed by Todd Wolff).
Fixed a ClassCastException bug in BeanElement.
Fixed a bug in SAXContentHandler which caused a NullPointerException in some situations.
Fixed bug which prevented an element's namespace prefix from being registered for use in xpath expressions (contributed by Todd Wolff).
Fixed bug in XMLWriter that caused duplication of the default namespace declaration (reported by Todd Wolff).
Added a bunch of patches to make the dom4j DOM classes more DOM compliant (contributed by Curt Arnold).
Fixed bug in DispatchHandler which made the handler not reusable (reported by Ricardo Leon).
Fixed bug in SAXContentHandler that caused incorrect CDATA section parsing (contributed by Todd Wolff).
Fixed bug in SAXContentHandler that caused incorrect entity handling.
Fixed bug in XMLWriter causing padding to be disabled, even if enabled in the specified outputformat (reported by Bo Gundersen).
Added initial support for STaX streams (contributed by Christian Niles).
Fixed encoding bug in Document.asXML() and DocumentHelper.parseText().
Fixed bug in SAXReader that caused problems resolving relative URIs when parsing java.io.File Objects (reported by Kohsuke Kawaguchi).
The iterators returned by the Element.elementIterator(...) methods now support remove().
DOMWriter now writes DOM Level 2 attributes and elements (reported by Geert Dendoncker and Joury Gokel).
Use latest implementation of the Aelfred parser.
Fixed some problems with internal/external DTD declarations (reported by Bryan Thompson).
Upgraded to Jaxen 1.1 beta 2.
Ignore attribute order when comparing Elements in NodeComparator.
Fixed bug in XMLWriter where namespace declarations were duplicated.
Fixed bug in parsing a Processing Instruction (reported by Vladimir Kralik).
Added support for Stylesheet modes (reported by Mark Diggory).
Don't escape " and ' characters in attribute values if it's not necessary (contributed by Christian Niles).
Fixed some DOMNodeHelper issues (reported by Henner Kollmann).
Fixed some datatype issues (reported by Thomas Draier).
Fixed a bug where the EntityResolver was not set on the XMLReader.
Fixed multithreaded access on DefaultElement.
Fixed problem parsing XML Files (reported by Geoffrey Vlassaks).
Added xml:space attribute support based on XML Specification 1.0.
Maven build of dom4j is now nearly complete.
Maven is now used for the website generation.
Fixed some bugs in BackedList (contributed by Alessandro Vernet).
Added patch supplied by Dan Jacobs that fixes some entity encoding problems in XMLWriter - cheers Dan!
Patched the DOMElement replaceChild method to return the correct Node and to throw a DOMException when trying to replace a non-existing child.
Added patch to BackedList, kindly supplied by Andy Yang, for a bug that could cause IndexOutOfBoundsExceptions to be thrown - thanks Andy!
Update of Cookbook containing a chapter about rule API.
Patched SAXWriter to not pass in null System or Public IDs which can cause problems in Saxon.
Patched dom4j to work against Jaxen 1.0 RC1 or later build.
Applied patch to bug found by Tom Oehser that XPath expressions using elements or attributes whose name starts with '_' were not being handled correctly. It turns out this was a SAXPath issue.
Applied patch to bug found by Soumanjoy Das that creating a new DOMDocument then calling createElement() would generate a ClassCastException.
Applied patch supplied by James Dodd that fixes a MIME encoding issue in the embedded Aelfred parser.
Applied patch to fix bug found by David Frankson. Adding attributes with null values causes problems in XSLT engines so now adding a null valued attribute is equivalent to removing the attribute. So null attribute values are silently ignored. e.g.
Element element = ...;
element.addAttribute( "foo", "123" );
...
Attribute attribute = element.attribute( "foo" );
assertTrue( attribute != null );
...
element.addAttribute( "foo", null );
attribute = element.attribute( "foo" );
assertTrue( attribute == null );
Applied patch to bug found by Mike Skells that was causing XPath.matches() to return true for absolute XPaths which returned different nodes to the node provided to the XPath.
Applied patch provided by Stefan for a bug that was causing an IndexOutOfBoundsException when using the evaluate() method in DefaultXPath on an empty result set. Also added a test case to org.dom4j.TestXPathBug called testStefan().
Applied patch suggested by Frank Walinsky, that XPath objects are now Serializable.
Applied patch provided by Bill Burton that fixes union pattern matching.
Added a new Swing TableModel for displaying XML data in a Swing user interface. It uses an XPath based model to define the rows and column values. A table definition can be specified using a simple XML format and then loaded in a small amount of code. e.g. here's an example of a table that will list the servlets used in a web.xml document
<table select="/web-app/servlet">
  <column select="servlet-name">Name</column>
  <column select="servlet-class">Class</column>
  <column select="../servlet-mapping[servlet-name=$Name]/url-pattern">Mapping</column>
</table>
Notice the use of the $Name XPath variable to access other cells on the row. Here's the pseudo code to display a table for an XML document.
Document tableDefinition = ...;
Document source = ...;
TableModel tableModel = new XMLTableModel( tableDefinition, source );
JTable table = new JTable( tableModel );
There is a sample program in samples/swing/JTableTool which will display any table definition for a given source XML document. There is an example table definition for the periodic table in xml/swing/tableForAtoms.xml.
Added a new helper method to make it easier to create namespace contexts for doing namespace aware XPath expressions. The new setNamespaceURIs(Map) method on XPath makes it easier to pass in the prefixes and URIs you wish to use in an XPath expression. Here's an example of it in action
Map uris = new HashMap();
uris.put( "SOAP-ENV", "http://schemas.xmlsoap.org/soap/envelope/" );
uris.put( "m", "urn:xmethodsBabelFish" );
XPath xpath = document.createXPath( "/SOAP-ENV:Envelope/SOAP-ENV:Body/m:BabelFish" );
xpath.setNamespaceURIs( uris );
Node element = xpath.selectSingleNode( document );
In addition DocumentFactory has a setXPathNamespaceURIs(Map) method so that common namespace URIs can be associated with a DocumentFactory so namespace prefixes can be used across many XPath expressions in an easy way. e.g.
// register prefixes with my factory
Map uris = new HashMap();
uris.put( "SOAP-ENV", "http://schemas.xmlsoap.org/soap/envelope/" );
uris.put( "m", "urn:xmethodsBabelFish" );
DocumentFactory factory = new DocumentFactory();
factory.setXPathNamespaceURIs( uris );
// now parse a document using my factory
SAXReader reader = new SAXReader();
reader.setDocumentFactory( factory );
Document doc = reader.read( "soap.xml" );
// now lets use the prefixes
Node element = doc.selectSingleNode( "/SOAP-ENV:Envelope/SOAP-ENV:Body/m:BabelFish" );
There is a new mergeAdjacentText option available on SAXReader to concatenate adjacent text nodes into a single Text node.
In addition there is a new stripWhitespaceText option to strip text which occurs between start/end tags which only consists of whitespace.
For example, parsing the following XML with the stripWhitespaceText option enabled and the mergeAdjacentText option enabled will result in a single child node of the parent element, rather than 3 (2 text nodes containing whitespace and one element).
<parent> <child>foo</child> </parent>
Note that this option will not break most mixed content markup such as the following, since it is only whitespace between tag start/ends that gets removed; non-whitespace strings are not trimmed.
<p>hello <b>James</b> how are you?</p>
Both these options together can improve the parsing performance by around 10-12% depending on the structure of the document. However, the whitespace layout of the XML document will be lost, so only use these modes in data-centric applications like XML messaging and SOAP.
So a typical SOAP or XML messaging developer, who may care more about performance than preserving exact whitespace layout, may use the following code to make the SAX parsing more optimal.
SAXReader reader = new SAXReader();
reader.setMergeAdjacentText( true );
reader.setStripWhitespaceText( true );
Document doc = reader.read( "soap.xml" );
Applied patch to HTMLWriter to fix bug found by Dominik Deimling that was not outputting CDATA sections correctly.
Patched the setName() method on Element so that elements can be renamed. Also added a new setQName() to the Element interface so that elements can be renamed in a namespace aware manner. Thanks to Robert Lebowitz for this.
Applied fix to bug found by Manfred Lotz that XMLWriter in whitespace trimming mode would sometimes not correctly insert a space when text is separated by newlines. The test case testWhitespaceBug() in org.dom4j.TestXMLWriter reproduces the bug that has now been fixed.
Applied patches supplied by Stefan Graeber that enhance the datatype support to support included schemata and derived element types.
Applied patches suggested by Omer van der Horst Jansen to enable dom4j to fully work properly on JDK1.1 platforms. There were some uses of java.util.Stack which have been changed to ArrayList.
Applied patches supplied by Maarten Coene that fixes some issues with using the correct DocumentFactory when using the DOM implementation.
Updated the MSV support to comply with the latest MSV version, 1.12 (Nov 01 2001). In addition the MSVDemo.java in dom4j/src/samples/validate has been replaced by JARVDemo.java which now uses the JARV API to validate a dom4j document using the MSV implementation. This demo can validate any XML document against any DTD, XML Schema, Relax NG, Relax or Trex schema - thanks to the excellent JARV API and MSV library.
Applied patches supplied by Steen Lehmann that fix handling of external DTD entities in SAXContentHandler and fix the XML output of the ExternalEntityDecl.
Applied patch to bug found by Steen Lehmann that XPath expressions on the root element were not handling namespaces correctly. The test case is demonstrated in dom4j/src/test/org/dom4j/xpath/TestSelectSingleNode.java
Added patch found by Howard Moore when using XTags that XPath string values which contained strings with entities, such as the use of & in a text, would result in redundant spaces occurring, breaking URLs.
Added a new package, org.dom4j.dtd, which contains some DTD declaration classes which are added to the DocumentType interface's List of declarations. This is useful for finding out details of the attribute or element declarations inside either the internal or external DTD subset of a document.
To expand internal or external DTD subsets when parsing with SAXReader use the 2 properties on SAXReader (and SAXContentHandler).
SAXReader reader = new SAXReader();
reader.setIncludeInternalDTDDeclarations( true );
reader.setIncludeExternalDTDDeclarations( true );
Document doc = reader.read( "foo.xml" );
DocumentType docType = doc.getDocType();
List internalDecls = docType.getInternalDeclarations();
List externalDecls = docType.getExternalDeclarations();
This new feature means that XML documents which use internal DTD subsets, external DTDs or a mixture of internal and external DTD subsets can now be properly round tripped.
Note that there appears to be a bug in Crimson 1.1.3 which does not properly differentiate between internal or external DTD subsets. Refer to the startDTD() method of LexicalHandler for details of how startEntity/endEntity is meant to demark external DTD subsets.
It is our intention to expand internal DTD subsets by default (so that documents can be properly round tripped by default) but require external DTD subsets to be explicitly enabled via the property on the SAXReader (or SAXContentHandler). This bug in Crimson causes all DTD declarations to appear as internal DTD subsets, which both is a performance overhead and breaks round tripping of documents which just use external DTD declarations. So until this matter is resolved both internal and external declarations are not expanded by default.
Note that the code works perfectly against Xerces.
Applied patch submitted by Yuxin Ruan which fixes some issues with XML Schema Data Type support
Followed Dennis Sosnoski's suggestion, adding a null text String to an Element now throws an IllegalArgumentException. To ensure that the IllegalArgumentException is not thrown it is advisable to check for null first. For example...
Element element = ...;
String text = ...;
// might throw IllegalArgumentException
// if text == null
element.addText( text );
// safer to do this
if ( text != null ) {
    element.addText( text );
}
Fixed problem found by Kesav Kumar Kolla whereby a deserialized Document could have problems if new elements were attempted to be added. The problem was an issue with DocumentFactory not deserializing itself properly.
Fixed problem found by David Hooker with Ant build file for the binary and source distribution that was not including the manifest file in the distribution.
Applied patch submitted by Lari Hotari for a bug that was causing the XMLWriter to fail when used as a SAX XMLFilter or ContentHandler to turn SAX events into XML text. Thanks Lari!
Fixed bug found by Kohsuke Kawaguchi that there was a problem in XMLWriter during its serialization of a document which redeclared the default namespace prefix. It turned out to be a bug in org.dom4j.tree.NamespaceStack where redeclarations of namespace prefixes were not being handled properly during serialization. The test cases in org.dom4j.TestXMLWriter and org.dom4j.TestNamespaces have been improved to test these features more rigorously.
Fixed bug found by Toby that was causing a security exception in applets when using a DocumentFactory.
Implemented the suggestion by Kesav Kumar, that the detach() method now returns the node (this) so that moving nodes from one part of a document to another can now be one line of code. Here's an example of it in use.
Document doc1 = ...;
Document doc2 = ...;
Element destination = doc2.getRootElement();
// selectSingleNode() returns a Node, so cast it to Element
Element source = (Element) doc1.selectSingleNode( "//foo[@style='bar']" );
// lets move the source to the destination
destination.add( source.detach() );
Added better checking in selectSingleNode() implementation so that XPath expressions which do not return a Node throw a meaningful exception (not ClassCastException) informing the user of why the XPath expression did not succeed.
Added patch found by Kesav Kumar that a document containing null Strings would cause a NullPointerException to be thrown if it was passed into SAXWriter (used by the JAXP - XSLT code). Now the SAXWriter will quietly ignore null Strings, as will XMLWriter.
Added helper method setXMLFilter() to SAXReader making it easier to install SAX filters to filter or preprocess SAX events before a dom4j Document is created. Also added a new sample program called sax.FilterDemo that demonstrates how to use a SAX filter with dom4j.
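As a rough illustration of the kind of SAX filter this is meant for, here is a minimal sketch that uses only the JDK's SAX classes rather than dom4j. The LowerCaseFilterDemo class and its elementNames() helper are hypothetical names made up for this example; with dom4j such a filter would presumably be handed to SAXReader via setXMLFilter() instead of being driven by hand.

```java
import java.io.StringReader;
import java.util.Locale;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

final class LowerCaseFilterDemo {

    /** SAX filter that lower-cases element names before passing the events on. */
    static final class LowerCaseFilter extends XMLFilterImpl {
        LowerCaseFilter(XMLReader parent) {
            super(parent);
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            super.startElement(uri, localName.toLowerCase(Locale.ROOT),
                    qName.toLowerCase(Locale.ROOT), atts);
        }

        @Override
        public void endElement(String uri, String localName, String qName)
                throws SAXException {
            super.endElement(uri, localName.toLowerCase(Locale.ROOT),
                    qName.toLowerCase(Locale.ROOT));
        }
    }

    /** Parses the XML through the filter and returns the element names seen downstream. */
    static String elementNames(String xml) {
        final StringBuilder names = new StringBuilder();
        try {
            XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            XMLFilterImpl filter = new LowerCaseFilter(parser);
            // The downstream handler only ever sees the filtered events.
            filter.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName,
                        Attributes atts) {
                    if (names.length() > 0) {
                        names.append(',');
                    }
                    names.append(qName);
                }
            });
            filter.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return names.toString();
    }

    public static void main(String[] args) {
        System.out.println(elementNames("<Team><Author>James</Author></Team>"));
    }
}
```

The filter sits between the parser and whatever consumes the events, which is the same position a filter installed on a SAXReader would occupy before the dom4j Document is built.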
Added full support for Jaxen function, namespace and variable context interfaces. This allows the XPath engine to be fully customized. e.g.
XPath xpath = document.createXPath( "//foo[@code='123']" );
// customize function, namespace and variable contexts
xpath.setFunctionContext( myFunctionContext );
xpath.setNamespaceContext( myNamespaceContext );
xpath.setVariableContext( myVariableContext );
List nodes = xpath.selectNodes( document );
Added new helper class org.dom4j.util.XMLErrorHandler which turns SAX ErrorHandler callbacks into XML that can then be output in a JAXM or SOAP message or styled via XSLT or whatever.
Added new helper method DocumentHelper.makeElement(doc, "a/b/c") which will navigate from a document or element to the given simple path, creating new elements along the way if need be. This allows elements to be found or created using a simple path expression mechanism.
Added helper method getQName(String qualifiedName) to Element so that easier element name matching can be done. Here are some examples of it in use.
// find all elements with a local name of "foo"
// in any namespace
List anyNamespace = element.elements( "foo" );
// find all elements with a local name "foo"
// and the default namespace URI
List defaultNamespace = element.elements( element.getQName( "foo" ) );
// find all elements which match the local name "foo"
// and the namespace URI mapped to the "x" prefix
List prefixed = element.elements( element.getQName( "x:foo" ) );
Added helper method on org.dom4j.DocumentFactory called getQNames that returns a List of all the QNames that were used to parse the documents.
Added an EntityResolver property to SAXReader to make it easier to configure a specific EntityResolver.
Added patch so that patterns such as @id='123' and name()='foo' are now working properly again. Also patterns such as not(@id='123') work now too.
Patched the dynamic loading of classes to fix some ClassLoader issues found with some application servers.
Ported the data type support to work with the latest MSV library from Sun
Fixed bug spotted by Stefan Graeber that was causing a DocumentException to be thrown with Xerces when turning validation mode on.
Patched bug in QName which was using the qualified name rather than the local name along with the namespace URI to determine equality.
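The equality rule described here can be sketched with the JDK's own javax.xml.namespace.QName, which follows the same convention: two names are equal when their namespace URI and local name match, whatever prefix they carry. This is an illustration of the rule only, not the org.dom4j.QName implementation; the QNameEqualityDemo class and the urn:example URIs are made up for the example.

```java
import javax.xml.namespace.QName;

final class QNameEqualityDemo {

    /** Same namespace URI and local name, different prefixes ("a:foo" vs "b:foo"). */
    static boolean equalDespitePrefix() {
        QName first = new QName("urn:example", "foo", "a");
        QName second = new QName("urn:example", "foo", "b");
        return first.equals(second) && first.hashCode() == second.hashCode();
    }

    /** Same prefix and local name, but different namespace URIs. */
    static boolean equalDespiteNamespace() {
        QName first = new QName("urn:example:one", "foo", "a");
        QName second = new QName("urn:example:two", "foo", "a");
        return first.equals(second);
    }

    public static void main(String[] args) {
        System.out.println("different prefix, same URI:  " + equalDespitePrefix());
        System.out.println("same prefix, different URI:  " + equalDespiteNamespace());
    }
}
```

Comparing on the qualified name instead would get both cases wrong, treating a:foo and b:foo as different even when both prefixes map to the same URI.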
Added patch kindly supplied by Michal Palicka that SAXReader was passing in the wrong name for the SAX string-interning feature. Thanks Michal!
Fixed the behaviour of DocumentFactory.createXPathFilter() to use XPath filtering rather than XSLT style patterns. One of the major differences is that an XSLT pattern (used in the <xsl:template match="pattern"/> element in XSLT) works slightly differently. An element <foo> would match an XSLT pattern "foo" whereas an element <bar> could match an XPath filter "foo" if it contained a child <foo> element.
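The difference can be sketched with the JDK's javax.xml.xpath API rather than dom4j. In this sketch the filter is the expression "foo" evaluated as a boolean with the element as context node, and the XSLT-pattern behaviour is only approximated by the expression "self::foo"; that stand-in, and the FilterVersusPatternDemo class itself, are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class FilterVersusPatternDemo {

    /** Evaluates the expression as a boolean, using the root element as context node. */
    private static boolean test(String xml, String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Element root = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (Boolean) xpath.evaluate(expression, root, XPathConstants.BOOLEAN);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** XPath filter "foo": true when the element has a child named foo. */
    static boolean filterMatches(String xml) {
        return test(xml, "foo");
    }

    /** Stand-in for the XSLT pattern "foo": true when the element itself is named foo. */
    static boolean patternLikeMatches(String xml) {
        return test(xml, "self::foo");
    }

    public static void main(String[] args) {
        String bar = "<bar><foo/></bar>";
        String foo = "<foo/>";
        System.out.println("<bar> filter:  " + filterMatches(bar));
        System.out.println("<bar> pattern: " + patternLikeMatches(bar));
        System.out.println("<foo> filter:  " + filterMatches(foo));
        System.out.println("<foo> pattern: " + patternLikeMatches(foo));
    }
}
```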
Patched the behaviour of Node.matches(String xpathExpression) so that it uses XPath filters now rather than XSLT patterns.
Patched bug in XRule implementation in org.dom4j.rule that was causing ordering problems when using stylesheets - the Rule precedence order was not being correctly used.
Backed out a previous patch added to 0.9 such that attributes with no namespace prefix are in no namespace. An attribute does not inherit the default namespace - the only way to put an attribute into a namespace is via a namespace prefix.
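The namespace rule can be checked against the JDK's namespace-aware DOM parser, which follows the same convention. The sketch below is not dom4j code; the AttributeNamespaceDemo class and the urn:example URI are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class AttributeNamespaceDemo {

    /** Returns { namespace URI of the element, namespace URI of its unprefixed attribute }. */
    static String[] namespaces() {
        String xml = "<root xmlns=\"urn:example\" id=\"1\"/>";
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Element root = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            return new String[] {
                root.getNamespaceURI(),
                root.getAttributeNode("id").getNamespaceURI()
            };
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] result = namespaces();
        System.out.println("element namespace:   " + result[0]);
        System.out.println("attribute namespace: " + result[1]);
    }
}
```

The element picks up the default namespace, while the unprefixed id attribute reports no namespace at all.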
Patched XMLWriter so that a flush() is not required when using an OutputStream and the various sub-document write() methods are called such as write(Element), write(Attribute), write(Node), write(Namespace) etc.
Fixed bug in SAXReader that setEntityResolver() was not always behaving properly. Also the default entity resolver used to locate XML Schemas seems to work properly now.
Moved the XML Schema Data Type supporting classes in org.dom4j.schema.Schema* to org.dom4j.datatype.Datatype*. This should avoid confusion and better describe the intent of the classes, to implement Data typing, rather than schema validation. We hope to use the MSV library for all of our schema validation requirements.
The XPath engine in dom4j has been migrated to using Jaxen. This single XPath engine can be plugged into any model such that Jaxen will support DOM, dom4j, EXML and JDOM. Hopefully we'll get Jaxen working on Java Beans too.
In general this will mean a much better, more compliant and more bug-free XPath engine for dom4j as it will be used extensively across XML object models.
Already numerous irregularities have been fixed in the XPath support in dom4j. We have donated the dom4j XPath test harness to Jaxen so that we now have a large rigorous test harness to ensure correct XPath behaviour - this test harness is run against all 4 current XML object models to ensure consistent behaviour and valid XPath compliance.
We are also in the process of migrating over our XPath extension functions as well as adding additional XPath functions such as those defined in XSLT and XPointer.
New class org.dom4j.io.XMLResult which is-a JAXP Result which uses the same org.dom4j.io.OutputFormat object to provide its formatting options to allow XML output from JAXP (such as via XSLT) to be pretty printed.
XMLWriter now implements the SAX XMLFilter interface so that it can be added to a SAX parsing filter chain to output the XML being parsed in a simple way. Many thanks to Joseph Bowbeer for his help in this area.
Added setProperty() and setFeature() methods to SAXReader to allow the easy configuration of custom parser properties via SAXReader, such as being able to specify the location of schema or DTD resources.
Added new method OutputFormat.createCompactFormat() for those wishing to output their XML in a compact format, such as in messaging systems.
Fixed bug in getNamespaceForPrefix() where if the prefix is null or "" and there is a default namespace defined, this method was returning a namespace instance with the incorrect URI.
Patched DOM writer so that it uses JAXP if it is available on the CLASSPATH using namespace aware mode by default.
Fixed a number of issues relating to namespaces and the redefinition of namespace prefixes. We now have a quite aggressive JUnit test harness to ensure that we handle namespace URIs correctly when prefixes are mapped and unmapped.
Applied patch from Andrew Wason for HTMLWriter to support the full HTML 4.01 DTD elements which do not require proper XML element closes. The new elements are PARAM, AREA, LINK, COL, BASE and META.
Fixed bug found by Dennis Sosnoski that SAX warnings were causing exceptions to be thrown by the SAXReader. Now warnings are silently ignored. If you want to detect warnings then an ErrorHandler should be registered with the SAXReader.
Patched bug that was also found by Jonathan Doughty for the non-standard behaviour of the FilterIterator. Also added Jonathan's JUnit test case to the distribution so that this problem should not come back.
Fixed bug that when round tripping into JAXP and back again, sometimes additional namespace attributes were appearing. Now the TestRoundTrip JUnit test case includes JAXP round tripping.
Fixed bug that attributes without a namespace prefix which are inside an element with a default namespace declaration of the form xmlns="theURI", the attribute now correctly inherits the namespace URI.
Applied patch found by Stefan Graeber that the UserDataFactory was not correctly creating UserDataAttribute instances.
Fixed bug that SAXWriter and DocumentSource were not correctly producing lexical events such as entities, comments and DOCTYPE declarations. Many thanks to Joseph Bowbeer for his help in this area.
hasContent() has been added to the Node interface so that it is easy to decide if a node is a leaf node or not. This method was suggested by Dane Foster. This method returns true if the node is a Branch (i.e. an Element or Document) which contains at least one node.
getPath(Element context)
getUniquePath(Element context)
These new methods allow paths and unique paths to be created relatively. Previously both getPath() and getUniquePath() would create absolute XPath expressions. These new methods allow relative path expressions to be created by providing an ancestor Element from which to make the path expression. This method was suggested by Chris Nokleberg.
Fixed bug found by Chris Nokleberg when using the UserDataElement that the clone() and createCopy() methods were not correctly copying the user data object. A JUnit test case has been added that tests this fix (org.dom4j.TestUserData). If any deep copying of user data objects is required then UserDataElement now has a method getCopyOfUserData() which can be overridden to perform a deep copy of any user data objects if required.
Minor patch for dom4j implementors wishing to create their own QName implementations. Previously the DocumentFactory class was hardwired to use QNameCache internally which was hard wired to only create QName instances. Now some factory methods have been added such that you can derive your own DocumentFactory which uses your own QNameCache which creates your own QName classes.
If JAXP can be found in the CLASSPATH then it is now used first by the SAXReader to find the correct SAX parser class. We have found that sometimes (e.g. Tomcat 4.0 beta 6) the value of the org.xml.sax.driver system property is set to a class which is not in the CLASSPATH but a valid JAXP parser is present. So now we try JAXP first, then the SAX system property then if all else fails we use the bundled Aelfred SAX parser.
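The lookup order described above might be sketched as follows, using only JDK classes. This is an assumption-laden outline, not the actual SAXReader code: the ParserLookupDemo class is hypothetical, and since the bundled Aelfred parser is not available outside dom4j, the last step simply fails where dom4j would fall back to Aelfred.

```java
import javax.xml.parsers.FactoryConfigurationError;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

final class ParserLookupDemo {

    static XMLReader createReader() {
        // 1. Try JAXP first.
        try {
            return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        } catch (Exception | FactoryConfigurationError jaxpUnavailable) {
            // fall through to the SAX system property
        }

        // 2. Then try the class named by the org.xml.sax.driver system property.
        String driver = System.getProperty("org.xml.sax.driver");
        if (driver != null) {
            try {
                return (XMLReader) Class.forName(driver)
                        .getDeclaredConstructor()
                        .newInstance();
            } catch (ReflectiveOperationException | ClassCastException driverUnavailable) {
                // the property may name a class that is not on the CLASSPATH
            }
        }

        // 3. dom4j would use its bundled Aelfred parser here; this sketch has none.
        throw new IllegalStateException("No SAX parser could be located");
    }

    public static void main(String[] args) {
        System.out.println(createReader().getClass().getName());
    }
}
```

Trying JAXP first means a stale org.xml.sax.driver value is never consulted as long as a working JAXP parser is present.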
Fixed XPath bug found by James Elson that the path /foo[@a and @b] or /foo[@a='1' and @b='2'] was no longer working correctly. This is now fixed and many tests of this nature have been added to the JUnit test harness.
Fixed some namespace related bugs found by Steen Lehmann. It appears that for a document of:-
<a xmlns="dummyNamespace"> <b> <c>Hello</c> </b> </a>
Then the path /a/b/c will not find anything - this is correct according to the XPath spec. Instead the path /*[name()='a']/*[name()='b']/*[name()='c'] is required. These changes have been applied to getPath() and getUniquePath() such that these methods now work, irrespective of the namespaces used in a document. Finally many new test cases have been added to validate a variety of XPath expressions with various uses of namespaces.
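The same behaviour can be reproduced with the JDK's javax.xml.xpath API, which follows the XPath spec in the same way. The sketch below is not dom4j code; the DefaultNamespacePaths class and its evaluate() helper are hypothetical names for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class DefaultNamespacePaths {

    static final String XML =
            "<a xmlns=\"dummyNamespace\"><b><c>Hello</c></b></a>";

    /** Returns the string value of the expression, or "" when nothing matches. */
    static String evaluate(String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Unprefixed name tests only match elements in no namespace.
        System.out.println("[" + evaluate("/a/b/c") + "]");
        // Matching on name() sidesteps the namespace.
        System.out.println("[" + evaluate("/*[name()='a']/*[name()='b']/*[name()='c']") + "]");
    }
}
```

An alternative to the name() form is to bind a prefix to the namespace URI and write the path with that prefix on each step.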
SAXWriter now fully supports the SAX feature "http://xml.org/sax/features/namespace-prefixes". Failure to support this feature properly was causing problems when outputting a dom4j Document using JAXP - the namespace declarations often did not appear correctly.
Patched bug in XMLWriter which caused multiple duplicate namespace declarations to sometimes appear.
The SAXPath project is a Simple API for XPath parsing. It is analogous to SAX in that the API abstracts away the details of parsing XPath expressions and provides a simple event based callback interface.
Originally dom4j was using a parser generated via the Antlr tool which resulted in a considerably larger code base. Now dom4j uses SAXPath for its XPath parsing which results in faster XPath parsing and a much smaller code base.
The dom4j.jar is now about 100 Kb smaller! Also several XPath related bugs are now fixed. For example the numeric paths like '2 + count(//foo)' are now working.
Fixed bug found by Tobias Rademacher that XML Schema Data Type support wasn't working correctly when the XSD document used a namespace prefix. The bug was hidden by a further bug in the JUnit test case that was not correctly testing this case. Both these bugs are now fixed.
Fixed bug found by Piero de Salvia that some invalid XPath expressions were not correctly throwing exceptions. Now any attempt to use any invalid XPath expressions should result in an InvalidXPathException being thrown.
Applied patch submitted by Theodor Schwarzinger that fixes the preceding-sibling and preceding axes.
Fixed bug found by James Elson that the normalize() method was being quite aggressive and removing all text nodes! New JUnit test case added to ensure this doesn't break again.
Improved the setContent() semantics on Branch (and so Element and Document) such that the parent and document relationships are correctly removed for old content and added for new content. As a helper method, the setContent() method will clone any content nodes which are already part of an existing document. So for example the following code will clone the content of a document.
Document doc1 = ...;
Document doc2 = DocumentHelper.createDocument();
doc2.setContent( doc1.content() );
Though this behaviour is much more useful when used with elements...
Element sourceElement = ...;
Element destElement = ...;
// copy the content of sourceElement
destElement.setContent( sourceElement.content() );
Support has been added for Java Serialization so dom4j documents can be serialized over RMI or EJB calls. Note that currently Serialization is much slower (by a factor of 2-5 times) than using the textual format of XML so we recommend sending XML text over RMI rather than serialization if possible. Over time we will tune the serialization implementation to be at least as fast as using the text format (even if that means under the covers we just use the text format).
Fixed bug in XPath engine found by Christophe Ponsard for paths of the form /* which were not finding anything. Now we have an extensible XPath test harness (in src/test/org/dom4j/TestXPathExamples.java) which contains some test cases for these kinds of paths. We can extend these cases to test other XPath expressions easily.
Fixed bug in elementByID() method found by Thomas Nichols that was resulting in the element not being found correctly.
Fixed bug in IndexedElement reported by Kerstin Gr�nefeld that was causing a null pointer exception when using XPath on an IndexedElement.
Applied the patch supplied by Mike Skells that fix problems with the getUniquePath() method not returning properly indexed elements
Applied a fix to the problem found by Dane Foster when using dom4j with JTidy. JTidy returns null for getLocalName() so DOMReader has been patched to handle nulls returned from either getLocalName() or getName().
Fixed bug reported anonymously on the SourceForge site that explicitly creating a Document from an existing Element could cause problems when using XMLWriter.
Assorted performance tunings of SAX parsing, avoiding unnecessary repeated code paths.
Tidied factory and construction of Element code such that there are no longer dependencies on the SAX Attributes class. This was originally added as a performance enhancement, but after further refactoring this is now no longer needed. This makes the process of creating new Element derivations or DocumentFactory implementations easier.
For those wishing to do value based comparisons of Nodes, Element, Attributes, Documents or Document fragments there is a new NodeComparator class which implements the Comparator interface from the Java Collections Framework.
A new helper method has been added for parsing text. For example:-
Document document = DocumentHelper.parseText( "<team> <author>James</author> </team>" );
The Branch interface (and so Document and Element interfaces) has a new normalize() method that has the same semantics as the same method in the DOM API to remove empty Text nodes and merge adjacent Text nodes.
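Since the semantics are borrowed from the DOM API, the effect can be sketched with the JDK's own org.w3c.dom classes. This shows the DOM behaviour that the dom4j method is described as matching, not dom4j itself; the NormalizeDemo class is a made-up name.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class NormalizeDemo {

    /** Builds an element with three text children and reports the effect of normalize(). */
    static String describe() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            Element root = doc.createElement("root");
            doc.appendChild(root);

            // Two adjacent text nodes with an empty one in between.
            root.appendChild(doc.createTextNode("Hello "));
            root.appendChild(doc.createTextNode(""));
            root.appendChild(doc.createTextNode("World"));
            int before = root.getChildNodes().getLength();

            root.normalize();
            int after = root.getChildNodes().getLength();

            return before + " -> " + after + ": " + root.getFirstChild().getNodeValue();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```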
A document can now be constructed more easily now that the addXXX() methods return a reference to the Document or Element which created them. An example is shown below
import org.dom4j.Document;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;

public class Foo {

    public Document createDocument() {
        Document document = DocumentHelper.createDocument();
        Element root = document.addElement( "root" );

        Element author1 = root.addElement( "author" )
            .addAttribute( "name", "James" )
            .addAttribute( "location", "UK" )
            .addText( "James Strachan" );

        Element author2 = root.addElement( "author" )
            .addAttribute( "name", "Bob" )
            .addAttribute( "location", "Canada" )
            .addText( "Bob McWhirter" );

        return document;
    }
}
Note that the addElement() method returns the new child element not the parent element.
To promote consistency, the Element.setAttributeValue() method is now deprecated and should be replaced with Element.addAttribute().
Applied Theo's patch for cloning of Documents correctly together with JUnit test cases to ensure this keeps working.
Applied Rob Wilson's patch that NullPointerExceptions were being thrown if a Document is output with the XMLWriter and an attribute value is null.
Fixed problem found by Nicolas Fonrose that XPath expressions using namespace prefixes were not working correctly.
Fixed problem found by Thomas Nichols whereby default namespaces with no prefix were not being processed correctly. As a result of finding this bug we now have a rigorous JUnit round trip test harness in place which highlighted a number of issues with namespaces when round tripping from dom4j to SAX to DOM to Text and back again. These issues have now been fixed and, hopefully, should not show up again.
Fixed some detach() bugs that were found with Attributes.
Default encoding is now "UTF-8" rather than "UTF8". Thanks to Thomas Nichols for spotting that one. Also the default line separator when using XMLWriter is now "\n" rather than "\r\n".
If an XMLWriter is used with an OutputStream then an explicit call to flush() is no longer required after calling write(Document)
Some housekeeping was performed in the naming of some implementation classes. The old XPathXXX.java classes in the org.dom4j.tree package where XXX = Attribute, CDATA, Comment, Entity, ProcessingInstruction and Text have been renamed to DefaultXXX and the corresponding DefaultXXX has been renamed to FlyweightXXX. This makes the purpose of these implementation classes clearer. The default implementations of the leaf nodes are mutable but cannot be shared across elements. The FlyweightXXX implementations are immutable and can be shared across nodes and documents.
A new enhanced event notification mechanism has been implemented by David White. Now you can register multiple ElementHandler instances with a SAXReader object before you parse a document such that the different handlers are notified when different paths are reached.
The ElementHandler interface now has both onStart() and onEnd() allowing more fine grained control over when you are called and the ability to perform actions before or after the content for an Element is populated. The methods also take a reference to an ElementPath to allow more optimised and powerful access to the path to the specified document.
This release contains an alpha release of XML Schema Data Type support. The main class in question is the XML Schema Data Type aware DatatypeDocumentFactory which will create an XML Schema Data Type aware XML object model.
The getData() and setData(Object) methods on Attribute and Element allow access to the concrete data types such as Dates and Numbers.
Applied Theo's patch for the XPath substring function, which was returning incorrect string indexes. The substring function now returns the correct answer.
Applied Theo's patch for incorrect escaping of element text.
Fixed bug in the XPath engine for absolute path expressions which now work correctly when applied to leaf nodes.
Fixed bug in the name() and local-name() functions such that the following expressions now work fine: local-name(..), name(parent::*).
A variety of minor performance tuning optimisations have been made.
The org.dom4j.io.OutputFormat class now has a new helper method to make it easier to create pretty print formatting objects. The new method is OutputFormat.createPrettyPrint(). So to pretty print some XML (trimming all whitespace and indenting nicely) the following code should do the job...
OutputFormat format = OutputFormat.createPrettyPrint(); XMLWriter writer = new XMLWriter( out, format ); writer.write( document ); writer.close();
SAXReader.read(String url) can now accept either a URL or a file name which makes things a little easier. The logic uses the existence of a ':' in the url String to determine if it should be treated as a URL or a File name. For more explicit control over whether documents are Files or URLs call SAXReader.read(File file) or SAXReader.read(URL url).
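The ':' rule described above can be sketched in plain Java. This is not dom4j's actual source, only an illustration of the stated heuristic; the class and method names are hypothetical.

```java
import java.io.File;

final class SystemIdHeuristic {
    /** Applies the rule from the text: a ':' anywhere means "treat as URL". */
    static boolean looksLikeUrl(String systemId) {
        return systemId.indexOf(':') >= 0;
    }

    /** Describes how a given argument would be interpreted under that rule. */
    static String describe(String systemId) {
        return looksLikeUrl(systemId)
                ? "URL: " + systemId
                : "File: " + new File(systemId).getPath();
    }
}
```

One consequence of such a rule is that a Windows path with a drive letter (for example C:\data\foo.xml) also contains a ':' and would be classified as a URL, which is a good reason to prefer the explicit read(File) and read(URL) overloads when the kind of input is known.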
A new extension function, matrix-concat() was submitted by James Pereira. By default, doing concat() functions in XPath the 'string-value' is taken for each argument. So for a document:-
<root project="dom4j"> <contributor>James Pereira</contributor> <contributor>Bob McWhirter</contributor> </root>
Then the XPath
concat( 'thanks ', /root/contributor )
would return
"thanks James Pereira"
as the /root/contributor expression matches a node set of 2 elements, but the "string-value" takes the first element's text. Whereas matrix-concat will do a cartesian product of all the arguments and then do the concatenation of each combination of nodes. So
matrix-concat( 'thanks ', /root/contributor )
will produce
"thanks James Pereira" "thanks Bob McWhirter"
The cartesian product is done such that multiple paths can be used.
matrix-concat( 'thanks ', /root/contributor, ' for working on ', /root/@project )
will produce
"thanks James Pereira for working on dom4j" "thanks Bob McWhirter for working on dom4j"
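The cartesian-product behaviour can be sketched independently of the XPath engine. The code below is not the matrix-concat() implementation; it is a minimal illustration in which each argument has already been reduced to a list of string values (a literal becomes a one-element list), and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class MatrixConcatSketch {
    /** Concatenates every combination of one value taken from each argument, in order. */
    static List<String> matrixConcat(List<List<String>> args) {
        List<String> results = new ArrayList<>();
        results.add(""); // start with a single empty prefix
        for (List<String> values : args) {
            List<String> next = new ArrayList<>();
            for (String prefix : results) {
                for (String value : values) {
                    next.add(prefix + value); // extend each prefix with each value
                }
            }
            results = next;
        }
        return results;
    }
}
```

Passing the four arguments of the last example (a literal, the two contributor names, another literal, and the project name) yields one string per contributor.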
Fixed bug where XMLWriter.write(Object) was not correctly writing a Document instance.
Finally, a couple of small issues with the build process have been fixed. The dom4j.jar no longer contains any SAX or DOM classes (they are all in dom4j-full.jar), and the Antlr grammar files for the XPath parser are now correctly included in the binary distribution.
The following new features were added:-
Document document = new SAXReader().read( new File( "customers.xml" ) ); List customers = document.selectNodes( "//CUSTOMER", "@name", true );
This release also includes full XPath source code.
Initial release which comes complete with DOM, JAXP and SAX support and integrated XPath
Iterator<Node> iter = element.nodeIterator(); while ( iter.hasNext() ) { Node node = iter.next(); } Iterator<Element> iter2 = element.elementIterator( "foo" ); while ( iter2.hasNext() ) { Element foo = iter2.next(); }
The following functions are not yet fully supported in the inbuilt XPath engine
The optional W3C DOM implementation of the dom4j API is not yet at full DOM compliance
The following people have contributed to the dom4j project. Many thanks to you all!
dom4j is an Open Source XML framework for Java. dom4j allows you to read, write, navigate, create and modify XML documents. dom4j integrates with DOM and SAX and is seamlessly integrated with full XPath support.
We use an Apache-style open source license, which is one of the least restrictive licenses around; you can use dom4j to create new products without them having to be open source.
You can find a copy of the license here.
The dom4j.jar only contains the dom4j classes. If you want to use a SAX parser, you'll have to add the SAX classes and the SAX parser of your choice to your CLASSPATH. If you want to use XPath expressions, you also have to add jaxen.jar to your CLASSPATH.
dom4j can use your existing XML parser and/or DOM implementation (such as Crimson or Xerces) if you want it to. dom4j can also use JAXP to configure which SAX parser to use: just add the jaxp.jar and whichever SAX parser you wish to your CLASSPATH and away you go.
DOM is a quite large language independent API. dom4j is a simpler, lightweight API making extensive use of standard Java APIs such as the Java 2 collections API.
Note that dom4j fully supports the DOM standard allowing both APIs to be used easily together.
dom4j is a different project and different API to JDOM though they both have similar goals. They both attempt to make it easier to use XML on the Java platform. They differ in their design, API and implementation.
dom4j is based on Java interfaces so that plug and play document object model implementations are allowed and encouraged, such as small, read only, quick to create implementations, or bigger, highly indexed, fast to navigate implementations, or implementations which read themselves lazily from a database or Java Beans etc.
dom4j uses polymorphism extensively such that all document object types implement the Node interface. Also both the Element and Document interfaces can be used polymorphically as they both extend the Branch interface.
dom4j is fully integrated with XPath support throughout the API so doing XPath expressions is as easy as
SAXReader reader = new SAXReader(); Document document = reader.read( url ); List links = document.selectNodes( "//a[@href]" ); String title = document.valueOf( "/head/title" );
dom4j will soon provide a configuration option to support the W3C DOM API natively to avoid unnecessary tree duplication when using dom4j with XSLT engines etc.
You can create dom4j documents from XML text, SAX events or existing DOM trees or you can write dom4j documents as SAX events, DOM trees or XML text.
dom4j integrates with XSLT using the JAXP standard (TrAX) APIs. A dom4j Document can be used as the source of XML to be styled or the source of the stylesheet. A dom4j Document can also be used as the result of a transformation.
First you'll need to use JAXP to load a Transformer.
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamSource;
import org.dom4j.Document;
import org.dom4j.io.DocumentResult;
import org.dom4j.io.DocumentSource;
...
TransformerFactory factory = TransformerFactory.newInstance();
Transformer transformer = factory.newTransformer( new StreamSource( "foo.xsl" ) );
Now that you have a transformer it's easy to style a Document into a new Document.
DocumentSource source = new DocumentSource( document ); DocumentResult result = new DocumentResult(); transformer.transform( source, result ); Document transformedDoc = result.getDocument();
If you want to transform a Document into XML text you can use JAXP as follows:-
DocumentSource source = new DocumentSource( document );
StreamResult result = new StreamResult( new FileWriter( "output.xml" ) );
transformer.transform( source, result );
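For readers who want to try the JAXP side on its own, here is a minimal sketch that runs with the JDK alone. It uses StreamSource in place of dom4j's DocumentSource, so it does not exercise dom4j; the class name, helper names and sample inputs are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class TransformToText {
    /** Wraps a String as a JAXP Source; with dom4j one would pass a DocumentSource instead. */
    static Source fromString(String text) {
        return new StreamSource(new StringReader(text));
    }

    /** Applies the stylesheet to the input and returns the serialized result as text. */
    static String transform(Source input, Source stylesheet) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer(stylesheet);
        StringWriter out = new StringWriter();
        transformer.transform(input, new StreamResult(out)); // StreamResult writes text output
        return out.toString();
    }
}
```

Because transform() only depends on the javax.xml.transform.Source interface, any Source implementation can be passed as the input.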
For more information on JAXP and (TrAX) try Sun's JAXP site.
You can control the format of the XML text output by XMLWriter by using the OutputFormat object. You can explicitly set the various formatting options via the properties methods of the OutputFormat object. There is also a helper method OutputFormat.createPrettyPrint() which creates the default pretty-print format.
So to pretty print some XML (trimming all whitespace and indenting nicely) the following code should do the job...
OutputFormat format = OutputFormat.createPrettyPrint(); XMLWriter writer = new XMLWriter( out, format ); writer.write( document ); writer.close();
Sometimes you have a String (or StringBuffer) which contains the XML to be parsed. This can be parsed using SAXReader and the StringReader from the JDK, or more simply with the DocumentHelper.parseText() helper method. For example:-
import org.dom4j.Document; import org.dom4j.DocumentException; import org.dom4j.DocumentHelper; public class Foo { public Document getDocument() throws DocumentException { return DocumentHelper.parseText( "<root> <child id='1'>James</child> </root>" ); } }
dom4j by default uses identity based equality for performance. It avoids having to walk entire documents or document fragments when putting nodes in collections.
To compare 2 nodes (attributes, elements, documents etc) for equality the NodeComparator can be used.
Node node1 = ...; Node node2 = ...; NodeComparator comparator = new NodeComparator(); if ( comparator.compare( node1, node2 ) == 0 ) { // nodes are equal! }
If you are having problems comparing documents that you think are equal but the NodeComparator decides that they are different, you might find the following useful.
The class in dom4j/test/src/org/dom4j/AbstractTestCase.java is a JUnit TestCase and an abstract base class for dom4j test cases. It contains a whole bunch of useful assertion helper methods for testing whether documents, nodes and fragments are equal. The nice thing is that you get useful messages telling you exactly why they are different, so it's pretty easy to track down. For example:
public class MyTest extends AbstractTestCase {
    ...
    public void testSomething() {
        Document doc1 = ...;
        Document doc2 = ...;
        assertDocumentsEqual( doc1, doc2 );
        ...
        assertNodesEqual( doc1.getRootElement(), doc2.getRootElement() );
    }
}
dom4j provides an event based model for processing XML documents. Using this event based model allows developers to prune the XML tree when parts of the document have been successfully processed avoiding having to keep the entire document in memory.
For example, imagine you need to process a very large XML file that is generated externally by some database process and looks something like the following (where N is a very large number).
<ROWSET>
    <ROW id="1"> ... </ROW>
    <ROW id="2"> ... </ROW>
    ...
    <ROW id="N"> ... </ROW>
</ROWSET>
We can process each <ROW> one at a time; there is no need to keep all of them in memory at once. dom4j provides an event based mode for this purpose. We can register an event handler for one or more path expressions. These handlers will then be called on the start and end of each path registered against a particular handler. When the start tag of a path is found, the onStart method of the handler registered to the path is called. When the end tag of a path is found, the onEnd method of the handler registered to that path is called.
The onStart and onEnd methods are passed an instance of an ElementPath, which can be used to retrieve the current Element for the given path. If the handler wishes to "prune" the tree being built in order to save memory, it can simply call the detach() method of the current Element being processed in the handler's onEnd() method.
So to process each <ROW> individually we can do the following.
// enable pruning mode to call me back as each ROW is complete
SAXReader reader = new SAXReader();
reader.addHandler( "/ROWSET/ROW",
    new ElementHandler() {
        public void onStart(ElementPath path) {
            // do nothing here...
        }
        public void onEnd(ElementPath path) {
            // process a ROW element
            Element row = path.getCurrent();
            Element rowSet = row.getParent();
            Document document = row.getDocument();
            ...
            // prune the tree
            row.detach();
        }
    }
);
Document document = reader.read(url);
// The document will now be complete but all the ROW elements
// will have been pruned.
// We may want to do some final processing now
...
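To make the idea of path-triggered callbacks concrete, the following sketch does something comparable with plain SAX from the JDK: it tracks the current element path and reacts when /ROWSET/ROW starts and ends, never keeping more than one row's text in memory. It is not how dom4j implements ElementHandler; the class name and the choice to collect trimmed row text are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class PathCallbackSketch extends DefaultHandler {
    private final String targetPath;
    private final StringBuilder currentPath = new StringBuilder();
    private final StringBuilder text = new StringBuilder();
    final List<String> completed = new ArrayList<>();

    PathCallbackSketch(String targetPath) {
        this.targetPath = targetPath;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        currentPath.append('/').append(qName);
        if (targetPath.contentEquals(currentPath)) {
            text.setLength(0); // comparable to onStart(): begin a fresh row
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (targetPath.contentEquals(currentPath)) {
            completed.add(text.toString().trim()); // comparable to onEnd(): process the row
        }
        // pop the last path segment ("/" + qName)
        currentPath.setLength(currentPath.length() - qName.length() - 1);
    }

    /** Parses the XML and returns the text of each /ROWSET/ROW element in document order. */
    static List<String> rows(String xml) throws Exception {
        PathCallbackSketch handler = new PathCallbackSketch("/ROWSET/ROW");
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.completed;
    }
}
```

The dom4j approach has the advantage that each callback receives a fully built Element for the row, which can then be queried with XPath before being detached.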
Yes. dom4j supports the visitor pattern via the Visitor interface.
Here is an example.
protected void foo(Document doc) {
    // lets use the Visitor Pattern to
    // navigate the document for entities
    Visitor visitor = new VisitorSupport() {
        public void visit(Entity entity) {
            System.out.println( "Entity name: " + entity.getName() + " text: " + entity.getText() );
        }
    };
    doc.accept( visitor );
}
Yes. The selectNodes() method is a really useful feature that allows nodes to be selected from any object in the dom4j object model via an XPath expression. The List that is returned can be sorted by specifying another XPath expression to use as the sorting comparator.
For example the following code parses an XML play and finds all the SPEAKER elements sorted in name order.
SAXReader reader = new SAXReader(); Document document = reader.read( new File( "xml/much_ado.xml" ) ); List speakers = document.selectNodes( "//SPEAKER", "." );
In the above example the name of the SPEAKER is defined by the XPath expression "." as the name is stored in the text of the SPEAKER element. If the name was defined by an attribute called "name" then the XPath expression "@name" should be used for sorting.
You may wish to remove duplicates while sorting such that (for example) the distinct list of SPEAKER elements is returned, sorted by name. To do this add an extra parameter to the selectNodes() method call.
List distinctSpeakers = document.selectNodes( "//SPEAKER", ".", true );
In dom4j, being able to navigate up a tree towards the parent and being able to change a tree are optional features. These features are optional so that an implementation can create memory efficient read only document models which conserve memory by sharing immutable objects (such as interning Attributes).
There are some helper methods to determine if optional features are implemented. Here is some example code demonstrating their use.
protected void foo(Node node) {
    // can we do upward navigation?
    if ( ! node.supportsParent() ) {
        throw new UnsupportedOperationException( "Cannot navigate upwards to parent" );
    }
    Element parent = node.getParent();
    System.out.println( "Node: " + node + " has parent: " + parent );
    if ( parent != null ) {
        // can I modify the parent?
        if ( parent.isReadOnly() ) {
            throw new UnsupportedOperationException( "Cannot modify parent as it is read only" );
        }
        // addAttribute() replaces the deprecated setAttributeValue()
        parent.addAttribute( "bar", "modified" );
    }
}
If dom4j detects JAXP on the classpath it tries to use it to load a SAX parser. If it can't load the SAX parser via JAXP it then tries to use the org.xml.sax.driver system property to denote the SAX parser to use. If none of the above work dom4j outputs a warning and continues, using its own internal Aelfred2 parser instead.
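The lookup order described above can be sketched with JDK classes only. This is an illustration of the described sequence, not dom4j's own loader; the class name is hypothetical, and because the bundled Aelfred parser is not available outside dom4j, the final fallback is represented here by returning null.

```java
import javax.xml.parsers.FactoryConfigurationError;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

final class ParserLookupSketch {
    /** Returns an XMLReader found via JAXP or the org.xml.sax.driver property, else null. */
    static XMLReader findReader() {
        // 1. Try JAXP first.
        try {
            return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        } catch (Exception | FactoryConfigurationError e) {
            // JAXP is missing or misconfigured; fall through to the next step
        }
        // 2. Try the class named by the org.xml.sax.driver system property.
        String driver = System.getProperty("org.xml.sax.driver");
        if (driver != null) {
            try {
                return (XMLReader) Class.forName(driver).getDeclaredConstructor().newInstance();
            } catch (Exception e) {
                // the named class could not be loaded or is not an XMLReader
            }
        }
        // 3. dom4j would warn and use its internal Aelfred2 parser here;
        //    this sketch has no bundled parser, so it reports failure instead.
        return null;
    }
}
```

On a current JDK the first step normally succeeds, since a JAXP parser ships with the platform.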
The following warning is a result of JAXP being in the classpath but either an old JAXP1.0 version was found (rather than JAXP 1.1) or there is no JAXP configured parser (such as crimson.jar or xerces.jar) on the classpath.
Warning: Error occurred using JAXP to load a SAXParser. Will use Aelfred instead
So the warning generally indicates an incomplete JAXP classpath and is nothing to worry excessively about.
If you'd like to see the full verbose reason why the load of a JAXP parser failed then you can try setting the system property org.dom4j.verbose=true, e.g.
java -Dorg.dom4j.verbose=true MyApp
And you should see a verbose list of why the load of a SAX parser via JAXP failed.
To avoid this warning either remove the jaxp.jar from your classpath or add a JAXP 1.1 jaxp.jar together with a JAXP 1.1 parser such as crimson.jar or xerces.jar to your classpath.
dom4j works with any SAX parser via JAXP. So putting a recent distribution of crimson.jar or xerces.jar on the CLASSPATH will allow Crimson or Xerces's parser to be used.
If no SAX parser can be found, either via JAXP or via the SAX org.xml.sax.driver system property, then the embedded Aelfred distribution will be used instead. Note that the embedded Aelfred distribution is a non validating parser, though it is quite fast.
If a recent version of crimson.jar or xerces.jar is on the CLASSPATH then dom4j will use that as the SAX parser via JAXP. If none of these are on the CLASSPATH then a bundled version of Aelfred is used, which does not validate.
So to perform DTD validation when parsing put crimson.jar or xerces.jar on the CLASSPATH. If you wish to validate against an XML Schema then try xerces.jar. Then use the following code.
// turn validation on
SAXReader reader = new SAXReader(true);
Document document = reader.read( "foo.xml" );
Note: if you want to validate against an XML Schema with xerces, you need to enable the XML Schema validation with the "setFeature" method. For more information about xerces features visit the xerces website. Below is a code sample to enable XML Schema validation.
// turn validation on
SAXReader reader = new SAXReader(true);
// request XML Schema validation
reader.setFeature("http://apache.org/xml/features/validation/schema", true);
Document document = reader.read( "foo.xml" );
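When validation is switched on, problems are reported through the SAX ErrorHandler rather than by aborting the parse. The sketch below shows that mechanism for DTD validation using only JAXP and SAX from the JDK. It is an analogue of what a validating reader does, not dom4j code; the class name and sample documents are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

final class DtdValidationSketch {
    /** Parses with DTD validation enabled and returns the validation error messages. */
    static List<String> validate(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true); // turn DTD validation on
        final List<String> errors = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void error(SAXParseException e) {
                errors.add(e.getMessage()); // validity errors are recoverable, so collect them
            }
        };
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return errors;
    }
}
```

A document that matches its DTD should produce an empty list, while one containing an undeclared element should produce at least one message.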
An alternative approach is to use Sun's MSV library for validation, which allows you to use DTD, XML Schema, Relax NG, Relax or TREX as the schema languages. There's an example in the daily build at dom4j/src/samples/validate/JARVDemo.java
If you are validating an existing dom4j document then we recommend you try MSV as it avoids turning the document into text and then parsing it again - MSV can work purely off of SAX events generated from the dom4j document.
Using this approach your code will actually be based on the JARV API which allows alternative validation mechanisms to be plugged into your code.
VisualAge for Java checks all dependencies in a JAR and displays warnings if there are any unresolved links. To avoid any warnings the following steps should be followed (thanks to Jan Haluza for this).
dom4j.jar xalan.jar PullParser.jar relaxng.jar msv.jar isorelax.jar xsdlib.jar crimson.jar
A common way around this is to implement a SAX EntityResolver to load the DTD from somewhere else. e.g. you could include the DTD in your JAR with your java code and load it from there.
EntityResolver resolver = new EntityResolver() {
    public InputSource resolveEntity(String publicId, String systemId) {
        // publicId may be null, so compare from the constant side
        if ( "-//Acme//DTD Foo 1.2//EN".equals( publicId ) ) {
            // the leading '/' makes the lookup relative to the classpath root
            InputStream in = getClass().getResourceAsStream( "/com/acme/foo.dtd" );
            return new InputSource( in );
        }
        return null;
    }
};
SAXReader reader = new SAXReader();
reader.setEntityResolver( resolver );
Document doc = reader.read( "foo.xml" );
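The same EntityResolver technique can be tried without dom4j or a DTD file on disk. In this sketch the DTD is served from an in-memory string and the parser is the JDK's own SAX parser; the class name, the DTD content and the unreachable system identifier used in the usage note are all made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

final class LocalDtdSketch {
    static final String PUBLIC_ID = "-//Acme//DTD Foo 1.2//EN";
    static final String LOCAL_DTD = "<!ELEMENT foo (#PCDATA)>";

    /** Parses the XML and reports whether the DTD was supplied by the resolver. */
    static boolean parsedWithLocalDtd(String xml) throws Exception {
        final boolean[] used = { false };
        EntityResolver resolver = new EntityResolver() {
            public InputSource resolveEntity(String publicId, String systemId) {
                if (PUBLIC_ID.equals(publicId)) {
                    used[0] = true;
                    return new InputSource(new StringReader(LOCAL_DTD)); // local copy
                }
                return null; // let the parser resolve anything else itself
            }
        };
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setEntityResolver(resolver);
        reader.setContentHandler(new DefaultHandler());
        reader.parse(new InputSource(new StringReader(xml)));
        return used[0];
    }
}
```

Parsing a document whose DOCTYPE names the public identifier above, with a system identifier such as http://acme.invalid/foo.dtd, should succeed without any network access because the resolver answers first.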
The Quick Start Guide will hopefully show you how to do the basic operations in dom4j.
One of the first things you'll probably want to do is to parse an XML document of some kind. This is easy to do in dom4j. The following code demonstrates how to do this.
import java.net.URL; import org.dom4j.Document; import org.dom4j.DocumentException; import org.dom4j.io.SAXReader; public class Foo { public Document parse(URL url) throws DocumentException { SAXReader reader = new SAXReader(); Document document = reader.read(url); return document; } }
A document can be navigated using a variety of methods that return standard Java Iterators. For example
public void bar(Document document) throws DocumentException {
    Element root = document.getRootElement();
    // iterate through child elements of root
    for ( Iterator i = root.elementIterator(); i.hasNext(); ) {
        Element element = (Element) i.next();
        // do something
    }
    // iterate through child elements of root with element name "foo"
    for ( Iterator i = root.elementIterator( "foo" ); i.hasNext(); ) {
        Element foo = (Element) i.next();
        // do something
    }
    // iterate through attributes of root
    for ( Iterator i = root.attributeIterator(); i.hasNext(); ) {
        Attribute attribute = (Attribute) i.next();
        // do something
    }
}
In dom4j XPath expressions can be evaluated on the Document or on any Node in the tree (such as Attribute, Element or ProcessingInstruction). This allows complex navigation throughout the document with a single line of code. For example.
public void bar(Document document) { List list = document.selectNodes( "//foo/bar" ); Node node = document.selectSingleNode( "//foo/bar/author" ); String name = node.valueOf( "@name" ); }
For example if you wish to find all the hypertext links in an XHTML document the following code would do the trick.
public void findLinks(Document document) throws DocumentException { List list = document.selectNodes( "//a/@href" ); for (Iterator iter = list.iterator(); iter.hasNext(); ) { Attribute attribute = (Attribute) iter.next(); String url = attribute.getValue(); } }
If you need any help learning the XPath language we highly recommend the Zvon tutorial which allows you to learn by example.
If you ever have to walk a large XML document tree then for performance we recommend you use the fast looping method which avoids the cost of creating an Iterator object for each loop. For example
public void treeWalk(Document document) {
    treeWalk( document.getRootElement() );
}
public void treeWalk(Element element) {
    for ( int i = 0, size = element.nodeCount(); i < size; i++ ) {
        Node node = element.node(i);
        if ( node instanceof Element ) {
            treeWalk( (Element) node );
        }
        else {
            // do something....
        }
    }
}
Often in dom4j you will need to create a new document from scratch. Here's an example of doing that.
import org.dom4j.Document; import org.dom4j.DocumentHelper; import org.dom4j.Element; public class Foo { public Document createDocument() { Document document = DocumentHelper.createDocument(); Element root = document.addElement( "root" ); Element author1 = root.addElement( "author" ) .addAttribute( "name", "James" ) .addAttribute( "location", "UK" ) .addText( "James Strachan" ); Element author2 = root.addElement( "author" ) .addAttribute( "name", "Bob" ) .addAttribute( "location", "US" ) .addText( "Bob McWhirter" ); return document; } }
A quick and easy way to write a Document (or any Node) to a Writer is via the write() method.
FileWriter out = new FileWriter( "foo.xml" ); document.write( out );
If you want to be able to change the format of the output, such as pretty printing or a compact format, or you want to be able to work with Writer objects or OutputStream objects as the destination, then you can use the XMLWriter class.
import java.io.FileWriter;
import java.io.IOException;
import org.dom4j.Document;
import org.dom4j.io.OutputFormat;
import org.dom4j.io.XMLWriter;
public class Foo {
    public void write(Document document) throws IOException {
        // lets write to a file
        XMLWriter writer = new XMLWriter( new FileWriter( "output.xml" ) );
        writer.write( document );
        writer.close();
        // Pretty print the document to System.out
        OutputFormat format = OutputFormat.createPrettyPrint();
        writer = new XMLWriter( System.out, format );
        writer.write( document );
        // Compact format to System.out
        format = OutputFormat.createCompactFormat();
        writer = new XMLWriter( System.out, format );
        writer.write( document );
    }
}
If you have a reference to a Document or any other Node such as an Attribute or Element, you can turn it into the default XML text via the asXML() method.
Document document = ...; String text = document.asXML();
If you have some XML as a String you can parse it back into a Document again using the helper method DocumentHelper.parseText()
String text = "<person> <name>James</name> </person>"; Document document = DocumentHelper.parseText(text);
Applying XSLT on a Document is quite straightforward using the JAXP API from Sun. This allows you to work against any XSLT engine such as Xalan or SAXON. Here is an example of using JAXP to create a transformer and then applying it to a Document.
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamSource;
import org.dom4j.Document;
import org.dom4j.io.DocumentResult;
import org.dom4j.io.DocumentSource;
public class Foo {
    public Document styleDocument( Document document, String stylesheet ) throws Exception {
        // load the transformer using JAXP
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer = factory.newTransformer( new StreamSource( stylesheet ) );
        // now lets style the given document
        DocumentSource source = new DocumentSource( document );
        DocumentResult result = new DocumentResult();
        transformer.transform( source, result );
        // return the transformed document
        Document transformedDoc = result.getDocument();
        return transformedDoc;
    }
}
This page attempts to survey the landscape of available XML object models and compare and contrast their features. The information in this table is correct to the best of our knowledge and we will try and keep this information as up to date as possible. If you think there's anything wrong, please let us know here.
Feature | W3C DOM | DOM4J 1.5 | JDOM 1.0 | XOM 1.0 |
---|---|---|---|---|
Open Source | Yes | Yes | Yes | Yes |
Based on Java Interfaces | Yes | Yes | No | No |
Supports Java 2 Collections | No | Yes | Yes | No |
Can use any SAX parser and XMLFilter | Yes (usually) | Yes | Yes | Yes |
Convert to and from DOM trees | Yes | Yes | Yes | Yes |
Implements DOM interfaces | Yes | Yes (optional) | No | No |
Integrated XPath API support | No | Yes | No | No |
Bundled XPath implementation | No | Yes | Optional | No |
Support for JAXP/TrAX for XSLT integration | Yes | Yes | Yes | Yes |
Capable of processing continuous XML streams | Don't know | Yes | No | Yes |
Capable of processing massive documents | Don't know | Yes | No | Yes |
XML Schema Data Type support | No | Yes | No | Don't know |
XInclude support | Don't know | No | No | Yes |
Canonical XML support | Don't know | No | No | Yes |
Dennis Sosnoski has published an article on IBM's developerWorks which compares the performance of a variety of XML document models for the Java platform, including dom4j. You can find the results here.
Also you might find these new Performance Benchmarks interesting comparing dom4j and Jaxen against Xerces and Xalan.
The current release can be downloaded at SourceForge
You can download an interim snapshot build from the Maven repository
You can browse the current CVS repository here
To learn more about CVS go here.
This project's SourceForge CVS repository can be checked out through anonymous (pserver) CVS with the following instruction set. When prompted for a password for anonymous, simply press the Enter key.
cvs -d:pserver:anonymous@cvs.dom4j.org:/cvsroot/dom4j login
cvs -d:pserver:anonymous@cvs.dom4j.org:/cvsroot/dom4j co dom4j
Updates from within the module's directory do not need the -d parameter.
Only project developers can access the CVS tree via this method. SSH1 must be installed on your client machine. Substitute developername with the proper values. Enter your site password when prompted.
export CVS_RSH=ssh
cvs -d:ext:developername@cvs.dom4j.org:/cvsroot/dom4j co dom4j
Martin Böhm, Jean-Jacques Dubray
Eigner Precision Lifecycle Management
We have created a simple test bed to evaluate the performance of DOM4J versus Xerces/Xalan. These results are intended to give a rough idea rather than to be an exhaustive test suite. In particular we focus our study on XML documents which look like database result sets. It is pretty clear that performance results may vary greatly based on the topology of your XML.
The test was designed with two topologies in mind:
a) to have elements only and each element name is unique in the whole document.
<?xml version="1.0" encoding="UTF-8"?>
<ItemResultSet>
<Item>
<Attr0x0>123456789</Attr0x0>
<Attr1x0>123456789</Attr1x0>
<Attr2x0>123456789</Attr2x0>
<Attr3x0>123456789</Attr3x0>
<Attr4x0>123456789</Attr4x0>
<Attr5x0>123456789</Attr5x0>
<Attr6x0>123456789</Attr6x0>
<Attr7x0>123456789</Attr7x0>
<Attr8x0>123456789</Attr8x0>
<Attr9x0>123456789</Attr9x0>
<Attr10x0>123456789</Attr10x0>
<Attr11x0>123456789</Attr11x0>
<Attr12x0>123456789</Attr12x0>
<Attr13x0>123456789</Attr13x0>
...
</Item>
<Item>
<Attr0x1>123456789</Attr0x1>
<Attr1x1>123456789</Attr1x1>
<Attr2x1>123456789</Attr2x1>
...
</ItemResultSet>
b) To use attributes only
<?xml version="1.0" encoding="UTF-8"?>
<ItemResultSet>
<Item guid="0" Attr0="123456789" Attr1="123456789" .../>
<Item guid="1" Attr0="123456789" Attr1="123456789" .../>
</ItemResultSet>
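A test document with the attribute-only topology can be generated with a few lines of code. The sketch below is not the benchmark's own generator; the class name, method name and parameters are hypothetical, and it simply reproduces the shape shown above for a given number of items and attributes per item.

```java
final class AttributeTopologySketch {
    /** Builds an ItemResultSet document with the given number of items and attributes per item. */
    static String generate(int items, int attrsPerItem) {
        StringBuilder xml = new StringBuilder(
                "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<ItemResultSet>");
        for (int i = 0; i < items; i++) {
            xml.append("<Item guid=\"").append(i).append('"');
            for (int a = 0; a < attrsPerItem; a++) {
                xml.append(" Attr").append(a).append("=\"123456789\"");
            }
            xml.append("/>");
        }
        return xml.append("</ItemResultSet>").toString();
    }
}
```

Varying the first argument over 1000, 100, 10 and 1 gives documents of the sizes used in the measurements that follow.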
We have tested for 1000,100,10,1 items the time it takes to:
a)
/*/*/Attr1x1
/*/*/Attr1x500
/*/*/Attr1x999
/*/*/Item
b)
/*/*[@id="1"]
/*/*[@id="500"]
/*/*[@id="999"]
All tests were run on my laptop (PIII, 500 MHz, 512 MB). We allocated a heap size of 256 MB when starting the tests.
All times in ms.
Items | Create Document (dom4j) | Create Document (xalan) | Write Document to disk (dom4j) | Write Document to disk (xalan) | Reparse the document from disk (dom4j) | Reparse the document from disk (xalan) |
---|---|---|---|---|---|---|
1000 | 641.0 | 571.0 | 531 | 852 | 2020 | 2664 |
100 | 9.0 | 20.0 | 60 | 61 | 62.99 | 68.6 |
10 | 0.7 | 1.0 | 10 | 10 | 11.92 | 14.62 |
1 | 0.1 | 0.0 | 10 | 10 | 8.01 | 8.31 |
The most surprising result comes from executing XPath statements. Xalan does warn us in the JavaDoc that things could be a little slow.
[Timing tables for selectSingleNode() and selectNodes(); the values did not survive extraction.]
All times in ms.
Items | Create Document (dom4j - elements) | Create Document (dom4j - attrs) | Write Document to disk (dom4j - elements) | Write Document to disk (dom4j - attrs) | Reparse the document from disk (dom4j - elements) | Reparse the document from disk (dom4j - attrs) |
---|---|---|---|---|---|---|
1000 | 641.0 | 100 | 531 | 140 | 2020 | 207 |
100 | 9.0 | 8.0 | 60 | 20 | 62.99 | 24 |
10 | 0.7 | 0.9 | 10 | 10 | 11.92 | 8.31 |
1 | 0.1 | 0.1 | 10 | 10 | 8.01 | 6.81 |
The most surprising result comes from executing XPath statements. Xalan does warn us in the JavaDoc that things could be a little slow.
[Result tables for selectSingleNode() and selectNodes(), dom4j elements vs attributes: contents not preserved]
These numbers suggest that one should use the XPathAPI class of Xalan with great caution, if at all.
The syntax of XPath statements must be chosen carefully. Contrary to some belief, and perhaps because of the topology of our XML format, using /*/* or // was more efficient than the absolute path /ItemResultSet/Item.
It appears more efficient to use selectNodes with dom4j even if one needs a single node.
With dom4j, XPath runs about twice as fast against a document which contains elements vs attributes.
In our case, we found that dom4j is faster than Xalan for XSLT transformations. We do not claim this is a general result, but rather a data point.
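The path forms compared above all select the same nodes on this document shape; only the cost of evaluating them differs. The sketch below checks that equivalence and shows the "select a node list, then take the first entry" pattern. It assumes the JDK's javax.xml.xpath API (not dom4j or Xalan's XPathAPI), makes no timing claims, and all names in it are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class; it illustrates equivalence of results, not performance.
public class PathFormsSketch {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** Evaluates the expression and returns every matching node. */
    static NodeList selectNodes(Document doc, String expression) throws Exception {
        return (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);
    }

    /** Takes the first entry of a node list, or null if the list is empty. */
    static Node firstOf(NodeList nodes) {
        return nodes.getLength() > 0 ? nodes.item(0) : null;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<ItemResultSet><Item guid=\"0\"/><Item guid=\"1\"/>"
                + "<Item guid=\"2\"/></ItemResultSet>");
        for (String expr : new String[] {"/ItemResultSet/Item", "/*/*", "//Item"}) {
            NodeList hits = selectNodes(doc, expr);
            System.out.println(expr + " -> " + hits.getLength() + " nodes, first is "
                    + firstOf(hits).getNodeName());
        }
    }
}
```

Whether the wildcard forms are also faster depends on the XPath engine and the document, so it is worth timing them on your own data before relying on the observation.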
Here's the source code and data for these tests. Try them for yourself.
PerfDOM4J.java
PerfDOM4JAttr.java
PerfW3C.java
item.xslt
w3c_100.xml
dom4j-1.6.1/xdocs/benchmarks/xpath/PerfDOM4JAttr.java

import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.util.List;

import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamSource;

import org.dom4j.Document;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;
import org.dom4j.Node;
import org.dom4j.XPath;
import org.dom4j.io.DOMWriter;
import org.dom4j.io.DocumentResult;
import org.dom4j.io.DocumentSource;
import org.dom4j.io.OutputFormat;
import org.dom4j.io.SAXReader;
import org.dom4j.io.XMLWriter;

/** Benchmarks dom4j on the attributes-only document layout. */
public class PerfDOM4JAttr {

    public static void main(String args[]) {
        Document doc;
        try {
            int numrec = 10000;
            // The original printed "1000 Elements" here although numrec is 10000.
            System.out.println("\n10000 Elements -------------------");
            doc = PerfDOM4JAttr.createDocument(numrec, 20, 1);
            PerfDOM4JAttr.createW3CDOM(doc);
            PerfDOM4JAttr.write(doc, "DOM4JAttr_" + numrec + ".xml");
            PerfDOM4JAttr.parse(numrec, 1);
            // PerfDOM4JAttr.transform(doc,"item.xslt",1);
            PerfDOM4JAttr.xpath(doc, "/ItemResultSet/Item[@guid=\"1\"]", 3);
            PerfDOM4JAttr.xpath(doc, "/ItemResultSet/Item[@guid=\"500\"]", 3);
            PerfDOM4JAttr.xpath(doc, "/ItemResultSet/Item[@guid=\"999\"]", 3);
            PerfDOM4JAttr.xpathNodes(doc, "/ItemResultSet/Item[@guid=\"1\"]", 3);
            PerfDOM4JAttr.xpathNodes(doc, "/ItemResultSet/Item[@guid=\"500\"]", 3);
            PerfDOM4JAttr.xpathNodes(doc, "/ItemResultSet/Item[@guid=\"999\"]", 3);
            PerfDOM4JAttr.xpathNodes(doc, "/*/Item", 100);

            numrec = 1000;
            System.out.println("\n1000 Elements -------------------");
            doc = PerfDOM4JAttr.createDocument(numrec, 20, 1);
            PerfDOM4JAttr.createW3CDOM(doc);
            PerfDOM4JAttr.write(doc, "DOM4JAttr_" + numrec + ".xml");
            PerfDOM4JAttr.parse(numrec, 3);
            PerfDOM4JAttr.transform(doc, "item.xslt", 3);
            PerfDOM4JAttr.xpath(doc, "/ItemResultSet/Item[@guid=\"1\"]", 3);
            PerfDOM4JAttr.xpath(doc, "/ItemResultSet/Item[@guid=\"500\"]", 3);
            PerfDOM4JAttr.xpath(doc, "/ItemResultSet/Item[@guid=\"999\"]", 3);
            PerfDOM4JAttr.xpathNodes(doc, "/ItemResultSet/Item[@guid=\"1\"]", 3);
            PerfDOM4JAttr.xpathNodes(doc, "/ItemResultSet/Item[@guid=\"500\"]", 3);
            PerfDOM4JAttr.xpathNodes(doc, "/ItemResultSet/Item[@guid=\"999\"]", 3);
            PerfDOM4JAttr.xpathNodes(doc, "/*/Item", 100);

            numrec = 100;
            System.out.println("\n100 Elements --------------------");
            doc = PerfDOM4JAttr.createDocument(numrec, 20, 10);
            PerfDOM4JAttr.createW3CDOM(doc);
            PerfDOM4JAttr.write(doc, "DOM4JAttr_" + numrec + ".xml");
            PerfDOM4JAttr.parse(numrec, 10);
            PerfDOM4JAttr.transform(doc, "item.xslt", 10);
            PerfDOM4JAttr.xpath(doc, "/ItemResultSet/Item[@guid=\"1\"]", 10);
            PerfDOM4JAttr.xpath(doc, "/ItemResultSet/Item[@guid=\"50\"]", 10);
            PerfDOM4JAttr.xpath(doc, "/ItemResultSet/Item[@guid=\"99\"]", 10);
            PerfDOM4JAttr.xpathNodes(doc, "/ItemResultSet/Item[@guid=\"1\"]", 10);
            PerfDOM4JAttr.xpathNodes(doc, "/ItemResultSet/Item[@guid=\"50\"]", 10);
            PerfDOM4JAttr.xpathNodes(doc, "/ItemResultSet/Item[@guid=\"99\"]", 10);
            PerfDOM4JAttr.xpathNodes(doc, "/*/Item", 100);

            numrec = 10;
            System.out.println("\n10 Elements ---------------------");
            doc = PerfDOM4JAttr.createDocument(numrec, 20, 100);
            PerfDOM4JAttr.createW3CDOM(doc);
            PerfDOM4JAttr.write(doc, "DOM4JAttr_" + numrec + ".xml");
            PerfDOM4JAttr.parse(numrec, 100);
            PerfDOM4JAttr.transform(doc, "item.xslt", 10);
            PerfDOM4JAttr.xpath(doc, "/ItemResultSet/Item[@guid=\"1\"]", 100);
            PerfDOM4JAttr.xpath(doc, "/ItemResultSet/Item[@guid=\"5\"]", 100);
            PerfDOM4JAttr.xpath(doc, "/ItemResultSet/Item[@guid=\"9\"]", 100);
            PerfDOM4JAttr.xpathNodes(doc, "/ItemResultSet/Item[@guid=\"1\"]", 100);
            PerfDOM4JAttr.xpathNodes(doc, "/ItemResultSet/Item[@guid=\"5\"]", 100);
            PerfDOM4JAttr.xpathNodes(doc, "/ItemResultSet/Item[@guid=\"9\"]", 100);
            PerfDOM4JAttr.xpathNodes(doc, "/*/Item", 100);

            numrec = 1;
            System.out.println("\n1 Element -----------------------");
            doc = PerfDOM4JAttr.createDocument(numrec, 20, 100);
            PerfDOM4JAttr.createW3CDOM(doc);
            PerfDOM4JAttr.write(doc, "DOM4JAttr_" + numrec + ".xml");
            PerfDOM4JAttr.parse(numrec, 100);
            PerfDOM4JAttr.transform(doc, "item.xslt", 10);
            PerfDOM4JAttr.xpath(doc, "/ItemResultSet/Item[@guid=\"1\"]", 100);
            PerfDOM4JAttr.xpathNodes(doc, "/ItemResultSet/Item[@guid=\"1\"]", 100);
            PerfDOM4JAttr.xpathNodes(doc, "/*/Item", 100);
        } catch (IOException ie) {
            ie.printStackTrace();
        }
    }

    /** Builds the document pp times and reports the average creation time in ms. */
    public static Document createDocument(int iNumRecs, int iNumFlds, int pp) {
        double start = System.currentTimeMillis();
        Document document = null;
        for (int kk = 0; kk < pp; kk++) {
            document = DocumentHelper.createDocument();
            Element root = document.addElement("ItemResultSet");
            for (int ii = 0; ii < iNumRecs; ii++) {
                Element record = root.addElement("Item").addAttribute("guid", "" + ii);
                for (int jj = 0; jj < iNumFlds; jj++) {
                    record.addAttribute("Attr" + jj, "123456789");
                }
            }
        }
        double end = System.currentTimeMillis();
        System.err.println("Creation time : " + (end - start) / pp);
        return document;
    }

    /** Parses DOM4JAttr_<n>.xml kk times and reports the average parsing time in ms. */
    public static Document parse(int iNumRecs, int kk) {
        File file = new File("DOM4JAttr_" + iNumRecs + ".xml");
        double start = System.currentTimeMillis();
        Document document = null;
        for (int pp = 0; pp < kk; pp++) {
            try {
                SAXReader SAXrd = new SAXReader();
                // Keep the parsed document so the method no longer always returns null.
                document = SAXrd.read(file);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
        double end = System.currentTimeMillis();
        System.err.println("Parsing time for :" + 1.000 * (end - start) / kk);
        return document;
    }

    /** Converts the dom4j document to a W3C DOM once and reports the time in ms. */
    public static void createW3CDOM(Document doc) {
        long start = System.currentTimeMillis();
        try {
            DOMWriter dw = new DOMWriter();
            dw.write(doc);
        } catch (Exception de) {
            // The original swallowed this exception silently.
            de.printStackTrace();
        }
        long end = System.currentTimeMillis();
        System.err.println("W3C Creation time for :" + (end - start));
    }

    /** Pretty-prints the document to the named file and reports the time in ms. */
    public static void write(Document document, String name) throws IOException {
        long start = System.currentTimeMillis();
        // lets write to a file
        try {
            OutputFormat format = OutputFormat.createPrettyPrint();
            XMLWriter writer = new XMLWriter(new FileWriter(name), format);
            writer.write(document);
            writer.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
        long end = System.currentTimeMillis();
        System.err.println("DOM4JAttr File write time :" + (end - start) + " " + name);
    }

    /** Applies the stylesheet kk times and reports the average transform time in ms. */
    public static void transform(Document xmlDoc, String xslFile, int kk) {
        System.err.println("DOM4JAttr start transform ");
        int ii = 1;
        try {
            TransformerFactory factory = TransformerFactory.newInstance();
            Transformer transformer = factory.newTransformer(new StreamSource(xslFile));
            long start = System.currentTimeMillis();
            for (ii = 0; ii < kk; ii++) {
                Source source = new DocumentSource(xmlDoc);
                DocumentResult result = new DocumentResult();
                transformer.transform(source, result);
                // output the transformed document
            }
            long end = System.currentTimeMillis();
            // Floating-point division, so sub-millisecond averages are not truncated.
            System.err.println("DOM4JAttr transform time :" + 1.000 * (end - start) / ii);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    /** Runs selectSingleNode kk times and reports the average time in ms. */
    public static void xpath(Document document, String xpathExp, int kk) {
        long start = System.currentTimeMillis();
        XPath xpath = document.createXPath(xpathExp);
        for (int ii = 0; ii < kk; ii++) {
            Node node = xpath.selectSingleNode(document);
            if ((node != null) && (ii == 0)) {
                String val = node.getStringValue();
                // System.out.println("xpath OK:"+val);
            }
        }
        long end = System.currentTimeMillis();
        // Floating-point division, consistent with xpathNodes below.
        System.err.println("DOM4JAttr xpath time :" + 1.000 * (end - start) / kk);
    }

    /** Runs selectNodes kk times, reads the first hit, and reports the average time in ms. */
    public static void xpathNodes(Document document, String xpathExp, int kk) {
        long start = System.currentTimeMillis();
        XPath xpath = document.createXPath(xpathExp);
        for (int ii = 0; ii < kk; ii++) {
            try {
                List nodeList = xpath.selectNodes(document);
                if ((nodeList != null) && (nodeList.size() > 0)) {
                    Node node = (Node) nodeList.get(0);
                    if ((node != null) && (ii == 0)) {
                        String val = node.getStringValue();
                        // System.out.println("xpathNodes OK:"+val);
                    }
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
        long end = System.currentTimeMillis();
        System.err.println("DOM4JAttr xpath Nodes time :" + 1.000 * (end - start) / kk);
    }
}

dom4j-1.6.1/xdocs/benchmarks/xpath/PerfW3C.java

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;

import org.apache.xerces.dom.DocumentImpl;
import org.apache.xml.serialize.OutputFormat;
import org.apache.xml.serialize.XMLSerializer;
import org.apache.xpath.XPathAPI;
import org.dom4j.io.SAXReader;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Benchmarks Xerces/Xalan (W3C DOM) on the elements-only document layout. */
public class PerfW3C {

    public static void main(String args[]) {
        Document doc;
        System.err.println("W3C createDocument:");
        int numrec;

        numrec = 1000;
        System.out.println("\n1000 Elements ---------------------------------");
        doc = PerfW3C.createDocument(numrec, 20, 1);
        PerfW3C.write(doc, "w3c_" + numrec + ".xml");
        PerfW3C.parse(numrec, 1);
        PerfW3C.transform(doc, "item.xslt", 1);
        PerfW3C.xpath(doc, "/*/*/Attr1x1", 1);
        PerfW3C.xpath(doc, "/*/*/Attr1x500", 1);
        PerfW3C.xpath(doc, "/*/*/Attr1x999", 1);
        PerfW3C.xpathNodes(doc, "/*/*/Attr1x1", 1);
        PerfW3C.xpathNodes(doc, "/*/*/Attr1x500", 1);
        PerfW3C.xpathNodes(doc, "/*/*/Attr1x999", 1);
        PerfW3C.xpathNodes(doc, "/*/Item", 1);

        numrec = 100;
        System.out.println("\n100 Elements ----------------------------------");
        doc = PerfW3C.createDocument(numrec, 20, 1);
        PerfW3C.write(doc, "w3c_" + numrec + ".xml");
        PerfW3C.transform(doc, "item.xslt", 10);
        PerfW3C.parse(numrec, 10);
        PerfW3C.xpath(doc, "/*/*/Attr0x1", 10);
        PerfW3C.xpath(doc, "/*/*/Attr0x50", 10);
        PerfW3C.xpath(doc, "/*/*/Attr0x99", 10);
        PerfW3C.xpathNodes(doc, "/*/*/Attr0x0", 10);
        PerfW3C.xpathNodes(doc, "/*/*/Attr1x50", 10);
        PerfW3C.xpathNodes(doc, "/*/*/Attr1x99", 10);
        PerfW3C.xpathNodes(doc, "/*/Item", 10);

        numrec = 10;
        System.out.println("\n10 Elements -----------------------------------");
        doc = PerfW3C.createDocument(numrec, 20, 10);
        PerfW3C.write(doc, "w3c_" + numrec + ".xml");
        PerfW3C.parse(numrec, 50);
        PerfW3C.transform(doc, "item.xslt", 10);
        PerfW3C.xpath(doc, "/*/*/Attr5", 100);
        PerfW3C.xpathNodes(doc, "/*/*/Attr1x5", 100);
        PerfW3C.xpathNodes(doc, "/*/Item", 100);

        numrec = 1;
        System.out.println("\n1 Elements ------------------------------------");
        doc = PerfW3C.createDocument(numrec, 20, 10);
        PerfW3C.write(doc, "w3c_" + numrec + ".xml");
        PerfW3C.parse(numrec, 100);
        PerfW3C.transform(doc, "item.xslt", 10);
        PerfW3C.xpath(doc, "/*/*/Attr1x0", 100);
        PerfW3C.xpathNodes(doc, "/*/*/Attr1x0", 100);
        PerfW3C.xpathNodes(doc, "/*/Item", 100);
    }

    /** Builds the document pp times and reports the average creation time in ms. */
    public static Document createDocument(int iNumRecs, int iNumFlds, int pp) {
        double start = System.currentTimeMillis();
        Document document = null;
        for (int kk = 0; kk < pp; kk++) {
            document = new DocumentImpl();
            // Create root element
            Element root = document.createElement("ItemResultSet");
            document.appendChild(root);
            for (int ii = 0; ii < iNumRecs; ii++) {
                Element record = document.createElement("Item");
                root.appendChild(record);
                for (int jj = 0; jj < iNumFlds; jj++) {
                    /*
                     * AttrImpl a =
                     * (AttrImpl)document.createAttribute("Attr"+jj);
                     * a.setNodeValue("123456789"); Record.setAttributeNode(a);
                     */
                    Element field = document.createElement("Attr" + jj + "x" + ii);
                    field.appendChild(document.createTextNode("123456789"));
                    record.appendChild(field);
                }
            }
        }
        double end = System.currentTimeMillis();
        System.err.println("Creation time :" + (end - start) / pp);
        return document;
    }

    /** Serializes the DOM to the named file and reports the time in ms. */
    public static void write(Document document, String name) {
        long start = System.currentTimeMillis();
        // lets write to a file
        OutputFormat format = new OutputFormat(document); // Serialize DOM
        format.setIndent(2);
        format.setLineSeparator(System.getProperty("line.separator"));
        format.setLineWidth(80);
        try {
            FileWriter writer = new FileWriter(name);
            BufferedWriter buf = new BufferedWriter(writer);
            // The original created buf but serialized to the raw writer and never closed it.
            XMLSerializer FileSerial = new XMLSerializer(buf, format);
            FileSerial.asDOMSerializer(); // As a DOM Serializer
            FileSerial.serialize(document);
            buf.close();
        } catch (IOException ioe) {
            ioe.printStackTrace();
        }
        long end = System.currentTimeMillis();
        System.err.println("W3C File write time :" + (end - start) + " " + name);
    }

    /**
     * Parses a file kk times and reports the average parsing time in ms.
     * As in the original, this reads dom4j_<n>.xml with dom4j's SAXReader
     * (not the w3c_<n>.xml file written above) and always returns null.
     */
    public static Document parse(int iNumRecs, int kk) {
        File file = new File("dom4j_" + iNumRecs + ".xml");
        double start = System.currentTimeMillis();
        Document document = null;
        for (int pp = 0; pp < kk; pp++) {
            try {
                SAXReader SAXrd = new SAXReader();
                SAXrd.read(file);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
        double end = System.currentTimeMillis();
        // System.err.println("DOM4J createDocument:" + "Num Rec. = " + iNumRecs
        // + " Num. Fld.=" + iNumFlds);
        System.err.println("Parsing time for :" + iNumRecs + " " + (end - start) / kk);
        return document;
    }

    /** Applies the stylesheet kk times and reports the average transform time in ms. */
    public static void transform(Document xmlDoc, String xslFile, int kk) {
        int ii = 1;
        try {
            TransformerFactory factory = TransformerFactory.newInstance();
            Transformer transformer = factory.newTransformer(new StreamSource(xslFile));
            long start = System.currentTimeMillis();
            for (ii = 0; ii < kk; ii++) {
                DOMSource source = new DOMSource(xmlDoc);
                DOMResult result = new DOMResult();
                transformer.transform(source, result);
            }
            long end = System.currentTimeMillis();
            // Floating-point division, so sub-millisecond averages are not truncated.
            System.err.println("W3C transform time :" + 1.000 * (end - start) / ii);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    /** Runs XPathAPI.selectSingleNode pp times and reports the average time in ms. */
    public static void xpath(Document document, String xpathExp, int pp) {
        long start = System.currentTimeMillis();
        for (int ii = 0; ii < pp; ii++) {
            try {
                Node node = XPathAPI.selectSingleNode(document, xpathExp);
                if ((node != null) && (ii == 0)) {
                    String val = node.getNodeName();
                    // System.out.println(val);
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
        long end = System.currentTimeMillis();
        System.err.println("W3C xpath time :" + 1.000 * (end - start) / pp);
    }

    /** Runs XPathAPI.selectNodeList pp times, reads the first hit, and reports the average time in ms. */
    public static void xpathNodes(Document document, String xpathExp, int pp) {
        long start = System.currentTimeMillis();
        for (int ii = 0; ii < pp; ii++) {
            try {
                NodeList nodeList = XPathAPI.selectNodeList(document, xpathExp);
                if ((nodeList != null) && (nodeList.getLength() > 0)) {
                    Node node = nodeList.item(0);
                    if ((node != null) && (ii == 0)) {
                        String val = node.getNodeName();
                        // System.out.println(val);
                    }
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
        long end = System.currentTimeMillis();
        System.err.println("W3C xpathNodes time :" + 1.000 * (end - start) / pp);
    }
}

dom4j-1.6.1/xdocs/goals.xml
This document outlines our design goals for dom4j and our philosophy.
We think that an XML framework for Java should be simple, easy to use and intuitive for a Java programmer. We want to take the best features from DOM and SAX and put them together into a new unified API which is optimised for the Java platform.
We want to fully support DOM and SAX together with existing Java platform standards such as the Java 2 Collections and J2EE.
We want complete XPath support integrated into the API and for it to be very easy to use. XPath is the ideal technology for navigating around XML documents simply and easily without writing lines and lines of code.
We want to be able to support very flexible, performant and memory efficient implementations of XML documents. So we want the API to be based on Java interfaces just like the Java 2 Collections framework.
Just as no single List implementation will suffice (the JDK comes with at least 3) we believe we need a framework allowing plug and play XML document implementations. For some users, using a LinkedList performs better than an ArrayList because their usage characteristics differ. Others like to use a Vector as it is synchronized. We believe an XML model should have the same flexibility.
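The List analogy can be made concrete with a small sketch: code written against the java.util.List interface runs unchanged on any implementation, which is the flexibility the paragraph above asks of an XML model. The class and method names below are hypothetical and only illustrate the idea.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Vector;

// Hypothetical example: the caller picks the implementation, the algorithm does not care.
public class PluggableListSketch {

    /** Appends 0..n-1 to whichever List implementation is supplied. */
    static List<Integer> fill(List<Integer> target, int n) {
        for (int i = 0; i < n; i++) {
            target.add(i);
        }
        return target;
    }

    /** Sums the values using only the List interface. */
    static int sum(List<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        List<List<Integer>> implementations = new ArrayList<>();
        implementations.add(new ArrayList<>());
        implementations.add(new LinkedList<>());
        implementations.add(new Vector<>());
        for (List<Integer> impl : implementations) {
            System.out.println(impl.getClass().getSimpleName() + " -> " + sum(fill(impl, 4)));
        }
    }
}
```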
One of the primary goals of dom4j is to be a flexible XML framework for Java which supports most users' needs, whether that be fast and efficient parsing with small memory overhead, processing very large documents or using the latest XML features such as XPath, XSLT and XML Query.
We found that we often needed to move from DOM to SAX to handle very large documents or to move from SAX to DOM to handle complex documents. Our aim is for dom4j to be the only framework you really need on the Java platform and for it to be a good citizen supporting and integrating with existing standards fully.
dom4j is an easy to use, open source library for working with XML, XPath and XSLT on the Java platform using the Java Collections Framework and with full support for DOM, SAX and JAXP.
You can download the current release or a snapshot build via the download page.
For a quick overview of dom4j and how to use it, try reading the quick start guide or browsing the FAQ or the online JavaDoc.
To see how dom4j compares to other XML object models you could try reading our comparison.
Contributors are welcome to join this project. Once you've browsed the FAQ you could try sending an email to one of the mailing lists below or check out the project page.
We really like XPath and if you are working with XML we highly recommend you try to experiment with it. If you need help learning XPath try the excellent Zvon tutorial.
Alternatively you could try Elliotte Rusty Harold's great XML in a Nutshell.
Finally you could try reading the XPath spec.
For developers there is an email list at dom4j-dev where you can make requests for new features or changes to the API, discuss alternate implementation strategies, submit patches or just talk about any relevant topic of the day.
For dom4j users wanting help using dom4j there is an email list at dom4j-user where you can share ideas and experiences, ask for help, give us feedback or discuss your requirements.
You can browse the archives for dom4j-dev and for dom4j-user.
Redistribution and use of this software and associated documentation ("Software"), with or without modification, are permitted provided that the following conditions are met:
THIS SOFTWARE IS PROVIDED BY METASTUFF, LTD. AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL METASTUFF, LTD. OR ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Copyright 2001-2005 (C) MetaStuff, Ltd. All Rights Reserved.
XMLWriter that was causing too many new lines to be written to the resulting XML.
XPathException or InvalidXPathException is thrown.
SAXReader allowing to specify the encoding used when reading XML sources.
DocumentHelper.parseText(String) method to make sure that the XML encoding is always set (if known) on the returned Document, even if the used SAXParser doesn't provide a way to retrieve that encoding.
setXMLEncoding(String) method to the Document interface.
OutputFormat field from AbstractBranch. This can cause problems if multiple threads are using the asXML() method simultaneously.
OutputFormat.
DefaultElement.setContent(List) method that caused incorrectly resetting the parent of the nodes in the list.
persistence package and sub-package.
SAXEventRecorder to accommodate SAX events generated when writing a DOMDocument.
AbstractDocument.asXML() when an encoding was specified on the Document.
DefaultNamespace.isReadOnly() method now returns false. This fixes issues with cloning this Node.
DocumentFactory to create the instance until the first time it is needed.
Stylesheet when an XPath expression was used to select the nodes.
SingletonStrategy class for managing singletons. This allows to use different strategies for singletons, like: one instance per VM, one instance per thread, ... This change removed the usage of ThreadLocals.
SAXEventRecorder that can replay SAX events at a later time. This provides an alternative serialization approach.
DOMDocument.
Document.asXML() which ignored the encoding of the document.
NamespaceCache to use WeakReferences to allow Namespace objects to be garbage collected.
JAXBReader to allow ElementHandlers to be notified when the specified path is encountered, without having to unmarshall XML content.
XMLWriter where a NullPointerException was thrown if trying to write a CData section containing null content.
XMLWriter.characters(...) where the escapeText property of the writer was ignored.
Stylesheet.removeRule(Rule) method which didn't remove the Rule but added it again.
BackedList causing new elements to always be added at the first position if the size of the list is 1.
getXMLEncoding() method to org.dom4j.Document which returns the encoding of the document.
DocumentHelper.parseText(String xml, String encoding) method that was introduced in dom4j-1.5-beta2.
DOMWriter.
ElementStack and DispatchHandler to check if a handler is registered for a given path.
ElementStack is now a public class.
SAXContentHandler.endElement(...) can now throw SAXException.
Attribute.getPath(Element context) and Attribute.getUniquePath(Element context).
Element.declaredNamespaces() now only returns the namespaces that are declared on that element. Element.additionalNamespaces() now only returns namespaces that are declared on that element and is not the same as the namespace of that element.
AbstractElement causing Node.getPath(Element context) to return an absolute path, even if the current element was the same as the context element. The relative path "." is now returned.
Element to retrieve all Namespaces for a given URI.
DOMReader causing namespace declarations to get lost in some situations.
booleanValueOf(Object node) method to XPath.
BeanElement which prevented proper execution of the bean samples.
STAXEventWriter now uses XMLEventConsumer instead of XMLEventWriter.
SAXReader that caused problems parsing files in OSX.
XMLWriter that caused whitespace to be added between successive calls of the characters(...) method. This is used particularly frequently in Apache Jelly.
NamespaceCache in multithreaded environments.
OutputFormat that suppresses newline after XML declaration.
DocumentHelper that allows user to specify encoding when parsing an XML String.
BeanElement.
SAXContentHandler which caused a NullPointerException in some situations.
XMLWriter that caused duplication of the default namespace declaration.
DispatchHandler which made the handler not reusable.
SAXContentHandler that caused incorrect CDATA section parsing.
SAXContentHandler that caused incorrect entity handling.
XMLWriter causing padding to be disabled, even if enabled in the specified outputformat.
Fixed encoding bug in Document.asXML() and DocumentHelper.parseText().
Fixed bug in SAXReader that caused problems resolving relative URIs when parsing java.io.File Objects.
The iterators returned by the Element.elementIterator(...) methods now support remove().
DOMWriter now writes DOM Level 2 attributes and elements.
Ignore attribute order when comparing Elements in NodeComparator.
Fixed bug in XMLWriter where namespace declarations were duplicated.
Fixed bug in parsing a ProcessingInstruction.
Added support for Stylesheet modes.
Fixed some DOMNodeHelper issues.
Fixed multithreaded access on DefaultElement.