XML: Principles and Performance Comparison of Four Parsers (DOM, SAX, JDOM, DOM4J)
DOM is one of the two underlying interfaces for parsing XML (the other is SAX). JDOM and DOM4J are higher-level packages built on these underlying APIs. DOM is general-purpose, cross-language, and cross-platform, while JDOM and DOM4J are specific to the Java language.
1.DOM
DOM is the official W3C standard for representing XML documents in a platform- and language-independent manner. A DOM is a set of nodes, or pieces of information, organized in a hierarchy. This hierarchy lets developers search the tree for specific information. Analyzing the structure usually requires loading the entire document and building the hierarchy before any other work can be done. Because it is based on a hierarchy of information, DOM is considered tree-based or object-based. DOM, and tree-based processing in general, has the following advantages and disadvantages:
- First, because the tree persists in memory, it can be modified, so applications can change both data and structure. The tree can also be navigated up and down at any time, unlike the one-pass processing of SAX. DOM is also much simpler to use.
- On the other hand, parsing and loading an entire very large document can be slow and resource-intensive, so such data is better processed by other means, for example an event-based model such as SAX.
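The in-memory editing and two-way navigation described above can be sketched with the JDK's built-in DOM implementation. This is only an illustration: the class name `DomEditSketch`, the inline XML string, and the `OWNER` element are assumptions made up for the example, not part of the original test code.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical demo class; the inline XML is made up for illustration.
class DomEditSketch {

    // Parses a small document, edits the tree in memory, and returns a summary.
    static String editAndDescribe() {
        String xml = "<Result><VALUE><NO>A1</NO><ADDR>GZ</ADDR></VALUE></Result>";
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

            // Navigate down: find the first ADDR element and change its text.
            Element addr = (Element) doc.getElementsByTagName("ADDR").item(0);
            addr.setTextContent("XG");

            // Navigate up: go from ADDR back to its parent VALUE element.
            Node value = addr.getParentNode();

            // Change the structure: append a new (hypothetical) child to VALUE.
            Element owner = doc.createElement("OWNER");
            owner.setTextContent("unknown");
            value.appendChild(owner);

            return value.getNodeName() + ":" + addr.getTextContent()
                    + ":" + value.getChildNodes().getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(editAndDescribe()); // prints VALUE:XG:3
    }
}
```

Every step after `parse` works on the tree that is already in memory, which is exactly what makes DOM convenient for small documents and costly for very large ones.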
2.SAX
SAX processes documents in a way that closely resembles streaming media. Analysis can begin immediately, without waiting for all the data to be loaded. Moreover, because the application only inspects the data as it is read, it does not need to store the data in memory. This is a huge advantage for large documents. In fact, the application does not even have to parse the entire document; it can stop parsing as soon as some condition is met. Generally speaking, SAX is also much faster than its alternative, DOM.
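Stopping as soon as a condition is met is commonly done by throwing an exception from a callback. The sketch below shows one way to do this with the JDK's SAX parser. The class `SaxEarlyStopSketch`, the marker exception `StopParsing`, and the inline XML are hypothetical names introduced for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class showing early termination of a SAX parse.
class SaxEarlyStopSketch {

    // Hypothetical marker exception used only to abort parsing on purpose.
    static final class StopParsing extends SAXException {
        StopParsing() { super("stopped early"); }
    }

    // Returns the text of the first <NO> element and stops parsing after it.
    static String firstPlate(String xml) {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            private boolean inNo;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                inNo = qName.equals("NO");
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inNo) text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) throws SAXException {
                if (qName.equals("NO")) throw new StopParsing(); // condition met
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (StopParsing expected) {
            // Parsing was aborted deliberately; later records were never processed.
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String xml = "<Result><VALUE><NO>A1</NO><ADDR>GZ</ADDR></VALUE>"
                   + "<VALUE><NO>A2</NO><ADDR>XG</ADDR></VALUE></Result>";
        System.out.println(firstPlate(xml)); // prints A1
    }
}
```

Only the text collected so far is kept; nothing resembling a document tree is ever built.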
3. DOM or SAX?
For developers who need to write their own code to process XML documents, choosing between the DOM and SAX parsing models is an important design decision.
DOM accesses an XML document by building a tree structure, while SAX uses an event model.
A DOM parser transforms an XML document into a tree of its contents, which can then be traversed. The advantage of the DOM model is ease of programming: developers call the tree-building routine and then use the navigation APIs to reach the nodes they need. Adding and modifying elements in the tree is easy. However, because a DOM parser must process the entire XML document, its performance and memory requirements are relatively high, especially for large XML files. Because of its traversal capability, DOM parsers are often used in services whose XML documents change frequently.
A SAX parser uses an event-based model. It fires a series of events while parsing an XML document; when it encounters a given tag, it invokes a callback method to report that the tag was found. SAX usually needs less memory because it lets developers decide which tags to process. Its scalability shows best when only part of the document's data is needed. However, coding against a SAX parser is harder, and it is difficult to access several different pieces of data in the same document at the same time.
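To make the event model concrete, the following sketch records the callbacks in the order the parser fires them. It uses the JDK's SAX parser; the class name `SaxEventTraceSketch` and the `start:`/`text:`/`end:` labels are assumptions chosen for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class that logs SAX callbacks as they arrive.
class SaxEventTraceSketch {

    // Parses the XML string and returns the callbacks in the order SAX fired them.
    static List<String> trace(String xml) {
        List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                events.add("start:" + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // A parser is allowed to deliver one run of text in several chunks.
                events.add("text:" + new String(ch, start, length));
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("end:" + qName);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return events;
    }

    public static void main(String[] args) {
        // The start/text/end events appear in document order.
        System.out.println(trace("<VALUE><NO>A1</NO><ADDR>GZ</ADDR></VALUE>"));
    }
}
```

The handler never sees two elements at once, which is why correlating several pieces of data takes extra bookkeeping in SAX code.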
4.JDOM
JDOM was designed as a Java-specific document model that simplifies interaction with XML and is faster than using DOM. As the first Java-specific model, JDOM has been heavily publicized and promoted. It is being considered for eventual use as a "Java Standard Extension" through Java Specification Request JSR-102. JDOM development began in early 2000.
There are two main differences between JDOM and DOM.
- First, JDOM uses only concrete classes instead of interfaces. This simplifies the API in some ways, but also limits its flexibility.
- Second, the API makes extensive use of the Collections classes, which simplifies its use for Java developers who are already familiar with these classes.
The JDOM documentation states that its goal is to "solve 80% (or more) of Java/XML problems with 20% (or less) of the effort" (the 20% presumably refers to the learning curve).
JDOM is certainly useful for most Java/XML applications, and most developers find its API much easier to understand than DOM's. JDOM also includes fairly extensive checks on program behavior to prevent users from doing anything that makes no sense in XML. However, it still requires a solid understanding of XML to do anything beyond the basics (or even to understand the errors in some cases). Gaining that understanding is probably a more worthwhile effort than learning either the DOM or the JDOM interfaces.
JDOM itself does not contain a parser. It usually uses a SAX2 parser to parse and validate the input XML document (although it can also take a previously constructed DOM representation as input). It includes converters that output a JDOM representation as a SAX2 event stream, a DOM model, or an XML text document. JDOM is open source, released under a variant of the Apache license.
5.DOM4J
Although DOM4J is a completely independent development effort, it began as an "intelligent fork" of JDOM. It incorporates many features beyond basic XML document representation, including integrated XPath support, XML Schema support, and event-based processing for large or streamed documents. It also offers the option of building a document representation that can be accessed in parallel through the DOM4J API and the standard DOM interfaces. It has been under development since the second half of 2000.
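To show the kind of query that XPath support makes possible, here is a dependency-free sketch. It deliberately uses the JDK's own `javax.xml.xpath` API rather than DOM4J's API, so it illustrates XPath itself and not DOM4J's method names. The class name `XPathSketch` and the inline XML (modeled on the sample file used later in this article) are assumptions for the example.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical demo class; uses the JDK XPath API as a stand-in, not DOM4J.
class XPathSketch {

    // Evaluates an XPath expression against an XML string and returns the string result.
    static String query(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Result><VALUE><NO DATE=\"2005\">A1</NO><ADDR>GZ</ADDR></VALUE>"
                   + "<VALUE><NO DATE=\"2004\">A2</NO><ADDR>XG</ADDR></VALUE></Result>";
        // Address of the record whose NO element carries DATE="2004".
        System.out.println(query(xml, "/Result/VALUE[NO/@DATE='2004']/ADDR")); // prints XG
    }
}
```

A single expression replaces the loop-and-compare code that would otherwise be needed to find the matching record.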
To support all these features, DOM4J uses interfaces and abstract base classes. DOM4J makes extensive use of the Collections classes in its API, but in many cases it also provides alternatives that allow better performance or more direct coding. The direct benefit is that, at the price of a more complex API, DOM4J provides much greater flexibility than JDOM.
While adding the goals of flexibility, XPath integration, and large-document processing, DOM4J shares JDOM's goals of ease of use and intuitive operation for Java developers. It also aims to be a more complete solution than JDOM, one that handles essentially all Java/XML problems. In pursuing that goal, it places less emphasis than JDOM on preventing incorrect application behavior.
DOM4J is a very good Java XML API, with excellent performance, powerful features, and great ease of use. It is also open source software. More and more Java software now uses DOM4J to read and write XML; notably, even Sun's JAXM uses DOM4J.
6. Basic usage of the four XML parsing approaches
XML file:
<?xml version="1.0" encoding="utf-8" ?>
<Result>
    <VALUE>
        <NO DATE="2005">A1</NO>
        <ADDR>GZ</ADDR>
    </VALUE>
    <VALUE>
        <NO DATE="2004">A2</NO>
        <ADDR>XG</ADDR>
    </VALUE>
</Result>
1)DOM
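import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class MyXMLReader {
    public static void main(String[] args) {
        long lasting = System.currentTimeMillis();
        try {
            File f = new File("data_10k.xml");
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Build the whole document tree in memory.
            Document doc = builder.parse(f);
            // Look up each node list once instead of on every loop iteration.
            NodeList values = doc.getElementsByTagName("VALUE");
            NodeList plates = doc.getElementsByTagName("NO");
            NodeList addresses = doc.getElementsByTagName("ADDR");
            for (int i = 0; i < values.getLength(); i++) {
                System.out.print("License plate:" + plates.item(i).getFirstChild().getNodeValue());
                System.out.println("Address of car owner:" + addresses.item(i).getFirstChild().getNodeValue());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        // Report elapsed time, as the SAX example does.
        System.out.println("Running time:" + (System.currentTimeMillis() - lasting) + " milliseconds");
    }
}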
2)SAX
import java.io.File;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class MyXMLReader extends DefaultHandler {
    // Names of the elements that are currently open, innermost on top.
    private final Deque<String> tags = new ArrayDeque<>();

    public static void main(String[] args) {
        long lasting = System.currentTimeMillis();
        try {
            SAXParserFactory sf = SAXParserFactory.newInstance();
            SAXParser sp = sf.newSAXParser();
            MyXMLReader reader = new MyXMLReader();
            sp.parse(new File("data_10k.xml"), reader);
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.out.println("Running time:" + (System.currentTimeMillis() - lasting) + " milliseconds");
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        // Text belongs to the innermost open element.
        String tag = tags.peek();
        if ("NO".equals(tag)) {
            System.out.print("License plate:" + new String(ch, start, length));
        }
        if ("ADDR".equals(tag)) {
            System.out.println("address:" + new String(ch, start, length));
        }
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        tags.push(qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Pop the closed element so whitespace after it is not attributed to it.
        tags.pop();
    }
}
3) JDOM
import java.io.File;
import java.util.List;
import org.jdom.Document;
import org.jdom.Element;
import org.jdom.input.SAXBuilder;

public class MyXMLReader {
    public static void main(String[] args) {
        long lasting = System.currentTimeMillis();
        try {
            SAXBuilder builder = new SAXBuilder();
            Document doc = builder.build(new File("data_10k.xml"));
            Element foo = doc.getRootElement();
            // Each child of the root element is one VALUE record.
            List allChildren = foo.getChildren();
            for (int i = 0; i < allChildren.size(); i++) {
                Element value = (Element) allChildren.get(i);
                System.out.print("License plate:" + value.getChild("NO").getText());
                System.out.println("Address of car owner:" + value.getChild("ADDR").getText());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        // Report elapsed time, as the SAX example does.
        System.out.println("Running time:" + (System.currentTimeMillis() - lasting) + " milliseconds");
    }
}
4)DOM4J
import java.io.File;
import java.util.Iterator;
import org.dom4j.Document;
import org.dom4j.Element;
import org.dom4j.io.SAXReader;

public class MyXMLReader {
    public static void main(String[] args) {
        long lasting = System.currentTimeMillis();
        try {
            File f = new File("data_10k.xml");
            SAXReader reader = new SAXReader();
            Document doc = reader.read(f);
            Element root = doc.getRootElement();
            // Iterate over the VALUE children of the root element.
            for (Iterator i = root.elementIterator("VALUE"); i.hasNext();) {
                Element foo = (Element) i.next();
                System.out.print("License plate:" + foo.elementText("NO"));
                System.out.println("Address of car owner:" + foo.elementText("ADDR"));
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        // Report elapsed time, as the SAX example does.
        System.out.println("Running time:" + (System.currentTimeMillis() - lasting) + " milliseconds");
    }
}
7. Summary
JDOM and DOM performed poorly in the performance tests: both ran out of memory when tested with a 10 MB document. For small documents, DOM and JDOM are still worth considering. Although JDOM's developers have said they expect to focus on performance before the official release, from a performance point of view it has little to recommend it. DOM, by contrast, remains a very good choice. DOM implementations are widely used in many programming languages. DOM is also the basis for many other XML-related standards, and because it is an official W3C recommendation (unlike the non-standard Java-specific models), it may also be required in some types of projects (for example, using DOM in JavaScript).
SAX performs well, which follows from the way it parses. A SAX parser inspects the incoming XML stream without loading it into memory (although part of the document is, of course, buffered temporarily in memory while the stream is read).
DOM4J is undoubtedly the best performer. DOM4J is now widely used in many open source projects; the well-known Hibernate, for example, reads its XML configuration files with DOM4J. If portability is not a concern, use DOM4J.
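Readers who want to repeat a comparison of this kind on their own machine can start from a small timing harness. The sketch below covers only DOM and SAX, because both ship with the JDK. It is not the test code behind the results above: the class `ParserTimingSketch`, the generated data, and the record count are assumptions, and a single unwarmed run like this gives only a rough indication, not a rigorous benchmark.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical timing harness; data size and names are illustrative only.
class ParserTimingSketch {

    // Builds a synthetic document with the same shape as the sample file.
    static byte[] generate(int records) {
        StringBuilder sb = new StringBuilder("<Result>");
        for (int i = 0; i < records; i++) {
            sb.append("<VALUE><NO DATE=\"2005\">A").append(i)
              .append("</NO><ADDR>GZ</ADDR></VALUE>");
        }
        return sb.append("</Result>").toString().getBytes(StandardCharsets.UTF_8);
    }

    // Counts VALUE elements by building the full DOM tree first.
    static int countWithDom(byte[] xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml))
                    .getElementsByTagName("VALUE").getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Counts VALUE elements from SAX events without keeping the document.
    static int countWithSax(byte[] xml) {
        int[] count = {0};
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new ByteArrayInputStream(xml),
                    new DefaultHandler() {
                        @Override
                        public void startElement(String uri, String localName,
                                                 String qName, Attributes attrs) {
                            if (qName.equals("VALUE")) count[0]++;
                        }
                    });
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        byte[] xml = generate(100_000);
        long t0 = System.nanoTime();
        int dom = countWithDom(xml);
        long t1 = System.nanoTime();
        int sax = countWithSax(xml);
        long t2 = System.nanoTime();
        System.out.println("DOM: " + dom + " records, " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("SAX: " + sax + " records, " + (t2 - t1) / 1_000_000 + " ms");
    }
}
```

Raising the record count and lowering the JVM heap size (for example with `-Xmx`) is one way to observe the memory behavior discussed above; JDOM and DOM4J variants could be added in the same style once those libraries are on the classpath.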