1 Concept
Java has many XML parsing libraries, but almost all of them are built on one of two models: SAX or DOM.
1.1 DOM Parsing
DOM stands for Document Object Model. A DOM-based XML parser converts an XML document into a tree of objects (usually called a DOM tree), and the application works with the document's data by operating on that tree. Through the DOM interface, an application can reach any part of the document at any time, which is why this approach is also called random access.
The DOM interface exposes an XML document as a hierarchy of objects that form a node tree mirroring the document's structure. Whatever the document describes, whether tabular data, a list of items, or running text, the model DOM produces is always a node tree. In other words, DOM forces you to access XML through a tree model. Because XML is inherently hierarchical, this representation works well.
Random access over the DOM tree gives application code great flexibility: it can read or change any part of the document at will. The cost is memory. The parser converts the entire document into a tree and keeps it in memory, so memory use grows with document size and structural complexity, and traversing a large, deeply nested tree is slow as well. DOM parsers therefore demand more from the machine and are not especially efficient. Even so, because the tree model matches the structure of XML and random access is so convenient, DOM parsing remains widely useful.
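As a minimal sketch of what random access means in practice, the following parses a small document held in a string and then reads its nodes in an arbitrary order. The class name DomRandomAccess, the helper nameOfBook, and the inline XML are illustrative assumptions, not part of the original example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class DomRandomAccess {
    // Hypothetical sample data, shaped like the book.xml used later in this article.
    static final String XML =
        "<books>"
        + "<book id=\"12\"><name>thinking in java</name></book>"
        + "<book id=\"15\"><name>Spring in Action</name></book>"
        + "</books>";

    // Returns the text of the <name> element of the book at the given index.
    static String nameOfBook(int index) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
            NodeList books = doc.getElementsByTagName("book");
            Element book = (Element) books.item(index);
            return book.getElementsByTagName("name").item(0).getTextContent();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // The whole tree is in memory, so the second book can be read before the first.
        System.out.println(nameOfBook(1));
        System.out.println(nameOfBook(0));
    }
}
```

For brevity the sketch re-parses the string on every call; a real program would parse once and keep the Document around, which is exactly the case where DOM pays off.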
1.2 SAX
SAX, short for Simple API for XML, uses an event model. The benefits resemble those of streaming media: processing can begin immediately instead of waiting for the whole document to be read. Because the application only inspects data as the parser reports it, the document does not have to be held in memory, which is a major advantage for large documents. The application does not even need to parse the whole document; it can stop as soon as some condition is met. In general, SAX is faster than the alternative, DOM, and uses far less memory.
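One common way to stop parsing once a condition is met is to throw an exception from a handler callback and catch it around the parse call. The sketch below assumes that pattern; the class SaxEarlyStop, the marker exception StopParsing, and the helper firstBookId are hypothetical names introduced for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SaxEarlyStop {
    // Marker exception (hypothetical) used only to abort parsing on purpose.
    static class StopParsing extends SAXException {
        StopParsing() {
            super("stop requested");
        }
    }

    // Returns the id attribute of the first <book> element, or null if there is none.
    static String firstBookId(String xml) {
        final String[] result = new String[1];
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes attributes) throws SAXException {
                if ("book".equals(qName)) {
                    result[0] = attributes.getValue("id");
                    // Condition met: the rest of the document is never processed.
                    throw new StopParsing();
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (SAXException e) {
            // Swallow only our own marker; anything else is a genuine parse error.
            if (!(e instanceof StopParsing)) {
                throw new RuntimeException(e);
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        String xml = "<books><book id=\"12\"/><book id=\"15\"/></books>";
        System.out.println(firstBookId(xml));
    }
}
```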
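1.3 Pull Parsing
Pull parsing is similar to SAX parsing: both are lightweight, streaming approaches. The difference is that the application asks the parser for the next event instead of receiving callbacks. A pull parser (XmlPullParser) ships with the Android platform, so no third-party JAR is needed to use it.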
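Android's XmlPullParser is not part of the standard JDK, so the sketch below uses StAX (javax.xml.stream), the JDK's own pull-style API, as a stand-in to show the shape of a pull loop on a desktop JVM. It is not the Android API; method names differ (for example getLocalName() and getElementText() here versus getName() and nextText() on XmlPullParser). The class PullStyleDemo and the helper bookNames are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class PullStyleDemo {
    // Collects the text of every <name> element by pulling events one at a time.
    static List<String> bookNames(String xml) {
        List<String> names = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
            // The application drives the loop: it asks for the next event when it is ready.
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "name".equals(reader.getLocalName())) {
                    // Reads the element's text and advances to its end tag.
                    names.add(reader.getElementText());
                }
            }
            reader.close();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        String xml = "<books><book><name>thinking in java</name></book>"
            + "<book><name>Spring in Action</name></book></books>";
        System.out.println(bookNames(xml));
    }
}
```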
2 Differences
2.1 DOM Parsing
The parser reads the entire document, builds a tree structure in memory, and then the code can use the DOM interface to manipulate the tree structure.
(1) Advantages: the document becomes a tree, which is easy to understand and makes the code easy to write; the whole tree is in memory, so it is easy to modify.
(2) Disadvantages: the whole file is read in at once, so memory consumption is high; a large XML file can degrade parsing performance and may cause an out-of-memory error.
(3) When to use: the data must be accessed many times after the document is parsed; hardware resources (memory, CPU) are plentiful.
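To illustrate the "easy to modify" advantage listed above, here is a minimal sketch that changes a value in the in-memory tree and serializes the result with the JDK's Transformer API. The class DomModify, the helper withNewPrice, and the sample values are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DomModify {
    // Replaces the text of the first <price> element and returns the document as a string.
    static String withNewPrice(String xml, String newPrice) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Element price = (Element) doc.getElementsByTagName("price").item(0);
            // The tree is mutable: change the node in place.
            price.setTextContent(newPrice);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<book id=\"12\"><name>thinking in java</name><price>85.5</price></book>";
        System.out.println(withNewPrice(xml, "79.0"));
    }
}
```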
2.2 SAX
Event-driven. The parser fires an event whenever it encounters the start or end of the document, the start or end of an element, or text; you write code that responds to these events and saves the data you need. Note: the SAX parser does not build any object model of the document.
(1) Advantages: the whole document does not have to be loaded first, so memory consumption is low.
(2) Disadvantages: nothing is retained, so data that is not saved while its event is being handled is lost; the parser is stateless, so a text event delivers only the text and does not say which element it belongs to.
(3) When to use: only a small part of the XML document is needed and it is rarely revisited; the document is read once; memory is limited.
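Because a text event does not say which element it belongs to, the handler has to track that context itself. One possible approach, sketched below, keeps a stack of currently open elements. The class SaxContextTracker and the helper textWithOwner are hypothetical names; for simplicity the sketch assumes each short text value arrives in a single characters() call.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxContextTracker {
    // Returns entries such as "name=thinking in java" for every non-blank text run.
    static List<String> textWithOwner(String xml) {
        final List<String> out = new ArrayList<>();
        final Deque<String> open = new ArrayDeque<>(); // elements currently open, innermost first
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes attributes) {
                open.push(qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                open.pop();
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) {
                    // The top of the stack is the element that owns this text.
                    out.add(open.peek() + "=" + text);
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(textWithOwner(
            "<book>\n  <name>thinking in java</name>\n  <price>85.5</price>\n</book>"));
    }
}
```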
3 Usage Examples
3.1 Preparation
(1) XML file content (book.xml)
<?xml version="1.0" encoding="UTF-8"?>
<books>
<book id="12">
<name>thinking in java</name>
<price>85.5</price>
</book>
<book id="15">
<name>Spring in Action</name>
<price>39.0</price>
</book>
</books>
(2) Book.java, a plain data class that holds the parsed values. Every field is a String.
public class Book {
    private String id;
    private String num;
    private String name;
    private String price;

    public String getId() {
        return id;
    }

    public void setId(String id) {
        this.id = id;
    }

    public String getNum() {
        return num;
    }

    public void setNum(String num) {
        this.num = num;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public String getPrice() {
        return price;
    }

    public void setPrice(String price) {
        this.price = price;
    }

    @Override
    public String toString() {
        return "Book{" +
                "id='" + id + '\'' +
                ", num='" + num + '\'' +
                ", name='" + name + '\'' +
                ", price='" + price + '\'' +
                '}';
    }
}
3.2 DOM Parsing
Suppose we have reached a book node (the position shown in Figure 1) and call its getChildNodes() method. How many children does it return? Grandchildren do not count, so the text thinking in java is excluded because it belongs to the name element. The answer is five (Figures 2 to 6): the name and price elements plus the three runs of whitespace before, between, and after them, each of which is a text node. So when parsing, be careful not to overlook whitespace.
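The child count can be checked directly. The following sketch parses a one-book version of the document and lists the node names of the book element's children; text nodes report the name #text. The class ChildNodeCount, the helper childNames, and the inline XML are assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class ChildNodeCount {
    // Hypothetical one-book document with the same line breaks and indentation as book.xml.
    static final String XML =
        "<books>\n"
        + "  <book id=\"12\">\n"
        + "    <name>thinking in java</name>\n"
        + "    <price>85.5</price>\n"
        + "  </book>\n"
        + "</books>";

    // Returns the node names of the direct children of the first <book> element.
    static List<String> childNames() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
            Element book = (Element) doc.getElementsByTagName("book").item(0);
            NodeList children = book.getChildNodes();
            List<String> names = new ArrayList<>();
            for (int i = 0; i < children.getLength(); i++) {
                names.add(children.item(i).getNodeName());
            }
            return names;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // prints [#text, name, #text, price, #text]
        System.out.println(childNames());
    }
}
```

This is why the parsing code in this section checks for Node.ELEMENT_NODE before reading a child.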
// DomParseService.java: parses book.xml with the DOM API
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class DomParseService {
    public List<Book> getBooks(InputStream inputStream) throws Exception {
        List<Book> list = new ArrayList<Book>();
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document document = builder.parse(inputStream);
        Element root = document.getDocumentElement();
        NodeList bookNodes = root.getElementsByTagName("book");
        for (int i = 0; i < bookNodes.getLength(); i++) {
            Element bookElement = (Element) bookNodes.item(i);
            Book book = new Book();
            // Book stores every field as a String, so no numeric conversion is needed.
            book.setId(bookElement.getAttribute("id"));
            NodeList childNodes = bookElement.getChildNodes();
            for (int j = 0; j < childNodes.getLength(); j++) {
                Node child = childNodes.item(j);
                // Skip the whitespace text nodes that sit between the elements.
                if (child.getNodeType() != Node.ELEMENT_NODE) {
                    continue;
                }
                if ("name".equals(child.getNodeName())) {
                    // getTextContent() also copes with an empty element.
                    book.setName(child.getTextContent());
                } else if ("price".equals(child.getNodeName())) {
                    book.setPrice(child.getTextContent());
                }
            } // end for j
            list.add(book);
        } // end for i
        return list;
    }
}
// ParseTest.java: a JUnit 3 style unit test
import java.io.InputStream;
import java.util.List;
import junit.framework.TestCase;

public class ParseTest extends TestCase {
    public void testDom() throws Exception {
        // book.xml must be on the classpath.
        InputStream input = this.getClass().getClassLoader().getResourceAsStream("book.xml");
        DomParseService dom = new DomParseService();
        List<Book> books = dom.getBooks(input);
        for (Book book : books) {
            System.out.println(book.toString());
        }
    }
}
3.3 SAX Parsing
SAX parses the XML file step by step. Before parsing, it helps to know the two kinds of nodes in the file: element nodes and text nodes. books and book are element nodes, while thinking in java and 85.5 are text nodes. Note: the most important thing to watch for when parsing with SAX is the whitespace between elements, which the parser also reports as text.
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SaxParseService extends DefaultHandler {
    private List<Book> books = null;
    private Book book = null;
    // Name of the element whose start tag was seen most recently; null once it has ended.
    private String preTag = null;

    public List<Book> getBooks(InputStream xmlStream) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        SAXParser parser = factory.newSAXParser();
        SaxParseService handler = new SaxParseService();
        parser.parse(xmlStream, handler);
        return handler.getBooks();
    }

    public List<Book> getBooks() {
        return books;
    }

    @Override
    public void startDocument() throws SAXException {
        books = new ArrayList<Book>();
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
        if ("book".equals(qName)) {
            book = new Book();
            // Book stores the id as a String, so no numeric conversion is needed.
            book.setId(attributes.getValue("id"));
        }
        preTag = qName; // remember the element currently being parsed
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if ("book".equals(qName)) {
            books.add(book);
            book = null;
        }
        // Reset preTag when an element ends. This is important: after an end tag
        // (for example the position shown in Figure 3) the parser reports the
        // whitespace that follows through characters(...). If preTag still held
        // the element name, that whitespace would overwrite the value just stored,
        // which is not what we want.
        preTag = null;
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (preTag != null) {
            // Note: a parser may deliver long text in several calls; this simple
            // version assumes each value arrives in one call.
            String content = new String(ch, start, length);
            if ("name".equals(preTag)) {
                book.setName(content);
            } else if ("price".equals(preTag)) {
                book.setPrice(content);
            }
        }
    }
}
// The test code, ParseTest.java:
import java.io.InputStream;
import java.util.List;
import junit.framework.TestCase;

public class ParseTest extends TestCase {
    public void testSAX() throws Exception {
        SaxParseService sax = new SaxParseService();
        // book.xml must be on the classpath.
        InputStream input = this.getClass().getClassLoader().getResourceAsStream("book.xml");
        List<Book> books = sax.getBooks(input);
        for (Book book : books) {
            System.out.println(book.toString());
        }
    }
}
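A SAX parser is allowed to deliver one run of text through several characters() calls, for instance around an entity reference such as &amp;. A common way to handle this, sketched here as an alternative to the preTag approach and not as the article's own method, is to collect text in a buffer and read it when the element ends. The class BufferedSaxHandler and the helper parseNames are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class BufferedSaxHandler extends DefaultHandler {
    private final List<String> names = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        // A new element starts: discard whitespace or text collected so far.
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Append instead of overwrite, so text split across calls is joined again.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("name".equals(qName)) {
            names.add(text.toString().trim());
        }
        text.setLength(0);
    }

    // Parses the given XML string and returns the text of every <name> element.
    static List<String> parseNames(String xml) {
        BufferedSaxHandler handler = new BufferedSaxHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return handler.names;
    }

    public static void main(String[] args) {
        System.out.println(parseNames("<books><book><name>Tom &amp; Jerry</name></book></books>"));
    }
}
```

With the buffer, the result no longer depends on how the parser happens to split the text.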
3.4 Pull Parsing
(1) book.xml, saved under assets
<?xml version="1.0" encoding="UTF-8"?> <!-- START_DOCUMENT: start of the document -->
<books> <!-- START_TAG: start tag; read the tag name with getName() -->
<book id="book1" num="1"> <!-- Attribute name "id": read with getAttributeName(int index) -->
<name>thinking in java</name>
<price>85.5</price>
</book>
<book id="book2" num="2"> <!-- Attribute value "book2": read with getAttributeValue(int index) or getAttributeValue(String namespace, String name) -->
<name>Spring in Action</name>
<price>39.0</price>
</book>
<node>Text field</node> <!-- TEXT: text content; read with getText() -->
</books> <!-- END_TAG: end tag -->
(2) MainActivity.java
import android.os.Bundle;
import android.util.Log;
import android.util.Xml;
// Assumes AndroidX; older projects import android.support.v7.app.AppCompatActivity instead.
import androidx.appcompat.app.AppCompatActivity;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import org.xmlpull.v1.XmlPullParser;
import org.xmlpull.v1.XmlPullParserException;

public class MainActivity extends AppCompatActivity {
    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);
        List<Book> books = parseBooks();
        for (Book book : books) {
            Log.d("TAG", book.toString());
        }
    }

    private List<Book> parseBooks() {
        // Start with an empty list so the caller never receives null, even on failure.
        List<Book> books = new ArrayList<>();
        Book book = null;
        XmlPullParser parser = Xml.newPullParser();
        // Open and parse in one try block so a failed open never reaches the parse loop;
        // try-with-resources closes the stream afterwards.
        try (InputStream input = getAssets().open("book.xml")) {
            parser.setInput(input, "UTF-8");
            // Get the first event
            int event = parser.getEventType();
            while (event != XmlPullParser.END_DOCUMENT) {
                switch (event) {
                    // Start of the document: nothing to do, the list already exists
                    case XmlPullParser.START_DOCUMENT:
                        break;
                    // Start tag of an element
                    case XmlPullParser.START_TAG:
                        if ("book".equals(parser.getName())) {
                            book = new Book();
                            // Read the attributes of the book tag by name;
                            // getAttributeValue(int index) would also work.
                            book.setId(parser.getAttributeValue(null, "id"));
                            book.setNum(parser.getAttributeValue(null, "num"));
                        } else if (book != null) {
                            if ("name".equals(parser.getName())) {
                                // nextText() returns the element text and moves to its end tag
                                book.setName(parser.nextText());
                            } else if ("price".equals(parser.getName())) {
                                book.setPrice(parser.nextText());
                            }
                        }
                        break;
                    // Text between tags, including whitespace; read it with parser.getText()
                    case XmlPullParser.TEXT:
                        break;
                    // End tag of an element
                    case XmlPullParser.END_TAG:
                        if ("book".equals(parser.getName()) && book != null) {
                            books.add(book);
                            book = null;
                        }
                        break;
                    default:
                        break;
                }
                // Move to the next event
                event = parser.next();
            }
        } catch (IOException | XmlPullParserException e) {
            e.printStackTrace();
        }
        return books;
    }
}