Java Advanced: SAX, DOM and Pull Parsing of XML

Keywords: xml Java Attribute Android

1 Concept

Java has many XML parsing libraries, but most of them are built on one of two basic models: SAX or DOM.

1.1 DOM Parsing

DOM stands for Document Object Model. A DOM-based XML parser converts an XML document into a set of objects, usually called a DOM tree, and the application works with the XML data by operating on this object model. Through the DOM interface, an application can access any part of the document at any time, which is why this approach is also called a random access mechanism.
The DOM interface exposes the information in an XML document through a hierarchical object model: a tree of nodes that mirrors the structure of the document. Whatever the document describes, whether tabular data, a list of items or a narrative text, DOM produces a node tree. In other words, DOM forces you to access XML content through a tree model. Since XML is inherently hierarchical, this works quite well.
The random access offered by a DOM tree gives the application great flexibility: it can read and change any part of the document. However, because the parser converts the entire document into a DOM tree held in memory, memory requirements grow with the size and complexity of the document, and traversing a complex tree is also time-consuming. DOM parsing therefore demands more of the machine and is not very efficient. Still, because its tree model matches the structure of XML and random access is so convenient, the DOM parser remains very useful.
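To make the random-access idea concrete, here is a minimal sketch using the JDK's built-in DOM API (javax.xml.parsers and org.w3c.dom). It parses a small inline copy of the book data from section 3.1 and then reads the second book before the first, which works because the whole tree is already in memory. The names DomRandomAccess, parse and nameOfBook are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class DomRandomAccess {
    // Inline copy of the sample data from section 3.1
    static final String XML =
              "<books>\n"
            + "  <book id=\"12\"><name>thinking in java</name><price>85.5</price></book>\n"
            + "  <book id=\"15\"><name>Spring in Action</name><price>39.0</price></book>\n"
            + "</books>";

    // Parse the whole document into an in-memory DOM tree
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Jump directly to the index-th <book> and read its <name>
    static String nameOfBook(Document doc, int index) {
        Element book = (Element) doc.getElementsByTagName("book").item(index);
        return book.getElementsByTagName("name").item(0).getTextContent();
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        // The tree stays in memory, so it can be read in any order, as often as needed
        System.out.println(nameOfBook(doc, 1)); // Spring in Action
        System.out.println(nameOfBook(doc, 0)); // thinking in java
    }
}
```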

1.2 SAX

SAX, short for Simple API for XML, uses an event model. Its advantages resemble those of streaming media: parsing can begin immediately, without waiting for the whole document to be read. And since the application examines the data as it is read, it does not need to keep the data in memory, which is a huge advantage for large documents. In fact, the application does not even need to parse the entire document; it can stop as soon as some condition is met. Generally speaking, SAX is much faster than its alternative, DOM.
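SAX has no stop() method, so "stopping when a condition is met" is commonly done by throwing a SAXException from a callback. The sketch below assumes that approach: it returns the first <name> and aborts the parse right after it. The class name SaxEarlyStop and the done flag are hypothetical; the flag lets the caller tell a deliberate stop apart from a real parse error.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class SaxEarlyStop {
    // Handler that captures the text of the first <name>, then aborts the parse
    static final class FirstNameHandler extends DefaultHandler {
        final StringBuilder text = new StringBuilder();
        boolean inName;
        boolean done; // set just before aborting, so the caller knows the stop was deliberate

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if ("name".equals(qName)) inName = true;
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (inName) text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            if ("name".equals(qName)) {
                done = true;
                throw new SAXException("found the first <name>, stopping early");
            }
        }
    }

    static String firstName(String xml) {
        FirstNameHandler handler = new FirstNameHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            // done == true means we aborted on purpose; anything else is a real failure
            if (!handler.done) throw new IllegalStateException("could not parse XML", e);
        }
        return handler.text.toString();
    }

    public static void main(String[] args) {
        System.out.println(firstName("<books><book><name>thinking in java</name></book>"
                + "<book><name>Spring in Action</name></book></books>"));
    }
}
```

Without the early stop, inName would stay true and the second name would be appended as well, so the result shows that the rest of the document was never processed.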

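1.3 Pull Parsing

Pull parsing is similar to SAX parsing: both are lightweight approaches that read the document sequentially as a stream of events. The difference is who drives the process: a SAX parser pushes events into your callbacks, while with Pull the application asks the parser for the next event by calling next(). A Pull parser is built into Android, so no third-party jar is needed to use it.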
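Android's XmlPullParser is not part of the standard JDK, so to show the pull style in plain Java this sketch swaps in StAX (javax.xml.stream.XMLStreamReader), the JDK's own pull API. The difference from SAX is visible in the loop: the code calls next() itself and can simply return once it has what it needs. StaxPullDemo and priceOf are illustrative names.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class StaxPullDemo {
    // Returns the <price> of the <book> whose id attribute matches, or null if there is none
    static String priceOf(String xml, String wantedId) {
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                boolean inWantedBook = false;
                while (r.hasNext()) {
                    int event = r.next(); // the application pulls the next event itself
                    if (event == XMLStreamConstants.START_ELEMENT) {
                        String tag = r.getLocalName();
                        if ("book".equals(tag)) {
                            inWantedBook = wantedId.equals(r.getAttributeValue(null, "id"));
                        } else if (inWantedBook && "price".equals(tag)) {
                            // Found it: stop pulling, the rest of the document is never read
                            return r.getElementText();
                        }
                    }
                }
                return null;
            } finally {
                r.close(); // XMLStreamReader is not AutoCloseable, so close it explicitly
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<books>\n"
                + "  <book id=\"12\"><name>thinking in java</name><price>85.5</price></book>\n"
                + "  <book id=\"15\"><name>Spring in Action</name><price>39.0</price></book>\n"
                + "</books>";
        System.out.println(priceOf(xml, "15")); // 39.0
    }
}
```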

2 Differences

2.1 DOM Parsing

The parser reads the entire document and builds a tree structure in memory; the code then uses the DOM interface to work with that tree.
(1) Advantages: the tree structure is easy to understand and the code is easy to write; since the whole document tree is in memory, it is easy to modify.
(2) Disadvantages: because the whole file is loaded at once, memory consumption is high; a large XML file can hurt parsing performance and may even cause an out-of-memory error.
(3) When to use: the data must be accessed many times after parsing, and hardware resources (memory, CPU) are sufficient.
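As an illustration of the "easy to modify" point, the following sketch edits the in-memory tree and writes it back out with the JDK's identity Transformer. The class name DomEditDemo, the new price and the added book are invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class DomEditDemo {
    // Changes the first <price>, appends a new <book>, and returns the serialized result
    static String editAndSerialize(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

            // 1. Edit an existing node in place
            doc.getElementsByTagName("price").item(0).setTextContent("79.0");

            // 2. Build and attach a new subtree (made-up sample values)
            Element book = doc.createElement("book");
            book.setAttribute("id", "20");
            Element name = doc.createElement("name");
            name.setTextContent("Effective Java");
            book.appendChild(name);
            doc.getDocumentElement().appendChild(book);

            // 3. Serialize the modified tree back to text
            StringWriter out = new StringWriter();
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not edit XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(editAndSerialize(
                "<books><book id=\"12\"><name>thinking in java</name><price>85.5</price></book></books>"));
    }
}
```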

2.2 SAX

Event-driven. Whenever the parser encounters the start or end of an element, text, or the start or end of the document, it sends an event; you write code that responds to these events and saves the data. Note: the SAX parser does not build a tree of objects.
(1) Advantages: the whole document does not have to be loaded first, so memory consumption is low.
(2) Disadvantages: data is not retained; once an event has passed, any data you did not save is lost. The parser is also stateless: a text event only delivers the text, not the element it belongs to.
(3) When to use: only a small part of the XML document is needed and it is rarely revisited; the document is read once; the machine has little memory.
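The "text without its element" limitation is usually worked around by keeping your own state in the handler. One minimal way, sketched below under the assumption that only leaf elements carry text, is a stack of open element names. SaxPathTracker and leafValues are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

final class SaxPathTracker extends DefaultHandler {
    private final Deque<String> open = new ArrayDeque<>();   // open elements, innermost on top
    private final StringBuilder text = new StringBuilder();  // text of the current element
    final List<String> entries = new ArrayList<>();          // "books/book/name=value" results

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        open.push(qName);
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String value = text.toString().trim();
        if (!value.isEmpty()) {
            // ArrayDeque iterates from the top of the stack, so reverse for outer-to-inner order
            List<String> path = new ArrayList<>(open);
            Collections.reverse(path);
            entries.add(String.join("/", path) + "=" + value);
        }
        text.setLength(0);
        open.pop();
    }

    static List<String> leafValues(String xml) {
        SaxPathTracker handler = new SaxPathTracker();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return handler.entries;
    }

    public static void main(String[] args) {
        String xml = "<books>\n  <book id=\"12\">\n    <name>thinking in java</name>\n"
                + "    <price>85.5</price>\n  </book>\n</books>";
        leafValues(xml).forEach(System.out::println);
    }
}
```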

3 Usage examples

3.1 Preparation

(1) XML file content (book.xml)

<?xml version="1.0" encoding="UTF-8"?>   
<books>   
    <book id="12">   
        <name>thinking in java</name>   
        <price>85.5</price>   
    </book>   
    <book id="15">   
        <name>Spring in Action</name>   
        <price>39.0</price>   
    </book>   
</books>

(2) Book.java, a simple bean that holds the parsed data. All of its fields are Strings.

public class Book {
    private String id;
    private String num;
    private String name;
    private String price;

    public String getId() {
        return id;
    }
    public void setId(String id) {
        this.id = id;
    }
    public String getNum() {
        return num;
    }
    public void setNum(String num) {
        this.num = num;
    }
    public String getName() {
        return name;
    }
    public void setName(String name) {
        this.name = name;
    }
    public String getPrice() {
        return price;
    }
    public void setPrice(String price) {
        this.price = price;
    }
    @Override
    public String toString() {
        return "Book{" +
                "id='" + id + '\'' +
                ", num='" + num + '\'' +
                ", name='" + name + '\'' +
                ", price='" + price + '\'' +
                '}';
    }
}

3.2 DOM Parsing


When we reach a book node (the position shown in Figure 1) and call its getChildNodes() method, how many children do you think it has? Grandchild nodes, such as the text "thinking in java", are not counted. The book node has five child nodes, shown in Figures 2, 3, 4, 5 and 6: the whitespace text before <name>, the <name> element, the whitespace text between </name> and <price>, the <price> element, and the whitespace text before </book>. So when parsing, be careful not to overlook these whitespace text nodes.
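A quick way to check the five-children claim is to print the node names; DOM reports text nodes as "#text". This sketch assumes the indented layout of book.xml from section 3.1, with no DTD (without one, the parser keeps whitespace text nodes). DomChildNodes and childNames are illustrative names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

final class DomChildNodes {
    // Returns the node name of every direct child of the first <book>
    static List<String> childNames(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList children = doc.getElementsByTagName("book").item(0).getChildNodes();
            List<String> names = new ArrayList<>();
            for (int i = 0; i < children.getLength(); i++) {
                // Element nodes report their tag name, text nodes report "#text"
                names.add(children.item(i).getNodeName());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<books>\n"
                + "    <book id=\"12\">\n"
                + "        <name>thinking in java</name>\n"
                + "        <price>85.5</price>\n"
                + "    </book>\n"
                + "</books>";
        System.out.println(childNames(xml)); // [#text, name, #text, price, #text]
    }
}
```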

// DomParseService.java: parses book.xml with DOM
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class DomParseService {
    public List<Book> getBooks(InputStream inputStream) throws Exception {
        List<Book> list = new ArrayList<Book>();
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Load the whole document into an in-memory tree
        Document document = builder.parse(inputStream);
        Element element = document.getDocumentElement(); // the <books> root

        NodeList bookNodes = element.getElementsByTagName("book");
        for (int i = 0; i < bookNodes.getLength(); i++) {
            Element bookElement = (Element) bookNodes.item(i);
            Book book = new Book();
            // Book stores every field as a String, so no numeric conversion is needed
            book.setId(bookElement.getAttribute("id"));
            NodeList childNodes = bookElement.getChildNodes();
//          System.out.println("*****" + childNodes.getLength()); // 5: whitespace text nodes count too
            for (int j = 0; j < childNodes.getLength(); j++) {
                Node child = childNodes.item(j);
                // Skip the whitespace text nodes between the child elements
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    if ("name".equals(child.getNodeName())) {
                        book.setName(child.getFirstChild().getNodeValue());
                    } else if ("price".equals(child.getNodeName())) {
                        book.setPrice(child.getFirstChild().getNodeValue());
                    }
                }
            } // end for j
            list.add(book);
        } // end for i
        return list;
    }
}
// Unit test, ParseTest.java (JUnit 3 style); book.xml must be on the test classpath
import java.io.InputStream;
import java.util.List;
import junit.framework.TestCase;

public class ParseTest extends TestCase {
    public void testDom() throws Exception {
        try (InputStream input = this.getClass().getClassLoader().getResourceAsStream("book.xml")) {
            DomParseService dom = new DomParseService();
            List<Book> books = dom.getBooks(input);
            for (Book book : books) {
                System.out.println(book.toString());
            }
        }
    }
}
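If walking getChildNodes() and filtering out whitespace feels error-prone, one alternative (a sketch, not the method used above) is to look up the child elements by name and read them with getTextContent(). Note that getElementsByTagName() searches all descendants, not only direct children, which is fine for this flat book structure. DomTextContentDemo, childText and summarize are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class DomTextContentDemo {
    // Text of the first descendant <tag> of parent, or null if there is none
    static String childText(Element parent, String tag) {
        NodeList matches = parent.getElementsByTagName(tag);
        return matches.getLength() > 0 ? matches.item(0).getTextContent() : null;
    }

    // Returns "id|name|price" for every <book>; whitespace text nodes never come into play
    static List<String> summarize(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList books = doc.getElementsByTagName("book");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < books.getLength(); i++) {
                Element book = (Element) books.item(i);
                result.add(book.getAttribute("id") + "|" + childText(book, "name")
                        + "|" + childText(book, "price"));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<books>\n"
                + "  <book id=\"12\">\n    <name>thinking in java</name>\n    <price>85.5</price>\n  </book>\n"
                + "  <book id=\"15\">\n    <name>Spring in Action</name>\n    <price>39.0</price>\n  </book>\n"
                + "</books>";
        summarize(xml).forEach(System.out::println);
    }
}
```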

3.3 SAX Parsing

To parse an XML file with SAX, we first need to know the two kinds of nodes it contains: element nodes and text nodes. Elements such as books and book are element nodes, while "thinking in java" and 85.5 are text nodes. Note: the most important thing to watch when parsing with SAX is the whitespace between tags, which the parser also reports as text.
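The following sketch logs every SAX callback for a single <book>, which makes the whitespace text between tags visible. Because characters() may be called several times for one run of text, it buffers text and emits it when the next tag arrives. SaxEventLog and log are illustrative names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

final class SaxEventLog extends DefaultHandler {
    final List<String> events = new ArrayList<>();
    private final StringBuilder pending = new StringBuilder(); // text seen since the last tag

    @Override
    public void characters(char[] ch, int start, int length) {
        pending.append(ch, start, length);
    }

    // Emit buffered text as one event, labelling whitespace-only runs explicitly
    private void flushText() {
        if (pending.length() > 0) {
            String s = pending.toString();
            events.add(s.trim().isEmpty() ? "whitespace" : "text:" + s);
            pending.setLength(0);
        }
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        flushText();
        events.add("start:" + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flushText();
        events.add("end:" + qName);
    }

    static List<String> log(String xml) {
        SaxEventLog handler = new SaxEventLog();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return handler.events;
    }

    public static void main(String[] args) {
        // [start:book, whitespace, start:name, text:thinking in java, end:name, whitespace, end:book]
        System.out.println(log("<book id=\"12\">\n  <name>thinking in java</name>\n</book>"));
    }
}
```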

// SaxParseService.java: parses book.xml with SAX
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SaxParseService extends DefaultHandler {
    private List<Book> books = null;
    private Book book = null;
    private String preTag = null; // name of the element whose text we are currently reading

    public List<Book> getBooks(InputStream xmlStream) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        SAXParser parser = factory.newSAXParser();
        SaxParseService handler = new SaxParseService();
        parser.parse(xmlStream, handler);
        return handler.getBooks();
    }

    public List<Book> getBooks() {
        return books;
    }

    @Override
    public void startDocument() throws SAXException {
        books = new ArrayList<Book>();
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
        if ("book".equals(qName)) {
            book = new Book();
            book.setId(attributes.getValue("id")); // Book.id is a String
        }
        preTag = qName; // remember which element the following text belongs to
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if ("book".equals(qName)) {
            books.add(book);
            book = null;
        }
        /* Reset preTag whenever an element ends (for example, at the end position shown in
           Figure 3). Without this, after </name> or </price> preTag would still hold "name"
           or "price", and the whitespace that follows would be passed to characters(): it
           would overwrite the value with whitespace or, after </book> has set book to null,
           throw a NullPointerException. */
        preTag = null;
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        // Note: a parser may split one text node across several characters() calls;
        // this simple version keeps only the last chunk.
        if (preTag != null) {
            String content = new String(ch, start, length);
            if ("name".equals(preTag)) {
                book.setName(content);
            } else if ("price".equals(preTag)) {
                book.setPrice(content); // Book.price is a String
            }
        }
    }
}
// The test code, ParseTest.java (JUnit 3 style):
import java.io.InputStream;
import java.util.List;
import junit.framework.TestCase;

public class ParseTest extends TestCase {
    public void testSAX() throws Throwable {
        SaxParseService sax = new SaxParseService();
        try (InputStream input = this.getClass().getClassLoader().getResourceAsStream("book.xml")) {
            List<Book> books = sax.getBooks(input);
            for (Book book : books) {
                System.out.println(book.toString());
            }
        }
    }
}
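One caveat about SaxParseService: the SAX specification allows a parser to split a single text node across several characters() calls, and that handler keeps only the last chunk. A more defensive variant, sketched below, collects text in a StringBuilder and assigns it in endElement(), which also removes the need for the preTag reset. The nested Book class is a hypothetical stand-in for Book.java so the block compiles on its own.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

final class BufferedSaxBookHandler extends DefaultHandler {
    // Hypothetical stand-in for Book.java so this block compiles on its own
    static final class Book {
        String id, name, price;
        @Override public String toString() { return id + "|" + name + "|" + price; }
    }

    final List<Book> books = new ArrayList<>();
    private Book current;
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("book".equals(qName)) {
            current = new Book();
            current.id = atts.getValue("id");
        }
        text.setLength(0); // start collecting text for the element that just opened
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may be called several times for one text node
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Only at the end tag is the element's text known to be complete
        if (current != null) {
            if ("name".equals(qName)) {
                current.name = text.toString();
            } else if ("price".equals(qName)) {
                current.price = text.toString();
            } else if ("book".equals(qName)) {
                books.add(current);
                current = null;
            }
        }
        text.setLength(0); // discard whitespace so it never leaks into the next element
    }

    static List<Book> parse(String xml) {
        BufferedSaxBookHandler handler = new BufferedSaxBookHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return handler.books;
    }

    public static void main(String[] args) {
        String xml = "<books>\n"
                + "  <book id=\"12\">\n    <name>thinking in java</name>\n    <price>85.5</price>\n  </book>\n"
                + "  <book id=\"15\">\n    <name>Spring in Action</name>\n    <price>39.0</price>\n  </book>\n"
                + "</books>";
        System.out.println(parse(xml)); // [12|thinking in java|85.5, 15|Spring in Action|39.0]
    }
}
```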

3.4 Pull Parsing

(1) book.xml, saved under assets

<?xml version="1.0" encoding="UTF-8"?> <!-- START_DOCUMENT: start of the document -->
<books> <!-- START_TAG: start tag; the tag name is read with getName() -->
    <book id="book1" num="1">  <!-- Attribute name "id", read with getAttributeName(int index) -->
        <name>thinking in java</name>
        <price>85.5</price>
    </book>
    <book id="book2" num="2">  <!-- Attribute value "book2", read with getAttributeValue(int index) or getAttributeValue(String namespace, String name) -->
        <name>Spring in Action</name>
        <price>39.0</price>
    </book>
    <node>Text field</node> <!-- TEXT: text content, read with getText() -->
</books> <!-- END_TAG: end tag -->

(2) MainActivity.java

import android.os.Bundle;
import android.util.Log;
import android.util.Xml;
import androidx.appcompat.app.AppCompatActivity; // android.support.v7.app.AppCompatActivity in pre-AndroidX projects
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import org.xmlpull.v1.XmlPullParser;
import org.xmlpull.v1.XmlPullParserException;

public class MainActivity extends AppCompatActivity {
    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);

        List<Book> books = parseBook();
        for (Book book : books) {
            Log.e("TAG", book.toString());
        }
    }

    private List<Book> parseBook() {
        List<Book> books = null;
        Book book = null;
        XmlPullParser parser = Xml.newPullParser();
        InputStream input = null;
        try {
            input = getAssets().open("book.xml");
            parser.setInput(input, "UTF-8");

            // Get the first event (START_DOCUMENT)
            int event = parser.getEventType();
            while (event != XmlPullParser.END_DOCUMENT) {
                switch (event) {
                    // Is the current event the start of the document?
                    case XmlPullParser.START_DOCUMENT:
                        books = new ArrayList<>();
                        break;
                    // Is the current event a start tag?
                    case XmlPullParser.START_TAG:
                        // Is this the start tag of a <book> element?
                        if ("book".equals(parser.getName())) {
                            book = new Book();
                            // Read the attributes of the <book> tag
                            book.setId(parser.getAttributeValue(0));
                            book.setNum(parser.getAttributeValue(null, "num"));
//                            book.setNum(parser.getAttributeValue(1));
                        }
                        if (book != null) {
                            // Is this the start tag of a <name> element?
                            if ("name".equals(parser.getName())) {
                                book.setName(parser.nextText());

                            // Is this the start tag of a <price> element?
                            } else if ("price".equals(parser.getName())) {
                                book.setPrice(parser.nextText());
                            }
                        }
                        break;
                    case XmlPullParser.TEXT:
//                        if (!TextUtils.isEmpty(parser.getText())) {
//                            Log.e("TAG", "parser.getText():" + parser.getText());
//                        }
                        break;
                    // Is the current event an end tag?
                    case XmlPullParser.END_TAG:
                        // Is this the end tag of a <book> element?
                        if ("book".equals(parser.getName()) && books != null) {
                            books.add(book);
                            book = null;
                        }
                        break;
                    default:
                        break;
                }

                // Move to the next event
                event = parser.next();
            }
        } catch (XmlPullParserException | IOException e) {
            e.printStackTrace();
        } finally {
            // Always release the asset stream
            if (input != null) {
                try {
                    input.close();
                } catch (IOException ignored) {
                }
            }
        }
        // Return an empty list instead of null if the file could not be opened or parsed
        return books != null ? books : new ArrayList<Book>();
    }
}
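XmlPullParser needs an Android runtime (or a separate XmlPull implementation), so to try the same loop on a desktop JVM one option is to port it to StAX, the JDK's standard pull API. In this sketch, getLocalName() plays the role of getName() and getElementText() that of nextText(); StaxBookParser and the "id|num|name|price" output format are assumptions made for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class StaxBookParser {
    // Returns one "id|num|name|price" string per <book>, mirroring the Android loop above
    static List<String> parseBooks(String xml) {
        List<String> books = new ArrayList<>();
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                String id = null, num = null, name = null, price = null;
                while (r.hasNext()) {
                    int event = r.next();
                    if (event == XMLStreamConstants.START_ELEMENT) {
                        switch (r.getLocalName()) {
                            case "book":
                                id = r.getAttributeValue(null, "id");
                                num = r.getAttributeValue(null, "num");
                                name = null;
                                price = null;
                                break;
                            case "name":
                                name = r.getElementText(); // like nextText(): stops on </name>
                                break;
                            case "price":
                                price = r.getElementText();
                                break;
                            default: // e.g. <books>, <node>: nothing to do
                                break;
                        }
                    } else if (event == XMLStreamConstants.END_ELEMENT
                            && "book".equals(r.getLocalName())) {
                        books.add(id + "|" + num + "|" + name + "|" + price);
                    }
                }
            } finally {
                r.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return books;
    }

    public static void main(String[] args) {
        String xml = "<books>\n"
                + "  <book id=\"book1\" num=\"1\"><name>thinking in java</name><price>85.5</price></book>\n"
                + "  <book id=\"book2\" num=\"2\"><name>Spring in Action</name><price>39.0</price></book>\n"
                + "  <node>Text field</node>\n"
                + "</books>";
        parseBooks(xml).forEach(System.out::println);
    }
}
```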

Posted by romic on Mon, 07 Jan 2019 13:00:10 -0800