Why and how to use XML

Keywords: xml Attribute Java encoding

XML: Extensible Markup Language

With the rise of technologies such as JSON, the road ahead for XML seems to be getting narrower and narrower. Although some of XML's roles have been taken over by other technologies, XML is still worth learning. It still has advantages when storing large amounts of structured data, and even setting those aside, many existing frameworks and technologies keep their configuration files in XML, so you need at least a good understanding of its structure and basic usage.

(1) Basic overview

(1) Concept

XML: Extensible Markup Language

Markup: content is described with tags; HTML is another common markup language

Extensible: you can define your own tags, and tag names can even be written in Chinese, e.g. <person></person> or <张三></张三>

(2) Use

XML is mainly used to store data: as a configuration file, as a small database, or as a format for transmitting data over the network.

A: Configuration files: for example, configuring mysql database

Previously, we often wrote a jdbc.properties file as the configuration file. The advantage is that to change the database information you only edit the configuration file, not the source code. XML can be used as a configuration file in the same way.

url=jdbc:mysql://localhost:3306/db1
user=root
password=root99
driver=com.mysql.jdbc.Driver

The same settings written as XML (a hypothetical layout for illustration only; real XML configuration files will appear often later):

<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical MySQL configuration, mirroring jdbc.properties above -->
<config>
	<dbinfo>
		<dbDriver>com.mysql.jdbc.Driver</dbDriver>
		<dbUrl>jdbc:mysql://localhost:3306/db1</dbUrl>
		<username>root</username>
		<password>root99</password>
	</dbinfo>
</config>

B: Act as a small database

We can store some data in xml to act as a small database

<?xml version="1.0" encoding="UTF-8"?>
<student>
	<stu>
		<id>001</id>
		<name>zhangsan</name>
		<age>20</age>
	</stu>
	<stu>
		<id>002</id>
		<name>lisi</name>
		<age>30</age>
	</stu>
</student>

C: Transfer data

If you have done some network programming, you have probably built something like a simple chat room. The basic principle: there is one server and multiple clients. When client 1 sends data, the server receives it, audits it (for example, checking for illegal or sensitive words), formats it, and then sends it to every client.

At first we usually transfer the content directly as plain strings, but that makes the program hard to maintain later. Using XML makes later maintenance much easier.

<?xml version="1.0" encoding="UTF-8"?>
<message id="1">
	<sender>Account 1</sender>
	<getter>Account 2</getter>
	<content>Message content</content>
	<ip>IP address</ip>
</message>

(2) XML syntax

XML documents use the file extension .xml

(1) Document declaration

After creating an XML file, the first thing to write is the document declaration; the content of the file comes after it. The declaration must be on the first line.

<?xml version="1.0" encoding="UTF-8"?>
  • version: the XML version; required

  • encoding: the character encoding of the document. Common values: GBK, UTF-8, ISO-8859-1 (which cannot represent Chinese)

    • The encoding the file is saved in must match the encoding declared here, otherwise the text will be garbled.
  • standalone: whether the document is self-contained: yes means it does not depend on external files, no means it does

(2) Defining tags

Points to note:

  1. Every start tag needs a matching end tag: <person></person>
  2. Tags must be nested properly: <aa><bb></bb></aa>
  3. Spaces and newlines are parsed as content, so pay attention to indentation and whitespace.

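Naming rules:

  1. XML is case-sensitive: <Person> and <person> are different tags
  2. Names cannot begin with a digit or a punctuation mark (an underscore is allowed)
  3. Names cannot begin with xml, XML, Xml, etc., which are reserved
  4. Names cannot contain spaces; colons are reserved for namespace prefixes and should be avoided in ordinary names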
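
The tag and naming rules above can be tried out with a small program. The following is only a minimal sketch that uses the DOM parser built into the JDK (not dom4j); the class name WellFormedCheck and the method isWellFormed are made up for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration: checks whether a string is well-formed XML
class WellFormedCheck {
    // Returns true when the text parses as well-formed XML, false otherwise
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // A parse error means the document breaks one of the syntax rules
            return false;
        } catch (Exception e) {
            // Parser configuration or I/O problems are not expected for an in-memory string
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<person><name>zhangsan</name></person>")); // true
        System.out.println(isWellFormed("<person></peoson>"));   // false: end tag does not match
        System.out.println(isWellFormed("<aa><bb></aa></bb>"));  // false: wrong nesting
        System.out.println(isWellFormed("<Person></person>"));   // false: names are case-sensitive
        System.out.println(isWellFormed("<1name></1name>"));     // false: name starts with a digit
    }
}
```

A parser only reports that the document is not well-formed; it is the rules listed above that explain why.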

(3) Defining attributes

  1. An element can have multiple attributes: <person id1="aaa" id2="bbb"></person>
  2. The attribute name and value are joined by =, and the value must be enclosed in quotation marks (single or double)

(4) Comments

<?xml version="1.0" encoding="UTF-8"?>
<!-- XML comment -->

Comments cannot be nested, and they cannot be placed on the first line, which must be the document declaration.

(5) Special characters

If you want to use special characters in XML content, you need to escape them, because characters such as < would otherwise be treated as the start of a tag.

Character    Escape    Description
&            &amp;     Ampersand
<            &lt;      Less-than sign
>            &gt;      Greater-than sign
"            &quot;    Double quotation mark
'            &apos;    Single quotation mark

If many characters need escaping, they can be placed in a CDATA section instead, whose content is treated as plain text.

<![CDATA[ content ]]>
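
When XML is assembled by hand in Java, the escaping has to be done in code. The sketch below shows one possible way, using plain string replacement; the class XmlEscapeDemo and its methods escape and cdata are hypothetical names for this illustration, not part of any library.

```java
// Hypothetical helper for illustration: escapes text before placing it in XML
class XmlEscapeDemo {
    // Replaces the five special characters with their predefined entities.
    // '&' is handled first, otherwise the '&' produced by the other replacements would be escaped again.
    static String escape(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;")
                   .replace("\"", "&quot;")
                   .replace("'", "&apos;");
    }

    // Wraps text in a CDATA section. The sequence "]]>" cannot appear inside one,
    // so it is split across two adjacent sections.
    static String cdata(String text) {
        return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>";
    }

    public static void main(String[] args) {
        String code = "if (a < b && b > 0)";
        // prints <content>if (a &lt; b &amp;&amp; b &gt; 0)</content>
        System.out.println("<content>" + escape(code) + "</content>");
        // prints <content><![CDATA[if (a < b && b > 0)]]></content>
        System.out.println("<content>" + cdata(code) + "</content>");
    }
}
```

In practice an XML library does this escaping for you when you set text on an element, so a helper like this is only needed when building XML strings manually.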

(6) Processing instructions (PI)

A processing instruction can attach a stylesheet to an XML document

<?xml-stylesheet type="text/css" href="path to css"?>

(3) xml constraints

Why are constraints needed? Suppose we define a student.xml file to hold information about students, such as id, name and age. If someone adds an arbitrary tag such as <Hello>, the file is still syntactically correct, but the tag obviously has nothing to do with what we want to store. XML constraints restrict which elements may appear in the document.

Classification:

  • DTD: a relatively simple constraint technology
  • Schema: a more complex constraint technology; a general understanding is enough

DTD constraints

(1) Referencing a DTD (three ways)

A: Internal DTD: the constraint rules are defined inside the XML document itself

<!DOCTYPE root-element-name [
	<!ELEMENT person (name,age)>
	<!ELEMENT name (#PCDATA)>
	<!ELEMENT age (#PCDATA)>
]>

B: External DTD file (local)

<!DOCTYPE root-element-name SYSTEM "path to dtd">

C: External DTD file on the network (public DTD)

<!DOCTYPE root-element-name PUBLIC "DTD name" "URL of the DTD document">

For example, a struts2 configuration file references the framework's public DTD:

<!DOCTYPE struts PUBLIC
    "-//Apache Software Foundation//DTD Struts Configuration 2.0//EN"
    "http://struts.apache.org/dtds/struts-2.0.dtd">

(2) Defining elements with a DTD

<!ELEMENT element-name constraint>

A: Simple elements (no child elements)

<!ELEMENT name (#PCDATA)>
	(#PCDATA): the content of name is text (a string)
	EMPTY: the element is empty (no content)
		- <sex></sex>
	ANY: any content is allowed

B: Complex elements

<!-- Syntax -->
<!ELEMENT element-name (child elements)>
<!ELEMENT person (id,name,age)>
	Without an occurrence symbol, each child element must appear exactly once

<!-- Number of occurrences of a child element -->
 + One or more times
 ? Zero or one time
 * Zero or more times

<!-- Child elements separated by commas -->
	The elements must appear in the listed order

<!-- Child elements separated by | -->
	Only one of the listed elements may appear

(3) Defining attributes with a DTD

<!-- Syntax -->
<!ATTLIST element-name
	attribute-name attribute-type attribute-constraint
>

<!-- Attribute type CDATA: a string -->
<!ATTLIST birthday
	ID1 CDATA #REQUIRED
>

<!-- Attribute type enumeration -->
The value must be one of the listed values, and only one of them at a time (like a traffic light)
<!ATTLIST age
	ID2 (AA|BB|CC) #REQUIRED
>

<!-- Attribute type ID: the value must begin with a letter or an underscore and be unique within the document -->
<!ATTLIST name
	ID3 ID #REQUIRED
>

<!-- Attribute constraints -->
 #REQUIRED: the attribute must be present
 #IMPLIED: the attribute is optional
 #FIXED: the attribute has a fixed value, e.g. #FIXED "AAA"
	If the attribute is written, its value must be the fixed value
		<!ATTLIST sex
			ID4 CDATA #FIXED "ABC"
		>

 Default value: a value written directly in place of a constraint
	If the attribute is omitted, the default value is used
	If the attribute is written, the written value is used
		<!ATTLIST school
			ID5 CDATA "WWW"
		>
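
To see a DTD constraint actually rejecting a document, here is a minimal sketch that validates against an internal DTD with the parser built into the JDK (not dom4j). The class DtdValidationDemo, the method validate and the sample documents are assumptions made for this illustration; the DTD is the person/name/age example from the internal DTD section.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class for illustration: validates XML against an internal DTD
class DtdValidationDemo {
    // Internal DTD: person must contain exactly one name followed by one age
    static final String DTD =
            "<!DOCTYPE person [\n"
          + "  <!ELEMENT person (name,age)>\n"
          + "  <!ELEMENT name (#PCDATA)>\n"
          + "  <!ELEMENT age (#PCDATA)>\n"
          + "]>\n";

    // Parses the XML with DTD validation switched on and returns the validation messages
    static List<String> validate(String xml) {
        List<String> problems = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // check the document against its DTD
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    // Validity errors are reported here instead of being thrown
                    problems.add(e.getMessage());
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            // Thrown only when the text is not even well-formed XML
            throw new IllegalStateException(e);
        }
        return problems;
    }

    public static void main(String[] args) {
        String valid = DTD + "<person><name>zhangsan</name><age>20</age></person>";
        String invalid = DTD + "<person><name>zhangsan</name><Hello>hi</Hello></person>";
        System.out.println(validate(valid).isEmpty());   // true: matches (name,age)
        System.out.println(validate(invalid).isEmpty()); // false: <Hello> is not declared
    }
}
```

Note that a document which violates the DTD is still well-formed; the violations show up only because validation was explicitly enabled.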

Schema constraints

A schema is itself written in XML syntax. One XML document can use more than one schema; multiple schemas are told apart by namespaces (similar to Java package names). A DTD offers only the PCDATA type for text, whereas a schema supports many more data types.

File extension: .xsd

How to reference a schema:
Fill in the following on the root element of the XML document

Introduce the xsi prefix: xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	Indicates that this XML file is constrained by a schema

Introduce the xsd file and its namespace: xsi:schemaLocation="http://www.bwh.cn/xml student.xsd"
	The value is the namespace, a space, then the location of the constraint file
	  The namespace is usually a URL so that names do not clash

Declare the namespace of the xsd constraint as the default namespace (or give it a prefix): xmlns="http://www.bwh.cn/xml"

(1) Count the elements in the XML document

Write one <element> for each of them

(2) Distinguish simple elements from complex elements

<element name="person">
	<complexType>
		<sequence>
			<element name="name" type="string"></element>
			<element name="age" type="int"></element>
		</sequence>
	</complexType>
</element>

(3) Referencing the constraint file in the constrained file

<person xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns="http://www.bwh.cn/20151111"
        xsi:schemaLocation="http://www.bwh.cn/20151111 1.xsd">

	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
		-- indicates that this XML is a constrained document
	xmlns="http://www.bwh.cn/20151111"
		-- the targetNamespace declared in the constraint document
	xsi:schemaLocation="http://www.bwh.cn/20151111 1.xsd"
		-- the targetNamespace, a space, then the path of the constraint document

Indicators that can be used inside a complex element

 A: <sequence>: the elements must appear in the given order
 B: <all>: each element can appear only once
 C: <choice>: only one of the elements can appear
 D: maxOccurs="unbounded": the element may appear any number of times
 E: <any></any>: any element

Attributes are defined inside a complex element, just before </complexType>:

<attribute name="id1" type="int" use="required"></attribute>
	- name: attribute name
	- type: attribute type, e.g. int, string
	- use: whether the attribute is required
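
The schema pieces above (a sequence of name and age plus a required id1 attribute) can be checked with the validation API that ships with the JDK. This is only a sketch under simplifying assumptions: the schema is kept in a string, it uses the xs: prefix and no targetNamespace to stay short, and the class SchemaValidationDemo and method isValid are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical demo class for illustration: validates XML against an XSD held in a string
class SchemaValidationDemo {
    // Simplified schema (no targetNamespace): person = name, age in order, plus required id1
    static final String XSD =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
          + "<xs:element name='person'>"
          + "<xs:complexType>"
          + "<xs:sequence>"
          + "<xs:element name='name' type='xs:string'/>"
          + "<xs:element name='age' type='xs:int'/>"
          + "</xs:sequence>"
          + "<xs:attribute name='id1' type='xs:int' use='required'/>"
          + "</xs:complexType>"
          + "</xs:element>"
          + "</xs:schema>";

    // Returns true when the XML satisfies the schema, false when validation fails
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            // With no error handler set, the validator throws on the first violation
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // true: correct order, int age, id1 present
        System.out.println(isValid("<person id1='1'><name>zhangsan</name><age>20</age></person>"));
        // false: age is not an int (a DTD could not express this)
        System.out.println(isValid("<person id1='1'><name>zhangsan</name><age>abc</age></person>"));
        // false: the required attribute id1 is missing
        System.out.println(isValid("<person><name>zhangsan</name><age>20</age></person>"));
    }
}
```

The second case is the one that shows the benefit of schema data types: the document has the right structure, but the value of age is rejected because it is not an int.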

(4) Parsing XML

Put simply, parsing means reading an XML document and extracting the data you need from it

(1) Parsing modes: DOM and SAX

DOM: builds a tree in memory that follows the hierarchical structure of the XML, wrapping tags, attributes and text as objects; the whole document is loaded into memory at once.

  • Advantages: convenient to work with; adding, deleting and modifying nodes is easy.

  • Disadvantages: uses a lot of memory; large documents risk running out of memory.

SAX: event-driven; the document is parsed while it is being read, and each time a piece is parsed (a start tag, text, an end tag) the parser reports it through an event.

  • Advantages: uses very little memory
  • Disadvantages: read-only; nodes cannot be added, deleted or modified.
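
To make the event-driven style concrete, here is a minimal SAX sketch using the parser built into the JDK. The class SaxNamesDemo, the method readNames and the inline sample document are assumptions for this illustration; the handler simply collects the text of every name element as the events arrive.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class for illustration: collects <name> values with SAX events
class SaxNamesDemo {
    static List<String> readNames(String xml) {
        List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder text; // non-null only while inside a <name> element

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if ("name".equals(qName)) {
                    text = new StringBuilder(); // start collecting text
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (text != null) {
                    text.append(ch, start, length); // text may arrive in several chunks
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("name".equals(qName)) {
                    names.add(text.toString());
                    text = null; // stop collecting
                }
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        String xml = "<student>"
                + "<stu><name>zhangsan</name><age>20</age></stu>"
                + "<stu><name>lisi</name><age>30</age></stu>"
                + "</student>";
        System.out.println(readNames(xml)); // prints [zhangsan, lisi]
    }
}
```

Nothing is kept in memory except what the handler chooses to store, which is why SAX suits large files but cannot modify the document.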

(2) Parsers

To parse XML we need a parser. Different companies and organizations provide parsers for DOM and SAX and expose them through APIs (today we focus on the two most commonly used, dom4j and Jsoup).

  1. jaxp: Sun's parser for DOM and SAX; somewhat inefficient
  2. dom4j: a very good parser, commonly used in real development
  3. jdom: the parser for DOM and SAX provided by the jdom organization
  4. Jsoup: a Java HTML parser that can parse a URL or HTML text directly. It offers a convenient API for extracting and manipulating data through DOM, CSS and jQuery-like operations.
  5. Pull: the parser built into the Android operating system; works in the SAX style

(3) Using dom4j to manipulate XML

Note: in all the Java code below the file path includes the module name (code-04_xml) because my code lives in a module. If you created a project directly, just write src/s1.xml.

<?xml version="1.0" encoding="UTF-8"?>
<student>
    <stu id1="love">
        <name>zhangsan</name>
        <age>20</age>
    </stu>
    <stu>
        <name>lisi</name>
        <age>30</age>
    </stu>
</student>

Use dom4j to implement query xml operation

(1) Query the values in all name elements

package cn.ideal.xml.dom4j;

/*
   1. Create the parser
   2. Get the document
   3. Get the root node: getRootElement() returns an Element
   4. Get all the stu tags
      * elements("stu") returns a List
      * Traverse the list to get each stu
   5. Get the name element
      * Call element("name") on stu; it returns an Element
   6. Get the value in name
      * getText() returns the text
*/

import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.Element;
import org.dom4j.io.SAXReader;

import java.util.List;

public class TestDom4j1 {
    //Query the values of all name elements in xml
    public static void main(String[] args) throws DocumentException {
        //Create parsers
        SAXReader saxReader = new SAXReader();
        //Get the document
        Document document = saxReader.read("code-04_xml/src/s1.xml");
        //Get the root node
        Element rootElement = document.getRootElement();
        //Get stu
        List<Element> list = rootElement.elements("stu");

        //Traverse the list
        for (Element element : list) {
            //element is each stu element
            //Get the name element under it
            Element name1 = element.element("name");
            //Get the text in name
            String s = name1.getText();
            System.out.println(s);
        }
    }
}
//Operation results
zhangsan
lisi

(2) Query the value of the first name element

package cn.ideal.xml.dom4j;

/*
    1. Create the parser
    2. Get the document
    3. Get the root node
    4. Get the first stu element
        element("stu") returns an Element
    5. Get the name element under stu
        element("name") returns an Element
    6. Get the value in the name element
        getText()
 */

import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.Element;
import org.dom4j.io.SAXReader;

public class TestDom4j2 {
    public static void main(String[] args) throws DocumentException {
        //Create parsers
        SAXReader saxReader = new SAXReader();
        //Get the document object
        Document document = saxReader.read("code-04_xml/src/s1.xml");
        //Get the root node
        Element rootElement = document.getRootElement();
        //Get the first stu element
        Element stu = rootElement.element("stu");
        //Get the name element under stu
        Element name1 = stu.element("name");
        //Get the value of name
        String s1 = name1.getText();
        System.out.println(s1);
    }
}

//Operation results
zhangsan

(3) Get the value of the second name element

package cn.ideal.xml.dom4j;

import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.Element;
import org.dom4j.io.SAXReader;

import java.util.List;

/*
    1. Create the parser
    2. Get the document
    3. Get the root node
    4. Get all stu elements
        Returns a List
    5. Get the second stu
        Use the list's get method; indexes start at 0, so the second element is at index 1
    6. Get the name under the second stu
        element("name") returns an Element
    7. Get the value of name
        getText()
 */
public class TestDom4j3 {
    public static void main(String[] args) throws DocumentException {
        //Create parsers
        SAXReader saxReader = new SAXReader();
        //Get the document
        Document document = saxReader.read("code-04_xml/src/s1.xml");
        //Get the root node
        Element rootElement = document.getRootElement();
        //Get all stu elements
        List<Element> list = rootElement.elements("stu");
        //Get the second stu
        Element stu2 = list.get(1);
        //Get the name under stu
        Element name2 = stu2.element("name");
        //Get the value in name
        String s2 = name2.getText();
        System.out.println(s2);
    }
}

Use dom4j to implement add operations

(1) Add an element <sex>male</sex> at the end of the first stu tag

package cn.ideal.xml.dom4j;

import org.dom4j.Document;

import org.dom4j.Element;
import org.dom4j.io.OutputFormat;
import org.dom4j.io.SAXReader;
import org.dom4j.io.XMLWriter;

import java.io.FileOutputStream;


/*
    1. Create the parser
    2. Get the document
    3. Get the root node

    4. Get the first stu
        Use the element method
    5. Add an element under stu
        Call addElement("tag name") on stu; it returns the new Element

    6. Add text to the new element
        Call setText("text content") on sex
    7. Write back to the xml file
        Create an OutputFormat with createPrettyPrint() for nicely indented output
        Create an XMLWriter with new, passing two parameters
        The first parameter is the output stream for the xml file: new FileOutputStream("path")
        The second parameter is the OutputFormat
*/
public class TestDom4j4 {
    public static void main(String[] args) throws Exception {
        //Create parsers
        SAXReader saxReader = new SAXReader();
        //Get the document
        Document document = saxReader.read("code-04_xml/src/s1.xml");
        //Get the root node
        Element rootElement = document.getRootElement();
        //Get the first stu element
        Element stu = rootElement.element("stu");
        //Add elements directly under stu
        Element sex1 = stu.addElement("sex");
        //Add text under sex
        sex1.setText("male");

        //Write back to the xml file
        OutputFormat prettyPrint = OutputFormat.createPrettyPrint();//Pretty-print with indentation
        XMLWriter xmlWriter = new XMLWriter(new FileOutputStream("code-04_xml/src/s1.xml"), prettyPrint);
        xmlWriter.write(document);
        //Close the writer so the output stream is flushed and released
        xmlWriter.close();
    }
}

Write a tool class to simplify the operation

Wrapping the repeated steps (creating the parser, getting the document, writing back the XML) in methods means they no longer have to be written each time, and the file path is kept as a constant.

Benefits: faster development and better code maintainability

For example, to change the file path (name) you only need to change the value of the constant; no other code has to change.

package cn.ideal.xml.utils;

import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.io.OutputFormat;
import org.dom4j.io.SAXReader;
import org.dom4j.io.XMLWriter;

import java.io.FileOutputStream;
import java.io.IOException;

public class Dom4jUtils {
    public static final String PATH = "code-04_xml/src/s1.xml";

    //Returns the parsed document, or null if parsing fails
    public static Document getDocument(String path) {
        //Create parsers
        SAXReader saxReader = new SAXReader();
        //Get the document
        try {
            Document document = saxReader.read(path);
            return document;
        } catch (DocumentException e) {
            e.printStackTrace();
        }
        return null;
    }

    //Write the document back to the xml file
    public static void xmlWriters(String path, Document document) {
        try {
            OutputFormat prettyPrint = OutputFormat.createPrettyPrint();//Pretty-print with indentation
            XMLWriter xmlWriter = new XMLWriter(new FileOutputStream(path), prettyPrint);
            xmlWriter.write(document);
            //Close the writer so the output stream is flushed and released
            xmlWriter.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

The code we added above can be simplified to

package cn.ideal.xml.dom4j;

import cn.ideal.xml.utils.Dom4jUtils;
import org.dom4j.Document;
import org.dom4j.Element;


public class TestDom4j5 {
    public static void main(String[] args) throws Exception {

        Document document = Dom4jUtils.getDocument(Dom4jUtils.PATH);
        //Get the root node
        Element rootElement = document.getRootElement();
        //Get the first stu element
        Element stu = rootElement.element("stu");
        //Add elements directly under stu
        Element sex1 = stu.addElement("sex");
        //Add text under sex
        sex1.setText("male");

        //Write back xml
        Dom4jUtils.xmlWriters(Dom4jUtils.PATH, document);
    }
}

(2) Add an element at a specific position using dom4j

Add <id>001</id> before the name tag under the first stu

package cn.ideal.xml.dom4j;

import cn.ideal.xml.utils.Dom4jUtils;
import org.dom4j.Document;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;

import java.util.List;

/*
    1. Create the parser
    2. Get the document
    3. Get the root node
    4. Get the first stu
    5. Get all the elements under stu
        * elements() returns a List
    6. Create the element and the text under it
        * Create the tag with DocumentHelper.createElement
        * Add text to the tag with setText("text content")
    7. Add the element at a specific position with List.add(int index, E element)
        * The first parameter is the position index, starting at 0
        * The second parameter is the element to add
    8. Write back to the xml file
*/
public class TestDom4j6 {
    //Add <id>001</id> before the name tag under the first stu
    public static void main(String[] args) {
        Document document = Dom4jUtils.getDocument(Dom4jUtils.PATH);
        //Get the root node
        Element rootElement = document.getRootElement();
        //Get the first stu element
        Element stu = rootElement.element("stu");
        //Get all the elements under stu
        List<Element> list = stu.elements();
        //Create elements
        Element id = DocumentHelper.createElement("id");
        //Create text under id
        id.setText("001");
        //Adding at a specific location
        list.add(0, id);

        Dom4jUtils.xmlWriters(Dom4jUtils.PATH, document);
    }
}

(3) Modify a node using dom4j

Change the value of the age element under the first stu to 18

package cn.ideal.xml.dom4j;
/*
    1. Get the document
    2. Get the root node, then get the first stu element
    3. Get the age under the first stu
        element("age")
    4. Change the value to 18
        setText("text content")
    5. Write back to the xml file
*/

import cn.ideal.xml.utils.Dom4jUtils;
import org.dom4j.Document;
import org.dom4j.Element;

public class TestDom4j7 {
    public static void main(String[] args) {
        //Get the document
        Document document = Dom4jUtils.getDocument(Dom4jUtils.PATH);
        //Get the root node
        Element rootElement = document.getRootElement();
        //Get the first stu element
        Element stu = rootElement.element("stu");
        //Get the age below the first stu
        Element age = stu.element("age");
        age.setText("18");
        //Write back xml
        Dom4jUtils.xmlWriters(Dom4jUtils.PATH, document);
    }
}

Use dom4j to delete nodes

package cn.ideal.xml.dom4j;

import cn.ideal.xml.utils.Dom4jUtils;
import org.dom4j.Document;
import org.dom4j.Element;

public class TestDom4j8 {
    public static void main(String[] args) {
        //Get the document
        Document document = Dom4jUtils.getDocument(Dom4jUtils.PATH);
        //Get the root node
        Element rootElement = document.getRootElement();
        //Get the first stu element
        Element stu = rootElement.element("stu");
        //Get the id under the first stu
        Element id = stu.element("id");

        stu.remove(id);
        //Write back xml
        Dom4jUtils.xmlWriters(Dom4jUtils.PATH, document);
    }
}

Use dom4j to get attributes

package cn.ideal.xml.dom4j;

import cn.ideal.xml.utils.Dom4jUtils;
import org.dom4j.Document;
import org.dom4j.Element;

public class TestDom4j9 {
    public static void main(String[] args) {
        //Get the document
        Document document = Dom4jUtils.getDocument(Dom4jUtils.PATH);
        //Get the root node
        Element rootElement = document.getRootElement();
        //Get the first stu element
        Element stu = rootElement.element("stu");
        //Get the attribute values in stu
        String value = stu.attributeValue("id1");
        System.out.println(value);
    }
}

(4) Using XPath with dom4j

XPath is the XML Path Language, a language for locating parts of an XML document.

By default, dom4j does not support XPath

To use it, you need to add a jar that provides XPath support, such as jaxen-1.1-beta-6.jar

The first form
	/AAA/CCC/BBB: layer by layer; selects BBB under CCC under AAA
The second form
	//BBB: selects every element named BBB, wherever it appears
The third form
	*: a wildcard for any element, e.g. /AAA/* selects all child elements of AAA and //* selects all elements
The fourth form
	BBB[1]: the first BBB element
	BBB[last()]: the last BBB element
The fifth form
	//BBB[@id]: selects every BBB element that has an id attribute
The sixth form
	//BBB[@id='b1']: selects every BBB element whose id attribute has the value b1
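
The forms above can be tried without any extra jar by using the XPath engine that ships with the JDK (javax.xml.xpath) in place of dom4j and jaxen. The following is only a sketch: the class XPathFormsDemo, the method select and the small AAA/CCC/BBB sample document are made up for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class for illustration: evaluates XPath expressions with the JDK
class XPathFormsDemo {
    // Sample document: two BBB under CCC, one BBB directly under AAA
    static final String XML =
            "<AAA><CCC><BBB id='b1'>one</BBB><BBB>two</BBB></CCC><BBB id='b3'>three</BBB></AAA>";

    // Returns the text of every node matched by the expression, joined by commas
    static String select(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            StringBuilder result = new StringBuilder();
            for (int i = 0; i < nodes.getLength(); i++) {
                if (i > 0) {
                    result.append(",");
                }
                result.append(nodes.item(i).getTextContent());
            }
            return result.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select("/AAA/CCC/BBB"));         // one,two
        System.out.println(select("//BBB"));                // one,two,three
        System.out.println(select("/AAA/CCC/BBB[1]"));      // one
        System.out.println(select("/AAA/CCC/BBB[last()]")); // two
        System.out.println(select("//BBB[@id]"));           // one,three
        System.out.println(select("//BBB[@id='b1']"));      // one
    }
}
```

The expressions themselves are standard XPath, so the same strings can be passed to the dom4j methods once the XPath jar is on the classpath.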

There are two methods in dom4j to support xpath

//Getting multiple nodes
selectNodes("xpath Expression")

//Get a node
selectSingleNode("xpath Expression")

(1) Using xpath to query the values of all name elements in xml

package cn.ideal.xml.dom4j.xpath;

import cn.ideal.xml.utils.Dom4jUtils;
import org.dom4j.Document;
import org.dom4j.Node;

import java.util.List;

public class TestDom4jXpath1 {
    //Query the values of all name elements in xml
    public static void main(String[] args) {
        //Get the document
        Document document = Dom4jUtils.getDocument(Dom4jUtils.PATH);
        //Get all name elements
        List<Node> list = document.selectNodes("//name");
        //Traversing list sets
        for (Node node : list) {
            //node is each name element
            //Get the value in the name element
            String s = node.getText();
            System.out.println(s);
        }
    }
}

(2) Implement with xpath: Get the value of the name under the first stu

package cn.ideal.xml.dom4j.xpath;

import cn.ideal.xml.utils.Dom4jUtils;
import org.dom4j.Document;
import org.dom4j.Node;

public class TestDom4jXpath2 {
    public static void main(String[] args) {
        //Get the document
        Document document = Dom4jUtils.getDocument(Dom4jUtils.PATH);
        Node name1 = document.selectSingleNode("//stu[@id1='love']/name");
        //Get the value in name
        String s1 = name1.getText();
        System.out.println(s1);
    }
}

(5) Using Jsoup to manipulate XML

package cn.ideal.xml.jsoup;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.io.File;
import java.io.IOException;

public class JsoupDemo1 {
    public static void main(String[] args) throws IOException {
        //Get the Document object
        //Get the path of student.xml through the class loader
        String path = JsoupDemo1.class.getClassLoader().getResource("student.xml").getPath();
        //Parse the xml document: load it into memory and get the DOM tree -> Document
        Document document = Jsoup.parse(new File(path), "utf-8");
        //Get the collection of Element objects by tag name
        Elements elements = document.getElementsByTag("name");

        //Get the first name
        Element element = elements.get(0);
        //Get the text
        String name = element.text();
        System.out.println(name);
    }
}

Explanation of the commonly used objects above

1. Jsoup: utility class that parses html or xml documents and returns a Document

parse:

//Parse an xml or html file
parse(File in, String charsetName)

//Parse an xml or html string
parse(String html)

//Fetch and parse the html or xml at a network address
parse(URL url, int timeoutMillis)

2. Document: document object, representing the DOM tree in memory

A: Get Element objects

//Get the unique element by its id attribute value
getElementById(String id)

//Get the collection of elements with the given tag name
getElementsByTag(String tagName)

//Get the collection of elements that have the given attribute
getElementsByAttribute(String key)

//Get the collection of elements whose attribute has the given value
getElementsByAttributeValue(String key, String value)

3. Elements: a collection of Element objects; it can be used much like an ArrayList<Element>

4. Element: a single element object

A: Get child Element objects, with the same methods as in 2

B: Get attribute values

//Get an attribute value by attribute name
String attr(String key)

C: Get text content

//Get the text content
String text()

//Get all the content of the tag body, including child tags
String html()
Two quicker ways to query

Selector

//For the selector syntax, see the Selector class in the Jsoup API documentation
Elements select(String cssQuery)

The examples below use this student.xml:

<?xml version="1.0" encoding="UTF-8"?>
<student>
    <stu number="stu_001">
        <name id="ideal">zhangsan</name>
        <age>18</age>
    </stu>
    <stu number="stu_002">
        <name>lisi</name>
        <age>30</age>
    </stu>
</student>

package cn.ideal.xml.jsoup;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

import java.io.File;
import java.io.IOException;

public class SelectorDemo {
    public static void main(String[] args) throws IOException {
        //Get the Document object
        //Get the path of student.xml through the class loader
        String path = SelectorDemo.class.getClassLoader().getResource("student.xml").getPath();
        //Parse the xml document: load it into memory and get the DOM tree -> Document
        Document document = Jsoup.parse(new File(path), "utf-8");

        //Query by tag name: all name tags
        Elements elements1 = document.select("name");
        System.out.println(elements1);

        System.out.println("--------------");

        //Query by id: the element whose id is ideal
        Elements elements2 = document.select("#ideal");
        System.out.println(elements2);

        System.out.println("--------------");

        //Query by attribute: the stu tag whose number attribute is stu_001
        Elements elements3 = document.select("stu[number='stu_001']");
        System.out.println(elements3);

    }
}

//Operation results
<name id="ideal">
 zhangsan
</name>
<name>
 lisi
</name>
--------------
<name id="ideal">
 zhangsan
</name>
--------------
<stu number="stu_001"> 
 <name id="ideal">
  zhangsan
 </name> 
 <age>
  18
 </age> 
</stu>

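XPath (the code below relies on an additional library that provides the cn.wanghaomiao.xpath classes, so its jar must be on the classpath)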

package cn.ideal.xml.jsoup;

import cn.wanghaomiao.xpath.model.JXDocument;
import cn.wanghaomiao.xpath.model.JXNode;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import java.io.File;
import java.util.List;

public class XpathDemo {
    public static void main(String[] args) throws Exception {
        //Get the Document object
        //Get the path of student.xml through the class loader
        String path = XpathDemo.class.getClassLoader().getResource("student.xml").getPath();
        //Parse xml document, load document into memory, get dom tree - > Document
        Document document = Jsoup.parse(new File(path), "utf-8");

        //Create JXDocument objects
        JXDocument jxDocument = new JXDocument(document);

        //Query with xpath grammar
        List<JXNode> jxNodes = jxDocument.selN("//stu");
        for (JXNode jxNode : jxNodes) {
            System.out.println(jxNode);
        }
    }
}

//Operation results
<stu number="stu_001"> 
 <name id="ideal">
  zhangsan
 </name> 
 <age>
  18
 </age> 
</stu>
<stu number="stu_002"> 
 <name>
  lisi
 </name> 
 <age>
  30
 </age> 
</stu>

Other queries:

//Query the name tags under the stu tags
List<JXNode> jxNodes = jxDocument.selN("//stu/name");

//Query the name tag under stu that has an id attribute with the value ideal
List<JXNode> jxNodes = jxDocument.selN("//stu/name[@id='ideal']");

Ending:

If there are any deficiencies or errors in the content, please leave me a message. Thank you!

If this helped you, please follow me! (All articles in this series are published first on my official account.)

We don't know each other here, but we are all working hard for our dreams.

An official account that keeps publishing original Java technical articles: more than 20 years of ideal

Posted by Bullit on Fri, 16 Aug 2019 02:09:31 -0700