Requests request methods and basic usage of the XPath parser

Keywords: Session, XML, JSON, encoding


Python's standard-library urllib module already contains most of the functions we normally need, but its API is awkward to use. Requests builds on all of urllib's features with a more convenient API, which simplifies our code.
First, install it:
pip3 install requests
Its two most basic request methods are get and post.
get sample code

import requests

url = ""
# Request headers
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36'
}
response = requests.get(url=url, headers=headers)

post sample code

url = ''
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36'
}
# Form data to send in the POST request body
form_data = {}
response =, data=form_data, headers=headers)
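Before sending anything, you can preview how requests encodes form data by building a request without sending it. This is just an illustrative sketch: the URL and field names below are invented placeholders, not part of any real site.

```python
import requests

# Build (but do not send) a POST request to inspect the encoded body.
# The URL, field names, and values are made-up placeholders.
req = requests.Request(
    'POST',
    'https://example.com/login',
    data={'user': 'alice', 'password': 'secret'},
)
prepared = req.prepare()

print(prepared.body)                     # user=alice&password=secret
print(prepared.headers['Content-Type'])  # application/x-www-form-urlencoded
```

The dictionary passed as `data` is URL-encoded into the body, and requests sets the matching `Content-Type` header automatically.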

Response result processing

# Get response results
response.text                 # Page source code (decoded text)
response.status_code          # Status code
response.headers              # Response headers
response.request.headers      # The request headers that were sent
response.content              # Binary (bytes) data of the page
response.encoding = 'utf-8'   # Set the encoding used to decode the text
response.encoding             # Get the current encoding
# If the response body is a JSON string, call the json() method to
# convert it into a Python data type
response.json()
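Under the hood, `response.json()` is roughly equivalent to parsing `response.text` with the standard `json` module. A quick sketch (the payload below is invented for illustration):

```python
import json

# A JSON string such as a server might return (invented payload)
payload = '{"name": "requests", "tags": ["http", "python"], "ok": true}'

# This is essentially what response.json() does with the response body
data = json.loads(payload)

print(data['name'])  # requests
print(data['ok'])    # True
```

JSON objects become dicts, arrays become lists, and `true`/`false` become Python booleans.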

File upload

url = ''
# Open the local file in binary mode for upload
files = {'file': open('page.html', 'rb')}
response =, files=files, headers=headers)
if response.status_code == 200:
    print('File uploaded successfully')

Setting up proxies

url = ''
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36'
}
# proxies is a dictionary mapping each protocol to a proxy URL
proxy = {
    'https': '',
    'http': '',
}
response = requests.get(url=url, headers=headers, proxies=proxy)

if response.status_code == 200:
    print('Successful request')

Session: keeping the session alive

# Session: when crawling, we often need to keep state (such as login
# cookies) across several requests; that is what a Session object is for.
import requests

#Instantiate session() object
session = requests.session()

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36'
}
response = session.get('', headers=headers)

print(session.cookies)  # session.cookies stores the cookies returned by the server

# Once session.cookies holds the user's information, any request made with
# session.get() or automatically carries those cookies along.
response = session.get('',headers=headers)

# Normally we use requests.Session to simulate login first,
# then use the same session to initiate subsequent requests.
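The cookie persistence can be seen without any network traffic: the jar attached to a Session survives between calls. In the sketch below the cookie is set by hand purely for demonstration (its name and value are invented); in practice the server's login response would set it.

```python
import requests

session = requests.Session()

# Normally the server's Set-Cookie header fills the jar; here we set a
# cookie by hand just to show persistence (name/value are invented).
session.cookies.set('sessionid', 'abc123')

# Every later session.get() / will attach this cookie
# automatically, because the jar lives on the Session object.
print(session.cookies.get('sessionid'))  # abc123
```

This is why logging in through a Session first, then reusing that same Session, keeps you authenticated.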

Handling HTTPS requests (ignoring SSL certificate validation)

# verify defaults to True, meaning the SSL CA certificate is validated.
# If a certificate validation error occurs, set verify to False to skip
# certificate validation.
response = requests.get(url=url,headers=headers,verify=False)

Request related parameters

# Overview of request parameters
:param method: request method: get, post, delete, ...
:param url: target URL
:param params: (optional) Dictionary of query parameters appended after the ? in a GET URL
:param data: (optional) Dictionary of form data sent in the body of a POST request
:param headers: (optional) Dictionary of request headers
:param cookies: (optional) Dict or CookieJar object holding the user's cookie information
:param files: (optional) Dictionary for file upload (POST)
:param auth: (optional) Auth tuple for authentication
:param timeout: (optional) Timeout for the request, in seconds
:param allow_redirects: (optional) Whether to follow redirects; defaults to True
:param proxies: (optional) Dictionary mapping protocol to proxy URL
:param verify: (optional) Boolean; defaults to True. Set to False to ignore certificate validation
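The `params` parameter can be illustrated offline the same way: build a request and inspect the URL requests would send. The URL and query fields below are placeholders chosen for the example.

```python
import requests

# Preview how `params` is stitched onto a GET URL after the `?`,
# without sending anything. URL and fields are placeholders.
req = requests.Request(
    'GET',
    'https://example.com/search',
    params={'q': 'python', 'page': '1'},
)
prepared = req.prepare()

print(prepared.url)  # https://example.com/search?q=python&page=1
```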

xpath parsing

What is XPath?

XPath is the XML Path Language.
It is a language for finding information in XML documents: it can be used to select elements and attributes in an XML document, and it also applies to HTML.

XML came up just now, so what on earth is XML?

So look here:
* XML is a markup language, very similar to HTML
* XML is designed to carry data, not to display it
* XML tags are not predefined; we define them ourselves

Okay, now that we understand XML, let's come back to XPath and look at some common path expressions

Expression   Description
nodename     Selects all child nodes of the named node
/            Selects from the root node
//           Selects matching nodes anywhere in the document, regardless of their position
.            Selects the current node
..           Selects the parent of the current node
@            Selects attributes
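The expressions above can be tried on a tiny hand-written HTML snippet with lxml (the snippet and its contents are invented for this demo):

```python
from lxml import etree

# Invented HTML fragment to exercise the path expressions
doc = etree.HTML("""
<div class="scores_List">
  <dl>
    <dt><a href="/school/1">Link</a><strong>Tsinghua</strong></dt>
  </dl>
</div>
""")

# // searches the whole document; @ selects an attribute value
print(doc.xpath('//a/@href'))           # ['/school/1']

# / steps one level down; text() extracts a node's text
print(doc.xpath('//dt/strong/text()'))  # ['Tsinghua']

# .. moves to the parent of the current node
a = doc.xpath('//a')[0]
print(a.xpath('..')[0].tag)             # dt
```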

Don't rush; here is how to use it.

from lxml import etree

html = etree.HTML(html)  # Construct an XPath parsing object; HTML text is corrected automatically
# text() gets a tag's text
# @attribute_name gets a tag's attribute value
# contains(@class, 'li') matches elements whose class attribute contains 'li' (multi-value attribute matching)
ranks = html.xpath('//div[@class="scores_List"]/dl')  # Group by div
for dl in ranks:  # Traverse the groups
    school_info = {}
    school_info['url'] = dl.xpath('./dt/a[1]/@href')
    school_info['icon'] = dl.xpath('./dt/a[1]/img/@src')
    school_info['name'] = dl.xpath('./dt/strong/a/text()')
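To see those selectors working end to end, here is a self-contained sketch that runs them against an invented HTML fragment shaped like the page they target (the school names, URLs, and image paths are all made up):

```python
from lxml import etree

# Invented HTML shaped like the page the selectors above target
page = """
<div class="scores_List">
  <dl><dt>
    <a href="/u1"><img src="/u1.png"/></a>
    <strong><a href="/u1">Peking University</a></strong>
  </dt></dl>
  <dl><dt>
    <a href="/u2"><img src="/u2.png"/></a>
    <strong><a href="/u2">Fudan University</a></strong>
  </dt></dl>
</div>
"""

root = etree.HTML(page)
schools = []
for dl in root.xpath('//div[@class="scores_List"]/dl'):
    schools.append({
        'url': dl.xpath('./dt/a[1]/@href'),     # first <a> child's href
        'icon': dl.xpath('./dt/a[1]/img/@src'), # its <img> child's src
        'name': dl.xpath('./dt/strong/a/text()'),
    })

print(schools[0]['name'])  # ['Peking University']
```

Note that `xpath()` always returns a list, so each field above is a list even when there is a single match.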

Posted by gplaurin on Mon, 09 Sep 2019 02:44:06 -0700