1. requests module
1.1 Introduction to requests
Requests is a powerful, simple and easy-to-use HTTP library for Python. Compared with the urllib module used previously, the requests API is more convenient. (Under the hood, requests is built on urllib3.)
You can install it with the pip install requests command, but downloading from the default index often runs into network problems, so a domestic mirror can be used to speed it up.
For example, to use the Douban mirror:
pip install <package-name> -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com
Replace <package-name> with the module you need (for example, requests) and it is downloaded quickly from the mirror. The --trusted-host option is required because this mirror URL uses plain HTTP.
1.2 requests request methods
There are many request methods, but we will only cover the two most commonly used: GET and POST.
1.2.1 GET request
requests.get() sends a GET request to the target URL and returns a Response object, which is explained in section 1.3.
Parameters of the get() method:
url: required, specifies the URL to request
params: dictionary type, specifies the query-string parameters that are appended to the URL; commonly used when sending GET requests
Example:
import requests

url = 'http://www.httpbin.org/get'
params = {
    'key1': 'value1',
    'key2': 'value2',
}
# The params are appended to the URL: ...get?key1=value1&key2=value2
response = requests.get(url=url, params=params)
print(response.text)
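To see exactly which URL requests builds from params without sending anything, you can prepare the request locally with requests.Request(...).prepare(). The sketch below assumes only that; the helper name build_get_url is hypothetical. It also shows two details of the encoding: a list value repeats the key, and a None value is dropped.

```python
import requests


def build_get_url(url, params):
    """Return the URL requests would send for a GET with these params.

    The request is only prepared, so no network traffic happens.
    """
    prepared = requests.Request("GET", url, params=params).prepare()
    return prepared.url


# Plain key/value pairs become key=value joined by '&'
print(build_get_url("http://www.httpbin.org/get", {"key1": "value1", "key2": "value2"}))
# http://www.httpbin.org/get?key1=value1&key2=value2

# A list value repeats the key; a None value is left out
print(build_get_url("http://www.httpbin.org/get", {"tag": ["a", "b"], "skip": None}))
# http://www.httpbin.org/get?tag=a&tag=b

# Spaces are form-encoded as '+'
print(build_get_url("http://www.httpbin.org/get", {"q": "hello world"}))
# http://www.httpbin.org/get?q=hello+world
```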
headers: dictionary type, specifies the request headers
Example:
import requests

url = 'http://www.httpbin.org/headers'
headers = {
    # Identify as a desktop Chrome browser instead of python-requests
    'USER-AGENT': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'
}
response = requests.get(url=url, headers=headers)
print(response.text)
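Why set a User-Agent at all? By default requests identifies itself as python-requests/<version>, which some sites reject. The sketch below prepares a request through a Session (which merges in the default headers) so you can compare the headers that would be sent with and without a custom one; nothing goes over the network. The helper name outgoing_headers is hypothetical. Note that header names are case-insensitive, so the 'USER-AGENT' key replaces the default 'User-Agent'.

```python
import requests

BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36"
)


def outgoing_headers(url, headers=None):
    """Return the headers a Session would send for a GET to url (nothing is sent)."""
    with requests.Session() as session:
        request = requests.Request("GET", url, headers=headers)
        return session.prepare_request(request).headers


default = outgoing_headers("http://www.httpbin.org/headers")
custom = outgoing_headers("http://www.httpbin.org/headers", {"USER-AGENT": BROWSER_UA})

# The default identifies the client as python-requests/<version>
print(default["User-Agent"])
# Lookups are case-insensitive and the custom value replaced the default one
print(custom["user-agent"] == BROWSER_UA)  # True
```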
proxies: dictionary type, specifies the proxy to use for each URL scheme (http, https)
Example:
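import requests

url = 'http://www.httpbin.org/ip'
# Keys are URL schemes, and each key can hold only one proxy: writing
# 'http' twice in one dict silently keeps only the last value.
# This proxy address is only an example and is probably no longer reachable.
proxies = {
    'http': 'http://113.116.127.164:8123',
}
response = requests.get(url=url, proxies=proxies)
print(response.text)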
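A dict literal that repeats the 'http' key keeps only its last entry, so only one of two proxies would ever be used. The sketch below shows that, and then uses requests.utils.select_proxy, the helper requests uses internally to pick a proxy for a URL, to check which entry applies to which URL. The proxy addresses are the example ones from above and are assumed to be unreachable; nothing is sent.

```python
from requests.utils import select_proxy

# Both entries use the key 'http', so Python keeps only the last one
duplicated = {"http": "113.116.127.164:8123", "http": "113.116.127.164:80"}
print(duplicated)  # {'http': '113.116.127.164:80'}

# One entry per scheme; including the proxy's own scheme is the safest form
proxies = {
    "http": "http://113.116.127.164:8123",
    "https": "http://113.116.127.164:8123",
}

# The proxy is chosen by the scheme of the requested URL
print(select_proxy("http://www.httpbin.org/ip", proxies))   # http://113.116.127.164:8123
print(select_proxy("https://www.httpbin.org/ip", proxies))  # http://113.116.127.164:8123
print(select_proxy("ftp://example.com/file", proxies))      # None: no matching key
```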
cookies: dictionary type, specifies the cookies to send
Example:
import requests

url = 'http://www.httpbin.org/cookies'
cookies = {
    'name1': 'value1',
    'name2': 'value2',
}
response = requests.get(url=url, cookies=cookies)
print(response.text)
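On the wire, the cookies dict becomes a single Cookie request header. The sketch below prepares the request locally and reads that header back, without contacting the server; the helper name cookie_header is hypothetical.

```python
import requests


def cookie_header(url, cookies):
    """Return the Cookie header requests would send (the request is only prepared)."""
    prepared = requests.Request("GET", url, cookies=cookies).prepare()
    return prepared.headers.get("Cookie")


# Each pair is sent as name=value, separated by '; '
print(cookie_header("http://www.httpbin.org/cookies", {"name1": "value1", "name2": "value2"}))
```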
auth: tuple type, specifies the username and password for HTTP Basic authentication
Example:
import requests

# This httpbin endpoint accepts username "user" and password "password"
url = 'http://www.httpbin.org/basic-auth/user/password'
auth = ('user', 'password')
response = requests.get(url=url, auth=auth)
print(response.text)
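A (user, password) tuple is shorthand for requests.auth.HTTPBasicAuth. The sketch below prepares the request locally to show the resulting Authorization header, which is just "Basic " followed by base64("user:password"). Because base64 is an encoding, not encryption, Basic authentication should only be used over HTTPS.

```python
import base64

import requests
from requests.auth import HTTPBasicAuth

url = "http://www.httpbin.org/basic-auth/user/password"

# The tuple form and the explicit HTTPBasicAuth object are equivalent
with_tuple = requests.Request("GET", url, auth=("user", "password")).prepare()
with_class = requests.Request("GET", url, auth=HTTPBasicAuth("user", "password")).prepare()

print(with_tuple.headers["Authorization"])  # Basic dXNlcjpwYXNzd29yZA==

# Rebuild the header by hand to show what it contains
expected = "Basic " + base64.b64encode(b"user:password").decode("ascii")
print(with_tuple.headers["Authorization"] == expected)  # True
print(with_class.headers["Authorization"] == expected)  # True
```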
verify: Boolean type, specifies whether to verify the site's TLS (SSL) certificate when requesting an HTTPS website. The default is True, which means the certificate is verified; to skip verification, set it to False. (It also accepts the path to a CA bundle file.)
import requests

# Skip certificate verification (only advisable for testing)
response = requests.get(url='https://www.httpbin.org/', verify=False)
In this case, however, an InsecureRequestWarning is usually printed, because urllib3 warns whenever an HTTPS request is made without certificate verification.
If you do not want to see this warning, you can disable it as follows:
import urllib3

# Suppress only the warning about unverified HTTPS requests
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
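timeout: specifies the timeout in seconds. If the connection cannot be made, or the server sends no data, within that time, a requests.exceptions.Timeout exception is raised. A (connect timeout, read timeout) tuple can also be given.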
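The timeout parameter has no example above, so here is a minimal sketch that catches the exception instead of letting it crash the program. To make it testable offline it assumes a local socket that accepts connections but never replies; the helper names get_or_none and silent_server are hypothetical.

```python
import socket

import requests


def get_or_none(url, timeout):
    """GET url, returning None instead of raising if the server is too slow."""
    try:
        return requests.get(url, timeout=timeout)
    except requests.exceptions.Timeout:
        # Catches both ConnectTimeout and ReadTimeout
        return None


def silent_server():
    """Open a local socket that accepts connections but never replies."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    return server


if __name__ == "__main__":
    server = silent_server()
    port = server.getsockname()[1]
    # The TCP connection succeeds, but no response arrives within 0.5 s
    print(get_or_none(f"http://127.0.0.1:{port}/", timeout=0.5))  # None
    server.close()
```

Against a real site you might call get_or_none('http://www.httpbin.org/delay/5', timeout=2), which should return None because that endpoint waits about five seconds before answering.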
1.2.2 POST request
A POST request differs from a GET request in that its data is sent in the request body instead of the URL, so it does not appear in the address bar and is not subject to URL length limits (although servers may still limit the size of the body).
Almost all of the GET parameters above work the same way with POST. In addition to params, a POST request usually uses the data parameter.
data: dictionary type, specifying form information, commonly used when sending POST requests
Example:
import requests

url = 'http://www.httpbin.org/post'
data = {
    'key1': 'value1',
    'key2': 'value2',
}
# The dict is form-encoded into the request body, not the URL
response = requests.post(url=url, data=data)
print(response.text)
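To see where the data actually goes, the sketch below prepares a POST locally and inspects the body and Content-Type header. As a contrast it also uses the json parameter of requests, which is not covered in the text above and is shown only as an aside: it sends the same dict serialised as JSON instead of as a form. Nothing is sent over the network.

```python
import json

import requests

url = "http://www.httpbin.org/post"
payload = {"key1": "value1", "key2": "value2"}

# data=: the dict is form-encoded into the body; the URL stays unchanged
form = requests.Request("POST", url, data=payload).prepare()
print(form.url)                      # http://www.httpbin.org/post
print(form.body)                     # key1=value1&key2=value2
print(form.headers["Content-Type"])  # application/x-www-form-urlencoded

# json=: the dict is serialised as JSON instead
as_json = requests.Request("POST", url, json=payload).prepare()
print(as_json.headers["Content-Type"])      # application/json
print(json.loads(as_json.body) == payload)  # True
```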
1.3 requests response
1.3.1 Response attributes
After a GET or POST request, you receive a Response object. Its commonly used attributes and methods are:
response.url: returns the URL of the response (the final URL, after any redirects)
response.status_code: returns the HTTP status code of the response
response.encoding: returns the encoding used to decode response.text, taken from the Content-Type header (it may be None, and you can assign a new value)
response.cookies: returns the cookies set by the response
response.headers: returns the response headers
response.content: returns the response body as bytes
response.text: returns the response body as str, i.e. response.content decoded with response.encoding (if that is None, requests guesses the encoding from the body)
response.json(): parses the response body as JSON and returns the result (a dict when the body is a JSON object), roughly equivalent to json.loads(response.text)
import requests

response = requests.get('http://www.httpbin.org/get')
print(type(response))          # <class 'requests.models.Response'>
print(response.url)            # URL of the response
# http://www.httpbin.org/get
print(response.status_code)    # status code of the response
# 200
print(response.encoding)       # encoding of the response
# None (no charset in Content-Type; newer requests versions may report 'utf-8' for JSON)
print(response.cookies)        # cookie information
# <RequestsCookieJar[]>
print(response.headers)        # response headers
# {'Access-Control-Allow-Credentials': 'true', 'Access-Control-Allow-Origin': '*', 'Content-Encoding': 'gzip', 'Content-Type': 'application/json', 'Date': 'Mon, 16 Dec 2019 03:16:22 GMT', 'Referrer-Policy': 'no-referrer-when-downgrade', 'Server': 'nginx', 'X-Content-Type-Options': 'nosniff', 'X-Frame-Options': 'DENY', 'X-XSS-Protection': '1; mode=block', 'Content-Length': '189', 'Connection': 'keep-alive'}
print(type(response.content))  # response body as bytes
# <class 'bytes'>
print(type(response.text))     # response body as str
# <class 'str'>
print(type(response.json()))   # response body parsed from JSON
# <class 'dict'>
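The relationships between content, text, encoding and json() can be checked offline by building a Response object by hand. The sketch below does that by setting the private _content attribute, which is a demo-only shortcut rather than a supported API; the helper name fake_response is hypothetical. It also shows that json() returns whatever the JSON body contains, so a JSON array comes back as a list rather than a dict.

```python
import json

from requests.models import Response


def fake_response(body: bytes, content_type: str) -> Response:
    """Build a Response by hand so its attributes can be inspected offline.

    Setting the private _content attribute is only a demo/testing shortcut.
    """
    response = Response()
    response.status_code = 200
    response.headers["Content-Type"] = content_type
    response.encoding = "utf-8"
    response._content = body
    return response


r = fake_response(b'[{"id": 1}, {"id": 2}]', "application/json")
print(type(r.content), type(r.text))           # <class 'bytes'> <class 'str'>
print(r.text == r.content.decode(r.encoding))  # True: text is content decoded
print(r.json())                                # [{'id': 1}, {'id': 2}]
print(r.json() == json.loads(r.text))          # True
```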
1.3.2 Encoding problems
import requests

# Encoding problem
response = requests.get('http://www.autohome.com/news/')
# The Autohome page is GB2312-encoded, but when a text/html response does not
# declare a charset, requests falls back to ISO-8859-1, so the Chinese text is
# garbled unless the encoding is set to gbk (a superset of GB2312).
response.encoding = 'gbk'
print(response.text)
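The garbling can be reproduced offline. The sketch below builds a Response by hand whose body is GBK-encoded Chinese text (a hypothetical page body) served as text/html with no charset, derives the encoding with requests.utils.get_encoding_from_headers (as requests does when it builds a response), and then fixes it by assigning gbk. Setting the private _content attribute is a demo-only shortcut, and the helper name fake_html_response is hypothetical.

```python
from requests.models import Response
from requests.utils import get_encoding_from_headers


def fake_html_response(body: bytes) -> Response:
    """Build a text/html Response without a charset, for an offline demo."""
    response = Response()
    response.status_code = 200
    response.headers["Content-Type"] = "text/html"  # no charset declared
    response.encoding = get_encoding_from_headers(response.headers)
    response._content = body  # private attribute: demo-only shortcut
    return response


page = "汽车之家".encode("gbk")  # hypothetical GBK-encoded page body
r = fake_html_response(page)
print(r.encoding)            # ISO-8859-1
print(r.text == "汽车之家")  # False: decoded as ISO-8859-1, so garbled

r.encoding = "gbk"
print(r.text)                # 汽车之家
```

If you do not know the page's encoding in advance, response.apparent_encoding gives a guess based on the body that can be assigned to response.encoding, though the guess is not guaranteed to be right for short pages.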