requests Library
Although the urllib module in Python's standard library already covers most of the HTTP functionality we need, its API is awkward to use. Requests bills itself as "HTTP for Humans", which reflects how much more concise and convenient it is.
Installation and documentation:
Installing with pip is straightforward:
pip install requests
Chinese documentation: http://docs.python-requests.org/zh_CN/latest/index.html
GitHub address: https://github.com/requests/requests
Send GET request:
- The simplest way to send a GET request is to call requests.get:
response = requests.get("http://www.baidu.com/")
- Adding headers and query parameters:
If you want to add headers, pass them in through the headers parameter. If you want to pass query parameters in a URL, use the params parameter. Example code:
import requests

kw = {'wd': 'China'}
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36"}

# params accepts query parameters as a dictionary or string; a dictionary is
# URL-encoded automatically, so urlencode() is not needed
response = requests.get("http://www.baidu.com/s", params=kw, headers=headers)

# View the response body; response.text returns the data decoded to Unicode
print(response.text)
# View the response body as raw bytes via response.content
print(response.content)
# View the full URL, including the encoded query string
print(response.url)
# View the encoding used to decode the response body
print(response.encoding)
# View the response status code
print(response.status_code)
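Note the difference between response.text and response.content above: content is the raw byte stream, while text is those bytes decoded with an encoding requests guesses from the response headers, which is sometimes wrong. A minimal sketch of taking control of the decoding yourself (the UTF-8 choice here is an assumption about the target page):
import requests

response = requests.get("http://www.baidu.com/")

# Decode the raw bytes with the encoding the site actually uses
# (UTF-8 is assumed here; adjust it to match the page)
html = response.content.decode('utf-8')

# Or override the guessed encoding so response.text decodes correctly
response.encoding = 'utf-8'
html = response.text
print(html)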
Send POST request:
- The most basic POST request can be sent with the post method:
response = requests.post("http://www.baidu.com/", data=data)
- Passing in data:
There is no need to urlencode the data yourself; just pass in a dictionary. For example, the code for requesting the Lagou jobs API:
import requests

url = "https://www.lagou.com/jobs/positionAjax.json?city=%E6%B7%B1%E5%9C%B3&needAddtionalResult=false&isSchoolJob=0"
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36',
    'Referer': 'https://www.lagou.com/jobs/list_python?labelWords=&fromSearch=true&suginput='
}
data = {
    'first': 'true',
    'pn': 1,
    'kd': 'python'
}

resp = requests.post(url, headers=headers, data=data)
# If the response body is JSON, you can call the json method directly
print(resp.json())
Using proxies:
Adding a proxy with requests is also very simple: just pass the proxies parameter to the request method (such as get or post). Sample code:
import requests

url = "http://httpbin.org/get"
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36',
}
proxy = {
    'http': '171.14.209.180:27829'
}

resp = requests.get(url, headers=headers, proxies=proxy)
with open('xx.html', 'w', encoding='utf-8') as fp:
    fp.write(resp.text)
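The proxies dictionary maps each URL scheme to the proxy that should handle it, so http and https traffic can be routed separately. A minimal sketch with both schemes (the proxy addresses are placeholders, not working proxies):
import requests

# Placeholder proxy addresses; substitute real ones
proxies = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:1080',
}
resp = requests.get('http://httpbin.org/get', proxies=proxies)
print(resp.text)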
cookie:
If a response includes cookies, you can use the cookies property to get their values:
import requests

resp = requests.get('http://www.baidu.com/')
print(resp.cookies)
print(resp.cookies.get_dict())
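Cookies can also be attached to an outgoing request through the cookies parameter. A minimal sketch (the cookie name and value are made up for illustration):
import requests

# A made-up cookie just to demonstrate the parameter
cookies = {'session_token': 'abc123'}
resp = requests.get('http://httpbin.org/cookies', cookies=cookies)
# httpbin echoes back the cookies it received
print(resp.text)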
session:
With the urllib library, we used an opener to send multiple requests that share cookies. To share cookies across requests with the requests library, use the Session object it provides. Note that this is not the session from web development; here it simply represents one continuous browsing session. Taking the Renren login as an example again, implemented with requests:
import requests

url = "http://www.renren.com/PLogin.do"
data = {"email": "970138074@qq.com", 'password': "pythonspider"}
headers = {
    'User-Agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36"
}

# Sign in
session = requests.session()
session.post(url, data=data, headers=headers)

# Visit Dapeng's personal profile page
resp = session.get('http://www.renren.com/880151247/profile')
print(resp.text)
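A Session can also carry default headers for every request it sends, and it works as a context manager so its connection pool is released when you are done. A minimal sketch:
import requests

with requests.Session() as session:
    # Headers set here are sent with every request made through the session
    session.headers.update({'User-Agent': 'Mozilla/5.0'})
    resp = session.get('http://httpbin.org/headers')
    print(resp.text)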
Handling untrusted SSL certificates:
For websites with trusted SSL certificates, such as https://www.baidu.com/, requests returns the response directly. For a website whose certificate is not trusted, pass verify=False to skip certificate verification. Sample code:
resp = requests.get('http://www.12306.cn/mormhweb/', verify=False)
print(resp.content.decode('utf-8'))
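If you have the site's CA certificate, you can instead point verify at the bundle file rather than disabling verification entirely. A minimal sketch (the file path is a placeholder):
import requests

# Verify against a local CA bundle (placeholder path) instead of
# turning verification off with verify=False
resp = requests.get('https://www.12306.cn/mormhweb/', verify='/path/to/ca-bundle.crt')
print(resp.status_code)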