Basic use of the Requests crawler library

Keywords: Programming encoding Session Windows JSON

Basic use of Requests

install

  • pip install requests

I. Requests module: sending requests

  • Get web page (without parameters)
r = requests.get('http://www.chinahufei.com')
r = requests.post('http://www.chinahufei.com')
r = requests.delete('http://www.chinahufei.com')
r = requests.head('http://www.chinahufei.com')
r = requests.options('http://www.chinahufei.com')
  • Get web page (with parameters)
# get mode
r = requests.get("http://api.chinahufei.com", params = { 'page': 1 })
# post mode
r = requests.post('http://api.chinahufei.com', data = {'kwd':'hufei'})
# General mode
r = requests.request("get", "http://api.chinahufei.com/")
# Other
payload = {'page': '1', 'kwd': ['hufei', 'china']}
r = requests.get('http://api.chinahufei.com', params=payload)
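How `params` are encoded into the URL can be checked without sending anything, using Requests' `PreparedRequest` (the host is the example one from above):

```python
from requests.models import PreparedRequest

# Build the URL a GET request would use, without sending it
req = PreparedRequest()
req.prepare_url("http://api.chinahufei.com", {"page": "1", "kwd": ["hufei", "china"]})
print(req.url)
# List values are repeated in the query string: page=1&kwd=hufei&kwd=china
```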
  • Get web page (with header and UserAgent)
# get mode
kw = {'kwd':'The Great Wall'}
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36"}
# params accepts query parameters as a dictionary or a string. A dictionary is automatically URL-encoded, so urlencode() is not required
response = requests.get("http://api.chinahufei.com", params = kw, headers = headers)

# post mode
formdata = {
    "type":"AUTO",
    "i":"i love python",
    "doctype":"json",
    "xmlVersion":"1.8",
    "keyfrom":"fanyi.web",
    "ue":"UTF-8",
    "action":"FY_BY_ENTER",
    "typoResult":"true"
}
url = "http://api.chinahufei.com"
headers={ "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"}
r = requests.post(url, data = formdata, headers = headers)
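What the form-encoded POST body looks like can also be inspected offline with a `PreparedRequest` (a sketch with a trimmed-down form dictionary; the host is the example one from above):

```python
from requests.models import PreparedRequest

# Prepare a form POST without sending it, to inspect body and headers
req = PreparedRequest()
req.prepare(method="POST",
            url="http://api.chinahufei.com",
            data={"kwd": "hufei", "doctype": "json"},
            headers={"User-Agent": "Mozilla/5.0"})
print(req.body)                      # kwd=hufei&doctype=json
print(req.headers["Content-Type"])   # application/x-www-form-urlencoded
```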
  • Get web page (use proxy)
import requests
# Select a different proxy according to the protocol type
proxies = {
  "http": "http://12.34.56.79:9527",
  "https": "http://12.34.56.79:9527"
}

response = requests.get("http://api.chinahufei.com", proxies = proxies)
print(response.text)

# Private proxy authentication
import requests
# If the proxy requires HTTP Basic Auth, use the following format:
proxy = { "http": "http://mr_mao_hacker:sffqry9r@61.158.163.130:16816" }
response = requests.get("http://api.chinahufei.com", proxies = proxy)
print(response.text)
# Web client authentication
import requests
auth=('test', '123456')
response = requests.get('http://192.168.199.107', auth = auth)
print(response.text)
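Web client authentication only adds an `Authorization` header; a plain tuple like `('test', '123456')` is shorthand for `HTTPBasicAuth`. The header it produces can be inspected offline (the IP is the example one from above):

```python
from requests.auth import HTTPBasicAuth
from requests.models import PreparedRequest

# Prepare, but do not send, an authenticated request
req = PreparedRequest()
req.prepare(method="GET", url="http://192.168.199.107/",
            auth=HTTPBasicAuth("test", "123456"))
# Basic auth is just base64("user:password") in a header
print(req.headers["Authorization"])  # Basic dGVzdDoxMjM0NTY=
```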
  • Get web page (control redirection)
# Disallow automatic redirects
r = requests.head('http://github.com', allow_redirects=False)
  • HTTPS requests SSL certificate validation
# If we want to skip the certificate validation of 12306, set verify to False to request normally.
r = requests.get("https://www.12306.cn/mormhweb/", verify = False)

II. Requests module: the response

  • Response content - text (data in Unicode format)
  • Response content - content (byte-stream data)
  • Response content - json() (data of JSON type)
  • URL address - url (full address)
  • Response code - status_code
  • Response headers - headers
  • Response header character encoding - encoding
  • Cookies - cookies
import requests
response = requests.get("http://www.baidu.com/")
# Return CookieJar object
cookiejar = response.cookies
# Turning CookieJar into a dictionary
cookiedict = requests.utils.dict_from_cookiejar(cookiejar)
print(cookiejar)
print(cookiedict)
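The response attributes listed above can be exercised without a network call by fabricating a Response object (note: `_content` is an internal field, set here purely for illustration; a real response comes from `requests.get()`):

```python
import json
import requests

# Hand-built Response for illustration only
resp = requests.models.Response()
resp.status_code = 200
resp.url = "http://api.chinahufei.com/"
resp.headers["Content-Type"] = "application/json; charset=utf-8"
resp.encoding = "utf-8"
resp._content = json.dumps({"kwd": "hufei"}).encode("utf-8")

print(resp.status_code)  # 200
print(resp.content)      # raw bytes: b'{"kwd": "hufei"}'
print(resp.text)         # Unicode text, decoded with resp.encoding
print(resp.json())       # parsed JSON: {'kwd': 'hufei'}
```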
  • Session - session
# Renren simulated login
import requests
# 1. Create a session object to save the Cookie value
session = requests.session()
# 2. Prepare headers
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36"}
# 3. User name and password to log in
data = {"email":"mr_mao_hacker@163.com", "password":"alarmchime"}
# 4. Send the login request; the Cookie value returned after login is saved in the session
session.post("http://www.renren.com/PLogin.do", data = data, headers = headers)
# 5. The session now carries the logged-in user's Cookie, so pages that require login can be accessed directly
response = session.get("http://www.renren.com/410043129/profile")
# 6. Print the response content
print(response.text)
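The point of the session object is that cookies persist across requests. That persistence can be seen offline (a sketch with a made-up cookie name):

```python
import requests

s = requests.Session()
# A cookie stored on the session is sent with every subsequent request
s.cookies.set("sessionid", "abc123")
print(s.cookies.get("sessionid"))  # abc123
```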
  • Response history - history
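`history` holds the intermediate responses of a redirect chain. A fabricated chain, built offline for illustration (a real one is produced by e.g. `requests.get("http://github.com")` following the 301 to HTTPS):

```python
import requests

# Fabricated redirect chain for illustration only
first = requests.models.Response()
first.status_code = 301
first.url = "http://github.com/"

final = requests.models.Response()
final.status_code = 200
final.url = "https://github.com/"
final.history = [first]

print([r.status_code for r in final.history])  # [301]
print(final.url)                               # https://github.com/
```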

Three solutions to the encoding/decoding problem in the Requests module

  • response.content.decode() - decode the byte stream as UTF-8 (the default)
  • response.content.decode('gbk') - decode the byte stream with an explicit encoding
  • response.text - let Requests pick the encoding from the response headers (or its guess)
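The difference between the three shows up when the body is not UTF-8. A fabricated GBK response (again, `_content` is internal and set only for illustration):

```python
import requests

# A response whose body is GBK-encoded, as some Chinese sites return
resp = requests.models.Response()
resp._content = "长城".encode("gbk")

# .decode() with no argument would assume UTF-8 and fail on these bytes;
# passing the right codec decodes correctly
print(resp.content.decode("gbk"))  # 长城

# response.text decodes with resp.encoding (normally taken from the headers)
resp.encoding = "gbk"
print(resp.text)                   # 长城
```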

Posted by wefollow on Wed, 04 Dec 2019 15:37:24 -0800