Basic use of the Requests crawler library

Keywords: Programming encoding Session Windows JSON

Basic use of Requests

install

  • pip install requests

I. Requests module: sending requests

  • Get web page (without parameters)
r = requests.get('http://www.chinahufei.com')
r = requests.post('http://www.chinahufei.com')
r = requests.delete('http://www.chinahufei.com')
r = requests.head('http://www.chinahufei.com')
r = requests.options('http://www.chinahufei.com')
  • Get web page (with parameters)
# get mode
r = requests.get("http://api.chinahufei.com", params = { 'page': 1 })
# post mode
r = requests.post('http://api.chinahufei.com', data = {'kwd':'hufei'})
# General mode
r = requests.request("get", "http://api.chinahufei.com/")
# Other
payload = {'page': '1', 'kwd': ['hufei', 'china']}
r = requests.get('http://api.chinahufei.com', params=payload)
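How `params` are encoded into the URL can be checked without sending anything, using Requests' `PreparedRequest` (the host is the example one from above):

```python
from requests.models import PreparedRequest

# Build the URL a GET request would use, without sending it
req = PreparedRequest()
req.prepare_url("http://api.chinahufei.com", {"page": "1", "kwd": ["hufei", "china"]})
print(req.url)
# List values are repeated in the query string: page=1&kwd=hufei&kwd=china
```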
  • Get web page (with header and UserAgent)
# get mode
kw = {'kwd':'The Great Wall'}
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36"}
# params accepts query parameters as a dictionary or a string. A dictionary is automatically URL-encoded, so urlencode() is not required
response = requests.get("http://api.chinahufei.com", params = kw, headers = headers)

# post mode
formdata = {
    "type":"AUTO",
    "i":"i love python",
    "doctype":"json",
    "xmlVersion":"1.8",
    "keyfrom":"fanyi.web",
    "ue":"UTF-8",
    "action":"FY_BY_ENTER",
    "typoResult":"true"
}
url = "http://api.chinahufei.com"
headers={ "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"}
r = requests.post(url, data = formdata, headers = headers)
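What the form-encoded POST body looks like can also be inspected offline with a `PreparedRequest` (a sketch with a trimmed-down form dictionary; the host is the example one from above):

```python
from requests.models import PreparedRequest

# Prepare a form POST without sending it, to inspect body and headers
req = PreparedRequest()
req.prepare(method="POST",
            url="http://api.chinahufei.com",
            data={"kwd": "hufei", "doctype": "json"},
            headers={"User-Agent": "Mozilla/5.0"})
print(req.body)                      # kwd=hufei&doctype=json
print(req.headers["Content-Type"])   # application/x-www-form-urlencoded
```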
  • Get web page (use proxy)
import requests
# Select a different proxy according to the protocol type
proxies = {
  "http": "http://12.34.56.79:9527",
  "https": "http://12.34.56.79:9527"
}

response = requests.get("http://api.chinahufei.com", proxies = proxies)
print(response.text)

# Private proxy authentication
import requests
# If the proxy requires HTTP Basic Auth, use the following format:
proxy = { "http": "http://mr_mao_hacker:sffqry9r@61.158.163.130:16816" }
response = requests.get("http://api.chinahufei.com", proxies = proxy)
print(response.text)
# Web client authentication
import requests
auth=('test', '123456')
response = requests.get('http://192.168.199.107', auth = auth)
print(response.text)
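Web client authentication only adds an `Authorization` header; a plain tuple like `('test', '123456')` is shorthand for `HTTPBasicAuth`. The header it produces can be inspected offline (the IP is the example one from above):

```python
from requests.auth import HTTPBasicAuth
from requests.models import PreparedRequest

# Prepare, but do not send, an authenticated request
req = PreparedRequest()
req.prepare(method="GET", url="http://192.168.199.107/",
            auth=HTTPBasicAuth("test", "123456"))
# Basic auth is just base64("user:password") in a header
print(req.headers["Authorization"])  # Basic dGVzdDoxMjM0NTY=
```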
  • Get web page (control redirection)
# Disallow automatic redirects
r = requests.head('http://github.com', allow_redirects=False)
  • HTTPS requests SSL certificate validation
# If we want to skip the certificate validation of 12306, set verify to False to request normally.
r = requests.get("https://www.12306.cn/mormhweb/", verify = False)

II. Requests module: the response

  • Response content - text (data in Unicode format)
  • Response content - content (byte-stream data)
  • Response content - json() (data of JSON type)
  • URL address - url (full address)
  • Response code - status_code
  • Response headers - headers
  • Response header character encoding - encoding
  • Cookies - cookies
import requests
response = requests.get("http://www.baidu.com/")
# Return CookieJar object
cookiejar = response.cookies
# Turning CookieJar into a dictionary
cookiedict = requests.utils.dict_from_cookiejar(cookiejar)
print(cookiejar)
print(cookiedict)
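The response attributes listed above can be exercised without a network call by fabricating a Response object (note: `_content` is an internal field, set here purely for illustration; a real response comes from `requests.get()`):

```python
import json
import requests

# Hand-built Response for illustration only
resp = requests.models.Response()
resp.status_code = 200
resp.url = "http://api.chinahufei.com/"
resp.headers["Content-Type"] = "application/json; charset=utf-8"
resp.encoding = "utf-8"
resp._content = json.dumps({"kwd": "hufei"}).encode("utf-8")

print(resp.status_code)  # 200
print(resp.content)      # raw bytes: b'{"kwd": "hufei"}'
print(resp.text)         # Unicode text, decoded with resp.encoding
print(resp.json())       # parsed JSON: {'kwd': 'hufei'}
```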
  • Session - session
# Renren simulated login
import requests
# 1. Create a session object to save the Cookie value
session = requests.session()
# 2. Prepare headers
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36"}
# 3. User name and password to log in
data = {"email":"mr_mao_hacker@163.com", "password":"alarmchime"}
# 4. Send the login request; the Cookie value returned after login is saved in the session
session.post("http://www.renren.com/PLogin.do", data = data, headers = headers)
# 5. The session now carries the logged-in user's Cookie, so pages that require login can be accessed directly
response = session.get("http://www.renren.com/410043129/profile")
# 6. Print the response content
print(response.text)
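The point of the session object is that cookies persist across requests. That persistence can be seen offline (a sketch with a made-up cookie name):

```python
import requests

s = requests.Session()
# A cookie stored on the session is sent with every subsequent request
s.cookies.set("sessionid", "abc123")
print(s.cookies.get("sessionid"))  # abc123
```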
  • Response history - history
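`history` holds the intermediate responses of a redirect chain. A fabricated chain, built offline for illustration (a real one is produced by e.g. `requests.get("http://github.com")` following the 301 to HTTPS):

```python
import requests

# Fabricated redirect chain for illustration only
first = requests.models.Response()
first.status_code = 301
first.url = "http://github.com/"

final = requests.models.Response()
final.status_code = 200
final.url = "https://github.com/"
final.history = [first]

print([r.status_code for r in final.history])  # [301]
print(final.url)                               # https://github.com/
```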

Three solutions to the encoding/decoding problem in the Requests module

  • response.content.decode() - decode the byte stream as UTF-8 (the default)
  • response.content.decode('gbk') - decode the byte stream with an explicit encoding
  • response.text - let Requests pick the encoding from the response headers (or its guess)
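The difference between the three shows up when the body is not UTF-8. A fabricated GBK response (again, `_content` is internal and set only for illustration):

```python
import requests

# A response whose body is GBK-encoded, as some Chinese sites return
resp = requests.models.Response()
resp._content = "长城".encode("gbk")

# .decode() with no argument would assume UTF-8 and fail on these bytes;
# passing the right codec decodes correctly
print(resp.content.decode("gbk"))  # 长城

# response.text decodes with resp.encoding (normally taken from the headers)
resp.encoding = "gbk"
print(resp.text)                   # 长城
```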

Posted by wefollow on Wed, 04 Dec 2019 15:37:24 -0800