Python Network Request Library Requests: Mom Will Never Have to Worry About My Network Requests Again

Keywords: Python encoding JSON network

I. Overview

Network requests are an important part of almost every language. Although Python ships with a native networking library, urllib, some of its APIs are not very developer-friendly. Hence the rise of Requests: it offers a much more humane API while supporting all the features of urllib.

Requests supports HTTP connection keep-alive and connection pooling, cookie/session persistence, file upload, automatic decoding of response content, international URLs, and automatic encoding of POST data. Enough about the benefits; let's try it out.

I almost forgot to mention: Requests does not reinvent the wheel; under the hood it is still built on urllib3.

II. Preparation

  • Install requests Library
pip3 install requests

III. Usage

1. GET request:

import requests

response = requests.get("http://www.baidu.com/")

See? That's right: one short line of code initiates a GET request.
The returned response object has many attributes. Let's take a look:

response.text returns the body decoded as a string,
response.content returns the raw body in bytes (binary),
response.status_code is the HTTP status code,
response.request.headers are the request headers,
response.headers are the response headers,
response.encoding = 'utf-8' sets the encoding used to decode .text,
response.encoding gets the current encoding,
response.json() is the built-in JSON decoder: it returns the body parsed as JSON, provided the body actually is JSON; otherwise a parsing exception is raised.
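As a quick illustration of response.json(), here is a minimal sketch against the public httpbin.org echo service (the endpoint returns the request details back as JSON):

```python
import requests

# httpbin.org/get echoes the request details back as a JSON body
response = requests.get("https://httpbin.org/get", params={"key": "value"})

data = response.json()       # parse the JSON body into a Python dict
print(data["args"])          # the query parameters we sent: {'key': 'value'}
print(response.status_code)  # 200
```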
  • Add parameters and request headers
import requests

# Query parameter
kw = {'wd': '长城'}  # "Great Wall"

#Request header
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36"
}

# params accepts a dictionary or a string of query parameters;
# a dictionary is URL-encoded automatically, so urlencode() is not needed
response = requests.get(
    "http://www.baidu.com/s?",
    params = kw, 
    headers = headers
)

# View the response content. response.text returns data in Unicode format
print (response.text)

# View the response content; response.content returns byte-stream data
print (response.content)

# View full url address
print (response.url)

# View response header character encoding
print (response.encoding)

# View response code
print (response.status_code)

The returned content is as follows:

... too much to show

... too much to show

'http://www.baidu.com/s?wd=%E9%95%BF%E5%9F%8E'

'utf-8'

200

Note:

  • When response.text is used, Requests will automatically decode the response content based on the text encoding of the HTTP response, and most Unicode character sets can be decoded seamlessly.
  • When response.content is used, the original binary byte stream of server response data is returned, which can be used to save binary files such as pictures.

For example:

import requests

response = requests.get("http://www.sina.com")
print(response.request.headers)
print(response.content.decode('utf-8'))

Result:

<!DOCTYPE html>
<html>
<head>
    <meta http-equiv="Content-type" content="text/html; charset=utf-8" />
    <meta http-equiv="X-UA-Compatible" content="IE=edge" />
    <title>Sina Homepage</title>
    <meta name="keywords" content="Sina,Sina network,SINA,sina,sina.com.cn,Sina Homepage,Gateway,information" />
    ...

If response.text is used instead, garbled text may appear:

import requests
response = requests.get("http://www.sina.com")
print(response.text)

Result:

<!DOCTYPE html>
<!-- [ published at 2017-06-09 15:18:10 ] -->
<html>
<head>
    <meta http-equiv="Content-type" content="text/html; charset=utf-8" />
    <meta http-equiv="X-UA-Compatible" content="IE=edge" />
    <title>新浪首页</title>
    <meta name="keywords" content="新浪,新浪ç½',SINA,sina,sina.com.cn,新浪首页,门户,资讯" />
    <meta name="description" content="新浪ç½'为å...¨çƒç"¨æˆ·24小时提供å...¨é¢åŠæ—¶çš„中文资讯,å†...容覆盖国å†...外突å'新闻事件、ä½"坛赛事、娱乐时尚、产业资讯、实ç"¨ä¿¡æ¯ç­‰ï¼Œè®¾æœ‰æ–°é—»ã€ä½"育、娱乐、财经、ç§'技、房产、汽车等30多个å†...容é¢'é",同时开设博客、视é¢'、论坛等自ç"±äº'动交流空间。" />
    <link rel="mask-icon" sizes="any" href="//www.sina.com.cn/favicon.svg" color="red">
  • Reason

When a response is received, Requests guesses the encoding of the response, and that guess is used to decode the body when you access response.text. Requests first checks whether an encoding is specified in the HTTP headers; if not, it uses chardet.detect to try to guess the encoding (which can be wrong). That is why response.content.decode() is generally recommended.
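A minimal sketch of the options discussed above (response.apparent_encoding is the chardet-based guess; everything else is standard Requests API):

```python
import requests

response = requests.get("http://www.sina.com")

# The encoding Requests picked from the HTTP headers (may be wrong or None)
print(response.encoding)

# The encoding guessed from the body by chardet/charset_normalizer
print(response.apparent_encoding)

# Recommended: decode the raw bytes yourself with the known encoding
html = response.content.decode('utf-8')

# Alternatively, tell Requests the right encoding before reading .text
response.encoding = 'utf-8'
html = response.text
```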

2. POST request:

Website login and registration features are generally implemented with POST requests, so learning to send a POST request with Requests is also the first step toward crawling sites that require a login.

import requests

#Test request address
req_url = "https://httpbin.org/post"

#Form Data
formdata = {
    'username': 'test',
    'password': '123456',
}

#Add request header
req_header = {
    'User-Agent':'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36',
}
#Initiate request
response = requests.post(
    req_url, 
    data = formdata, 
    headers = req_header
)

print (response.text)
# If the response body is JSON, it can be parsed directly:
#print (response.json())

Result:

{
  "args": {},
  "data": "",
  "files": {},
  "form": {
    "password": "123456",
    "username": "test"
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Content-Length": "29",
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "httpbin.org",
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36"
  },
  "json": null,
  "origin": "223.72.76.90, 223.72.76.90",
  "url": "https://httpbin.org/post"
}
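Besides form data, Requests can also send a JSON body directly through the json= parameter, which serializes the dict and sets the Content-Type header for you. A minimal sketch against the same httpbin endpoint:

```python
import requests

payload = {'username': 'test', 'password': '123456'}

# json= sends application/json instead of a form-encoded body
response = requests.post("https://httpbin.org/post", json=payload)

# httpbin echoes the parsed JSON body back under the "json" key
print(response.json()["json"])
```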
  • Upload files
import requests

url = 'https://httpbin.org/post'

# Open the file in binary mode; the with block closes it after the upload
with open('image.png', 'rb') as f:
    files = {'file': f}
    response = requests.post(url, files=files)

print(response.text)

Result:

{
  "args": {},
  "data": "",
  "files": {
    "file": "....Too much content to show"
  },
  "form": {},
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Content-Length": "27597",
    "Content-Type": "multipart/form-data; boundary=16afeeccbaec949570677ca271dc2d0c",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.21.0"
  },
  "json": null,
  "origin": "223.72.76.90, 223.72.76.90",
  "url": "https://httpbin.org/post"
}
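When crawling for real, you will also want a timeout and an explicit status check; a minimal sketch (the 5-second timeout and the 404 test endpoint are just illustrations):

```python
import requests

try:
    # timeout= avoids hanging forever; httpbin.org/status/404 forces an error
    response = requests.get("https://httpbin.org/status/404", timeout=5)
    response.raise_for_status()  # raises HTTPError for 4xx/5xx status codes
except requests.exceptions.Timeout:
    print("request timed out")
except requests.exceptions.HTTPError as err:
    print("bad status:", err)
```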

IV. Summary

The above covers the basic GET and POST requests of Requests. The syntax is fairly simple, and this is also the first step in learning to crawl. Keep going, young coder!

Posted by SEVIZ on Tue, 25 Feb 2020 02:05:02 -0800