Python 3 requests module usage example

Keywords: Python JSON encoding network

Requests sends HTTP/1.1 requests on top of urllib3. It makes operations such as cookie handling, login verification, and proxy configuration easy to implement.

Python's built-in urllib module can access network resources, but it is cumbersome to use and lacks many practical advanced features.
A better choice is requests, a third-party Python library that makes working with URL resources very convenient.
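As a quick offline sketch of that convenience gap, compare building a query URL by hand with urllib against letting requests encode it. No request is actually sent here; the requests side is only prepared, so this runs without network access (httpbin.org is just a placeholder host):

```python
# Contrast: urllib needs manual query encoding; requests handles it via `params`.
# No network access is needed -- the request is only prepared, not sent.
from urllib.parse import urlencode
from requests import Request

query = {'q': 'python', 'page': 2}

# urllib style: encode the query string and concatenate by hand
urllib_url = 'http://httpbin.org/get?' + urlencode(query)

# requests style: the library encodes `params` for you
requests_url = Request('GET', 'http://httpbin.org/get', params=query).prepare().url

print(urllib_url)    # http://httpbin.org/get?q=python&page=2
print(requests_url)  # the same URL, with no manual encoding
```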

requests provides:

- Keep-alive and connection pooling
- International domain names and URLs
- Sessions with cookie persistence
- Browser-style SSL verification
- Automatic content decoding
- Basic/digest authentication
- Automatic decompression
- Unicode response bodies
- HTTP(S) proxy support
- Multipart file uploads
- Streaming downloads
- Connection timeouts
- Chunked requests
- .netrc support

requests is not installed with Python by default; install it with pip install requests. Its dependency tree:

requests==2.19.1
  - certifi [required: >=2017.4.17, installed: 2018.4.16]   # CA certificate bundle
  - chardet [required: <3.1.0,>=3.0.2, installed: 3.0.4]    # universal character-encoding detector
  - idna [required: <2.8,>=2.5, installed: 2.7]             # internationalized domain name support
  - urllib3 [required: <1.24,>=1.21.1, installed: 1.23]     # thread-safe HTTP library

Below are examples of using the requests module.

requests.request(method, url, **kwargs): construct and send a request, returning a Response object
 Parameters:

 method: HTTP method of the request (GET, POST, ...)
 url: URL of the request
 params: optional dict or bytes to send in the query string
 data: optional dict, list of tuples [(key, value)], bytes, or file-like object to send in the body
 json: optional JSON-serializable Python object to send in the body
 headers: optional dict of HTTP headers
 cookies: optional dict or CookieJar object of cookies to send
 files: optional dict for multipart-encoded uploads; a value may be a tuple of ('filename', fileobj, 'content_type', custom_headers), giving the file name, file object, a string defining the file's content type, and a dict-like object of extra headers
 auth: optional auth tuple or callable for custom HTTP authentication
 timeout: optional; how long (float or tuple) to wait for the server. A tuple sets separate connect and read timeouts; None means wait forever
 allow_redirects: optional boolean; enable or disable GET/OPTIONS/POST/PUT/PATCH/DELETE/HEAD redirection; defaults to True
 proxies: optional dict mapping protocol to proxy URL
 verify: optional; a boolean, or a path to the CA bundle used to verify the server's TLS certificate; defaults to True
 stream: optional; if False, the response content is downloaded immediately
 cert: optional; a string path to an SSL client certificate file, or a ('cert', 'key') tuple naming the certificate and key
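To see how several of these parameters combine, a request can be built and inspected locally without touching the network (the URL, header, and cookie values here are made up for illustration):

```python
import requests

# Prepare (but do not send) a request to inspect what the parameters produce
req = requests.Request(
    'GET',
    'http://httpbin.org/get',
    params={'name': 'py.qi', 'age': 22},
    headers={'User-Agent': 'my-app/1.0'},
    cookies={'session': 'abc123'},
)
prepared = req.prepare()

print(prepared.method)                 # GET
print(prepared.url)                    # http://httpbin.org/get?name=py.qi&age=22
print(prepared.headers['User-Agent'])  # my-app/1.0
print(prepared.headers['Cookie'])      # session=abc123
```

Actually sending it is then a single call, e.g. requests.request('GET', url, params=..., headers=...) or the equivalent shortcut requests.get(...).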
# The Response object contains the server's response to the HTTP request
import requests

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.79 Safari/537.36'}
r = requests.get('http://docs.python-requests.org/en/master/', headers=headers)
print('Encoding guessed by chardet:', r.apparent_encoding)
print('Response body as bytes:', r.content)
print('Response cookies:', r.cookies.items())
print('Time between request and response:', r.elapsed)
print('Response encoding:', r.encoding)
print('Response headers:', r.headers)
print('Server header:', r.headers['Server'])
print('Request history:', r.history)
print('Iterator over response lines:', r.iter_lines())
#print('Response JSON data:', r.json())
print('Parsed header links:', r.links)
print('Status code:', r.status_code)
print('Response body as str:', r.text)
print('Response URL:', r.url)
print('Headers that were sent:', r.request.headers)

1. GET: fetching web content

from requests import get

response = get('http://httpbin.org/get', params={'name': 'py.qi', 'age': 22})  # add query parameters
print(response.text)   # the result contains args, headers, url and origin IP
print(response.url)    # the combined URL: http://httpbin.org/get?name=py.qi&age=22
print(response.json()) # if the page returns JSON, json() parses it into a dict

# requests.head(url, **kwargs): send a HEAD request; url: the target URL; returns a Response object
from requests import head

header = head('https://github.com/get')
print('text:', header.text)        # a HEAD request returns no body content
print('headers:', header.headers)  # response headers
print(header.cookies.items())      # cookies as a list of tuples
import requests
import re

url = 'http://www.runoob.com/python3/python3-reg-expressions.html'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.79 Safari/537.36'
}
response = requests.get(url, headers=headers)
response.encoding = 'UTF-8'
#print(response.encoding)
#print(response.text)
pattern = re.compile('id="content">.*?<h1>(.*?)</h1>.*?<p>(.*?)</p><p>(.*?)</p>.*?<p>(.*?)</p>.*?<p>(.*?)</p>.*?<p>(.*?)</p>', re.S)
text = re.search(pattern, response.text)

for i in text.groups():
    print(i)

 

Python 3 regular expressions
 A regular expression is a special sequence of characters that lets you easily check whether a string matches a given pattern.
Python has included the re module since version 1.5, providing Perl-style regular expressions.
The re module gives Python full regular-expression functionality.
The compile function builds a regular expression object from a pattern string and optional flag arguments; the object has a set of methods for matching and substitution.
The re module also provides module-level functions equivalent to those methods, which take the pattern string as their first argument.
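A minimal, self-contained illustration of the two styles described above (a compiled pattern object versus a module-level function); the sample text is invented for the example:

```python
import re

text = 'requests 2.19.1 depends on urllib3 1.23'

# Style 1: compile once, then reuse the pattern object
pattern = re.compile(r'(\w+) (\d+\.\d+(?:\.\d+)?)')
m = pattern.search(text)
print(m.groups())  # ('requests', '2.19.1')

# Style 2: module-level function, with the pattern string as the first argument
versions = re.findall(r'\d+\.\d+(?:\.\d+)?', text)
print(versions)    # ['2.19.1', '1.23']
```

Compiling pays off when the same pattern is applied many times, e.g. once per scraped page.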
# Fetching binary data: BytesIO wraps the bytes in an in-memory file object, and Image.open turns it into an image object (a with-statement works too)
# Alternatively, write the bytes straight to a file; that also suits audio, video and other binary files
import requests
from io import BytesIO
from PIL import Image

url = 'http://docs.python-requests.org/en/master/_static/requests-sidebar.png'
r = requests.get(url)
i = Image.open(BytesIO(r.content))  # get an Image object
print(i.format, i.size, i.mode)     # the image's format, size in pixels, and pixel mode (e.g. RGB)
#i.show()                           # display the picture
i.save('requests_log.png')          # save the image data to a file

2. POST: sending data

import requests

data = {'k1': 'v1', 'k2': 'v2'}
r = requests.post('http://httpbin.org/post', data=data)  # send data as form data
body = r.json()      # parse the returned JSON into a dict
print(body['form'])  # the echoed form data
# Multiple file upload
import requests

url = 'http://httpbin.org/post'

multiple_files = [
    ('images1', ('11.jpg', open('11.jpg', 'rb'), 'image/jpeg')),
    ('images2', ('22.jpg', open('22.jpg', 'rb'), 'image/jpeg')),
]  # each tuple is (field name, (file name, file object, content type))
r = requests.post(url, files=multiple_files)
print(r.text)
# File upload: the files parameter specifies the files to upload; they are sent in the request body

import requests

url = 'http://httpbin.org/post'
files = {'file': open('network.csv', 'rb')}
files1 = {'file': ('filename.xls', open('filename.xls', 'rb'), 'application/vnd.ms-excel', {'Expires': '0'})}  # also sets the file name, content type and extra headers
r = requests.post(url, files=files)  # send the request with the file attached
print(r.json()['files'])

3. Adding request headers

Some websites restrict access based on the user agent (UA). The default UA sent by requests identifies itself as python-requests, so if the site blocks it, set a custom header:

import requests

headers = {'user-agent': 'Mozilla/5.0 (X11; U; Linux x86_64; zh-CN; rv:1.9.2.10) Gecko/20100922 Ubuntu/10.10 (maverick) Firefox/3.6.10'}
url = "https://httpbin.org/get"
r = requests.get(url, headers=headers, timeout=5)  # timeout sets the maximum wait in seconds

4. Basic authentication

If the website uses HTTP basic authentication, just add the auth parameter:

from requests.auth import HTTPBasicAuth

r = requests.get(url, headers=headers, timeout=5, auth=HTTPBasicAuth('username', 'password'))
# Because basic auth is so common, requests also accepts a plain tuple shorthand:
r = requests.get(url, headers=headers, timeout=5, auth=('username', 'password'))

5. GET: downloading files

r = requests.get('https://www.bobobk.com/wp-content/uploads/2018/12/wizard.webp', stream=True)
with open('download.webp', 'wb') as f:
    for chunk in r.iter_content(chunk_size=512 * 1024):
        if chunk:
            f.write(chunk)

With stream=True, the body is fetched in chunks as iter_content consumes it, so this approach can download large files without loading them entirely into memory.

6. POST: uploading files

Of course, you can also POST files directly: add the files parameter and use it like this:

url = 'https://httpbin.org/post'
files = {'file': ('myfile.xls', open('myfile.xls', 'rb'), 'application/vnd.ms-excel', {'Expires': '0'})}
r = requests.post(url, files=files)

7. Using cookies

Just pass the cookies parameter directly:

url = 'https://httpbin.org/cookies'
r = requests.get(url, cookies={"username": "bobobk"})
# If a page returns cookies, they are just as easy to read:
r.cookies
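The cookie persistence named in the feature list comes from requests.Session, which stores cookies in a jar and attaches them to every request it makes. A small offline sketch (the request is only prepared, not sent, so no network access is needed; the cookie value is made up):

```python
import requests

s = requests.Session()
s.cookies.set('username', 'bobobk')  # store a cookie in the session's CookieJar

# Every request prepared through the session carries the stored cookies
req = requests.Request('GET', 'https://httpbin.org/cookies')
prepared = s.prepare_request(req)
print(prepared.headers['Cookie'])  # username=bobobk
```

In normal use you would simply call s.get(url); cookies set by server responses are kept in s.cookies and sent back automatically on later requests.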

 


Posted by Cerebral Cow on Wed, 29 Jan 2020 01:30:35 -0800