requests sends HTTP/1.1 requests through urllib3 and makes it easy to handle cookies, login authentication, proxy settings, and similar tasks. Python's built-in urllib module can also access network resources, but it is cumbersome to use and lacks many practical advanced features. A better option is requests, a third-party Python library that makes working with URL resources very convenient.
Features implemented by requests:
- Keep-Alive and connection pooling
- International domain names and URLs
- Sessions with cookie persistence (see the sketch below)
- Browser-style SSL verification
- Automatic content decoding
- Basic and digest authentication
- Automatic decompression
- Unicode response bodies
- HTTP(S) proxy support
- Multipart file uploads
- Streaming downloads
- Connection timeouts
- Chunked requests
- .netrc support
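For instance, session and cookie persistence from the list above can be exercised with requests.Session; a minimal sketch, assuming httpbin.org as a test endpoint:

import requests

# Cookies set by the server persist across requests made on the same Session
with requests.Session() as s:
    s.get('https://httpbin.org/cookies/set/sessioncookie/123456')  # server sets a cookie
    r = s.get('https://httpbin.org/cookies')  # the cookie is sent back automatically
    print(r.json())  # {'cookies': {'sessioncookie': '123456'}}

A Session also reuses the underlying TCP connection (keep-alive), which speeds up repeated requests to the same host.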
requests is not installed by default in Python; if it is missing, install it with pip install requests. Its dependency tree:
requests==2.19.1
- certifi [required: >=2017.4.17, installed: 2018.4.16]  # CA certificate bundle
- chardet [required: <3.1.0,>=3.0.2, installed: 3.0.4]  # universal character encoding detector
- idna [required: <2.8,>=2.5, installed: 2.7]  # internationalized domain name support
- urllib3 [required: <1.24,>=1.21.1, installed: 1.23]  # thread-safe HTTP library
Here are some examples of using the requests module.
requests.request(method, url, **kwargs): construct and send a request, returning a Response object. Parameters:
- method: HTTP method of the request (GET, POST, ...)
- url: URL of the request
- params: optional dictionary or bytes to send in the query string
- data: optional dictionary, list of tuples [(key, value)], bytes, or file-like object, form-encoded and sent in the body
- json: optional JSON-serializable Python object sent in the body
- headers: optional dictionary of HTTP headers
- cookies: optional dict or CookieJar object to send with the request
- files: optional dictionary for multipart-encoded upload; each value may be a tuple of file name, file object, a string giving the file's content type, and a dict-like object with extra headers
- auth: optional tuple for custom HTTP authentication
- timeout: optional number of seconds (float or tuple) to wait for the request; a tuple sets the connect and read timeouts separately, and None means wait forever
- allow_redirects: optional boolean; enables or disables GET/OPTIONS/POST/PUT/PATCH/DELETE/HEAD redirection; default True
- proxies: optional dictionary mapping protocol to proxy URL
- verify: optional; a boolean, or a path to a CA bundle used to verify the server's TLS certificate; default True
- stream: optional; if False, the response content is downloaded immediately
- cert: optional; a string path to an SSL client certificate file, or a ('cert', 'key') tuple
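As a quick illustration, here is a minimal sketch exercising several of these parameters, assuming httpbin.org as a test endpoint:

import requests

# Exercise a few common parameters of requests.request
r = requests.request(
    'GET',
    'https://httpbin.org/get',
    params={'q': 'demo'},            # appended to the query string
    headers={'Accept': 'application/json'},
    timeout=(3.05, 27),              # (connect, read) timeouts in seconds
    allow_redirects=True,
    verify=True,                     # verify the server's TLS certificate (the default)
)
print(r.status_code, r.json()['args'])  # 200 {'q': 'demo'}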
# The Response object contains the server's response to the HTTP request
import requests

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.79 Safari/537.36'}
r = requests.get('http://docs.python-requests.org/en/master/', headers=headers)
print('Encoding guessed by chardet:', r.apparent_encoding)
print('Response content as bytes:', r.content)
print('Response cookies:', r.cookies.items())
print('Time between request and response:', r.elapsed)
print('Response encoding:', r.encoding)
print('Response headers:', r.headers)
print('Server field from the headers:', r.headers['Server'])
print('Request history:', r.history)
print('Iterator over response lines:', r.iter_lines())
#print('Response JSON-decoded data:', r.json())
print('Parsed link headers:', r.links)
print('Status code:', r.status_code)
print('Response content as str:', r.text)
print('Response URL:', r.url)
print('Headers that were sent:', r.request.headers)
1. Fetching web content with GET
from requests import get

response = get('http://httpbin.org/get', params={'name': 'py.qi', 'age': 22})  # add query parameters
print(response.text)  # the result contains the args parameters, headers, URL and IP information
print(response.url)  # the combined URL (http://httpbin.org/get?name=py.qi&age=22)
print(response.json())  # if the page returns JSON, the json() method parses it into a dictionary
# requests.head(url, **kwargs): send a HEAD request; url is the site's URL; returns a Response object
from requests import head

header = head('https://github.com/get')
print('text:', header.text)  # no body content is returned
print('headers:', header.headers)  # the response headers
print(header.cookies.items())  # list of cookie tuples
import requests
import re

url = 'http://www.runoob.com/python3/python3-reg-expressions.html'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.79 Safari/537.36'
}
response = requests.get(url, headers=headers)
response.encoding = 'UTF-8'
#print(response.encoding)
#print(response.text)
pattern = re.compile('id="content">.*?<h1>(.*?)</h1>.*?<p>(.*?)</p><p>(.*?)</p>.*?<p>(.*?)</p>.*?<p>(.*?)</p>.*?<p>(.*?)</p>', re.S)
text = re.search(pattern, response.text)
for i in text.groups():
    print(i)
Python 3 regular expressions: a regular expression is a special character sequence that makes it easy to check whether a string matches a given pattern. Python has included the re module since version 1.5, providing Perl-style regular expressions and giving Python full regular expression functionality. The compile function builds a regular expression object from a pattern string and optional flags; that object has a series of methods for matching and substitution. The re module also provides functions identical to these methods, which take a pattern string as their first argument.
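As a standalone illustration of compile, here is a minimal re sketch; the date pattern is just an assumed example:

import re

# Compile once, then reuse the pattern object for matching and substitution
pattern = re.compile(r'(\d{4})-(\d{2})-(\d{2})')
m = pattern.search('released on 2018-12-01')
print(m.groups())  # ('2018', '12', '01')
print(pattern.sub(r'\1/\2/\3', 'released on 2018-12-01'))  # released on 2018/12/01
# The module-level functions behave the same but take the pattern string first:
print(re.search(r'(\d{4})-(\d{2})-(\d{2})', 'released on 2018-12-01').groups())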
# Fetching a binary file (an image): BytesIO wraps the bytes in an in-memory object,
# and Image.open turns it into an image object; a context manager could be used instead.
# Writing the bytes straight to a file also works for audio, video and other files.
import requests
from io import BytesIO
from PIL import Image

url = 'http://docs.python-requests.org/en/master/_static/requests-sidebar.png'
r = requests.get(url)
i = Image.open(BytesIO(r.content))  # get an image object
print(i.format, i.size, i.mode)  # format, size in pixels, and pixel mode (e.g. RGB)
#print(i.show())  # display the picture
i.save('requests_log.png')  # save the image data to a file
2. Sending data with POST
import requests

data = {'k1': 'v1', 'k2': 'v2'}
r = requests.post('http://httpbin.org/post', data=data)  # send data as form data
body = r.json()  # get the returned data as a dictionary
print(body['form'])  # the form data echoed back
# Multiple file uploads
import requests

url = 'http://httpbin.org/post'
multiple_files = [
    ('images1', ('11.jpg', open('11.jpg', 'rb'), 'image/jpeg')),
    ('images2', ('22.jpg', open('22.jpg', 'rb'), 'image/jpeg')),
]  # each tuple holds the file name, file object and content type
r = requests.post(url, files=multiple_files)
print(r.text)
# Uploading a file: the files parameter specifies the file, which is sent in the body
import requests

url = 'http://httpbin.org/post'
files = {'file': open('network.csv', 'rb')}
files1 = {'file': ('filename.xls', open('filename.xls', 'rb'), 'application/vnd.ms-excel', {'expires': '0'})}  # set the file name explicitly
r = requests.post(url, files=files)  # send the request with the file attached
print(r.json()['files'])
3. Adding request headers
Some websites restrict access by User-Agent (UA). The default UA sent by requests identifies itself as requests, so you may need to supply your own headers:
import requests

headers = {'user-agent': 'Mozilla/5.0 (X11; U; Linux x86_64; zh-CN; rv:1.9.2.10) Gecko/20100922 Ubuntu/10.10 (maverick) Firefox/3.6.10'}
url = "https://httpbin.org/get"
r = requests.get(url, headers=headers, timeout=5)  # timeout limits how long to wait
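If the server does not answer within the limit, requests raises requests.exceptions.Timeout, which can be caught; a minimal sketch, assuming httpbin.org's /delay endpoint and a deliberately tiny limit:

import requests

try:
    # httpbin's /delay/3 endpoint waits 3 seconds before responding
    r = requests.get('https://httpbin.org/delay/3', timeout=0.001)
except requests.exceptions.Timeout:
    print('request timed out')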
4. Basic authentication
If the website uses basic authentication, just add the auth parameter:
from requests.auth import HTTPBasicAuth

r = requests.get(url, headers=headers, timeout=5, auth=HTTPBasicAuth('username', 'password'))
# Because HTTPBasicAuth is so common, requests also accepts a plain tuple:
r = requests.get(url, headers=headers, timeout=5, auth=('username', 'password'))
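The feature list above also mentions digest authentication; a minimal sketch with HTTPDigestAuth, assuming httpbin.org's test endpoint:

import requests
from requests.auth import HTTPDigestAuth

# httpbin answers 200 when the supplied credentials match the ones in the URL
r = requests.get('https://httpbin.org/digest-auth/auth/user/pass',
                 auth=HTTPDigestAuth('user', 'pass'))
print(r.status_code)  # 200 when authentication succeeds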
5. Downloading files with GET
r = requests.get('https://www.bobobk.com/wp-content/uploads/2018/12/wizard.webp', stream=True)  # stream=True defers the download
f = open('download.webp', 'wb')
for chunk in r.iter_content(chunk_size=512 * 1024):
    if chunk:
        f.write(chunk)
f.close()
Because the content is read in chunks rather than loaded all at once, this approach also works for large files.
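An equivalent sketch using context managers, so the connection and the file are closed automatically even if an error occurs (same assumed URL as above):

import requests

url = 'https://www.bobobk.com/wp-content/uploads/2018/12/wizard.webp'
with requests.get(url, stream=True) as r, open('download.webp', 'wb') as f:
    for chunk in r.iter_content(chunk_size=512 * 1024):
        if chunk:  # skip keep-alive chunks
            f.write(chunk)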
6. Uploading files with POST
Of course, you can also send files directly with POST by adding the files parameter:
url = 'https://httpbin.org/post'
files = {'file': ('myfile.xls', open('myfile.xls', 'rb'), 'application/vnd.ms-excel', {'Expires': '0'})}
r = requests.post(url, files=files)
7. Using cookies
Just pass the cookies parameter directly:
url = 'https://httpbin.org/cookies'
r = requests.get(url, cookies={"username": "bobobk"})
# If a page returns cookies, they are just as easy to read:
r.cookies
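For finer control, cookies can also be built explicitly with a RequestsCookieJar, which lets you scope them to a domain and path; a minimal sketch, again assuming httpbin.org:

import requests

jar = requests.cookies.RequestsCookieJar()
jar.set('username', 'bobobk', domain='httpbin.org', path='/cookies')  # scoped cookie
r = requests.get('https://httpbin.org/cookies', cookies=jar)
print(r.json())  # {'cookies': {'username': 'bobobk'}}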