I. Overview
Network requests are a core part of almost every language. Python ships with a built-in HTTP library, urllib, but some of its APIs are not very developer-friendly. That is where requests comes in: it offers a far more human API while still covering everything urllib can do.
Requests supports HTTP keep-alive and connection pooling, cookie persistence across sessions, file uploads, automatic decoding of response content, international URLs, and POST data encoding. Enough about the benefits; let's try it out.
One thing I almost forgot to mention: requests is not doing anything mysterious on its own; under the hood it is built on urllib3.
II. Preparation
- Install the requests library
pip3 install requests
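To make sure the installation worked, a quick sanity check is to import the library and print its version:

# Verify the installation by importing requests and printing its version
import requests

print(requests.__version__)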
III. Usage
1. GET requests
import requests

response = requests.get("http://www.baidu.com/")
That's right: a single short line of code sends a GET request.
The returned response object carries a number of useful attributes. Let's take a look:
- response.text returns the body decoded to a string
- response.content returns the body as raw bytes (binary)
- response.status_code is the HTTP status code
- response.request.headers are the headers that were sent with the request
- response.headers are the response headers
- response.encoding = 'utf-8' sets the encoding used for decoding
- response.encoding reads the current encoding
- response.json() is the built-in JSON decoder: it returns the body parsed as JSON, provided the body actually is JSON; otherwise it raises an exception
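Here is a minimal sketch that ties a few of these attributes together; it uses https://httpbin.org/get purely because that endpoint echoes the request back as JSON:

import requests

# httpbin.org/get echoes the request back as a JSON document
response = requests.get("https://httpbin.org/get")

print(response.status_code)    # e.g. 200
print(response.encoding)       # encoding guessed from the response headers
print(response.headers)        # response headers (a case-insensitive dict)
print(response.json()["url"])  # parsed JSON body; raises an error if the body is not JSON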
- Add parameters and request headers
import requests

# Query parameters
kw = {'wd': 'Beauty'}

# Request headers
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36"
}

# params accepts a dictionary or a string of query parameters;
# a dictionary is URL-encoded automatically, so urlencode() is not needed
response = requests.get(
    "http://www.baidu.com/s?",
    params=kw,
    headers=headers
)

# View the response body decoded as text (Unicode)
print(response.text)

# View the response body as a raw byte stream
print(response.content)

# View the full URL
print(response.url)

# View the response encoding
print(response.encoding)

# View the status code
print(response.status_code)
The returned content is as follows:
... too much to show ...
... too much to show ...
'http://www.baidu.com/s?wd=%E9%95%BF%E5%9F%8E'
'utf-8'
200
Note:
- When response.text is used, requests automatically decodes the body using the encoding it infers from the HTTP response; most Unicode character sets decode seamlessly.
- When response.content is used, you get the raw binary byte stream of the response body, which is what you want for saving binary files such as images (see the sketch below).
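As a quick illustration of the second point, here is a minimal sketch of downloading an image with response.content; the image URL is only a placeholder:

import requests

# Placeholder URL; substitute the address of a real image
image_url = "https://www.example.com/some_image.png"

response = requests.get(image_url)

# Write the raw bytes straight to disk
with open("some_image.png", "wb") as f:
    f.write(response.content)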
Here is an example of decoding response.content manually:
import requests

response = requests.get("http://www.sina.com")

print(response.request.headers)
print(response.content.decode('utf-8'))
Result:
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-type" content="text/html; charset=utf-8" />
<meta http-equiv="X-UA-Compatible" content="IE=edge" />
<title>Sina Homepage</title>
<meta name="keywords" content="Sina,Sina network,SINA,sina,sina.com.cn,Sina Homepage,Gateway,information" />
...
If response.text is used instead, the output may come out garbled, for example:
import requests

response = requests.get("http://www.sina.com")

print(response.text)
Result:
<!DOCTYPE html>
<!-- [ published at 2017-06-09 15:18:10 ] -->
<html>
<head>
<meta http-equiv="Content-type" content="text/html; charset=utf-8" />
<meta http-equiv="X-UA-Compatible" content="IE=edge" />
<title>新浪首页</title>
<meta name="keywords" content="新浪,新浪ç½',SINA,sina,sina.com.cn,新浪首页,门户,资讯" />
<meta name="description" content="新浪ç½'为å...¨çƒç"¨æˆ·24å°æ—¶æä¾›å...¨é¢åŠæ—¶çš„ä¸æ–‡èµ„讯,å†...容覆盖国å†...外çªå'新闻事件ã€ä½"å›èµ›äº‹ã€å¨±ä¹æ—¶å°šã€äº§ä¸šèµ„讯ã€å®žç"¨ä¿¡æ¯ç‰ï¼Œè®¾æœ‰æ–°é—»ã€ä½"育ã€å¨±ä¹ã€è´¢ç»ã€ç§'技ã€æˆ¿äº§ã€æ±½è½¦ç‰30多个å†...容é¢'é",åŒæ—¶å¼€è®¾åšå®¢ã€è§†é¢'ã€è®ºå›ç‰è‡ªç"±äº'动交æµç©ºé—´ã€‚" />
<link rel="mask-icon" sizes="any" href="//www.sina.com.cn/favicon.svg" color="red">
- Reason
When a response arrives, requests guesses its encoding and uses that guess to decode the body whenever you access response.text. It first looks for an encoding declared in the HTTP headers; if none is found, it falls back to chardet to detect one from the content, and that detection can be wrong. This is why response.content.decode() is the more reliable choice.
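If you still prefer response.text, you can check what the detector would guess with response.apparent_encoding and override the header-based guess yourself; a small sketch:

import requests

response = requests.get("http://www.sina.com")

print(response.encoding)           # encoding taken from the HTTP headers (may be wrong)
print(response.apparent_encoding)  # encoding detected from the body itself

# Override the guess before reading response.text
response.encoding = response.apparent_encoding
print(response.text)               # now decoded with the detected encoding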
2. POST requests
Website login and registration are generally implemented with POST requests, so learning to send a POST request with requests is also the first step toward crawling sites that require a login.
import requests

# Test request address
req_url = "https://httpbin.org/post"

# Form data
formdata = {
    'username': 'test',
    'password': '123456',
}

# Request headers
req_header = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36',
}

# Send the request
response = requests.post(
    req_url,
    data=formdata,
    headers=req_header
)

print(response.text)

# If the response is JSON, it can be parsed directly
# print(response.json())
Result:
{ "args": {}, "data": "", "files": {}, "form": { "password": "123456", "username": "test" }, "headers": { "Accept": "*/*", "Accept-Encoding": "gzip, deflate", "Content-Length": "29", "Content-Type": "application/x-www-form-urlencoded", "Host": "httpbin.org", "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36" }, "json": null, "origin": "223.72.76.90, 223.72.76.90", "url": "https://httpbin.org/post" }
- Upload files
import requests

url = 'https://httpbin.org/post'
files = {'file': open('image.png', 'rb')}

response = requests.post(url, files=files)
print(response.text)
Result:
{ "args": {}, "data": "", "files": { "file": "....Too much content to show" }, "form": {}, "headers": { "Accept": "*/*", "Accept-Encoding": "gzip, deflate", "Content-Length": "27597", "Content-Type": "multipart/form-data; boundary=16afeeccbaec949570677ca271dc2d0c", "Host": "httpbin.org", "User-Agent": "python-requests/2.21.0" }, "json": null, "origin": "223.72.76.90, 223.72.76.90", "url": "https://httpbin.org/post" }
IV. Summary
That covers the basic GET and POST requests with requests. The syntax is simple, and mastering it is the first step toward learning to crawl. Keep at it!