Introduction to Requests
- Requests is an HTTP library written in Python, built on top of urllib and released under the Apache 2 open source license.
- It is more convenient than urllib, saves a lot of work, and fully covers the needs of HTTP testing.
- In short, Requests is a simple HTTP library implemented in Python.
##### Installing Requests

```shell
pip install requests
```
# Requests in Practice
### 1. Initiating an HTTP request

```python
import requests

response = requests.get('https://www.baidu.com/')
print(type(response))
print(response.status_code)
print(type(response.text))
print(response.text)
print(response.cookies)
```
### 2. Various request types

```python
import requests

requests.post('http://httpbin.org/post')
requests.put('http://httpbin.org/put')
requests.delete('http://httpbin.org/delete')
requests.head('http://httpbin.org/get')
requests.options('http://httpbin.org/get')
```
### 3. Basic GET requests

```python
import requests

response = requests.get('http://httpbin.org/get')
print(response.text)
```
### 4. GET requests with parameters

```python
import requests

response = requests.get("http://httpbin.org/get?name=germey&age=22")
print(response.text)
```
Splicing parameters into the URL by hand can be inconvenient. The following approach passes them through the `params` argument and is equivalent:
```python
import requests

data = {
    'name': 'germey',
    'age': 22
}
response = requests.get("http://httpbin.org/get", params=data)
print(response.text)
```
### 5. Parsing JSON

```python
import requests
import json

response = requests.get("http://httpbin.org/get")
print(type(response.text))        # str
print(response.json())            # parsed JSON
print(json.loads(response.text))  # same result as response.json()
print(type(response.json()))      # dict
```
### 6. Getting binary data

```python
import requests

response = requests.get("https://github.com/favicon.ico")
print(type(response.text), type(response.content))  # str and bytes
print(response.text)     # garbled text
print(response.content)  # raw byte stream
```
Save the image:

```python
import requests

response = requests.get("https://github.com/favicon.ico")
with open('favicon.ico', 'wb') as f:
    f.write(response.content)  # the with block closes the file automatically
```
### 7. Adding headers

When writing crawlers, the server may refuse access if no headers are sent, for example:

```python
import requests

response = requests.get("https://www.zhihu.com/explore")
print(response.text)  # the response status code is 400: Bad Request
```
Add headers:

```python
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36'
}
response = requests.get("https://www.zhihu.com/explore", headers=headers)
print(response.text)  # returns the normal result
```
### 8. Basic POST requests

POST requests carry data from the user; a typical example is user login.

```python
import requests

data = {'name': 'germey', 'age': '22'}
response = requests.post("http://httpbin.org/post", data=data)
print(response.text)
```
[2] Response

1. Main attributes of the response

```python
import requests

response = requests.get('http://www.jianshu.com')
print(type(response.status_code), response.status_code)
print(type(response.headers), response.headers)
print(type(response.cookies), response.cookies)
print(type(response.url), response.url)
print(type(response.history), response.history)
```
2. Status code checks

```python
import requests

response = requests.get('http://www.jianshu.com/hello.html')
# requests.codes.not_found is the named constant for 404
if response.status_code == requests.codes.not_found:
    print('404 Not Found')
```

```python
import requests

response = requests.get('http://www.jianshu.com')
if response.status_code == 200:
    print('Request Successful')
```
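As an alternative to comparing status codes by hand, requests provides `Response.raise_for_status()`, which raises an `HTTPError` for any 4xx/5xx response. A minimal sketch, constructing a bare `Response` object by hand so no network access is needed:

```python
import requests
from requests.exceptions import HTTPError

# Build a Response by hand instead of making a real request
response = requests.models.Response()
response.status_code = 404

try:
    response.raise_for_status()  # raises HTTPError for 4xx/5xx codes
except HTTPError as e:
    print('Request failed:', e)
```

On a 2xx response `raise_for_status()` returns without raising, so it is a convenient one-liner after any real request.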
3. Common status codes (also a frequent interview question). The aliases below come from requests' internal status-code table:
```python
100: ('continue',),
101: ('switching_protocols',),
102: ('processing',),
103: ('checkpoint',),
122: ('uri_too_long', 'request_uri_too_long'),
200: ('ok', 'okay', 'all_ok', 'all_okay', 'all_good', '\o/', '✓'),
201: ('created',),
202: ('accepted',),
203: ('non_authoritative_info', 'non_authoritative_information'),
204: ('no_content',),
205: ('reset_content', 'reset'),
206: ('partial_content', 'partial'),
207: ('multi_status', 'multiple_status', 'multi_stati', 'multiple_stati'),
208: ('already_reported',),
226: ('im_used',),

# Redirection.
300: ('multiple_choices',),
301: ('moved_permanently', 'moved', '\o-'),
302: ('found',),
303: ('see_other', 'other'),
304: ('not_modified',),
305: ('use_proxy',),
306: ('switch_proxy',),
307: ('temporary_redirect', 'temporary_moved', 'temporary'),
308: ('permanent_redirect',
      'resume_incomplete', 'resume',),  # These 2 to be removed in 3.0

# Client Error.
400: ('bad_request', 'bad'),
401: ('unauthorized',),
402: ('payment_required', 'payment'),
403: ('forbidden',),
404: ('not_found', '-o-'),
405: ('method_not_allowed', 'not_allowed'),
406: ('not_acceptable',),
407: ('proxy_authentication_required', 'proxy_auth', 'proxy_authentication'),
408: ('request_timeout', 'timeout'),
409: ('conflict',),
410: ('gone',),
411: ('length_required',),
412: ('precondition_failed', 'precondition'),
413: ('request_entity_too_large',),
414: ('request_uri_too_large',),
415: ('unsupported_media_type', 'unsupported_media', 'media_type'),
416: ('requested_range_not_satisfiable', 'requested_range', 'range_not_satisfiable'),
417: ('expectation_failed',),
418: ('im_a_teapot', 'teapot', 'i_am_a_teapot'),
421: ('misdirected_request',),
422: ('unprocessable_entity', 'unprocessable'),
423: ('locked',),
424: ('failed_dependency', 'dependency'),
425: ('unordered_collection', 'unordered'),
426: ('upgrade_required', 'upgrade'),
428: ('precondition_required', 'precondition'),
429: ('too_many_requests', 'too_many'),
431: ('header_fields_too_large', 'fields_too_large'),
444: ('no_response', 'none'),
449: ('retry_with', 'retry'),
450: ('blocked_by_windows_parental_controls', 'parental_controls'),
451: ('unavailable_for_legal_reasons', 'legal_reasons'),
499: ('client_closed_request',),

# Server Error.
500: ('internal_server_error', 'server_error', '/o\', '✗'),
501: ('not_implemented',),
502: ('bad_gateway',),
503: ('service_unavailable', 'unavailable'),
504: ('gateway_timeout',),
505: ('http_version_not_supported', 'http_version'),
506: ('variant_also_negotiates',),
507: ('insufficient_storage',),
509: ('bandwidth_limit_exceeded', 'bandwidth'),
510: ('not_extended',),
511: ('network_authentication_required', 'network_auth', 'network_authentication'),
```
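The names on the right are not just documentation: `requests.codes` exposes each alias as an attribute, so status codes can be compared by name instead of by number. A small sketch:

```python
import requests

# Each alias maps to its numeric status code
print(requests.codes.ok)         # 200
print(requests.codes.not_found)  # 404
print(requests.codes.teapot)     # 418

# Dictionary-style lookup works as well
print(requests.codes['temporary_redirect'])  # 307
```

Writing `response.status_code == requests.codes.ok` is more readable than a bare `== 200`, which is why tutorials often use it.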
[3] Advanced Operations
(1) File upload

```python
import requests

files = {'file': open('favicon.ico', 'rb')}
response = requests.post("http://httpbin.org/post", files=files)
print(response.text)
```
(2) Getting cookies

```python
import requests

response = requests.get("https://www.baidu.com")
print(response.cookies)
for key, value in response.cookies.items():
    print(key + '=' + value)
```
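Cookies can be sent as well as read: a `RequestsCookieJar` built by hand can be passed to any request through the `cookies` parameter. A minimal sketch (the cookie name and domain here are purely illustrative):

```python
import requests
from requests.cookies import RequestsCookieJar

jar = RequestsCookieJar()
jar.set('number', '123456789', domain='httpbin.org', path='/')

# The jar behaves like a dict
print(jar.get('number'))  # 123456789

# Passing it sends the cookie with the request, e.g.:
# requests.get('http://httpbin.org/cookies', cookies=jar)
```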
(3) Using a Session to maintain session state

```python
import requests

s = requests.Session()
# Test endpoint that sets a cookie; the Session keeps it across requests,
# just as a browser would
s.get('http://httpbin.org/cookies/set/number/123456789')
response = s.get('http://httpbin.org/cookies')
print(response.text)
```
Output:

```
{
  "cookies": {
    "number": "123456789"
  }
}
```
(4) Certificate verification

Many websites today use HTTPS. For HTTPS, requests verifies the SSL certificate first and raises an `SSLError` if verification fails (browsers show a similar warning, such as "your connection is not private"). There are two ways to deal with this:
[1] Disable certificate verification with `verify=False`: the request then returns status code 200, but a warning is still printed.

```python
import requests

response = requests.get('https://www.12306.cn', verify=False)
print(response.status_code)
```
To suppress the warning:

```python
import requests
from requests.packages import urllib3

urllib3.disable_warnings()
response = requests.get('https://www.12306.cn', verify=False)
print(response.status_code)
```
[2] Specify a certificate manually:

```python
import requests

response = requests.get('https://www.12306.cn', cert=('/path/server.crt', '/path/key'))
print(response.status_code)
```
(5) Proxy settings

```python
import requests

proxies = {
    "http": "http://127.0.0.1:9743",
    "https": "https://127.0.0.1:9743",
}
response = requests.get("https://www.taobao.com", proxies=proxies)
print(response.status_code)
```
For proxies that require a username and password:

```python
import requests

proxies = {
    "http": "http://user:password@127.0.0.1:9743/",  # credentials go in the URL
}
response = requests.get("https://www.taobao.com", proxies=proxies)
print(response.status_code)
```
SOCKS proxy: install the extra first

```shell
pip3 install 'requests[socks]'
```
Then configure the proxies:

```python
import requests

proxies = {
    'http': 'socks5://127.0.0.1:9742',
    'https': 'socks5://127.0.0.1:9742'
}
response = requests.get("https://www.taobao.com", proxies=proxies)
print(response.status_code)
```
(6) Timeout settings

```python
import requests
from requests.exceptions import ReadTimeout

try:
    response = requests.get("http://httpbin.org/get", timeout=0.5)
    print(response.status_code)
except ReadTimeout:
    print('Timeout')
```
(7) Sites that require login authentication

```python
import requests
from requests.auth import HTTPBasicAuth

r = requests.get('http://120.27.34.24:9001', auth=HTTPBasicAuth('user', '123'))
print(r.status_code)
```
The following shorthand is equivalent:

```python
import requests

r = requests.get('http://120.27.34.24:9001', auth=('user', '123'))
print(r.status_code)
```
(8) Exception handling

```python
import requests
from requests.exceptions import ReadTimeout, ConnectionError, RequestException

try:
    response = requests.get("http://httpbin.org/get", timeout=0.5)
    print(response.status_code)
except ReadTimeout:
    print('Timeout')
except ConnectionError:
    print('Connection error')
except RequestException:
    print('Error')
```
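The order of the `except` clauses above matters: in requests' exception hierarchy, `ReadTimeout` and `ConnectionError` are both subclasses of `RequestException`, so the most specific handlers must come first or the base class would swallow everything. This can be checked directly:

```python
from requests.exceptions import ReadTimeout, ConnectionError, RequestException

# Both specific exceptions derive from the base RequestException,
# so `except RequestException` placed first would catch them all
print(issubclass(ReadTimeout, RequestException))      # True
print(issubclass(ConnectionError, RequestException))  # True
```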
These are the most common ways to use requests.