Summary of usage of python requests

Keywords: Session github Python SSL

Requests is a very practical Python HTTP client library, commonly used for writing crawlers and for testing server responses. It covers almost every HTTP need you are likely to have today.

This article is based entirely on the official documentation:
http://docs.python-requests.org/en/master/
Installation is usually done with pip install requests; see the official documentation for other installation methods.

HTTP - requests

    import requests

GET request

    r = requests.get('http://httpbin.org/get')

Passing URL parameters

    >>> payload = {'key1': 'value1', 'key2': 'value2', 'key3': None}
    >>> r = requests.get('http://httpbin.org/get', params=payload)
    >>> print(r.url)
    http://httpbin.org/get?key2=value2&key1=value1

Note that any dictionary key whose value is None will not be added to the URL's query string.
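
This dropping of None values can be seen without any network call by preparing the URL directly. PreparedRequest is the class Requests uses internally to build requests:

```python
import requests

# Prepare a URL offline to show that None-valued keys are dropped
# from the query string.
req = requests.models.PreparedRequest()
req.prepare_url('http://httpbin.org/get', {'key1': 'value1', 'key3': None})
print(req.url)  # 'key3' does not appear in the query string
```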

A parameter value can also be a list:

    >>> payload = {'key1': 'value1', 'key2': ['value2', 'value3']}
    >>> r = requests.get('http://httpbin.org/get', params=payload)
    >>> print(r.url)

http://httpbin.org/get?key1=value1&key2=value2&key2=value3

r.text returns the body decoded using the encoding taken from the headers; you can change the decoding with r.encoding = 'gbk'
r.content returns the body as raw bytes
r.json() parses the body as JSON and may raise an exception
r.status_code returns the HTTP status code
r.raw returns the raw socket response; it requires the parameter stream=True
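
As a rough offline illustration of these attributes, a Response can be constructed by hand. Setting the private _content attribute is purely for demonstration; real code gets a Response from requests.get():

```python
import json
from requests.models import Response

# Build a Response manually, for illustration only.
r = Response()
r.status_code = 200
r.encoding = 'utf-8'
r._content = json.dumps({'msg': 'hello'}).encode('utf-8')  # private, demo only

print(r.text)     # body decoded as str using r.encoding
print(r.content)  # raw bytes
print(r.json())   # parsed JSON
```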

    >>> r = requests.get('https://api.github.com/events', stream=True)
    >>> r.raw
    <requests.packages.urllib3.response.HTTPResponse object at 0x101194810>
    >>> r.raw.read(10)
    '\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03'

To save the result to a file, use r.iter_content():

    # e.g. chunk_size=128 reads the body 128 bytes at a time
    with open(filename, 'wb') as fd:
        for chunk in r.iter_content(chunk_size=128):
            fd.write(chunk)

Passing headers

    >>> headers = {'user-agent': 'my-app/0.0.1'}
    >>> r = requests.get(url, headers=headers)

Passing cookies

    >>> url = 'http://httpbin.org/cookies'
    >>> r = requests.get(url, cookies=dict(cookies_are='working'))
    >>> r.text
    '{"cookies": {"cookies_are": "working"}}'
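
Cookies can also be supplied via a RequestsCookieJar, which supports domain and path scoping. A small offline sketch (the request line is commented out):

```python
import requests

# Build a cookie jar with domain/path scoping instead of a plain dict.
jar = requests.cookies.RequestsCookieJar()
jar.set('tasty_cookie', 'yum', domain='httpbin.org', path='/cookies')

# r = requests.get('http://httpbin.org/cookies', cookies=jar)  # would send it
print(jar.get('tasty_cookie'))  # yum
```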

POST request

Sending a form

    r = requests.post('http://httpbin.org/post', data={'key': 'value'})

Often you want to send data encoded as a form, much like an HTML form. To do this, simply pass a dictionary to the data parameter; it will be form-encoded automatically when the request is made:

    >>> payload = {'key1': 'value1', 'key2': 'value2'}
    >>> r = requests.post("http://httpbin.org/post", data=payload)
    >>> print(r.text)
    {
      ...
      "form": {
        "key2": "value2",
        "key1": "value1"
      },
      ...
    }

Often the data you want to send is not form-encoded. If you pass a string instead of a dict, the data is posted directly as the request body:

    >>> import json
    >>> url = 'https://api.github.com/some/endpoint'
    >>> payload = {'some': 'data'}
    >>> r = requests.post(url, data=json.dumps(payload))

or:

    >>> r = requests.post(url, json=payload)
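
What the json parameter actually sends can be inspected offline by preparing the request (the URL is the placeholder from above):

```python
import requests

# Preparing the request shows that json= serializes the body
# and sets the Content-Type header.
req = requests.Request('POST', 'https://api.github.com/some/endpoint',
                       json={'some': 'data'})
prepared = req.prepare()

print(prepared.headers['Content-Type'])  # application/json
print(prepared.body)
```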

Uploading files

    >>> url = 'http://httpbin.org/post'
    >>> files = {'file': open('report.xls', 'rb')}
    >>> r = requests.post(url, files=files)

You can set the filename, content_type and headers explicitly:

    files = {'file': ('report.xls', open('report.xls', 'rb'), 'application/vnd.ms-excel', {'Expires': '0'})}

You can also send a string to be received as a file:

    files = {'file': ('report.csv', 'some,data,to,send\nanother,row,to,send\n')}

Response

    r.status_code
    r.headers
    r.cookies

Redirection

By default Requests will perform location redirection for all verbs except HEAD.

    >>> r = requests.get('http://httpbin.org/cookies/set?k2=v2&k1=v1')
    >>> r.url
    'http://httpbin.org/cookies'

    >>> r.status_code
    200

    >>> r.history
    [<Response [302]>]

If you're using HEAD, you can enable redirection as well:

    r = requests.head('http://httpbin.org/cookies/set?k2=v2&k1=v1', allow_redirects=True)

You can tell Requests to stop waiting for a response after a given number of seconds with the timeout parameter:

    requests.get('http://github.com', timeout=0.001)

Advanced features

Source:
http://docs.python-requests.org/en/master/user/advanced/#advanced

A Session automatically persists cookies. You can also set default request parameters on it, and they will be sent with every subsequent request:

    s = requests.Session()
    s.get('http://httpbin.org/cookies/set/sessioncookie/123456789')
    r = s.get('http://httpbin.org/cookies')

    print(r.text)
    # '{"cookies": {"sessioncookie": "123456789"}}'

A Session can provide default data. Function-level parameters are merged with session-level ones; if a key appears in both, the function-level value wins. To remove a session-level parameter for a single request, pass a dict with that key set to None.

    s = requests.Session()
    s.auth = ('user', 'pass')  # authentication credentials
    s.headers.update({'x-test': 'true'})

    # both 'x-test' and 'x-test2' are sent
    s.get('http://httpbin.org/headers', headers={'x-test2': 'true'})

Data passed in a function parameter is used only for that single request and is not saved on the session.

For example, these cookies are sent with this request only:

    r = s.get('http://httpbin.org/cookies', cookies={'from-my': 'browser'})

A session can also be closed automatically by using it as a context manager:

    with requests.Session() as s:
        s.get('http://httpbin.org/cookies/set/sessioncookie/123456789')

The response object contains not only all the information about the response but also the request that produced it:

    r = requests.get('http://en.wikipedia.org/wiki/Monty_Python')
    r.headers
    r.request.headers

SSL Certificate Verification

Requests can verify SSL certificates for HTTPS requests, just like a web browser. To check a host's SSL certificate, use the verify parameter:

    >>> requests.get('https://kennethreitz.com', verify=True)
    requests.exceptions.SSLError: hostname 'kennethreitz.com' doesn't match either of '*.herokuapp.com', 'herokuapp.com'

SSL is not set up for that domain, so verification fails. GitHub, however, has SSL set up:

    >>> requests.get('https://github.com', verify=True)
    <Response [200]>

For private certificates, you can pass a path to a CA_BUNDLE file with verify. You can also set the REQUESTS_CA_BUNDLE environment variable.

    >>> requests.get('https://github.com', verify='/path/to/certfile')

If you set verify to False, Requests will ignore SSL certificate validation.

    >>> requests.get('https://kennethreitz.com', verify=False)
    <Response [200]>

By default, verify is set to True. The verify option applies only to host certificates.
You can also specify a local certificate to use as a client certificate: either a single file (containing the private key and the certificate) or a tuple of both files' paths:

    >>> requests.get('https://kennethreitz.com', cert=('/path/server.crt', '/path/key'))
    <Response [200]>

Body content workflow

By default, the body of a response is downloaded immediately when you make a request. You can override this behaviour with the stream parameter, which defers downloading the response body until you access the Response.content attribute:

    tarball_url = 'https://github.com/kennethreitz/requests/tarball/master'
    r = requests.get(tarball_url, stream=True)

At this point only the response headers have been downloaded and the connection remains open, which lets us fetch the content conditionally:

    # TOO_LONG is a size limit (in bytes) of your own choosing
    if int(r.headers['content-length']) < TOO_LONG:
        content = r.content
        ...

If stream is set to True, the request connection will not be closed unless all data is read or Response.close is called.

You can use contextlib.closing to close the connection automatically:

    import requests
    from contextlib import closing

    tarball_url = 'https://github.com/kennethreitz/requests/tarball/master'
    file = r'D:\Documents\WorkSpace\Python\Test\Python34Test\test.tar.gz'

    with closing(requests.get(tarball_url, stream=True)) as r:
        with open(file, 'wb') as f:
            for data in r.iter_content(1024):
                f.write(data)

Keep-Alive

Any request you make within the same session will automatically reuse the appropriate connection!
Note: a connection is only released back to the pool for reuse after all of its body data has been read; be sure to either leave stream set to False or read the content attribute of the Response object.
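
The pool behind keep-alive can be tuned by mounting a transport adapter on the session. The sizes below are illustrative, not recommendations:

```python
import requests
from requests.adapters import HTTPAdapter

# Mount an adapter with a larger pool and automatic retries
# for every URL starting with https://.
s = requests.Session()
adapter = HTTPAdapter(pool_connections=10, pool_maxsize=20, max_retries=3)
s.mount('https://', adapter)

print(s.get_adapter('https://github.com'))  # the adapter chosen for this URL
```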

Streaming Upload

Requests supports streaming uploads, which let you send large streams or files without reading them into memory first. To stream an upload, simply provide a file-like object as your request body.
Open the file in binary mode so that Requests can generate the correct Content-Length:

    with open('massive-body', 'rb') as f:
        requests.post('http://some.url/streamed', data=f)

Chunked transfer encoding

Requests also supports chunked transfer encoding for outgoing and incoming requests. To send a chunk-encoded request, simply provide a generator for your request body.
Note that the generator should yield bytes:

    def gen():
        yield b'hi'
        yield b'there'

    requests.post('http://some.url/chunked', data=gen())

For chunked encoded responses, it's best to iterate over the data using Response.iter_content(). In an ideal situation you'll have set stream=True on the request, in which case you can iterate chunk-by-chunk by calling iter_content with a chunk size parameter of None. If you want to set a maximum size of the chunk, you can set a chunk size parameter to any integer.

POST Multiple Multipart-Encoded Files

Multiple files can be uploaded in a single request, for example for an HTML form field such as:

    <input type="file" name="images" multiple="true" required="true"/>

To do that, just set files to a list of tuples of (form_field_name, file_info):

    >>> url = 'http://httpbin.org/post'
    >>> multiple_files = [
    ...     ('images', ('foo.png', open('foo.png', 'rb'), 'image/png')),
    ...     ('images', ('bar.png', open('bar.png', 'rb'), 'image/png'))]
    >>> r = requests.post(url, files=multiple_files)
    >>> r.text
    {
      ...
      'files': {'images': 'data:image/png;base64,iVBORw ....'},
      'Content-Type': 'multipart/form-data; boundary=3131623adb2043caaeb5538cc7aa0b3a',
      ...
    }

Custom Authentication

Requests allows you to specify your own authentication mechanism.
Any callable passed as the auth argument to a request method gets a chance to modify the request before it is dispatched.
Authentication implementations are subclasses of requests.auth.AuthBase and are easy to define. Requests provides two common schemes in requests.auth: HTTPBasicAuth and HTTPDigestAuth.
Let's pretend we have a web service that only responds if the X-Pizza header is set to a password value. Unlikely, but just go with it.

    from requests.auth import AuthBase

    class PizzaAuth(AuthBase):
        """Attaches HTTP Pizza Authentication to the given Request object."""
        def __init__(self, username):
            # setup any auth-related data here
            self.username = username

        def __call__(self, r):
            # modify and return the request
            r.headers['X-Pizza'] = self.username
            return r

Then we can make a request using our Pizza Auth:

    >>> requests.get('http://pizzabin.org/admin', auth=PizzaAuth('kenneth'))
    <Response [200]>
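
The built-in HTTPBasicAuth works the same way as a custom callable; its effect can be inspected offline on a prepared request:

```python
import requests
from requests.auth import HTTPBasicAuth

# HTTPBasicAuth is a callable that sets the Authorization header.
req = requests.Request('GET', 'https://api.github.com/user',
                       auth=HTTPBasicAuth('user', 'pass'))
prepared = req.prepare()

print(prepared.headers['Authorization'])  # Basic dXNlcjpwYXNz
```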

Streaming Requests

With stream=True you can iterate over the response data as it arrives:

    r = requests.get('http://httpbin.org/stream/20', stream=True)
    for line in r.iter_lines():
        if line:
            print(line)

Proxies

If you need to use a proxy, you can configure individual requests with the proxies argument to any request method:

    import requests

    proxies = {
        'http': 'http://10.10.1.10:3128',
        'https': 'http://10.10.1.10:1080',
    }

    requests.get('http://example.org', proxies=proxies)

To use HTTP Basic Auth with your proxy, use the http://user:password@host/ syntax:

    proxies = {'http': 'http://user:pass@10.10.1.10:3128/'}
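
Proxies can also be configured once on a Session so that every request made through it uses them (addresses here are placeholders):

```python
import requests

# Session-level proxies apply to every request made through the session.
s = requests.Session()
s.proxies.update({'http': 'http://user:pass@10.10.1.10:3128/'})

print(s.proxies['http'])
```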

Timeouts

If you specify a single value for the timeout, like this:

    r = requests.get('https://github.com', timeout=5)

The timeout value will be applied to both the connect and the read timeouts. Specify a tuple if you would like to set the values separately:

    r = requests.get('https://github.com', timeout=(3.05, 27))

If the remote server is very slow, you can tell Requests to wait forever for a response by passing None as the timeout value (and then go get a cup of coffee).

    r = requests.get('https://github.com', timeout=None)

Original:
http://www.cnblogs.com/lilinwei340/p/6417689.html

Posted by xyn on Fri, 12 Jul 2019 19:07:35 -0700