Scrapy Framework--Getting, Passing, and Saving Cookies Locally

Keywords: Python

Environment: Python 3.6 + Scrapy 1.4
What I want to do:
1. Complete a simulated login
2. After a successful login, extract the cookies and save them to a local cookies.txt file
3. On later runs, read the cookies back from the local cookies.txt and skip the simulated login


I read several blog posts about how Scrapy handles cookies, and they all say very different things (and most of them didn't work well for me). Below I summarize what I actually got working for extracting and passing cookies, so you can test it hands-on.

Passing and extracting cookies

from scrapy.http.cookies import CookieJar    # wraps the standard library's http.cookiejar.CookieJar

# Instantiate a cookiejar object
cookie_jar = CookieJar()

# First: extracting the cookies
class MySpider(scrapy.Spider):
    ....
    ....
    # Simulate login, then call a function to check if the login was successful
    def login(self, response):
        ....
        return [scrapy.FormRequest(
            url=login_url,
            formdata={'username': 'xxx', 'password': 'xxx'},
            callback=self.check_login
        )]

    def check_login(self, response):
        if login_successful:  # placeholder: verify from the response that the login worked
            # The login state is now in the 'Set-Cookie' headers of this response.
            # Use the extract_cookies method to pull the cookies out of the response.
            cookie_jar.extract_cookies(response, response.request)
            # The cookie jar is a dict-like container; write each cookie to a file.
            with open('cookies.txt', 'w') as f:
                for cookie in cookie_jar:
                    f.write(str(cookie) + '\n')
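For reference, str() on a cookie from http.cookiejar produces a line like "<Cookie name=value for domain/path>", which is the format the regex in the read-back step below relies on. A tiny illustration (the cookie name, value and domain are all made up; normally the cookies come from extract_cookies):

import re
from http.cookiejar import Cookie

# Build a throwaway cookie by hand only to show what str(cookie) looks like
demo = Cookie(0, 'sessionid', 'abc123', None, False, 'example.com', True, False,
              '/', True, False, None, False, None, None, {})
line = str(demo)
print(line)  # <Cookie sessionid=abc123 for example.com/>

# The same pattern used in the read-back step recovers the name=value part
print(re.findall(r'<Cookie (.*?) for .*?>', line))  # ['sessionid=abc123']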

# In some cases there are requests before the login is initiated and cookies are produced along the way;
# you can carry the cookie jar through by putting it in the meta of the very first request
scrapy.Request(url, callback=self.xxx, meta={'cookiejar': cookie_jar})
# Whenever you need to pass this cookiejar object on again, read it back out of response.meta
scrapy.Request(url, callback=self.xxx, meta={'cookiejar': response.meta['cookiejar']})

Many posts say that once you add a cookiejar to meta you can read cookies out of it in the middle of the crawl. I tried this: the CookieJar stays empty even after flowing through several request/response cycles, so when I actually need the cookies I have to extract them manually with the extract_cookies method.
If anyone knows how the cookiejar in meta is supposed to pick up cookies on its own, please leave a comment. Thank you.
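One possible explanation, from my reading of the Scrapy docs (so treat this as my interpretation, not something I verified in this project): the built-in CookiesMiddleware treats the 'cookiejar' meta key as a session identifier, not as a CookieJar object that it will fill for you. Under that assumption, keeping separate cookie sessions would look roughly like this sketch; the spider name, URLs and callback names are made up:

import scrapy

class SessionSpider(scrapy.Spider):
    name = 'session_demo'  # hypothetical name, sketch only

    def start_requests(self):
        # 'cookiejar' here is just a session id; CookiesMiddleware keeps one
        # cookie session per distinct value (COOKIES_ENABLED is True by default)
        for i, url in enumerate(['https://example.com/a', 'https://example.com/b']):
            yield scrapy.Request(url, callback=self.parse_page,
                                 meta={'cookiejar': i})

    def parse_page(self, response):
        # Keep passing the same id so later requests reuse that session's cookies
        yield scrapy.Request('https://example.com/next', callback=self.parse_next,
                             meta={'cookiejar': response.meta['cookiejar']})

    def parse_next(self, response):
        pass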

Reading cookies back from the local file

import re

with open('cookies.txt', 'r') as f:
    raw = f.read()
# Each saved line looks like "<Cookie name=value for domain/path>",
# so pull out the "name=value" part and build a dict from it
p = re.compile(r'<Cookie (.*?) for .*?>')
cookies = re.findall(p, raw)
cookies = (cookie.split('=', 1) for cookie in cookies)
cookies = dict(cookies)

You can then attach the cookies manually to the very first request (in start_requests) via the cookies parameter of scrapy.Request; Scrapy carries the cookies along by itself in all subsequent requests.

scrapy.Request(url, callback=self.xxx, cookies=cookies)
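Putting the two pieces together, a minimal spider sketch might look like the following; the spider name, start URL, and file path are assumptions on my part:

import re
import scrapy

class MySpider(scrapy.Spider):
    name = 'myspider'  # hypothetical name

    def start_requests(self):
        # Read the saved cookies back into a plain dict
        # (cookies.txt is the file written after the simulated login above)
        with open('cookies.txt', 'r') as f:
            raw = f.read()
        pairs = re.findall(r'<Cookie (.*?) for .*?>', raw)
        cookies = dict(pair.split('=', 1) for pair in pairs)

        # Attach the cookies to the first request only; Scrapy's cookie
        # middleware keeps them for the rest of the crawl
        yield scrapy.Request('https://example.com/', callback=self.parse,
                             cookies=cookies)

    def parse(self, response):
        pass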

Posted by fugix on Fri, 10 Jul 2020 09:14:35 -0700