Environment: Python 3.6 + Scrapy 1.4
What I want to do:
1. Complete a simulated login.
2. After a successful login, extract the cookies and save them to a local file, cookies.txt.
3. On later runs, read the cookies back from the local cookies.txt file and skip the simulated login.
I read several blog posts about how Scrapy handles cookies. They all differ considerably, and none of them worked well for me. Below I summarize what I tried for extracting and passing cookies; everything here is hands-on tested and works.
Passing and extracting cookies
import scrapy
from scrapy.http.cookies import CookieJar  # wraps the standard library's http.cookiejar.CookieJar and is used in a similar way

# Instantiate a CookieJar object
cookie_jar = CookieJar()

# First, extracting the cookies
class MySpider(scrapy.Spider):
    ....
    ....
    # Simulate the login, then call check_login to verify that it succeeded
    def login(self, response):
        ....
        return [scrapy.FormRequest(
            url=login_url,
            formdata={'username': xxx, 'password': xxx},
            callback=self.check_login
        )]

    def check_login(self, response):
        if login_successful:  # placeholder: replace with your own success check
            # At this point the login state is carried in the 'Set-Cookie' response headers.
            # extract_cookies pulls the cookies out of the response and into the jar
            cookie_jar.extract_cookies(response, response.request)
            # The CookieJar is iterable, so write each cookie to the file
            with open('cookies.txt', 'w') as f:
                for cookie in cookie_jar:
                    f.write(str(cookie) + '\n')

# In some cases there are requests before the login, and cookies are set one after another.
# The cookie jar can be passed along by putting it into the meta of the first request
scrapy.Request(url, callback=self.xxx, meta={'cookiejar': cookie_jar})
# Every later request that needs this cookiejar object can get it from response.meta
scrapy.Request(url, callback=self.xxx, meta={'cookiejar': response.meta['cookiejar']})
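To try the extract-and-save step without running a crawl, here is a minimal sketch that uses only the standard library's http.cookiejar.CookieJar as a stand-in for Scrapy's CookieJar. FakeResponse, save_cookies, the example.com URL and the sessionid cookie are all made up for illustration; in a real spider the response and request come from Scrapy.

```python
import http.cookiejar
import os
import tempfile
import urllib.request
from email.message import Message


class FakeResponse:
    """Hypothetical stand-in for an HTTP response; the jar only calls info()."""

    def __init__(self, headers):
        self._headers = headers

    def info(self):
        return self._headers


def save_cookies(jar, path):
    """Write one str(cookie) line per cookie, the same way the post does."""
    with open(path, 'w') as f:
        for cookie in jar:
            f.write(str(cookie) + '\n')


# Build a fake "login succeeded" response carrying a Set-Cookie header.
headers = Message()
headers['Set-Cookie'] = 'sessionid=abc123; Path=/'
request = urllib.request.Request('http://example.com/login')

# Same call shape as in the spider: extract_cookies(response, request).
jar = http.cookiejar.CookieJar()
jar.extract_cookies(FakeResponse(headers), request)

# Save to a throwaway directory so the demo does not touch real files.
cookie_file = os.path.join(tempfile.mkdtemp(), 'cookies.txt')
save_cookies(jar, cookie_file)

with open(cookie_file) as f:
    saved_lines = f.read().splitlines()
print(saved_lines)
```

Each saved line has the form `<Cookie name=value for domain/path>`, so the file is readable but is not a standard cookie-file format.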
I've read a lot of posts saying that if you add the cookie jar to meta, it will pick up cookies as the requests flow through. I tried this: the CookieJar was still empty after passing through several request-response cycles, so the only option is to call the extract_cookies method manually whenever the cookies are needed.
If any expert knows how to make the cookiejar in meta collect cookies by itself, please leave a comment. Thank you.
Reading cookies from a local file
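import re

# Read the saved file and pull the name=value part out of each '<Cookie ... for ...>' line
with open('cookies.txt', 'r') as f:
    cookiejar = f.read()
p = re.compile(r'<Cookie (.*?) for .*?>')
cookies = re.findall(p, cookiejar)
# Split on the first '=' only, since a cookie value may itself contain '='
cookies = (cookie.split('=', 1) for cookie in cookies)
cookies = dict(cookies)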
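The parsing step can be wrapped in a small helper and checked against a throwaway file. This is only a sketch: load_cookies, the demo path and the sample cookies are invented for illustration. It uses str.partition instead of split so that a cookie saved without a value does not raise an error. A cookie value that itself contains the text " for " would still confuse the regular expression.

```python
import os
import re
import tempfile

# Same pattern as in the post: capture the name=value part of each line.
COOKIE_LINE = re.compile(r'<Cookie (.*?) for .*?>')


def load_cookies(path):
    """Parse '<Cookie name=value for domain/path>' lines into a dict."""
    with open(path, 'r') as f:
        text = f.read()
    cookies = {}
    for pair in COOKIE_LINE.findall(text):
        # partition splits on the first '=' and tolerates a missing value
        name, _, value = pair.partition('=')
        cookies[name] = value
    return cookies


# Demo with a temporary file written in the same format as cookies.txt.
demo_path = os.path.join(tempfile.mkdtemp(), 'cookies.txt')
with open(demo_path, 'w') as f:
    f.write('<Cookie sessionid=abc123 for example.com/>\n')
    f.write('<Cookie token=a=b for example.com/>\n')

loaded = load_cookies(demo_path)
print(loaded)
```

The resulting dict can be passed directly as the cookies argument of scrapy.Request.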
Then, when issuing the first request (in start_requests), pass the cookies manually through the cookies parameter of scrapy.Request. After that, the cookies flow through subsequent requests on their own.
scrapy.Request(url, callback=self.xxx, cookies=cookies)