Hey, guys, good evening!
After scraping Jingdong (JD.com) in the afternoon, we'll take on Taobao this evening. The shopping spree never stops!
To be honest, I didn't want to scrape it, but I couldn't say no to my wife. She wanted to buy something on Taobao and couldn't be bothered to browse the listings herself, so she asked me to analyze them with code.
With that time, wouldn't it be nicer to play a couple of rounds of Unlimited Firepower?
Anyway, it's all done now, so I've tidied it up and am sharing it here for reference.
Environment introduction:
- Python 3.6
- PyCharm
- selenium
- csv
- time
- random
You will also need:
- the Python interpreter (installation package and installation tutorial)
- the PyCharm code editor (installation package, installation tutorial, activation code)
- the Chrome browser WebDriver (installation tutorial)
- the XPath Helper browser plug-in (installation tutorial)
If you don't have these yet, they can be obtained at the end of this post.
Third-party module
selenium: a Python module that controls the browser through its driver. Install it with: pip install selenium
A quick word about chromedriver (the Chrome driver)
The browser driver lets selenium control Chrome, for example to turn pages automatically. Download the driver version that most closely matches the version of Chrome you have installed, and put it either in your Python installation directory or in the same folder as your code.
(Screenshot: my browser version)
(Screenshot: the driver download that corresponds to my version)
After downloading, unzip it. I put it in the same folder as the code. I won't go into the other setup steps.
Import modules
First, import the modules we will use. The comments explain what each one is for.
from selenium import webdriver  # browser automation from the selenium module
import random  # built-in module, used to set random wait times
import time  # built-in module, used for waiting
from constants import TAO_USERNAME, TAO_PASSWORD  # import the user's login information
import csv  # built-in module, used to save the data
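The script imports TAO_USERNAME and TAO_PASSWORD from a local constants module that this post does not show. Here is a minimal sketch of what such a constants.py could look like. It assumes (my assumption, not the original file) that the credentials are read from environment variables named TAOBAO_USERNAME and TAOBAO_PASSWORD, so they stay out of the source code. The helper load_credential is a hypothetical name introduced only for this illustration.

```python
# constants.py (hypothetical layout; the original file is not shown in this post)
import os


def load_credential(env_name, default=""):
    """Return the value of an environment variable, or a default if it is unset."""
    return os.environ.get(env_name, default)


# The environment variable names below are assumptions made for illustration.
TAO_USERNAME = load_credential("TAOBAO_USERNAME")
TAO_PASSWORD = load_credential("TAOBAO_PASSWORD")
```

With this layout, you would set the two environment variables before running the scraper, and the import line above keeps working unchanged.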
Handle login
We need to search for products by keyword and get through the login page while keeping Taobao from detecting selenium, so the code imitates a real user's actions as closely as it can. Taobao's login is encrypted with JavaScript, and that JavaScript also checks for selenium-driven logins. For a more robust solution it is best to learn how to reverse the JavaScript encryption.
def search_product(keyword):
    # Type the keyword into the search box, then click the search button
    driver.find_element_by_xpath('//*[@id="q"]').send_keys(keyword)
    time.sleep(random.randint(1, 3))
    driver.find_element_by_xpath('//*[@id="J_TSearchForm"]/div[1]/button').click()
    time.sleep(random.randint(1, 3))
    # Fill in the login form and submit it
    driver.find_element_by_xpath('//*[@id="fm-login-id"]').send_keys(TAO_USERNAME)
    time.sleep(random.randint(1, 3))
    driver.find_element_by_xpath('//*[@id="fm-login-password"]').send_keys(TAO_PASSWORD)
    time.sleep(random.randint(1, 3))
    driver.find_element_by_xpath('//*[@id="login-form"]/div[4]/button').click()
    time.sleep(random.randint(1, 3))
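The login code pauses for a random time between actions to look more like a person. One way to push that idea a little further is to type text one character at a time with a short random pause after each key. The sketch below is my own illustration, not part of the original script: type_like_human is a hypothetical helper, the delay bounds are guesses, and there is no guarantee that it gets past Taobao's detection.

```python
import random
import time


def type_like_human(element, text, min_delay=0.05, max_delay=0.25):
    """Send text one character at a time with a random pause after each key.

    `element` is anything with a send_keys() method, such as a selenium
    WebElement. The delay bounds are illustrative guesses, not tuned values.
    """
    for char in text:
        element.send_keys(char)
        time.sleep(random.uniform(min_delay, max_delay))
```

In search_product() it could replace a direct send_keys call, for example: type_like_human(driver.find_element_by_xpath('//*[@id="fm-login-id"]'), TAO_USERNAME).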
Parse data
Here we first locate the div tag of every product on the page, then extract from each div the product title, price, number of buyers, shop name, shop location, and detail page URL.
def parse_data():
    # Locate the div tag of every product on the page
    divs = driver.find_elements_by_xpath('//div[@class="grid g-clearfix"]/div/div')
    for div in divs:
        # Second-level extraction from each product div
        try:
            info = div.find_element_by_xpath('.//div[@class="row row-2 title"]/a').text  # product title
            price = div.find_element_by_xpath('.//strong').text + ' yuan'  # product price
            deal = div.find_element_by_xpath('.//div[@class="deal-cnt"]').text  # number of buyers
            name = div.find_element_by_xpath('.//div[@class="shop"]/a/span[2]').text  # shop name
            location = div.find_element_by_xpath('.//div[@class="location"]').text  # shop location
            detail_url = div.find_element_by_xpath('.//div[@class="pic"]/a').get_attribute('href')  # detail page URL
            print(info, price, deal, name, location, detail_url, sep='|')
            # Append the record to the CSV file
            with open('TaoBao.csv', mode='a', encoding='utf-8', newline='') as f:
                csv_write = csv.writer(f)
                csv_write.writerow([info, price, deal, name, location, detail_url])
        except Exception:
            # Skip entries that are missing any of the fields
            continue
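The parser appends rows to TaoBao.csv without a header line, so the columns are unlabeled when the file is opened later. A minimal sketch of one way to add a header exactly once is shown below. The column names in FIELDS and the helper append_row are my own labels for illustration and do not come from the original script.

```python
import csv
import os

# Column names are my own labels for the six fields extracted above.
FIELDS = ["title", "price", "deal", "shop", "location", "detail_url"]


def append_row(path, row):
    """Append one row to a CSV file, writing the header if the file is new or empty."""
    need_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, mode="a", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        if need_header:
            writer.writerow(FIELDS)
        writer.writerow(row)
```

Inside parse_data(), the with-block could then become a single call such as append_row('TaoBao.csv', [info, price, deal, name, location, detail_url]).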
Search for products by keyword
word = input('Please enter the keyword you want to search for the product:')
Browser operation
Create a browser
driver = webdriver.Chrome()
Modify browser properties so that navigator.webdriver reads as undefined, which makes selenium harder to detect
driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {"source": """Object.defineProperty(navigator, 'webdriver', {get: () => undefined})"""})
Perform browser actions
driver.get('https://www.taobao.com/')
get is a method of the driver object, and it takes the URL to open as its argument. Because driver.get() is a method rather than a standalone function, it is called on the object (driver.get(...)), unlike a user-defined function such as search_product(), which is called directly.
Implicit wait: pages take time to load and render, so let element lookups wait up to 10 seconds
driver.implicitly_wait(10)
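The implicit wait tells the driver to keep retrying an element lookup for up to 10 seconds before giving up. To make that polling idea concrete, here is a plain-Python sketch of the same pattern. It is only an illustration of the concept, not selenium's actual implementation, and wait_until is a hypothetical helper name.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value or time runs out.

    Returns the truthy value, or raises TimeoutError after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)
```

The condition is any zero-argument callable, so the same helper can wait for a file to appear, a list to fill up, or a page element lookup to succeed.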
Maximize the browser window
driver.maximize_window()
Search and parse products
search_product(word)
for page in range(0, 100):  # page index: 0, 1, 2, 3, ...
    print(f'\n======================== Scraping page {page + 1} =========================')
    # Each results page holds 44 items, so the s parameter is the item offset
    driver.get(f'https://s.taobao.com/search?q={word}&s={page * 44}')
    parse_data()
    time.sleep(random.randint(2, 4))
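The loop puts the keyword straight into the URL. Keywords that contain spaces or non-ASCII characters are safer when URL-encoded first. Below is a minimal sketch using the standard library's urlencode. The helper build_search_url and the constant ITEMS_PER_PAGE are names I introduced for illustration, and the q and s parameters are taken from the URL in the loop above.

```python
from urllib.parse import urlencode

ITEMS_PER_PAGE = 44  # offset step used by the loop above


def build_search_url(keyword, page):
    """Build the search URL for a zero-based page index, URL-encoding the keyword."""
    query = urlencode({"q": keyword, "s": page * ITEMS_PER_PAGE})
    return f"https://s.taobao.com/search?{query}"
```

The loop body could then call driver.get(build_search_url(word, page)) instead of formatting the URL inline.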
When we automate the browser, the code should follow the same steps a user would normally take on the page, so the code logic ends up roughly mirroring a real user's actions.
CAPTCHAs are mainly used to tell humans from bots. The common kinds are slider CAPTCHAs, click CAPTCHAs, and ordinary text CAPTCHAs.
Complete free source code collection:
Folks, your support is my biggest motivation! If you found this useful, remember to like, favorite, and share!
About questions:
If you run into problems while learning Python, I will answer them when I have time. Feel free to get in touch~