Double 11 special! Crawling an e-commerce platform for product details, so we only buy high-quality products!

Keywords: Python crawler

Hey, guys, good evening!
After crawling JD.com in the afternoon, let's try Taobao in the evening. The shopping spree never stops!

Honestly, I didn't want to write this crawler, but my wife found out that I could. She wanted to buy things on Taobao and couldn't be bothered to browse, so she asked me to analyze it with code.
Come on, wouldn't that time be better spent playing a couple of rounds of Unlimited Firepower?


Anyway, it's all finished now, so I've tidied it up and am sharing it for everyone's reference.

Environment introduction:

  • python 3.6
  • pycharm
  • selenium
  • csv
  • time
  • random

You will also need the following installed:

  • the Python interpreter
  • the PyCharm code editor
  • the Chrome browser WebDriver
  • the XPath Helper browser plug-in

Third-party module

selenium: a Python module that operates the browser driver. Install it with: pip install selenium

The code below uses the find_element_by_xpath style helpers from Selenium 3. Selenium 4.3 and later removed them in favor of find_element(By.XPATH, ...).

A quick word about chromedriver (the Chrome driver)

The browser driver lets the code control the browser automatically (opening pages, clicking, turning pages). Download the version closest to the version of Chrome you have installed and put it in your Python installation directory or in the same folder as your code.

Check your browser version, download the driver that corresponds to it, and unzip it. I put the driver in the same folder as the code. I won't say much about the other tools.
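Picking "the closest version" mostly comes down to comparing the major version numbers of the browser and the driver. Here is a minimal sketch of that check, assuming you copy both version strings by hand. The helper names and the sample version strings are made up for illustration.

```python
def major_version(version: str) -> int:
    """Return the leading (major) number of a dotted version string."""
    return int(version.strip().split(".")[0])


def same_major_version(browser_version: str, driver_version: str) -> bool:
    """True when the browser and the driver share the same major version."""
    return major_version(browser_version) == major_version(driver_version)


# Sample strings only: read yours from chrome://version and `chromedriver --version`
print(same_major_version("95.0.4638.54", "95.0.4638.17"))
print(same_major_version("95.0.4638.54", "94.0.4606.61"))
```

If the check fails, download a driver whose major version matches your browser before running the crawler.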

Import modules
First import the modules we need. The comments explain each one.

from selenium import webdriver  # Browser automation from the selenium module
import random  # Random numbers, used for random wait times (built-in module)
import time  # Time module, used for waiting (built-in, ships with the interpreter)
from constants import TAO_USERNAME, TAO_PASSWORD  # Your Taobao account details, kept in constants.py
import csv  # Saves the data as CSV (built-in module)

Handle login

We need to search for products by keyword and get past the login. To keep Taobao from detecting Selenium, the code imitates a real user's actions as closely as it can. Taobao's login is encrypted with JS, and that JS checks for automated Selenium logins, so in the long run it is best to learn JS decryption.

def search_product(keyword):
    """Search for a product by keyword, then fill in the login form."""
    # Type the keyword into the search box
    driver.find_element_by_xpath('//*[@id="q"]').send_keys(keyword)
    time.sleep(random.randint(1, 3))  # Random pause, like a real user

    # Click the search button
    driver.find_element_by_xpath('//*[@id="J_TSearchForm"]/div[1]/button').click()
    time.sleep(random.randint(1, 3))

    # Enter the username and password in the login form
    driver.find_element_by_xpath('//*[@id="fm-login-id"]').send_keys(TAO_USERNAME)
    time.sleep(random.randint(1, 3))
    driver.find_element_by_xpath('//*[@id="fm-login-password"]').send_keys(TAO_PASSWORD)
    time.sleep(random.randint(1, 3))

    # Click the login button
    driver.find_element_by_xpath('//*[@id="login-form"]/div[4]/button').click()
    time.sleep(random.randint(1, 3))
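Simulating a user can go one step further than pausing between fields: the text itself can be typed one character at a time. Here is a rough sketch of that idea. The helper type_like_human and the FakeInput stand-in are hypothetical names for illustration, and there is no guarantee that this alone gets past Taobao's detection.

```python
import random
import time


def type_like_human(element, text, min_delay=0.05, max_delay=0.3, sleep=time.sleep):
    """Send text one character at a time with a random pause after each key.

    `element` is anything with a send_keys(str) method, such as a Selenium
    WebElement. `sleep` is injectable so the helper can be tried without waiting.
    """
    for char in text:
        element.send_keys(char)
        sleep(random.uniform(min_delay, max_delay))


class FakeInput:
    """Stand-in for a WebElement so the sketch runs without a browser."""

    def __init__(self):
        self.value = ""

    def send_keys(self, keys):
        self.value += keys


box = FakeInput()
type_like_human(box, "python book", sleep=lambda seconds: None)
print(box.value)
```

In the crawler, the element returned by driver.find_element_by_xpath(...) would take the place of FakeInput.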

Parse data
Here we first locate the div tags that hold each product, then extract from each one the product title, price, number of payers, store name, store address and detail page address.

def parse_data():
    """Extract every product on the current results page and append it to TaoBao.csv."""
    # All div tags, one per product
    divs = driver.find_elements_by_xpath('//div[@class="grid g-clearfix"]/div/div')

    for div in divs:  # Secondary extraction inside each product
        try:
            info = div.find_element_by_xpath('.//div[@class="row row-2 title"]/a').text  # product title
            price = div.find_element_by_xpath('.//strong').text + ' yuan'  # product price
            deal = div.find_element_by_xpath('.//div[@class="deal-cnt"]').text  # number of payers
            name = div.find_element_by_xpath('.//div[@class="shop"]/a/span[2]').text  # store name
            location = div.find_element_by_xpath('.//div[@class="location"]').text  # store address
            detail_url = div.find_element_by_xpath('.//div[@class="pic"]/a').get_attribute('href')  # detail page address

            print(info, price, deal, name, location, detail_url, sep='|')

            with open('TaoBao.csv', mode='a', encoding='utf-8', newline='') as f:
                csv_write = csv.writer(f)
                csv_write.writerow([info, price, deal, name, location, detail_url])
        except Exception:  # Skip any entry that is missing one of the fields
            continue
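The function above reopens TaoBao.csv for every product and never writes a header row. One possible alternative, sketched here under the assumption that the rows are collected first, is to open the file once and write a header before the data. The helper write_rows, the FIELDS list and the sample row are made up for illustration.

```python
import csv
import io

FIELDS = ["info", "price", "deal", "name", "location", "detail_url"]


def write_rows(f, rows, write_header=True):
    """Write product rows to an already-open text file object."""
    writer = csv.writer(f)
    if write_header:
        writer.writerow(FIELDS)
    writer.writerows(rows)


# Made-up sample row, only to show the output shape
sample = [["Sample product", "9.9 yuan", "100+", "Sample shop", "Hangzhou", "https://example.com/item"]]
buffer = io.StringIO()
write_rows(buffer, sample)
print(buffer.getvalue())
```

With a real file, the same helper would be called inside a single with open('TaoBao.csv', mode='w', encoding='utf-8', newline='') as f: block.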

Search for products by keyword

word = input('Please enter the product keyword you want to search for: ')

Browser operation

Create a browser

driver = webdriver.Chrome()

Modify browser properties (hide navigator.webdriver so that page scripts are less likely to detect Selenium)

driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument",
                       {"source": """Object.defineProperty(navigator, 'webdriver', {get: () => undefined})"""})

Perform browser actions

driver.get('https://www.taobao.com/') 

get is a method of the driver object: it takes a URL and opens that page. Because it is a method rather than a standalone function, it is called on the object, as in driver.get(...), unlike a user-defined function such as search_product(...).

Implicit wait: pages take time to load and render, so let the driver wait up to 10 seconds for elements to appear

driver.implicitly_wait(10)
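The idea behind this kind of waiting is to keep checking until something is ready or a time limit runs out. Here is a plain-Python illustration of that polling idea. It is not Selenium's own implementation, and wait_until and ready are hypothetical names. For waiting on one specific condition, Selenium provides WebDriverWait in selenium.webdriver.support.ui.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)


# Demo: a condition that only succeeds on the third check
attempts = []


def ready():
    attempts.append(1)
    return len(attempts) >= 3


print(wait_until(ready, timeout=2, poll=0.01))
```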

Maximize the browser window

driver.maximize_window() 

Search and parse products

search_product(word)


for page in range(0, 100):  # page index 0, 1, 2, ..., 99
    print(f'\n======================== Scraping data from page {page + 1} ========================')
    driver.get(f'https://s.taobao.com/search?q={word}&s={page * 44}')
    
    parse_data()
    time.sleep(random.randint(2, 4))
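The loop above drops the keyword straight into the URL. If the keyword contains spaces or Chinese characters, it is safer to URL-encode it first. Here is a small sketch using the standard library. The helper build_search_url is a made-up name, and the q and s parameters and the step of 44 are taken from the loop above.

```python
from urllib.parse import urlencode

PAGE_SIZE = 44  # offset step taken from the loop above


def build_search_url(keyword, page):
    """Build the results URL for a zero-based page index."""
    query = urlencode({"q": keyword, "s": page * PAGE_SIZE})
    return f"https://s.taobao.com/search?{query}"


print(build_search_url("python book", 2))
```

Inside the loop this would become driver.get(build_search_url(word, page)).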

When we automate the browser, our code logic should follow roughly the same steps that a user would take to operate the page normally.
CAPTCHAs are mainly used to tell humans from machines. The common kinds are slider, click and ordinary text CAPTCHAs.


Friends, your support is my biggest motivation!! After reading, remember to like, favorite and share!


Posted by laduch on Fri, 22 Oct 2021 07:21:12 -0700