Day 3: Selenium basic usage

Keywords: Python, Selenium, Chrome

selenium basic usage

Running environment: from selenium.webdriver import Chrome

1. Create browser object

b = Chrome('files/chromedriver')

2. Open the page


3. Get web data


4. Close the web page
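Putting steps 1-4 together, a minimal sketch of the basic flow might look like this. The driver path and URL are placeholders, and the selenium import sits inside the function so the sketch can be read (and imported) even where selenium and chromedriver are not installed:

```python
def fetch_page_source(url, driver_path='files/chromedriver'):
    """Open a page in Chrome, return its HTML, then close the browser.

    Steps 1-4 in one place; driver_path is the placeholder path used
    throughout this tutorial.
    """
    from selenium.webdriver import Chrome

    b = Chrome(driver_path)          # 1. create the browser object
    try:
        b.get(url)                   # 2. open the page
        html = b.page_source         # 3. get the page's HTML
    finally:
        b.quit()                     # 4. close the browser
    return html
```

Usage would be e.g. `html = fetch_page_source('https://www.jd.com')`.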


selenium common configurations

Running environment: from selenium.webdriver import Chrome, ChromeOptions
import time

1. Create a configuration (options) object for Chrome

options = ChromeOptions()

1) Hide the "Chrome is being controlled by automated test software" notice

options.add_experimental_option('excludeSwitches', ['enable-automation'])

2) Disable image loading - speeds up page loading

options.add_experimental_option("prefs", {"profile.managed_default_content_settings.images": 2})

2. Create a browser and open a web page

b = Chrome('files/chromedriver', options=options)

Get and manipulate web page tags

Running environment: from selenium.webdriver import Chrome
from selenium.webdriver.common.keys import Keys

# goods = input('Please enter the product type you want to obtain: ')
b = Chrome('files/chromedriver')

1. Get tags

Browser object.find_element_by_... - returns a single tag
Browser object.find_elements_by_... - returns a list whose elements are tags

search = b.find_element_by_id('key')
# b.find_element_by_css_selector('#key')

2. Operate on tags

1) Input box operation (input tag): enter content

search.send_keys('computer')     # Type the search term into the box
search.send_keys(Keys.ENTER)     # Press Enter

2) Click a tag (click a button or hyperlink)
First get the tag to click, then call click()

search_btn = b.find_element_by_xpath('//div[@role="serachbox"]/button')
search_btn.click()


Exercise: crawl 5 pages of 'data analysis' job listings from 51job and extract: job title, salary, company name and company type

from selenium.webdriver import Chrome
from selenium.webdriver.common.keys import Keys
import time
from lxml import etree

b = Chrome('files/chromedriver')

def get_html_by_chrome():
    url = ''    # the 51job search URL was left blank in the original
    b.get(url)
    search_input = b.find_element_by_id('kwdselectid')
    search_input.send_keys('Data analysis')
    search_input.send_keys(Keys.ENTER)

    # Click "next page" 5 times, handing each page's HTML to the caller
    for _ in range(5):
        time.sleep(2)
        yield b.page_source
        next_btn = b.find_element_by_class_name('next')
        next_btn.click()

def analysis_data(html: str):
    html_node = etree.HTML(html)
    all_job_div = html_node.xpath('//div[@class="j_joblist"]/div[@class="e"]')
    for job_div in all_job_div:
        # Job title
        job_name = job_div.xpath('./a/p[@class="t"]/span[1]/text()')[0]
        # Salary
        try:
            salary = job_div.xpath('./a/p[@class="info"]/span[1]/text()')[0]
        except IndexError:
            salary = 'Negotiable'
        # Company name
        company_name = job_div.xpath('./div[@class="er"]/a/text()')[0]
        # Company type
        try:
            company_type = job_div.xpath('./div[@class="er"]/p[@class="int at"]/text()')[0]
        except IndexError:
            company_type = 'N/A'
        print(job_name, salary, company_name, company_type)

if __name__ == '__main__':
    for html in get_html_by_chrome():
        analysis_data(html)
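The XPath logic in analysis_data can be exercised without a browser on a small HTML snippet. The markup below is a made-up sample that only mimics the class names the exercise assumes (j_joblist, e, t, info, er), not the real 51job page:

```python
from lxml import etree

# Hypothetical snippet mirroring the class names used by the exercise
SAMPLE = """
<div class="j_joblist">
  <div class="e">
    <a>
      <p class="t"><span>Data Analyst</span></p>
      <p class="info"><span>10-15k</span></p>
    </a>
    <div class="er"><a>ACME Ltd</a><p class="int at">Private</p></div>
  </div>
</div>
"""

node = etree.HTML(SAMPLE)
jobs = []
for job_div in node.xpath('//div[@class="j_joblist"]/div[@class="e"]'):
    job_name = job_div.xpath('./a/p[@class="t"]/span[1]/text()')[0]
    try:
        salary = job_div.xpath('./a/p[@class="info"]/span[1]/text()')[0]
    except IndexError:
        salary = 'Negotiable'
    company = job_div.xpath('./div[@class="er"]/a/text()')[0]
    jobs.append((job_name, salary, company))

print(jobs)
```

Running this against the sample prints one (title, salary, company) tuple, which is a quick way to sanity-check the XPath expressions before pointing them at live pages.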

Page scrolling

Running environment: from selenium.webdriver import Chrome
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
import time

1. Open JD, search for 'computer' and press Enter

b = Chrome('files/chromedriver')
b.get('https://www.jd.com')
search_input = b.find_element_by_id('key')
search_input.send_keys('computer')
search_input.send_keys(Keys.ENTER)
time.sleep(1)
# print(b.page_source)

2. Scroll slowly to the specified position

height = 0
while True:
    height += 500
    if height > 9000:
        break
    # Execute the js scroll code: window.scrollTo(x, y)
    b.execute_script(f'window.scrollTo(0, {height})')
    time.sleep(1)

# soup = BeautifulSoup(b.page_source, 'lxml')
# all_goods_li = soup.select('#J_goodsList li')
# print(len(all_goods_li))
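The stepped-scroll loop can be factored into a pure helper that yields the successive scroll offsets, so the step logic is checkable without a browser. The step and limit defaults below match the loop above:

```python
def scroll_positions(step=500, limit=9000):
    """Yield successive window.scrollTo y-offsets: step, 2*step, ... up to limit."""
    height = 0
    while True:
        height += step
        if height > limit:
            break
        yield height

# The browser loop then becomes (sketch, needs a live driver `b`):
# for y in scroll_positions():
#     b.execute_script(f'window.scrollTo(0, {y})')
#     time.sleep(1)
```

Separating the offset arithmetic from the driver calls also makes it easy to tune the step size or limit for pages of different heights.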

Waits

Running environment: from selenium.webdriver import Chrome
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

b = Chrome('files/chromedriver')

1. Implicit waiting

When getting a tag, if the tag cannot be found in the page, the program normally raises an error right away.
Implicit waiting sets a wait time for that case: Selenium keeps retrying, and no error is raised as long as the tag is found within the wait time.

b.implicitly_wait(10)     # Set the waiting time to 10 seconds, which is globally valid

2. Explicit wait

1) First create a wait object: WebDriverWait(browser object, timeout)

wait = WebDriverWait(b, 5)
wait2 = WebDriverWait(b, 10)

2) Add a condition
wait_object.until(condition) - wait until the condition holds, then the wait ends
wait_object.until_not(condition) - wait until the condition no longer holds, then the wait ends

Common conditions:
EC.presence_of_element_located((By.X, value)) - checks whether an element has been added to the DOM tree (i.e. the tag is loaded into the page, though not necessarily visible); returns the matching tag when the condition holds
EC.visibility_of_element_located((By.X, value)) - checks whether a tag is visible (not hidden, and its width and height are non-zero); returns the matching tag when the condition holds
EC.text_to_be_present_in_element((By.X, value), data) - checks whether the text content of a tag contains the expected string; returns True when the condition holds
EC.text_to_be_present_in_element_value((By.X, value), data) - checks whether the value attribute of a tag contains the expected string; returns True when the condition holds
EC.element_to_be_clickable((By.X, value)) - checks whether a tag is clickable; returns the matching tag when the condition holds

# EC.presence_of_element_located((how to locate the tag, value))
wait.until(EC.presence_of_element_located((By.ID, 'key')))
search_input = b.find_element_by_id('key')

# The content of the input tag (input box) is the value of the value attribute
wait2.until(EC.text_to_be_present_in_element_value((By.ID, 'key'), 'computer'))
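The explicit-wait pattern above (create a WebDriverWait, then call until with a condition) can be wrapped in a small helper. This is a sketch: the timeout default is an assumption, and the selenium imports live inside the function so the file stays importable without selenium installed:

```python
def wait_for_element_by_id(browser, element_id, timeout=5):
    """Block until the tag with the given id is in the DOM, then return it.

    Wraps the explicit-wait pattern: WebDriverWait + an EC condition.
    `timeout` default is an assumption, not fixed by the original text.
    """
    from selenium.webdriver.support.wait import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.common.by import By

    wait = WebDriverWait(browser, timeout)
    # until() returns the tag once presence_of_element_located holds
    return wait.until(EC.presence_of_element_located((By.ID, element_id)))
```

With a live browser, `search_input = wait_for_element_by_id(b, 'key')` would replace the two-step wait-then-find sequence shown above.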

Posted by tbaink2000 on Thu, 04 Nov 2021 13:02:49 -0700