Notes
The reference answers in this article were tested with Chrome at a screen resolution of 1920×1080; behavior may differ in other environments.
Source code for this article
- Reference books for download:
  - 2008 Best Artificial Intelligence Data Acquisition (Web Crawler) Toolbook Download
  - Learning Selenium Testing Tools with Python-2014.pdf
  - Selenium Automated Testing Based on Python Language-2018.pdf
Selenium hands-on exercise: query more records than the browser UI allows
Enter "..." in the "Name" box and set "Page size" to "300".
Reference answer
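```python
#!/usr/bin/python3
# -*- coding: utf-8 -*-
# Discussion: DingTalk free group 21745728, QQ groups 144081101 and 567351477
# CreateDate: 2018-10-20
from selenium import webdriver
from selenium.webdriver.common.by import By  # Selenium 4 locator API

driver = webdriver.Firefox()
driver.implicitly_wait(30)
driver.get('http://example.webscraping.com/places/default/search')
# Type the search term into the "Name" box
driver.find_element(By.ID, 'search_term').send_keys('.')
# Overwrite the text of the second "Page size" option with '300',
# so the form is meant to request 300 results per page
js = "document.getElementById('page_size').options[1].text = '300';"
driver.execute_script(js)
driver.find_element(By.ID, 'search').click()
# Each search result is a link inside the #results element
links = driver.find_elements(By.CSS_SELECTOR, '#results a')
countries = [link.text for link in links]
print(len(countries))
print(countries)
driver.close()
```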
Reference book for this example: Write Web Crawler with Python.pdf
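Why can overwriting an option's *text* change what the form submits? Under the HTML standard, an `<option>` without a `value` attribute takes its value from its text. The sketch below checks that rule offline with Python's standard `html.parser`. The `<select>` markup is hypothetical: whether the real `page_size` options lack `value` attributes, and whether `options[1]` is the selected one, are assumptions, not something this article shows.

```python
from html.parser import HTMLParser


class OptionCollector(HTMLParser):
    """Records [value attribute or None, text] for every <option> in the markup."""

    def __init__(self):
        super().__init__()
        self.options = []
        self._in_option = False

    def handle_starttag(self, tag, attrs):
        if tag == "option":
            self.options.append([dict(attrs).get("value"), ""])
            self._in_option = True

    def handle_endtag(self, tag):
        if tag == "option":
            self._in_option = False

    def handle_data(self, data):
        if self._in_option:
            self.options[-1][1] += data


def submitted_value(value_attr, text):
    # HTML rule: with no value attribute, an option's value is its text
    # (the spec also collapses inner whitespace; strip() is enough here).
    return value_attr if value_attr is not None else text.strip()


def value_after_text_edit(markup, index, new_text):
    """Mimic `select.options[index].text = new_text` and return the new value."""
    parser = OptionCollector()
    parser.feed(markup)
    value_attr, _old_text = parser.options[index]
    return submitted_value(value_attr, new_text)


# Hypothetical markup: options without value attributes, as the trick assumes.
plain = '<select id="page_size"><option>4</option><option>10</option></select>'
# Same list, but with explicit value attributes.
valued = ('<select id="page_size"><option value="4">4</option>'
          '<option value="10">10</option></select>')

print(value_after_text_edit(plain, 1, "300"))   # 300
print(value_after_text_edit(valued, 1, "300"))  # 10
```

If the options do carry `value` attributes, editing only `.text` leaves the submitted page size unchanged, so the trick depends on the page's markup.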
Selenium hands-on exercise: grab all content from a scroll-to-load box (JavaScript implementation)
- Open: http://www.webscrapingfordatascience.com/complexjavascript/
- Grab everything in the box. The box loads more content each time you scroll down inside it (with the mouse wheel or middle mouse button), until everything has been loaded.
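The stopping rule in this task (keep scrolling while new items appear, stop once a scroll brings nothing new) can be checked without a browser. The sketch below uses a hypothetical `FakeScrollBox` that reveals a fixed batch of quotes per scroll; `total` and `batch` are made-up parameters, not properties of the real page. It checks the count instantly, whereas a real page loads asynchronously and therefore needs an explicit wait at that step.

```python
class FakeScrollBox:
    """Hypothetical stand-in for the infinite-scroll <div>: each scroll to the
    bottom reveals up to `batch` more items until `total` have been shown."""

    def __init__(self, total, batch):
        self.total = total
        self.batch = batch
        self.visible = min(batch, total)  # first batch is already rendered

    def scroll_to_bottom(self):
        self.visible = min(self.visible + self.batch, self.total)

    def items(self):
        return [f"quote {i}" for i in range(self.visible)]


def load_everything(box):
    """Scroll until a scroll no longer increases the item count."""
    seen = 0
    scrolls = 0
    while True:
        box.scroll_to_bottom()
        scrolls += 1
        items = box.items()
        if len(items) <= seen:  # nothing new appeared: assume fully loaded
            return items, scrolls
        seen = len(items)


quotes, scrolls = load_everything(FakeScrollBox(total=25, batch=10))
print(len(quotes), scrolls)  # 25 3
```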
Reference answer
```python
#!/usr/bin/python3
# -*- coding: utf-8 -*-
# Discussion: DingTalk free group 21745728, QQ groups 144081101 and 567351477
# CreateDate: 2018-10-18
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException


class at_least_n_elements_found(object):
    """Custom wait condition: return the elements once at least n are present."""

    def __init__(self, locator, n):
        self.locator = locator
        self.n = n

    def __call__(self, driver):
        elements = driver.find_elements(*self.locator)
        if len(elements) >= self.n:
            return elements
        else:
            return False


url = 'http://www.webscrapingfordatascience.com/complexjavascript/'
driver = webdriver.Chrome()
driver.get(url)
# Use an implicit wait for cases where we don't use an explicit one
driver.implicitly_wait(10)
div_element = driver.find_element(By.CLASS_NAME, 'infinite-scroll')
quotes_locator = (By.CSS_SELECTOR, ".quote:not(.decode)")
nr_quotes = 0
all_quotes = []  # stays empty if no quote ever loads
while True:
    # Scroll the box itself (not the window) down to its bottom
    driver.execute_script('arguments[0].scrollTop = arguments[0].scrollHeight', div_element)
    # Try to fetch at least nr_quotes+1 quotes
    try:
        all_quotes = WebDriverWait(driver, 3).until(
            at_least_n_elements_found(quotes_locator, nr_quotes + 1))
    except TimeoutException:
        # No new quotes found within 3 seconds, assume this is all there is
        print("... done!")
        break
    # Otherwise, update the quote counter
    nr_quotes = len(all_quotes)
    print("... now seeing", nr_quotes, "quotes")

# all_quotes will contain all the quote elements
print(len(all_quotes), 'quotes found\n')
for quote in all_quotes:
    print(quote.text)
input('Press ENTER to close the automated browser')
driver.quit()
```
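`WebDriverWait(driver, 3).until(condition)` repeatedly calls `condition(driver)` until it returns something truthy or the timeout expires (Selenium polls every 0.5 s by default). The sketch below is a simplified, stdlib-only stand-in for that loop: `wait_until` and `FakeDriver` are hypothetical helpers, not Selenium code, and the real class also supports ignored exceptions and raises `TimeoutException` rather than `TimeoutError`. It reuses the article's `at_least_n_elements_found` condition to show why a timeout signals "no more quotes".

```python
import time


class at_least_n_elements_found:
    """Same condition as in the reference answer: truthy once >= n elements exist."""

    def __init__(self, locator, n):
        self.locator = locator
        self.n = n

    def __call__(self, driver):
        elements = driver.find_elements(*self.locator)
        return elements if len(elements) >= self.n else False


def wait_until(driver, condition, timeout, poll=0.05):
    """Hypothetical simplified WebDriverWait.until: poll until truthy or time out."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition(driver)
        if result:
            return result
        if time.monotonic() > deadline:
            raise TimeoutError(f"condition not met within {timeout}s")
        time.sleep(poll)


class FakeDriver:
    """Hypothetical driver: one more element 'loads' on every find_elements call,
    up to `limit` elements."""

    def __init__(self, limit):
        self.limit = limit
        self.count = 0

    def find_elements(self, by, value):
        self.count = min(self.count + 1, self.limit)
        return [f"{value} #{i}" for i in range(self.count)]


driver = FakeDriver(limit=3)
# "css selector" is the string value behind Selenium's By.CSS_SELECTOR
locator = ("css selector", ".quote:not(.decode)")
found = wait_until(driver, at_least_n_elements_found(locator, 3), timeout=1)
print(len(found))  # 3
try:
    wait_until(driver, at_least_n_elements_found(locator, 4), timeout=0.2)
except TimeoutError:
    print("no 4th element: stop scrolling")
```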
- Interview questions and answers
1. What is execute_script() used for?