Selenium: using try/except to handle scraping exceptions (news scraping, case 3)

Keywords: Selenium

Today's exercise is to scrape news content to a local machine. Instead of printing the full text, we only need the first two or three paragraphs, so we can locate the p tags of the first three paragraphs directly:

content1=driver.find_element_by_xpath("//*[@id='newsmain-ej']/div/div[1]/div[1]/div[4]/div/p[1]").text
content2=driver.find_element_by_xpath("//*[@id='newsmain-ej']/div/div[1]/div[1]/div[4]/div/p[2]").text
content3=driver.find_element_by_xpath("//*[@id='newsmain-ej']/div/div[1]/div[1]/div[4]/div/p[3]").text

However, during the actual run one article raised an error, because it was short and had no third paragraph: selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element

So I wondered whether try/except could handle the exception and decide whether a third paragraph exists. The exception raised is NoSuchElementException; to catch it by name, it has to be imported at the top of the script.
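
The import line (it also appears in the complete code below) is:

from selenium.common.exceptions import NoSuchElementException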

If there is no third paragraph, i.e. p[3] does not exist, only the first two paragraphs are printed; if no exception occurs, all three paragraphs are printed:

try:
    content3=driver.find_element_by_xpath("//*[@id='newsmain-ej']/div/div[1]/div[1]/div[4]/div/p[3]").text
except NoSuchElementException:   #If there is no third paragraph and a NoSuchElementException exception occurs, only the first and second paragraphs are printed.
    print(content1,content2)
else:
    print(content1,content2,content3)
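
As an aside, the plural find_elements_by_xpath avoids the exception entirely: it returns an empty list rather than raising when nothing matches. The following is only a sketch of that alternative, not what the original code does, and it assumes the same page structure and Selenium 3 API as above:

#alternative sketch: find_elements (plural) never raises NoSuchElementException,
#so missing paragraphs simply shorten the list
paragraphs=driver.find_elements_by_xpath("//*[@id='newsmain-ej']/div/div[1]/div[1]/div[4]/div/p")
texts=[p.text for p in paragraphs[:3]]   #take at most the first three paragraphs
print(*texts)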

The complete code is as follows:

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
import time

driver=webdriver.Chrome()

driver.get("http://news.cnpc.com.cn/hynews/")
time.sleep(1)

links=driver.find_elements_by_xpath("//*[@id='newsmain-ej']/div/div[1]/div[2]/div[2]/div/ul/*/a")   #collect the links to the individual news items
length=len(links)   #number of news links found (not used later)
 
for i in range(0,20):  #grabbing too many old articles is pointless; 20 is enough
    links=driver.find_elements_by_xpath("//*[@id='newsmain-ej']/div/div[1]/div[2]/div[2]/div/ul/*/a")
    link=links[i]
    link.click()   
    time.sleep(1)
    handles=driver.window_handles   #all open window handles; clicking the link opened the article in a new tab
    index_handle=driver.current_window_handle   #remember the list page's handle so we can switch back later
    for handle in handles:
        if handle != index_handle:
            driver.switch_to.window(handle)   #switch to the newly opened article tab
    title=driver.find_element_by_xpath("//*[@id='newsmain-ej']/div/div[1]/div[1]/div[2]/h2/a").text
    print(i+1,title)
    content1=driver.find_element_by_xpath("//*[@id='newsmain-ej']/div/div[1]/div[1]/div[4]/div/p[1]").text
    content2=driver.find_element_by_xpath("//*[@id='newsmain-ej']/div/div[1]/div[1]/div[4]/div/p[2]").text
    try:      
        content3=driver.find_element_by_xpath("//*[@id='newsmain-ej']/div/div[1]/div[1]/div[4]/div/p[3]").text
    except NoSuchElementException:   
        print(content1,content2)
    else:
        print(content1,content2,content3)
    print("\n")
    driver.close()
    time.sleep(1)
    driver.switch_to.window(index_handle)   #switch back to the news-list tab

print("--CNPC grabs 20 news————")
print("\n")

 
