Crawling lottery data with Python, Selenium, and BeautifulSoup

Keywords: Python Selenium Google encoding

I've always been particularly interested in Python, and I'm currently learning it. This post shows how to use the Selenium and BeautifulSoup libraries to crawl the winning numbers from the 360 lottery site.
First, open the page you want to crawl and look at how its content is structured:
The address is http://chart.cp.360.cn/kaijiang/ssq?sb_spm=b5d6e27c6c47fd3a77aacda65df2ad7a

The final output is a table with one row per draw: the period number, the date, the six red balls, and the blue ball.

The tools used are as follows:
1. The Python libraries selenium and BeautifulSoup
2. The Google Chrome browser
3. The pandas library, because the final data is displayed as a data frame.

The code is as follows:

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
import pandas as pd

pd.set_option('display.max_rows', None)  # Show every row so the printed data frame is not truncated


def getdata():
    # Locate the <tbody> of the table that holds the results
    elem = driver.find_element(By.XPATH, '/html/body/div[4]/div[3]/table[2]/tbody')
    html = elem.get_attribute('innerHTML')  # HTML of the rows inside that <tbody>
    soup = BeautifulSoup(html, "html.parser")
    rows = soup.find_all("tr")  # List of <tr> tags, one per draw

    for row in rows:
        cells = row.contents
        d = {}
        d['Period Number'] = cells[0].text
        d['Date'] = cells[1].text[0:10]  # Keep only the first 10 characters of the date cell
        reds = cells[2].text.split('\xa0')  # The red balls are separated by non-breaking spaces
        for n in range(6):
            d['Red Ball %d' % (n + 1)] = reds[n]
        d['Blue Ball'] = cells[3].text
        all_data.append(d)


# If chromedriver is not on your PATH, point webdriver.Chrome() at your own copy of it
driver = webdriver.Chrome()
driver.get('http://chart.cp.360.cn/kaijiang/ssq?sb_spm=b5d6e27c6c47fd3a77aacda65df2ad7a')
all_data = []
# Click through to the query form
qishu2 = driver.find_element(By.CLASS_NAME, 'zdy-btn')
qishu2.click()
qishu3 = driver.find_element(By.XPATH, '/html/body/div[4]/div[2]/ul/li[5]/div/div/div[1]/a[2]')
qishu3.click()
# Enter the first and last period numbers to query
start = driver.find_element(By.CLASS_NAME, "issueFrom")
start.send_keys("2003001")
end = driver.find_element(By.CLASS_NAME, "issueTo")
end.send_keys("2017134")
# Submit the query and read the resulting table
qishu = driver.find_element(By.CLASS_NAME, "btn-star")
qishu.click()
getdata()
driver.quit()  # Close the browser once the data has been collected

a = pd.DataFrame(all_data)
# The with-block closes the file so the output is flushed to disk
with open("data.txt", "w", encoding="utf-8") as f:
    print(a, file=f)
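
The per-row logic inside getdata() can also be pulled out into a small function that works on plain strings, which makes it possible to check it without opening a browser. The sketch below is only an illustration: the parse_row helper, the key names, and the sample cell texts are hypothetical and not taken from the site.

```python
def parse_row(cell_texts):
    """Build one record from the text of a row's four cells.

    cell_texts is assumed to hold, in order: the period number, the date
    (possibly followed by extra text), the six red balls separated by
    non-breaking spaces, and the blue ball.
    """
    period, date, reds, blue = cell_texts
    record = {"Period Number": period, "Date": date[:10]}
    # "\xa0" is the non-breaking space that &nbsp; becomes once the HTML is parsed
    for n, ball in enumerate(reds.split("\xa0"), start=1):
        record["Red Ball %d" % n] = ball
    record["Blue Ball"] = blue
    return record


# Made-up sample values, for illustration only
sample = ["2003001", "2003-02-23(Sun)", "01\xa002\xa003\xa004\xa005\xa006", "07"]
print(parse_row(sample))
```

Inside getdata(), a function like this would be called with the text of each cell in a row, and the returned dict appended to all_data.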

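Printing a data frame to a text file is easy to read but awkward to load back in later. As one possible alternative, the list of dicts collected in all_data could be written out as CSV. The sketch below uses only the standard library's csv module; the write_csv helper and the sample record are hypothetical.

```python
import csv
import io


def write_csv(records, fileobj):
    """Write a list of dicts that share the same keys to fileobj as CSV."""
    if not records:
        return  # Nothing to write, leave the file empty
    writer = csv.DictWriter(fileobj, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)


# Made-up record in the same shape as the entries of all_data
records = [{"Period Number": "2003001", "Date": "2003-02-23", "Blue Ball": "07"}]

buffer = io.StringIO()  # Stands in for a real file so the example has no side effects
write_csv(records, buffer)
print(buffer.getvalue())
```

With a real file, the csv documentation recommends opening it with newline="", for example open("data.csv", "w", newline="", encoding="utf-8").
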
Since I am still learning myself, questions and suggestions from more experienced readers are very welcome.

Posted by mash on Sat, 18 Jul 2020 09:23:12 -0700