I've always been interested in Python, and I'm now learning it. This post shows how to use the selenium and BeautifulSoup libraries from Python to crawl the draw results published on the 360 lottery site:
First, open the page you want to crawl and look at how its content is structured.
The address is http://chart.cp.360.cn/kaijiang/ssq?sb_spm=b5d6e27c6c47fd3a77aacda65df2ad7a
The final output is a data frame with one row per draw: the period number, the date, six red balls and one blue ball.
The tools used are:
1. The Python libraries selenium and BeautifulSoup
2. The Google Chrome browser
3. The pandas library, because the final data is displayed as a data frame
The code is as follows:
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
import pandas as pd

pd.set_option('display.max_rows', None)  # Print every row; by default pandas truncates long frames

URL = 'http://chart.cp.360.cn/kaijiang/ssq?sb_spm=b5d6e27c6c47fd3a77aacda65df2ad7a'

def getdata():
    # Find the <tbody> where the data is located.
    # find_element(By.XPATH, ...) is the Selenium 4 form; the older
    # find_element_by_xpath helpers were removed in Selenium 4.3.
    elem = driver.find_element(By.XPATH, '/html/body/div[4]/div[3]/table[2]/tbody')
    html = elem.get_attribute('innerHTML')  # HTML source of that part of the page
    soup = BeautifulSoup(html, "html.parser")
    rows = soup.find_all("tr")  # List of <tr> tags, one per draw
    for row in rows:
        i = row.contents  # The cells of this row
        d = {}
        d['Period Number'] = i[0].text
        d['Date'] = i[1].text[0:10]  # Keep only the first 10 characters of the cell
        s1 = i[2].text.split('\xa0')  # Red balls are separated by non-breaking spaces
        for n in range(6):
            d['Red Ball %d' % (n + 1)] = s1[n]
        d['Blue Ball'] = i[3].text
        all_data.append(d)

# If chromedriver is not on your PATH, point Selenium at your own copy of it;
# if it is on PATH, webdriver.Chrome() can be called directly as below
driver = webdriver.Chrome()
driver.get(URL)
all_data = []

# Click through to the custom query form
driver.find_element(By.CLASS_NAME, 'zdy-btn').click()
driver.find_element(By.XPATH, '/html/body/div[4]/div[2]/ul/li[5]/div/div/div[1]/a[2]').click()

# Enter the first and last period numbers to fetch
start = driver.find_element(By.CLASS_NAME, "issueFrom")
start.send_keys("2003001")
end = driver.find_element(By.CLASS_NAME, "issueTo")
end.send_keys("2017134")

# Submit the query, then read the result table
driver.find_element(By.CLASS_NAME, "btn-star").click()
getdata()
driver.quit()

a = pd.DataFrame(all_data)
with open("data.txt", "w", encoding="utf-8") as f:  # The file is closed automatically
    print(a, file=f)
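To make the row-parsing step easier to follow (and to test it without opening a browser), here is a small sketch that uses only the standard library. It assumes each table row has already been reduced to the text of its four cells. The helper names parse_row and rows_to_csv, the column names, and the sample values are my own inventions for illustration, not real draw data, and writing CSV instead of printing the data frame is just one possible alternative.

```python
import csv
import io


def parse_row(cells):
    """Turn the text of one row's four cells into a flat dict.

    `cells` is assumed to be [period, date, red balls, blue ball],
    with the red balls separated by non-breaking spaces ('\xa0').
    """
    period, date, reds, blue = (c.strip() for c in cells)
    record = {"Period Number": period, "Date": date[:10]}
    # Drop empty pieces in case of doubled separators
    red_balls = [b for b in reds.split("\xa0") if b]
    for n, ball in enumerate(red_balls[:6], start=1):
        record[f"Red Ball {n}"] = ball
    record["Blue Ball"] = blue
    return record


def rows_to_csv(records):
    """Serialize a non-empty list of record dicts to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(records[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Made-up sample row, only to show the expected shape of the input
sample = ["2017134", "2017-11-14 (Tue)", "01\xa005\xa012\xa018\xa023\xa030", "07"]
record = parse_row(sample)
csv_text = rows_to_csv([record])
print(csv_text)
```

Compared with printing the data frame to a text file, CSV output is usually easier to load again later, for example with pandas.read_csv.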
I am still learning myself, so if any experts passing by have questions or advice, please get in touch. I would be glad to discuss.