Crawling the Maoyan (猫眼) Top 100 movies and storing them in a database
I finally got this working today. I could cry.
For each movie, the script collects:
- Number
- Poster image
- Title
- Starring
- Release time
- Score
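The six fields above map one-to-one onto the columns of the insert statement in the script (`no, pic, name, actor, time, score`). As a sketch, they can be held in a small record type. The class name `Movie` and the method `as_row` are hypothetical, not part of the original code; every field is a string because the script inserts strings.

```python
from dataclasses import dataclass


@dataclass
class Movie:
    """One entry of the Top 100 board (hypothetical helper, not in the original)."""
    no: str      # sequential number, 1..100
    pic: str     # poster image URL (the img tag's data-src attribute)
    name: str    # movie title
    actor: str   # starring
    time: str    # release time
    score: str   # score as shown on the page

    def as_row(self) -> tuple:
        # Same order as the column list in the insert statement
        return (self.no, self.pic, self.name, self.actor, self.time, self.score)
```

Passing `movie.as_row()` as the parameter tuple of `cur.execute` keeps the column order in one place.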
Here is the code:
```python
import requests  # the original imported urllib.request but called requests.get
from bs4 import BeautifulSoup
import pymysql

# charset makes pymysql send the Chinese text to MySQL as UTF-8
conn = pymysql.connect(host='localhost', user='root', password='523310',
                       db='mysql', charset='utf8mb4')
cur = conn.cursor()
num = 0
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36',
    'host': ''  # value elided in the original
}

for i in range(0, 10):
    # Page URL elided in the original; the offset runs 0, 10, ..., 90
    url_top = '{}'.format(str(i * 10))
    # was requests.get(link, ...), but `link` is never defined
    html_top = requests.get(url_top, headers=headers, timeout=10)
    soup = BeautifulSoup(html_top.text, 'html5lib')
    name_li = soup.find_all('p', class_='name')         # i.a.text.strip()
    actor = soup.find_all('p', class_='star')           # i.text.strip().split('starring:')[1]
    img_li = soup.find_all('img', class_='board-img')
    time_li = soup.find_all('p', class_='releasetime')  # i.text.strip().split('release time:')[1]
    score = soup.find_all('p', class_='score')
    score1_li = soup.find_all('p', class_='integer')    # not used below
    score2_li = soup.find_all('p', class_='fraction')   # not used below
    long = len(name_li)
    for j in range(0, long):
        num = num + 1
        number = str(num)
        img = img_li[j]['data-src']
        name = name_li[j].a.text.strip()
        # New name `act`, so the list `actor` is not overwritten;
        # the split labels must match the text on the page exactly
        act = actor[j].text.strip().split('To star:')[1]
        date = time_li[j].text.strip().split("Show time:")[1]
        scor = score[j].text
        try:
            cur.execute('insert into top100(no, pic, name, actor, time, score) '
                        'values(%s, %s, %s, %s, %s, %s)',
                        (number, img, name, act, date, scor))
            conn.commit()
            print('done')
        except Exception as e:
            conn.rollback()
            print('failed:', e)  # show the error instead of hiding it

cur.close()
conn.commit()
conn.close()
```
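One fragile spot in the script is `.split(label)[1]`: if an entry lacks the label (for example a movie with no starring line), `split` returns a one-element list and `[1]` raises `IndexError`, which stops the whole crawl. A minimal sketch of a safer helper follows; the name `after_label` is hypothetical and the labels passed to it are whatever text the page actually uses.

```python
def after_label(text: str, label: str) -> str:
    """Return the part of `text` after `label`, or '' if the label is absent.

    Unlike text.split(label)[1], this never raises IndexError.
    """
    # partition returns (before, label, after), or (text, '', '') if not found
    _, found, rest = text.strip().partition(label)
    return rest.strip() if found else ''
```

In the loop, `act = after_label(actor[j].text, 'To star:')` would then store an empty string for entries without that label instead of crashing.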
There was a lot I didn't know along the way, so I also asked some experienced people for help.
The problems I ran into were:
1. Originally, I wrote this line:
```python
actor = actor[j].text.strip().split('To star:')[1]
```
After that, I kept getting an error saying that a `str` has no `text` attribute, even though the same pattern worked in my other code blocks. I looked up the various types and how `.text` works and was close to a breakdown at one point. Finally, a classmate pointed out that my variable had the same name as the list. On the first pass through the loop, this line replaces the list `actor` with a plain string, so on the next pass `actor[j]` is a single character of that string, and a `str` has no `.text`. Once I renamed the variable, everything worked. I was stuck on this for a whole week, argh!
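The name-shadowing bug can be reproduced without fetching any page. In this sketch, `FakeTag` is a hypothetical stand-in for a BeautifulSoup tag (it only has a `.text` attribute), and `buggy`/`fixed` mirror the loop before and after the rename.

```python
class FakeTag:
    """Hypothetical stand-in for a BeautifulSoup Tag: only has .text."""
    def __init__(self, text):
        self.text = text


def buggy(tags):
    actor = tags
    out = []
    for j in range(len(tags)):
        # Rebinding `actor` turns the list into a str on the first pass,
        # so on the second pass actor[j] is one character with no .text
        actor = actor[j].text.strip()
        out.append(actor)
    return out


def fixed(tags):
    actor = tags
    out = []
    for j in range(len(actor)):
        act = actor[j].text.strip()  # new name, the list stays intact
        out.append(act)
    return out
```

With two tags, `fixed` returns both names, while `buggy` raises `AttributeError: 'str' object has no attribute 'text'` on the second iteration, which is the error described above.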
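2. The second problem was with writing to the database.
When I first wrote the database code there was no try block, and the script kept failing to run. After I added try/except it ran, but I still could not tell what the error was, and some rows were missing from the database. The bare `except:` caught every exception, rolled back that row and printed only a failure message, so the actual cause stayed hidden.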
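To see why rows go missing, catch the database error explicitly and keep its message. The sketch below uses Python's built-in `sqlite3` instead of pymysql so it runs without a MySQL server; `insert_rows` is a hypothetical helper, and the CHECK constraint only stands in for a too-short VARCHAR column (sqlite does not enforce VARCHAR lengths). With pymysql the equivalent exception class is `pymysql.MySQLError`.

```python
import sqlite3


def insert_rows(conn, rows):
    """Insert rows one at a time; return (inserted_count, failures).

    failures is a list of (row, error message) instead of a silent rollback.
    """
    cur = conn.cursor()
    inserted, failures = 0, []
    for row in rows:
        try:
            cur.execute('insert into top100(no, pic, name, actor, time, score) '
                        'values (?, ?, ?, ?, ?, ?)', row)
            conn.commit()
            inserted += 1
        except sqlite3.Error as e:  # with pymysql: except pymysql.MySQLError
            conn.rollback()
            failures.append((row, str(e)))
            print('failed:', row[0], e)
    return inserted, failures


if __name__ == '__main__':
    conn = sqlite3.connect(':memory:')
    # CHECK simulates a column that is too short for some titles
    conn.execute('create table top100(no text, pic text, '
                 'name text check(length(name) <= 5), '
                 'actor text, time text, score text)')
    rows = [('1', 'p1', 'Short', 'a', 't', '9.5'),
            ('2', 'p2', 'A much longer title', 'a', 't', '9.4')]
    print(insert_rows(conn, rows))
```

The second row is rejected, and the printed message names the constraint that failed instead of just reporting a failure.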
Pay attention to the VARCHAR column lengths and the utf8 character set when creating the table. That's it, time to sit back and relax.
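A sketch of both points, assuming the column names from the insert statement: the VARCHAR sizes in `COLUMN_LIMITS` are hypothetical guesses, and `too_long` is a hypothetical pre-insert check. In MySQL, `VARCHAR(N)` counts characters, so `len()` on a Python `str` compares correctly even for Chinese titles. MySQL's `utf8` is an alias for `utf8mb3`, which cannot store 4-byte characters such as emoji, so `utf8mb4` is the safer table charset.

```python
# Hypothetical column sizes; adjust to fit the real data
COLUMN_LIMITS = {'no': 10, 'pic': 255, 'name': 100,
                 'actor': 255, 'time': 100, 'score': 10}

# DDL for the top100 table with a full UTF-8 character set
CREATE_TOP100 = (
    'CREATE TABLE top100 ('
    + ', '.join(f'`{col}` VARCHAR({n})' for col, n in COLUMN_LIMITS.items())
    + ') DEFAULT CHARSET=utf8mb4'
)


def too_long(row: dict) -> list:
    """Return the columns whose value has more characters than its VARCHAR limit."""
    return [col for col, n in COLUMN_LIMITS.items()
            if len(row.get(col, '')) > n]
```

Running `cur.execute(CREATE_TOP100)` once creates the table, and checking `too_long(...)` before each insert shows which field would have been rejected or truncated.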