A programmer uses Python to pick out the "astonishing" UP owners (uploaders) on Bilibili (Station B)!

Keywords: Python JSON Programming

Preface

 


Recently, Bilibili's New Year's Eve gala swept across the video sites with its original ideas, which did the company a great deal of good. Its stock price also rose sharply. We probably all regret not buying Bilibili stock earlier:

Today, however, we are not talking about Bilibili's New Year's gala but about its core resource: the "astonishing" UP owners. The inspiration for this article comes from a question on the Zhihu hot list:

 

Data Acquisition

 

The question above has 859 answers, and they are the source of the data in this article. Many answers mention an UP owner through a link that contains the owner's ID, as shown in the following figure:

We can crawl the space IDs of the UP owners that appear in the answers. Since not every answer contains such a link, we also extracted the bold text in the answers to pick up some UP owner names as a supplement to the data:

The answer above is a typical case. It refers to a very popular primary-school pupil who once received birthday wishes from Cook. Here is part of the code used to extract the data:

import re
import time

from selenium import webdriver
from selenium.webdriver.common.by import By

# Start crawling data
driver = webdriver.Chrome()
driver.maximize_window()
url = 'https://www.zhihu.com/question/291506148'
# Load the question in the current window so that later lookups see this page
driver.get(url)
# Scroll to the bottom repeatedly so that more answers are loaded
for i in range(1000):
    time.sleep(1)
    driver.execute_script('document.documentElement.scrollTop = 10000000')
    print(i)

# Organize data: join the HTML of every answer and pull out the space links
all_html = [k.get_property('innerHTML')
            for k in driver.find_elements(By.CLASS_NAME, 'AnswerItem')]
all_text = ''.join(all_html)
pat = r'/space\.bilibili\.com/\d+'
spaces = list(set(re.findall(pat, all_text)))
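
The extraction step can be tried offline on a small HTML fragment. The sketch below is only an illustration: the sample markup, the helper names `extract_space_ids` and `extract_bold_names`, and the assumption that bold text is marked with `<b>` or `<strong>` tags are not taken from the original crawl.

```python
import re
from html.parser import HTMLParser

SPACE_PATTERN = re.compile(r'space\.bilibili\.com/(\d+)')


def extract_space_ids(html_text):
    """Return the unique numeric space IDs found in html_text, sorted."""
    return sorted(set(SPACE_PATTERN.findall(html_text)), key=int)


class BoldTextParser(HTMLParser):
    """Collect the text inside <b> and <strong> tags (assumed bold markup)."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag in ('b', 'strong'):
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in ('b', 'strong') and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0 and data.strip():
            self.names.append(data.strip())


def extract_bold_names(html_text):
    """Return the bold text snippets found in html_text, in document order."""
    parser = BoldTextParser()
    parser.feed(html_text)
    parser.close()
    return parser.names


# Made-up sample resembling two answers (the IDs and the name are placeholders)
sample_html = (
    '<div class="AnswerItem"><a href="https://space.bilibili.com/12345">link</a></div>'
    '<div class="AnswerItem"><b>Some UP owner</b> and '
    '<a href="//space.bilibili.com/678/video">again</a> '
    '<a href="https://space.bilibili.com/12345">dup</a></div>'
)

space_ids = extract_space_ids(sample_html)
bold_names = extract_bold_names(sample_html)
print(space_ids)
print(bold_names)
```

Names collected from bold text still have to be matched to a space ID by some other means, so they are best treated as candidates rather than confirmed UP owners.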

 

Now we have the IDs of these astonishing UP owners. The next step is to crawl their personal spaces on Bilibili for more detailed information:

Above is the personal space of Manual Geng, Bilibili's well-known "scientist". From it we can get his number of fans, his main video type (I expected technology, but it turned out to be life; Bilibili knows how to have fun), and the number of videos, plays and comments across all his videos, which serve as the basis for the ranking later on. Part of the code is as follows:

import json
import time

import pandas as pd
import requests

# `spaces` comes from the previous step; `cookie` and `header` (the request
# cookies and headers) are assumed to be defined elsewhere.
columns = ['name', 'fans', 'face', 'main_type', 'total_video',
           'total_play', 'total_comment']
rows = []
for i in range(len(spaces)):
    try:
        time.sleep(1)
        space_id = spaces[i].replace('/space.bilibili.com/', '')
        # Profile card: name, number of fans, avatar URL, number of videos
        url = 'https://api.bilibili.com/x/web-interface/card?mid={}&jsonp=jsonp&article=true'.format(space_id)
        html = requests.get(url=url, cookies=cookie, headers=header).content
        data = json.loads(html.decode('utf-8'))['data']
        this_name = data['card']['name']
        this_fans = data['card']['fans']
        this_face = data['card']['face']
        this_video = int(data['archive_count'])
        # The video list is requested 30 entries per page (ps=30)
        total_page = int((this_video - 1) / 30) + 1
        video_list = []
        for j in range(total_page):
            url = 'https://api.bilibili.com/x/space/arc/search?mid={}&ps=30&tid=0&pn={}&keyword=&order=click&jsonp=jsonp'.format(space_id, j + 1)
            html = requests.get(url=url, cookies=cookie, headers=header).content
            data = json.loads(html.decode('utf-8'))
            if j == 0:
                type_list = data['data']['list']['tlist']
            video_list += data['data']['list']['vlist']
        # Main type: the category with the most videos
        type_count = {t['name']: int(t['count']) for t in type_list.values()}
        this_type = max(type_count, key=type_count.get)
        # Skip the '--' placeholder when summing plays and comments
        this_play = sum(v['play'] for v in video_list if v['play'] != '--')
        this_comment = sum(v['comment'] for v in video_list if v['comment'] != '--')
        rows.append({'name': this_name,
                     'fans': this_fans,
                     'face': this_face,
                     'main_type': this_type,
                     'total_video': this_video,
                     'total_play': this_play,
                     'total_comment': this_comment})
        print('success:' + str(i))
    except Exception as exc:
        # Report which space failed and why, then move on to the next one
        print('fail:' + str(i), exc)
        continue
upstat = pd.DataFrame(rows, columns=columns)
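
The paging arithmetic and the aggregation in the loop above can be checked without any network access. The sketch below is a minimal, hypothetical restatement: the helper names are invented, and the sample data only imitates the fields the crawl code reads (`tlist`, `vlist`, `name`, `count`, `play`, `comment`) with made-up values.

```python
PAGE_SIZE = 30  # the crawl requests ps=30 videos per page


def page_count(total_videos, page_size=PAGE_SIZE):
    """Number of pages needed to list total_videos (ceiling division)."""
    if total_videos <= 0:
        return 0
    return (total_videos - 1) // page_size + 1


def main_type(tlist):
    """Name of the category with the most videos, or None if tlist is empty."""
    counts = {t['name']: int(t['count']) for t in tlist.values()}
    return max(counts, key=counts.get) if counts else None


def total_of(videos, field):
    """Sum a numeric field, skipping the '--' placeholder."""
    return sum(v[field] for v in videos if v[field] != '--')


# Invented sample shaped like the fields the crawl code reads
sample_tlist = {
    '160': {'name': 'life', 'count': 42},
    '188': {'name': 'technology', 'count': 7},
}
sample_vlist = [
    {'play': 1000, 'comment': 20},
    {'play': '--', 'comment': 5},
    {'play': 250, 'comment': '--'},
]

print(page_count(61))
print(main_type(sample_tlist))
print(total_of(sample_vlist, 'play'), total_of(sample_vlist, 'comment'))
```

Keeping these three steps as small functions makes it easier to see which one failed when a particular space raises an exception.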

In the end, we have information on more than 200 "astonishing" UP owners on Bilibili. An overview of the data is as follows:

 

Overview

 

With this data in hand, we first look at the distribution of the main video types published by these "astonishing" UP owners:

Because Bilibili's life category is all-inclusive (both Manual Geng and Plum Wu are, surprisingly, filed under life), this category has the largest share. The technology and digital categories also account for a large proportion, which supports the claim that Bilibili is an excellent learning website. If you are interested, see another article: Do you believe you can learn programming by browsing Station B?

The remaining videos can be grouped together as an entertainment category, which includes games, movies and TV. From here on, video types are grouped into technology, life and entertainment, and we look for the most "astonishing" UP owners in each category.
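
The grouping described above amounts to a small lookup. In the sketch below, the category labels in `TECHNOLOGY_TYPES` and `LIFE_TYPES` are assumed English stand-ins; the real labels are whatever values end up in `main_type`.

```python
# Hypothetical category names; the real labels come from each UP owner's main_type
TECHNOLOGY_TYPES = {'technology', 'digital'}
LIFE_TYPES = {'life'}


def broad_category(main_type):
    """Map a detailed video type to technology, life, or entertainment."""
    if main_type in TECHNOLOGY_TYPES:
        return 'technology'
    if main_type in LIFE_TYPES:
        return 'life'
    # Everything else (games, movies, TV, ...) is treated as entertainment
    return 'entertainment'


sample_types = ['life', 'digital', 'games', 'technology', 'movies']
grouped = [broad_category(t) for t in sample_types]
print(grouped)
```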

Before the formal ranking, we stitch the avatars of these UP owners together in Python to get the picture below. See how many of them you recognize:

This part of the code is as follows:

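from urllib import request  # needed only for the optional download line

import cv2
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import numpy as np

all_img = None
for i in range(upstat.shape[0]):
    loc = 'D:/Reptiles/Surprised by nature and man/' + upstat['name'][i] + '.jpg'
    # Uncomment to download each avatar to loc first
    # request.urlretrieve(upstat['face'][i], loc)
    img = mpimg.imread(loc)[:, :, 0:3]
    img = cv2.resize(img, (500, 500), interpolation=cv2.INTER_CUBIC)
    # Build each row of 20 avatars from left to right
    if i % 20 == 0:
        row_img = img
    else:
        row_img = np.hstack((row_img, img))
    # A completed row goes below the rows collected so far;
    # an incomplete final row is not added to the picture
    if i % 20 == 19:
        all_img = row_img if all_img is None else np.vstack((all_img, row_img))
plt.axis('off')
plt.margins(0, 0)
plt.imshow(all_img)
plt.savefig('Head portrait.png', dpi=1000)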
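
One property of the stitching loop is worth seeing in isolation: a row only reaches the final picture once it holds 20 avatars, so an incomplete last row is left out. The sketch below shows one way to keep it, by padding the last row with blank tiles. To stay free of dependencies it uses nested lists as stand-ins for image arrays; `blank_tile`, `hstack`, `vstack` and `stitch_grid` are illustrative helpers, not NumPy or OpenCV calls.

```python
def blank_tile(size, value=0):
    """A size x size tile filled with one pixel value."""
    return [[value] * size for _ in range(size)]


def hstack(tiles):
    """Join tiles of equal height side by side (the role np.hstack plays)."""
    return [sum((tile[r] for tile in tiles), []) for r in range(len(tiles[0]))]


def vstack(blocks):
    """Stack blocks of equal width on top of each other (like np.vstack)."""
    return [row for block in blocks for row in block]


def stitch_grid(tiles, per_row, size):
    """Arrange square tiles into rows of per_row, padding the last row."""
    rows = []
    for start in range(0, len(tiles), per_row):
        row_tiles = tiles[start:start + per_row]
        # Pad an incomplete final row so that every row has the same width
        row_tiles = row_tiles + [blank_tile(size)] * (per_row - len(row_tiles))
        rows.append(hstack(row_tiles))
    return vstack(rows)


# Five 2x2 tiles, three per row: the second row gets one blank tile
tiles = [blank_tile(2, value=i + 1) for i in range(5)]
grid = stitch_grid(tiles, per_row=3, size=2)
for row in grid:
    print(row)
```

With real images the same idea would apply to arrays of shape (500, 500, 3), using a blank array of that shape as the padding tile.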

 

 

Comprehensive Ranking

 

Next comes the bold part: ranking these UP owners. We take into account each owner's number of fans, number of videos, and average plays and comments per video to get a comprehensive index. A disclaimer: this ranking is for entertainment only. If you insist on digging deeper, AWSL.
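
The article does not give the formula behind the comprehensive index. As one possible reading, the sketch below min-max scales each metric to the range 0 to 1 and averages the scaled values with equal weights. The metric names, the equal weights and the sample figures are all assumptions made for illustration.

```python
METRICS = ('fans', 'total_video', 'avg_play', 'avg_comment')


def min_max_scale(values):
    """Scale values to [0, 1]; a constant column maps to all zeros."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def composite_scores(owners, metrics=METRICS):
    """Return {name: score}, the equal-weight mean of the scaled metrics."""
    scaled = {m: min_max_scale([o[m] for o in owners]) for m in metrics}
    return {
        o['name']: sum(scaled[m][i] for m in metrics) / len(metrics)
        for i, o in enumerate(owners)
    }


# Invented sample data; avg_* are per-video averages (total / total_video)
owners = [
    {'name': 'A', 'fans': 100, 'total_video': 10, 'avg_play': 50, 'avg_comment': 5},
    {'name': 'B', 'fans': 300, 'total_video': 30, 'avg_play': 150, 'avg_comment': 15},
    {'name': 'C', 'fans': 200, 'total_video': 20, 'avg_play': 100, 'avg_comment': 10},
]

scores = composite_scores(owners)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Scaling first keeps a metric with large raw numbers, such as plays, from drowning out the others; different weights would of course give a different order.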

First, take a look at the UP owners who made the TOP10:

I was recently recommended Wizard Finance and Economics, which made the list. I suggest you take a look: it really explains complex financial knowledge in a very down-to-earth way. Two well-known UP owners, Warnoff Brothers and Jing Hanqin, are also on the list. Next, the TOP11-20 list:

Xu Dasao, Li Zixu and Manual Geng all appear in this list. If the chance ever comes, I hope someone can arrange a collaboration among them. I have thought the process through: Manual Geng provides post-modern tools for Li Zixu; Li Zixu uses Manual Geng's artifacts to make the world's hottest chili pepper; Xu Dasao eats it in one bite; and finally Manual Geng uses his own "brain melon" contraption to relieve Xu Dasao's discomfort from the chili.

 

Category Ranking

 

After the comprehensive ranking, we rank all UP owners separately within technology, life and entertainment, and list the TOP10 of each category:
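
A per-category TOP list is a group-then-sort operation. The sketch below assumes each UP owner is a record with a `category` and a `score` field; those field names, the helper `top_n_by_category` and the sample records are hypothetical.

```python
from collections import defaultdict


def top_n_by_category(owners, n=10):
    """Return {category: [names...]} with the n highest scores per category."""
    groups = defaultdict(list)
    for owner in owners:
        groups[owner['category']].append(owner)
    return {
        category: [o['name'] for o in
                   sorted(members, key=lambda o: o['score'], reverse=True)[:n]]
        for category, members in groups.items()
    }


# Invented sample; 'score' stands for whatever comprehensive index is used
owners = [
    {'name': 'A', 'category': 'life', 'score': 0.9},
    {'name': 'B', 'category': 'technology', 'score': 0.4},
    {'name': 'C', 'category': 'life', 'score': 0.7},
    {'name': 'D', 'category': 'entertainment', 'score': 0.8},
    {'name': 'E', 'category': 'life', 'score': 0.95},
]

top2 = top_n_by_category(owners, n=2)
print(top2)
```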

With the category rankings, you can pick what to watch according to your own preferences. I believe your imagination will grow after watching. After a while, you can try publishing videos on Bilibili yourself and become a well-known (just kidding) UP owner with a two-digit number of fans.

Finally, this article ends with Manual Geng's most-played video on Bilibili. It shows the "astonishing" theme very well. I hope you will try his invention yourself. If your limbs are still intact after using it, write down your experience. You are welcome to share it with us.

Posted by genericnumber1 on Thu, 09 Jan 2020 22:12:42 -0800