"Changjin Lake" vs. "Me and My Parents": a National Day film review analysis with Python

Keywords: Python, visualization, web crawler

Hello readers, I'm Xiao Zhang~

The National Day holiday ended yesterday, and we have all returned to our jobs to carry on with the daily grind (or the daily slacking off).

Since 2019, a film in the "Me and My ..." series has been released every National Day (October 1) to celebrate the holiday. Judging by the box office of the previous two years, these films were far more popular than the others released at the same time, and they ranked first at the box office.

This year is no exception: "Me and My Parents" was released. It tells stories about parents and children in four segments, and its content has been well received by the public.

Surprisingly, though, its box office is far lower than that of another National Day release, "Changjin Lake", which also beats it in both popularity and praise. This article analyzes the film reviews to look at the details.

This article looks at three films released during this year's National Day holiday: "Me and My Parents", "Changjin Lake" and "Five Children Splashing on the Water".

Many readers may be hearing of "Five Children Splashing on the Water" for the first time. It is far less talked about than the other two, but it was also released during this year's National Day holiday, and according to the Maoyan (Cat's Eye) ranking its popularity is not low: it ranks third.

Technology stack

Before we start, here is the technology stack used in this article, which falls into two parts:

Languages: Python, JavaScript;

Libraries: ECharts, stylecloud;
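The article does not show how the Python data reaches the ECharts charts, so here is only a minimal sketch of one common approach: build the chart option as a plain dictionary in Python and dump it to JSON for the JavaScript side. The function name build_line_option and the placeholder values are my own assumptions, not the author's code.

```python
import json


def build_line_option(dates, series_by_film):
    """Build an ECharts option dict for a multi-series line chart.

    Hypothetical helper: `dates` are the x-axis labels and
    `series_by_film` maps a film name to one value per date.
    """
    return {
        "tooltip": {"trigger": "axis"},
        "legend": {"data": list(series_by_film)},
        "xAxis": {"type": "category", "data": list(dates)},
        "yAxis": {"type": "value"},
        "series": [
            {"name": film, "type": "line", "data": list(values)}
            for film, values in series_by_film.items()
        ],
    }


# Placeholder values for illustration only, not data from the article.
option = build_line_option(
    ["09-29", "09-30", "10-01"],
    {"Film A": [1, 2, 3], "Film B": [2, 1, 0]},
)

# ensure_ascii=False keeps Chinese film names readable in the JSON output.
option_json = json.dumps(option, ensure_ascii=False)
print(option_json)
```

On the page, the JSON string can be parsed and handed to the chart's setOption call.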

Comparative analysis of film reviews

Let's start with the film reviews. With the help of Python, I collected a sample of Douban reviews for each of the three films. I will not go into detail about crawling Douban reviews here; readers who are unfamiliar with it can refer to my earlier article. The core code is posted below:

import requests
from bs4 import BeautifulSoup
from pymongo import MongoClient

# Assumption: the original listing never shows how `collection` is created.
# A local MongoDB instance with hypothetical database/collection names is used here.
collection = MongoClient("mongodb://localhost:27017")["douban"]["comments"]

headers = {
    "Cookie":"bid=tulFhUK9Lzo; douban-fav-remind=1; ll=\"118160\"; _vwo_uuid_v2=D55143433EAF6AF4EB29A904F8BE781A1|4d5d27125abfe3f6d29caa68ba504fed; ap_v=0,6.0;; _pk_ses.100001.4cf6=*; __utma=30149280.52492667.1628212627.1629608096.1632849782.3; __utmc=30149280; __utmz=30149280.1632849782.3.3.utmcsr=google|utmccn=(organic)|utmcmd=organic|utmctr=(not%20provided); __utma=223695111.788106722.1629608096.1629608096.1632849782.2; __utmb=223695111.0.10.1632849782; __utmc=223695111; __utmz=223695111.1632849782.2.2.utmcsr=google|utmccn=(organic)|utmcmd=organic|utmctr=(not%20provided); __utmb=30149280.3.10.1632849782; _pk_id.100001.4cf6=254979423a09aae4.1629608097.2.1632851386.1629608485.",
    "User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.61 Safari/537.36",
    "Accept-Language": "zh-CN,zh;q=0.9,en-US;q=0.8,en;q=0.7,zh-TW;q=0.6",
}

# Assumption: the URL prefix is missing from the original listing;
# Douban subject pages are normally served from this address.
BASE_URL = "https://movie.douban.com/subject"

# Part 1: crawl the comments; change movieId to switch films
movieId = "35030151"
for offset in range(0, 220, 20):
    url = "{}/{}/comments?start={}&limit=20&status=P&sort=new_score".format(BASE_URL, movieId, offset)
    res = requests.get(url, headers=headers)
    soup = BeautifulSoup(res.text, 'lxml')
    for comment_item in soup.select("#comments > .comment-item"):
        try:
            avatar = comment_item.select(".avatar a img")[0].get("src")
            name = comment_item.select(".comment h3 .comment-info a")[0]
            rate = comment_item.select(".comment h3 .comment-info span:nth-child(3)")[0]
            date = comment_item.select(".comment h3 .comment-info span:nth-child(4)")[0]
            comment = comment_item.select(".comment .comment-content span")[0]
            data_json = {
                'avatar': avatar,  # stored so the duplicate check below can find it
                'name': str(name.string).strip("\t"),
                # class looks like "allstar50"; strip() removes the letters, leaving "50"
                'rate': str(rate.get("class")[0]).strip("allstar").strip('\t').strip("\n"),
                'date': str(date.string).replace('\n', '').replace('\t', '').strip(' '),
                'comment': str(comment.string).strip("\t").strip("\n"),
            }
            # Use the avatar URL as a rough duplicate check before storing
            if not collection.find_one({'avatar': avatar}):
                print("data_json is {}".format(data_json))
                collection.insert_one(data_json)
        except Exception as e:
            # Skip comments that are missing an expected field
            print("skipped one comment: {}".format(e))

First, let's see whether the number of comments on the three films differs from day to day. That gives us Figure 1 below.

Figure 1

According to Figure 1, the comment trends of the three films are consistent: the number of comments rises slowly from the 24th to the 30th and then slowly falls.

This trend is reasonable. Comments posted on or before the 30th can be seen as feedback from users who watched the advance screenings, which are also a way for producers to build up a film's popularity and maximize their returns.

One thing stands out here, though: the number of reviews. According to the line chart, "Five Children Splashing on the Water" has far more reviews than "Changjin Lake" or "Me and My Parents". Setting the content of the reviews aside, in terms of word of mouth the former looks much more popular than the latter two, yet the box office comparison shows exactly the opposite. Let's look more closely at why. For now, all I can say is that the producers of "Changjin Lake" must have been truly confident: the comment volume is low, but the box office is outstanding.
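The daily counts behind a chart like Figure 1 can be produced with a simple tally. This is only a sketch: the helper name comments_per_day is hypothetical, the sample records are placeholders, and it assumes each crawled date field starts with a YYYY-MM-DD date.

```python
from collections import Counter


def comments_per_day(records):
    """Count comments per calendar day.

    Assumption: each record's 'date' starts with YYYY-MM-DD,
    optionally followed by a time of day.
    """
    counts = Counter(r["date"][:10] for r in records if r.get("date"))
    # Sort by date so the result can be plotted left to right.
    return dict(sorted(counts.items()))


# Placeholder records for illustration only, not crawled data.
sample = [
    {"date": "2021-10-01 09:00:00"},
    {"date": "2021-09-30 20:15:01"},
    {"date": "2021-09-30"},
]
daily = comments_per_day(sample)
print(daily)
```

Running the same function once per film gives one series per film for the line chart.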

I also made a simple comparison of the star ratings attached to the reviews. The charts for "Youth", "Changjin Lake" and "Parents" (I am being lazy here and using abbreviations from now on) are shown in Figure 2, Figure 3 and Figure 4.

Figure 2

Figure 3

Figure 4

Judging from the results, "Changjin Lake" is rated the highest. In the collected sample, its share of five-star reviews is as high as 45.93%, nearly half, followed by "Parents" at 29.05% and finally "Youth" at 22.6%.
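The star shares can be computed from the crawled rate field roughly as follows. This is a sketch under assumptions: star_share is a hypothetical helper, the sample values are placeholders, and rate is assumed to hold strings such as "50" for five stars, as the crawler above would produce. Comments without a rating yield a non-numeric value and are skipped.

```python
from collections import Counter


def star_share(records):
    """Return {stars: percentage of rated comments}, highest stars first."""
    stars = [
        int(r["rate"]) // 10  # "50" -> 5 stars, "40" -> 4 stars, ...
        for r in records
        if str(r.get("rate", "")).isdigit()
    ]
    if not stars:
        return {}
    counts = Counter(stars)
    return {
        s: round(100 * counts[s] / len(stars), 2)
        for s in sorted(counts, reverse=True)
    }


# Placeholder records for illustration only, not crawled data.
sample = [
    {"rate": "50"},
    {"rate": "50"},
    {"rate": "40"},
    {"rate": "10"},
    {"rate": "comment-time"},  # an unrated comment, ignored
]
shares = star_share(sample)
print(shares)
```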

Yet the current Douban scores are "Changjin Lake" 7.6, "Parents" 7.0 and "Youth" 7.3. Although Douban ratings are fairly authoritative, I still have doubts about these scores, and the reason is the box office behind them.

Judging the quality of a film by its score alone is meaningless. After all, scores can be influenced by the power of capital. What really matters for a film is the box office: whether users are willing to pay the price of a ticket to see it.

The following box office data comes from Maoyan (Cat's Eye). Based on the box office data of the three films, I drew two charts: one for the daily box office and one for the cumulative box office since the release date.

See Figure 5 for the daily box office figures.

Figure 5

After seeing this chart, I was shocked. Before adding the box office comparison, I expected a gap between "Youth" and the other two films, but not a large one. The data showed that I was too naive: the box office of "Youth" is simply not comparable with that of the other two films.

Please don't overlook the blue line at the very bottom of Figure 5; it is the daily box office of "Youth".

On October 6, the box office of "Five Children Splashing on the Water" was about 5 million yuan, while that of "Changjin Lake" was 475 million yuan, a gap of roughly 90 times.

In addition to the daily box office, I also drew a line chart of the cumulative box office of the three films. The results are shown in Figure 6.

As of October 6, "Changjin Lake" had taken in 2.8 billion yuan in total, "Parents" 870 million yuan, and "Youth" only about 30 million yuan.

If the "Changjin Lake" curve climbs at roughly 45 degrees, the curve for "Me and My Parents" climbs at only 30 degrees or even less, and the curve for "Five Children Splashing on the Water" is practically flat, which is really miserable.
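A cumulative curve like the one described here is just a running total of the daily figures. A minimal sketch follows; the function name cumulative is hypothetical and the numbers are placeholders, not the Maoyan data.

```python
from itertools import accumulate


def cumulative(daily_totals):
    """Turn per-day box office figures into a running total."""
    return list(accumulate(daily_totals))


# Placeholder daily figures for illustration only.
daily_box_office = [10, 20, 5]
running_total = cumulative(daily_box_office)
print(running_total)
```

Each value in the result is the sum of all daily figures up to and including that day, so the slope of the cumulative line on any day equals that day's box office.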

At this point in the analysis, I can't help thinking of a saying: the parched die of drought while the flooded die of flood. Some producers deliberately schedule their films for high-traffic holidays such as National Day and Spring Festival, hoping that the investment will bring back several or even dozens of times its cost.

Judging from the box office comparison above, however, this strategy is not always wise. Unless the film is well received by the public, it may not only earn nothing but also fail to recover the up-front investment.

I expected "Me and My Parents", a highly anticipated National Day film, to land only slightly below "Changjin Lake" at the box office, but the trend above shows that a big gap remains between the two.

It also shows, indirectly, that today's audiences are never stingy with good films and will happily spend on movie entertainment, but they will not tolerate a low-quality film.

Word cloud drawing

Finally, let's draw the reviews of each film as a word cloud to see what the audience said about it.
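The word clouds could be produced along these lines. This is a sketch under assumptions, not the author's code: the article names stylecloud but not a word segmenter, so the use of jieba is my guess, and the helper names top_keywords and draw_cloud are hypothetical.

```python
from collections import Counter


def top_keywords(tokens, stopwords=(), n=10):
    """Return the n most frequent tokens, skipping stopwords and single characters."""
    kept = [t for t in tokens if len(t) > 1 and t not in stopwords]
    return Counter(kept).most_common(n)


def draw_cloud(comments, output_name, font_path):
    """Segment Chinese comments and render them as a word cloud image."""
    # Third-party packages, imported lazily so top_keywords works without them.
    import jieba
    import stylecloud

    # Assumption: jieba is used to split Chinese text into words.
    tokens = [t for c in comments for t in jieba.lcut(c)]
    stylecloud.gen_stylecloud(
        text=" ".join(tokens),
        font_path=font_path,      # a font with CJK glyphs is needed for Chinese text
        output_name=output_name,  # path of the image file to write
    )
    return top_keywords(tokens)


# Placeholder tokens for illustration only.
print(top_keywords(["war", "war", "a", "film"], stopwords={"film"}))
```

The keyword counts returned alongside the image make it easier to describe the cloud in text, as the paragraphs below do.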

In the reviews of "Changjin Lake", the most frequent keywords are history, volunteer army, war film, American army, shock and so on. These keywords match the theme of the film and remind us how hard-won today's good life is.

In "Me and My Parents", the names Shen Teng and Wu Jing take up a large part of the picture, which shows how much the audience likes them. For the plot, the words romance, comedy and moving appear most often. Although these emotions differ greatly, they reflect the different styles of the four directors.

For "Youth", the main keywords are original, remake, adaptation and domestic production, which describe the film's background and tone. Beyond that, I did not find any emotional comments on the plot, so there is not much to discuss.


How to get the code and data for this article: follow the WeChat official account Xiao Zhang Python and reply with the keyword 211009 in the account's chat.

That's all for this year's National Day film review analysis. It can be summed up in one sentence: in the past, producers could generate a lot of box office revenue through hype and early promotion of big-name IP, but that rarely works now, because audiences are no longer so easy to fool.

If this article was helpful, please consider giving it a like as encouragement. Finally, thank you for reading. See you next time~

Posted by BeanoEFC on Sun, 10 Oct 2021 03:37:39 -0700