Python data analysis tells you how hot "Ne Zha" is!

Keywords: Python, Session

Recently, WeChat Moments and Weibo have been flooded with posts about the Chinese animated film "Ne Zha" ("The Coming of the Devil Child Nezha").
My own memory of Nezha is still the cartoon I watched as a child: "It's him, it's him, our little friend Nezha!"

After 14 days in cinemas, its total box office stood at 3.19 billion yuan, eighth on the all-time Chinese box office chart, and barring surprises it will end up in the top five.

To give you a more intuitive picture, I used Python to scrape and analyze the film's box office data.

Data source address: http://piaofang.baidu.com/
Here is the scraping code:

import ast
import re
import time
from datetime import datetime, timedelta

import requests
from parsel import Selector


def to_wan(text):
    """Convert a box office figure to units of 10,000 yuan.

    "Billion" stands for the Chinese unit 亿 (100 million yuan), so multiplying
    by 10000 converts it; figures without it are already in 10,000 yuan.
    """
    if "Billion" in text:
        return float(text.replace("Billion", "")) * 10000
    return float(text.replace("ten thousand", ""))


class BoxOffice:  # class name is an assumption; the post shows only the method
    session = requests.Session()  # class variable: cookies persist across requests

    @classmethod
    def spider(cls):
        # Visit the page first (presumably to pick up cookies for the data requests)
        cls.session.get("https://piaofang.baidu.com/?sfrom=wise_film_box")
        lz_list = []   # "Ne Zha"
        szw_list = []  # "The Bravest"

        # One request per day, from today back over the last 14 days
        for r in [datetime.now() - timedelta(days=i) for i in range(14)]:
            day = r.strftime("%Y-%m-%d")
            params = {
                "pagelets[]": "index-overall",
                "reqID": "28",
                "sfrom": "wise_film_box",
                "date": day,
                "attr": "3,4,5,6",
                "t": int(time.time() * 1000),  # millisecond timestamp
            }
            response = cls.session.get("https://piaofang.baidu.com/", params=params).text

            # Data arrives as BigPipe.onPageletArrive({...}); literal_eval parses it without running code
            result = ast.literal_eval(re.findall(r"BigPipe\.onPageletArrive\((.*?)\)", response)[0])

            selector = Selector(text=result.get("html"))

            for dd in selector.css(".detail-list .list dd"):
                name = dd.css("h3 b ::text").extract_first(default="")
                if "Na Zha" not in name and "Raging fire" not in name:
                    continue  # keep only the two films we compare
                total_box = dd.css("h3 span ::attr(data-box-office)").extract_first()  # Gross box office
                box = dd.css("div span[data-index='3'] ::text").extract_first()  # Real-time box office
                ratio = dd.css("div span[data-index='4'] ::text").extract_first()  # Box office share
                movie_ratio = dd.css("div span[data-index='5'] ::text").extract_first()  # Screening share

                dic = {
                    "date": day,
                    "total_box": to_wan(total_box),
                    "box": to_wan(box),
                    "ratio": ratio,
                    "movie_ratio": movie_ratio,
                }
                if "Na Zha" in name:
                    lz_list.append(dic)
                else:
                    szw_list.append(dic)

        return lz_list, szw_list

This is a class method: because it uses a class variable (the shared session), it carries the @classmethod decorator. You can also write it as an ordinary function.
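A minimal sketch of the two styles side by side. FakeSession and BoxOfficeSpider are hypothetical names, and FakeSession is an assumed offline stand-in for requests.Session that only records URLs, so the example runs without network access. Both calls go through the one session object stored on the class.

```python
class FakeSession:
    """Offline stand-in for requests.Session (assumption): it just records URLs."""

    def __init__(self):
        self.urls = []

    def get(self, url):
        self.urls.append(url)
        return url


class BoxOfficeSpider:
    # Class variable: a single session shared by every call (hypothetical names)
    session = FakeSession()

    @classmethod
    def spider(cls):
        # cls is the class itself, so cls.session is the shared class variable
        return cls.session.get("https://piaofang.baidu.com/")


def spider(session):
    # The "usual way": a plain function that is handed the session explicitly
    return session.get("https://piaofang.baidu.com/")


BoxOfficeSpider.spider()
spider(BoxOfficeSpider.session)
print(len(BoxOfficeSpider.session.urls))  # 2: both calls used the same session
```

The class-method form keeps the session hidden inside the class; the plain function makes the dependency explicit, which is easier to swap out in tests.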
The spider() method collects the daily figures for "Ne Zha" and "The Bravest" ("Hero of Fire") over the 14 days since "Ne Zha" opened.
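The key step is pulling the data out of the BigPipe.onPageletArrive(...) call in the response. Below is an offline sketch of that step on a made-up response string; the real payload's keys and markup are an assumption. The post calls eval() on the captured text; if the payload is valid JSON, json.loads does the same job without executing code. Anchoring the pattern on "})" also keeps a plain ")" inside the HTML from cutting the match short.

```python
import json
import re

# Hypothetical response body; the real page's markup and keys are assumptions
sample_response = (
    "<script>BigPipe.onPageletArrive("
    '{"id": "index-overall", "html": "<dl class=\'detail-list\'><dd>sample</dd></dl>"}'
    ");</script>"
)

# Capture the object literal passed to BigPipe.onPageletArrive(...)
PAGELET_RE = re.compile(r"BigPipe\.onPageletArrive\((\{.*?\})\)", re.S)


def extract_pagelet(text):
    """Return the pagelet payload as a dict, or None if no pagelet is found."""
    match = PAGELET_RE.search(text)
    if match is None:
        return None
    # json.loads only parses data; unlike eval() it never executes code
    return json.loads(match.group(1))


payload = extract_pagelet(sample_response)
print(payload["html"])  # <dl class='detail-list'><dd>sample</dd></dl>
```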
Data visualization
The charts are drawn with the pyecharts module.

Gross box office chart

Judging by the box office trend, with the two weekend days still to come, 4 billion yuan is not a dream.
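To put a number on the trend, the daily gross can be recovered from the cumulative totals the spider collects, and a deliberately crude projection assumes each coming day earns the average of the last few days. This is a sketch, not the author's method: daily_increments and naive_projection are hypothetical helpers, and the inputs below are made-up toy numbers, not scraped data. Real daily grosses usually fall over time and jump at weekends, so a flat average is only a rough guide.

```python
def daily_increments(totals):
    """Daily gross from cumulative totals ordered oldest first."""
    return [later - earlier for earlier, later in zip(totals, totals[1:])]


def naive_projection(totals, days_ahead, window=3):
    """Cumulative total after `days_ahead` more days, assuming each one earns
    the mean of the last `window` daily increments."""
    recent = daily_increments(totals)[-window:]
    return totals[-1] + days_ahead * sum(recent) / len(recent)


# Toy cumulative totals in units of 10,000 yuan (made up for illustration)
totals = [100, 130, 150, 165]
print(daily_increments(totals))                          # [30, 20, 15]
print(naive_projection(totals, days_ahead=2, window=2))  # 200.0
```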
Part of the charting code is as follows:
from pyecharts import options as opts
from pyecharts.charts import Line


# Defined inside the same class as spider() in the original post
@staticmethod
def line_base(l1, l2) -> str:
    # l1 / l2: daily rows for "Ne Zha" / "The Bravest", newest first
    lh_list = [y["total_box"] for y in l2]
    lh_list.extend([0 for _ in range(3)])  # no data for the first three days: pad with 0

    c = (
        Line(init_opts=opts.InitOpts(page_title="Gross box office"))
        .add_xaxis([y["date"] for y in reversed(l1)])  # oldest date first
        .add_yaxis(
            "Ne Zha",
            [y["total_box"] for y in reversed(l1)],
            is_smooth=True,
            markpoint_opts=opts.MarkPointOpts(data=[opts.MarkPointItem(type_="max")]),
        )
        .add_yaxis(
            "The Bravest",
            list(reversed(lh_list)),  # pass a list rather than a one-shot iterator
            is_smooth=True,
            markpoint_opts=opts.MarkPointOpts(data=[opts.MarkPointItem(type_="max")]),
        )
        .set_global_opts(
            title_opts=opts.TitleOpts(
                title="Gross box office",
                subtitle="Unit: 10,000 yuan",
                subtitle_textstyle_opts=opts.TextStyleOpts(color="red"),
            ),
            toolbox_opts=opts.ToolboxOpts(),
        )
    )
    return c.render("line.html")  # writes the HTML file and returns its path
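Padding with a fixed three zeros only works while "The Bravest" has exactly three fewer days of data than "Ne Zha". A more robust sketch aligns each series to the shared list of dates and fills the missing days; align_by_date is a hypothetical helper and the rows below are toy values. ISO "YYYY-MM-DD" strings sort chronologically, so sorted() yields the x-axis oldest first.

```python
def align_by_date(dates, rows, key="total_box", fill=0):
    """One value per date in `dates`; `fill` where `rows` has no entry for that date."""
    by_date = {row["date"]: row[key] for row in rows}
    return [by_date.get(day, fill) for day in dates]


# Toy rows, newest first as the spider returns them (values are made up)
nezha = [
    {"date": "2019-07-28", "total_box": 3.0},
    {"date": "2019-07-27", "total_box": 2.0},
    {"date": "2019-07-26", "total_box": 1.0},
]
bravest = [{"date": "2019-07-28", "total_box": 0.5}]

dates = sorted(row["date"] for row in nezha)  # x-axis, oldest first
print(align_by_date(dates, nezha))    # [1.0, 2.0, 3.0]
print(align_by_date(dates, bravest))  # [0, 0, 0.5]
```

Pass fill=None instead of 0 to mark those days as missing rather than as zero.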

Next, look at the share of screenings.

Well, it tastes like a doughnut, as one basketball superstar said.
What about the box office share?

It has only 38% of all screenings, yet takes half of the total box office.
Ne Zha really is that strong!
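The same comparison as arithmetic, taking "half" to mean 50%: dividing the box office share by the screening share shows how much each screening earned relative to the average screening in the market.

```python
screening_share = 0.38   # share of all screenings, from the chart
box_office_share = 0.50  # "half of the total" box office, taken as 50%

# Above 1 means each screening earned more than the average screening
relative_yield = box_office_share / screening_share
print(f"{relative_yield:.2f}")  # 1.32
```

In other words, each screening of the film brought in roughly 1.3 times the market average.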


Posted by intech on Mon, 26 Aug 2019 03:12:03 -0700

Programmer Group
