Python 3 web crawler in practice: scraping COVID-19 pneumonia data

Keywords: encoding Python Selenium Windows

Crawler approach

Step 1 data acquisition

Tencent, NetEase, DXY (Dingxiang Doctor) and other platforms publish real-time epidemic data. After trying to crawl each of them, I found that DXY returns its data statically, as part of the page it serves. Crawling Tencent or NetEase would require some features of the Selenium library, which is more troublesome, so I chose the simpler option, DXY, and crawled it with requests + BeautifulSoup.

import requests
from bs4 import BeautifulSoup
from requests.exceptions import RequestException


def get_one_page(url):
    """Fetch a page and return it parsed as a BeautifulSoup object, or None on failure."""
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.119 Safari/537.36'}
    try:
        response = requests.get(url, headers=headers, timeout=10)
    except RequestException:
        return None
    if response.status_code != 200:
        return None
    # Set the encoding explicitly; otherwise the Chinese text comes out garbled
    response.encoding = 'utf-8'
    return BeautifulSoup(response.text, "html.parser")
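To check whether a page is "static" in the sense used above, that is, whether the data is already present in the raw HTML that requests receives without running any JavaScript, you can look for the data-bearing script tag directly. The sketch below does this with the standard library's html.parser instead of BeautifulSoup, so it runs without extra packages. The class and function names are hypothetical; the `getAreaStat` id is the one this article searches for in the next step.

```python
from html.parser import HTMLParser


class ScriptByIdParser(HTMLParser):
    """Collect the text inside the <script> tag whose id equals target_id (hypothetical helper)."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self._inside = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "script" and dict(attrs).get("id") == self.target_id:
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        # Script bodies may arrive in several chunks, so collect them all
        if self._inside:
            self.chunks.append(data)


def find_script_text(html, target_id):
    """Return the script body, or None if the raw HTML does not contain that script."""
    parser = ScriptByIdParser(target_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks) if parser.chunks else None
```

For example, if `find_script_text(response.text, "getAreaStat")` returns None for a page, the data is probably loaded by JavaScript after the page arrives, and a browser-driving tool such as Selenium would be needed.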

Step 2 data processing

Once we have the HTML of the page containing the data, we can parse it and extract the fields we want (city, number of confirmed cases, number of deaths, number of recoveries, etc.).

import json
import re


def main():
    url = 'https://ncov.dxy.cn/ncovh5/view/pneumonia'
    soup = get_one_page(url)
    if soup is None:
        print('Failed to fetch', url)
        return
    # The per-province data sits in a <script id="getAreaStat"> tag as a JSON array;
    # a greedy regex grabs everything from the first '[' to the last ']'
    script = soup.find('script', attrs={'id': 'getAreaStat'})
    match = re.search(r'\[(.*)\]', str(script), re.S)
    if match is None:
        print('getAreaStat data not found; the page layout may have changed')
        return
    information = json.loads(match.group(0))  # Parse the JSON array into a list of dicts
    data = []   # [province, confirmed] pairs for the map visualization
    lists = []  # One row per city, exported to CSV
    # Each city entry looks roughly like this:
    # {'cityName': 'Wuhan', 'currentConfirmedCount': 30042, 'confirmedCount': 32994, 'suspectedCount': 0, 'curedCount': 1916, 'deadCount': 1036, 'locationId': 420100}
    for area in information:
        data.append([area['provinceShortName'], area['confirmedCount']])
        for city in area['cities']:
            lists.append([city['cityName'], city['currentConfirmedCount'], city['confirmedCount'],
                          city['suspectedCount'], city['curedCount'], city['deadCount']])
    save_data_as_csv('covid_Excel', lists)
    print(data)
    echarts(data)
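The extraction in main() depends on the network. Pulled out into a pure function, the same regex + json steps can be tested against a saved copy of the script text. The sketch below is one way to do that; `parse_area_stat` is a hypothetical name, and it assumes the script body has the form `try { window.getAreaStat = [...] }catch(e){}`, with a single JSON array between the first '[' and the last ']'.

```python
import json
import re


def parse_area_stat(script_text):
    """Split the getAreaStat script body into (province_pairs, city_rows).

    Assumption: the body contains exactly one JSON array, from the first '['
    to the last ']', as in 'try { window.getAreaStat = [...] }catch(e){}'.
    """
    match = re.search(r"\[.*\]", script_text, re.S)  # greedy: first '[' to last ']'
    if match is None:
        raise ValueError("no JSON array found in script text")
    areas = json.loads(match.group(0))

    province_pairs = []  # [province, confirmed] pairs for the map
    city_rows = []       # one row per city for the CSV
    for area in areas:
        province_pairs.append([area["provinceShortName"], area["confirmedCount"]])
        for city in area.get("cities", []):
            city_rows.append([
                city["cityName"],
                city["currentConfirmedCount"],
                city["confirmedCount"],
                city["suspectedCount"],
                city["curedCount"],
                city["deadCount"],
            ])
    return province_pairs, city_rows
```

Keeping the parsing separate from fetching also means a layout change on the site shows up as a clear ValueError or KeyError in one place.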

Step 3 data persistence

After crawling the data, we need to store it. Here we save it in CSV format. I also plan to use it for a daily push to my WeChat official account, so that I don't have to open a browser and search for the numbers.

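import csv


def save_data_as_csv(filename, data):
    # Colons are not allowed in Windows filenames, so replace them
    # (relevant when the name contains a time such as "12:30")
    filename = filename.replace(":", " ")
    # 'utf-8-sig' writes a BOM so that Excel displays the Chinese city names correctly;
    # the with-block closes the file, so no explicit f.close() is needed
    with open(filename + ".csv", "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        writer.writerow(["City", "Current confirmed", "Total confirmed", "Suspected", "Cured", "Deaths"])
        writer.writerows(data)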
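Since the plan is to save a fresh copy every day, it may help to put the date and time in the filename. The sketch below uses hypothetical helper names: it formats the time with '-' instead of ':' so no replacement is needed afterwards, and it writes with the 'utf-8-sig' encoding so that Excel recognizes the file as UTF-8.

```python
import csv
from datetime import datetime

HEADER = ["City", "Current confirmed", "Total confirmed", "Suspected", "Cured", "Deaths"]


def snapshot_filename(prefix, when):
    """Build a Windows-safe, timestamped name such as 'covid 2020-02-11 08-30.csv'."""
    # '-' instead of ':' keeps the name valid on Windows
    return f"{prefix} {when:%Y-%m-%d %H-%M}.csv"


def write_snapshot(path, rows):
    """Write the header plus rows; 'utf-8-sig' prepends a BOM for Excel."""
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        writer.writerow(HEADER)
        writer.writerows(rows)
```

A daily job could then call `write_snapshot(snapshot_filename("covid", datetime.now()), lists)` and keep one file per run.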

The resulting CSV file is shown in the figure.

Step 4 data visualization

Of course, browsing this data in Excel is not very intuitive, so we can visualize it. Here we use the easy-to-use ECharts library, whose Python binding is pyecharts. Note that pyecharts v1.0 and later support only Python 3.6+ and are not compatible with the old 0.5.x versions, so much of the usage found in older tutorials no longer works. For details, see https://pyecharts.org/#/zh-cn/geography_charts
The implementation function code is as follows:

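from pyecharts import options as opts
from pyecharts.charts import Map


def echarts(data):
    china_map = Map().add("Confirmed number", data, "china").set_series_opts(label_opts=opts.LabelOpts(is_show=True)).set_global_opts(
        # VisualMapOpts defaults to a 0-100 range; stretch it to the largest count
        visualmap_opts=opts.VisualMapOpts(max_=max(count for _, count in data)),
        title_opts=opts.TitleOpts(title="COVID-2019 Epidemic situation"),)
    # Old 0.5.x Geo API only: type="effectScatter", is_random=True, effect_scale=5 made the points ripple
    china_map.render(path="pneumonia.html")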
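With counts ranging from single digits to tens of thousands, a single continuous color scale can leave most provinces looking almost the same. A common alternative is a piecewise visual map. The helper below is a hypothetical sketch that turns ascending cut points into the `pieces` list accepted by ECharts' piecewise visualMap, using the `gte`/`lte` bound keys; which thresholds to use is up to you.

```python
def build_pieces(thresholds):
    """Turn ascending positive integer cut points into piecewise visualMap pieces.

    Each piece covers an inclusive integer range [gte, lte]; the last one is open-ended.
    """
    if thresholds and thresholds[0] <= 0:
        raise ValueError("thresholds must be positive")
    if any(b <= a for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly ascending")
    pieces = []
    lower = 0  # counts are non-negative, so the first bucket starts at 0
    for cut in thresholds:
        pieces.append({"gte": lower, "lte": cut - 1, "label": f"{lower}-{cut - 1}"})
        lower = cut
    pieces.append({"gte": lower, "label": f"{lower}+"})
    return pieces
```

In `echarts()`, the result could be passed as `opts.VisualMapOpts(is_piecewise=True, pieces=build_pieces([10, 100, 500, 1000, 10000]))` in place of the continuous `max_` setting.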