Novel coronavirus pneumonia: complete domestic data by province and by date (from January 16)

Open the website and press F12 to bring up the developer tools. When we click the map and switch between provinces and cities to query specific values, a canvas element in the HTML marked with data-zr-dom-id="zr_0" changes dynamically, which shows that the data is rendered by JavaScript.
No new requests appear in the Network tab when we switch provinces and cities, which shows that all the data has already been downloaded locally and the JavaScript only reads it to fill in and draw the map. Nice!
In the Sources tab, 2019ncov.chinacdc.cn has a 2019-nCoV folder that contains a JS folder and a config.js file. Open config.js and the data we are after comes into view.
config.js lists many files in JSON format, such as yq_.json, yq_.json, etc. For example, the URL https://ncportal.esrichina.com.cn/JKZX/yq_20200117.json returns all the data for January 17, 2020.
The raw data is in JSON format, with the location, date and indicators clearly labelled, so we can convert it directly.
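Before converting anything, it can help to check which fields a daily file actually contains. The sketch below is a minimal example, assuming each file is an object with a "features" list whose items carry a "properties" dict, which is the layout the parsing code later in this article relies on. The helper name list_property_keys and the sample values are made up for illustration and are not real data.

```python
import json


def list_property_keys(document):
    """Return the sorted set of property names used across all features."""
    keys = set()
    for feature in document.get("features", []):
        keys.update(feature.get("properties", {}).keys())
    return sorted(keys)


# A tiny hand-written sample in the assumed layout (not real data)
sample_text = """
{"features": [
    {"properties": {"Province": "A", "Newly diagnosed": 1}},
    {"properties": {"Province": "B", "Cumulative diagnosis": 5}}
]}
"""
sample = json.loads(sample_text)
print(list_property_keys(sample))
```

For a downloaded file, pass the result of json.load on that file instead of the sample.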

Download and organize data into csv

The data is broken down by province, city and date, and covers new and cumulative counts of confirmed and suspected cases. In the CSV I export, each column is a province and each row is a date. The format of each cell is: new confirmed-new suspected-new deaths-cumulative confirmed-cumulative suspected-cumulative deaths
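To make the cell format concrete, here is a small sketch of packing six counts into one cell and splitting a cell back into named fields. The names DailyCounts, pack_cell and unpack_cell are hypothetical, the numbers are made up, and the sketch assumes every count is a non-negative integer (a negative value would introduce an extra "-" and break the split).

```python
from collections import namedtuple

# Field order follows the cell format described above
DailyCounts = namedtuple(
    "DailyCounts",
    ["new_confirm", "new_suspect", "new_death",
     "total_confirm", "total_suspect", "total_death"],
)


def pack_cell(counts):
    """Join the six counts into one 'a-b-c-d-e-f' cell string."""
    return "-".join(str(value) for value in counts)


def unpack_cell(cell):
    """Split a cell string back into six integer counts."""
    parts = cell.split("-")
    if len(parts) != 6:
        raise ValueError("expected 6 fields, got %d: %r" % (len(parts), cell))
    return DailyCounts(*(int(part) for part in parts))


# Made-up numbers, only to show the round trip
cell = pack_cell(DailyCounts(4, 0, 0, 45, 0, 2))
print(cell)  # 4-0-0-45-0-2
print(unpack_cell(cell).total_confirm)  # 45
```

Packing everything into one cell keeps the table compact, but any tool that reads the CSV has to split each cell again before it can work with the numbers.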

Code to download the raw JSON data automatically:

import datetime
import os
import time

import requests


def get_n_day_after(start_date, n):
    """Return the date n days after start_date, both as YYYY-MM-DD strings."""
    start = datetime.datetime.strptime(start_date, "%Y-%m-%d")
    return (start + datetime.timedelta(days=n)).strftime("%Y-%m-%d")


# Make sure the output folder exists before writing into it
os.makedirs("data", exist_ok=True)

for i in range(60):
    date = get_n_day_after('2020-01-16', i)
    date = "".join(date.split("-"))
    url = "https://ncportal.esrichina.com.cn/JKZX/yq_" + date + ".json"
    print(url)
    # Pause between requests to go easy on the server
    time.sleep(1)
    file_name = url.split('/')[-1]
    r = requests.get(url)
    # Skip dates for which the server does not return a file
    if r.status_code != 200:
        continue
    # Save the JSON file locally
    with open("data/" + file_name, "wb") as code:
        code.write(r.content)

Code to parse the JSON and export the data to a CSV file:

import json
import os


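def readfile(path):
    """Return the paths of all non-directory entries directly inside path."""
    files = os.listdir(path)
    file_list = []
    for file in files:  # traverse the folder
        full_path = path + '/' + file
        # Test the full path; a bare name would be resolved against the working directory
        if not os.path.isdir(full_path):
            file_list.append(full_path)
    return file_list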


# Extract data
data_dict = {}
file_list = readfile("data")
for file_name in file_list:
    # Open the file in a with block so it is closed once loaded
    with open(file_name, encoding='utf-8') as f:
        file_json = json.load(f)
    # Extract the date from a file name such as yq_20200117.json
    date = os.path.basename(file_name)[3:11]
    date = "%s-%s-%s" % (date[:4], date[4:6], date[6:8])

    # Extract provinces and cities
    features = file_json['features']
    for feature in features:
        properties = feature['properties']
        province = properties['Province']
        new_confirm = properties['Newly diagnosed']
        new_death = properties['New death']
        new_suspect = properties['New suspected']
        total_confirm = properties['Cumulative diagnosis']
        total_death = properties['Cumulative death']
        total_suspect = properties['Cumulative suspicion']
        write_str = "-".join(
            [str(new_confirm), str(new_suspect), str(new_death), str(total_confirm), str(total_suspect),
             str(total_death)])
        # {date : { province : write_str } }
        if data_dict.get(date):
            data_dict[date].update({province: write_str})
        else:
            data_dict[date] = {province: write_str}
# write file
file_name = 'COVID-19.CSV'
# Use one fixed, sorted province order so every row lines up with the header
province_list = sorted({p for value_dict in data_dict.values() for p in value_dict})
with open(file_name, 'w', encoding='utf-8') as file_writer:
    # Header row: an empty first cell, then one column per province
    file_writer.write("," + ",".join(province_list) + "\n")
    # One row per date, in chronological order
    for date in sorted(data_dict):
        value_dict = data_dict[date]
        # Leave the cell empty when a province has no entry for this date
        value_list = [value_dict.get(province, "") for province in province_list]
        file_writer.write(date + "," + ",".join(value_list) + "\n")
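The wide layout packs six numbers into each cell, which is compact but awkward for spreadsheets and plotting tools. As an alternative (not part of the original script), the sketch below writes a long-format CSV with one row per date and province and one column per indicator. It assumes a data_dict shaped like the one built above, {date: {province: "a-b-c-d-e-f"}}; the function name write_long_csv, the column names and the sample values are hypothetical.

```python
import csv
import io

FIELDS = ["new_confirm", "new_suspect", "new_death",
          "total_confirm", "total_suspect", "total_death"]


def write_long_csv(data_dict, out):
    """Write one row per (date, province) to the file-like object out."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["date", "province"] + FIELDS)
    for date in sorted(data_dict):
        for province in sorted(data_dict[date]):
            # Split the packed cell back into its six values
            values = data_dict[date][province].split("-")
            writer.writerow([date, province] + values)


# Made-up sample in the assumed shape (not real data)
sample = {
    "2020-01-17": {"B": "1-0-0-3-0-0", "A": "2-1-0-5-1-0"},
    "2020-01-16": {"A": "3-0-0-3-0-0"},
}
buffer = io.StringIO()
write_long_csv(sample, buffer)
print(buffer.getvalue())
```

To write to disk instead of a buffer, pass a file opened with newline="" and encoding="utf-8" as the out argument.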

Dumeng's Dai Ma


Posted by jonsimmonds on Mon, 16 Mar 2020 03:57:31 -0700
