COVID-19 (novel coronavirus pneumonia) domestic data, broken down by province and by date, starting from January 16.

Keywords: JSON, network, encoding


1. COVID-19 cumulative domestic data by province and date (from January 16):

Download the original json format data

https://ncportal.esrichina.com.cn/JKZX/yq_20200117.json

https://ncportal.esrichina.com.cn/JKZX/yq_20200118.json

https://ncportal.esrichina.com.cn/JKZX/yq_20200119.json

And so on (change the date after yq_ to the date you want).
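The URL pattern above can be generated programmatically; a minimal sketch (the `url_for` helper is a name I introduce here, not from the original post):

```python
from datetime import date

# The file name embeds the date as YYYYMMDD, per the URLs listed above.
BASE = "https://ncportal.esrichina.com.cn/JKZX/yq_{}.json"


def url_for(d: date) -> str:
    """Build the JSON download URL for a given date."""
    return BASE.format(d.strftime("%Y%m%d"))


print(url_for(date(2020, 1, 17)))
# https://ncportal.esrichina.com.cn/JKZX/yq_20200117.json
```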

Data download in csv format

Link: https://pan.baidu.com/s/10hpprJ8sMoucJQfvik5SzQ
Extraction code: 6lhy

This CSV was compiled from the JSON-format data; the code used for the conversion is given below.

2. Data source and capture

From the most authoritative website: http://2019ncov.chinacdc.cn/2019-nCoV/

If you are familiar with web crawlers, read on. If not, just use the links above to get the data directly.

Data acquisition process

  1. After opening the site, press F12 to open the developer tools. When we click the map or switch provinces to query specific values, a canvas with `data-zr-dom-id="zr_0"` in the HTML changes dynamically, which shows that these data are drawn by JavaScript.
  2. Switching provinces produces no new requests in the Network tab, which shows that all the data has already been downloaded locally; the JS just reads it to fill in and draw the charts.
  3. In Sources, under the 2019-nCoV folder of 2019ncov.chinacdc.cn, there is a js folder and a config.js file. Open config.js and the data is right there.
  4. config.js lists many JSON files following the yq_<date>.json pattern. For example, https://ncportal.esrichina.com.cn/JKZX/yq_20200117.json returns all the data for January 17, 2020.
  5. The raw data is in JSON format, with location, date, and indicators clearly laid out, so we can convert it directly.
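The steps above imply each yq_<date>.json file is a GeoJSON-style FeatureCollection: a `features` array whose entries carry the per-province indicators in `properties`. A minimal sketch of reading such a structure; note the property names here mirror the (machine-translated) parsing code later in this post, so the real files may well use the original Chinese field names — inspect one downloaded file before relying on them:

```python
import json

# A tiny stand-in for one yq_<date>.json file. The property names are
# assumptions carried over from the machine-translated parsing code; the
# real files likely use Chinese field names.
sample = json.loads("""
{
  "features": [
    {"properties": {"Province": "Hubei",
                    "Newly diagnosed": 4,
                    "Cumulative diagnosis": 45}}
  ]
}
""")

for feature in sample["features"]:
    props = feature["properties"]
    print(props["Province"], props["Newly diagnosed"], props["Cumulative diagnosis"])
    # Hubei 4 45
```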

Download and organize data into csv

The data breaks down by province, city, date, new vs. cumulative, and confirmed vs. suspected. So in the exported CSV, each column is a province and each row is a date, and each cell has the format: new confirmed - new suspected - new deaths - cumulative confirmed - cumulative suspected - cumulative deaths.
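Because all six counts are non-negative, splitting a cell on "-" recovers them cleanly. A small helper to unpack one cell (`parse_cell` and `CellCounts` are names I introduce for illustration, not part of the original post):

```python
from typing import NamedTuple


class CellCounts(NamedTuple):
    new_confirmed: int
    new_suspected: int
    new_deaths: int
    total_confirmed: int
    total_suspected: int
    total_deaths: int


def parse_cell(cell: str) -> CellCounts:
    """Unpack an 'a-b-c-d-e-f' CSV cell into its six counts."""
    parts = cell.split("-")
    if len(parts) != 6:
        raise ValueError("expected 6 fields, got %d: %r" % (len(parts), cell))
    return CellCounts(*(int(p) for p in parts))


print(parse_cell("4-12-1-45-100-2").total_confirmed)
# 45
```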

Code to automatically download the raw JSON data:

import requests
import datetime
import time
import os


# Return the date n days after start_date, as 'YYYY-MM-DD'
def get_n_day_after(start_date, n):
    date_datetime = datetime.datetime.strptime(start_date, "%Y-%m-%d")
    return str((date_datetime + datetime.timedelta(days=n)).date())


os.makedirs("data", exist_ok=True)
for i in range(60):
    date = get_n_day_after('2020-01-16', i)
    date = date.replace("-", "")
    url = "https://ncportal.esrichina.com.cn/JKZX/yq_" + date + ".json"
    print(url)
    # Download the json file and save it locally
    time.sleep(1)  # be polite to the server
    file_name = url.split('/')[-1]
    r = requests.get(url)
    if r.status_code != 200:
        # Skip dates whose file does not exist (yet)
        print("skipping (HTTP %d): %s" % (r.status_code, url))
        continue
    with open("data/" + file_name, "wb") as f:
        f.write(r.content)

Code to parse the JSON and export the data to a CSV:

import json
import os


def readfile(path):
    """List the files (not subdirectories) under path."""
    file_list = []
    for file in os.listdir(path):  # traverse folder
        full = os.path.join(path, file)
        if not os.path.isdir(full):
            file_list.append(full)
    return file_list


# Extract data into {date: {province: write_str}}
data_dict = {}
for file_name in readfile("data"):
    with open(file_name, encoding='utf-8') as f:
        file_json = json.load(f)
    # Extract the date from the file name: yq_20200117.json -> 2020-01-17
    date = os.path.basename(file_name)[3:11]
    date = "%s-%s-%s" % (date[:4], date[4:6], date[6:8])

    # Extract the per-province indicators.
    # NOTE: these property names are machine-translated; check one of the
    # downloaded JSON files for the actual (Chinese) field names.
    for feature in file_json['features']:
        properties = feature['properties']
        province = properties['Province']
        new_confirm = properties['Newly diagnosed']
        new_death = properties['New death']
        new_suspect = properties['New suspected']
        total_confirm = properties['Cumulative diagnosis']
        total_death = properties['Cumulative death']
        total_suspect = properties['Cumulative suspicion']
        write_str = "-".join(
            [str(new_confirm), str(new_suspect), str(new_death),
             str(total_confirm), str(total_suspect), str(total_death)])
        data_dict.setdefault(date, {})[province] = write_str

# Write the CSV: first a province header row, then one row per date.
# Fix the provinces in one sorted order so every row's columns line up.
provinces = sorted({p for values in data_dict.values() for p in values})
with open('COVID-19.CSV', 'w', encoding='utf-8') as file_writer:
    file_writer.write("," + ",".join(provinces) + "\n")
    for date in sorted(data_dict):
        row = [data_dict[date].get(p, "") for p in provinces]
        file_writer.write(date + "," + ",".join(row) + "\n")
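To sanity-check the export, the CSV can be read back with the standard csv module into the same {date: {province: cell}} shape. A sketch run against a tiny inline sample rather than the real COVID-19.CSV (the `load_matrix` helper is mine, not from the original post):

```python
import csv
import io


def load_matrix(fp):
    """Read the province-by-date CSV back into {date: {province: cell}}."""
    reader = csv.reader(fp)
    provinces = next(reader)[1:]  # header row: empty cell, then provinces
    matrix = {}
    for row in reader:
        date, cells = row[0], row[1:]
        matrix[date] = dict(zip(provinces, cells))
    return matrix


sample = ",Hubei,Beijing\n2020-01-17,4-12-1-45-100-2,1-0-0-5-3-0\n"
matrix = load_matrix(io.StringIO(sample))
print(matrix["2020-01-17"]["Beijing"])
# 1-0-0-5-3-0
```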


Posted by jonsimmonds on Mon, 16 Mar 2020 03:57:31 -0700