1. Novel coronavirus pneumonia: complete domestic data by province and date (from January 16)
Download the raw data in JSON format
https://ncportal.esrichina.com.cn/JKZX/yq_20200117.json
https://ncportal.esrichina.com.cn/JKZX/yq_20200118.json
https://ncportal.esrichina.com.cn/JKZX/yq_20200119.json
And so on (change the date after yq_ to the date you want)
Download the data in CSV format
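The date substitution can also be scripted. A minimal sketch, assuming the file name is always yq_ followed by the date written as YYYYMMDD, which is what the three links above suggest; the helper name daily_json_url is hypothetical:

```python
import datetime

BASE_URL = "https://ncportal.esrichina.com.cn/JKZX/"


def daily_json_url(day: datetime.date) -> str:
    """Build the URL of the JSON file for one day (assumed pattern: yq_YYYYMMDD.json)."""
    return BASE_URL + "yq_" + day.strftime("%Y%m%d") + ".json"


print(daily_json_url(datetime.date(2020, 1, 17)))
```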
Link: https://pan.baidu.com/s/10hpprJ8sMoucJQfvik5SzQ
Extraction code: 6lhy
This CSV file was compiled from the JSON data. The code used to compile it is given in section 2.
2. Data source and capture
The data comes from the most authoritative source, the China CDC website: http://2019ncov.chinacdc.cn/2019-nCoV/
If you are familiar with web scraping, read on. If not, you can get the data from the links at the top.
How the data reaches the page
- After opening the website, press F12. When we click the map and switch between provinces and cities to query specific values, a canvas element in the HTML with the attribute data-zr-dom-id="zr_0" changes dynamically, which indicates that the data is rendered by JavaScript.
- No new requests appear in the Network tab when switching between provinces and cities, which shows that all the data has already been downloaded locally and the JavaScript only reads it to fill in and draw the map.
- In Sources, we can see a JS folder and a config.js file under the 2019-nCoV folder of 2019ncov.chinacdc.cn. Opening config.js reveals where the data comes from.
- The config.js file lists many JSON files whose names start with yq_ followed by a date. For example, the URL https://ncportal.esrichina.com.cn/JKZX/yq_20200117.json returns all the data for January 17, 2020.
- The raw data is in JSON format, with the location, date and indicators clearly labelled, so we can convert it directly.
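Before converting, it helps to check which fields each record actually carries. A minimal sketch, assuming the file follows the features / properties layout that the parsing code in this article relies on; the sample record and the helper name property_keys are made up for illustration, and the real field names should be read from a downloaded file:

```python
import json

# Hypothetical sample with the shape the parsing code expects:
# a top-level "features" list whose items each carry a "properties" dict.
SAMPLE = """
{"features": [
  {"properties": {"Province": "A", "Newly diagnosed": 1, "Cumulative diagnosis": 5}},
  {"properties": {"Province": "B", "Newly diagnosed": 0, "Cumulative diagnosis": 2}}
]}
"""


def property_keys(doc):
    """Return the sorted list of property names used across all features."""
    keys = set()
    for feature in doc.get("features", []):
        keys.update(feature.get("properties", {}))
    return sorted(keys)


doc = json.loads(SAMPLE)
print(property_keys(doc))
```

For a downloaded file, replace json.loads(SAMPLE) with json.load on the opened file.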
Download the data and organize it into CSV
The data breaks down by province, city and date, by new versus cumulative counts, and by confirmed cases, suspected cases and deaths. In the CSV file I export, each column is a province and each row is a date. The format of each cell is: new confirmed - new suspected - new deaths - cumulative confirmed - cumulative suspected - cumulative deaths
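The cell format can be made concrete with a small pack/parse pair. This is only a sketch, assuming every value is a non-negative integer so that the hyphen separator is unambiguous; missing values in the real data would need extra handling. The names Cell, pack_cell and parse_cell are hypothetical and the numbers are invented:

```python
from typing import NamedTuple


class Cell(NamedTuple):
    new_confirm: int
    new_suspect: int
    new_death: int
    total_confirm: int
    total_suspect: int
    total_death: int


def pack_cell(cell: Cell) -> str:
    """Join the six counts with hyphens, in the order used by the CSV cells."""
    return "-".join(str(value) for value in cell)


def parse_cell(text: str) -> Cell:
    """Split a cell string back into its six integer counts."""
    parts = text.split("-")
    if len(parts) != 6:
        raise ValueError("expected 6 hyphen-separated values, got %d" % len(parts))
    return Cell(*(int(part) for part in parts))


cell = Cell(3, 1, 0, 45, 12, 2)
print(pack_cell(cell))
print(parse_cell(pack_cell(cell)).total_confirm)
```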
Code to download the raw JSON data automatically:

    import datetime
    import os
    import time

    import requests


    # Return the date n days after start_date, formatted as YYYY-MM-DD
    def get_n_day_after(start_date, n):
        date_datetime = datetime.datetime.strptime(start_date, "%Y-%m-%d")
        return (date_datetime + datetime.timedelta(days=n)).strftime("%Y-%m-%d")


    # Make sure the output folder exists before writing into it
    os.makedirs("data", exist_ok=True)

    for i in range(60):
        date = get_n_day_after('2020-01-16', i)
        date = "".join(date.split("-"))
        url = "https://ncportal.esrichina.com.cn/JKZX/yq_" + date + ".json"
        print(url)
        # Wait a second between requests, then download the json file
        time.sleep(1)
        file_name = url.split('/')[-1]
        r = requests.get(url)
        # Skip dates for which the server does not return a file
        if r.status_code != 200:
            continue
        # Save the file locally
        with open("data/" + file_name, "wb") as code:
            code.write(r.content)
Code to parse the JSON files and export the data to a single CSV file:

    import json
    import os


    # Return the paths of all regular files in the folder
    def readfile(path):
        file_list = []
        for file in sorted(os.listdir(path)):
            full_path = os.path.join(path, file)
            if not os.path.isdir(full_path):
                file_list.append(full_path)
        return file_list


    # Extract data
    data_dict = {}
    file_list = readfile("data")
    for file_name in file_list:
        with open(file_name, encoding='utf-8') as f:
            file_json = json.load(f)
        # Extract the date from a file name such as yq_20200117.json
        date = os.path.basename(file_name)[3:11]
        date = "%s-%s-%s" % (date[:4], date[4:6], date[6:8])
        # Extract provinces and cities
        features = file_json['features']
        for feature in features:
            # The keys below must match the field names in the downloaded JSON
            properties = feature['properties']
            province = properties['Province']
            new_confirm = properties['Newly diagnosed']
            new_death = properties['New death']
            new_suspect = properties['New suspected']
            total_confirm = properties['Cumulative diagnosis']
            total_death = properties['Cumulative death']
            total_suspect = properties['Cumulative suspicion']
            write_str = "-".join(
                [str(new_confirm), str(new_suspect), str(new_death),
                 str(total_confirm), str(total_suspect), str(total_death)])
            # {date : { province : write_str } }
            data_dict.setdefault(date, {})[province] = write_str

    # Use one fixed, sorted province order so every row lines up with the header
    province_list = sorted({p for value_dict in data_dict.values() for p in value_dict})

    # write file
    file_name = 'COVID-19.CSV'
    with open(file_name, 'w', encoding='utf-8') as file_writer:
        # Header row: an empty first cell, then one column per province
        file_writer.write("," + ",".join(province_list) + "\n")
        # One row per date in chronological order; a missing province gives an empty cell
        for date in sorted(data_dict):
            value_dict = data_dict[date]
            value_list = [value_dict.get(province, "") for province in province_list]
            file_writer.write(date + "," + ",".join(value_list) + "\n")
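To analyse the result, the wide CSV can be read back into one record per date and province. A rough sketch, assuming the layout written by the code above (first row: an empty cell followed by the province names; each later row: a date followed by hyphen-separated cells); the function name read_long and the sample content are hypothetical:

```python
import csv
import io


def read_long(csv_text):
    """Yield (date, province, fields) tuples; fields is the cell split on hyphens."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    provinces = header[1:]  # the first header cell is empty
    for row in reader:
        if not row:
            continue  # ignore blank lines
        date, cells = row[0], row[1:]
        for province, cell in zip(provinces, cells):
            yield date, province, cell.split("-")


# Invented sample in the same layout as COVID-19.CSV
SAMPLE = ",A,B\n2020-01-16,1-0-0-5-2-0,0-0-0-2-1-0\n"
for record in read_long(SAMPLE):
    print(record)
```

To use it on the real file, pass the text of COVID-19.CSV, read with the same encoding it was written in, instead of SAMPLE.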