Crawler in practice | Using Python to map the distribution of Hongxing Erke stores in China. Does your city have the most?

Keywords: Python crawler

Hongxing Erke has been a trending search topic recently.

Today, let's use a Python crawler to see how many Hongxing Erke stores there are across the country.

First, we open Baidu Maps and search for 'Hongxing Erke'.

Press F12 to open the browser's developer tools and find the following request link.

Paste the link into the browser and you will see that it returns a data set in JSON format. The provinces and cities we need, together with their store counts, are in it.

Send request

We first simulate a browser and send a request to obtain the JSON data set, and then extract the number of Hongxing Erke stores in each city.

    import requests

url = 'https://map.baidu.com/?newmap=1&reqflag=pcmap&biz=1&from=webmap&da_par=baidu&pcevaname=pc4.1&qt=s&da_src=searchBox.button&wd=%E9%B8%BF%E6%98%9F%E5%B0%94%E5%85%8B&c=1&src=0&wd2=&pn=0&sug=0&l=5&b=(7854419.220000001,831323.8799999999;15358291.22,8507227.879999999)&from=webmap&biz_forward={%22scaler%22:1,%22styles%22:%22pl%22}&sug_forward=&auth=yER4N%40Rwcw0cBSVCeS%3DdQBAfLdF6agFfuxLzNBVHVHRtxZhQxjh%40wWvvYgP1PcGCgYvjPuVtvYgPMGvgWv%40uVtvYgPPxRYuVtvYgP%40vYZcvWPCuVtvYgP%40ZPcPPuVtvYgPhPPyheuVtcvY1SGpuRtDpnSCE%40%40By1uVtCGYuVt1GgvPUDZYOYIZuVt1cv3uVtGccZcuVtPWv3GuBtR9KxXwPYIUvhgMZSguxzBEHLNRTVtcEWe1GD8zv7u%40ZPuVtc3CuVteuEthjzgjyBODQEYHUHBxfiKKvMuxcc%40AJ&seckey=cde6ebb241c3d75c675c8688828640edba33c570fc006f6ccdee864f2e95d88033fc19e794fee19c2417a6953ba260f3e91efa7e82cbc9c45b5854aec79ce924b08cce22526301f3a8c80710ebb635e73f5eccb560ee1dc38add2dfc793843279646449563fa4547850c144c3838de6fb1efaab7253aa6e99c1de56b4ddbad3905f480e4d46e5414c519465f08bedee98acac8fc7d2f84f413b041287538b09a811ee347b66a4c2c948f2ffa2f6e7674e0c5cb2b6407b610181af9064f870280fd7053482a91caa7cb762068ea41c4bb7bd2f7899f81a2ba5ab3fde28503a6fdc54b0fdee52cc2d02da76e1a4f1b4745&device_ratio=1&tn=B_NORMAL_MAP&nn=0&ie=utf-8&t=1627305062813'

    headers = {
        'Cookie': 'BIDUPSID=5FDDBE7E96E9CA6D71998093E123403A; PSTM=1627225875; BAIDUID=F934E08738623DF508F108DEF391CFB9:FG=1; BDORZ=B490B5EBF6F3CD402E515D22BCDA1598; BCLID_BFESS=8512773460870798959; BDSFRCVID_BFESS=5UPOJeC62l07libepqHRKmSPxe5rbsOTH6aoyt6boQjiS8lguPwkEG0PHf8g0Ku-S2EqogKKy2OTH9DF_2uxOjjg8UtVJeC6EG0Ptf8g0M5; H_BDCLCKID_SF_BFESS=tJk8_DPbJK-3fP36q4cBb-4WhmT22-us3g7W2hcH0b61EnR_XRQcbJ8LQ-Qi2lJTMITiaKJjBMb1DbRMLfjN5TODKf-DKb3pWDTm_q5TtUJMeCnTDMRh-l04XNbyKMnitIv9-pPKWhQrh459XP68bTkA5bjZKxtq3mkjbPbDfn028DKuDj-WDjJ0DGRf-b-X-I6b0nRH-njfebRNq4nKbICShG4tLlO9WDTm_DostI3SjJoNKbQ10xPD0n3OK6QHKj79-pPKKR7BfKQPhpQ8MqJbhMJtQnbW3mkjbpnDfn02OPKz0T5pKt4syPR8JfRnWn5RKfA-b4ncjRcTehoM3xI8LNj405OTbIFO0KJzJCcjqR8ZDTuBj55P; __yjs_duid=1_695635cb727c238e28cd4254a28a7a0e1627258379781; BAIDUID_BFESS=F934E08738623DF508F108DEF391CFB9:FG=1; __yjs_st=2_NDRiODllYWQzMjBiMzFhYTlmYWVjZTE4NjFkZTM5MmMwODhlZDE0MjVkYWVmMjIzMzc3MWI2Y2RlOTNkMWJkNDBhNmE2YTIyMTJlZjg0ODJiNzk0NDY2NTYxY2NkOGY5YjM5ODViMDAyZjAwY2E0MThjODUyMGM0N2JiMmEyZGEyMTA4ODdkNjViYjcwNDEwODhjNDkzNDg4YjQyMWNjYTI4ZjAzZDllYTg3YjE3ZDRiYWNlMmJkMzc3YjE1OGU5NWU4NjM3YWQxMjkwNDVkMmMyZTM1YTQ5ODgxNTA4ZjE3MDk2YTYwODg5MmY5ZTZlMmYxZGQ5ZTU1OTdkZGYxZV83X2VhYjhlOWZi; H_PS_PSSID=34300_34100_33969_34272_31254_33848_34282_26350_22158; delPer=0; PSINO=3; BA_HECTOR=002h218g2ka58g0lhq1gftcs10r; ab_sr=1.0.1_ZWRlNDJiMzk0ZWQ3YzZmYzgxMmQzOTIyZDBlN2FjZTIxNjIzODliZWE4MzZjZGEwZTBiMTIzNGRmNDhiYmM2NTJhZjI0ZjBkNTFlMjg4MWYxYmY3ZDMzMGVkNmQ1NTNhMDVkN2I1ZGViMDY2ZjBlNWJmOTk4NTBhZGIwOGU4OTg5YzNiM2QwZjVhMTFkYmQ0ODU2NTJkYzNkZmI0ZjI1MA==; PMS_JT=%28%7B%22s%22%3A1627305057015%2C%22r%22%3A%22https%3A//map.baidu.com/@11606355.22%2C4669275.88%2C5.4z%22%7D%29',
        'Referer': 'https://map.baidu.com/search/%E9%B8%BF%E6%98%9F%E5%B0%94%E5%85%8B/@11606355.22,4669275.88,5z?querytype=s&da_src=shareurl&wd=%E9%B8%BF%E6%98%9F%E5%B0%94%E5%85%8B&c=1&src=0&pn=0&sug=0&l=5&b=(6569474.192744261,1360353.0162781863;12256345.744431017,7177600.4441499)&from=webmap&biz_forward=%7B%22scaler%22:1,%22styles%22:%22pl%22%7D&seckey=cde6ebb241c3d75c675c8688828640edba33c570fc006f6ccdee864f2e95d88033fc19e794fee19c2417a6953ba260f3e91efa7e82cbc9c45b5854aec79ce924b08cce22526301f3a8c80710ebb635e73f5eccb560ee1dc38add2dfc793843279646449563fa4547850c144c3838de6fb1efaab7253aa6e99c1de56b4ddbad3905f480e4d46e5414c519465f08bedee98acac8fc7d2f84f413b041287538b09a811ee347b66a4c2c948f2ffa2f6e7674e0c5cb2b6407b610181af9064f870280fd7053482a91caa7cb762068ea41c4bb7bd2f7899f81a2ba5ab3fde28503a6fdc54b0fdee52cc2d02da76e1a4f1b4745&device_ratio=1',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4573.0 Safari/537.36'
    }

    #  The later snippets refer to this response as datas
    datas = requests.get(url, headers=headers)

    if datas.status_code == requests.codes.ok:
        print(datas.json())

The response looks like this:
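The original screenshot is not reproduced here. Judging only from the keys that the code later in this article reads ('hot_city', 'more_city', 'province', 'num', 'city', 'name'), the part of the response we care about presumably has roughly the shape sketched below. Every value in the sample is made up for illustration, and the helper names parse_hot_entry and walk are hypothetical, not part of any library.

```python
# Hypothetical sample: the key names follow the code in this article,
# but every value below is invented for illustration.
sample = {
    "hot_city": ["Beijing|12", "Guangzhou City|30"],
    "more_city": [
        {
            "province": "Guangdong Province",
            "num": 45,
            "city": [
                {"name": "Guangzhou City", "num": 30},
                {"name": "Foshan City", "num": 15},
            ],
        },
    ],
}


def parse_hot_entry(entry):
    """Split a 'name|count' string into (name, count as int).

    Assumes the second field is a plain integer string.
    """
    parts = entry.split("|")
    return parts[0], int(parts[1])


def walk(payload):
    """Yield (level, name, count) for every entry in the payload."""
    for entry in payload["hot_city"]:
        name, count = parse_hot_entry(entry)
        yield "hot", name, count
    for prov in payload["more_city"]:
        yield "province", prov["province"], prov["num"]
        for c in prov["city"]:
            yield "city", c["name"], c["num"]


for row in walk(sample):
    print(row)
```

Converting the hot-city counts to integers up front keeps them the same type as the 'num' values, which may make later sorting and plotting less error-prone.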

Next, let's get each province and its store count. Because the four municipalities directly under the central government are stored separately from the provinces in the response, we need to collect the data step by step.

China has 34 provincial-level administrative regions, including 23 provinces, 5 autonomous regions, 4 municipalities directly under the central government and 2 special administrative regions.

The 23 provinces are: Hebei, Shanxi, Liaoning, Jilin, Heilongjiang, Jiangsu, Zhejiang, Anhui, Fujian, Jiangxi, Shandong, Henan, Hubei, Hunan, Guangdong, Hainan, Sichuan, Guizhou, Yunnan, Shaanxi, Gansu, Qinghai and Taiwan.

The five autonomous regions are Inner Mongolia Autonomous Region, Guangxi Zhuang Autonomous Region, Tibet Autonomous Region, Ningxia Hui Autonomous Region and Xinjiang Uygur Autonomous Region.

The four municipalities directly under the central government are Beijing, Tianjin, Shanghai and Chongqing.

The two special administrative regions are: Hong Kong Special Administrative Region and Macao Special Administrative Region.

In the JSON, the four municipalities directly under the central government are stored in a different place from the other regions (under 'hot_city' rather than 'more_city'), as shown below:

We save the province-level results to an Excel file with the following code:

    import pandas as pd
    from icecream import ic

    prov = []
    value = []

    #  datas is the response returned by requests.get(url, headers=headers).
    #  Get the four municipalities from the hot-city list.
    #  Each entry is a 'name|count' string. The names are shown here in
    #  translation; the live response presumably contains Chinese names.
    municipalities = ('Beijing', 'Shanghai', 'Tianjin', 'Chongqing City')
    hot_city = datas.json()['hot_city']
    for i in hot_city:
        pv = i.split('|')
        if any(m in pv[0] for m in municipalities):
            prov.append(pv[0])
            value.append(pv[1])

    #  Get every province and its store count
    city_list = datas.json()['more_city']
    for item in city_list:
        prov.append(item['province'])
        value.append(item['num'])

    #  Build the table once, after the loops, and write it to Excel
    pd_data = pd.DataFrame({
        'province': prov,
        'quantity': value,
    })
    pd_data.to_excel('province.xlsx')
    ic('Province information saved!')

The province data stored in Excel looks like this:

Similarly, we can obtain the number of Hongxing Erke stores in the individual cities of each province.

The full city list comes in two parts: popular cities ('hot_city') and more cities ('more_city').

    city = []
    value = []

    #  Get the popular cities from the hot-city list ('name|count' strings).
    #  The names are shown here in translation; the live response
    #  presumably contains Chinese names.
    popular = ('Guangzhou City', 'Chengdu', 'Nanjing City',
               'Hangzhou', 'Wuhan', 'Shenzhen City')
    hot_city = datas.json()['hot_city']
    for i in hot_city:
        pv = i.split('|')
        if any(c in pv[0] for c in popular):
            city.append(pv[0])
            value.append(pv[1])

    #  Get every city listed under each province
    city_list = datas.json()['more_city']
    for item in city_list:
        for i in item['city']:
            #  City name
            city.append(i['name'])
            #  Number of Hongxing Erke stores in that city
            value.append(i['num'])

    #  Build the table once, after the loops, and write it to Excel
    #  (the original rewrote the file on every iteration)
    pd_data = pd.DataFrame({
        'city': city,
        'quantity': value,
    })
    pd_data.to_excel('city.xlsx')
    ic('City information saved!')

The city data stored in Excel looks like this:

We first use pandas to read and clean the data.

The main task is to strip suffixes such as "Province" and "Autonomous Region" from the end of the province names.

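    #  Read the file
    pd_data = pd.read_excel('province.xlsx')

    prov = pd_data['province'].tolist()
    prov_num = pd_data['quantity'].tolist()

    #  Strip the administrative suffix from each name.
    #  The literals are shown in translation; on the original Chinese names
    #  the final branch, i[:2], presumably keeps the two-character short name.
    name = []
    for i in prov:
        if "province" in i:
            name.append(i.replace('province', ''))
        elif 'Inner Mongolia Autonomous Region' in i:
            name.append(i.replace('Autonomous Region', ''))
        else:
            name.append(i[:2])
    ic(name)
    ic(prov)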
    
    '''
    2021-07-27 20:50:50.752477|name: ['Beijing',
                                  'Shanghai',
                                  'Tianjin',
                                  'Chongqing',
                                  'Guangdong',
                                  'Zhejiang',
                                  'Shandong',
                                  'Jiangsu',
                                  'Hebei',
                                  'Anhui',
                                  'Hunan',
                                  'Sichuan',
                                  'Fujian',
                                  'Henan',
                                  'Inner Mongolia',
                                  'Shanxi',
                                  'Guangxi',
                                  'Guizhou',
                                  'Heilongjiang',
                                  'Hubei',
                                  'Yunnan',
                                  'Gansu',
                                  'Liaoning',
                                  'Shaanxi',
                                  'Jiangxi',
                                  'Jilin',
                                  'Shanghai',
                                  'Xinjiang',
                                  'Tianjin',
                                  'Ningxia',
                                  'Hainan',
                                  'Tibet',
                                  'Qinghai']
    2021-07-27 20:50:50.752477|prov: ['Beijing',
                                  'Shanghai',
                                  'Tianjin',
                                  'Chongqing City',
                                  'Guangdong Province',
                                  'Zhejiang Province',
                                  'Shandong Province',
                                  'Jiangsu Province',
                                  'Hebei Province',
                                  'Anhui Province',
                                  'Hunan Province',
                                  'Sichuan Province',
                                  'Fujian Province',
                                  'Henan Province',
                                  'Inner Mongolia Autonomous Region',
                                  'Shanxi Province',
                                  'Guangxi Zhuang Autonomous Region',
                                  'Guizhou Province',
                                  'Heilongjiang Province',
                                  'Hubei province',
                                  'Yunnan Province',
                                  'Gansu Province',
                                  'Liaoning Province',
                                  'Shaanxi Province',
                                  'Jiangxi Province',
                                  'Jilin Province',
                                  'Shanghai',
                                  'Xinjiang Uygur Autonomous Region',
                                  'Tianjin',
                                  'Ningxia Hui Autonomous Region',
                                  'Hainan ',
                                  'Tibet Autonomous Region',
                                  'Qinghai Province']
    '''
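The cleaning step above handles Inner Mongolia as a special case and falls back to the first two characters for everything else. As an alternative sketch (not the author's method), the suffixes can be listed explicitly and stripped with endswith. This assumes the names arrive in Chinese, as Baidu Maps presumably returns them; the suffix list and the helper name short_name are assumptions made for illustration.

```python
# Suffixes to strip, longest first, so that e.g. '壮族自治区' is tried
# before the shorter '自治区'. The list is an assumption about the data.
SUFFIXES = (
    "维吾尔自治区",
    "壮族自治区",
    "回族自治区",
    "特别行政区",
    "自治区",
    "省",
    "市",
)


def short_name(full):
    """Return the region name without its administrative suffix."""
    full = full.strip()
    for suffix in SUFFIXES:
        if full.endswith(suffix):
            return full[: -len(suffix)]
    # No known suffix: leave the name unchanged
    return full


for raw in ["广东省", "内蒙古自治区", "广西壮族自治区", "重庆市", "北京"]:
    print(raw, "->", short_name(raw))
```

Matching on the end of the string avoids relying on name length, so three-character names such as Heilongjiang and Inner Mongolia need no special case.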


Next, we use pyecharts to visualize the cleaned data.

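    from pyecharts import options as opts
    from pyecharts.charts import Map

    #  Use the cleaned names from the previous step, and avoid
    #  shadowing the built-in map() with the chart variable
    china_map = (
        Map()
        .add("Quantity distribution", [list(z) for z in zip(name, prov_num)], "china")
        .set_global_opts(
            title_opts=opts.TitleOpts(title="Distribution map of Hongxing Erke stores nationwide"),
            visualmap_opts=opts.VisualMapOpts(max_=500, is_piecewise=True),
        )
    )
    china_map.render('province.html')
    ic('The provincial distribution map has been drawn!')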
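The upper bound of the visual map is hard-coded to 500 in the call above. A minimal sketch of deriving it from the data instead is shown below; the helper name visualmap_max and the rounding step of 100 are assumptions for illustration, and the values are assumed to be numeric.

```python
import math


def visualmap_max(values, step=100):
    """Round the largest value up to the next multiple of `step`.

    Returns at least `step`, so an empty list still gives a usable bound.
    """
    largest = max(values, default=0)
    return max(step, math.ceil(largest / step) * step)


# Made-up counts, for illustration only
print(visualmap_max([12, 430, 87]))
```

The result could then be passed as opts.VisualMapOpts(max_=visualmap_max(prov_num), is_piecewise=True), so the legend adapts if the store counts change.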

The rendered map looks like this:

The city-level map of a single province works the same way. We take Guangdong, which has the most stores, as an example; you can choose any other province instead.
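The article does not show the code for the province-level city map, so the following is only a sketch of one way it might be done. It assumes the 'more_city' list has the shape read earlier (a 'province' name plus a 'city' list of 'name' and 'num' entries); the helper names city_pairs and render_province_map are hypothetical, and the sample values are made up.

```python
def city_pairs(more_city, province):
    """Return [name, count] pairs for the cities of one province."""
    for item in more_city:
        if item["province"] == province:
            return [[c["name"], c["num"]] for c in item["city"]]
    # Province not present in the data
    return []


def render_province_map(pairs, maptype, out_path):
    """Draw one province's cities with pyecharts and write an HTML file."""
    # Imported here so city_pairs can be used without pyecharts installed
    from pyecharts import options as opts
    from pyecharts.charts import Map

    chart = (
        Map()
        .add("Quantity distribution", pairs, maptype)
        .set_global_opts(
            title_opts=opts.TitleOpts(title="Store distribution by city"),
            visualmap_opts=opts.VisualMapOpts(
                max_=max((n for _, n in pairs), default=1),
                is_piecewise=True,
            ),
        )
    )
    chart.render(out_path)


# Made-up sample in the shape of the 'more_city' list
sample = [
    {
        "province": "Guangdong Province",
        "num": 45,
        "city": [
            {"name": "Guangzhou City", "num": 30},
            {"name": "Foshan City", "num": 15},
        ],
    },
]

pairs = city_pairs(sample, "Guangdong Province")
print(pairs)
```

With real data, a call such as render_province_map(pairs, "广东", "guangdong.html") would draw the Guangdong map; the city names in the pairs have to match the names the pyecharts map expects, which is an assumption to verify against your installed version.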

Fetch data -> store data -> process data -> visualize data

The final effect is as follows:

Once the data is visualized, the distribution is clear at a glance, which is much easier on the eyes than reading an Excel sheet and gets you twice the result with half the effort.

Posted by Amitk on Sun, 17 Oct 2021 16:33:45 -0700