Looking at Python's job prospects by crawling data from a recruitment website.

Keywords: Python, JSON, pip, encoding

1. Data acquisition

It all started on a sunny afternoon... Out of habit I opened Zhihu to kill some time, and saw a question that happened to invite me to answer.

So the idea sprouted: collect Python job postings from a recruitment website, look at the current salary and job distribution, and then just do it.

First of all, the request headers must be disguised, otherwise the very first request will pop up "You operate too frequently. Please try again later." Secondly, the website has multiple anti-crawling strategies. The solution: create a session first, update the session's headers, use the session to fetch the search page (which sets the required cookies), and only then grab the data you want.

In Chrome, right-click and choose Inspect, open the Network tab, and you will find the link https://www.lagou.com/jobs/positionAjax.json?needAddtionalResult=false

You can see that the returned data is the page's Python recruitment details. But opening the link directly produced {"status": false, "msg": "You operate too frequently, please visit later", "clientIp": "124.77.161.207", "state": 2402}, and I tactfully realized that things were not so simple.
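A simple guard against that ban payload is worth having; a minimal sketch (the helper name check_response is my own, and the field names follow the error JSON shown above):

def check_response(result):
    # The ban payload carries "status": false and a human-readable "msg"
    if result.get('status') is False:
        raise RuntimeError(result.get('msg', 'request throttled, retry later'))

# Usage: call check_response(content.json()) right after the POST in the script below.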

The real battle was just beginning. Let's first analyze the request.

You can see that the request is sent as a POST, carrying the following parameters:

datas = {
    'first': 'false',   # whether this is the first search request
    'pn': x,            # page number
    'kd': 'python',     # search keyword
}

At the same time, it is not hard to notice that every click on "next page" also sends a GET request.

After some exploration, we found that this GET request carries the same parameters as our POST request, which makes the problem much simpler. Let's sort out the approach.

Keyword: python | Search scope: national | Data as of: 2019.05.05

#!/usr/bin/env python3.4
# encoding: utf-8
"""
Created on 19-5-05
@title: ''
@author: Xusl
"""
import json
import requests
import xlwt
import time

# Parse the returned JSON object and, for each job, collect: position ID, city, company name,
# benefits, district, education requirement, job type, release time, job title, salary, and required experience
def get_json(url, datas):
    my_headers = {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36",
        "Referer": "https://www.lagou.com/jobs/list_Python?city=%E5%85%A8%E5%9B%BD&cl=false&fromSearch=true&labelWords=&suginput=",
        "Content-Type": "application/x-www-form-urlencoded;charset = UTF-8"
    }
    time.sleep(5)
    ses = requests.session()    # create a session
    ses.headers.update(my_headers)  # update the session's headers with our disguise
    ses.get("https://www.lagou.com/jobs/list_python?city=%E5%85%A8%E5%9B%BD&cl=false&fromSearch=true&labelWords=&suginput=")
    content = ses.post(url=url, data=datas)
    result = content.json()
    info = result['content']['positionResult']['result']
    info_list = []
    for job in info:
        information = []
        information.append(job['positionId'])  # position ID
        information.append(job['city'])  # city
        information.append(job['companyFullName'])  # full company name
        information.append(','.join(job['companyLabelList']))  # fringe benefits (joined, since xlwt cannot write a list)
        information.append(job['district'])  # district
        information.append(job['education'])  # education requirement
        information.append(job['firstType'])  # job type
        information.append(job['formatCreateTime'])  # release time
        information.append(job['positionName'])  # job title
        information.append(job['salary'])  # salary
        information.append(job['workYear'])  # required experience
        info_list.append(information)
        # Encode the list as JSON for inspection; indent=2 pretty-prints it
        # print(json.dumps(info_list, ensure_ascii=False, indent=2))
    # print(info_list)
    return info_list

def main():
    page = int(input('Please enter the total number of pages you want to grab:'))
    # kd = input('Please enter the position keyword you want to grab:')
    # city = input('Please enter the city you want to grab:')

    info_result = []
    title = ['Position ID', 'City', 'Company', 'Benefits', 'District', 'Education', 'Job type', 'Release time', 'Job title', 'Salary', 'Experience']
    info_result.append(title)
    for x in range(1, page+1):
        url = 'https://www.lagou.com/jobs/positionAjax.json?needAddtionalResult=false'
        datas = {
            'first': 'false',
            'pn': x,
            'kd': 'python',
        }
        try:
            info = get_json(url, datas)
            info_result = info_result + info
            print("The first%s Page Normal Collection" % x)
        except Exception as msg:
            print("The first%s Page problem" % x)

    # Create the Excel workbook (outside the page loop, so it is written once)
    workbook = xlwt.Workbook(encoding='utf-8')
    # Create a sheet; cell_overwrite_ok=True allows a cell's value to be rewritten
    worksheet = workbook.add_sheet('lagouzp', cell_overwrite_ok=True)
    for i, row in enumerate(info_result):
        for j, col in enumerate(row):
            worksheet.write(i, j, col)
    workbook.save('lagouzp.xls')

if __name__ == '__main__':
    main()

Of course, storing the data in Excel is not enough. Previously we used matplotlib for data visualization; this time let's try something new: pyecharts.
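Where do the city counts used in the charts below come from? Presumably from aggregating the City column of the crawled rows; a sketch under that assumption (info_result is the list built by the script above, with the city in column 1):

from collections import Counter

# Skip the header row; column 1 holds the city
city_counter = Counter(row[1] for row in info_result[1:])
city_nms, city_nums = zip(*city_counter.most_common())  # names and counts, sorted by count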

2. Understand pyecharts

pyecharts is a powerful data visualization tool that marries Python with Echarts. It includes a wide variety of chart types:

  • Bar (bar/column chart)
  • Bar3D (3D bar chart)
  • Boxplot (box plot)
  • EffectScatter (scatter with ripple effect)
  • Funnel (funnel chart)
  • Gauge (gauge)
  • Geo (geographic coordinate system)
  • Graph (relationship graph)
  • HeatMap (heat map)
  • Kline (K-line chart)
  • Line (line/area chart)
  • Line3D (3D line chart)
  • Liquid (liquid-fill chart)
  • Map (map)
  • Parallel (parallel coordinate system)
  • Pie (pie chart)
  • Polar (polar coordinate system)
  • Radar (radar chart)
  • Sankey (Sankey diagram)
  • Scatter (scatter plot)
  • Scatter3D (3D scatter plot)
  • ThemeRiver (theme river)
  • WordCloud (word cloud)

User-defined chart combinations:

  • Grid class: display multiple charts side by side
  • Overlap class: overlay charts of different types on the same chart
  • Page class: display multiple charts in sequence on the same page
  • Timeline class: provide a timeline carousel of charts

In addition, note that starting with version 0.3.2, in order to keep the project itself small and lightweight, pyecharts no longer ships with the map js files. Users who need map charts (Geo, Map) can install the corresponding map file packages themselves.

  1. Global country maps: echarts-countries-pypkg (1.9MB): world map and 213 countries, including a map of China
  2. Chinese province maps: echarts-china-provinces-pypkg (730KB): 23 provinces and 5 autonomous regions
  3. Chinese city maps: echarts-china-cities-pypkg (3.8MB): 370 Chinese cities

They can be installed with pip:

pip install echarts-countries-pypkg
pip install echarts-china-provinces-pypkg
pip install echarts-china-cities-pypkg
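One caveat worth flagging: the examples below use the pyecharts 0.x API (from pyecharts import Bar, with title/subtitle constructor arguments); pyecharts 1.0+ changed the API incompatibly, so to follow along you would pin an 0.x release, for example:

pip install pyecharts==0.5.11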

3. Data visualization (code + presentation)

  • Number of job postings per city
from pyecharts import Bar

city_nms = ['Beijing', 'Shanghai', 'Shenzhen', 'Chengdu', 'Hangzhou', 'Guangzhou', 'Wuhan', 'Nanjing', 'Suzhou', 'Zhengzhou', 'Tianjin', "Xi'an", 'Dongguan', 'Zhuhai', 'Hefei', 'Xiamen', 'Ningbo',
            'Nanning', 'Chongqing', 'Foshan', 'Dalian', 'Harbin', 'Changsha', 'Fuzhou', 'Zhongshan']
city_nums = [149, 95, 77, 22, 17, 17, 16, 13, 7, 5, 4, 4, 3, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1]

bar = Bar("Python posts", "Number of posts by city")
bar.add("Number", city_nms, city_nums, is_more_utils=True)
# bar.print_echarts_options() # prints the configuration items, handy for debugging
bar.render('Python posts by city.html')  # generate a local HTML file

  • Map distribution display (not especially meaningful for this scenario, just an extra angle of analysis)
from pyecharts import Geo

city_datas = [('Beijing', 149), ('Shanghai', 95), ('Shenzhen', 77), ('Chengdu', 22), ('Hangzhou', 17), ('Guangzhou', 17), ('Wuhan', 16), ('Nanjing', 13), ('Suzhou', 7),
     ('Zhengzhou', 5), ('Tianjin', 4), ("Xi'an", 4), ('Dongguan', 3), ('Zhuhai', 2), ('Hefei', 2), ('Xiamen', 2), ('Ningbo', 1), ('Nanning', 1), ('Chongqing', 1),
     ('Foshan', 1), ('Dalian', 1), ('Harbin', 1), ('Changsha', 1), ('Fuzhou', 1), ('Zhongshan', 1)]
geo = Geo("Python post city distribution", "Data Source Label", title_color="#fff",
          title_pos="center", width=1200,
          height=600, background_color='#404a59')
attr, value = geo.cast(city_datas)
geo.add("", attr, value, visual_range=[0, 200], visual_text_color="#fff",
        symbol_size=15, is_visualmap=True)
geo.render("Python post city distribution_scatter.html")

geo = Geo("Python post city distribution", "Data Source Label", title_color="#fff",
          title_pos="center", width=1200,
          height=600, background_color='#404a59')
attr, value = geo.cast(city_datas)
geo.add("", attr, value, type="heatmap", visual_range=[0, 10], visual_text_color="#fff",
        symbol_size=15, is_visualmap=True)
geo.render("Python post city distribution_heatmap.html")

  • Recruitment share by city
from pyecharts import Pie

city_nms_top10 = ['Beijing', 'Shanghai', 'Shenzhen', 'Chengdu', 'Guangzhou', 'Hangzhou', 'Wuhan', 'Nanjing', 'Suzhou', 'Zhengzhou']
city_nums_top10 = [149, 95, 77, 22, 17, 17, 16, 13, 7, 5]
pie = Pie()
pie.add("", city_nms_top10, city_nums_top10, is_label_show=True)
# pie.show_config()
pie.render('Python post distribution pie chart.html')

This also reflects why more and more IT practitioners head to first-tier cities after graduation. Beyond the higher salaries compared with second- and third-tier cities, the most important reason is supply and demand: first-tier cities have far more openings, so candidates have more choice. In second- and third-tier cities, you may well find after a few job hops that you have already been through every company in your industry worth staying at.

  • Salary range

It can be seen that most Python salaries fall in the 10k-20k range. If you want to work in the Python field, you can use the combination of required experience and salary as a reference.
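The post does not include the code for this chart; presumably the salary strings from the crawl were reduced to midpoints and bucketed. A hypothetical sketch of that step (parse_salary is my own helper name, assuming the common '10k-20k' format; column 9 of info_result holds the salary, per the script above):

from collections import Counter

def parse_salary(s):
    # '10k-20k' -> 15.0 (midpoint, in thousands); '15k' -> 15.0
    nums = [float(p) for p in s.lower().replace('k', '').split('-')]
    return sum(nums) / len(nums)

salaries = [parse_salary(row[9]) for row in info_result[1:]]  # skip the header row
buckets = Counter('%d-%dk' % (m // 5 * 5, m // 5 * 5 + 5) for m in salaries)  # 5k-wide bands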

  • Education Requirements + Years of Work

Judging by required experience, most postings ask for 1-3 or 3-5 years; openings for fresh graduates or people with under a year of experience are few, and interns are not well served. Education requirements are generally a bachelor's degree, and most companies weigh candidates' credentials heavily. A degree does not represent everything, but for an enterprise the quickest and most workable way to judge a person's ability in a short time is, undeniably, to start from education. Education is the first gate; the interview is the second.
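Again, no chart code survives here, but the counts behind it would come straight from the crawled columns; a sketch under the same assumptions as above (education is column 5 and required experience column 10 of info_result):

from collections import Counter

edu_counter = Counter(row[5] for row in info_result[1:])    # education requirement
year_counter = Counter(row[10] for row in info_result[1:])  # required experience
print(edu_counter.most_common())
print(year_counter.most_common())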

However, this does not mean that people with lower academic qualifications have no good way forward. More and more graduates enter the job market every year, jobs get harder to find, and competition keeps intensifying; even a high degree cannot guarantee a satisfying job. Diligence pays off, especially in the IT industry, where knowledge iterates faster than in other fields. Continuously learning, and expanding both the breadth and the depth of what you know, is the most correct decision.

When an employment winter comes, we need a rational and objective view of it, neither blindly pessimistic nor blindly optimistic. From the data analysis above, if you like Python you can still get into the field, but pay attention to the "work experience required" label: even without work experience, try to independently complete a full project while learning Python, whether a crawler, a data analysis, or a small development project, and train your ability to think through and solve problems in the process. Continuous learning is the best investment in your future, and the most correct posture for getting through the winter.
