Python crawler tutorial 30: crawling Lagou recruitment data

Keywords: Python crawler

Preface 💨

The text and pictures in this article come from the internet and are for learning and communication only; they serve no commercial purpose. If you have any questions, please contact us promptly so they can be handled.

Previous content 💨

Python crawler tutorial 01: crawling Douban Top movies

Python crawler tutorial 02: crawling novels

Python crawler tutorial 03: crawling second-hand housing data

Python crawler tutorial 04: crawling recruitment information

Python crawler tutorial 05: crawling Bilibili (station B) video barrage comments

Python crawler tutorial 06: making a word cloud from crawled data

Python crawler tutorial 07: crawling Tencent Video barrage comments

Python crawler tutorial 08: crawling CSDN articles and saving them as PDF

Python crawler tutorial 09: multi-threaded crawling of meme pictures

Python crawler tutorial 10: Bi'an wallpaper crawling

Python crawler tutorial 11: crawling Honor of Kings skin images

Python crawler tutorial 12: crawling League of Legends skin images

Python crawler tutorial 13: crawling high-quality desktop wallpapers

Python crawler tutorial 14: audiobook crawling

Python crawler tutorial 15: crawling music website data

Python crawler tutorial 17: crawling music songs

Python crawler tutorial 18: good video crawling

Python crawler tutorial 19: YY short video crawling

Python crawler tutorial 20: crawling and using IP proxies

Python crawler tutorial 21: crawling paid documents

Python crawler tutorial 22: Baidu Translate JS decryption

Python crawler tutorial 23: crawling AcFun (station A) videos and decrypting the m3u8 video format

Python crawler tutorial 24: downloading a website's paid documents and saving them as PDF

Python crawler tutorial 25: bypassing JS-encrypted parameters to batch-download watermark-free videos

Python crawler tutorial 26: Kwai video website data download

Python crawler tutorial 27: crawling data from an e-commerce platform and visualizing it

Python crawler tutorial 28: crawling the Weibo hot search list and building a dynamic data display

Python crawler tutorial 29: crawling data from a barbecue restaurant and making a visual display


Basic development environment 💨

  • Python 3.6
  • Pycharm

Use of related modules 💨

import csv
import requests

Install Python and add it to the PATH environment variable; the required third-party modules can then be installed with pip (csv is part of the standard library).
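Of the two modules imported above, only requests is third-party; assuming pip is on your PATH, it can be installed with:

```shell
pip install requests
```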

💥 Requirements and data source analysis


Using the browser's developer tools, capture and analyze the network traffic to find where the data actually comes from. Once you locate the right request, you can read off its URL, request method (a POST to the positionAjax.json endpoint), and the form parameters it needs.
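The endpoint returns JSON with the job list nested several levels deep. A minimal sketch of walking that structure, using a made-up stand-in for the real response (the key names match the crawl code below; the sample values are invented):

```python
# Stand-in for response.json(): the real Ajax payload nests the job list
# under content -> positionResult -> result. The sample values are made up.
payload = {
    'content': {
        'positionResult': {
            'result': [
                {'positionName': 'Python developer', 'city': 'Beijing', 'salary': '15k-25k'},
            ]
        }
    }
}

# .get() with defaults avoids a KeyError when the structure is missing,
# e.g. when the site returns an anti-crawler error page instead of data.
jobs = payload.get('content', {}).get('positionResult', {}).get('result', [])
for job in jobs:
    print(job['positionName'], job['city'], job['salary'])
```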

💥 Code implementation

import csv
import requests

f = open('data.csv', mode='a', encoding='utf-8', newline='')
csv_writer = csv.DictWriter(f, fieldnames=[
    'title',
    'city',
    'Company name',
    'education',
    'experience',
    'salary',
    'Company benefits',
    'Detail page',
])
csv_writer.writeheader()
url = 'https://www.lagou.com/jobs/positionAjax.json?needAddtionalResult=false'
data = {
    'first': 'true',
    'pn': '1',       # page number; increment it to crawl further pages
    'kd': 'python'   # search keyword
}
headers = {
    # replace with the cookie string copied from your own logged-in session
    'cookie': 'cookie',
    'referer': 'https://www.lagou.com/jobs/list_python?labelWords=&fromSearch=true&suginput=',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.71 Safari/537.36'
}
response = requests.post(url=url, data=data, headers=headers)
result = response.json()['content']['positionResult']['result']
for index in result:
    # pprint.pprint(index)
    title = index['positionName']  # job title
    city = index['city']  # city
    area = index['district']  # district
    city_area = city + '-' + area
    company_name = index['companyFullName']  # company name
    edu = index['education']  # education requirement
    money = index['salary']  # salary range
    exp = index['workYear']  # experience requirement
    boon = index['positionAdvantage']  # company benefits
    href = f'https://www.lagou.com/jobs/{index["positionId"]}.html'
    job_info = index['positionDetail'].replace('<br>\n', '').replace('<br>', '')
    dit = {
        'title': title,
        'city': city_area,
        'Company name': company_name,
        'education': edu,
        'experience': exp,
        'salary': money,
        'Company benefits': boon,
        'Detail page': href,
    }
    csv_writer.writerow(dit)
    # strip characters that are illegal in file names (e.g. the "/" in "C/C++")
    txt_name = ''.join(c for c in company_name + '-' + title if c not in '\\/:*?"<>|') + '.txt'
    # use a separate handle so the CSV file object "f" is not shadowed
    with open(txt_name, mode='w', encoding='utf-8') as txt_file:
        txt_file.write(job_info)
    print(dit)
f.close()
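To sanity-check the output file, the rows can be read back with csv.DictReader. A self-contained sketch with the same field names as the script (the sample row is invented so it runs without crawling anything):

```python
import csv

fieldnames = ['title', 'city', 'Company name', 'education',
              'experience', 'salary', 'Company benefits', 'Detail page']

# Write one made-up row the same way the crawl script does.
with open('data.csv', mode='w', encoding='utf-8', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerow({'title': 'Python developer', 'city': 'Beijing-Haidian',
                     'Company name': 'Example Co.', 'education': 'Bachelor',
                     'experience': '3-5 years', 'salary': '15k-25k',
                     'Company benefits': 'Flexible hours',
                     'Detail page': 'https://www.lagou.com/jobs/1.html'})

# Read it back: DictReader uses the header row as the dict keys.
with open('data.csv', encoding='utf-8', newline='') as f:
    rows = list(csv.DictReader(f))
print(rows[0]['title'], rows[0]['salary'])
```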

💥 Crawled data display


Posted by venkyphp on Thu, 07 Oct 2021 11:47:19 -0700