Crawling asynchronously loaded data with a Python crawler

Keywords: Python JSON encoding network

Preface

The text and images in this article come from the Internet and are for study and communication only, not for any commercial use. The copyright belongs to the original author. If you have any concerns, please contact us promptly so we can deal with them.
Author: Effort and effort

Crawling singer data from the QQ Music API

https://y.qq.com/portal/singer_list.html
This is the address of the singer list page.

Analyzing Web Pages

  • Open the F12 developer tools and look under the Network tab to find the asynchronously loaded data.
  • Refresh the page to see the requests come in, then inspect their responses.

https://u.y.qq.com/cgi-bin/musicu.fcg?-=getUCGI20652690515538596&g_tk=944669952&loginUin=1638786338&hostUin=0&format=json&inCharset=utf8&outCharset=utf-8&notice=0&platform=yqq.json&needNewCode=0&data={"comm"%3A{"ct"%3A24%2C"cv"%3A0}%2C"singerList"%3A{"module"%3A"Music.SingerListServer"%2C"method"%3A"get_singer_list"%2C"param"%3A{"area"%3A-100%2C"sex"%3A-100%2C"genre"%3A-100%2C"index"%3A-100%2C"sin"%3A0%2C"cur_page"%3A1}}}
Among the responses there will be one that contains the singer list: that is the URL above. Note this URL down; it is the one we will request directly.
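Instead of hard-coding the captured query string, the same request can be rebuilt from a params dict, letting requests handle the URL encoding. This is only a sketch based on the parameters visible in the URL above; which of the extra fields in the capture (g_tk, loginUin, hostUin, notice) the server actually requires is untested here, so treat their omission as an assumption:

import json
import requests

# Payload copied from the "data" parameter of the captured URL above
payload = {
    "comm": {"ct": 24, "cv": 0},
    "singerList": {
        "module": "Music.SingerListServer",
        "method": "get_singer_list",
        "param": {"area": -100, "sex": -100, "genre": -100,
                  "index": -100, "sin": 0, "cur_page": 1},
    },
}
params = {
    "format": "json",
    "inCharset": "utf8",
    "outCharset": "utf-8",
    "platform": "yqq.json",
    "needNewCode": 0,
    # compact separators reproduce the un-spaced JSON seen in the capture
    "data": json.dumps(payload, separators=(",", ":")),
}
resp = requests.get("https://u.y.qq.com/cgi-bin/musicu.fcg", params=params)
print(resp.json())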

  • Full source code first
import csv
import json
import time

import jsonpath
import requests

def writecsv(content):
    # Append one row to spider2.csv; newline='' stops csv.writer from
    # inserting a blank line after every row on Windows
    with open('spider2.csv', 'a+', encoding='utf-8', newline='') as filecsv:
        writer = csv.writer(filecsv)
        writer.writerow(content)

# Header row (the third column actually holds singer_mid, so label it as such)
writecsv(['Singer Name', 'Singer Area', 'Singer mid', 'Singer Picture'])

def getHtml(url):
    response = requests.get(url)
    response.encoding = 'utf-8'
    html = json.loads(response.text)  # parse the JSON text into a dict
    print(html)
    # $ is the root node; .. matches a key at any depth, like // in XPath
    singname = jsonpath.jsonpath(html, '$..singerList..singer_name')
    singcountry = jsonpath.jsonpath(html, '$..singerList..country')
    singer_mid = jsonpath.jsonpath(html, '$..singerList..singer_mid')
    singer_pic = jsonpath.jsonpath(html, '$..singerList..singer_pic')
    print(singer_mid)
    # The four lists are parallel, so enumerate one and index into the rest
    for index, i in enumerate(singname):
        writecsv([i, singcountry[index], singer_mid[index], singer_pic[index]])

url_template = 'https://u.y.qq.com/cgi-bin/musicu.fcg?-=getUCGI01616088836276819&g_tk=5381&loginUin=0&hostUin=0&format=json&inCharset=utf8&outCharset=utf-8&notice=0&platform=yqq.json&needNewCode=0&data={"comm":{"ct":24,"cv":0},"singerList":{"module":"Music.SingerListServer","method":"get_singer_list","param":{"area":-100,"sex":-100,"genre":-100,"index":-100,"sin":%(no)d,"cur_page":%(page)d}}}'
index = 0
# sin steps by 80 because each page returns 80 singers; cur_page counts from 1
for i in range(0, 801, 80):
    index = index + 1
    getHtml(url_template % {'no': i, 'page': index})
    time.sleep(3)  # pause between requests so we don't hammer the server
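The pagination works through two parameters: sin is the zero-based offset of the first singer on the page, and cur_page is the 1-based page number, with 80 singers per page. A hypothetical helper (not in the original source) makes the relationship explicit:

def page_params(page, page_size=80):
    # sin: zero-based offset of the first singer on the page
    # cur_page: 1-based page number (as observed in the captured requests)
    return {'no': (page - 1) * page_size, 'page': page}

print(page_params(1))   # {'no': 0, 'page': 1}
print(page_params(11))  # {'no': 800, 'page': 11}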

Using json and jsonpath together is convenient and practical, and saving the results in CSV format at the end keeps things simple.
html = json.loads(html) turns the page's JSON text into a Python object, which jsonpath can then query with an XPath-like path syntax (see the jsonpath tutorial for details). When parsing with jsonpath.jsonpath(), the $ is the root node and must not be left out. For example, run against the response shown below, the following call collects every singer name:

singname=jsonpath.jsonpath(html,'$..singerList..singer_name')

{
    "singerList": {
        "data": {
            "area": -100,
            "genre": -100,
            "index": -100,
            "sex": -100,
            "singerlist": [
                {
                    "country": "Mainland",
                    "singer_id": 5062,
                    "singer_mid": "002J4UUk29y8BY",
                    "singer_name": "Xue Zhiqian",
                    "singer_pic": "http://y.gtimg.cn/music/photo_new/T001R150x150M000002J4UUk29y8BY.webp"
                },
                {
                    "country": "Mainland",
                    "singer_id": 1199300,
                    "singer_mid": "0013RsPD3Xs0FG",
                    "singer_name": "Banyang",
                    "singer_pic": "http://y.gtimg.cn/music/photo_new/T001R150x150M0000013RsPD3Xs0FG.webp"
                },
                ...
Here the .. before singer_name works like // in XPath: it matches singer_name at any depth below singerList, wherever it sits in the tree. Print the results first to catch errors. The singer names, countries, mids, and pictures come back as parallel lists of the same length, so each record can be written to the CSV line by line with enumerate.
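A minimal sketch of that recursive-descent behaviour, using a trimmed-down copy of the response above (the jsonpath package is the one imported in the source; note that it returns False rather than an empty list when nothing matches):

import jsonpath

sample = {
    "singerList": {
        "data": {
            "singerlist": [
                {"singer_name": "Xue Zhiqian", "country": "Mainland"},
                {"singer_name": "Banyang", "country": "Mainland"},
            ]
        }
    }
}

# .. descends to any depth, like // in XPath
print(jsonpath.jsonpath(sample, '$..singerList..singer_name'))
# ['Xue Zhiqian', 'Banyang']

# No match returns False, not [] -- worth checking before indexing
print(jsonpath.jsonpath(sample, '$..no_such_key'))
# False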

In with open('spider2.csv', 'a+', encoding='utf-8', newline='') as filecsv: the first argument is the file name, the second opens it in append mode, the third sets the encoding to avoid garbled characters, and newline='' suppresses the blank lines that would otherwise appear between rows. On success the CSV file is created automatically; if you re-run the script, check that the file isn't still open in another program.
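As a quick check of the newline='' point: the csv module documentation says the file should always be opened with newline='', because otherwise on Windows the \r\n line endings csv.writer emits get translated a second time and every row gains a blank line after it. A minimal sketch, writing to a hypothetical demo.csv:

import csv

# newline='' lets csv.writer control line endings itself
with open('demo.csv', 'a+', encoding='utf-8', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['singer_name', 'country'])
    writer.writerow(['Xue Zhiqian', 'Mainland'])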

And with that, a simple crawl of asynchronously loaded data succeeds.
