[Python 3 crawler] Downloading NetEase Cloud Music playlists

Keywords: Python Windows

1, Objectives:

Download the songs in the popular playlists on NetEase Cloud Music.

 

2, Modules used:

  requests, multiprocessing, re.

 

3, Steps:

(1) Page analysis: open NetEase Cloud Music and select the popular playlists. The page shows a list of playlists. Then open the browser's developer tools to see which page is requested.

The url we need to request is https://music.163.com/discover/playlist (the complete code in section 4 requests it with the query string ?order=hot). We request the page with requests.get() and parse the returned HTML with a regular expression to get the name and id of each playlist:

res = requests.get(url, headers=headers)
data = re.findall(r'<a title="(.*?)" href="/playlist\?id=(\d+)" class="msk"></a>', res.text)
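
To see what this pattern returns without touching the network, here is a minimal sketch run against a hand-written HTML fragment. The fragment, the titles and the ids are made up for illustration; the real page markup may differ or may have changed since this was written.

```python
import re

# Hypothetical HTML fragment shaped like the markup the pattern expects;
# the titles and ids are invented for this example.
sample_html = (
    '<a title="Morning Mix" href="/playlist?id=123456" class="msk"></a>'
    '<a title="Late Night Jazz" href="/playlist?id=789" class="msk"></a>'
)

# Raw string, so that \? and \d reach the regex engine unchanged.
PLAYLIST_PATTERN = r'<a title="(.*?)" href="/playlist\?id=(\d+)" class="msk"></a>'

# With two capture groups, findall returns a list of (title, id) tuples.
playlists = re.findall(PLAYLIST_PATTERN, sample_html)
for title, playlist_id in playlists:
    print(title, playlist_id)
```

Each tuple holds the title first and the id second, which is why the complete code reads the playlist id from index 1.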

  

(2) After getting the name and id of each playlist, construct the playlist url and, as in step (1), request the page and parse it to get the id and name of each song. The regular expression is as follows:

re.findall(r'<a href="/song\?id=(\d+)">(.*?)</a>', res.text)

After getting the song id, construct the song's download url and fetch the audio data with requests.get(url).content. The url is constructed as follows:

"http://music.163.com/song/media/outer/url?id=%s" % song_id

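The parsing and url construction in this step can be sketched offline as follows. The HTML fragment, song ids and titles are assumptions made up for the example, and `DOWNLOAD_TEMPLATE` is just a name chosen here for the url string shown above.

```python
import re

# Hypothetical playlist-page fragment; ids and titles are invented.
sample_html = (
    '<li><a href="/song?id=1001">First Song</a></li>'
    '<li><a href="/song?id=1002">Second Song</a></li>'
)

SONG_PATTERN = r'<a href="/song\?id=(\d+)">(.*?)</a>'
DOWNLOAD_TEMPLATE = "http://music.163.com/song/media/outer/url?id=%s"

# Each match is an (id, title) tuple: the id comes first in this pattern.
songs = [
    (title, DOWNLOAD_TEMPLATE % song_id)
    for song_id, title in re.findall(SONG_PATTERN, sample_html)
]
for title, url in songs:
    print(title, url)
```

Note that the tuple order here is the reverse of the playlist pattern in step (1): the song id is the first group and the song name the second.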
 

(3) Some song names cannot be used as file names, so the file write is wrapped in try...except. Songs whose names cannot be saved are simply skipped with pass.
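
As an alternative to skipping those songs, the title could be cleaned up before it is used as a file name. The sketch below is not part of the original script: `safe_filename` is a hypothetical helper, and it only replaces the characters Windows rejects in file names, without handling reserved names such as CON or overly long titles.

```python
import re

# Characters that Windows does not allow in file names, plus control characters.
INVALID_CHARS = re.compile(r'[\\/:*?"<>|\x00-\x1f]')


def safe_filename(title, replacement="_"):
    """Hypothetical helper: make a song title usable as a file name."""
    # Replace forbidden characters, then drop leading/trailing spaces and dots.
    cleaned = INVALID_CHARS.sub(replacement, title).strip(" .")
    # Fall back to a fixed name if nothing usable is left.
    return cleaned or "untitled"


print(safe_filename('AC/DC: Back in Black?'))
```

With such a helper, the open() call would use safe_filename(name) instead of the raw song name, and the try...except would only be needed for other I/O errors.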

    

(4) There are many playlists, each containing many songs, so the Pool class from the multiprocessing module is used to process several playlists in parallel and speed up the program.
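
A minimal sketch of the Pool pattern, with no network access: the worker here only builds the playlist url, and the (title, id) pairs are made up. `playlist_url` and `main` are names chosen for this example, not part of the original script.

```python
from multiprocessing import Pool


def playlist_url(item):
    """Hypothetical worker: build a playlist url from a (title, id) tuple."""
    title, playlist_id = item
    return "https://music.163.com/playlist?id=%s" % playlist_id


def main():
    # Invented (title, id) pairs standing in for the parsed playlists.
    playlists = [("Morning Mix", "123456"), ("Late Night Jazz", "789")]
    # The with-block shuts the worker processes down once map() has returned.
    with Pool(processes=4) as pool:
        # map() blocks until every item is processed and keeps the input order.
        for url in pool.map(playlist_url, playlists):
            print(url)


# Required on Windows, where child processes re-import this module.
if __name__ == "__main__":
    main()
```

The worker must be a top-level function so that it can be pickled and sent to the child processes. Since the real work is mostly waiting on HTTP responses, a thread pool would likely give a similar speed-up; processes are shown here because that is what the script uses.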

 

4, Complete code

Downloading every song would take a long time, so the code below only processes the first three playlists.

import os
import re
from multiprocessing import Pool

import requests

headers = {
    'Referer': 'https://music.163.com/',
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.89 "
                  "Safari/537.36"
}


def get_page(url):
    # Fetch the playlist index page and extract (name, id) pairs.
    res = requests.get(url, headers=headers)
    data = re.findall(r'<a title="(.*?)" href="/playlist\?id=(\d+)" class="msk"></a>', res.text)

    # Create the output directory first; otherwise every open() below fails
    # and the error is silently swallowed.
    os.makedirs('music', exist_ok=True)

    # Only the first three playlists are processed, each in a worker process.
    with Pool(processes=4) as pool:
        pool.map(get_songs, data[:3])
    print("Download completed!")


def get_songs(data):
    # data is a (playlist name, playlist id) tuple.
    playlist_url = "https://music.163.com/playlist?id=%s" % data[1]
    res = requests.get(playlist_url, headers=headers)
    for i in re.findall(r'<a href="/song\?id=(\d+)">(.*?)</a>', res.text):
        # i is a (song id, song name) tuple.
        download_url = "http://music.163.com/song/media/outer/url?id=%s" % i[0]
        try:
            with open('music/' + i[1] + '.mp3', 'wb') as f:
                f.write(requests.get(download_url).content)
        except OSError:
            # Skip songs whose names cannot be used as file names
            # (FileNotFoundError is a subclass of OSError).
            pass


if __name__ == '__main__':
    hot_url = "https://music.163.com/discover/playlist/?order=hot"
    get_page(hot_url)

 

5, Operation results

Posted by mindfield on Wed, 01 Jan 2020 08:20:21 -0800