Write a novel-listening crawler in 50 lines of code, so you can keep listening to novels even in the bath

Keywords: Python

On my commute I noticed that a lot of people like listening to novels with headphones; one of my colleagues actually listens with a single earbud in all day, which surprised me. So today, let's use Python to download audio novels from tingchina.com.

List of titles and chapters

Click into any book at random. From this page we can use BeautifulSoup to get the book title and the list of links to every chapter's audio page. Copy the address from the browser, for example: https://www.tingchina.com/yousheng/disp_31086.htm.

from bs4 import BeautifulSoup
import requests
import re
import random
import os

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36'
}

def get_detail_urls(url):
    # Fetch the book page and collect the title plus every chapter's page URL.
    url_list = []
    response = requests.get(url, headers=headers)
    response.encoding = 'gbk'  # the site is GBK-encoded
    soup = BeautifulSoup(response.text, 'lxml')
    # The book title sits in the first .red12 element; use it as the download folder name.
    name = soup.select('.red12')[0].strong.text
    if not os.path.exists(name):
        os.makedirs(name)
    # Each chapter link lives under div.list; build absolute URLs from the relative hrefs.
    div_list = soup.select('div.list a')
    for item in div_list:
        url_list.append({'name': item.string, 'url': 'https://www.tingchina.com/yousheng/{}'.format(item['href'])})
    return name, url_list
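
As a quick sanity check, you can call the function on the sample page above and print the first few chapters. Treat this as a sketch: the exact titles depend on whichever book you picked.

name, chapters = get_detail_urls('https://www.tingchina.com/yousheng/disp_31086.htm')
print(name)          # book title, also used as the download folder
print(chapters[:3])  # each entry is a dict with 'name' and 'url'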

Audio address

Open a single chapter's link, search for the chapter name in the Elements panel, and you will find a script near the bottom of the page that holds the sound-source address.

From the Network panel you can also see that the sound source's domain is different from the chapter list's domain, which you need to keep in mind when building the download link.

def get_mp3_path(url):
    # Fetch the chapter page; the mp3 path is embedded in the last <script> block.
    response = requests.get(url, headers=headers)
    response.encoding = 'gbk'
    soup = BeautifulSoup(response.text, 'lxml')
    script_text = soup.select('script')[-1].string
    # Pull the relative file path out of the fileUrl variable.
    fileUrl_search = re.search('fileUrl= "(.*?)";', script_text, re.S)
    if fileUrl_search:
        # Note: the audio host (t3344.tingchina.com) differs from the page host.
        return 'https://t3344.tingchina.com' + fileUrl_search.group(1)
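
To see what the regular expression is doing, here is a made-up example of the kind of assignment that script block contains (the real path will of course differ):

sample_script = 'var fileUrl= "/xxxx.mp3";'  # hypothetical script content
print(re.search('fileUrl= "(.*?)";', sample_script, re.S).group(1))
# -> /xxxx.mp3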

Download

Surprises always come out of nowhere: paste https://t3344.tingchina.com/xxxx.mp3 into the browser and you get a 404.

It must be missing a key parameter. Going back to the Network panel and looking at the mp3 URL more carefully, there is indeed a key parameter appended to it. The key comes from a request like https://img.tingchina.com/play/h5_jsonp.asp?0.5078556568562795 (the query string is just a random number), and the key itself can be extracted with a regular expression.

def get_key(url):
    # The key is served by the h5_jsonp.asp endpoint; the query string is just a random number.
    # (The chapter url passed in is not actually used here.)
    jsonp_url = 'https://img.tingchina.com/play/h5_jsonp.asp?{}'.format(str(random.random()))
    headers['referer'] = jsonp_url
    response = requests.get(jsonp_url, headers=headers)
    # Keep only the last 42 characters of the "key=..." fragment in the response.
    matched = re.search('(key=.*?)";', response.text, re.S)
    if matched:
        temp = matched.group(1)
        return temp[len(temp) - 42:]

Finally, put the pieces together in __main__:

if __name__ == "__main__":
    url = input("Please enter the address of the book page: ")
    dir, url_list = get_detail_urls(url)

    for item in url_list:
        # Build the full download address: mp3 url + key.
        audio_url = get_mp3_path(item['url'])
        key = get_key(item['url'])
        audio_url = audio_url + '?key=' + key
        # The audio host checks the referer, so point it at the chapter page.
        headers['referer'] = item['url']
        r = requests.get(audio_url, headers=headers, stream=True)
        with open(os.path.join(dir, item['name']), 'ab') as f:
            f.write(r.content)
            f.flush()
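
One small improvement worth mentioning: even with stream=True, reading r.content still pulls the whole file into memory at once. If the chapters are large you can write the response in chunks instead. Below is a minimal sketch of such a helper; download_audio is a name I am introducing here, not part of the original script, and the chunk size is arbitrary.

def download_audio(audio_url, save_path, referer):
    # Stream the mp3 to disk in chunks instead of loading it all into memory.
    headers['referer'] = referer
    r = requests.get(audio_url, headers=headers, stream=True)
    with open(save_path, 'wb') as f:
        for chunk in r.iter_content(chunk_size=1024 * 64):
            if chunk:
                f.write(chunk)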

Summary

This Python crawler is fairly simple. A 30-yuan-per-month data plan is never enough, but with this little script you can listen to novels on the subway without burning through your data.

The most important thing in learning Python is mindset. You are bound to run into plenty of problems along the way, and some of them will not yield no matter how hard you rack your brain. That is normal; don't rush to doubt or deny yourself. If you are struggling at the beginning and want a Python learning and discussion environment, you are welcome to join our [Python group], where sharing materials and discussing together saves a lot of time and trouble.