python. No3

Keywords: Python Windows JSON iOS

New Year's Day is coming. Without further ado, let me wish you an early happy New Year.

Recently, the iOS colleague who sits next to me took leave and went home. Hey, hey, I was left alone in the corner, shivering. Although the company's air conditioner was set to 28 ℃, it still couldn't melt my cold heart.

Here you may ask, are there only two people in your company?
Ha ha ha ha, there are many people, but I can't get into the conversation about Chanel and Ferrari.

With nothing to do, I went online, all alone, to watch girls dance. In the blink of an eye an hour slipped away like sand through my fingers, yet the loneliness still clung to me stubbornly, like a maggot on the bone.
What to do? Staying lonely is out of the question. Only by writing bugs, ahem, writing code, can I keep going like this.

Today, let's talk about downloading videos with Python.
Downloading a video is nothing more than file I/O.

Python downloads a video file in three steps: get the video URL, request that URL to get the video stream, and write the stream to a local file.
Write a demo first:

import requests
 
print("Start downloading")
url = 'https://vdept.bdstatic.com/65796c493957364e51574e65504b757a/474c6d4464633277/37715aae880a70427da12d5d682d3a534b3947470869ecc29530cc310a944c1ac684c3ab83e07d18f65784748120987a.mp4?auth_key=1578908333-0-0-a70ed19862afa01935265e5a32906867'
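# stream=True defers the body download so it can be read chunk by chunk
r = requests.get(url, stream=True)
# Fail early on 403/404 instead of saving an error page as test.mp4
r.raise_for_status()

with open('test.mp4', "wb") as mp4:
    # Write the video to disk in 1 MiB chunks
    for chunk in r.iter_content(chunk_size=1024 * 1024):
        if chunk:
            mp4.write(chunk)

print("Download end")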
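
The write loop in the demo can also be pulled out into a small helper. This is only a sketch: save_chunks is a hypothetical name, not part of requests. It accepts any iterable of byte chunks, such as r.iter_content(...), so it can be tried without a network connection.

```python
import os
import tempfile


def save_chunks(chunks, path):
    """Write an iterable of byte chunks to path and return the bytes written.

    save_chunks is a hypothetical helper name, not part of requests.
    """
    written = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            if chunk:  # skip empty chunks, as the demo does
                f.write(chunk)
                written += len(chunk)
    return written


# Offline usage: fake chunks stand in for r.iter_content(chunk_size=...)
demo_path = os.path.join(tempfile.gettempdir(), "save_chunks_demo.bin")
demo_written = save_chunks([b"abc", b"", b"de"], demo_path)
print(demo_written)
```

With a real response, the call would presumably look like save_chunks(r.iter_content(chunk_size=1024 * 1024), 'test.mp4').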
 

Click Run and wait for a while

Then open the saved mp4 file and enjoy watching the girl dance.
(of course, this code also applies to most colored websites)

Oh, I'm a genius

Just as I was about to empty out Bilibili (station B), I found that its video links either returned 403 when copied, or looked nothing like ordinary links.

That was a surprise. Something had to be going on.
I copied one of these links and tried to open it. Oh, it's not that easy.

Why is this? Is Bilibili hiding something? I quickly opened Tencent Video and iQIYI to take a look, and found the same situation there.

Eventually I noticed that the link started with blob:. After a round of Googling, I concluded that a Blob URL (also called an Object URL) is a pseudo-protocol that lets Blob and File objects be used as URL sources for images, download links for binary data, and so on. In production, the blob form is often used in front of the m3u8 address of segmented video. It is not really encryption, because the browser still has to resolve the blob and obtain the corresponding m3u8 address. The benefit of a blob URI is that it hinders crawlers to some degree.
In short, F12 alone can't give you the real address of the video.
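
To make the shape of such a link concrete, here is a minimal sketch that checks whether a copied src value is a blob URL. The helper name describe_video_src and the UUID in the example are made up for illustration. The point is that what follows blob: is only the page origin plus an identifier, not a file that requests can fetch.

```python
from urllib.parse import urlparse


def describe_video_src(src):
    """Classify a video src value copied from the browser's developer tools.

    describe_video_src is a hypothetical helper for illustration only.
    """
    parsed = urlparse(src)
    if parsed.scheme == "blob":
        # Everything after "blob:" is the page origin plus "/" plus an ID
        inner = urlparse(parsed.path)
        return {"is_blob": True, "origin": f"{inner.scheme}://{inner.netloc}"}
    return {"is_blob": False, "origin": f"{parsed.scheme}://{parsed.netloc}"}


# The UUID below is invented; real ones are generated by the browser
info = describe_video_src(
    "blob:https://www.bilibili.com/0b5a3c2e-1111-2222-3333-444455556666"
)
print(info)
```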


Of the three steps for downloading a video with Python, I was stuck at the very first one. Then I found that a tool called you-get seemed to be very popular among netizens. In the spirit of "if everyone says it's good, it must be good", I checked its GitHub repository. Oh, it has a lot of stars.

I gave the you-get documentation a read and found it shockingly simple.
Step 1: install you-get with the command: pip3 install you-get. Under normal circumstances that is all it takes. If something goes wrong, please refer to the official documentation: https://github.com/soimort/you-get/wiki/%E4%B8%AD%E6%96%87%E8%AF%B4%E6%98%8E
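
A quick way to confirm the installation from Python is sketched below. It relies only on the standard library; you_get_status is a hypothetical helper name, and it simply reports whether the you_get package and the you-get command are visible in the current environment.

```python
import importlib.util
import shutil


def you_get_status():
    """Report whether the you_get package and the you-get command are visible."""
    return {
        "module": importlib.util.find_spec("you_get") is not None,
        "command": shutil.which("you-get") is not None,
    }


status = you_get_status()
print(status)
```

If either value is False, the pip3 install step probably targeted a different Python environment than the one running the script.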

It's also very simple to use. Take a dance video on Bilibili as an example:
Link: https://www.bilibili.com/video/av82238119?spm_id_from=333.5.b_64616e63655f73746172.3
The page looks like this:

To save this dance video, just type the following at the command line:

you-get -o D:\python\video https://www.bilibili.com/video/av82238119?spm_id_from=333.5.b_64616e63655f73746172.3


Wait for the progress bar to finish, and you can find the video in the directory D:\python\video. Even the danmaku (bullet comments) are not left out: the xml file holds the video's danmaku, and the other file is the video itself.

Of course, you can also save it in your hidden folder or your "learning materials" directory.

Of course, this semi-automatic approach cannot satisfy my boundless energy. At this rate, who knows what year I would finish emptying out Bilibili.

So, on second thought, let's write a Python file. Then the question is: how do you call you-get from a Python file?
Example:

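from you_get import common

# Same download as the command above, driven from Python.
# info_only=False downloads the video instead of only printing its info;
# merge=True joins the downloaded parts into a single file.
common.any_download(
    url='https://www.bilibili.com/video/av82238119?spm_id_from=333.5.b_64616e63655f73746172.3',
    info_only=False,
    output_dir=r'D:\python\video',
    merge=True,
)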
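
Another option, shown here only as a sketch, is to launch the same you-get -o command through subprocess instead of importing the package. It assumes the you-get command is on PATH; build_you_get_command and download_with_cli are hypothetical names.

```python
import subprocess


def build_you_get_command(url, output_dir):
    """Return the argument list for: you-get -o <output_dir> <url>."""
    return ["you-get", "-o", output_dir, url]


def download_with_cli(url, output_dir):
    """Run you-get as a child process; raises CalledProcessError on failure."""
    subprocess.run(build_you_get_command(url, output_dir), check=True)


# Only build and show the command here; call download_with_cli to run it
cmd = build_you_get_command(
    "https://www.bilibili.com/video/av82238119", r"D:\python\video"
)
print(cmd)
```

Passing the arguments as a list avoids shell quoting problems with the backslashes in the Windows path.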

Give the any_download call a trial run. Hehehe, it actually works. The rest should be easy. For example, to get the dance videos from Bilibili, the steps are as follows:
Step 1: find the URL of Bilibili's dance section
Step 2: fetch the page and parse out the URL of each video page
Step 3: download each video with you-get
In one smooth, flowing burst I finished the first version.
Ran it. Oh, nothing.
After some debugging, I found that Bilibili's video list is loaded asynchronously through an interface. You can see the list in the F12 developer tools and even copy its XPath, but requesting the page directly returns none of that data. The interface is as follows. It returns JSON, and the arcurl field of each item is the URL of the video playback page we need.
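
To illustrate the parsing step, here is a small sketch that pulls the arcurl values out of a response body. The sample JSON is made up and only mimics the shape described above (a result list whose items carry an arcurl field); real responses contain many more fields. extract_arcurls is a hypothetical helper name.

```python
import json

# Made-up sample in the shape described above; not real interface output
sample_body = (
    '{"result": ['
    '{"arcurl": "https://www.bilibili.com/video/av1"}, '
    '{"arcurl": "https://www.bilibili.com/video/av2"}, '
    '{"title": "item without a url"}'
    ']}'
)


def extract_arcurls(body):
    """Return every arcurl found in the result list of a JSON body."""
    data = json.loads(body)
    items = data.get("result") or []  # tolerate a missing or null result
    return [item["arcurl"] for item in items if "arcurl" in item]


video_urls = extract_arcurls(sample_body)
print(video_urls)
```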

Once that is understood, the rest is easy. Start with otaku dance (house dance): take the first page of the dance videos ranked by popularity and enjoy them. The code is as follows:

import requests
import random
import json
from you_get import common

# Common User-Agent headers; one is picked at random for each request
header_list = [
    {"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36"},
    {"User-Agent":"Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.25 Safari/537.36 Core/1.70.3676.400 QQBrowser/10.4.3469.400"},
    {"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.117 Safari/537.36"},
    {"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.18362"}
]


# Main function
if __name__ == "__main__":

    base_url = 'https://s.search.bilibili.com/cate/search?main_ver=v3&search_type=video&view_type=hot_rank&order=click&copy_right=-1&cate_id=20&page=1&pagesize=20&jsonp=jsonp&time_from=20200107&time_to=20200114&_=1578992689153'
    
    headers = random.choice(header_list)
    # Get page information
    response = requests.get(url=base_url, headers=headers)
    # Get request status code
    code = response.status_code
    if code == 200:
        urls_json = json.loads(response.content)
        for i in urls_json['result']:
            print(i['arcurl'])
            common.any_download(url=i['arcurl'],info_only=False,output_dir=r'D:\python\video',merge=True)
        
    print('the end!')

Run the file and get the following results:

Open the save directory to see:

Yo, who can stand it

On the principle that if one page of house dance can be fetched, the whole section can be emptied out, I analyzed the rules of the interface and wrapped the logic in a few simple functions:

import queue
import requests
import re
import random
import time
import os
import json
from you_get import common

# Common User-Agent headers; one is picked at random for each request
header_list = [
    {"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36"},
    {"User-Agent":"Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.25 Safari/537.36 Core/1.70.3676.400 QQBrowser/10.4.3469.400"},
    {"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.117 Safari/537.36"},
    {"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.18362"}
]

# Collect video page URLs, page by page, into the queue
def get_urls(base_url,start_page_num,url_queue):
    while True:
        # Pick a random User-Agent header
        headers = random.choice(header_list)
        # Assemble the interface URL for the current page
        page_url = base_url + str(start_page_num)
        # Request the interface
        response = requests.get(url=page_url, headers=headers)
        # Stop on any non-200 status; otherwise the loop might never end
        if response.status_code != 200:
            print('Request failed with status', response.status_code)
            break
        urls_json = json.loads(response.content)
        # An empty result means we are past the last page
        if not urls_json['result']:
            break
        for i in urls_json['result']:
            url_queue.put(i['arcurl'])
            print(i['arcurl'])
        start_page_num += 1
    print('Get url done!')

# Download and save the videos
def get_vedio(save_dir,url_queue):
    while not url_queue.empty():
        # Optionally pause between downloads; who knows what happens if we go too fast
        #time_num = 5
        #time.sleep(time_num)
        # Take the next video page URL from the queue
        vedio_page_url = url_queue.get()
        url_queue.task_done()
        common.any_download(url=vedio_page_url,info_only=False,output_dir=save_dir,merge=True)

# Main function
if __name__ == "__main__":
    # Base URL of the video list interface; the page number is appended to it
    base_url = 'https://s.search.bilibili.com/cate/search?main_ver=v3&search_type=video&view_type=hot_rank&order=click&copy_right=-1&cate_id=20&pagesize=20&jsonp=jsonp&time_from=20200107&time_to=20200114&_=1578992689153&page='
    # Page number to start from
    start_page_num = 1
    # Save directory, written in quotes; keep the leading r so the backslashes are not treated as escapes
    save_dir = r'D:\python\video'

    # Using Queue to construct a FIFO Queue
    video_urls_queue = queue.Queue()
    #Get video page url
    get_urls(base_url,start_page_num,video_urls_queue)
    #Download and save the video
    get_vedio(save_dir,video_urls_queue)

    print('the end!')
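
The paging rule used above (keep requesting pages until result comes back empty) can be tried offline with a stand-in for the interface. This is a sketch under assumptions: with_page and collect_arcurls are hypothetical helpers, fetch_page is a caller-supplied function, and max_pages is a safety cap I added, not something the interface defines.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_page(url, page):
    """Return url with its "page" query parameter set to page."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "page"]
    query.append(("page", str(page)))
    return urlunsplit(parts._replace(query=urlencode(query)))


def collect_arcurls(fetch_page, start_page=1, max_pages=50):
    """Collect arcurl values page by page until a page comes back empty.

    fetch_page maps a page number to the parsed JSON dict for that page.
    """
    urls = []
    for page in range(start_page, start_page + max_pages):
        items = fetch_page(page).get("result") or []
        if not items:
            break  # past the last page
        urls.extend(item["arcurl"] for item in items)
    return urls


# Offline stand-in for the real interface: two pages of data, then nothing
fake_pages = {
    1: {"result": [{"arcurl": "u1"}, {"arcurl": "u2"}]},
    2: {"result": [{"arcurl": "u3"}]},
}
all_urls = collect_arcurls(lambda page: fake_pages.get(page, {"result": []}))
print(all_urls)
```

In real use, fetch_page could wrap requests.get(with_page(base_url, page), headers=headers) followed by json.loads, as the script above does.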

Watching my "learning materials" drive slowly fill up, I broke into a wicked yet handsome smile....
I thought to myself: so that's all Bilibili has got.

Ha-ha

The end!

People's biggest rivals are often not others, but their own laziness.
Don't expect to be lucky. Luck can't always be on you. You need to be able to eat at any time.
You have to do your best to be qualified to say that you are unlucky.


Posted by sribala on Wed, 15 Jan 2020 01:11:47 -0800