Download files in Python

Keywords: Programming Python encoding Windows pip

Prerequisite

The requests module needs to be installed in advance:

  • pip install requests

The basic code:

import requests

url = 'XXX'  # Source URL of the file to download
filename = ''  # Local file name to save the download as
r = requests.get(url)
with open(filename, "wb") as code:
  code.write(r.content)
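Note that r.content holds the whole response body in memory, which may matter for large files. A minimal sketch of a chunked variant follows, assuming the requests streaming interface (stream=True and iter_content). The helper names save_stream and download are hypothetical and not part of the original code.

```python
def save_stream(chunks, filename):
    """Write an iterable of byte chunks to filename and return the bytes written."""
    total = 0
    with open(filename, "wb") as out:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                out.write(chunk)
                total += len(chunk)
    return total


def download(url, filename, chunk_size=8192):
    """Hypothetical helper: stream url to filename without loading it all at once."""
    import requests  # imported here so save_stream can be used without requests

    with requests.get(url, stream=True, timeout=30) as r:
        r.raise_for_status()  # fail on 4xx/5xx instead of saving an error page
        return save_stream(r.iter_content(chunk_size=chunk_size), filename)
```

Calling download(url, filename) with the same url and filename as above should produce the same file as the basic version, only written piece by piece.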

Hands-on practice

The goal is to download the audio files from the target page, "Bilingual Reading of Big Movies: Interstellar (with English audio and word search app)".

Using the browser's developer tools, you can find the URL of the audio.

In principle, the browser's own download function should be enough to save the file locally.

However, repeated attempts all failed.

So why does the same browser, requesting the same URL, not get the desired result?

Try entering the URL for the audio in the address bar.

Access Denied, You are denied by bucket referer policy.

This rejection message suggests that the request headers are the cause.

So look up the request headers sent for the audio URL in the browser's developer tools.

Most likely, the browser's downloader does not send headers such as Referer with its request, so the connection is refused.

This is where Python comes in. When requesting the download, include headers such as Referer in the request:

headers={
	"Referer": "http://www.ecustpress.cn/erweima/player.html?blid=11089",
	"Accept-Encoding":"identity;q=1, *;q=0",
	"User-Agent":"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36",
	"Range":"bytes=0-"
}

res = requests.get(url % (i + 1), headers=headers)
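To see offline which headers a request would carry, the same idea can be sketched with the standard library's urllib.request instead of requests. This is a swapped-in technique for illustration only. The name build_request is hypothetical, and the audio URL is the first file of the numbered series used later in this post.

```python
import urllib.request

AUDIO_URL = "http://hldqrcode1.oss-cn-shanghai.aliyuncs.com/wapaudio/56474/1.mp3"
PLAYER_PAGE = "http://www.ecustpress.cn/erweima/player.html?blid=11089"


def build_request(url, referer):
    """Return a Request object carrying a Referer header (nothing is sent yet)."""
    return urllib.request.Request(url, headers={"Referer": referer})


req = build_request(AUDIO_URL, PLAYER_PAGE)
print(req.get_header("Referer"))  # the player page URL

# To actually fetch the file (network access required):
# with urllib.request.urlopen(req) as resp:
#     data = resp.read()
```

Building the request separately from sending it makes it easy to confirm that the Referer is really attached before hitting the server.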

In addition, all audio URLs on the target page follow an obvious pattern: 1.mp3, 2.mp3, 3.mp3, ..., 37.mp3.

So they can all be downloaded in one go.
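The numbering pattern can be turned into a list of download targets before any request is made. The sketch below assumes the URL template from this post and zero-padded local file names. The name build_targets is hypothetical.

```python
URL_TEMPLATE = "http://hldqrcode1.oss-cn-shanghai.aliyuncs.com/wapaudio/56474/%d.mp3"


def build_targets(template, count):
    """Return (remote_url, local_name) pairs for tracks 1..count."""
    return [(template % n, "%02d.mp3" % n) for n in range(1, count + 1)]


targets = build_targets(URL_TEMPLATE, 37)
print(targets[0])   # first remote URL with local name 01.mp3
print(targets[-1])  # last remote URL with local name 37.mp3
```

Zero-padding the local names (01.mp3 rather than 1.mp3) keeps the files in playback order when sorted by name.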

To sum up, the final code:

import requests
import os
import time
import random

url = "http://hldqrcode1.oss-cn-shanghai.aliyuncs.com/wapaudio/56474/%d.mp3"

headers = {
    "Referer": "http://www.ecustpress.cn/erweima/player.html?blid=11089",
    "Accept-Encoding": "identity;q=1, *;q=0",
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36",
    "Range": "bytes=0-"
}

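for i in range(37):
    # Zero-pad the local file name: 01.mp3, 02.mp3, ..., 37.mp3
    fileName = "%02d.mp3" % (i + 1)

    # Skip files that were already downloaded
    if os.path.exists(fileName):
        continue

    print("Download %s" % fileName)

    try:
        res = requests.get(url % (i + 1), headers=headers)
        # Raise on 4xx/5xx so an "Access Denied" body is not saved as an mp3
        res.raise_for_status()
        with open(fileName, "wb") as code:
            code.write(res.content)
        # Pause for a random 0 to 10 seconds between downloads
        time.sleep(10 * random.random())

    except Exception as err:
        print("Something went wrong while downloading %s" % fileName)
        print(err)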

The code runs successfully!

Posted by realnsleo on Fri, 07 Feb 2020 08:06:43 -0800